What is training data
How AI researchers obtain the necessary training data
Large amounts of data are required to train artificial intelligence (AI) algorithms. Because such data is not always available in the right form, researchers resort to workarounds. At the Conference on Empirical Methods in Natural Language Processing (EMNLP) in early November, experts presented a wide range of natural language processing research based on sophisticated approaches to data collection. Technology Review online reports on this in “Tricks for collecting data”.
Microsoft researchers, for example, wanted better data for evaluating “code-mixed” utterances, i.e. utterances that alternate between two languages. “Spanglish”, a mixture of Spanish and English, occurs frequently in the real world but rarely in written texts. So the researchers fed English texts into a machine translation system for Spanish and pasted parts of the output back into the original, which gave them as much Spanglish as they wanted.
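The mixing step described above can be sketched in a few lines of Python. This is a minimal illustration, not the Microsoft researchers' actual pipeline: the lookup table `TOY_EN_ES` and the function `toy_translate` are hypothetical stand-ins for a real machine translation system, and the choice of which segments to switch is passed in by hand.

```python
from typing import Callable, List

# Hypothetical stand-in for a real machine translation system (assumption).
TOY_EN_ES = {
    "i will call you": "te llamo",
    "when i get home": "cuando llegue a casa",
}


def toy_translate(segment: str) -> str:
    """Look the segment up in the toy table; return it unchanged if unknown."""
    return TOY_EN_ES.get(segment.lower(), segment)


def code_mix(
    segments: List[str],
    switch: List[bool],
    translate: Callable[[str], str],
) -> str:
    """Replace the segments flagged in `switch` with their translation and rejoin."""
    if len(segments) != len(switch):
        raise ValueError("segments and switch must have the same length")
    mixed = [translate(s) if flag else s for s, flag in zip(segments, switch)]
    return " ".join(mixed)


segments = ["I will call you", "when I get home"]
# Keep the first segment in English, switch the second one to Spanish.
mixed_sentence = code_mix(segments, [False, True], toy_translate)
print(mixed_sentence)
```

In a realistic setting the segments would come from a parser or aligner and the translation from an actual translation system; both are left out here to keep the sketch self-contained.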
AI researchers at Google, on the other hand, tried to automatically break long sentences into several shorter sentences with the same meaning so that they are easier to understand. They used Wikipedia as a data source: the editing history of the online encyclopedia contains plenty of examples in which editors improved the text by splitting a sentence into shorter ones with the same content. Mining this history yielded 60 times more examples of split sentences, with 90 times as many words, than the previously available reference data for this task. When the researchers trained a machine learning model on their new data, it was 91 percent accurate.
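How such split examples might be picked out of an edit history can be illustrated with a simple heuristic. The sketch below is an assumption for illustration, not the method the Google researchers used: it treats an old sentence and two adjacent new sentences as a split pair when their word sets overlap strongly. The function names and the threshold `min_overlap` are hypothetical.

```python
import re
from typing import List, Optional, Tuple


def tokens(text: str) -> set:
    """Lowercased set of word tokens in the text."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def find_split(
    old_sentence: str, new_text: str, min_overlap: float = 0.8
) -> Optional[Tuple[str, str]]:
    """Return two adjacent new sentences that together reuse the old sentence's words."""
    new_sentences = split_sentences(new_text)
    old_tokens = tokens(old_sentence)
    for first, second in zip(new_sentences, new_sentences[1:]):
        new_tokens = tokens(first) | tokens(second)
        union = old_tokens | new_tokens
        # Jaccard overlap between the old sentence and the candidate pair.
        overlap = len(old_tokens & new_tokens) / len(union) if union else 0.0
        if overlap >= min_overlap:
            return first, second
    return None


old = "The bridge opened in 1932 and it remains the longest span in the region."
new = "The bridge opened in 1932. It remains the longest span in the region."
pair = find_split(old, new)
print(pair)
```

A word-overlap test like this only checks that the content was largely preserved; it says nothing about whether the meaning is truly identical, so real data of this kind would still need filtering or manual spot checks.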