Latest from MIT Tech Review – Synthetic data for AI

Last year, researchers at Data Science Nigeria noted that engineers looking to train computer-vision algorithms could choose from a wealth of data sets featuring Western clothing, but there were none for African clothing. The team addressed the imbalance by using AI to generate artificial images of African fashion—a whole new data set from scratch.

Such synthetic data sets—computer-generated samples with the same statistical characteristics as the genuine article—are growing more and more common in the data-hungry world of machine learning. These fakes can be used to train AIs in areas where real data is scarce or too sensitive to use, as in the case of medical records or personal financial data.
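
To make "the same statistical characteristics" concrete in its simplest form, here is a minimal Python sketch. It fits a multivariate Gaussian to a numeric table, matching the column means and covariance, and then samples new rows from that Gaussian. This is deliberately much simpler than the GANs discussed here and is not how any of the services named in this article work. The function names and the toy "real" data are hypothetical and exist only for illustration.

```python
import math
import random


def mean_and_covariance(rows):
    """Return column means and the sample covariance matrix of a list of numeric rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for row in rows:
        centered = [row[j] - means[j] for j in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += centered[i] * centered[j]
    # Divide by n - 1 for the unbiased sample covariance.
    return means, [[c / (n - 1) for c in r] for r in cov]


def cholesky(matrix):
    """Lower-triangular L with L times L-transpose equal to matrix (must be positive definite)."""
    d = len(matrix)
    lower = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def synthesize_gaussian(rows, n_samples, seed=0):
    """Hypothetical helper: sample synthetic rows from a Gaussian fitted to the real rows."""
    rng = random.Random(seed)
    means, cov = mean_and_covariance(rows)
    lower = cholesky(cov)
    d = len(means)
    synthetic = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # mean + L z has covariance L L^T, i.e. the covariance of the real data.
        synthetic.append(
            [means[i] + sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(d)]
        )
    return synthetic


if __name__ == "__main__":
    gen = random.Random(42)
    # Toy "real" table with two correlated columns, generated purely for illustration.
    real = []
    for _ in range(2000):
        a = gen.gauss(50.0, 10.0)
        real.append([a, 2.0 * a + gen.gauss(0.0, 5.0)])
    fake = synthesize_gaussian(real, 2000, seed=1)
    print("real means:", mean_and_covariance(real)[0])
    print("synthetic means:", mean_and_covariance(fake)[0])
```

This approach matches only the first two moments. It therefore misses the nonlinear structure, outliers, and categorical fields that a GAN or other generative model tries to capture. Fitting a model to real records also does not, by itself, guarantee that the synthetic rows reveal nothing about the people in the original data.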

The idea of synthetic data isn’t new: driverless cars have been trained on virtual streets. But in the last year the technology has become widespread, with a raft of startups and universities offering such services. Datagen and Synthesis AI, for example, supply digital human faces on demand. Others provide synthetic data for finance and insurance. And the Synthetic Data Vault, a project launched in 2021 by MIT’s Data to AI Lab, provides open-source tools for creating a wide range of data types.

This boom in synthetic data sets is driven by generative adversarial networks (GANs), a type of AI that is adept at generating realistic but fake examples, whether of images or medical records.

Proponents claim that synthetic data avoids the bias that is rife in many data sets. But it will only be as unbiased as the real data used to generate it. A GAN trained on fewer Black faces than white, for example, may be able to create a synthetic data set with a higher proportion of Black faces, but those faces may end up being less lifelike given the limited original data.

Latest from MIT Tech Review – What comes next for AI copyright lawsuits?

Last week, the technology companies Anthropic and Meta each won landmark victories in two separate court cases that examined whether or not the firms had violated copyright when they trained their large language models on copyrighted books without permission. The rulings are the first we’ve seen to come out of copyright cases of this kind….

Latest from Google AI – Making ML models differentially private: Best practices and open challenges

Posted by Natalia Ponomareva and Alex Kurakin, Staff Software Engineers, Google Research

Large machine learning (ML) models are ubiquitous in modern applications, from spam filters to recommender systems and virtual assistants. These models achieve remarkable performance partly because of the abundance of available training data. However, these data can sometimes contain private information, including personal…
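
The excerpt is truncated before it reaches any methods, so the following is not taken from the post. It is a minimal sketch that assumes the widely used DP-SGD recipe: clip each training example's gradient, then add Gaussian noise to the sum before averaging. The code shows one such private gradient step in plain Python. The function names, parameter names, and toy gradients are hypothetical and are not the API of any differential-privacy library.

```python
import math
import random


def l2_norm(vector):
    return math.sqrt(sum(v * v for v in vector))


def clip(vector, clip_norm):
    """Scale a gradient down, if needed, so its L2 norm is at most clip_norm."""
    norm = l2_norm(vector)
    if norm <= clip_norm:
        return list(vector)
    return [v * clip_norm / norm for v in vector]


def private_average_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Hypothetical DP-SGD-style aggregation: clip, sum, add Gaussian noise, average.

    Clipping bounds any one example's influence on the sum to clip_norm, and the
    noise standard deviation is noise_multiplier * clip_norm.
    """
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        for i, g in enumerate(clip(grad, clip_norm)):
            total[i] += g
    sigma = noise_multiplier * clip_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [v / len(per_example_grads) for v in noisy]


if __name__ == "__main__":
    # Toy per-example gradients for a 3-parameter model (made up for illustration).
    grads = [[0.5, -1.0, 2.0], [3.0, 0.0, -4.0], [0.1, 0.2, 0.3]]
    step = private_average_gradient(grads, clip_norm=1.0, noise_multiplier=1.1, rng=random.Random(0))
    print(step)
```

Clipping is what makes the noise meaningful. It caps any single example's contribution to the sum at `clip_norm`, so noise scaled to that bound can mask it. The actual privacy guarantee (ε, δ) also depends on the batch sampling rate and the number of training steps. That guarantee has to be tracked by a separate privacy accountant, which this sketch omits. Libraries such as TensorFlow Privacy and Opacus package both pieces.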

O’Reilly Media – What Developers Actually Need to Know Right Now

The following article includes clips from a recent Live with Tim O’Reilly interview. You can watch the full version on the O’Reilly Media learning platform. Addy Osmani is one of my favorite people to talk with about the state of software engineering with AI. He spent 14 years leading Chrome’s developer experience team at Google,…

Latest from MIT Tech Review – Defense official reveals how AI chatbots could be used for targeting decisions

The US military might use generative AI systems to rank lists of targets and make recommendations about which to strike first, which would then be vetted by humans, according to a Defense official with knowledge of the matter. The disclosure about how the military may use AI chatbots comes as the Pentagon faces scrutiny over…

Latest from MIT Tech Review – The algorithms around us

A metronome ticks. A record spins. And as a feel-good pop track plays, a giant compactor slowly crushes a Jenga tower of material creations. Paint cans burst. Chess pieces topple. Camera lenses shatter. An alarm clock shrills and then goes silent. A guitar neck snaps. Even a toy emoji is not spared, its eyes popping…

Latest from MIT – Engineering household robots to have a little common sense

From wiping up spills to serving up food, robots are being taught to carry out increasingly complicated household tasks. Many such home-bot trainees are learning through imitation; they are programmed to copy the motions that a human physically guides them through. It turns out that robots are excellent mimics. But unless engineers also program them…