Latest from MIT Tech Review – Synthetic data for AI

Last year, researchers at Data Science Nigeria noted that engineers looking to train computer-vision algorithms could choose from a wealth of data sets featuring Western clothing, but there were none for African clothing. The team addressed the imbalance by using AI to generate artificial images of African fashion—a whole new data set from scratch.

Such synthetic data sets—computer-generated samples with the same statistical characteristics as the genuine article—are growing more and more common in the data-hungry world of machine learning. These fakes can be used to train AIs in areas where real data is scarce or too sensitive to use, as in the case of medical records or personal financial data.
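To make "the same statistical characteristics" concrete, here is a minimal sketch using only Python's standard library. It is not a GAN and not the method of any project named in this article: it fits a simple normal distribution to a handful of made-up "real" measurements and then samples as many artificial values as needed. The function names `fit_gaussian` and `sample_synthetic` and the sample values are hypothetical, chosen only for illustration.

```python
import random
import statistics


def fit_gaussian(real_values):
    """Estimate the mean and standard deviation of the real data."""
    return statistics.mean(real_values), statistics.stdev(real_values)


def sample_synthetic(mean, stdev, n, seed=0):
    """Draw n artificial values from a normal distribution with the fitted parameters."""
    rng = random.Random(seed)
    return [rng.gauss(mean, stdev) for _ in range(n)]


# Hypothetical "real" measurements (made up for illustration).
real = [52.1, 47.8, 50.3, 49.5, 51.2, 48.7, 50.9, 49.1]

mean, stdev = fit_gaussian(real)
synthetic = sample_synthetic(mean, stdev, n=10_000)

# The synthetic set is far larger than the real one but shares its summary statistics.
print(f"real:      mean={mean:.2f}, stdev={stdev:.2f}")
print(f"synthetic: mean={statistics.mean(synthetic):.2f}, "
      f"stdev={statistics.stdev(synthetic):.2f}")
```

Real generators model far richer structure (images, correlated table columns), but the principle is the same: learn a distribution from real samples, then draw new samples from it.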

The idea of synthetic data isn’t new: driverless cars have been trained on virtual streets. But in the last year the technology has become widespread, with a raft of startups and universities offering such services. Datagen and Synthesis AI, for example, supply digital human faces on demand. Others provide synthetic data for finance and insurance. And the Synthetic Data Vault, a project launched in 2021 by MIT’s Data to AI Lab, provides open-source tools for creating a wide range of data types.

This boom in synthetic data sets is driven by generative adversarial networks (GANs), a type of AI that is adept at generating realistic but fake examples, whether of images or medical records.

Proponents claim that synthetic data avoids the bias that is rife in many data sets. But it will only be as unbiased as the real data used to generate it. A GAN trained on fewer Black faces than white, for example, may be able to create a synthetic data set with a higher proportion of Black faces, but those faces may end up being less lifelike given the limited original data.
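The limitation can be illustrated with a toy numeric sketch, again assuming a simple normal model rather than a GAN. Two hypothetical groups, "A" and "B", are represented by 1,000 and 20 made-up real samples. A generator can emit equal numbers of synthetic samples for both groups, but the parameters fitted for the smaller group carry much more uncertainty, which is the numeric analogue of "less lifelike" output.

```python
import math
import random
import statistics

rng = random.Random(42)

# Hypothetical real data: one numeric feature per group, with group B underrepresented.
real = {
    "A": [rng.gauss(0.0, 1.0) for _ in range(1000)],
    "B": [rng.gauss(3.0, 1.0) for _ in range(20)],
}

synthetic = {}
stderr = {}
for group, values in real.items():
    mean, stdev = statistics.mean(values), statistics.stdev(values)
    # Standard error of the fitted mean: it shrinks only as the real sample grows.
    stderr[group] = stdev / math.sqrt(len(values))
    # Rebalance by generating the same number of synthetic samples for every group.
    synthetic[group] = [rng.gauss(mean, stdev) for _ in range(500)]
    print(f"group {group}: {len(values)} real samples, "
          f"fitted-mean uncertainty ±{stderr[group]:.3f}")
```

The synthetic set is balanced in count, yet the estimate behind group B rests on 50 times fewer real observations, so any error in it is copied into every synthetic B sample.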

Join us March 29-30 at EmTech Digital, our signature AI conference, to hear Unity’s Danny Lange talk about how the video game maker is using synthetic data.
