Latest from MIT: Scaling audio-visual learning without labels

Researchers from MIT, the MIT-IBM Watson AI Lab, IBM Research, and elsewhere have developed a new technique for analyzing unlabeled audio and visual data that could improve the performance of machine-learning models used in applications like speech recognition and object detection. The work combines, for the first time, two self-supervised learning architectures: contrastive learning…
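The blurb above names the two self-supervised objectives the work combines. As a toy illustration (not the paper's actual model), here is a minimal sketch of how a contrastive (InfoNCE-style) loss on paired audio/visual embeddings can be summed with a masked-reconstruction loss; all array shapes and the zero-reconstruction stand-in are illustrative assumptions.

```python
import numpy as np

def info_nce(audio_emb, visual_emb, temperature=0.07):
    """Contrastive (InfoNCE) loss: matching audio/visual pairs attract, mismatches repel."""
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    v = visual_emb / np.linalg.norm(visual_emb, axis=1, keepdims=True)
    logits = a @ v.T / temperature                       # pairwise cosine similarities
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(len(a))                              # diagonal entries are the true pairs
    return -log_softmax[idx, idx].mean()

def masked_reconstruction(patches, mask, reconstruction):
    """Masked-modeling loss: MSE only on the patches the model had to predict."""
    return float(((patches[mask] - reconstruction[mask]) ** 2).mean())

rng = np.random.default_rng(0)
audio = rng.normal(size=(8, 16))
visual = audio + 0.1 * rng.normal(size=(8, 16))          # correlated audio/visual pairs
patches = rng.normal(size=(8, 16))
mask = np.array([True] * 6 + [False] * 2)                # 75% of patches masked out

# A joint objective: pull paired modalities together AND reconstruct masked input.
loss = info_nce(audio, visual) + masked_reconstruction(patches, mask, np.zeros_like(patches))
```

The key property the sketch demonstrates is that aligned audio/visual pairs score a lower contrastive loss than shuffled ones, which is what drives cross-modal representation learning.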

Latest from MIT Tech Review – EmTech Next is happening June 13-15

EmTech Next, MIT Technology Review’s signature digital transformation conference, runs June 13-15, 2023. This year’s event looks at the game-changing power of generative AI: the technology itself and the legal implications of generated content. Leaders from OpenAI, Google, Meta, NVIDIA, and more are expected to discuss the future of AI. Join online June 13-15, 2023.

Latest from Google AI – AVFormer: Injecting vision into frozen speech models for zero-shot AV-ASR

Posted by Arsha Nagrani and Paul Hongsuck Seo, Research Scientists, Google Research. Automatic speech recognition (ASR) is a well-established technology that is widely adopted for various applications such as conference calls, streamed video transcription and voice commands. While the challenges for this technology are centered around noisy audio inputs, the visual stream in multimodal videos…
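The headline says AVFormer injects vision into a *frozen* speech model. As a hedged sketch of that general idea (not Google's actual architecture), the snippet below keeps the speech model's weights fixed and trains only a small adapter that projects visual features into the model's token space, where they are prepended to the audio tokens; all dimensions and the linear stand-in for the frozen model are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
D_VISUAL, D_MODEL = 32, 16

# Stand-in for the frozen speech model's input projection: never updated.
frozen_proj = rng.normal(size=(D_MODEL, D_MODEL))

# Lightweight trainable adapter: maps visual features into the frozen
# model's token space so they can be consumed as extra input tokens.
adapter_W = rng.normal(size=(D_VISUAL, D_MODEL)) * 0.01

def inject_vision(audio_tokens, visual_feats):
    """Prepend adapter-projected visual tokens to the audio tokens, then
    run everything through the (frozen) speech model layer."""
    visual_tokens = visual_feats @ adapter_W             # (n_frames, D_MODEL)
    tokens = np.concatenate([visual_tokens, audio_tokens], axis=0)
    return tokens @ frozen_proj                          # frozen model sees both modalities

audio_tokens = rng.normal(size=(10, D_MODEL))            # 10 audio frames
visual_feats = rng.normal(size=(4, D_VISUAL))            # 4 video frames
out = inject_vision(audio_tokens, visual_feats)          # 14 tokens total
```

Because only `adapter_W` is trainable, the pretrained ASR capability is preserved while the model gains a visual input channel, which is what enables the zero-shot AV-ASR setting in the title.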

Latest from MIT Tech Review – Welcome to the new surreal. How AI-generated video is changing film.

The Frost nails its uncanny, disconcerting vibe in its first few shots. Vast icy mountains, a makeshift camp of military-style tents, a group of people huddled around a fire, barking dogs. It’s familiar stuff, yet weird enough to plant a growing seed of dread. There’s something wrong here. “Pass me the tail,” someone says. Cut…

Latest from Google AI – Retrieval-augmented visual-language pre-training

Posted by Ziniu Hu, Student Researcher, and Alireza Fathi, Research Scientist, Google Research, Perception Team. Large-scale models, such as T5, GPT-3, PaLM, Flamingo and PaLI, have demonstrated the ability to store substantial amounts of knowledge when scaled to tens of billions of parameters and trained on large text and image datasets. These models achieve state-of-the-art…
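Retrieval augmentation, as the headline describes, lets a model look knowledge up in an external memory instead of storing it all in its parameters. A minimal sketch of the retrieval step (toy embeddings and memory entries, not Google's actual system): encode the query, score every memory entry by cosine similarity, and return the top-k entries for the model to condition on.

```python
import numpy as np

def retrieve(query_emb, memory_embs, memory_texts, k=2):
    """Return the k memory entries most similar to the query (cosine similarity)."""
    q = query_emb / np.linalg.norm(query_emb)
    m = memory_embs / np.linalg.norm(memory_embs, axis=1, keepdims=True)
    scores = m @ q                                       # one similarity per memory entry
    top = np.argsort(scores)[::-1][:k]                   # indices of the k best matches
    return [memory_texts[i] for i in top], scores[top]

# Toy knowledge memory: each entry has a text snippet and a 2-d embedding.
memory_texts = ["Eiffel Tower, Paris", "Golden Gate Bridge", "Tokyo Tower"]
memory_embs = np.array([[1.0, 0.0],
                        [0.0, 1.0],
                        [0.9, 0.1]])
query = np.array([0.95, 0.05])                           # embedding of an image/text query

docs, scores = retrieve(query, memory_embs, memory_texts)
```

In a full pre-training pipeline the retrieved entries would be fused into the model's input so predictions can draw on them, trading parameter count for an updatable external memory.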

Latest from Google AI – Large sequence models for software development activities

Posted by Petros Maniatis and Daniel Tarlow, Research Scientists, Google. Software isn’t created in one dramatic step. It improves bit by bit, one little step at a time — editing, running unit tests, fixing build errors, addressing code reviews, editing some more, appeasing linters, and fixing more errors — until finally it becomes good enough…

Latest from MIT: New tool helps people choose the right method for evaluating AI models

When machine-learning models are deployed in real-world situations, perhaps to flag potential disease in X-rays for a radiologist to review, human users need to know when to trust the model’s predictions. But machine-learning models are so large and complex that even the scientists who design them don’t understand exactly how the models make predictions. So,…

Latest from MIT: A more effective way to train machines for uncertain, real-world situations

Someone learning to play tennis might hire a teacher to help them learn faster. Because this teacher is (hopefully) a great tennis player, there are times when trying to exactly mimic the teacher won’t help the student learn. Perhaps the teacher leaps high into the air to deftly return a volley. The student, unable to…

Latest from Google AI – Foundation models for reasoning on charts

Posted by Julian Eisenschlos, Research Software Engineer, Google Research. Visual language is the form of communication that relies on pictorial symbols outside of text to convey information. It is ubiquitous in our digital life in the form of iconography, infographics, tables, plots, and charts, extending to the real world in street signs, comic books, food…