From their early days at MIT, and even before, Emma Liu ’22, MNG ’22, Yo-whan “John” Kim ’22, MNG ’22, and Clemente Ocejo ’21, MNG ’22 knew they wanted to perform computational research and explore artificial intelligence and machine learning. “Since high school, I’ve been into deep learning and was involved in projects,” says Kim, who participated in a Research Science Institute (RSI) summer program at MIT and Harvard University and went on to work on action recognition in videos using Microsoft’s Kinect.

As students in the Department of Electrical Engineering and Computer Science who recently graduated from the Master of Engineering (MEng) Thesis Program, Liu, Kim, and Ocejo have developed the skills to help guide application-focused projects. Working with the MIT-IBM Watson AI Lab, they have improved text classification with limited labeled data and designed machine-learning models for better long-term forecasting for product purchases. For Kim, “it was a very smooth transition and … a great opportunity for me to continue working in the field of deep learning and computer vision in the MIT-IBM Watson AI Lab.”

Modeling video

Collaborating with researchers from academia and industry, Kim designed, trained, and tested a deep learning model for recognizing actions across domains — in this case, video. His team specifically targeted the use of synthetic data from generated videos for training and ran prediction and inference tasks on real data, which is composed of different action classes. They wanted to see how pre-training models on synthetic videos, particularly simulations of, or game engine-generated, humans or humanoid actions stacked up to real data: publicly available videos scraped from the internet.

The reason for this research, Kim says, is that real videos can have issues, including representation bias, copyright, and/or ethical or personal sensitivity, e.g., videos of a car hitting people would be difficult to collect, or the use of people’s faces, real addresses, or license plates without consent. Kim is running experiments with 2D, 2.5D, and 3D video models, with the goal of creating domain-specific or even a large, general, synthetic video dataset that can be used for some transfer domains, where data are lacking. For instance, for applications to the construction industry, this could include running its action recognition on a building site. “I didn’t expect synthetically generated videos to perform on par with real videos,” he says. “I think that opens up a lot of different roles [for the work] in the future.”

Related work from others:  Latest from MIT : Scientists use generative AI to answer complex questions in physics

Despite a rocky start to the project gathering and generating data and running many models, Kim says he wouldn’t have done it any other way. “It was amazing how the lab members encouraged me: ‘It’s OK. You’ll have all the experiments and the fun part coming. Don’t stress too much.’” It was this structure that helped Kim take ownership of the work. “At the end, they gave me so much support and amazing ideas that help me carry out this project.”

Data labeling

Data scarcity was also a theme of Emma Liu’s work. “The overarching problem is that there’s all this data out there in the world, and for a lot of machine learning problems, you need that data to be labeled,” says Liu, “but then you have all this unlabeled data that’s available that you’re not really leveraging.”

Liu, with direction from her MIT and IBM group, worked to put that data to use, training text classification semi-supervised models (and combining aspects of them) to add pseudo labels to the unlabeled data, based on predictions and probabilities about which categories each piece of previously unlabeled data fits into. “Then the problem is that there’s been prior work that’s shown that you can’t always trust the probabilities; specifically, neural networks have been shown to be overconfident a lot of the time,” Liu points out.

Liu and her team addressed this by evaluating the accuracy and uncertainty of the models and recalibrated them to improve her self-training framework. The self-training and calibration step allowed her to have better confidence in the predictions. This pseudo labeled data, she says, could then be added to the pool of real data, expanding the dataset; this process could be repeated in a series of iterations.

Related work from others:  Latest from MIT Tech Review - Meta has created a way to watermark AI-generated speech

For Liu, her biggest takeaway wasn’t the product, but the process. “I learned a lot about being an independent researcher,” she says. As an undergraduate, Liu worked with IBM to develop machine learning methods to repurpose drugs already on the market and honed her decision-making ability. After collaborating with academic and industry researchers to acquire skills to ask pointed questions, seek out experts, digest and present scientific papers for relevant content, and test ideas, Liu and her cohort of MEng students working with the MIT-IBM Watson AI Lab felt they had confidence in their knowledge, freedom, and flexibility to dictate their own research’s direction. Taking on this key role, Liu says, “I feel like I had ownership over my project.”

Demand forecasting

After his time at MIT and with the MIT-IBM Watson AI Lab, Clemente Ocejo also came away with a sense of mastery, having built a strong foundation in AI techniques and timeseries methods beginning with his MIT Undergraduate Research Opportunities Program (UROP), where he met his MEng advisor. “You really have to be proactive in decision-making,” says Ocejo, “vocalizing it [your choices] as the researcher and letting people know that this is what you’re doing.”

Ocejo used his background in traditional timeseries methods for a collaboration with the lab, applying deep learning to better predict product demand forecasting in the medical field. Here, he designed, wrote, and trained a transformer, a specific machine learning model, which is typically used in natural-language processing and has the ability to learn very long-term dependencies. Ocejo and his team compared target forecast demands between months, learning dynamic connections and attention weights between product sales within a product family. They looked at identifier features, concerning the price and amount, as well as account features about who is purchasing the items or services. 

Related work from others:  Latest from Google AI - Predicting Text Readability from Scrolling Interactions

“One product does not necessarily impact the prediction made for another product in the moment of prediction. It just impacts the parameters during training that lead to that prediction,” says Ocejo. “Instead, we wanted to make it have a little more of a direct impact, so we added this layer that makes this connection and learns attention between all of the products in our dataset.”

In the long run, over a one-year prediction, MIT-IBM Watson AI Lab group was able to outperform the current model; more impressively, it did so in the short run (close to a fiscal quarter). Ocejo attributes this to the dynamic of his interdisciplinary team. “A lot of the people in my group were not necessarily very experienced in the deep learning aspect of things, but they had a lot of experience in the supply chain management, operations research, and optimization side, which is something that I don’t have that much experience in,” says Ocejo. “They were giving a lot of good high-level feedback of what to tackle next and … and knowing what the field of industry wanted to see or was looking to improve, so it was very helpful in streamlining my focus.”

For this work, a deluge of data didn’t make the difference for Ocejo and his team, but rather its structure and presentation. Oftentimes, large deep learning models require millions and millions of data points in order to make meaningful inferences; however, the MIT-IBM Watson AI Lab group demonstrated that outcomes and technique improvements can be application-specific. “It just shows that these models can learn something useful, in the right setting, with the right architecture, without needing an excess amount of data,” says Ocejo. “And then with an excess amount of data, it’ll only get better.”

Similar Posts