Latest from IBM Developer : Create a web app to interact with machine learning generated image captions

Summary

The IBM Model Asset eXchange (MAX), hosted on the Machine Learning eXchange, gives application developers without data science experience easy access to prebuilt machine learning models. This code pattern shows how simple it can be to create a web app that uses a MAX model. The web app uses the Image Caption Generator from MAX and provides a simple web UI that lets you filter images based on the descriptions generated by the model.

Description

According to an IBM study, 2.5 quintillion bytes of data are created every day. Much of that data is unstructured, such as large texts, audio recordings, and images. To do something useful with it, you must first convert it into structured data.

This code pattern uses one of the models from the Model Asset eXchange, an exchange where developers can find and experiment with open source deep learning models. Specifically, it uses the Image Caption Generator to create a web application that captions images and lets you filter images based on their content. The web application provides an interactive user interface that is backed by a lightweight Python server using Tornado. The server takes in images through the UI, sends them to a REST endpoint for the model, and displays the generated captions on the UI. The model’s REST endpoint is set up using the Docker image provided on MAX. The web UI displays the generated captions for each image as well as an interactive word cloud that filters images based on their captions.
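The caption-based filtering described above can be illustrated with a small, self-contained sketch. It is not the code pattern's actual implementation: the sample captions, the `STOP_WORDS` set, and the function names `word_counts` and `filter_images` are all hypothetical, chosen only to show how word cloud weights and a word filter could be derived from captions.

```python
from collections import Counter

# Hypothetical sample data: image name -> caption text returned by the model.
captions = {
    "surf.jpg": "a man riding a wave on top of a surfboard",
    "beach.jpg": "a man standing on a beach holding a surfboard",
    "dog.jpg": "a dog sitting on a couch",
}

# Assumed list of filler words to keep out of the word cloud.
STOP_WORDS = {"a", "an", "the", "on", "of", "in", "with", "and", "top"}


def word_counts(captions):
    """Count how many images mention each non-stop word (word cloud weights)."""
    counts = Counter()
    for text in captions.values():
        # Use a set so a word repeated within one caption counts once per image.
        counts.update(set(text.lower().split()) - STOP_WORDS)
    return counts


def filter_images(captions, word):
    """Return the sorted names of images whose caption contains the word."""
    word = word.lower()
    return sorted(
        name for name, text in captions.items() if word in text.lower().split()
    )


print(word_counts(captions).most_common(3))
print(filter_images(captions, "surfboard"))
```

Counting each word once per image means the weight of a word in the cloud equals the number of images that selecting it would show.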

When you have completed this code pattern, you will understand how to:

Deploy a deep learning model with a REST endpoint
Generate captions for an image using the MAX Model’s REST API
Run a web application that uses the model’s REST API
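As a rough illustration of generating captions through a model's REST API, the sketch below builds a multipart upload with the Python standard library and parses a JSON reply. Several details are assumptions not stated in this article: the local URL and port, the `/model/predict` route, the `image` form field, and a response containing a `predictions` list with `caption` entries. Check the model's own API documentation for the actual values.

```python
import json
import urllib.request
import uuid

# Assumption: the model container listens locally on port 5000 and exposes
# a /model/predict route. Adjust to match the deployed model's API docs.
MODEL_URL = "http://localhost:5000/model/predict"


def build_request(image_bytes, filename, url=MODEL_URL):
    """Build a multipart/form-data POST carrying one image in an 'image' field."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="image"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        url,
        data=head + image_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def parse_captions(body):
    """Extract caption strings from a JSON response body (assumed shape)."""
    payload = json.loads(body)
    return [p["caption"] for p in payload.get("predictions", [])]


def caption_image(path):
    """Send one image file to the model endpoint and return its captions."""
    with open(path, "rb") as f:
        request = build_request(f.read(), path)
    with urllib.request.urlopen(request) as response:
        return parse_captions(response.read())
```

With a model container running, `caption_image("photo.jpg")` would return a list of caption strings. Keeping request construction and response parsing in separate functions lets both be checked without a live endpoint.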

Flow

The server sends default images to the Model API and receives caption data.
The user interacts with the web UI, which contains the default content, and uploads images.
The web UI requests caption data for the images from the server and updates the content when the data is returned.
The server sends the images to the Model API and receives caption data to return to the web UI.
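The flow above can be traced with a minimal in-memory sketch. This is not the Tornado server from the code pattern: the `CaptionServer` class, its `upload` method, and the `fake_model` stub are hypothetical stand-ins that show only the order of operations, so the example runs without a web framework or a model container.

```python
class CaptionServer:
    """In-memory stand-in for the web server in the flow above."""

    def __init__(self, caption_fn, default_images):
        # caption_fn would call the Model API in a real deployment.
        self.caption_fn = caption_fn
        self.captions = {}  # image name -> caption
        # First step: caption the default images before any user arrives.
        for name, data in default_images.items():
            self.captions[name] = caption_fn(data)

    def upload(self, name, data):
        """Caption an uploaded image and return the data the web UI needs."""
        self.captions[name] = self.caption_fn(data)
        return {"name": name, "caption": self.captions[name]}


def fake_model(image_bytes):
    # Hypothetical stub so the sketch runs without a model container.
    return f"an image of {len(image_bytes)} bytes"


server = CaptionServer(fake_model, {"default.jpg": b"abc"})
result = server.upload("mine.jpg", b"abcdef")
print(result)
```

Passing the captioning function in as a parameter keeps the server logic independent of how the model is reached, so the stub can later be swapped for a real REST call.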

Instructions

Ready to put this code pattern to use? Complete details on how to get started running and using this application are in the README.

