Latest from IBM Developer: Create a web app to interact with machine learning generated image captions

Summary

The IBM Model Asset eXchange (MAX), hosted on the Machine Learning eXchange, gives application developers without data science experience easy access to prebuilt machine learning models. This code pattern shows how simple it can be to create a web app that uses a MAX model. The web app uses the Image Caption Generator from MAX and provides a simple web UI that lets you filter images based on the descriptions given by the model.

Description

According to an IBM study, 2.5 quintillion bytes of data are created every day. Much of that data is unstructured, such as large texts, audio recordings, and images. To do something useful with the data, you must first convert it into structured data.

This code pattern uses one of the models from the Model Asset Exchange, an exchange where developers can find and experiment with open source deep learning models. Specifically, it uses the Image Caption Generator to create a web application that captions images and lets you filter images based on their content. The web application provides an interactive user interface that is backed by a lightweight Python server using Tornado. The server takes in images through the UI, sends them to a REST endpoint for the model, and displays the generated captions on the UI. The model’s REST endpoint is set up using the Docker image provided on MAX. The web UI displays the generated caption for each image, as well as an interactive word cloud that filters images based on their captions.
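To make the caption-based filtering concrete, here is a minimal sketch of the logic a word cloud filter could rest on: count how many images mention each word, then select the images whose caption contains a chosen word. The function names, the stop-word list, and the sample captions are all hypothetical and are not taken from the code pattern itself.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the actual app may use a different one or none.
STOP_WORDS = {"a", "an", "the", "of", "on", "in", "with", "and", "is", "to"}


def caption_words(caption):
    """Lowercase a caption and return its words, minus stop words."""
    return [w for w in re.findall(r"[a-z]+", caption.lower()) if w not in STOP_WORDS]


def word_counts(captions):
    """Count how many images mention each word (at most once per image)."""
    counts = Counter()
    for caption in captions.values():
        counts.update(set(caption_words(caption)))
    return counts


def filter_images(captions, word):
    """Return the sorted names of images whose caption contains the word."""
    word = word.lower()
    return sorted(name for name, cap in captions.items() if word in caption_words(cap))


# Made-up captions, used only to illustrate the data shape (file name -> caption).
captions = {
    "beach.jpg": "a man riding a wave on top of a surfboard",
    "park.jpg": "a dog running on the grass with a frisbee",
    "street.jpg": "a man walking a dog on a leash",
}
```

With this sample data, `filter_images(captions, "dog")` selects `park.jpg` and `street.jpg`, and `word_counts(captions)` supplies the per-word weights a word cloud could use for sizing.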

When you have completed this code pattern, you will understand how to:

Deploy a deep learning model with a REST endpoint
Generate captions for an image using the MAX Model’s REST API
Run a web application that uses the model’s REST API
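As a rough illustration of the second item, the sketch below posts an image to a locally running model container and extracts the captions, using only the Python standard library. The URL, the `/model/predict` route, the `image` form field, and the `predictions`/`caption` response fields are assumptions about the model's API; confirm them against the model's own documentation before relying on them.

```python
import json
import os
import urllib.request
import uuid

# Assumed endpoint: a MAX model container listening locally on port 5000.
MODEL_URL = "http://localhost:5000/model/predict"


def build_multipart(field, filename, data, content_type="image/jpeg"):
    """Encode one file as multipart/form-data; return (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def parse_captions(payload):
    """Pull caption strings out of a decoded JSON response (assumed shape)."""
    return [p["caption"] for p in payload.get("predictions", [])]


def caption_image(path, url=MODEL_URL):
    """POST an image file to the model endpoint and return its captions."""
    with open(path, "rb") as f:
        body, content_type = build_multipart("image", os.path.basename(path), f.read())
    request = urllib.request.Request(url, data=body, headers={"Content-Type": content_type})
    with urllib.request.urlopen(request) as response:
        return parse_captions(json.load(response))
```

Calling `caption_image("some_photo.jpg")` requires the model container to be running; `build_multipart` and `parse_captions` can be exercised without it.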

Flow

The server sends default images to the Model API and receives caption data.
The user interacts with the web UI, which contains the default content, and uploads images.
The web UI requests caption data for the images from the server and updates the content when the data is returned.
The server sends the images to the Model API and receives caption data to return to the web UI.
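The steps above can be sketched as a small server-side component, independent of any web framework. This is an illustration only: the `CaptionStore` class, its method names, and the stand-in `fake_captioner` are hypothetical, and caching captions per file name is a design choice made for this sketch, not something the code pattern states. In the real application, a Tornado request handler would sit in front of logic like this and the captioner would call the model's REST API.

```python
class CaptionStore:
    """Holds captions per image and asks the model only for unseen images."""

    def __init__(self, captioner, default_images=()):
        # captioner: any callable that maps image bytes to a caption string.
        self._captioner = captioner
        self._captions = {}
        # The server captions the default images before the UI asks for them.
        for name, data in default_images:
            self.add_image(name, data)

    def add_image(self, name, data):
        """Caption an uploaded image (once) and return the data for the UI."""
        if name not in self._captions:
            self._captions[name] = self._captioner(data)
        return {"file_name": name, "caption": self._captions[name]}

    def all_captions(self):
        """Return a copy of every known caption, e.g. for the initial page."""
        return dict(self._captions)


def fake_captioner(data):
    """Stand-in for the model call so the sketch runs without a model server."""
    return f"an image of {len(data)} bytes"


store = CaptionStore(fake_captioner, default_images=[("default.jpg", b"123")])
uploaded = store.add_image("upload.jpg", b"12345")
```

Passing the captioner in as a callable keeps the flow testable without a running model container.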

Instructions

Ready to put this code pattern to use? Complete details on how to get started running and using this application are in the README.

