Latest from IBM Developer: Create a web app to interact with machine learning generated image captions

Summary

The introduction of the IBM Model Asset eXchange (MAX) that is hosted on the Machine Learning eXchange has given application developers without data science experience easy access to prebuilt machine learning models. This code pattern shows how simple it can be to create a web app that utilizes a MAX model. The web app uses the Image Caption Generator from MAX and creates a simple web UI that lets you filter images based on the descriptions given by the model.

Description

According to an IBM study, 2.5 quintillion bytes of data are created every day. Much of that data is unstructured, such as large texts, audio recordings, and images. To do something useful with the data, you must first convert it into structured data.

This code pattern uses one of the models from the Model Asset eXchange, an exchange where developers can find and experiment with open source deep learning models. Specifically, it uses the Image Caption Generator to create a web application that captions images and lets you filter images based on their content. The web application provides an interactive user interface backed by a lightweight Python server that uses Tornado. The server takes in images through the UI, sends them to the model’s REST endpoint, and displays the generated captions in the UI. The model’s REST endpoint is set up using the Docker image provided on MAX. The web UI displays the generated caption for each image, along with an interactive word cloud for filtering images by caption.
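The caption-based filtering described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code pattern's actual implementation: the function names, the stop-word list, and the sample captions are all hypothetical.

```python
from collections import Counter

# Hypothetical stop-word list; the real app may filter words differently.
STOP_WORDS = {"a", "an", "the", "of", "on", "in", "with", "and", "is", "to"}


def word_counts(captions):
    """Count, for each non-stop-word, how many captions contain it.

    These counts are the kind of data a word cloud could be sized by.
    """
    counts = Counter()
    for caption in captions.values():
        # set() ensures a word is counted once per image, not once per use.
        counts.update(w for w in set(caption.lower().split()) if w not in STOP_WORDS)
    return counts


def filter_images(captions, word):
    """Return the sorted names of images whose caption contains the word."""
    word = word.lower()
    return sorted(
        name for name, caption in captions.items() if word in caption.lower().split()
    )


# Made-up captions for illustration; these are not real model output.
captions = {
    "beach.jpg": "a man riding a wave on a surfboard",
    "park.jpg": "a dog running on the grass with a frisbee",
    "street.jpg": "a man walking a dog on a leash",
}

print(word_counts(captions).most_common(2))
print(filter_images(captions, "dog"))
```

Clicking a word in the word cloud would then correspond to calling filter_images with that word and showing only the returned images.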

When you have completed this code pattern, you will understand how to:

Deploy a deep learning model with a REST endpoint
Generate captions for an image using the MAX Model’s REST API
Run a web application that uses the model’s REST API
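As a rough sketch of what generating captions through the model’s REST API might look like, the following uses only the Python standard library. The URL, the /model/predict path, the "image" form field, and the JSON response shape are assumptions about how the model container is exposed; check the README and the deployed model's own API documentation for the actual details.

```python
import json
import os
import urllib.request
import uuid

# Assumed address of a locally running model container; adjust as needed.
MODEL_URL = "http://localhost:5000/model/predict"


def build_multipart(field, filename, data, content_type="image/jpeg"):
    """Encode one file as multipart/form-data; returns (body, Content-Type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def parse_captions(response_text):
    """Extract (caption, probability) pairs from the assumed response shape."""
    payload = json.loads(response_text)
    if payload.get("status") != "ok":
        raise ValueError(f"model returned status {payload.get('status')!r}")
    return [(p["caption"], p["probability"]) for p in payload["predictions"]]


def caption_image(path, url=MODEL_URL):
    """POST an image file to the model endpoint and return its captions."""
    with open(path, "rb") as f:
        body, content_type = build_multipart("image", os.path.basename(path), f.read())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return parse_captions(response.read().decode("utf-8"))


# Illustrative response in the assumed shape; not real model output.
SAMPLE_RESPONSE = (
    '{"status": "ok", "predictions": '
    '[{"index": "0", "caption": "a dog on the grass", "probability": 0.5}]}'
)
print(parse_captions(SAMPLE_RESPONSE))
```

With the model container running, caption_image("photo.jpg") would send the file and return the parsed captions; the parsing step is kept separate so it can be exercised without a live endpoint.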

Flow

The server sends default images to the Model API and receives caption data.
The user interacts with the web UI, which shows the default content, and uploads images.
The web UI requests caption data for the images from the server and updates the content when the data is returned.
The server sends the images to the Model API and receives caption data to return to the web UI.
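The flow above can be modeled without any web framework. The sketch below is a simplified, hypothetical stand-in for the server side: CaptionStore and fake_model are illustrative names, and the real application uses Tornado handlers and a live Model API call instead of the stub shown here.

```python
class CaptionStore:
    """Holds captions for every image the server has seen."""

    def __init__(self, caption_fn, default_images):
        self._caption_fn = caption_fn
        self._captions = {}
        # First flow step: caption the default images once at startup.
        for name, data in default_images.items():
            self._captions[name] = caption_fn(data)

    def all_captions(self):
        """Second flow step: the content the web UI shows on load."""
        return dict(self._captions)

    def upload(self, images):
        """Third and fourth flow steps: caption uploads, return the new data."""
        new = {name: self._caption_fn(data) for name, data in images.items()}
        self._captions.update(new)
        return new


def fake_model(image_bytes):
    """Stand-in for the Model API call; returns a placeholder caption."""
    return f"caption for {len(image_bytes)} bytes"


store = CaptionStore(fake_model, {"default.jpg": b"1234"})
new = store.upload({"mine.jpg": b"12"})
print(new)
print(store.all_captions())
```

Passing the captioning function in as a parameter keeps the flow testable on its own; in a real deployment it would be replaced by a function that calls the model’s REST endpoint.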

Instructions

Ready to put this code pattern to use? Complete details on how to get started running and using this application are in the README.

