Latest from IBM Developer: Build a web app that recognizes yoga poses using a model from the Model Asset eXchange hosted on the Machine Learning eXchange

Summary

This IBM Developer code pattern explains how to detect human poses in a given image by using the Human Pose Estimator model from the Model Asset eXchange that is hosted on the Machine Learning eXchange. The pose lines that the model produces are then assembled, using their coordinates, into a full body pose for each human detected in the image.

Description

The Human Pose Estimator model detects humans and their poses in a given image. The model first detects the human in the input image and then identifies the body parts, including the nose, neck, eyes, shoulders, elbows, wrists, hips, knees, and ankles. Next, each pair of associated body parts is connected by a “pose line,” as shown in the following image. A line might connect the left eye to the nose, while another might connect the nose to the neck.

Each pose line is represented by a list [x1, y1, x2, y2], where the first pair of coordinates (x1, y1) is the starting point of the line for one body part, while the second pair of coordinates (x2, y2) is the ending point of the line for the other associated body part. The pose lines are assembled into full body poses for each of the humans detected in the image.
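As a minimal sketch of how the [x1, y1, x2, y2] pose lines for one person relate to that person's joints, the helper below collects the unique line endpoints. It assumes that two lines sharing a body part report identical coordinates for it; the function name `joints_from_pose_lines` is hypothetical and not part of the MAX model or the code pattern.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]


def joints_from_pose_lines(pose_lines: Iterable[List[float]]) -> List[Point]:
    """Collect the unique endpoints of [x1, y1, x2, y2] pose lines.

    Hypothetical helper: assumes a body part shared by two lines
    (e.g. the nose in eye-nose and nose-neck) has identical coordinates.
    Joints are returned in the order they are first seen.
    """
    seen = set()
    joints: List[Point] = []
    for x1, y1, x2, y2 in pose_lines:
        for point in ((x1, y1), (x2, y2)):
            if point not in seen:
                seen.add(point)
                joints.append(point)
    return joints
```

For example, `joints_from_pose_lines([[10, 20, 15, 30], [15, 30, 15, 50]])` yields three joints, because the point (15, 30) is shared by both lines.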

The model is based on the TensorFlow (TF) implementation of the OpenPose model. The code in this repository deploys the model as a web service in a Docker container.

Yogait, a yoga assistant that uses the Human Pose Estimator MAX Model to guess which yoga pose a user is performing, classifies poses with a pre-trained SVM. Instead of the Cartesian lines that the MAX model returns, Yogait performs classification on a polar representation, which makes poses much easier to classify. Training the SVM on an x-y coordinate system would require translating and rotating the data during augmentation; the polar representation instead relies only on the location of each joint relative to the center of the estimated pose.

The [x, y] coordinates of each joint are converted to [phi, rho], that is, the angle and the distance of the joint from that center.
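The conversion can be sketched as follows. This assumes the "center" is the centroid (mean x, mean y) of the detected joints; Yogait's exact choice of center is not stated here, so treat that, and the function name `to_polar`, as assumptions.

```python
import math
from typing import List, Tuple


def to_polar(joints: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Convert [x, y] joints to [phi, rho] relative to the joints' centroid.

    Assumption: the center of the pose is the mean of the joint coordinates.
    phi is the angle in radians (math.atan2 range: -pi..pi), rho the distance.
    """
    cx = sum(x for x, _ in joints) / len(joints)
    cy = sum(y for _, y in joints) / len(joints)
    polar = []
    for x, y in joints:
        dx, dy = x - cx, y - cy
        polar.append((math.atan2(dy, dx), math.hypot(dx, dy)))
    return polar
```

With two joints at (0, 0) and (2, 0), the centroid is (1, 0), so the first joint maps to (pi, 1.0) and the second to (0.0, 1.0).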

The SVM performs classification on a flattened version of the polar vectors. Compared to a Cartesian representation, this polar representation needs less data and can classify a human in any part of a captured frame. If the Cartesian representation were used instead, you would have to perform every pose in the center of the camera frame.
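A minimal sketch of the flattened feature vector, again assuming the joint centroid as the center (the name `polar_features` is hypothetical). Because every joint is measured relative to that center, shifting the whole person across the frame leaves the features unchanged, which is why the pose does not have to be centered in the camera view.

```python
import math
from typing import List, Tuple


def polar_features(joints: List[Tuple[float, float]]) -> List[float]:
    """Flatten per-joint [phi, rho] pairs into one vector: [phi0, rho0, phi1, rho1, ...].

    Assumption: the center is the centroid of the joints.
    """
    cx = sum(x for x, _ in joints) / len(joints)
    cy = sum(y for _, y in joints) / len(joints)
    features: List[float] = []
    for x, y in joints:
        dx, dy = x - cx, y - cy
        features.extend((math.atan2(dy, dx), math.hypot(dx, dy)))
    return features


# The same pose, detected in two different places in the frame.
pose = [(0, 0), (4, 0), (2, 3)]
shifted = [(x + 100, y + 50) for x, y in pose]
```

Comparing `polar_features(pose)` with `polar_features(shifted)` gives the same six values. A vector like this is what would be handed to a classifier such as scikit-learn's `SVC`; training that classifier is outside this sketch.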

When you have completed the code pattern, you’ll understand how to:

Build a Docker image of the Human Pose Estimator MAX Model
Deploy a deep learning model with a REST endpoint
Generate a pose estimation for a person in a frame of video using the MAX Model’s REST API
Run a web application that uses the model’s REST API

Flow

The server sends the captured video frame-by-frame from the webcam to the model API.
The web UI requests the pose lines estimated for the frame from the server.
The server receives data from the model API and updates the result to the web UI.
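The server side of this flow can be sketched with the Python standard library. This assumes the MAX model container is listening at `http://localhost:5000`, accepts an image in a multipart form field named `file` on `POST /model/predict`, and returns JSON whose `predictions` entries each carry a `human_id` and a list of `pose_lines` shaped like `{"line": [x1, y1, x2, y2]}`. Those names reflect our understanding of the MAX model's API and should be checked against the model's own API documentation; the helper function names are hypothetical.

```python
import json
import urllib.request
import uuid

# Assumption: default local deployment of the MAX model container.
MODEL_URL = "http://localhost:5000/model/predict"


def build_multipart(field: str, filename: str, data: bytes,
                    content_type: str = "image/jpeg"):
    """Encode a single file as a multipart/form-data body; return (body, header)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def predict_frame(jpeg_bytes: bytes, url: str = MODEL_URL) -> dict:
    """POST one captured video frame to the model API and return the parsed JSON."""
    body, content_type = build_multipart("file", "frame.jpg", jpeg_bytes)
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def pose_lines_by_human(response: dict) -> dict:
    """Map each human_id to its list of [x1, y1, x2, y2] pose lines (assumed schema)."""
    return {
        person["human_id"]: [pl["line"] for pl in person["pose_lines"]]
        for person in response.get("predictions", [])
    }
```

In a loop over webcam frames, the server would call `predict_frame` on each encoded frame and pass the output of `pose_lines_by_human` on to the web UI.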

Instructions

Find the detailed steps for this pattern in the README. Those steps show you how to:

Set up the MAX model.
Start the web app.
Run locally with a Python script.
Use a Jupyter Notebook.
