Latest from IBM Developer: A Python Flask audio search application

Note: This code pattern uses Watson Discovery V1 and will not work with Discovery V2. However, you can still use it to learn the Discovery features. Future plans include updating the code pattern to work with Discovery V2.

Summary

This code pattern explains how to create an application that you can use to search for a topic within video and audio files.

Description

When you listen to a podcast, or to video or audio recordings of a course, you often want to jump directly to a particular topic instead of sitting through material you do not need. However, finding where specific topics and keywords occur in a full recording can be challenging.

In this code pattern, you create an application that you can use to search within video or audio files. The app not only searches, but also highlights the text where the search string or topic occurs in the file. The code pattern performs a natural language query search on audio files and returns the results together with the timeframe in the recording where the search topic is discussed. This example uses an IBM® Watson Machine Learning introduction video to illustrate the process.
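To make the idea of returning a timeframe concrete, here is a minimal sketch that assumes each text chunk keeps the start and end offset of the audio chunk it came from. The `TextChunk` class, the helper names, and the sample transcripts are hypothetical and invented for illustration. The plain substring match is only a stand-in for the natural language query that Watson Discovery performs; it is not how the code pattern itself searches.

```python
from dataclasses import dataclass


@dataclass
class TextChunk:
    """Transcript of one audio chunk plus its position in the recording."""
    start_s: float  # offset of the chunk start, in seconds
    end_s: float    # offset of the chunk end, in seconds
    text: str


def format_timestamp(seconds: float) -> str:
    """Render an offset in seconds as H:MM:SS."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def find_timeframes(chunks, query):
    """Return (start, end) labels for every chunk whose text mentions the query.

    Case-insensitive substring matching, used here as a simple stand-in
    for a natural language query against a search service.
    """
    needle = query.lower()
    return [
        (format_timestamp(c.start_s), format_timestamp(c.end_s))
        for c in chunks
        if needle in c.text.lower()
    ]


# Hypothetical transcripts, invented for illustration only.
chunks = [
    TextChunk(0, 60, "Welcome to this introduction."),
    TextChunk(60, 120, "First we create a machine learning model."),
    TextChunk(120, 180, "Then we deploy the model as a web service."),
]
hits = find_timeframes(chunks, "deploy")
print(hits)
```

Storing the offsets next to each transcript means a search hit can be translated into a playback position without going back to the audio.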

When you have completed the code pattern, you will understand how to:

Prepare audio and video data and split it into smaller chunks that are easier to work with
Work with the Watson Speech to Text service through API calls to convert audio or video to text
Work with the Watson Discovery service through API calls to perform a search on text chunks
Create a Python Flask application and deploy it on IBM Cloud

Flow

The user uploads the video or audio file through the UI.
The video or audio file is processed with the moviepy and pydub Python libraries and split into smaller chunks.
The user interacts with the Watson Speech to Text service through the provided application UI. The audio chunks are converted into text chunks with Watson Speech to Text.
The text chunks are uploaded to Watson Discovery by calling the Watson Discovery APIs through the Python SDK.
The user performs a search query using Watson Discovery.
The results are shown on the UI.
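The chunking step in the flow above can be sketched as follows. The code pattern uses moviepy and pydub for this step; the sketch below swaps in Python's standard `wave` module so that it runs without third-party dependencies, and it handles only uncompressed WAV data. The function names `split_wav` and `make_silent_wav`, the one-second chunk length, and the silent test signal are assumptions made for illustration.

```python
import io
import wave


def split_wav(wav_bytes: bytes, chunk_seconds: float):
    """Split WAV data into consecutive chunks of at most chunk_seconds.

    Returns a list of (start_s, end_s, chunk_wav_bytes) tuples, so each
    chunk remembers where it sits in the original recording.
    """
    chunks = []
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        rate = params.framerate
        frame_size = params.sampwidth * params.nchannels
        frames_per_chunk = int(rate * chunk_seconds)
        if frames_per_chunk <= 0:
            raise ValueError("chunk_seconds is too small for this sample rate")
        start = 0
        while start < params.nframes:
            frames = src.readframes(frames_per_chunk)
            n_read = len(frames) // frame_size
            if n_read == 0:  # truncated file: stop instead of looping forever
                break
            # Write the frames back out as a standalone WAV file in memory.
            buf = io.BytesIO()
            with wave.open(buf, "wb") as dst:
                dst.setnchannels(params.nchannels)
                dst.setsampwidth(params.sampwidth)
                dst.setframerate(rate)
                dst.writeframes(frames)
            chunks.append((start / rate, (start + n_read) / rate, buf.getvalue()))
            start += n_read
    return chunks


def make_silent_wav(seconds: float, rate: int = 8000) -> bytes:
    """Build a mono, 16-bit silent WAV in memory to use as test input."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


demo = make_silent_wav(2.5)
parts = split_wav(demo, chunk_seconds=1.0)
for start_s, end_s, data in parts:
    print(f"{start_s:.1f}s to {end_s:.1f}s, {len(data)} bytes")
```

Each chunk is a complete WAV file, so in principle it could be sent to a speech-to-text service on its own, and the recorded offsets let a later search result be mapped back to a position in the original file.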

Instructions

Get detailed steps in the readme file. Those steps show how to:

Clone the GitHub repository.
Create the Watson Speech to Text service.
Create a Watson Discovery instance.
Run the application locally.
