Watson Studio Local is now part of IBM Cloud Pak for Data. Learn more about Cloud Pak for Data.

Summary

This code pattern demonstrates how data scientists can leverage remote Spark clusters and compute environments to train and deploy a spam filter model. The model is built using natural language processing and machine learning algorithms and is used to classify whether a given text message is spam or not.

Description

This code pattern is a demonstration of how data scientists can leverage remote Spark clusters and compute environments from Hortonworks Data Platform (HDP) to train and deploy a spam filter model using Watson Studio Local.

A spam filter is a classification model built using natural language processing and machine learning algorithms. The model is trained on an SMS spam collection dataset to classify whether a given text message is spam, or ham (not spam).

This code pattern provides multiple examples of how to tackle this problem, using both local (Watson Studio Local) and remote (HDP cluster) resources.

After completing this code pattern, you’ll understand how to:

Load data into Spark DataFrames and use Spark’s machine learning library (MLlib) to develop, train, and deploy the spam filter model (see the PySpark sketch after this list).
Load the data into pandas DataFrames and use the scikit-learn machine learning library to develop, train, and deploy the spam filter model (see the scikit-learn sketch after this list).
Use the sparkmagic library to connect to the remote Spark service in the HDP cluster via the Hadoop Integration service (see the connection sketch after this list).
Use the sparkmagic library to push the Python virtual environment containing the scikit-learn library to the remote HDP cluster via the Hadoop Integration service.
Package the spam filter model as a Python egg and distribute the egg to the remote HDP cluster via the Hadoop Integration service (see the packaging sketch after this list).
Run the spam filter model (both PySpark and scikit-learn versions) on the remote HDP cluster using the remote Spark context and the remote Python virtual environment, all from within IBM Watson Studio Local.
Save the spam filter model in the remote HDP cluster, import it back into Watson Studio Local, and batch score and evaluate the model.
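To make the first item concrete, here is a minimal sketch of a PySpark MLlib spam filter. It is illustrative only: the file name, separator, and column names are assumptions, not the exact code from the pattern's notebooks.

    # Minimal PySpark MLlib spam filter sketch (assumed file/column names).
    from pyspark.sql import SparkSession
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import Tokenizer, HashingTF, IDF, StringIndexer
    from pyspark.ml.classification import LogisticRegression

    spark = SparkSession.builder.appName("spam-filter").getOrCreate()

    # The SMS spam collection is tab-separated: a ham/spam label, then the text.
    df = spark.read.csv("SMSSpamCollection.csv", sep="\t",
                        schema="class STRING, text STRING")

    pipeline = Pipeline(stages=[
        StringIndexer(inputCol="class", outputCol="label"),  # ham/spam -> 0/1
        Tokenizer(inputCol="text", outputCol="words"),
        HashingTF(inputCol="words", outputCol="tf"),
        IDF(inputCol="tf", outputCol="features"),
        LogisticRegression(maxIter=10),
    ])

    train, test = df.randomSplit([0.8, 0.2], seed=42)
    model = pipeline.fit(train)
    model.transform(test).select("text", "label", "prediction").show(5)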
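The scikit-learn variant follows the same shape; again, the file and column names are assumptions for illustration.

    # Minimal scikit-learn spam filter sketch (assumed file/column names).
    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline

    df = pd.read_csv("SMSSpamCollection.csv", sep="\t", names=["class", "text"])
    X_train, X_test, y_train, y_test = train_test_split(
        df["text"], df["class"], test_size=0.2, random_state=42)

    clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    clf.fit(X_train, y_train)
    print("accuracy:", clf.score(X_test, y_test))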
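Connecting to the remote Spark service looks roughly like the following notebook cells. The Livy endpoint URL and session name are placeholders; the actual endpoint and any authentication options come from your Hadoop Integration registration, and the pattern's notebooks use their own helper utilities.

    # Cell 1: load the sparkmagic extension.
    %load_ext sparkmagic.magics

    # Cell 2: register a Livy session on the HDP cluster
    # (placeholder endpoint; yours comes from the Hadoop Integration service).
    %spark add -s spamfilter -l python -u https://hdp-gateway:8443/gateway/dsx/livy/v1

    # Cell 3: code under %%spark runs in the remote Spark context.
    %%spark -s spamfilter
    df = spark.read.csv("hdfs:///tmp/SMSSpamCollection.csv", sep="\t")
    print(df.count())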
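Packaging the model code as a Python egg can be sketched as below; the package name, module name, and paths are hypothetical. Once the egg is on the cluster, SparkContext.addPyFile makes it importable on the executors.

    # setup.py -- hypothetical package; build locally with: python setup.py bdist_egg
    from setuptools import setup, find_packages
    setup(name="spamfilter", version="0.1", packages=find_packages())

    # In the remote Spark session: distribute the egg, then import from it.
    sc.addPyFile("/tmp/spamfilter-0.1-py3.6.egg")  # path on the cluster is assumed
    from spamfilter.model import train_model       # hypothetical module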

Flow

The spam collection data set is loaded into Watson Studio Local as an asset.
The user interacts with the Jupyter notebooks by running them in Watson Studio Local.
Watson Studio Local can either use the resources available locally or utilize HDP cluster resources by connecting to Apache Livy, which is a part of the Hadoop Integration service.
Livy connects with the HDP cluster to run Apache Spark or access HDFS files (a minimal sketch of this REST interaction follows).
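Under the hood, sparkmagic talks to Livy over REST. A minimal sketch of that interaction with the requests library, assuming a placeholder endpoint and no authentication:

    # Create a Livy PySpark session and run one statement remotely.
    import requests, time

    livy = "http://hdp-edge-node:8998"  # placeholder endpoint
    sess = requests.post(livy + "/sessions", json={"kind": "pyspark"}).json()
    url = "%s/sessions/%d" % (livy, sess["id"])

    while requests.get(url).json()["state"] != "idle":  # wait for session startup
        time.sleep(2)

    stmt = requests.post(url + "/statements", json={"code": "sc.version"}).json()
    print(requests.get("%s/statements/%d" % (url, stmt["id"])).json())

The final GET may show the statement still running; a real client polls it until the output field is populated, which is exactly what sparkmagic does for you.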

Instructions

Get the detailed instructions in the README file. These steps will show you how to:

Clone the repo.
Create a project in IBM Watson Studio Local.
Create project assets.
Commit changes to Watson Studio Local Master Repository.
Run the notebooks listed in each example.
