Inside a co-working space in the Rosebank neighborhood of Johannesburg, Jade Abbott popped open a tab on her computer and prompted ChatGPT to count from 1 to 10 in isiZulu, a language spoken by more than 10 million people in her native South Africa. The results were “mixed and hilarious,” says Abbott, a computer scientist and researcher. 

Then she typed in a few sentences in isiZulu and asked the chatbot to translate them into English. Once again, the answers? Not even close. Although there have been efforts to include certain languages in AI models even when there is not much data available for training, to Abbott, these results show that the technology “really still isn’t capturing our languages.”  

Abbott’s experience mirrors the situation faced by Africans who don’t speak English. Many language models like ChatGPT do not perform well for languages with smaller numbers of speakers, especially African ones. But a new venture called Lelapa AI, a collaboration between Abbott and a biomedical engineer named Pelonomi Moiloa, is trying to use machine learning to create tools that specifically work for Africans.

Vulavula, a new AI tool that Lelapa released today, converts voice to text and detects names of people and places in written text (which could be useful for summarizing a document or searching for someone online). It can currently identify four languages spoken in South Africa—isiZulu, Afrikaans, Sesotho, and English—and the team is working to include other languages from across Africa. 

The tool can be used on its own or integrated into existing AI tools like ChatGPT and online conversational chatbots. The hope is that Vulavula, which means “speak” in Xitsonga, will make accessible those tools that don’t currently support African languages.

The lack of AI tools that work for African languages and recognize African names and places excludes African people from economic opportunities, says Moiloa, CEO and cofounder of Lelapa AI. For her, working to build Africa-centric AI solutions is a way to help others in Africa harness the immense potential benefits of AI technologies. “We are trying to solve real problems and put power back into the hands of our people,” she says.  

“We cannot wait for them”   

There are thousands of languages in the world, 1,000 to 2,000 of them in Africa alone: it’s estimated that the continent accounts for one-third of the world’s languages. But though native speakers of English make up just 5% of the global population, the language dominates the web—and has now come to dominate AI tools, too.  

Some efforts to correct this imbalance already exist. OpenAI’s GPT-4 has included minor languages like Icelandic. In February 2020, Google Translate started supporting five new languages spoken by about 75 million people. But the translations are shallow, the tool often gets African languages wrong, and it’s still a long way from an accurate digital representation of African languages, African AI researchers say.

Earlier this year, for example, at a premier African AI conference in Kigali, Rwanda, the Ethiopian computer scientist Asmelash Teka Hadgu ran the same experiments with ChatGPT that Abbott had run. When he asked the chatbot questions in his mother tongue of Tigrinya, the answers he got were gibberish. “It generated words that don’t make any sense,” says Hadgu, who cofounded Lesan, a Berlin-based AI startup that is developing translation tools for Ethiopian languages.

Lelapa AI and Lesan are just two of the startups developing speech recognition tools for African languages. In February, Lelapa AI raised $2.5 million in seed funding, and the company plans its next funding round for 2025. But African entrepreneurs say they face major hurdles, including lack of funding, limited access to investors, and difficulties in training AI to learn diverse African languages. “AI receives the least funding among African tech startups,” says Abake Adenle, the founder of AJALA, a London-based startup that provides voice automation for African languages.

The AI startups working to build products that support African languages often get ignored by investors, says Hadgu, owing to the small size of the potential market, a lack of political support, and poor internet infrastructure. However, Hadgu says small African startups including Lesan, GhanaNLP, and Lelapa AI are playing an important role: “Big tech companies do not give focus to our languages,” he says, “but we cannot wait for them.”  

A model for African AI  

Lelapa AI is trying to create a new paradigm for AI models in Africa, says Vukosi Marivate, a data scientist on the company’s AI team. Instead of tapping into the internet alone to collect data to train its model, like companies in the West, Lelapa AI works both online and offline with linguists and local communities to gather data, annotate it, and identify use cases where the tool might be problematic. 

Bonaventure Dossou, a researcher at Lelapa AI specializing in natural-language processing (NLP), says that working with linguists enables them to develop a model that’s context-specific and culturally relevant. “Embedding cultural sensitivity and linguistic perspectives makes the technological system better,” says Dossou. For example, the Lelapa AI team built sentiment and tone analysis algorithms tailored to specific languages. 

Marivate and his colleagues at Lelapa AI envision a future in which AI technologies work for and represent Africans. In 2019, Marivate and Abbott established Masakhane, a grassroots initiative that aims to promote NLP research in African languages. The initiative now has thousands of volunteers, coders, and researchers working together to build Africa-centric NLP models. 

It matters that Vulavula and other AI tools are built by Africans for Africans, says Moiloa: “We’re the custodians of our languages. We should be the builders of technologies that work for our languages.”           
