When one of China’s biggest celebrities, Simon Gong—also known as Gong Jun—released a new music video in June 2022, it quickly attracted 15 million views on the country’s Twitter-like microblogging site Weibo. But the event also stood out for a different reason—one that only eagle-eyed fans might have noticed. The singer in the video was not Gong himself, but a digital replica created by Baidu, a “digital human” powered by artificial intelligence (AI). Likewise, the lyrics and melody were generated by AI, marking the recording as China’s first AI-generated content music video.

Deloitte defines digital humans as AI-powered virtual beings that can produce a whole range of human body language. In recent years, businesses focused on providing round-the-clock services, as well as the media and entertainment industry, have increasingly adopted this nascent technology, aiming to capture a growing market. And as digital humans spread into other sectors like retail, health care, and finance, Emergen Research forecasts that the global market for digital humans will jump to about $530 billion in 2030, from $10 billion in 2020.
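To put that forecast in perspective, the two endpoint figures imply a compound annual growth rate of roughly 49%. The short Python sketch below shows the arithmetic. The function name `implied_cagr` is our own illustration, and the assumption of steady compounding over the ten years is ours, not Emergen Research's.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate implied by two endpoint values.

    Assumes smooth, constant compounding between the endpoints, which is
    a simplification made here for illustration only.
    """
    if start_value <= 0 or end_value <= 0 or years <= 0:
        raise ValueError("values and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Market sizes quoted in the article, in billions of US dollars.
rate = implied_cagr(start_value=10, end_value=530, years=2030 - 2020)
print(f"Implied annual growth: {rate:.1%}")
```

In other words, the forecast amounts to the market growing by nearly half again every year for a decade.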

Digital human created by Baidu AI Cloud and modeled after Chinese celebrity, Simon Gong.

“Rising demand is driving the boom of digital humans,” says Shiyan Li, head of the digital human and robotics business at Baidu, which created the digital model-actor, Gong. “In China alone, there are over 400 million ACGN (animation, comics, games, and novel) fans, and an enterprise market worth hundreds of billions of dollars centered on digital humans.” According to Qichacha, a company that tracks business registrations, China now has more than 280,000 enterprises that engage in digital human-related activities.

A different kind of digital

The debut of Baidu’s digital celebrity may not seem like much at first, as the concept of “virtual idols” has been around for years. For example, US virtual influencer Lil Miquela has been appearing alongside real human celebrities in online advertisements and TV commercials since 2016, gaining over three million Instagram followers. However, there is something different about the virtual Chinese star: a digital human with the ability to listen, speak, and interact with real humans at a level never seen before. And Gong’s digital duties are not limited to singing. On the latest update of Baidu App, China’s leading search-plus-feed app, Gong appears on users’ phones, helping with searches and queries using the model-actor’s real voice. Since this interactive search experience was launched in 2021, it has boosted the number of voice search queries on Baidu App by 18.2%.

Baidu AI Cloud first began developing a digital employee in 2019 in collaboration with Shanghai Pudong Development (SPD) Bank. Subsequently, they focused their efforts on building a digital financial advisor to provide a service equivalent to that of a human bank representative when real-life employees were unavailable. Today, SPD Bank says more than 460,000 customers rely on digital humans for banking services and portfolio management each month. “Access to digital humans outside of regular business hours allows SPD Bank to offer 24/7 customer service at low cost and high efficiency,” says a bank representative.

More recently, a Baidu-created virtual anchor provided live commentary in sign language at the 2022 Beijing Winter Games for hearing-impaired viewers. In addition to looking like a real person, the avatar was empowered with speech recognition and sign-language interpretation abilities to ensure rapid and highly accurate input and output. With approximately 430 million people around the world experiencing “disabling” hearing loss, according to the World Health Organization, there is strong potential for this technology to be used to increase their ability to access a wide range of content.

A sign-language interpreter created by Baidu AI Cloud’s XiLing.

XiLing: A new generation on an AI platform

From entertainment to public services, digital humans are set to play a greater role in our daily lives. But behind their natural and effortless appearance is a complex web of new and emerging technologies pushing the boundaries of AI innovation.

Baidu AI Cloud’s digital celebrity and virtual sign-language anchors were created through XiLing, a new digital platform launched in 2021. At the Baidu World 2022 event held on June 21, the company announced a new XiLing capability: support for creating digital humans that work as livestream hosts, singing, dancing, and responding to comments in real time without ever needing a break. XiLing is unique in its ability to support the entire process of creating a digital human, from crafting a realistic persona to endowing it with conversational and content-generation skills. One of its most striking attributes is speed. The platform can generate a 3D avatar based on a real person in one to two weeks, while a 2D avatar can be made in a matter of minutes.

In addition, using XiLing’s intelligent dialogue tools, creators can quickly customize a digital human’s conversational ability, letting it adapt and learn over time. This capability is powered by Baidu’s PLATO, a hundred-billion-parameter dialogue model that enables digital humans to participate in open-domain conversations, that is, to understand any topic and provide relevant responses. Highly accurate speech recognition and lip-syncing with above-98.5% accuracy allow the digital human to have smoother, more human-like interactions. “Use of advanced AI technologies will keep bringing down the cost of building digital humans and significantly improve their interactions with real humans,” says Li.

Just as every real human has their own set of skills and talents, so too does the new generation of digital humans. This can even include giving digital humans the ability to be creative themselves, thanks to the recent progress made by large AI models like Baidu’s ERNIE, which can generate texts and create realistic images when prompted. Digital humans designed to serve as brand spokespersons, for example, can independently create and post on social media, design posters, and perform in videos.

Spice up your virtual life in the metaverse

Digital humans and their virtual world do not merely reproduce real humans and our physical world; they could also create an entirely new medium of expression in next-generation social media worlds. “Web3 and the metaverse have sparked a wave of speculations in the tech field today about what the future may bring,” says Li. “A digital replica of self will be core to the metaverse. Digital people for customer service will continue to serve the metaverse with a better experience than a pure graphic interface.”

Yanxia Lu, assistant research director at IDC China, says that the benefits are already clear. “Digital humans are already demonstrating clear business value in numerous fields today,” says Lu. “In the future, there will definitely be a large team of digital humans coexisting side-by-side with us in life and work.” 

In other words, virtual financial advisors and avatar translators are just the beginning. Today, digital humans with the capability to adapt and learn are already poised to dramatically expand access to essential services and more. As society transitions to a more digital world, digital humans are set to play a key role in accompanying us on this journey. 

This content was produced by Baidu. It was not written by MIT Technology Review’s editorial staff.