Google AR & VR – Ask a techspert: How does Lens turn images to text?

When I was on holiday recently, I wanted to take notes from an ebook I was reading. But instead of taking audio notes or scribbling things down in a notebook, I used Lens to select a section of the book, copy it and paste it into a document. That got me curious: How did all that just happen on my phone? How does a camera recognize words in all their fonts and languages?

I decided to get to the root of the question and speak to Ana Manasovska, a Zurich-based software engineer who is one of the Googlers on the front line of converting an image into text.

Ana, tell us about your work in Lens

I’m involved with the text aspect, so making sure that the app can discern text and copy it for a search or translate it — with no typing needed. For example, if you point your phone’s camera at a poster in a foreign language, the app can translate the text on it. And for people who are blind or have low vision, it can read the text out loud. It’s pretty impressive.

So part of what my team does is get Lens to recognize not just the text, but also the structure of the text. We humans automatically understand writing that is separated into sentences and paragraphs, or blocks and columns, and know what goes together. It’s very difficult for a machine to distinguish that, though.

Is this machine learning then?

Yes. In other words, it uses systems (we call them models) that we’ve trained to discern characters and structure in images. A traditional computing system would have only a limited ability to do this. But our machine learning model has been built to “teach itself” on enormous datasets and is learning to distinguish text structures the same way a human would.

Can the system work with different languages?

Yes, it can recognize 30 scripts, including Cyrillic, Devanagari, Chinese and Arabic. It’s most accurate in Latin-alphabet languages at the moment, but even there, the many different types of fonts present challenges. Japanese and Chinese are tricky because they have lots of nuances in the characters. What seems like a small variation to the untrained eye can completely change the meaning.

What’s the most challenging part of your job?

There’s lots of complexity and ambiguity, which are challenging, so I’ve had to learn to navigate that. And it’s very fast paced; things are moving constantly and you have to ask a lot of questions and talk to a lot of people to get the answers you need.

When it comes to actual coding, what does that involve?

Mostly I use a programming language called C++, which lets you run the processing steps needed to get from an image to a representation of words and structure.

Hmmm, I sort of understand. What does it look like?

This is what C++ looks like.

[Image: screenshot of C++ code]

The code above shows the processing for extracting only the German from a section of text. So say the image showed German, French and Italian — only the German would be extracted for translation. Does that make sense?
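As a rough illustration of the idea Ana describes, here is a minimal C++ sketch of filtering recognized text by language. It is not Lens's actual code: the `TextBlock` struct, its `language` tag and the `FilterByLanguage` function are all hypothetical names invented for this example, and it assumes each piece of text has already been labeled with a language code.

```cpp
#include <string>
#include <vector>

// Hypothetical representation of one recognized piece of text.
// Not Lens's real data model: both fields are assumptions for illustration.
struct TextBlock {
  std::string text;      // the recognized characters
  std::string language;  // assumed language tag, e.g. "de", "fr", "it"
};

// Returns only the blocks whose language tag matches `wanted`,
// preserving their original order.
std::vector<TextBlock> FilterByLanguage(const std::vector<TextBlock>& blocks,
                                        const std::string& wanted) {
  std::vector<TextBlock> result;
  for (const TextBlock& block : blocks) {
    if (block.language == wanted) {
      result.push_back(block);
    }
  }
  return result;
}
```

With blocks tagged "de", "fr" and "it", calling `FilterByLanguage(blocks, "de")` would hand back only the German ones, which could then be passed on for translation.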

Kind of! Tell me what you love about your job

It boils down to my lifelong love of solving problems. But I also really like that I’m building something I can use in my everyday life. I’m based in Zurich but don’t speak German well, so I use Lens for translation into English daily.
