How do you teach kids to use and build with AI? That’s what Stefania Druga works on. Doing it well means respecting their creativity, sense of fun, and desire to learn, and it means designing with kids, not just for them. Those lessons have important implications for adults, too. Join Stefania Druga and Ben Lorica to hear about AI for kids and what it has to say about AI for adults.

About the Generative AI in the Real World podcast: In 2023, ChatGPT put AI on everyone’s agenda. In 2025, the challenge will be turning those agendas into reality. In Generative AI in the Real World, Ben Lorica interviews leaders who are building with AI. Learn from their experience to help put AI to work in your enterprise.

Check out other episodes of this podcast on the O’Reilly learning platform.

Timestamps

0:00: Introduction to Stefania Druga, independent researcher and most recently a research scientist at DeepMind.

0:27: You’ve built AI education tools for young people, and after that, worked on multimodal AI at DeepMind. What have kids taught you about AI design?

0:48: It’s been quite a journey. I started working on AI education in 2015. I was on the Scratch team in the MIT Media Lab. I worked on Cognimates so kids could train custom models with images and text. Kids would do things I would have never thought of, like build a model to identify weird hairlines or to recognize and give you backhanded compliments. They did things that are weird and quirky and fun and not necessarily utilitarian.

2:05: For young people, driving a car is fun. Having a self-driving car is not fun. They have lots of insights that could inspire adults.

2:25: You’ve noticed that a lot of the users of AI are Gen Z, but most tools aren’t designed with them in mind. What is the biggest disconnect?

2:47: We don’t have a knob for agency to control how much we delegate to the tools. Most of Gen Z use off-the-shelf AI products like ChatGPT, Gemini, and Claude. These tools have a baked-in assumption that they need to do the work rather than asking questions to help you do the work. I like a much more Socratic approach. A big part of learning is asking and being asked good questions. A huge role for generative AI is to use it as a tool that can teach you things, ask you questions; [it’s] something to brainstorm with, not a tool that you delegate work to. 

4:25: There’s this big elephant in the room where we don’t have conversations or best practices for how to use AI.

4:42: You mentioned the Socratic approach. How do you implement the Socratic approach in the world of text interfaces?


4:57: In Cognimates, I created a copilot for kids coding. This copilot doesn’t do the coding. It asks them questions. If a kid asks, “How do I make the dude move?” the copilot will ask questions rather than saying, “Use this block and then that block.” 

6:40: When I designed this, we started Wizard of Oz style, with a person behind the scenes playing the copilot. Then we built the tool and realized that kids really want a system that can help them clarify their thinking: How do you break down a complex event into steps that are good computational units?

8:06: The third discovery was affirmations—whenever they did something that was cool, the copilot would say something like “That’s awesome.” The kids would spend double the time coding because they had an infinitely patient copilot that would ask them questions, help them debug, and give them affirmations that would reinforce their creative identity.
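The Cognimates copilot itself isn’t shown here, but the pattern these discoveries point to—ask guiding questions, never hand over the answer, affirm genuine progress—is easy to sketch. Below is a minimal Python sketch assuming an OpenAI-style chat API; the model name and prompt wording are illustrative, not Cognimates’ actual implementation.

```python
# Minimal sketch of the Socratic-copilot pattern: the system prompt
# forbids direct solutions and steers the model toward guiding
# questions plus affirmations. Prompt wording is illustrative.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SOCRATIC_PROMPT = """You are a coding copilot for kids using Scratch.
Never give the solution or name the exact blocks to use.
Instead, ask one short question that helps the learner figure out
the next step themselves, and praise genuine progress."""

def socratic_reply(history: list[dict], learner_message: str) -> str:
    """Return a guiding question instead of a solution."""
    messages = [{"role": "system", "content": SOCRATIC_PROMPT}]
    messages += history
    messages.append({"role": "user", "content": learner_message})
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any chat model works here
        messages=messages,
    )
    return response.choices[0].message.content

print(socratic_reply([], "How do I make the dude move?"))
# e.g. "What do you want to happen first when the game starts?"
```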

8:46: With those design directions, I built the tool. I’m presenting a paper that describes this work in more detail at the ACM IDC (Interaction Design and Children) conference. I hope this example gets replicated.

9:26: Because these interactions and interfaces are evolving very fast, it’s important to understand what young people want, how they work and how they think, and design with them, not just for them.

9:44: The typical developer now, when they interact with these things, overspecifies the prompt. They describe everything so precisely. But what you’re describing is interesting because you’re learning and building incrementally. We’ve gotten away from that as grown-ups.

10:28: It’s all about tinkerability and having the right level of abstraction. What are the right Lego blocks? A prompt is not tinkerable enough. It doesn’t allow for enough expressivity. It needs to be composable and allow the user to be in control. 

11:17: What’s very exciting to me are multimodal [models] and things that can work on the phone. Young people spend a lot of time on their phones, and they’re just more accessible worldwide. We have open source models that are multimodal and can run on devices, so you don’t need to send your data to the cloud. 

11:59: I worked recently on two multimodal mobile-first projects. The first was in math. We created a benchmark of misconceptions first: What are the mistakes middle schoolers make when learning algebra? We tested whether multimodal LLMs could pick up misconceptions from pictures of kids’ handwritten exercises, ran the results by teachers, and confirmed that the teachers agreed. Then I built an app called MathMind that asks you questions as you solve problems. If it detects misconceptions, it proposes additional exercises.
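MathMind’s actual pipeline isn’t public, but the misconception-detection step described here can be sketched: send a photo of handwritten work to a multimodal model and ask it to match the work against a fixed misconception list. The taxonomy entries, prompt, and model name below are assumptions for illustration.

```python
# Sketch of misconception detection from a photo of handwritten math.
# The misconception list here is a made-up stand-in for a benchmark.
import base64
from openai import OpenAI

client = OpenAI()

MISCONCEPTIONS = [
    "sign error when moving a term across the equals sign",
    "dividing only one side of the equation",
    "treating (a + b)^2 as a^2 + b^2",
]

def detect_misconception(image_path: str) -> str:
    """Ask a multimodal model which misconception (if any) the work shows."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model="gpt-4o",  # any multimodal chat model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Which of these algebra misconceptions, if any, "
                         "does this handwritten work show? "
                         + "; ".join(MISCONCEPTIONS)},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content
```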

14:41: For teachers, it’s useful to see how many people didn’t understand a concept before they move on. 


15:17: Who is building the open weights models that you are using as your starting point?

15:26: I used a lot of the Gemma 3 models. The latest, Gemma 3n, is multilingual and small enough to run on a phone or laptop. Llama has good small models. Mistral is another good one.
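As a rough example of what “small enough to run locally” looks like in practice, here’s one way to query a local open-weights model through the Ollama Python client. The gemma3n model tag is an assumption; substitute whichever small Gemma, Llama, or Mistral model you have pulled.

```python
# Query a local open-weights model via a running Ollama server.
# Requires: pip install ollama, and e.g. `ollama pull gemma3n` beforehand.
import ollama

response = ollama.chat(
    model="gemma3n",  # assumed tag; any locally pulled model works
    messages=[{"role": "user",
               "content": "Explain what a variable is to a 12-year-old."}],
)
print(response["message"]["content"])
```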

16:11: What about latency and battery consumption?

16:22: I haven’t done extensive tests for battery consumption, but I haven’t seen anything egregious.

16:35: Math is the perfect testbed in many ways, right? There’s a right and a wrong answer.

16:47: The future of multimodal AI will be neurosymbolic. There’s a part that the LLM does. The LLM is good at fuzzy logic. But there’s a formal system part, which is actually having concrete specifications. Math is good for that, because we know the ground truth. The question is how to create formal specifications in other domains. The most promising results are coming from this intersection of formal methods and large language models. One example is AlphaGeometry from DeepMind, because they were using a grammar to constrain the space of solutions. 
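As a toy illustration of that neurosymbolic split, the sketch below lets an LLM play the fuzzy half (proposing an answer) while SymPy plays the formal half (verifying it against ground truth). Here ask_llm is a hypothetical stand-in for any chat-model call.

```python
# Toy neurosymbolic loop: the LLM proposes, a symbolic engine verifies.
import sympy as sp

x = sp.Symbol("x")

def verify_solution(equation: sp.Eq, proposed: str) -> bool:
    """Check a model's proposed root against the symbolic ground truth."""
    candidate = sp.sympify(proposed)
    return candidate in sp.solveset(equation, x)

# Suppose the LLM answered "3" for the equation 2x + 1 = 7:
llm_answer = "3"  # in practice: llm_answer = ask_llm("Solve 2x + 1 = 7")
equation = sp.Eq(2 * x + 1, 7)
print(verify_solution(equation, llm_answer))  # True: 3 is a valid root
```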

18:16: Can you give us a sense for the size of the community working on these things? Is it mostly academic? Are there startups? Are there research grants?

18:52: The first community when I started was AI for K12. There’s an active community of researchers and educators. It was supported by NSF. It’s pretty diverse, with people from all over the world. And there’s also a Learning and Tools community focusing on math learning. Renaissance Philanthropy also funds a lot of initiatives.

20:18: What about Khan Academy?

20:20: Khan Academy is a great example. They wanted Khanmigo to be about intrinsic motivation, understanding, and positive encouragement for the kids. But what I discovered was that the math was wrong—the early LLMs had problems with math.

22:28: Let’s say a month from now a foundation model gets really good at advanced math. How long until we can distill a small model so that you get those benefits on the phone?

23:04: There was a project, Minerva, that was an LLM built specifically for math. A really good model that is always correct at math is not going to be a Transformer under the hood. It will be a Transformer together with tool use and an automatic theorem prover. We need a piece of the system that’s verifiable. How quickly can we make it work on a phone? That’s doable right now. There are open source systems like Unsloth that can distill a model as soon as it’s available. The APIs are also becoming more affordable. We can build those tools right now and make them run on edge devices.
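On the distillation point, here is a rough sketch of shrinking an open model for edge use with Unsloth’s documented FastLanguageModel API. The model tag and hyperparameters are illustrative assumptions; in practice you would then fine-tune on math traces (e.g., with TRL’s SFTTrainer) and export a quantized build for the phone.

```python
# Rough sketch: load a small open model in 4-bit and attach LoRA
# adapters for lightweight distillation/fine-tuning with Unsloth.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-3n-E2B",  # assumed tag; pick any small base
    max_seq_length=2048,
    load_in_4bit=True,   # 4-bit weights keep memory low for edge devices
)
# Only a small fraction of parameters get trained via the adapters.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
)
```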

25:05: Human in the loop for education means parents in the loop. What extra steps do you have to take to be comfortable that whatever you build is ready to be deployed and scrutinized by parents?


25:34: The most common question I get is “What should I do with my child?” I get this question so often that I sat down and wrote a long handbook for parents. During the pandemic, I worked with the same community of families for two-and-a-half years. I saw how the parents were mediating the use of AI in the house. They learned through games how machine learning systems work and about bias. There’s a lot of work to be done for families. Parents are overwhelmed. There’s a constant feeling of not wanting your child to be left behind but also not wanting them on devices all the time. It’s important to make a plan to have conversations about how they’re using AI and how they think about AI, coming from a place of curiosity.

28:12: We talked about implementing the Socratic method. One of the things people are talking about is multi-agents. At some point, some kid will be using a tool that orchestrates a bunch of agents. What kinds of innovations in UX are you seeing that will prepare us for this world?

28:53: The multi-agent part is interesting. When I was doing the study on the Scratch copilot, we had a design session at the end with the kids, and this theme of agents and multiple agents emerged. Many of them wanted that, and wanted to run simulations. We talked about the Scratch community, because it’s social learning, and I asked them: What happens if some of the games are made by agents? Would you want to know that? It’s something they want, and something they want to be transparent about.

30:41: A hybrid online community that includes kids and agents isn’t science fiction. The technology already exists. 

30:54: I’m collaborating with the folks who created a technology called Infinibranch that lets you spin up a lot of virtual environments where you can test agents and see them in action. We’re clearly going to have agents that can take actions. I told them what kids wanted, and they said, “Let’s make it happen.” It’s definitely going to be an area of simulations and tools for thought. I think it’s one of the most exciting areas. You can run 10 experiments at once, or 100.

32:23: In the enterprise, a lot of people get ahead of themselves, and so do a lot of the vendors. Let’s get one agent working well first.

32:49: Absolutely. It’s one thing to do a demo; it’s another thing to get it to work reliably.
