O’Reilly Media – Getting the Right Answer from ChatGPT

A couple of days ago, I was thinking about what you needed to know to use ChatGPT (or Bing/Sydney, or any similar service). It’s easy to ask it questions, but we all know that these large language models frequently generate false answers. Which raises the question: If I ask ChatGPT something, how much do I need to know to determine whether the answer is correct?

So I did a quick experiment. A number of years ago, as a short programming project, I made a list of all the prime numbers less than 100 million. I used this list to create a 16-digit number that was the product of two 8-digit primes (99999787 times 99999821 is 9999960800038127). I then asked ChatGPT whether this number was prime, and how it determined whether the number was prime.
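The arithmetic is easy to check for yourself. Here is a minimal sketch in Python, whose integers are arbitrary precision, so the 16-digit product is exact. The variable names are only illustrative. Note that this confirms the product, not that the two factors are prime.

```python
# The two 8-digit numbers quoted in the text.
p = 99999787
q = 99999821

n = p * q
print(n)  # 9999960800038127

# Same product by hand: (10**8 - 213) * (10**8 - 179)
#   = 10**16 - 392 * 10**8 + 213 * 179
by_hand = 10**16 - 392 * 10**8 + 213 * 179
print(by_hand == n)  # True
```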

ChatGPT correctly answered that this number was not prime. This is somewhat surprising because, if you’ve read much about ChatGPT, you know that math isn’t one of its strong points. (There’s probably a big list of prime numbers somewhere in its training set.) However, its reasoning was incorrect, and that’s a lot more interesting. ChatGPT gave me a bunch of Python code that implemented the Miller-Rabin primality test, and said that my number was divisible by 29. The code as given had a couple of basic syntactic errors, but that wasn’t the only problem. First, 9999960800038127 isn’t divisible by 29 (I’ll let you prove this to yourself). Second, once I had fixed the obvious errors, the Python code looked like a correct implementation of Miller-Rabin, but the number that Miller-Rabin outputs isn’t a factor; it’s a “witness” that attests to the fact that the number you’re testing isn’t prime. The number it outputs also isn’t 29. So ChatGPT didn’t actually run the program. That’s not surprising: many commentators have noted that ChatGPT doesn’t run the code that it writes. It also misunderstood what the algorithm does and what its output means, and that’s a more serious error.
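To make the witness-versus-factor distinction concrete, here is a minimal sketch of Miller-Rabin. It is not the code ChatGPT produced, which isn't reproduced here. The function name `find_witness` and the fixed list of bases are my own assumptions, and the sketch only handles odd numbers. Using the first twelve primes as bases is commonly cited as sufficient for a definitive answer on numbers below 2**64.

```python
# Assumed fixed bases (the first twelve primes); an illustrative choice.
BASES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)

def find_witness(n, bases=BASES):
    """Return a base that proves odd n composite, or None if no base does."""
    if n < 3 or n % 2 == 0:
        raise ValueError("this sketch only handles odd n >= 3")
    # Write n - 1 as d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in bases:
        if a % n == 0:
            continue  # base is a multiple of n; it tells us nothing
        x = pow(a, d, n)
        if x == 1 or x == n - 1:
            continue  # this base is consistent with n being prime
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break  # still consistent with n being prime
        else:
            return a  # a is a witness: n is definitely composite
    return None

n = 9999960800038127
w = find_witness(n)
print(w is not None)  # True: some base proves n composite
print(n % w == 0)     # False: the witness is not a factor of n
print(n % 29)         # 10, so n is not divisible by 29
```

The witness only certifies that the number is composite. Finding the actual factors is a separate and much harder problem.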

I then asked it to reconsider the rationale for its previous answer, and got a very polite apology for being incorrect, together with a different Python program. This program was correct from the start. It was a brute-force primality test that tried each integer (both odd and even!) smaller than the square root of the number under test. Neither elegant nor performant, but correct. But again, because ChatGPT doesn’t actually run the program, it gave me a new list of “prime factors,” none of which were correct. Interestingly, it included its expected (and incorrect) output in the code:

      n = 9999960800038127
      factors = factorize(n)
      print(factors) # prints [193, 518401, 3215031751]
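Actually running a factorization shows how far off that comment is. The sketch below is my own trial-division version of a `factorize` function, not the program ChatGPT wrote. Unlike the program described above, it skips even divisors after 2. It is demonstrated on a small number so that it finishes instantly, and the claimed factors are then checked directly against the 16-digit number.

```python
import math

def factorize(n):
    """Return the prime factors of n, with multiplicity, by trial division."""
    factors = []
    while n % 2 == 0:
        factors.append(2)
        n //= 2
    d = 3
    while d * d <= n:      # only divisors up to sqrt(n) are needed
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 2             # skip even candidates
    if n > 1:
        factors.append(n)  # whatever remains is prime
    return factors

print(factorize(10403))  # [101, 103]

n = 9999960800038127
claimed = [193, 518401, 3215031751]   # from the comment in ChatGPT's code
print(math.prod(claimed) == n)        # False: the product is far larger than n
print(n % 193 == 0)                   # False: 193 is not even a divisor
print(n == 99999787 * 99999821)       # True: the factors used to build n
```

Calling `factorize(n)` on the 16-digit number also works, but the loop has to run to nearly 10**8 before it finds the smaller factor, so expect it to take a while in pure Python.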

I’m not claiming that ChatGPT is useless; far from it. It’s good at suggesting ways to solve a problem, and can lead you to the right solution, whether or not it gives you a correct answer. Miller-Rabin is interesting; I knew it existed, but wouldn’t have bothered to look it up if I hadn’t been prompted. (That’s a nice irony: I was effectively prompted by ChatGPT.)

Getting back to the original question: ChatGPT is good at providing “answers” to questions, but if you need to know that an answer is correct, you must be capable either of solving the problem yourself or of doing the research you’d need to solve it. That’s probably a win, but you have to be wary. Don’t put ChatGPT in situations where correctness is an issue unless you’re willing and able to do the hard work yourself.
