O’Reilly Media – Getting the Right Answer from ChatGPT

A couple of days ago, I was thinking about what you need to know to use ChatGPT (or Bing/Sydney, or any similar service). It’s easy to ask it questions, but we all know that these large language models frequently generate false answers. Which raises the question: If I ask ChatGPT something, how much do I need to know to determine whether the answer is correct?

So I did a quick experiment. As a short programming project, a number of years ago I made a list of all the prime numbers less than 100 million. I used this list to create a 16-digit number that was the product of two 8-digit primes (99999787 times 99999821 is 9999960800038127). I then asked ChatGPT whether this number was prime, and how it determined whether the number was prime.

ChatGPT correctly answered that this number was not prime. This is somewhat surprising because, if you’ve read much about ChatGPT, you know that math isn’t one of its strong points. (There’s probably a big list of prime numbers somewhere in its training set.) However, its reasoning was incorrect, and that’s a lot more interesting. ChatGPT gave me a bunch of Python code that implemented the Miller-Rabin primality test, and said that my number was divisible by 29. The code as given had a couple of basic syntactic errors, but that wasn’t the only problem. First, 9999960800038127 isn’t divisible by 29 (I’ll let you prove this to yourself). Second, after I fixed the obvious errors, the Python code looked like a correct implementation of Miller-Rabin, but the number that Miller-Rabin outputs isn’t a factor; it’s a “witness” that attests to the fact that the number you’re testing isn’t prime. The number it outputs also isn’t 29. So ChatGPT didn’t actually run the program. That’s not surprising; many commentators have noted that ChatGPT doesn’t run the code that it writes. It also misunderstood what the algorithm does and what its output means, and that’s a more serious error.
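For readers who want to check these claims themselves, here is a minimal sketch. It is not the code ChatGPT produced, which isn’t reproduced in this article. The function names `is_witness` and `find_witness` and the fixed list of small prime bases are illustrative assumptions. The sketch confirms the product, shows the remainder on division by 29, and reports a Miller-Rabin witness, which is evidence of compositeness rather than a divisor.

```python
def is_witness(a, n):
    """Return True if base a proves that odd n > 2 is composite (Miller-Rabin)."""
    # Write n - 1 as d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    x = pow(a, d, n)  # three-argument pow: modular exponentiation
    if x == 1 or x == n - 1:
        return False  # a says nothing about n; n may still be prime
    for _ in range(s - 1):
        x = pow(x, 2, n)
        if x == n - 1:
            return False
    return True  # a is a witness: n is definitely composite


def find_witness(n, bases=(2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)):
    """Return the first base that proves n composite, or None if none does.

    Assumption: n is odd and greater than 2. The default bases are the
    first twelve primes, a common choice for 64-bit-sized inputs.
    """
    for a in bases:
        if a % n and is_witness(a, n):
            return a
    return None


n = 9999960800038127
print(99999787 * 99999821 == n)  # True: the product stated in the article
print(n % 29)                    # 10, so 29 does not divide n

w = find_witness(n)
print("witness:", w, "| remainder of n divided by witness:", n % w)
```

A witness only certifies that `n` is composite. To get actual factors you need a different algorithm, such as trial division or Pollard’s rho.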

I then asked it to reconsider the rationale for its previous answer, and got a very polite apology for being incorrect, together with a different Python program. This program was correct from the start. It was a brute-force primality test that tried each integer (both odd and even!) smaller than the square root of the number under test. Neither elegant nor performant, but correct. But again, because ChatGPT doesn’t actually run the program, it gave me a new list of “prime factors,” none of which were correct. Interestingly, it included its expected (and incorrect) output in the code:

      n = 9999960800038127
      factors = factorize(n)
      print(factors) # prints [193, 518401, 3215031751]
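ChatGPT’s second program isn’t reproduced here beyond the three lines above, so what follows is only a sketch of what a brute-force `factorize` might look like, not the code it wrote. Unlike the program described, this version skips even divisors after 2. It also shows a cheap sanity check that needs no factoring at all: multiply the claimed factors and compare the result with `n`.

```python
from math import prod


def factorize(n):
    """Return the prime factors of n, with multiplicity, by trial division."""
    factors = []
    while n > 1 and n % 2 == 0:
        factors.append(2)
        n //= 2
    d = 3
    while d * d <= n:  # only divisors up to sqrt(n) need to be tried
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 2  # skip even candidates
    if n > 1:
        factors.append(n)  # whatever remains is itself prime
    return factors


n = 9999960800038127
claimed = [193, 518401, 3215031751]  # the output ChatGPT predicted

# Genuine factors must multiply back to n.
print(prod(claimed) == n)  # False

# Claimed factors that actually divide n.
print([f for f in claimed if n % f == 0])

# The sketch on a small input:
print(factorize(2 * 3 * 3 * 97))  # [2, 3, 3, 97]
```

Calling `factorize(n)` on the 16-digit number is correct but slow in pure Python: since the article says both prime factors are close to 100 million, the loop has to try tens of millions of candidates before it finds the first one.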

I’m not claiming that ChatGPT is useless; far from it. It’s good at suggesting ways to solve a problem, and can lead you to the right solution, whether or not it gives you a correct answer. Miller-Rabin is interesting; I knew it existed, but wouldn’t have bothered to look it up if I hadn’t been prompted. (That’s a nice irony: I was effectively prompted by ChatGPT.)

Getting back to the original question: ChatGPT is good at providing “answers” to questions, but if you need to know that an answer is correct, you must be capable either of solving the problem yourself or of doing the research you’d need to solve it. That’s probably a win, but you have to be wary. Don’t put ChatGPT in situations where correctness is an issue unless you’re willing and able to do the hard work yourself.
