Dr. Isaac Kohane, who’s both a computer scientist at Harvard and a physician, teamed up with two colleagues to test-drive GPT-4, with one main goal: to see how the newest artificial intelligence model from OpenAI performed in a medical setting.
“I’m stunned to say: better than many doctors I’ve observed,” he says in the forthcoming book, “The AI Revolution in Medicine,” co-authored by independent journalist Carey Goldberg, and Microsoft vice president of research Peter Lee. (The authors say neither Microsoft nor OpenAI required any editorial oversight of the book, though Microsoft has invested billions of dollars into developing OpenAI’s technologies.)
GPT-4 is not just a good test-taker and fact finder, though. It’s also a great translator. In the book it’s capable of translating discharge information for a patient who speaks Portuguese, and distilling wonky technical jargon into something 6th graders could easily read.
As the authors explain with vivid examples, GPT-4 can also give doctors helpful suggestions about bedside manner, offering tips on how to talk to patients about their conditions in compassionate, clear language, and it can read lengthy reports or studies and summarize them in the blink of an eye. The tech can even explain its reasoning through problems in a way that requires some measure of what looks like human-style intelligence.
GPT-4 isn’t always reliable, and the book is filled with examples of its blunders. They range from simple clerical errors, like misstating a BMI that the bot had correctly calculated moments earlier, to math mistakes like inaccurately “solving” a Sudoku puzzle, or forgetting to square a term in an equation. The mistakes are often subtle, and the system has a tendency to assert it is right, even when challenged. It’s not a stretch to imagine how a misplaced number or miscalculated weight could lead to serious errors in prescribing, or diagnosis.
Like previous GPTs, GPT-4 can also “hallucinate” — the technical euphemism for when AI makes up answers, or disobeys requests.
When asked about this issue by the authors of the book, GPT-4 said, “I do not intend to deceive or mislead anyone, but I sometimes make mistakes or assumptions based on incomplete or inaccurate data. I also do not have the clinical judgment or the ethical responsibility of a human doctor or nurse.”
One potential cross-check the authors suggest in the book is to start a new session with GPT-4, and have it “read over” and “verify” its own work with a “fresh set of eyes.” This tactic sometimes works to reveal mistakes, though GPT-4 is somewhat reluctant to admit when it’s been wrong. Another error-catching suggestion is to command the bot to show you its work, so you can verify it, human-style.
It’s clear that GPT-4 has the potential to free up precious time and resources in the clinic, allowing clinicians to be more present with patients, “instead of their computer screens,” the authors write. But, they say, “we have to force ourselves to imagine a world with smarter and smarter machines, eventually perhaps surpassing human intelligence in almost every dimension. And then think very hard about how we want that world to work.”