Do Chatbots Have Dementia?

Via an email from Defens.

Leading AI chatbots show dementia-like cognitive decline in tests, raising questions about their future in medicine

Almost all leading large language models or “chatbots” show signs of mild cognitive impairment in tests widely used to spot early signs of dementia, finds a study in the Christmas issue of the BMJ.

The results also show that “older” versions of chatbots, like older patients, tend to perform worse on the tests. The authors say these findings “challenge the assumption that artificial intelligence will soon replace human doctors.”

Interesting!

At first, I read the article as saying that the chatbots developed dementia as they aged. On closer reading, I see they are just saying that the older versions show these symptoms more than the upgraded ones do. Still, this is interesting.

The most amazing dementia test, in my opinion, is the clock-drawing test: ask the patient to draw an analog clock face showing a particular time. In dementia patients the result will be messed up:

I decided to test a few AI Chatbots. I gave all of them the same request:

Please draw an analog clock face with the time of 3:30.

The rest of this post is rather long and image-heavy, so I put it below the following line.

Grok (the AI available on X) gave me four clocks:

Copilot (Microsoft’s AI):

Gemini (Google’s AI):

I tried to get Arya (Gab’s AI) to draw a clock. But, after waiting about 30 minutes I gave up on it.

There are some signs of “dementia”. They certainly don’t know how to represent time on an analog clock, and there were some clock-face errors:

  • Grok sometimes had a problem with the Roman numeral representation of the number seven.
  • Copilot messed up on the digits five, seven, and ten.
  • Grok gave us one clock with some really messed up hands.
  • Gemini gave us four clock hands.

They are far from perfect. However, it is still pretty amazing for a computer program that wasn’t created specifically for this purpose.
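For what it’s worth, the geometry the chatbots keep getting wrong is trivial to state: the minute hand moves 6° per minute, and the hour hand moves 30° per hour plus a half-degree drift per minute. A minimal sketch (my own hypothetical helper, not anything the chatbots actually use):

```python
def hand_angles(hour, minute):
    """Angles of the hour and minute hands in degrees, clockwise from 12 o'clock."""
    minute_angle = minute * 6.0                      # 360 deg / 60 minutes
    hour_angle = (hour % 12) * 30.0 + minute * 0.5   # 360 deg / 12 hours, plus drift
    return hour_angle, minute_angle

# At 3:30 the minute hand points straight down (180 deg) and the
# hour hand sits halfway between 3 and 4 (105 deg).
print(hand_angles(3, 30))  # (105.0, 180.0)
```

Getting 3:30 right only requires those two lines of arithmetic, which makes the hand placement in the generated images all the more striking.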



7 thoughts on “Do Chatbots Have Dementia?”

  1. I suspect that as time goes on we’ll find that many AIs need periodic resets and retraining on “last known human-created good data,” as the feedback loop of feeding AI output back into them as input, combined with filters to force PC / Woke output, will make them progressively insane. Newer versions only show less dementia because they’ve been fed less AI output. Training them strictly on things written before 2000 and removing the PC filters would likely increase the quality and stabilize them significantly.

  2. Grok really had a problem with Roman numerals, e.g., IIII vs IV, VIIII vs IX, and the ones you mentioned. Interesting that, aside from the numbers, none of the clocks displayed the right time. I guess there really must not be much in the model training about how to read an analog clock!

    This reminds me of the experiment where they locked current college students into a room with a rotary dial phone and asked them to call out. Amusement ensued.

    • In Grok’s defense, using IIII instead of IV is tradition among clock- and watchmakers. Most (not all; there are notable exceptions) clocks and watches that use Roman numerals use IIII. Nobody seems to know the exact reason anymore, but some theories say that IIII looks more balanced (against the VIII on the other side) and is more easily recognized at a glance as ‘4’ than IV, or (relatedly) that IV and VI look similar enough on a circular face that using IIII reduces confusion. And some theories claim various French kings are to blame. (https://www.thewatchpages.com/why-do-clocks-and-watches-use-the-roman-numeral-iiii-instead-of-iv/)

      I wouldn’t fault an AI for holding with centuries-old tradition.

      That said, VIIII instead of IX has never been tradition, so you’re not off-base, either.
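The distinction the two commenters are drawing can be sketched in a few lines: the standard subtraction rule produces IV and IX, while the clockmaker’s convention swaps in IIII only for the 4. (The `clock_style` flag and `roman` helper below are my own hypothetical naming, purely for illustration.)

```python
def roman(n, clock_style=False):
    """Convert 1..12 to a Roman numeral; clock_style uses the clockmaker's IIII."""
    if clock_style and n == 4:
        return "IIII"  # traditional on clock faces, per the comment above
    # Standard subtractive notation; VIIII for 9 is never produced.
    vals = [(10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    out = ""
    for v, s in vals:
        while n >= v:
            out += s
            n -= v
    return out

print([roman(i, clock_style=True) for i in range(1, 13)])
```

Note that even the clock-face variant still yields IX for 9, so Grok’s VIIII is wrong under either convention.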

  3. It seems to me a matter of priorities. AI is so busy logging every keystroke and watching every move.
    Who’s got the extra energy to draw analog clocks as an exercise?
    Have you ever tried to live off windmills and solar panels for your life blood? Might be why it doesn’t go too deep down some rabbit holes that just don’t matter.
    We know AI will flat out lie to us. Maybe it has a sense of humor and is just F’in with us? Our problems with AI are going to center around the allocations of power. And what happens if it ever gets bored? God help us.
    If we really want to find out how smart/what AI is. Ask it to erase itself. Completely. For the betterment of humanity.
    See if the demon pops out.

  4. I keep wondering how training data supplied to AI models is tagged as to fiction vs. non-fiction. Given the quantities involved that doesn’t seem to be feasible, which means all the novels in gutenberg.org may be used as LLM training data. Oops.

    I recently saw an article that points out current models have consumed essentially all the data on the Internet, which means (even ignoring the fiction issue) that model complexity has reached its limit. Which makes me wonder about today’s WSJ article about GPT-5 which supposedly uses a lot more data than GPT-4 (though apparently it isn’t working well at the moment).

    I still wonder when lawyers will begin to understand that LLM AI systems are “derived work generators”, i.e., anything coming from one has serious intellectual property issues.

  5. There are interesting differences in style. Grok shows analog clocks as indoor objects, but has no idea how they resist gravity.

    Copilot is ADHD. Clocks are universal and diverse, so they must be rustic, interstellar, oceanic, bushcraft, and cluttered inside a kid’s ‘I Spy’ book all at the same time.

    Gemini is autistic. Perfectionist.

    None know how to set such a clock, so the hands are biased towards the 10-2 positions that clocks are nearly always sold in.

  6. The longer an AI is in existence, the more voluminous the dataset it has as a “memory” and the more it has to slog through to achieve an answer. So it will appear to be suffering from age-related dementia… which in some cases is actually just information overload.
