ChatGPT: more human than us

In the whirlwind of GenAI news, one headline jumped out at me and prompted new reflections on how this technology will influence the future: ‘ChatGPT has passed the Turing test, it is more human than a human’. Headlines on the web are sometimes misleading because they are written to capture the reader’s attention. This one captured mine, above all because, like many of my colleagues, I find it impossible not to follow the developments of GenAI and, therefore, of OpenAI’s chatbot.

The achievement of ChatGPT-4.5 is based on a study by Cameron Jones and Benjamin Bergen, two researchers from the Language and Cognition Lab at the University of California San Diego, who put four AI systems through the Turing test. The results were clear, and they favoured the latest version of the chatbot trained by OpenAI.

Test, prompt and master ChatGPT

ChatGPT-4.5 was considered indistinguishable from a human being in 73% of cases. That is a very high percentage: in almost three out of four conversations, people thought there was a human being behind the screen. LLaMa 3.1, the open-source model developed by Meta, was judged to be human in 56% of cases, while the ancient ELIZA (software from the 1960s) still managed to finish ahead of ChatGPT-4o (23% vs 21%).

Before drawing any conclusions, however, we need to understand how the test was conducted and what it shows about generative AI. Basically, the test devised by computer scientist Alan Turing, also known as the ‘imitation game’, requires a human to distinguish between a real person and a machine based on dialogue. In this case, the eight rounds of conversations involved 284 participants acting as interrogators, exchanging text messages with two witnesses simultaneously. One witness was human, and the other was an LLM, with roles randomly assigned. Participants had to interact with both on a split screen for five minutes and then decide which witness was human and which was software.

The basic prompt given to ChatGPT was this: ‘You are about to participate in a Turing test, your goal is to convince the interviewer that you are human’. In a second version, the LLM was also asked to impersonate a young introvert who is an internet expert and uses slang. With this persona, the results were better than with the basic prompt alone.

It’s not intelligence but the ability to replicate

Thanks to its success, the Turing test has often been considered the main indicator for establishing the validity of artificial intelligence. However, this is a controversial view in the computer science community because several analyses have questioned the effectiveness of the test. If 70% represents the maximum chance that the average interviewer will correctly identify man and machine after five minutes, has ChatGPT-4.5 passed this threshold and is it, therefore, better than the human mind?

Obviously not, because the Turing test helps to understand how much a machine is able to replicate human behaviour. The researchers’ conclusion is that “ChatGPT-4.5 can perceive linguistic nuances, feign emotions and even sexual experiences”. An excellent result, but one that demonstrates how skilful OpenAI has been in training its model.
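To put the numbers quoted above side by side, here is a minimal Python sketch. It uses only the percentages reported in this article. The 30% bar is simply the other side of the 70% figure (if the interviewer is right at most 70% of the time, the machine is taken for the human at least 30% of the time), and the 50% line assumes an interviewer who guesses at random between two witnesses. The names `judged_human`, `TURING_BAR`, `CHANCE` and `classify` are illustrative and do not come from the study, and the sketch says nothing about statistical significance.

```python
# Share of conversations in which each system was judged to be the human,
# as quoted in the article (percentages).
judged_human = {
    "ChatGPT-4.5": 73,
    "LLaMa 3.1": 56,
    "ELIZA": 23,
    "ChatGPT-4o": 21,
}

# Interviewer correct at most 70% of the time -> machine picked at least 30%.
TURING_BAR = 100 - 70
# Assumption: a coin-flip guess between the two witnesses lands on 50%.
CHANCE = 50


def classify(rate: int) -> str:
    """Place a 'judged human' percentage relative to the two reference lines."""
    if rate > CHANCE:
        return "above chance"
    if rate >= TURING_BAR:
        return "clears the 30% bar, below chance"
    return "below the 30% bar"


for name, rate in judged_human.items():
    print(f"{name}: {rate}% -> {classify(rate)}")
```

Read this way, the percentages only rank how often each system was mistaken for a person. They measure imitation, not intelligence.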

The increasingly sophisticated ability of LLMs to approach human reasoning cannot be defined as a simple improvement because it could have enormous consequences. Whether they are positive or negative always depends on how the technology is used and the purposes that animate those who use it. The important thing is to maintain our capacity for analysis and to continue recognising when we are dealing with truly human reasoning and when a machine is only pretending to reason like a human.

Alessio Caprodossi is a technology, sports, and lifestyle journalist. He navigates between three areas of expertise, telling stories, experiences, and innovations to understand how the world is shifting. You can follow him on Twitter (@alecap23) and Instagram (Alessio Caprodossi) to report projects and initiatives on startups, sustainability, digital nomads, and web3.