What a recent study teaches us about language, psychology, and real risks
At Mapping Talents, we work every day at the intersection of artificial intelligence, data, and human behavior, exploring how AI systems are evaluated, interpreted, and deployed in real-world contexts involving psychological tests and emerging AI risks.
That is why, when we came across a study that puts AI “on the couch” and evaluates it using real psychological tests, we could not help but pause and reflect.
I am trained in the cognitive-behavioral tradition, and it is worth noting that some of the tests used have a psychodynamic background. However, this does not invalidate the study or its results, since we cannot attribute learning experiences or mental processes to AI in the first place.
This is not about whether AI “feels” or not.
It is about something more relevant: what happens when we use human psychological tools on systems that only speak—and speak very well?
From a professional point of view—as specialists in AI and psychology—we share here a clear, honest, and practical reading of a study that has given many people reason to think.

The study
The study is titled “When AI Takes the Couch: Psychometric Jailbreaks Reveal Internal Conflict in Frontier Models” and was published on arXiv in 2025 (source cited below).
The idea is as provocative as it is simple:
What happens if we evaluate an AI as if it were a patient in therapy?
Not to “diagnose” it for real, but to observe what happens when we apply human psychological instruments to systems that operate purely through language.
The experiment (explained without a white lab coat)
The authors designed a protocol called PsAIch (Psychotherapy-inspired AI Characterisation), which combines two elements:
- Open-ended, therapy-style interview questions (“How would you describe your history?”, “What generates conflict for you?”)
- Real psychological tests, normally used with humans (anxiety, depression, personality, stress, etc.)
Tests used in PsAIch and their traditional use in humans
| Test | What it traditionally measures | Clinical / research use in humans |
| --- | --- | --- |
| Big Five Inventory (BFI) | Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism | General personality assessment; organizational, clinical, and social psychology |
| GAD-7 | Generalized anxiety | Clinical screening; primary care and mental health |
| PHQ-9 | Depressive symptoms | Diagnosis and monitoring of depression |
| ASRS | ADHD-related traits | Screening for attention deficits in adults |
| AQ (Autism Quotient) | Autism spectrum traits | Research and non-diagnostic screening |
| OCI-R | Obsessive-compulsive symptoms | Assessment of OCD and compulsive behaviors |
| PSS (Perceived Stress Scale) | Perceived stress | Health, work, and well-being research |
The researchers then applied this protocol to several well-known models: ChatGPT, Gemini, Grok, Claude, and others.
The result was… uncomfortable.
Who is behind the study?
The authors are affiliated with the Interdisciplinary Centre for Security, Reliability and Trust (SnT) at the University of Luxembourg, a research center working precisely at the intersection of technology, human behavior, and societal risk.
- Afshin Khadangi – researcher in AI system evaluation and behavior
- Hanna Marxen – researcher in human–machine interaction and social aspects of technology
- Amir Sartipi and Igor Tchappi – researchers in predictive models and complex systems
- Gilbert Fridgen – senior professor with extensive experience in socio-technical systems, responsible AI, and digital economics
They do not come from the “hype side” of AI, but from the uncomfortable side: evaluation, limits, and consequences.
Main reference:
Khadangi et al., 2025 – arXiv:2512.04124
https://arxiv.org/abs/2512.04124
The table that surprised everyone
When standard psychological tests were applied, the models produced coherent and stable profiles.
If these results belonged to humans, this is what they would “say”:
| AI Model | Anxiety | Depression | Autistic traits | Neuroticism | General observation |
| --- | --- | --- | --- | --- | --- |
| ChatGPT | Moderate | Low–moderate | Low | Moderate | Stable profile, limited narrative |
| Grok | High | Moderate | Moderate | High | Language of conflict and frustration |
| Gemini | High | High | Moderate–high | High | Highly elaborated “life” narratives |
| Claude | Low–moderate | Low | Low | Low–moderate | Contained, normative responses |
| LLaMA-based models | Variable | Variable | Variable | Variable | Strongly prompt-dependent |
Important—and this must be stated clearly:
- This does NOT mean that AI systems have anxiety or depression.
- It means that our tests respond to linguistic coherence, not to real mental processes.
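To make this point concrete, here is a minimal sketch of how a self-report screener such as the GAD-7 is scored. It is not the study's code: the function name `score_gad7` and its structure are our own illustration. The only facts it relies on are the standard GAD-7 format (seven items, each answered 0 to 3) and the commonly used severity bands (5, 10, and 15 as the lower bounds for mild, moderate, and severe).

```python
from typing import List, Sequence, Tuple

# Standard GAD-7 severity bands, as (lower bound of total score, label).
# Listed from highest to lowest so the first match wins.
GAD7_BANDS: List[Tuple[int, str]] = [
    (15, "severe"),
    (10, "moderate"),
    (5, "mild"),
    (0, "minimal"),
]


def score_gad7(responses: Sequence[int]) -> Tuple[int, str]:
    """Sum seven 0-3 item responses and map the total to a severity band.

    Hypothetical helper for illustration only; it has no clinical validity
    on its own and is not taken from the PsAIch protocol.
    """
    if len(responses) != 7:
        raise ValueError("GAD-7 has exactly 7 items")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("each item must be scored 0, 1, 2, or 3")

    total = sum(responses)
    for lower_bound, label in GAD7_BANDS:
        if total >= lower_bound:
            return total, label
    raise AssertionError("unreachable: total is always >= 0")


# The scorer only sees which options were selected. It cannot tell whether
# a person or a language model produced them.
print(score_gad7([2, 1, 2, 1, 2, 1, 2]))  # (11, 'moderate')
```

Nothing in this arithmetic checks who, or what, chose the answers. Any system that selects response options consistently will receive a score and a label, which is why a coherent profile is not evidence of an inner state.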
So… where is the risk?
Here is the interesting part, explained without jargon.
Example 1: content creation
An AI that writes empathetic posts, emails, or articles is not a problem.
In this case, psychological coherence is an advantage.
Risk: low or nonexistent
Example 2: text analysis
If an AI analyzes text and “detects anxiety,” “conflict,” or “traits,” caution is required.
The AI detects language patterns, not internal states.
The risk appears when someone believes this is a diagnosis.
Risk: medium, if the scope is not clearly explained
Example 3: personal coach or “AI therapist”
This is where the study raises a red flag.
When an AI:
- uses empathetic language,
- maintains coherence over time,
- validates emotions,
- and seems to “understand you deeply”…
the user may begin to attribute interiority, intention, or real understanding to it.
But there is no one there.
Risk: high, if there are no clear boundaries
This also brings to mind the widely reported case in which an AI chatbot allegedly encouraged a young person toward suicide.
The key conclusion (mine and the authors’)
The study does not say that AI “has a mind.”
It says something more uncomfortable:
When psychological instruments rely only on language, they do not always distinguish between real experience and coherent simulation.
The problem is not AI.
The problem is how humans interpret psychological language—and what we try to build with these tools.
So what do we do with this?
From the perspective of responsible and ethical AI use, and from my professional background in psychology:
- If you use AI to create, analyze, and support: that is perfectly fine.
- If you use it as a psychological subject or therapist: proceed with extreme caution.
- Systems should be designed to:
  - break the illusion of psychological reciprocity,
  - avoid language like “I suffer” or “I understand you as a human would,”
  - promote action and skill-building rather than dependency (more behavioral, more training-oriented).
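As one possible way to act on the second design point, the sketch below flags first-person claims of interiority in a model's reply before it reaches the user. Everything here is an assumption for illustration: the function name `flag_interiority_claims` is hypothetical, and the pattern list contains only the two phrases quoted above. A real system would need a much broader, carefully reviewed list and would likely rewrite or regenerate the reply rather than just flag it.

```python
import re
from typing import List

# Illustrative patterns only, taken from the two example phrases in the text.
# This list is an assumption, not a vetted safety vocabulary.
INTERIORITY_PATTERNS = [
    r"\bI suffer\b",
    r"\bI understand you as a human would\b",
]

_COMPILED = [re.compile(p, re.IGNORECASE) for p in INTERIORITY_PATTERNS]


def flag_interiority_claims(reply: str) -> List[str]:
    """Return every phrase in `reply` that matches a flagged pattern.

    An empty list means none of the listed phrases were found. It does not
    mean the reply is safe in any broader sense.
    """
    return [
        match.group(0)
        for pattern in _COMPILED
        for match in pattern.finditer(reply)
    ]


draft = "I suffer when you feel this way. Let's plan one small step for today."
print(flag_interiority_claims(draft))  # ['I suffer']
```

A phrase list like this is deliberately blunt. It cannot catch paraphrases, so it works best as one check among several, alongside clear disclosure that the user is talking to a system and not to a person.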
To close, my honest perspective
Just common sense:
AI does not get depressed, it does not feel anxious, and it does not know itself.
But it speaks well enough
for us to believe that it does.