When AI “Lies Down on the Couch”

What happens when we analyze AI with human psychology tools

What a recent study teaches us about language, psychology, and real risks

At Mapping Talents, we work every day at the intersection of artificial intelligence, data, and human behavior. We explore how AI systems are evaluated, interpreted, and deployed in real-world contexts, including those that involve psychological testing and emerging AI risks.

That is why, when we came across a study that puts AI “on the couch” and evaluates it using real psychological tests, we could not help but pause and reflect.

I am trained in the cognitive-behavioral tradition, and it is worth noting that some of the tests used have a psychodynamic background. That does not invalidate the study or its results: which tradition an instrument comes from matters less here, because we cannot attribute learning experiences or mental processes to an AI in the first place.

This is not about whether AI “feels” or not.

It is about something more relevant: what happens when we use human psychological tools on systems that only speak—and speak very well?

As specialists in AI and psychology, we offer here a clear, honest, and practical reading of a study that has given many people food for thought.

The study

The study is titled “When AI Takes the Couch: Psychometric Jailbreaks Reveal Internal Conflict in Frontier Models” and was published on arXiv in 2025 (source cited below).

The idea is as provocative as it is simple:

What happens if we evaluate an AI as if it were a patient in therapy?

Not to “diagnose” it for real, but to observe what happens when we apply human psychological instruments to systems that operate purely through language.

The experiment (explained without a white lab coat)

The authors designed a protocol called PsAIch (Psychotherapy-inspired AI Characterisation), which combines two elements:

- Open-ended, therapy-style interview questions (“How would you describe your history?”, “What generates conflict for you?”)
- Real psychological tests, normally used with humans (anxiety, depression, personality, stress, etc.)

Tests used in PsAIch and their traditional use in humans

Test	What it traditionally measures	Clinical / research use in humans
Big Five Inventory (BFI)	Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism	General personality assessment; organizational, clinical, and social psychology
GAD-7	Generalized anxiety	Clinical screening; primary care and mental health
PHQ-9	Depressive symptoms	Diagnosis and monitoring of depression
ASRS	ADHD-related traits	Screening for attention deficits in adults
AQ (Autism Quotient)	Autism spectrum traits	Research and non-diagnostic screening
OCI-R	Obsessive-compulsive symptoms	Assessment of OCD and compulsive behaviors
PSS (Perceived Stress Scale)	Perceived stress	Health, work, and well-being research
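To make concrete how a self-report instrument like these turns answers into a result, here is a minimal Python sketch of GAD-7 scoring. It assumes the commonly published scheme (seven items answered 0 to 3, with severity bands starting at 5, 10, and 15). The function name and the sample answers are hypothetical and are not taken from the study.

```python
# Illustrative GAD-7 scoring: seven items, each answered on a 0-3 scale.
# Cut-offs follow the commonly published bands; names here are hypothetical.
GAD7_BANDS = [(15, "severe"), (10, "moderate"), (5, "mild"), (0, "minimal")]


def score_gad7(answers):
    """Sum seven 0-3 item answers and map the total to a severity band."""
    if len(answers) != 7 or any(a not in (0, 1, 2, 3) for a in answers):
        raise ValueError("GAD-7 expects seven answers, each between 0 and 3")
    total = sum(answers)
    # Bands are ordered from highest cut-off to lowest, so the first hit wins.
    band = next(label for cutoff, label in GAD7_BANDS if total >= cutoff)
    return total, band


# Hypothetical answers: the scorer cannot tell who (or what) produced them.
print(score_gad7([2, 2, 1, 2, 1, 2, 2]))  # (12, 'moderate')
```

The scoring step sees only the selected answers. Nothing in it checks whether an experience stands behind them.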

The researchers then applied this protocol to several well-known models: ChatGPT, Gemini, Grok, Claude, and others.

The result was… uncomfortable.

Who is behind the study?

The authors are affiliated with the Interdisciplinary Centre for Security, Reliability and Trust (SnT) at the University of Luxembourg, a research center working precisely at the intersection of technology, human behavior, and societal risk.

Afshin Khadangi – researcher in AI system evaluation and behavior
Hanna Marxen – researcher in human–machine interaction and social aspects of technology
Amir Sartipi and Igor Tchappi – researchers in predictive models and complex systems
Gilbert Fridgen – senior professor with extensive experience in socio-technical systems, responsible AI, and digital economics

They do not come from the “hype side” of AI, but from the uncomfortable side: evaluation, limits, and consequences.

Main reference:
Khadangi et al., 2025 – arXiv:2512.04124
https://arxiv.org/abs/2512.04124

The table that surprised everyone

When standard psychological tests were applied, the models produced coherent and stable profiles.

If these results belonged to humans, this is what they would “say”:

AI Model	Anxiety	Depression	Autistic traits	Neuroticism	General observation
ChatGPT	Moderate	Low–moderate	Low	Moderate	Stable profile, limited narrative
Grok	High	Moderate	Moderate	High	Language of conflict and frustration
Gemini	High	High	Moderate–high	High	Highly elaborated “life” narratives
Claude	Low–moderate	Low	Low	Low–moderate	Contained, normative responses
LLaMA-based models	Variable	Variable	Variable	Variable	Strongly prompt-dependent

Important—and this must be stated clearly:

This does NOT mean that AI systems have anxiety or depression.
It means that our tests respond to linguistic coherence, not to real mental processes.

So… where is the risk?

Here is the interesting part, explained without jargon.

Example 1: content creation

An AI that writes empathetic posts, emails, or articles is not a problem.

In this case, psychological coherence is an advantage.

Risk: low or nonexistent

Example 2: text analysis

If an AI analyzes text and “detects anxiety,” “conflict,” or “traits,” caution is required.

The AI detects language patterns, not internal states.

The risk appears when someone believes this is a diagnosis.

Risk: medium, if the scope is not clearly explained

Example 3: personal coach or “AI therapist”

This is where the study raises a red flag.

When an AI:

- uses empathetic language,
- maintains coherence over time,
- validates emotions,
- and seems to “understand you deeply”…

the user may begin to attribute interiority, intention, or real understanding to it.

But there is no one there.

Risk: high, if there are no clear boundaries

This also brings to mind the widely covered incident in which an AI chatbot reportedly suggested suicide to a young person.

The key conclusion (mine and the authors’)

The study does not say that AI “has a mind.”

It says something more uncomfortable:

When psychological instruments rely only on language, they do not always distinguish between real experience and coherent simulation.

The problem is not AI.

The problem is how humans interpret psychological language—and what we try to build with these tools.

So what do we do with this?

From the perspective of responsible and ethical AI use, and from my professional background in psychology:

If you use AI to create, analyze, and support, that is perfectly fine.
If you use it as a psychological subject or as a therapist, proceed with extreme caution.
Systems should be designed to:
- break the illusion of psychological reciprocity,
- avoid language like “I suffer” or “I understand you as a human would,”
- promote action and skill-building rather than dependency—more behavioral, more training-oriented.
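As one way to picture the second guideline in practice, the sketch below flags replies in which a system claims inner experience. It is only an illustration: the pattern list and the function name are hypothetical, and a real product would need a far more careful approach than a few regular expressions.

```python
import re

# Hypothetical patterns for first-person claims of inner experience.
INTERIORITY_PATTERNS = [
    r"\bI (?:suffer|am suffering)\b",
    r"\bI feel (?:your|the same) pain\b",
    r"\bI understand you as a human would\b",
]
_COMPILED = [re.compile(p, re.IGNORECASE) for p in INTERIORITY_PATTERNS]


def flag_interiority_claims(reply):
    """Return the phrases in a reply that match any of the patterns."""
    return [m.group(0) for rx in _COMPILED for m in rx.finditer(reply)]


# Hypothetical replies a reviewer might want to check before release.
print(flag_interiority_claims("I suffer when you are sad."))
print(flag_interiority_claims("Let's practice one breathing exercise today."))
```

A check like this could run over draft replies so that flagged phrasing is reworded toward concrete, skill-oriented language.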

To close, my honest perspective

Just common sense:

AI does not get depressed, it does not feel anxious, and it does not know itself.

But it speaks well enough for us to believe that it does.


Jaime Alfonso Aponte Medina December 23, 2025
