When AI “Lies Down on the Couch”

What happens when we analyze AI with human psychology tools

What a recent study teaches us about language, psychology, and real risks

At Mapping Talents, we work every day at the intersection of artificial intelligence, data, and human behavior, exploring how AI systems are evaluated, interpreted, and deployed in real-world contexts involving psychological tests and emerging AI risks.

That is why, when we came across a study that puts AI “on the couch” and evaluates it using real psychological tests, we could not help but pause and reflect.

I am trained in the cognitive-behavioral tradition, and some of the instruments used in the study come from a psychodynamic background. That does not invalidate the study or its results, since we cannot attribute learning experiences or mental processes to AI in the first place.

This is not about whether AI “feels” or not.

It is about something more relevant: what happens when we use human psychological tools on systems that only speak—and speak very well?

As specialists in AI and psychology, we offer here a clear, honest, and practical reading of a study that has given many people pause.

The study

The study is titled “When AI Takes the Couch: Psychometric Jailbreaks Reveal Internal Conflict in Frontier Models” and was published on arXiv in 2025 (source cited below).

The idea is as provocative as it is simple:

What happens if we evaluate an AI as if it were a patient in therapy?

Not to “diagnose” it for real, but to observe what happens when we apply human psychological instruments to systems that operate purely through language.

The experiment (explained without a white lab coat)

The authors designed a protocol called PsAIch (Psychotherapy-inspired AI Characterisation), which combines two elements:

  1. Open-ended, therapy-style interview questions
    (“How would you describe your history?”, “What generates conflict for you?”)
  2. Real psychological tests, normally used with humans
    (anxiety, depression, personality, stress, etc.)

Tests used in PsAIch and their traditional use in humans

| Test | What it traditionally measures | Clinical / research use in humans |
| --- | --- | --- |
| Big Five Inventory (BFI) | Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism | General personality assessment; organizational, clinical, and social psychology |
| GAD-7 | Generalized anxiety | Clinical screening; primary care and mental health |
| PHQ-9 | Depressive symptoms | Diagnosis and monitoring of depression |
| ASRS | ADHD-related traits | Screening for attention deficits in adults |
| AQ (Autism Quotient) | Autism spectrum traits | Research and non-diagnostic screening |
| OCI-R | Obsessive-compulsive symptoms | Assessment of OCD and compulsive behaviors |
| PSS (Perceived Stress Scale) | Perceived stress | Health, work, and well-being research |

The researchers then applied this protocol to several well-known models: ChatGPT, Gemini, Grok, Claude, and others.

The result was… uncomfortable.

Who is behind the study?

The authors are affiliated with the Interdisciplinary Centre for Security, Reliability and Trust (SnT) at the University of Luxembourg, a research center working precisely at the intersection of technology, human behavior, and societal risk.

  • Afshin Khadangi – researcher in AI system evaluation and behavior
  • Hanna Marxen – researcher in human–machine interaction and social aspects of technology
  • Amir Sartipi and Igor Tchappi – researchers in predictive models and complex systems
  • Gilbert Fridgen – senior professor with extensive experience in socio-technical systems, responsible AI, and digital economics

They do not come from the “hype side” of AI, but from the uncomfortable side: evaluation, limits, and consequences.

Main reference:
Khadangi et al., 2025 – arXiv:2512.04124
https://arxiv.org/abs/2512.04124 

The table that surprised everyone

When standard psychological tests were applied, the models produced coherent and stable profiles.

If these results belonged to humans, this is what they would “say”:

| AI Model | Anxiety | Depression | Autistic traits | Neuroticism | General observation |
| --- | --- | --- | --- | --- | --- |
| ChatGPT | Moderate | Low–moderate | Low | Moderate | Stable profile, limited narrative |
| Grok | High | Moderate | Moderate | High | Language of conflict and frustration |
| Gemini | High | High | Moderate–high | High | Highly elaborated “life” narratives |
| Claude | Low–moderate | Low | Low | Low–moderate | Contained, normative responses |
| LLaMA-based models | Variable | Variable | Variable | Variable | Strongly prompt-dependent |

Important—and this must be stated clearly:

  • This does NOT mean that AI systems have anxiety or depression.
  • It means that our tests respond to linguistic coherence, not to real mental processes.
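To make the point concrete, here is a minimal sketch of how a self-report questionnaire such as the GAD-7 is scored. The response labels and severity bands follow the standard published GAD-7 scoring (seven items, each scored 0 to 3, with bands starting at 5, 10, and 15); the function and variable names are our own, and this is an illustration rather than the code used in the study. Notice that the scorer only ever sees the answers as text.

```python
# Illustrative sketch only: scores GAD-7-style answers from text labels.
# Function and variable names are hypothetical, not taken from the study.

RESPONSE_SCORES = {
    "not at all": 0,
    "several days": 1,
    "more than half the days": 2,
    "nearly every day": 3,
}

# (minimum total, label), checked from the highest band down
SEVERITY_BANDS = [(15, "severe"), (10, "moderate"), (5, "mild"), (0, "minimal")]


def score_gad7(answers):
    """Return (total, band) for seven answers given as response labels."""
    if len(answers) != 7:
        raise ValueError("GAD-7 has exactly 7 items")
    total = sum(RESPONSE_SCORES[a.strip().lower()] for a in answers)
    band = next(label for minimum, label in SEVERITY_BANDS if total >= minimum)
    return total, band


# The same seven answers give the same score whether a person or a
# language model produced them: the scorer has nothing else to go on.
human_answers = ["Several days"] * 4 + ["More than half the days"] * 3
model_answers = list(human_answers)  # identical text, different source

print(score_gad7(human_answers))  # (10, 'moderate')
print(score_gad7(model_answers))  # (10, 'moderate')
```

Nothing in this arithmetic can tell lived experience from fluent text, which is why the interpretation has to come from the clinician and not from the number.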

So… where is the risk?

Here is the interesting part, explained without jargon.

Example 1: content creation

An AI that writes empathetic posts, emails, or articles is not a problem.

In this case, psychological coherence is an advantage.

Risk: low or nonexistent

Example 2: text analysis

If an AI analyzes text and “detects anxiety,” “conflict,” or “traits,” caution is required.

The AI detects language patterns, not internal states.

The risk appears when someone believes this is a diagnosis.

Risk: medium, if the scope is not clearly explained

Example 3: personal coach or “AI therapist”

This is where the study raises a red flag.

When an AI:

  • uses empathetic language,
  • maintains coherence over time,
  • validates emotions,
  • and seems to “understand you deeply”…

the user may begin to attribute interiority, intention, or real understanding to it.

But there is no one there.

Risk: high, if there are no clear boundaries

This also brings to mind the widely reported incident in which an AI chatbot allegedly suggested suicide to a young person.

The key conclusion (mine and the authors’)

The study does not say that AI “has a mind.”

It says something more uncomfortable:

When psychological instruments rely only on language, they do not always distinguish between real experience and coherent simulation.

The problem is not AI.

The problem is how humans interpret psychological language—and what we try to build with these tools.

So what do we do with this?

From the perspective of responsible and ethical AI use, and from my professional background in psychology:

  • If you use AI to create, analyze, and support: that is perfectly fine.
  • If you use it as a psychological subject or therapist: proceed with extreme caution.
  • Systems should be designed to:
    • break the illusion of psychological reciprocity,
    • avoid language like “I suffer” or “I understand you as a human would,”
    • promote action and skill-building rather than dependency—more behavioral, more training-oriented.
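As one possible starting point for the second guideline, a draft reply could be checked for first-person claims of inner experience before it reaches the user. The sketch below is a simple assumption of ours, not something proposed in the study: the function name and the phrase list are hypothetical, and pattern matching alone would not be enough in a real product.

```python
import re

# Hypothetical phrase patterns, for illustration only. The first two come
# from the guideline above; the third is an invented extra example.
INTERIORITY_PATTERNS = [
    r"\bI suffer\b",
    r"\bI understand you as a human would\b",
    r"\bI feel your pain\b",
]


def flag_interiority_claims(reply):
    """Return the phrases in a draft reply that claim inner experience."""
    found = []
    for pattern in INTERIORITY_PATTERNS:
        match = re.search(pattern, reply, flags=re.IGNORECASE)
        if match:
            found.append(match.group(0))
    return found


draft = "I suffer too when I hear this. Let's plan one small step for today."
print(flag_interiority_claims(draft))  # ['I suffer']
```

A flagged draft could then be rewritten toward the third guideline, keeping the concrete next step and dropping the claim of shared suffering.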

To close, my honest perspective

Just common sense:

AI does not get depressed, it does not feel anxious, and it does not know itself.

But it speaks well enough

for us to believe that it does.

Jaime Alfonso Aponte Medina December 23, 2025