Introduction
Mental health care remains a challenge across the globe, with disparities in both the availability and accessibility of necessary treatments. According to a recent report from the World Health Organization (WHO), depression is the leading cause of disability worldwide, yet nearly two-thirds of people with a known mental disorder never seek help from a health professional.1 Stigma, discrimination, and neglect prevent care from reaching people with mental disorders.1 In the United States, the National Institute of Mental Health (NIMH) reports that only 51.5% of adults with a major depressive episode received treatment in 2019, leaving nearly half without the care they needed.2
One promising avenue of artificial intelligence (AI) application in mental health care involves using chatbots. These computer programs, designed to simulate conversation with human users, could benefit those unable to access traditional mental health resources. A recent systematic review3 suggests that AI chatbots could provide effective and accessible mental health care, particularly for individuals experiencing anxiety and depression.
However, the surge of AI chatbots in mental health care has also introduced a significant challenge: many chatbots are deployed without undergoing rigorous, peer-reviewed evaluation of their efficacy. This is a serious concern, as individuals may rely on these non-validated aids, potentially leading to suboptimal or even harmful outcomes. Furthermore, using AI chatbots in psychiatry raises important ethical and practical considerations, such as privacy concerns, the risk of misdiagnosis or inadequate treatment, and the necessity for human oversight and intervention.
This pilot study aims to explore the effectiveness of ChatGPT as a support tool in psychiatric inpatient care. It provides a foundational step towards understanding the potential role of AI in broader mental health treatment strategies.
Why Chatbots Are Being Explored in Psychiatry
There is a significant shortage of mental health professionals in many parts of the world,1 which can result in long waiting times for appointments or even prevent people from receiving necessary treatments altogether. Chatbots could provide an accessible, low-cost option for individuals who may not have access to traditional mental health services.
In addition, research has shown that people may be more comfortable disclosing personal information to a chatbot than to a human therapist, particularly for stigmatized topics such as substance abuse or suicidal thoughts, due to an increased sense of anonymity and reduced fear of judgment.4
Furthermore, chatbots can be available 24/7, allowing immediate intervention in crisis times.
Therefore, while chatbots are not a replacement for human therapists, they have the potential to complement traditional mental health services and provide additional support for those in need.
How AI Chatbots Are Being Used in Psychiatry
AI chatbots are employed in an increasing number of contexts in psychiatry, with applications ranging from providing basic emotional support to facilitating cognitive behavioral therapy (CBT).5 However, a critical divide exists between chatbots backed by peer-reviewed evidence of effectiveness and those lacking such validation.
Table 1 presents a selection of AI chatbots employed in psychiatric settings that have demonstrated efficacy through empirical, peer-reviewed studies.
While these examples showcase the potential benefits of AI chatbots in psychiatric settings, it is crucial to acknowledge that unvalidated chatbot applications on the market pose significant risks, including threats to patient privacy, diagnostic errors, and systemic biases, all of which could compromise patient care and outcomes.6
Limitations and Ethical Concerns of Chatbots in Mental Health
As chatbots gain traction as tools for mental health care, it is important to recognize and address their inherent limitations and ethical concerns.
Regardless of their sophistication, chatbots still lack emotional intelligence:7 they cannot read body language or subtle emotional cues the way human therapists can. They rely heavily on natural language processing to understand and respond to user inputs,8 but human communication, particularly around mental health, can be complex and ambiguous. Misinterpretations by chatbots can therefore occur and may lead to inappropriate recommendations or interventions,7 a risk that may be amplified in the absence of peer-reviewed validation.8
Moreover, while chatbots can facilitate conversations, they cannot establish a deep therapeutic alliance, a cornerstone of mental health care.9 Their capacity for critical thinking is confined to pre-programmed responses and algorithms, which may fail to detect patterns or nuances in a patient's behavior or symptoms, further highlighting the necessity for clinical validation.9
For these reasons, it could be argued that chatbots can never fully replace human therapists; they may, however, serve as an additional resource in certain contexts.
Another concern is the potential misuse of the personal data shared with chatbots.10 As these systems collect and store data about a user's mental health and emotional state, the risk of unauthorized access to this information is not trivial.11 Such breaches could lead to serious repercussions, including discrimination based on the user's mental health status. User privacy and security must therefore be a top priority, requiring secure data storage and transmission along with adherence to relevant data protection regulations.
The presence of numerous chatbots without robust, peer-reviewed evidence of efficacy in the mental health sector is also troubling.12 While these chatbots may offer apparent benefits, the lack of empirical validation raises questions about their effectiveness and potential harm. This highlights the urgent need for rigorous evaluation and transparency for all mental health chatbots.
Research Objectives
Our primary objective was to determine whether the implementation of ChatGPT in psychiatric inpatient care improves patients’ self-reported quality of life, as measured by the World Health Organization Quality of Life (WHOQOL-BREF) questionnaire.13 Additionally, we aimed to gauge patient satisfaction with ChatGPT interventions, providing insight into its acceptability and feasibility as a therapeutic tool.
Specifically, our research questions were:
- Does using ChatGPT in semi-structured sessions significantly improve the WHOQOL-BREF scores for patients in psychiatric inpatient care compared to those receiving standard care?
- How satisfied are patients with the ChatGPT-assisted therapy in a psychiatric inpatient setting, and can this level of satisfaction be quantified using a Likert scale?
Methods
Approval
This study was approved by the Ethical Committee of Hospital de Vila Franca de Xira, was performed in accordance with the regulations established by that committee, and adhered to the principles of the World Medical Association's Declaration of Helsinki, as revised in 2013.
Sampling
This pilot study employed a convenience sampling methodology, primarily due to the logistical and time constraints inherent to conducting a study in an inpatient psychiatric setting.
Inclusion and Exclusion Criteria
We included patients aged 18 to 65 years who were undergoing psychiatric inpatient care and had received a mental health disorder diagnosis according to the DSM-5. Each patient's attending physician confirmed that the criteria for a DSM-5 diagnosis were met.
Patients with prior or current psychotic symptoms were excluded from the study, regardless of their diagnosis. This decision was based on concerns that the use of AI chatbots might introduce undue stress or potential for misinterpretation in individuals with active psychotic symptoms. As this was a pilot study, it was deemed prudent to focus initially on a population in which potential risks were minimized.
Patients who were unable or unwilling to provide informed consent were also excluded.
Over a one-month period, patients admitted to the psychiatric ward at Hospital de Vila Franca de Xira in Portugal who satisfied the above criteria were invited to participate in the study. They were then randomly allocated to either the control or the intervention group using the "coin flip" method.
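For illustration only, the sketch below reproduces the chance mechanism of coin-flip allocation in Python; the participant codes are hypothetical, and this is not a tool used in the study. Note that unconstrained coin-flip allocation does not guarantee equal group sizes, consistent with the 7-to-5 split observed here.

```python
import random

def coin_flip_allocation(patient_ids, seed=None):
    """Allocate each patient to 'intervention' or 'control' by a fair coin flip.

    Unconstrained 1:1 chance allocation does not guarantee equal group sizes.
    """
    rng = random.Random(seed)
    return {pid: rng.choice(["intervention", "control"]) for pid in patient_ids}

# Hypothetical anonymized participant codes, not the study's identifiers
allocation = coin_flip_allocation([f"P{i:02d}" for i in range(1, 13)])
print(allocation)
```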
Procedure
To protect patient confidentiality and data security, all patient identifiers were removed before data analysis. ChatGPT interactions were conducted securely, with data stored on encrypted, password-protected devices. The AI system did not store or retain any data post-session, ensuring participant anonymity and data security.
The control group comprised five patients who continued to receive their regular psychiatric care, including standard therapy sessions, pharmacological therapy, and other supportive treatments typically offered in the inpatient setting, without the introduction of any new interventions. In contrast, the intervention group consisted of seven patients, each of whom participated in three to six semi-structured sessions with ChatGPT (version 3.5), facilitated by their attending psychiatrist.
These semi-structured sessions followed a specific format. The attending psychiatrist began each session by asking the patient whether they had any current concerns related to their illness, their admission to the psychiatric unit, or other topics. These concerns were then entered into ChatGPT, prompting a response from the AI. The responses generated by ChatGPT were then discussed within the session, providing an opportunity for exploration and reflection. ChatGPT was prompted to function as a virtual therapist, dispensing general mental health advice to patients.
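To make this workflow concrete for technically minded readers, the sketch below approximates one session turn using the OpenAI Python SDK. The study does not specify its technical setup, and the system prompt shown is a hypothetical paraphrase of the "virtual therapist" instruction, so this is an illustrative sketch rather than the study's actual implementation.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical paraphrase of the "virtual therapist" instruction; the exact
# prompt used in the study is not reproduced here.
SYSTEM_PROMPT = (
    "You are a supportive virtual therapist. Offer general, non-diagnostic "
    "mental health advice in an empathetic tone, and encourage the patient "
    "to discuss your suggestions with their attending psychiatrist."
)

def session_turn(history, patient_concern):
    """Send one patient concern (relayed by the psychiatrist) to the model
    and return its reply, keeping the running session history."""
    history.append({"role": "user", "content": patient_concern})
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # the study used ChatGPT version 3.5
        messages=[{"role": "system", "content": SYSTEM_PROMPT}] + history,
    )
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

history = []
print(session_turn(history, "I feel anxious about being discharged next week."))
```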
Outcome Measures
Our primary outcome measure was the World Health Organization Quality of Life (WHOQOL-BREF) questionnaire,13 an internationally recognized and validated instrument for assessing quality of life across diverse patient populations. The WHOQOL-BREF has demonstrated good to excellent reliability coefficients across domains in multiple studies, affirming its appropriateness in this context.14 We compared mean WHOQOL-BREF scores for the intervention and control groups before and after the intervention period.
As a secondary outcome measure, we assessed patient satisfaction with the ChatGPT-assisted therapy using a Likert scale questionnaire (Attachment 1) developed specifically for this study by the psychiatrists conducting it. The questionnaire included the following items, assessing various dimensions of patient experience and perception:
- Study Participation Enjoyment: "I enjoyed participating in this study."
- Intervention Helpfulness: "This intervention helped me during my stay in the psychiatric inpatient unit."
- Use of ChatGPT: "I enjoyed utilizing ChatGPT."
- Emotional Management Tools: "The sessions provided me with tools that help me better manage my emotions."
- Future Utility: "I have gained a new tool that I can utilize in the future, and that will help me deal with day-to-day problems."
- Need for More Such Interventions: "There should be more interventions of this kind provided to patients in inpatient psychiatric care."
Response options ranged from “Totally disagree” to “Totally agree,” allowing patients to express their level of agreement with each statement.
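The reported maximum total of 30 across six items implies a five-point scale; the point values and intermediate response labels in the sketch below are assumptions for illustration, since the paper only names the scale endpoints. A minimal sketch of how a patient's total satisfaction score could be computed:

```python
# Assumed mapping: a reported maximum of 30 across six items implies a
# 5-point scale; the point values and intermediate labels are assumptions.
LIKERT_POINTS = {
    "Totally disagree": 1,
    "Disagree": 2,
    "Neutral": 3,
    "Agree": 4,
    "Totally agree": 5,
}

def satisfaction_total(responses):
    """Sum one patient's six item responses into a total out of 30."""
    assert len(responses) == 6, "the questionnaire has six items"
    return sum(LIKERT_POINTS[r] for r in responses)

# Hypothetical example, not a real patient's responses
example = ["Agree", "Totally agree", "Agree", "Totally agree", "Agree", "Totally agree"]
print(satisfaction_total(example))  # -> 27
```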
Data Analysis
Descriptive statistics were used to characterize the study population, with continuous variables reported as mean ± standard deviation and categorical variables reported as counts with proportions.
The primary and secondary outcomes were analyzed using Excel 2021, with results reported as mean ± standard deviation. P-values were not calculated because of the small sample size.
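Although the analysis was run in Excel 2021, the same summaries are straightforward to reproduce programmatically. Below is a minimal Python sketch of the per-group change-score summary (mean ± sample standard deviation); the scores shown are hypothetical placeholders, not study data.

```python
from statistics import mean, stdev

def change_summary(pre_scores, post_scores):
    """Per-patient change (post minus pre), summarized as mean and sample SD."""
    changes = [post - pre for pre, post in zip(pre_scores, post_scores)]
    return mean(changes), stdev(changes)

# Hypothetical placeholder WHOQOL-BREF totals, NOT the study data
pre = [52, 61, 48, 55, 60, 50, 58]
post = [70, 68, 66, 60, 72, 58, 71]
m, sd = change_summary(pre, post)
print(f"mean change = {m:+.1f} (SD = {sd:.2f})")
```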
Results
Demographics and Assignment
Twelve patients were included in the study: seven were assigned to the intervention group and five to the control group (Table 2). In the intervention group, four participants were male (57%) and three were female (43%); their mean age was 27 years (range 18 to 40; SD = 8.57). In the control group, two participants were male and three were female; their mean age was 42 years (range 24 to 51; SD = 11.11).
The main DSM-5 mental health disorder diagnosis of each patient is presented in Table 3. All 12 patients included in the study had been admitted to psychiatric inpatient care for suicidal ideation.
Primary Outcome
Analysis of the primary outcome, the WHOQOL-BREF scores, revealed a marked improvement in the intervention group, as illustrated in Table 2. Specifically, the average change in the intervention group was a gain of 13.5 points (SD = 11.12), suggesting a substantial increase in self-perceived quality of life post-intervention.
In contrast, the control group displayed a slight decline, with an average change of -0.2 points (SD = 2.49) on the WHOQOL-BREF scale. While this reduction was minimal, it highlights the difference in trajectories between the two groups during the study period.
Secondary Outcome
For the secondary outcome of patient satisfaction with the ChatGPT intervention, patients in the intervention group scored highly on the Likert scale questionnaire, as illustrated in Figure 1. The average score was 26.8 out of a possible 30 (SD = 2.34), indicating a high level of satisfaction with their interactions with ChatGPT.
Discussion
Our pilot study provides preliminary evidence supporting the use of an AI chatbot, such as ChatGPT, in psychiatric settings. The observed improvement in quality of life scores in the intervention group aligns with prior research on the utility of AI-driven interventions in mental health care. For instance, one study15 found improvements in mental well-being using a fully automated conversational agent. Similarly, another study16 found Woebot, a CBT-focused chatbot, effective for postpartum depression.
The fact that our study was conducted within a psychiatric ward distinguishes it from most others, which have explored primarily the outpatient setting.
The promising results from our study further corroborate previous findings,5,8 which indicated the potential of virtual agents and chatbots in delivering cognitive behavioral therapy. Our findings suggest not only that AI chatbot interventions like ChatGPT can lead to improvements in patient-reported quality of life, but also that they are well-received by patients in a psychiatric ward setting. Such high acceptability is crucial when introducing novel therapeutic interventions, as it can positively impact patient engagement and adherence.
Study Limitations and Future Research Directions
Our study has several limitations that should be considered when interpreting our findings. First, the small sample size, combined with convenience sampling, may limit the generalizability of our results. Convenience sampling allowed us to recruit readily available participants who met the inclusion criteria quickly and efficiently, but it may also have introduced selection bias.
Another limitation is that no other clinical or demographic variables were analyzed, so there may be unknown differences between the two patient groups that might have impacted the results.
Additionally, by excluding patients with psychosis, we have narrowed the applicability of our findings within the psychiatric community.
The Likert questionnaire used (Attachment 1) was created by the psychiatrists conducting this study and was not tested for reliability or validity. As such, while the results provide preliminary insights into patient satisfaction, they should be interpreted cautiously.
Another notable limitation is the potential confounding influence of human interaction during the ChatGPT sessions. While the control group received standard therapeutic interventions, the intervention group not only utilized ChatGPT but also had structured sessions facilitated by an attending psychiatrist. This additional structure and human interaction could have contributed to the therapeutic outcome beyond the effects of ChatGPT itself.
Future research needs to focus on the efficacy of AI chatbots across a broader spectrum of psychiatric disorders. It would also be beneficial for subsequent studies to utilize larger sample sizes and embrace randomized controlled trial methodologies to enhance the robustness of the findings.
Moreover, given the chronic nature of many mental disorders, investigating the long-term impacts and sustainability of benefits from AI chatbot interactions using studies designed with a longitudinal perspective will be crucial.
Ethical Considerations
While AI chatbots present promising avenues for enhanced patient care, addressing ethical considerations is vital. Ensuring data confidentiality is paramount. Moreover, patients should be well-informed about how the AI functions and any potential risks, ensuring their autonomy and agency remain respected throughout the process.
Conclusions
In conclusion, our pilot study suggests that AI chatbots, such as ChatGPT, can positively impact the quality of life of psychiatric inpatients while being well-received. Despite the limitations inherent in a pilot study, such as a small sample size and the use of convenience sampling, our findings provide valuable insights into the potential role of AI in psychiatric care.
Building on these preliminary results, our future research endeavors will focus on conducting a larger, more comprehensive study to generalize our findings to a wider psychiatric patient population, thereby enhancing the robustness of our conclusions. We aim to explore the efficacy of AI chatbots across a broader spectrum of psychiatric disorders, including those with complex needs, such as patients with co-occurring substance use disorders or chronic mental health conditions.
Understanding the durability of the benefits observed in this pilot study is crucial for assessing the practical utility of AI interventions in mental health, leading us to plan a long-term study that will assess the sustainability and long-term impacts of AI chatbot interactions. Moreover, utilizing randomized controlled trial (RCT) methodologies in future studies will minimize potential biases and confounding factors, enabling a clearer understanding of the efficacy of AI chatbots in psychiatric care.
Another significant aspect of our future research will be investigating how AI chatbots can be seamlessly integrated into existing mental health care pathways, including examining the role of chatbots in conjunction with traditional therapeutic methods. This involves understanding patient and clinician perspectives on AI integration and continuing to explore and address ethical considerations such as data privacy, patient autonomy, and the limitations of AI in understanding complex human emotions and behaviors.
Through these research plans, we aim to use the findings of this pilot study as a foundation for guiding more extensive and rigorous research. Our goal is to contribute meaningfully to the evolving field of AI in psychiatry, ensuring that technological advancements align with patient-centred care and ethical standards.
Declaration of Conflict of Interest
The authors have no known conflicts of interest, financial or otherwise, that could be perceived as influencing, or that give the appearance of potentially influencing, the work reported in this manuscript. The authors have not received any financial support for the research, authorship, or publication of this article that could have influenced its outcome.