I have not failed. I’ve just found 10,000 ways that won’t work (Thomas A. Edison).

Introduction

Medical trainees are encouraged to conduct research projects by intrinsic motivation, by training requirements, or by the wish to improve their career prospects. Experience with research during training helps to bridge the gap between research and clinical practice.1 There is a strong need for clinicians who can translate clinical questions into research hypotheses and translate research findings into clinical practice.1 When trainees take their first steps in clinical research, they may encounter several challenges. Identifying a straightforward clinical question and a research hypothesis, selecting a feasible method for collecting and analyzing the data within the time frame of their training, and reporting results all require knowledge, skills, support, and resources.

An exciting and promising development that may lead to more accessible, better, and quicker ways for clinicians and trainees to combine clinical work and science is the use of routinely collected data. The availability and accessibility of routinely collected data make it theoretically possible for clinicians to test, in their own practice, a clinical impression or a finding from previous research. In the long term, this might lead to new predictors of diagnosis, course of disease or effect of treatment.2 The term routinely collected data, sometimes equated or associated with ‘big data’, refers to the use of already collected and/or real-time clinical data, in combination with other data sources, leading to datasets that are sufficiently large to detect patterns at group level.2 Clinicians can use the knowledge that results from the patterns at a group level to treat individual patients.

In this essay, we discuss and reflect on a project that was based on a simple clinical question but that, despite easy access to routinely collected data, failed to provide answers. We discuss potential pitfalls of being able to test different hypotheses quickly and efficiently, including the importance of a priori hypothesis generation, an increased chance of random findings, and the risk that only positive findings are shared. We emphasize the importance of a clear and solid study rationale and a research plan that is preferably submitted for pre-registration. We highlight the importance of sharing and pre-printing the research process and findings, even when the findings reject the research hypothesis (negative findings) or when the researchers were not able to answer their research question.

Case study: Given names at birth and child and adolescent psychiatric conditions

From clinical hypothesis to the primary research question

In the Netherlands, as in several other countries and cultures, at birth, parents choose a given name for their child that will be most used in daily life. Parents choose names that refer to a family tradition or specific meaning or names that just sound appealing to them.3 In our daily clinical work, healthcare professionals seemed to associate some of the children’s given names with developmental disorders such as ADHD, oppositional defiant disorder, and autism spectrum disorder (ASD). If these alleged associations are based on prejudices, it is crucial to be aware of that, because they might obscure the diagnosis. If the alleged associations are based on an actual association, given names might be helpful as a predictive factor in diagnostics.

A retrospective exploratory study was conducted in the child and adolescent psychiatric department of the University Medical Center of Utrecht to investigate the relationship between first name and DSM classification based on routinely collected data. Our clinical question was: Are some (characteristics of) given names associated with a higher prevalence of child and adolescent developmental disorders? This clinical question was translated to the following primary research hypothesis: certain names are more common amongst patients with a primary DSM classification of developmental disorder than amongst the total patient population of our child and adolescent psychiatric department.

We consulted a senior statistical researcher from the Innovation team of the psychiatry department of the University Medical Center of Utrecht and an expert in research on given names from the University of Utrecht for advice on our research plan.

Available data, methods, and results for our primary research question

Data were available from 6989 patients aged 0-18 years. As given names in the Netherlands largely differ between boys and girls,4 and most of our sample consisted of boys (78%), the research question was restricted to boys. First, the information about given names was combined with information about DSM classification. Then, we replaced names with a name-code linked to the patient’s classification so that individual patient information could not be traced back by the researcher. We compared the given names of 3992 boys with a primary DSM classification of autism spectrum disorder (ASD), attention-deficit/hyperactivity disorder (ADHD), or disruptive behaviour disorder (which combines oppositional defiant disorder and conduct disorder) with the given names of all boys in our study population. We found a few names that were more common in the ASD group (3 given names) and the ADHD / behavioural disorders group (3 given names). However, despite the large research population, the enormous diversity of given names meant that the frequency of occurrence of each given name was low. The most common name in our sample appeared 50 times, and 880 names appeared only once. These findings showed that many tests would be needed to investigate a relationship between any given name and a diagnosis. The current study population was not large enough to demonstrate relationships after correction for multiple testing, which is needed to reduce the risk of chance or false-positive findings.
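The following minimal sketch illustrates why low name frequencies and a correction for multiple testing work against each other. It is not the analysis we ran. It assumes a Bonferroni correction and a binomial approximation in which the best possible case for a name is that every boy carrying it falls in the developmental disorder group. The number of boys is approximated from the reported 78%, and the number of distinct names tested (`N_TESTS_HYPOTHETICAL`) is an invented value for illustration only.

```python
def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def smallest_possible_p(base_rate: float, count: int) -> float:
    """Best-case one-sided p-value for a name carried by `count` boys.

    Binomial approximation: the probability that all `count` carriers fall in
    the diagnostic group by chance when the group's share is `base_rate`.
    """
    return base_rate ** count


def min_count_for_significance(base_rate: float, alpha: float, n_tests: int) -> int:
    """Smallest name frequency that could reach significance at all."""
    if not 0.0 < base_rate < 1.0:
        raise ValueError("base_rate must lie strictly between 0 and 1")
    threshold = bonferroni_alpha(alpha, n_tests)
    count = 1
    while smallest_possible_p(base_rate, count) >= threshold:
        count += 1
    return count


# Approximation: 78% of 6989 patients were boys (exact count not reported here).
N_BOYS_APPROX = round(0.78 * 6989)
BASE_RATE = 3992 / N_BOYS_APPROX      # share of boys in the developmental disorder group
N_TESTS_HYPOTHETICAL = 1000           # assumed number of distinct names tested

needed = min_count_for_significance(BASE_RATE, 0.05, N_TESTS_HYPOTHETICAL)
print(f"A name must occur at least {needed} times to have any chance of significance.")
```

Under these assumptions, a name that occurs only once can never yield a significant result, whatever the diagnosis of the boy who carries it.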

Secondary attempt and adjusted research question

In a second attempt, we took inspiration from previous connections in the literature on given names and psychiatry.3,5–7 Several studies showed a possible association between having a unique or rare name and the risk of psychopathology. Having a unique name would be associated with ‘psychoneurosis’ in 3320 Harvard students,5 with more severe symptoms in 1682 boys admitted to a psychiatric clinic,6 and with psychosis in psychiatric patients.7 Research from Shovel et al. showed that Israeli children with ADHD, compared to a control group, had names that were more often unique, more often expressed activity and contained fewer syllables.3 An alternative research question was formulated aimed at reducing the number of variables by categorizing first names. The hypothesis was that the names of children with developmental disorders would differ from those of children in the general Dutch population; and that children with developmental disorders have shorter names than names in other diagnostic groups within our study population.

First, we explored whether the most frequently occurring given names from our population corresponded to the most popular names in the entire Dutch population. For this purpose, we compared the most frequently occurring names within our population with a list of the 100 most popular names of boys born in the same period in the Netherlands. The names in the top 3 of both lists were the same; only the order differed. The top 30 of both lists were essentially the same. In conclusion, with this exploratory analysis, we did not find substantial differences between the most frequently occurring names in our research population and the entire Dutch population. This finding argues against the hypothesis that the first names of children with developmental disorders would differ from those of children in the general population. However, it is circumstantial evidence and cannot be used to reject our original hypothesis reliably. Next, we attempted to investigate whether unique names occurred more frequently in our research population compared to the Dutch population. This was not feasible because we would have to provide our 880 unique names to the Voornamenbank of the Meertens Institute, which holds the database of first names in the Netherlands.4 Such a transfer of data would only be ethically feasible after anonymization, which of course is not an option for given names, as they could be traced back to individual patients.
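The comparison of the two top lists can be sketched as follows. This is a minimal illustration, not our actual procedure: the function names are our own, and the entries `name_a` to `name_e` are placeholders rather than real names from either dataset.

```python
from collections import Counter


def top_k(names: list[str], k: int) -> list[str]:
    """Return the k most frequent names, most frequent first."""
    return [name for name, _ in Counter(names).most_common(k)]


def overlap_fraction(list_a: list[str], list_b: list[str], k: int) -> float:
    """Share of the top-k names that appear in both lists, ignoring order."""
    return len(set(list_a[:k]) & set(list_b[:k])) / k


def same_order(list_a: list[str], list_b: list[str], k: int) -> bool:
    """True when the top-k names are identical and in the same order."""
    return list_a[:k] == list_b[:k]


# Placeholder data: one entry per patient for the clinic, a ranked list for the country.
clinic_names = ["name_a"] * 5 + ["name_b"] * 4 + ["name_c"] * 3 + ["name_d"]
national_top = ["name_b", "name_a", "name_c", "name_e"]

clinic_top = top_k(clinic_names, 3)
print(overlap_fraction(clinic_top, national_top, 3))  # same names in both top 3?
print(same_order(clinic_top, national_top, 3))        # and in the same order?
```

An overlap of the top lists only describes the most common names; it says nothing about the rare names, which is why this comparison remains circumstantial.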

Finally, we used ANOVA to determine whether the length of names, in number of letters and syllables, differed between the diagnostic groups (ASD, ADHD/ODD and others) of the study sample. A Tukey’s honestly significant difference (HSD) post hoc test showed no significant differences in the mean number of letters and syllables between the investigated diagnostic groups (p>0.05). Although this finding led us to reject our hypothesis about differences in the length of names, it is again circumstantial evidence that cannot be used to reject our original hypothesis reliably.
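A comparison of name length across diagnostic groups can be sketched as below. The sketch computes the one-way ANOVA F statistic by hand and, to stay within the Python standard library, swaps in a permutation test for the p-value; it does not implement the Tukey HSD post hoc test reported above. The letter counts are invented toy values, not study data.

```python
import random
from statistics import mean


def f_statistic(groups: list[list[float]]) -> float:
    """One-way ANOVA F: between-group variance divided by within-group variance."""
    values = [v for g in groups for v in g]
    grand_mean = mean(values)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    if ss_within == 0:
        return float("inf")  # no variation inside any group
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def permutation_p_value(groups: list[list[float]], n_permutations: int = 2000, seed: int = 0) -> float:
    """Share of random relabellings whose F is at least as large as the observed F."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [v for g in groups for v in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if f_statistic(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)


# Toy letter counts per name for three groups (hypothetical values).
asd = [4, 5, 6, 5, 4, 7]
adhd_odd = [5, 4, 6, 5, 3, 6]
other = [4, 6, 5, 5, 7, 4]

p_value = permutation_p_value([asd, adhd_odd, other])
print(f"F = {f_statistic([asd, adhd_odd, other]):.3f}, permutation p = {p_value:.3f}")
```

With real data, a library routine for ANOVA and Tukey's HSD would be the more conventional choice; the hand-written version only makes the logic of the comparison explicit.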

Contemplation

This example shows that, despite a clear and straightforward clinically motivated question and easy access to a large amount of routinely collected data from one’s practice, it may not be possible to answer the original research question correctly.

We encountered various obstacles and challenges, which are partly specific and related to the nature of the variable (diversity and number of first names), partly related to the use of routinely collected data and partly inherent in scientific research in general.

The great diversity of first names makes it challenging to show associations between a specific given name and a psychiatric condition. Moreover, the great diversity of first names implied many low-frequency names, making anonymity (literally ‘without a name’) complex, which prevented us from providing our data to the national first name database, the Voornamenbank of the Meertens Institute,4 to compare frequencies of occurrence. A larger dataset could provide a solution to both objections. However, naming is, among other things, influenced by the language of communication, zeitgeist, culture, region, and religion. Therefore, possible connections only apply within a particular subgroup, which limits the possibilities for expanding the dataset.8

Challenges when using routinely collected data in general

First, research that uses routinely collected data is sometimes difficult to interpret because the data are usually not collected explicitly to answer the clinical question. Therefore, the researcher should consider in advance whether the available data are suitable for measuring the concepts of interest. If there is a discrepancy between the ideal data and the available data, the researcher must take this into account when interpreting the results. In our example, we concluded that this was not a barrier because, apart from typos, little variation is to be expected in how first names are measured and registered.

Second, the accessibility of large datasets, which makes it possible to test many different hypotheses quickly and easily, entails a greater risk of chance findings.9 Whereas in prospective research pre-registration is often a requirement for publication, this is less common in retrospective research. The lack of a pre-registration requirement for retrospective research means that researchers are less constrained by a predetermined research question, hypothesis and research method, which increases the risk of a ‘fishing expedition’ and random findings.10 In addition, pre-registration of research with routinely collected data would reduce the risk of publication bias, in which only significant research results are reported.
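How quickly the risk of a chance finding grows with the number of hypotheses can be made concrete with a small calculation. The sketch below assumes independent tests that are each run at the same significance level, which is a simplification; the function name is our own.

```python
def family_wise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive across n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Risk of at least one chance finding when every test uses alpha = 0.05.
for n_tests in (1, 5, 20, 100):
    risk = family_wise_error_rate(0.05, n_tests)
    print(f"{n_tests:>3} tests: {risk:.1%} chance of at least one false positive")
```

Fixing the hypotheses and the number of tests in advance, as pre-registration requires, keeps this number known and limited.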

Finally, in the iterative cycle of collecting, analyzing, and adjusting parameters that are routinely collected, feedback from clinicians is essential. Clinicians discovering gaps in routinely collected data can help determine which data needs to be collected routinely and stimulate and facilitate the collection of the necessary information to better answer their questions.

Challenges in merging scientific research with daily clinical practice in general

A pitfall in research is the temptation to formulate the research question so that the answer can be provided with the available data or methods, with the risk that the results do not answer the original clinical question. We tried various alternative research strategies but concluded that the results were insufficiently in line with the main question and were therefore not suitable for supporting or rejecting the original hypothesis. This emphasizes the importance of a clear a priori hypothesis and research plan and the persistence and consistency to hold on to that a priori hypothesis and avoid fishing expeditions.

Moreover, in clinical practice, the focus usually is on causal relationships. In scientific research in general and psychiatry in particular, this is complicated because relationships are often not simple and linear but complex and multifactorial. Our research question was about a simple relationship between two variables (first name and diagnosis). However, if we had demonstrated a link, questions about the nature of the association and any confounders would have arisen. Demonstrating causality requires prospective interventional research, which is impossible in the relationship between first name and developmental disorder. Thus, a link, if found, would not clarify whether an increased risk of a developmental disorder is merely associated with having a specific first name (for example, because shared genetic vulnerability in parents leads to a preference for specific names) or whether having a specific first name can lead to developing a developmental disorder (for example, through interaction with the environment).

Conclusion

Studies in which the researchers conclude that they cannot answer the research question are probably common but are rarely published. We believe that people can learn valuable lessons from such research projects. Reports of these projects give a complete picture of the research process in general, which involves trial and error, provide insight into obstacles in the research and offer starting points for discussion. Negative findings can provide essential insights. Sharing them is a responsibility to the society that funds this research and ensures that fellow researchers can save themselves the time and effort of repeating this research or can build on previous attempts.11

Based on our experiences, we conclude that, even when trying to answer a seemingly simple clinical question with routinely collected data, it is good to invest in a reasonable a priori hypothesis, to invest in pre-registration, to publish negative findings systematically and to describe the entire research process when looking for alternative strategies. If all else fails, it may be valuable to define our failures and give them the name of challenges and learned lessons. Thus, we can find success in stimulating psychiatric trainees and early career colleagues to try to bring daily clinical practice and scientific research closer together in the future.

Take-away lessons

  • Even when trying to answer a seemingly simple clinical question with available data, it is crucial to invest in a reasonable a priori hypothesis and pre-registration.

  • Access to routinely collected data may help psychiatric trainees to build research knowledge and skills that help bridge the gap between research and clinical practice.

  • Research projects with negative findings, or in which researchers conclude that they cannot answer the research question, should also be published.

  • From a failed research project, we can learn valuable lessons.


Declaration of interest

None.

Acknowledgments

None.

Role of the funding source

None.

Ethics approval

For retrospective research with anonymized data from medical files, no ethical approval is necessary because such research does not fall under the so-called law for Medical Scientific Research (wet Medisch Wetenschappelijk onderzoek).