Beyond Silos: Integrating Biostatistics, Data Science, and Engineering to Generate Rigorous Evidence
Interview with Eric J. Daza, DrPH, MPS - Founder @Stats-of-1
Introduction
The promise of personalized medicine lies in the ability to generate robust evidence. How is such evidence generated, and what are the best practices for collaboration among biostatisticians, data scientists, and engineering teams?
Today, I had the opportunity to interview Eric J. Daza, a renowned biostatistician and an advocate of personalized medicine through n-of-1 trials and single-case designs.
Eric Jay's diverse experiences in academia, industry, and digital health data science provide a unique perspective on the evolving landscape of data science and evidence-based medicine.
Hello Eric Jay, we appreciate your participation in the Digital Medicine Beats Newsletter. You possess extensive expertise and achievements in the field of biostatistics. Could you tell us more about your professional path?
I started as a biology major with a focus on neurobiology. Towards the end of my undergraduate studies, I developed an interest in statistics. My college had introduced a one-year master's program in Applied Statistics, which I completed. That program laid the foundation for my career in biostatistics: I worked as a biostatistician (SAS programmer) at a pharmaceutical company for five years. Wanting to delve deeper into statistical concepts, I then pursued a doctoral program in biostatistics, which lasted eight years.
During this time, I got interested in the field of data science and completed a three-year postdoc, focusing on digital health data science. This journey eventually led me to my recent role in the digital health data science industry (at Evidation Health), where I found great fulfillment.
Could you provide more information about n-of-1 trials? And what motivated you to advocate for their use?
I became interested in n-of-1 trials for two main reasons. Firstly, during the completion of my doctorate, I was exploring potential applications for my background in causal inference. A prominent figure in the field, Professor Susan Murphy, mentioned n-of-1 trials during a keynote lecture, sparking my interest.
Additionally, a close family member struggles with irritable bowel syndrome, a condition known for its idiosyncratic triggers and symptoms, making it an ideal case for n-of-1 trials. The variability in triggers and responses among individuals with conditions like irritable bowel syndrome or migraines underscores the need for personalized treatment approaches. This personalized medicine concept resonated with me, especially in cases where the average effect of a treatment may not be reflective of individual responses.
For instance, while a specific trigger like coffee may worsen migraine severity for one person, it could alleviate symptoms for another. This variability highlights the importance of tailored interventions through n-of-1 trials.
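To make the idea concrete, here is a minimal sketch of an ABAB-style n-of-1 trial analysis. All numbers are simulated for illustration (the "coffee worsens severity" effect, the block lengths, and the severity scale are assumptions, not real patient data); a real single-case analysis would also need to account for carryover and autocorrelation.

```python
import random
import statistics

random.seed(42)

# Hypothetical ABAB n-of-1 trial: one person alternates between
# "coffee" (A) and "no coffee" (B) weeks, logging daily migraine
# severity on a 0-10 scale.
def simulate_block(mean_severity: float, days: int = 7) -> list[float]:
    # Simulated daily severities, clipped to the 0-10 scale.
    return [max(0.0, min(10.0, random.gauss(mean_severity, 1.0)))
            for _ in range(days)]

# Assume (for illustration only) that this individual's severity
# is higher during coffee weeks.
blocks = {
    "A1": simulate_block(6.0), "B1": simulate_block(4.0),
    "A2": simulate_block(6.0), "B2": simulate_block(4.0),
}

coffee = blocks["A1"] + blocks["A2"]
no_coffee = blocks["B1"] + blocks["B2"]

# The individual-level estimate: difference in mean severity
# between the two conditions for this one person.
effect = statistics.mean(coffee) - statistics.mean(no_coffee)
print(f"Estimated within-person effect of coffee: {effect:+.2f} severity points")
```

For a different individual, the same design could just as well estimate a negative effect, which is exactly the within-person variability n-of-1 trials are built to capture.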
Inspired by these insights, I transformed my research interests into a newsletter (Stats-of-1) and Podcast to raise awareness about the significance of n-of-1 trials and single-case designs in advancing personalized medicine approaches.
Many organizations lack either the internal resources (early-stage startups) or established rigorous evidence-generation practices (growth-stage startups). What practical advice would you give to data leaders working in these organizations?
I emphasize the importance of adhering to good statistical practices in analysis and analytics. Good statistical practice helps you prevent overfitting. This advice is particularly relevant in environments like early-stage startups or growth-stage startups where data scientists may outnumber statisticians.
It is essential to highlight that rigor in statistical analysis goes beyond just modeling; it also involves diligent tracking of data changes and decision-making processes to guard against overfitting.
From a business standpoint, this means you can give your clients more careful advice and conclusions, without making them overconfident or prone to misinterpreting the results.
Furthermore, it is crucial to address the false notion that statistical analysis can be treated less rigorously than engineering processes. Engineering workflows prioritize meticulous checks and testing because logic and computer code are unforgiving; the fallacy is assuming that statistical analysis can afford to be more lax. That is true only to a certain degree.
But just as engineering processes prioritize reproducibility, it is essential to conduct statistical analysis in a structured and monitored manner to ensure the replicability and consistency of scientific conclusions.
Approaching statistical analysis akin to an engineering pipeline involves maintaining a structured framework while allowing for exploration, similar to the principles of the Scrum or Kanban methodologies for software development. By closely monitoring the analytics process with the same rigor as engineering workflows, organizations can ensure that their statistical analyses are replicable, leading to reliable scientific conclusions that ultimately support the business objectives.
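One way to read "engineering rigor applied to analysis" is to test your data the way engineers test code, before any model is fit. The sketch below is a hypothetical example of such a pre-analysis check; the dataset shape and column names (`date`, `severity`) are assumptions for illustration.

```python
# Engineering-style checks applied to a statistical pipeline:
# validate the input data before modeling, just as an engineer
# would run tests before shipping code.
def validate_daily_symptoms(rows: list[dict]) -> list[str]:
    """Return a list of data-quality failures (empty list = all checks pass)."""
    failures = []
    if not rows:
        failures.append("dataset is empty")
    for i, row in enumerate(rows):
        # Severity must exist and lie on the assumed 0-10 scale.
        if not (0 <= row.get("severity", -1) <= 10):
            failures.append(f"row {i}: severity out of range")
        if row.get("date") is None:
            failures.append(f"row {i}: missing date")
    return failures

rows = [
    {"date": "2024-01-01", "severity": 6},
    {"date": "2024-01-02", "severity": 12},  # out of range: should be flagged
]
problems = validate_daily_symptoms(rows)
print(problems)  # → ['row 1: severity out of range']
```

Checks like these are cheap to write, and running them on every analysis is one concrete way to bring Scrum- or Kanban-style discipline into analytics work.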
Do you have any best practices for collaboration between the data, engineering, and product teams? Is there something that can be scaled across other organizations?
One effective best practice for improving collaboration among data, engineering, and product teams is to integrate processes into the workflow that involve the analysts, analytics engineers, data engineers, and product teams as closely as possible. It is essential to establish linkages and processes that all teams will utilize, rather than imposing a specific mindset on teams that may not align with their roles.
For instance, incorporating tools like dbt into the analytics engineering pipeline can greatly benefit the data analysis pipeline as well. This integration enables a better understanding of where each variable originates and of its pathway to the final dashboards used by the product team.
It also benefits statistical analysis, because it lets you track when each analysis was run, which data it used, and which versions.
Tools such as Airflow and dbt should not be limited to data and analytics engineering but should also be implemented in data science, data analysis, and product development to enhance collaboration and efficiency across the organization.
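The traceability Eric describes can be approximated even without a full dbt or Airflow deployment. The sketch below is a minimal, hypothetical "analysis run log": every run records when it happened, which data it used, a fingerprint of that data, and the code version. The file name, analysis name, and dataset path are all illustrative assumptions.

```python
import datetime
import hashlib
import json

# A minimal append-only log that mimics the traceability dbt/Airflow
# give engineering pipelines: when the analysis ran, which data it
# used, and a fingerprint of that exact input.
def log_analysis_run(analysis_name: str, data_path: str,
                     data: bytes, code_version: str,
                     log_path: str = "analysis_runs.jsonl") -> dict:
    record = {
        "analysis": analysis_name,
        "ran_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "data_path": data_path,
        # SHA-256 of the raw bytes pins down exactly which data was analyzed.
        "data_sha256": hashlib.sha256(data).hexdigest(),
        "code_version": code_version,
    }
    # Append-only: every run leaves an auditable trace.
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record

entry = log_analysis_run(
    analysis_name="symptom_effect_model",       # hypothetical analysis name
    data_path="warehouse/daily_symptoms.csv",   # hypothetical dataset path
    data=b"date,severity\n2024-01-01,6\n",
    code_version="git:abc123",                  # e.g. a git commit hash
)
print(entry["data_sha256"][:12])
```

If the same analysis later produces a different result, the log makes it possible to say whether the data, the code, or both changed, which is exactly the replicability guarantee discussed above.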
What counterintuitive advice would you give to yourself, looking back at your journey?
The counterintuitive advice I would give myself is to embrace the idea that career paths are often non-linear and that it's okay not to have everything figured out early on.
In today's work environment, it's common to have a more agile approach to career progression, where moving between different roles and organizations is a normal part of the process. I like the concept presented in Reid Hoffman's book "The Startup of You": adopting an entrepreneurial mindset, even without starting your own business, can be beneficial in navigating this dynamic landscape.
I think that's the counterintuitive advice I would give to younger professionals.
Before we wrap up, is there anything you would like to share with the community?
When I began my career, the concept of data science did not exist in the way we know it today. Approximately 20 years ago, I ventured into the realm of quantitative fields, particularly biostatistics. Data science was still in its infancy back then.
As I reflect on my journey, one question that often arises in my discussions with colleagues and friends is: How should a comprehensive training program for data scientists be structured?
Personally, I believe such a program should encompass elements of consulting, business acumen, and managerial skills, alongside a strong foundation in computer science and statistics as key pillars. I am curious to learn what perspectives the community, especially those who operate outside my statistical background, might bring to this discussion.
Key Takeaways:
On how to generate rigorous evidence: it is paramount to adopt good statistical practices in analysis and analytics to prevent overfitting, especially in environments where statisticians may be outnumbered by data scientists. Carry out statistical analysis with the same rigor and diligence as engineering processes to ensure the replicability and reliability of scientific conclusions.
On collaboration between the data, engineering, and product teams: Improve collaboration by integrating shared processes. Incorporating tools like dbt and Airflow into the analytics engineering pipeline can greatly benefit the data analysis pipeline as well, since they let you track when each analysis was run, which data it used, and which versions.
On n-of-1 trials: n-of-1 trials and single-case designs can be powerful tools to address the variability in triggers and responses among individuals with conditions like irritable bowel syndrome and migraines.
On career advice: Embrace the non-linear nature of career paths and be open to evolving roles and organizations. Adopt an entrepreneurial mindset, as suggested in "The Startup of You" by Reid Hoffman, to navigate the dynamic landscape of career progression.
A question for the community: How should a comprehensive training program for data scientists be structured? Eric Jay suggests it should include elements of consulting, business acumen, and managerial skills in addition to a strong foundation in computer science and statistics, and he invites the community's perspectives.
Data practitioners interested in learning more about n-of-1 trials can read more at Eric Jay’s newsletter Stats-of-1.
Eric Jay is actively seeking his next opportunity in roles such as Associate Director of Biostatistics, Data Science Manager, and Lead Data Scientist. It is a unique chance for companies to have a renowned biostatistician on their team. More info on LinkedIn.