How to Translate Big Data and DevOps Into Clinical Genetics Applications

Jun 18, 2022

Presented by: Angelina Uno-Antonison, Software Architect — Genomics, at the UAB School of Medicine

Women Who Code Podcast 40

Women Who Code Talks Tech: In celebration of DNA Day April 25th! “How to Translate Big Data and DevOps Into Clinical Genetics Applications” Presented by: Angelina Uno-Antonison, Software Architect — Genomics, at the UAB School of Medicine. This talk was featured at CONNECT Digital 2020.

We are a multi-disciplinary team of software engineers, bio-pharmacists, geneticists, and data scientists. Our goal at the center is to develop efficient, effective software that analyzes data and surfaces novel information to benefit patients with rare, undiagnosed, or misdiagnosed diseases, along with their families and the people who take care of them. We focus primarily on interpreting the molecular variation within a patient.

We want to help clinical decision-making in the present. We don’t want to work on software that’s going to help people five years from now; we want to work on software that helps people right now. Our goal is to make precision medicine as cheap as possible, so it can be accessible. As one example, the hospital reached out to us because it needed a centralized inventory system for all the different labs popping up, which were doing a combination of commercial and local lab testing for COVID-19. We were able to take an application from conception to production within six days.

A genome is essentially the code that describes who you are and how you’re built biologically. It holds information such as your eye and hair color, and even things like how likely you are to have nosebleeds. It is 3.2 billion characters long, and it is extremely high-density storage: one gram of DNA can hold approximately one billion terabytes of data. It is also really robust, because it can last a really long time.

We can sequence fragments that are thousands of years old. The genome is organized into 46 chromosomes: twenty-two autosomal pairs plus XX for females or XY for males. If you took the DNA from all the cells in your body and lined it up end to end, it would cover approximately 1000 trips from the Earth to the Sun. Whereas code written for computers comes down to zeros and ones, because that is all CPUs understand, our DNA is written in four types of nucleotides, abbreviated A, T, C, and G: adenine, thymine, cytosine, and guanine. These bases bond in pairs (A with T, C with G) and run along a double helix.
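To make the comparison with binary concrete, here is a minimal Python sketch that is not from the talk. It shows that a four-letter alphabet fits in two bits per letter, and how the A-T and C-G pairing determines the opposite strand. The function names and the particular 2-bit mapping are illustrative assumptions.

```python
# Illustrative only: this 2-bit mapping is an arbitrary choice, not a standard.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}

# Pairing rule of the double helix: A bonds with T, C bonds with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def pack(sequence: str) -> int:
    """Pack a DNA string into an integer, two bits per base."""
    value = 0
    for base in sequence:
        value = (value << 2) | BASE_TO_BITS[base]
    return value


def unpack(value: int, length: int) -> str:
    """Recover a DNA string of known length from its packed integer."""
    bases = []
    for _ in range(length):
        bases.append(BITS_TO_BASE[value & 0b11])
        value >>= 2
    return "".join(reversed(bases))


def reverse_complement(sequence: str) -> str:
    """Return the paired strand, read in its own direction."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence))


def packed_size_bytes(num_bases: int) -> int:
    """Bytes needed to store num_bases at two bits per base, rounded up."""
    return (num_bases * 2 + 7) // 8
```

At two bits per base, `packed_size_bytes(3_200_000_000)` comes to 800,000,000 bytes, so a genome of 3.2 billion characters is on the order of 800 MB before any compression or quality metadata.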

How does all that translate into precision medicine? On one side we have the patient on the typical path you would imagine when you go to a physician: they collect medical data and research it to figure out what’s wrong. On the other side, we take a sample from the patient, which does not have to be blood, and put it into what’s called a sequencing machine. That machine outputs really, really large files of As, Cs, Ts, and Gs along with quality metrics. We process those files in what’s called the secondary analysis, or secondary pipeline, which reduces and consolidates the data so an analyst or physician can look at it and understand which molecular variations could be causing the disease.

How does the secondary pipeline work? You have your DNA, you put it in a sequencing machine, and the machine cuts it up into a bunch of tiny pieces. It then outputs those reads in what’s called a FASTQ file. The reads in a FASTQ file are not in order, so alignment software places each one where it belongs against a reference genome. Once we have the reads aligned, we look for the differences between the reference and the patient we just sequenced. Those differences are written to what’s called a Variant Call Format (VCF) file, which is a standardized format. The variant analysis team then decides which variants are believed to be causing the disease.
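The steps just described (reads in FASTQ, alignment to a reference, differences reported as variants) can be sketched in a few lines of Python. This is a toy illustration under strong assumptions, not the center’s pipeline: it assumes four-line FASTQ records with Phred+33 qualities, aligns by brute force without gaps, and returns simplified rows that only mimic the first columns of a VCF record. All names here (`Read`, `parse_fastq`, `align`, `call_variants`, `chrDemo`) are hypothetical. Production pipelines rely on dedicated aligners and variant callers.

```python
from typing import Iterable, Iterator, List, NamedTuple, Tuple


class Read(NamedTuple):
    name: str
    sequence: str
    qualities: List[int]  # Phred scores, one per base


def parse_fastq(text: str) -> Iterator[Read]:
    """Yield reads from FASTQ text: 4 lines per record, Phred+33 qualities."""
    lines = [line.strip() for line in text.strip().splitlines()]
    for i in range(0, len(lines), 4):
        header, sequence, _plus, quality = lines[i:i + 4]
        yield Read(header[1:], sequence, [ord(ch) - 33 for ch in quality])


def align(read: str, reference: str) -> int:
    """Return the 0-based reference offset with the fewest mismatches.

    Brute-force, gap-free alignment: fine for a toy, far too slow for a genome.
    """
    def mismatches(offset: int) -> int:
        window = reference[offset:offset + len(read)]
        return sum(a != b for a, b in zip(read, window))

    return min(range(len(reference) - len(read) + 1), key=mismatches)


def call_variants(
    reads: Iterable[Read],
    reference: str,
    chrom: str = "chrDemo",   # hypothetical contig name
    min_quality: int = 20,    # assumed cutoff; ignore low-confidence bases
) -> List[Tuple[str, int, str, str]]:
    """Return sorted (CHROM, POS, REF, ALT) rows for confident mismatches."""
    variants = set()
    for read in reads:
        offset = align(read.sequence, reference)
        for i, (base, qual) in enumerate(zip(read.sequence, read.qualities)):
            ref_base = reference[offset + i]
            if base != ref_base and qual >= min_quality:
                # VCF positions are 1-based.
                variants.add((chrom, offset + i + 1, ref_base, base))
    return sorted(variants)
```

The quality cutoff is the reason sequencers emit quality metrics alongside the bases: a mismatch on a low-confidence base is more likely a sequencing error than a real difference in the patient.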

These are the criteria we follow when we implement projects within the center. The very first thing we do is come up with something called our manual of operations. It has our charter and mission statement. It clearly defines expectations and conduct in the center and how we expect people to treat each other and behave. We define our core values as diversity, teamwork, respect, excellence, integrity, and tenacity. When we interview people to join our group, we send them a version of our charter and mission statement so they know exactly who we are and who they’re interviewing with. We have colleagues from marginalized communities, and we want to ensure they are protected and safe in their work environment.

The next thing we do is maintain a DevOps cluster. We wanted to eliminate the time it takes to set up an environment. We are really lucky that UAB has a high-performance computing cluster that UAB Research Compute manages. We get to work with source control and use Terraform to spin up our VMs and manage their life cycle on OpenStack. We provision those machines using Ansible, and when we want to spin up all our applications we use Docker Compose. We add container orchestration so that we can do the lab deployments and health checks: we use Traefik to route the requests and Docker Swarm to manage the life cycle. We apply continuous integration and continuous deployment.
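The talk does not show how its health checks are configured, so the following is only a generic Python sketch of the idea: after a deployment, poll a service until it reports healthy or a retry budget runs out. The function names, the retry defaults, and any URL passed to `http_probe` are assumptions for illustration. In a real setup the orchestrator typically performs this check itself.

```python
import time
import urllib.request
from typing import Callable


def wait_until_healthy(
    probe: Callable[[], bool],
    retries: int = 5,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,  # injectable for testing
) -> bool:
    """Call probe() up to `retries` times; return True on the first success."""
    for attempt in range(1, retries + 1):
        try:
            if probe():
                return True
        except Exception:
            pass  # a probe that raises counts as "not healthy yet"
        if attempt < retries:
            sleep(delay_seconds)
    return False


def http_probe(url: str, timeout: float = 2.0) -> Callable[[], bool]:
    """Build a probe that treats any 2xx response from `url` as healthy."""
    def probe() -> bool:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    return probe
```

A deployment script could call `wait_until_healthy(http_probe(health_url))`, where `health_url` is whatever endpoint the service exposes, and roll back when the result is `False`.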

We rely on the previous work we’ve done to generate metrics, which help us answer questions about what we’ve been doing. Our packaging, release, and deployment processes are all automated as well. That removes a lot of manual work, so we can spend more time programming or collaborating. We use Jenkins to manage this pipeline.

We had to acknowledge that the quality of a product doesn’t need to be the same for every project, but it does need to be defined so that we can all agree on some parameters. We came up with different quality metrics according to each project’s visibility. For the COVID-19 application, we added a projection of the daily test capacity. We were also able to add a place where labs can report the tests that were run and the tests that were positive, broken down by testing platform.
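The arithmetic behind such a tool is simple, and a short Python sketch makes it concrete. This is not the application’s code: the `DailyReport` fields and both function names are hypothetical, chosen only to mirror the quantities mentioned above (daily capacity, tests run, tests positive, testing platform).

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class DailyReport:
    """One lab's report for one day (all field names are assumptions)."""
    lab: str
    platform: str
    daily_capacity: int
    tests_run: int
    tests_positive: int


def project_capacity(reports: Iterable[DailyReport], days: int) -> int:
    """Project total tests possible over `days`, assuming capacity stays flat."""
    return sum(report.daily_capacity for report in reports) * days


def positivity_by_platform(reports: Iterable[DailyReport]) -> Dict[str, float]:
    """Return the fraction of positive tests per platform.

    Platforms with no tests run are left out to avoid dividing by zero.
    """
    run: Dict[str, int] = {}
    positive: Dict[str, int] = {}
    for report in reports:
        run[report.platform] = run.get(report.platform, 0) + report.tests_run
        positive[report.platform] = (
            positive.get(report.platform, 0) + report.tests_positive
        )
    return {name: positive[name] / run[name] for name in run if run[name] > 0}
```

The flat-capacity assumption is the simplest possible projection. A real tool would likely account for supply limits and staffing, which the talk does not detail.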

The hospital now has a centralized tool to project testing capabilities and share testing results with hospital administration. I really don’t think we could have accomplished this without all that upfront investment in our DevOps process and positive work culture.


Written by Women Who Code
