Is it possible to find cures for 100 rare diseases in a decade or less? Yes, says Blake Borgeson, CTO of Recursion Pharmaceuticals. The company is making a big bet on the power of machine learning to speed discovery, and the bet seems to be paying off.
Borgeson took us inside the company's ongoing work to use machine learning and data visualizations to develop new drugs fast. He also explained how open source software helps his team achieve those goals.
The Enterprisers Project (TEP): Recursion Pharmaceuticals is built on the idea that machine learning can accelerate the process of drug discovery. How does this work?
Borgeson: The entire field of drug discovery is built around the premise that you need hundreds of scientists spending millions of dollars over many years to find potential treatments for a single disease. Recursion has built a platform that combines tons of biology experiments, captured rapidly and cheaply via imaging, with machine learning that extracts information from tens of millions of images and makes decisions based on it. That lets us look for treatments for many diseases in parallel.
We've specifically adopted an open source visualization framework that operates on a Python platform. That makes it easy for our biologists to identify genetic disease markers and assess drug efficacy when visualizing cell data, allowing for faster time-to-value for pharmaceutical companies.
Our data scientists are responsible for evaluating hundreds of genetic diseases, running anywhere from one evaluation to thousands per month. Using this visualization tool, they can stitch together multiple cell images into a plate heatmap. This allows biologists to intuitively zero in on anomalies and see detailed data for both diseased and healthy cell layers, so they can easily distinguish disease phenotypes from experimental artifacts. With direct access to data and images, biologists can introduce changes to the experiment and see how the diseases react.
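As a rough illustration of what a plate heatmap involves, the sketch below arranges one summary score per well into the grid layout of a plate and flags wells that stand out. It assumes a 384-well plate (rows A to P, columns 1 to 24), a single hypothetical score per well (for example, how far a well's image features sit from healthy controls), and a median/MAD robust z-score as a stand-in anomaly rule. None of these choices are described in the interview, and the function names `plate_heatmap` and `flag_anomalies` are hypothetical.

```python
import numpy as np

def plate_heatmap(well_scores, n_rows=16, n_cols=24):
    """Arrange per-well scores, e.g. {"A01": 0.3}, into an n_rows x n_cols grid.

    Assumes a 384-well layout by default (rows A-P, columns 1-24).
    Wells without a score stay NaN so gaps remain visible on the heatmap.
    """
    grid = np.full((n_rows, n_cols), np.nan)
    for well, score in well_scores.items():
        row = ord(well[0].upper()) - ord("A")  # "A" -> 0, "P" -> 15
        col = int(well[1:]) - 1                # "01" -> 0, "24" -> 23
        grid[row, col] = score
    return grid

def flag_anomalies(grid, threshold=3.5):
    """Return a boolean mask of wells whose robust z-score exceeds threshold.

    Hypothetical anomaly rule: median/MAD z-scores, which are less distorted
    by a few extreme wells than a mean/standard-deviation z-score would be.
    """
    values = grid[~np.isnan(grid)]
    median = np.median(values)
    mad = np.median(np.abs(values - median))
    if mad == 0:
        # No spread at all: nothing can be called an outlier.
        return np.zeros(grid.shape, dtype=bool)
    robust_z = 0.6745 * (grid - median) / mad
    # Comparisons against NaN are False, so empty wells are never flagged.
    return np.abs(robust_z) > threshold
```

The resulting grid could be rendered with a plotting library (matplotlib's `imshow` is one common option), with the mask overlaid so that a biologist can check whether a flagged well reflects a disease phenotype or an artifact such as an edge effect.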
TEP: What are the challenges in using machine learning to find and test drugs?
Borgeson: In order to be able to answer a variety of biological questions from the same experimental methods, we extract massive amounts of high-dimensional information from tens of millions of cellular images. Automating decision-making in a 1,000-dimensional feature space is a challenging research problem, with various approaches having their own benefits and caveats. And with no clear source of truth in making our drug discovery decisions, we can't simply set up a supervised learning task, but instead need a combination of unsupervised approaches and human exploration and interaction. By boiling the high-dimensional data down to application-specific, two-dimensional interactive data visualizations, our biologists can explore our data and results together with our data scientists. They can tune our automated methods to make more and more effective decisions over time, learning as we go.
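The interview does not say which technique Recursion uses to reduce its high-dimensional image features to two dimensions. As a minimal sketch of the general idea, the code below uses principal component analysis (PCA), computed with NumPy's SVD, to project a hypothetical 1,000-dimensional feature vector per image onto two coordinates that can be plotted. PCA is a swapped-in, linear example; nonlinear methods such as t-SNE or UMAP are common alternatives. The function name `pca_2d` and the synthetic data in the usage note are hypothetical.

```python
import numpy as np

def pca_2d(features):
    """Project an (n_samples, n_features) array onto its top two principal components.

    Returns the 2-D coordinates and the fraction of total variance
    captured by each of the two components.
    """
    X = np.asarray(features, dtype=float)
    centered = X - X.mean(axis=0)  # PCA assumes mean-centered data
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    coords = centered @ vt[:2].T   # shape (n_samples, 2)
    explained = (s[:2] ** 2) / np.sum(s ** 2)
    return coords, explained
```

For example, with 200 synthetic "healthy" and 200 synthetic "diseased" samples whose features differ only in a subset of dimensions, the two groups separate along the first component, which is the kind of pattern a biologist could then inspect interactively:

```python
rng = np.random.default_rng(0)
healthy = rng.normal(0.0, 1.0, size=(200, 1000))
diseased = rng.normal(0.0, 1.0, size=(200, 1000))
diseased[:, :50] += 3.0  # hypothetical phenotype shift in 50 features
coords, explained = pca_2d(np.vstack([healthy, diseased]))
```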
TEP: Recursion is taking on rare genetic diseases. Are these diseases that other pharma companies are not addressing because of the relatively small number of patients?
Borgeson: That's exactly right. We focus on researching diseases that large pharma companies ignore because of the low number of people affected by them. While each of these diseases affects on average only a few thousand individuals, making the typical drug discovery investment of resources infeasible, together these rare genetic diseases affect many millions of adults and children in the United States alone. Our parallel imaging method allows us to tackle a massive set of these diseases at once.
TEP: You've set a goal to find treatments for 100 diseases in 10 years. How did you arrive at this goal? How is it going?
Borgeson: We knew going into this that our goal was to make the biggest impact on human well-being possible, and we wanted an ambitious goal so we'd stretch ourselves further and move faster. We also knew that there are literally thousands of diseases in this category with no currently available treatments. So while treating 100 diseases would be a tremendous achievement (a faster and larger impact than many believe possible), we believe it's only the start toward dramatically reducing human suffering.
We have internally identified disease models for 40 diseases in the past 18 months using the methods and tools described earlier, and have physically tested thousands of drugs against 20 of those models. By the end of 2018, we aim to bring 20 new candidate treatments to the Investigational New Drug (IND) stage, the FDA milestone marking the start of clinical trials. This rapid progress is only possible due to our imaging, automation, machine learning and visualization methods.
TEP: What are the benefits of using open source for this project?
Borgeson: Open source tools have a number of really compelling advantages. For one thing, the best developers and data scientists have often spent years working with them, either because the tools are publicly available or because they personally prefer developing and using open source software.
Another advantage is their interoperability. While some solution providers attempt to build all-in-one platforms, open source projects are lightweight and built to play nicely with many other tools through generic interfaces and formats, opening up the space for exploration and innovation.
TEP: Your work is an example of effective altruism: using available resources to help as many people as possible. How does that fuel your results?
Borgeson: Many companies are striving to be "a good X type of company" or "better than their competitors." Instead, we set the goal of having the biggest impact and doing the most good we possibly can. We look at the world and its problems, look at our abilities, resources, and connections, and try to see how, in our case, drug discovery really can just be done dramatically more efficiently. If we were taking an incremental improvement approach, we'd never set the goals or come up with the innovations that have powered our success so far.