By: Sam Sinai

As 2024 comes to a close, I look back at one of Dyno’s most exciting technical developments. Earlier this year, Dyno introduced LEAP℠: Low-shot Efficient Accelerated Performance. With as few as 19 designs in one experimental batch, LEAP can achieve a performance improvement equivalent to what previously required at least one additional round of high-throughput in vivo experiments (animal testing). Moreover, LEAP is approaching a success rate in designing high-functioning proteins at which it would be feasible to test its computational designs directly in single-candidate validation experiments. This future would dramatically cut the time needed for therapeutic development and eventually reduce the cost of treatment for patients.

The Promise of Machine Learning in Therapeutics

A key promise of machine learning (ML) in biology is to relieve researchers of the burden of laborious and costly experiments, empowering them to solve higher-leverage problems. Designing proteins and testing their efficacy in vivo typically demands multiple rounds of complex trials. Dyno (and the field) hopes that ML can cut through some of this, discerning the underlying principles of protein design and generating sequences with desired functionalities. We want therapeutics that translate to in vivo settings, i.e. that work in humans, and we want to find them fast. It is important to note that the real bottleneck for speed is not how many good-looking sequences you can propose in silico (purely computationally), but how many of the ones you test actually translate to real treatments. With LEAP, we were able to discover high-performing AAVs in silico, with a very high per-attempt translation rate in non-human primates, effectively cutting out the need for at least one large-scale animal experiment.

The Challenge of Designing Complex Proteins like AAVs

In this post, I show a case study on Dyno’s favorite protein: the Adeno-Associated Virus (AAV) capsid. The AAV capsid is a complex protein with structural and enzymatic parts that needs to fold, assemble, package a genome, evade the body’s defenses, enter specific cells, and deliver its DNA cargo into the nucleus in order for a therapy to be effective. A capsid should also avoid entering off-target organs such as the liver, where it can cause toxicity. The success or failure of each of these functions is determined by the protein sequence of the capsid. Effective delivery of genes into specific organs and cells has been a challenging bottleneck for gene therapy.

The anatomy of a 60-year problem. A ~735 amino-acid chain is the building block of the 60-meric AAV capsid. This sequence determines whether the intravenously administered capsid can be manufactured, avoid neutralization and off-target transduction, cross the blood-brain barrier, and deliver its cargo to neurons.

Designing capsids with specific traits—such as efficient targeting of brain cells and avoidance of liver tissue—requires predicting the success of each step of the process depicted above, either through mechanistic understanding of the biology (“white-box”) or by employing ML methods trained on experimental endpoints (“black-box”; prediction without explicit knowledge of the mechanism). Despite years of progress, a comprehensive mechanistic picture for AAVs remains distant. Importantly, understanding the mechanisms that enable a particular sequence to perform well does not guarantee insights into how another sequence might achieve the same (or better) phenotypic outcomes through different means.

On the machine learning side, the capsid’s behavior is poorly predicted by zero-shot metrics extracted from structure-based and protein language models. Generally speaking, these methods are far from good proxies for predicting the function of complex proteins; even for simpler in vitro experiments (in lab media, not in organisms), their accuracy tends to be low as designs diverge from natural protein sequences.

Dyno adopts a “grey-box” approach to designing AAV capsids with novel properties. While primarily relying on black-box models and empirical data, we incorporate mechanistic knowledge through models and experiments when available. 

Current Approach to AAV Discovery

Consider a typical process of in vivo capsid discovery. Usually, 10,000 to 10 million pooled sequences are screened in mice or non-human primates (NHP) to identify the best-performing capsids in terms of tissue targeting. This high-throughput discovery screen is then usually repeated, using insights from previous rounds to refine the sequence design. Then, a smaller subset of 10 to 100 sequences is selected for a follow-up validation experiment, which typically leads to a final group of 1 to 5 candidates that undergo single-capsid testing in primates. This final stage—involving histological analysis necessary for clinical trial clearance—is expensive and reserved for only the most promising candidates. For in vivo AAV studies, each experiment could cost upwards of a million dollars and can take nine months to complete.

LEAP: Pushing the Boundary of Possibility with Many Models, but Few Attempts

In LEAP, we train a mixture of tens of partially independent and calibrated models: predictors, filters, and generators, which propose and sift through a large set of virtual sequences. Models range from those pre-trained on public data to models fine-tuned or trained from scratch on different slices or attributes of Dyno’s internal data. Our generative methods propose a high-performing, diverse set of candidates, and our filtering ensembles accurately eliminate sequences that are unlikely to be extremely good. Designing capsids that are much better than anything you have trained on is hard to get right. As articulated well here:

“When designing objects to achieve novel property values with machine learning, one faces a fundamental challenge: how to push past the frontier of current knowledge, distilled from the training data into the model, in a manner that rationally controls the risk of failure. If one trusts learned models too much in extrapolation, one is likely to design rubbish. In contrast, if one does not extrapolate, one cannot find novelty” – Fannjiang & Listgarten, 2023

To use an analogy, it is as if you trained an AI system on tweets, all with fewer than 1,000 likes, and asked it to propose a tweet that gets 10,000 likes, with as few as 10 attempts. This type of “out-of-distribution” extrapolation, finding performance (often across multiple dimensions) higher than anything you observe in your training data, is the most critical capability in therapeutics design, and current frontier models are relatively weak at it.

Because of our confidence in LEAP’s proposals, Dyno can bypass earlier rounds of experimental de-risking and directly test designs in high-stakes experiments. This replaces years of experiments with days of in silico design and compute.
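To make the propose-and-filter pattern above concrete, here is a minimal Python sketch. Everything in it is a hypothetical illustration of the general idea, not Dyno’s actual code or API: generators stands in for the generative models and filters for the calibrated predictor ensemble.

import random

def propose_and_filter(generators, filters, n_virtual=100_000, batch_size=19):
    """Hypothetical sketch of a LEAP-style propose-and-filter loop."""
    # Each generative model proposes a share of the virtual candidates.
    per_gen = n_virtual // len(generators)
    candidates = [g() for g in generators for _ in range(per_gen)]

    # A candidate survives only if every model in the filtering ensemble
    # passes it; demanding consensus sacrifices recall to keep the
    # per-attempt hit rate high, which is what matters when each wet-lab
    # test is expensive.
    survivors = [c for c in candidates if all(f(c) for f in filters)]

    # Down-select to the small experimental budget.
    random.shuffle(survivors)
    return survivors[:batch_size]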

Results in the Primate Brain

I show the results of one of our first campaigns using LEAP below. In this experiment, we deployed LEAP to design capsids that target the brain. This was a medium-throughput validation experiment (tens of unique capsids measured together in each animal) that measures the in vivo performance of capsids at high resolution (including cell-type specificity). Our measurement endpoints are:

  1. Packaging: whether the capsid successfully assembles and packages its genome.
  2. Transduction: whether we detect a higher frequency of transduction events, measured (with careful statistical and experimental controls) through mRNA readouts within the brain.
  3. De-targeting: whether we detect lower abundance of viral DNA in the liver, relevant for the safety profile. 

Due to the high per-design cost of these experimental rounds (e.g. each capsid is independently manufactured by our lab), our standard approach has been to test variants that have performed very well in at least one previous high-throughput (100K-1M designs) experiment. We also task our in-house protein experts with modifying those variants in small but rational ways to improve their targeting. The riskiest bet is to design variants we have never observed before with ML and test them directly in such labor-intensive experiments. That’s exactly what we did: we allocated 19 LEAP-designed capsids, which we had never measured before, alongside capsids chosen by our standard approach (all designed on the AAV9 backbone). The capsids designed by LEAP were at least four, and often 7-10, (non-contiguous) mutations away from any sample measured in the training set (i.e. virtually impossible to find either at random or rationally).

The outcome was striking: not only were a majority of our LEAP designs functional, but about half of them improved on the best known design. The best of these designs improved as much over the previous round as can be expected from a successful high-throughput or directed-evolution round. It is also noteworthy that at design time, there is some uncertainty about which high-throughput-measured variant will be the absolute best in follow-up (the 1x benchmark); therefore, LEAP designs are derived from a collection of diverse sequences, rather than being many modifications of the benchmark we compare against.

LEAP’s performance in the NHP (cyno) brain. LEAP-designed capsids (red) compared to top capsids drawn from a previous discovery round (grey) and rationally improved capsids (purple). (Upper) Empirical cumulative distribution function (CDF) showing the proportion of capsids from each category with brain transduction rates greater than the value indicated on the horizontal axis. The green shaded region indicates brain transduction values higher than any capsid drawn from the previous discovery round. (Lower) Scatter plot showing brain transduction and liver de-targeting rates for each successfully packaged capsid.
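For readers unfamiliar with this kind of plot, the upper panel is simply the empirical survival function (one minus the CDF). A small sketch with made-up numbers:

import numpy as np

# Hypothetical transduction rates for one category of capsids (made up).
transduction = np.sort(np.array([0.2, 0.5, 1.3, 2.1, 3.8, 6.0]))

# Fraction of capsids with a rate strictly greater than each observed value.
frac_greater = 1.0 - np.arange(1, len(transduction) + 1) / len(transduction)

for x, f in zip(transduction, frac_greater):
    print(f"proportion with transduction > {x}: {f:.2f}")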

To summarize:

  • 17 out of 19 LEAP-designed capsids packaged successfully. This is a very high packaging rate considering that most single mutations to capsids break packaging (the very first property needed for every downstream task).
  • 9 out of 19 LEAP capsids outperformed every previously observed sequence, and all expert designs, in brain transduction. Most of these also show stronger liver de-targeting, potentially improving the safety of a future therapy.
  • Overall, brain transduction improved 6-fold over the previous best design, while also achieving much better liver de-targeting. Comparatively, typical directed-evolution or standard high-throughput screens for capsid optimization achieve a 2-5 fold improvement in a single round, but with sequence budgets of up to 10M samples.

While I only discuss the results of one experiment here, we’ve found LEAP to be consistently high-yield and high-performance across multiple trials. With about 50% of variants improving on the best sample in the training set, even a batch of as few as 5 designs is likely to contain at least one design that improves on anything observed before.
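To see why, treat each design as an independent draw with the observed ~50% hit rate (independence is a simplifying assumption; the snippet below is purely illustrative):

# Probability that at least one of n designs improves on the best
# training-set sample, assuming each succeeds independently with
# probability p (the ~50% hit rate quoted above).
p, n = 0.5, 5
print(1 - (1 - p) ** n)  # 0.96875, i.e. ~97%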

Today, operating LEAP still requires careful supervision by ML experts, and the number of candidates our large ensemble of models (some slow at inference) can evaluate in silico is in the millions. A major direction of improvement is large-scale in silico screening, which calls for more deployed compute and faster inference. As we make progress there, we are also working to reduce the expertise needed to operate a LEAP-enabled campaign.

Will LEAP work for your biological sequence? 

Fundamentally, most techniques we use within LEAP are not specific to AAVs. While some models take advantage of the large datasets we’ve collected within Dyno, many are trained on smaller or public datasets. Performance quickly improves with more data, but our approach is robust to low-data regimes (though it is not a zero-shot system, i.e. it requires at least some data in the relevant domain). In short, we expect LEAP to generalize to other biological sequences. This is why we are excited about our ML technology and its future impact on gene therapy and other relevant therapeutic domains.

In the coming year, we are exploring partnering with a limited number of companies to apply the techniques we’ve developed with LEAP to help solve in vivo design problems. If you are curious whether LEAP could be applicable to your own sequence design problems, I encourage you to get in touch (info@dynotx.com).

Special thanks to: David Brookes, Abhishaike Mahajan, Stephen Malina, Alice Tirard and Eric Kelsic for helpful comments on this post. 

Cloud native tools such as Kubernetes and Argo Workflows are improving productivity, accelerating innovation, and increasing operational efficiency. These tools reduce the burden of infrastructure management by enabling efficient and scalable management of complex computing tasks.

At Dyno Therapeutics, we have not only built a proprietary engine that leverages Kubernetes and Argo Workflows for our ML-guided design of AAV capsids, but we have also contributed to the open source community by developing and releasing Hera. Hera is a project that simplifies access to Argo Workflows, and we use it to design and execute complex workflows for vector design, biological data processing, and large scale data ingestion.

To learn more about how we’re leveraging the Cloud Native Computing Foundation ecosystem to scale our gene therapy research efforts at Dyno Therapeutics, check out our abstract and full talk below on “Scaling Gene Therapy Research with Argo Workflows and Hera” from ArgoCon 2023.

Abstract:
The use of cloud native tools such as Kubernetes and Argo Workflows is becoming increasingly popular across various domains, including gene therapy. These tools enable efficient and scalable management of complex computing tasks, allowing researchers and engineers to focus on their core product rather than infrastructure management. This has led to improved productivity, increased innovation, and increased operational efficiency. At Dyno Therapeutics, we use our proprietary engine called Dynet to generate and consume massive amounts of data to design and test vectors used for the delivery of gene therapy. Dyno leverages Kubernetes, Argo Workflows, and Hera to define, orchestrate, and execute complex workflows used for vector design, biological data processing, and large scale data ingestion. This talk will showcase novel applications of Argo Workflows and Kubernetes from a field as novel as gene therapy, and illustrate how tech products, such as Hera, from the Cloud Native Computing Foundation ecosystem help scale gene therapy research and engineering efforts.

Full talk: https://www.youtube.com/watch?v=h2TEw8kd1Ds&t=1s&ab_channel=CNCF%5BCloudNativeComputingFoundation%5D

At Dyno, we believe in collective innovation on our journey of empowering a diverse team of the best problem solvers to drive cutting-edge science toward improving patient health. We’re proud to bring together minds from science, technology, and business to propel advancements in gene therapy delivery. Our collaborative approach is not limited to capsids; it extends throughout our science and engineering work. Today, we are excited to share a library for easy workflow construction and submission that we developed internally and have found useful more broadly.

Introduction

Rapid experimentation, scalable machine learning, and computational biology workflows are core to Dyno’s capacity to engineer capsids. Dyno’s engineering team is responsible for building the infrastructure and tooling required to solve all three of these problems. A key component of this is being able to execute computational workflows on demand.

Executing such workflows, particularly larger ones with more steps, requires infrastructure that can flexibly scale in proportion to demand. The rest of this blog post discusses our journey towards building a framework, Hera, which enables fast, scalable remote execution of computational workflows.

From the beginning, we knew we wanted to use Kubernetes for applications that run in Docker containers. Containers are software environments that isolate the dependencies of Dyno’s applications into entities that are easily deployable to different kinds of environments. 

Kubernetes is a container orchestration platform that can easily scale containerized applications to thousands of nodes, potentially running in multiple availability zones, with a mix of compute resources (e.g. NVIDIA K80, T4, or V100 GPUs). While Kubernetes is a flexible way to run containers, it is challenging to write the necessary configurations to execute workflows on Kubernetes directly. Therefore, we sought a higher-level framework to enable the execution of tasks in sequence, in parallel, and as dependency graphs.

The search for a workflow engine is never an easy task. We evaluated several workflow engines, such as Airflow. For our base framework, we ultimately landed on Argo Workflows, in large part because we wanted something that was Kubernetes native.

Argo Workflows is an open source framework for orchestrating multi-step workflows in parallel on top of Kubernetes. At Dyno, we decided to adopt Argo mainly for its Kubernetes-native characteristics and its container-focused design. This means that we can take advantage of all of the features of Kubernetes and its integrations into the Google Compute Engine ecosystem, providing us access to resources to run Dyno’s specialized machine learning and computational biology containers!

However, as we discuss more in the Journey to Hera section, neither raw Argo Workflows nor the existing frameworks on top of it really served our needs. Raw Argo was powerful but required specifying workflows in YAML rather than Python. And the existing libraries built on top of Argo offered Python abstractions that were variously unintuitive, hard to debug, or simply broken. So, despite being wary of not-invented-here syndrome, we decided to build our own simpler and more accessible solution.

Hera

The Argo was constructed by the shipwright Argus, and its crew were specially protected by the goddess Hera.

To facilitate adoption of Argo Workflows and transition more of Dyno’s scientific workflows to Kubernetes, we created Hera! Hera is a simple Argo Workflows Python SDK that aims to make workflow construction and submission easy and accessible to everyone! Hera abstracts away workflow setup details while still maintaining a consistent vocabulary with Argo Workflows. For example, Hera parallels Argo in its usage of terms such as “workflow” and “task” to stay consistent with terminology of the Argo UI. This enables users to continue to take advantage of Argo features such as the Argo UI for monitoring execution and logging.

Python functions are first-class citizens in Hera – they are the atomic units (execution payloads) that are submitted for remote execution. The framework makes it easy to wrap execution payloads into Argo Workflow tasks and to set dependencies, resources, etc. Putting Python functions front-and-center allows developers to focus on the business logic to be executed remotely rather than on workflow setup. The Github repository provides a concrete example illustrating Hera’s intuitive specification language relative to other frameworks. Compared to libraries such as Couler and the Argo Python DSL, Hera significantly reduces the amount of code you have to write while allowing you to focus almost exclusively on specifying your logic in vanilla Python!

Hera in detail

First, we start with an example of a workflow that executes a diamond-shaped task graph, where tasks B and C run in parallel:

from hera.task import Task
from hera.workflow import Workflow
from hera.workflow_service import WorkflowService

def say(msg: str):
    """This can be any Python code you want! 

    As long as your container satisfies the dependencies :)
    """
    print(msg)

# 'my-argo-server.com' is the domain of your Argo server
# and 'my-auth-token' is your Bearer authentication token!
ws = WorkflowService('my-argo-server.com', 'my-auth-token')
w = Workflow('parallel-diamonds', ws)

a = Task('A', say, [{'msg': 'This is task A!'}])
b = Task('B', say, [{'msg': 'This is task B!'}])
c = Task('C', say, [{'msg': 'This is task C!'}])
d = Task('D', say, [{'msg': 'This is task D!'}])

a.next(b).next(d)  # `a >> b >> d` does the same thing!
a.next(c).next(d)  # `a >> c >> d` also!

w.add_tasks(a, b, c, d)
w.submit()

Notice how much simpler it is to write this workflow in Hera relative to the Argo Python DSL or Couler. Users only need to specify the name of a task, the Python function they want to run, and the specific input the function should receive. Task chaining is easy and intuitive, yet still supports arbitrarily complex dependencies specified as code.

Internally, Hera uses the auto-generated OpenAPI Argo Python SDK. Higher-level, strongly typed objects written using Pydantic, such as Task and Workflow, wrap Argo-specific objects but offer high-level interfaces for easy specification. In addition, there are resource objects such as Volume, ExistingVolume, and Resources that can be stitched together to construct the resources of a Task. For example, a Resources object can be constructed to dictate that a task needs 4 CPUs, 16Gi of RAM, 1 NVIDIA K80 GPU, and a volume of 50Gi. This can then be passed to a Task, which is then submitted as part of a workflow. By default, tasks are assigned 1 CPU and 4Gi of RAM.

from hera.resources import Resources
from hera.task import Task
from hera.volume import Volume
from hera.workflow import Workflow
from hera.workflow_service import WorkflowService

def do():
    import os

    # prints a directory listing in which /vol shows 50Gi of available storage
    print(f'This is a task that requires a lot of storage! '
          f'Available storage:\n{os.popen("df -h").read()}')

# TODO: replace the domain and token with your own
ws = WorkflowService('my-argo-server.com', 'my-auth-token')
w = Workflow('volume-provision', ws)
d = Task('do', do, resources=Resources(volume=Volume(size='50Gi', mount_path='/vol')))

w.add_task(d)
w.submit()

Journey to Hera

Raw Argo Workflows

As mentioned already, we started our journey to Hera using Argo Workflows directly. Initially, the introduction of Argo Workflows to Dyno led to a massive increase in the scalability of computational biology and machine learning jobs! Whereas previously we’d been stuck waiting for sequential processes to execute in Jupyter notebooks or on independent VMs, using Argo enabled us to regularly run jobs across thousands of containers on Kubernetes. These jobs range from processing raw sequencing data, to training hundreds of machine learning models in parallel on single or multiple GPUs, to executing regular cron jobs for infrastructure maintenance.

However, Argo’s introduction to Dyno came with its own set of challenges. Argo Workflows are typically configured through YAML files, which have a particular syntax for structuring workflow templates, steps, graphs, parallelism, etc.; this is just as cumbersome as managing Kubernetes directly. Since we mostly use Python at Dyno, we wanted a Python library for scheduling Argo workflows.

Argo Python DSL

Our first Argo workflow framework was a library called the Argo Python DSL, a now-archived repository that is part of Argo Labs. While it allowed Dyno to extract immediate value out of Argo Workflows, it came with several challenges. First, we (the engineering team) struggled to motivate AAViators outside of Engineering to use it… for good reasons! The DSL’s abstractions mostly mirrored Argo’s own, requiring users to understand low-level Argo Workflows concepts such as V1Container or V1alpha1Parameter. Rather than learning these Argo-specific concepts, scientists want to focus on specifying workflow tasks, their dependencies, and the logic of a machine learning or computational biology experiment. Surfacing these Argo-specific concepts significantly increases workflow setup complexity, as illustrated by the following example:

class DagDiamond(Workflow):

    @task
    @parameter(name="message", value="A")
    def A(self, message: V1alpha1Parameter) -> V1alpha1Template:
        return self.echo(message=message)

    @task
    @parameter(name="message", value="B")
    @dependencies(["A"])
    def B(self, message: V1alpha1Parameter) -> V1alpha1Template:
        return self.echo(message=message)

The DSL also had other quirks that confused non-expert users. As an example, obtaining parameter values within a function used as a workflow step requires core Argo Workflows syntax specified inside a string, such as {{inputs.parameters.message}}, when using V1alpha1ScriptTemplate, which allows users to submit code scripts. Not only is this challenging to remember, but it also breaks type checking and any other error-preventing tooling.

For these and other reasons, we eventually started looking into alternatives to the DSL that would make developing workflows easier for scientific users.

Couler

The second library that Dyno tried for Argo workflow construction and submission was Couler, an active project that aims to provide a unified interface for Argo Workflows, Tekton Pipelines, and Apache Airflow (although it currently only supports Argo Workflows). While useful, it still posed challenges because of its graph task dependency interface – it is hard to set up a chain of tasks to be executed as part of a graph. Several Couler abstractions, such as dag, impose requirements like the use of lambda functions in Python, which make it challenging to understand workflow setup and dependencies:

# `couler` is the Couler Argo module (import couler.argo as couler), and
# `job` is a small helper from Couler's docs that wraps couler.run_container.
def diamond():
    couler.dag(
        [
            [lambda: job(name="A")],
            [lambda: job(name="A"), lambda: job(name="B")],  # A -> B
            [lambda: job(name="A"), lambda: job(name="C")],  # A -> C
            [lambda: job(name="B"), lambda: job(name="D")],  # B -> D
            [lambda: job(name="C"), lambda: job(name="D")],  # C -> D
        ]
    )

Notice how job A has to be specified multiple times in order to set up the dependencies for jobs B and C. In addition, notice how one has to arrange the lists passed to the Couler dag in a specific structure, which can further confuse users. Another problem we encountered with Couler is the absence of a workflow submitter object that would allow Dyno to submit workflows to the Argo server accessible at a specific internal domain. Instead, Couler requires users to have a Kubernetes configuration, typically located at ~/.kube/config, so workflow submissions occur through the Argo Kubernetes deployment (accessed via kubectl). From an authentication and user-experience standpoint, this was a “no go” for Dyno from the start – we do not want our scientists to have to learn Kubernetes concepts, the use of kubectl, or gcloud for switching contexts between different Google Cloud projects. What we needed was an easy way to submit workflows, perform authentication with a server, and abstract away the complexity associated with the use of Argo objects.

Conclusion

Hera makes Argo Workflow construction and submission much more accessible. Given the internal success we’ve had with Hera, we decided to open-source the library. In collaboration with the core maintainers of Argo Workflows (Alex Collins, Saravanan Balasubramanian, Yuan Tang, and Jesse Suen), Dyno released Hera under Argo Labs. You can also watch the initial release presentation on the Argo Workflows Community Meeting from October 2021.

If you’d like to work on infrastructure challenges such as supporting highly scalable machine learning and computational biology workflows on a variety of resources, support the open-source development of Hera and Argo Workflows, and contribute to modernizing the biotech application tech stack via the introduction of DevOps principles into scientific pursuits, check out our Careers page. We are always on the lookout for great engineers!

 

Flaviu Vadan, Senior Software Engineer, Dyno Therapeutics

Stephen Malina, Machine Learning Scientist, Dyno Therapeutics

Dyno empowers a diverse team of the best problem-solvers to drive cutting-edge science toward improving patient health. Our mission is to build the ideal capsid and solve the challenges of in vivo gene delivery. We call ourselves AAViators and are engaged in Collective Innovation as we aim to maximize our impact on patients through the development of groundbreaking technologies.

AAViators assembled in the summer of 2021

Our journey so far has been one of rigor and joy. We are honored that the 2021 NEVY Awards Academy has nominated Dyno for the “Emerging Company of the Year” Award. Recently, we’ve also been recognized for our disruptive potential in the biopharma industry.

Our data-driven approach is a key enabler and differentiator. With our CapsidMap™ Platform, Dyno is building the world’s most informative and diverse library of capsid measurements, and an algorithm that learns to efficiently explore sequence space in search of the best capsids for therapeutic delivery. (Read more about how our platform is delivering the promise of gene therapy here.)

Our other core differentiator is our focus on productive teamwork. Every time a new AAViator accepts an offer to join the team, we reiterate our foundation of trust, healthy conflict, commitment, accountability, and attention to results. Onboarding at Dyno starts by reading Patrick Lencioni’s The Five Dysfunctions of a Team: A Leadership Fable, which enables discussions about these aspects of productive teamwork across the whole company. For example, this common language helps us explore the importance of being vulnerable and open about our weaknesses and mistakes in order to build the trust required for teams to flourish. In recognizing the importance of trust, we also realize how difficult it is to achieve.

We must therefore be open to give and receive feedback (both positive and negative) so that we improve trust at an individual, team, and organizational level. Similar best-practices apply for each concept all the way through attention to results (for more explanation, I highly recommend reading Lencioni’s book).

Scientific breakthroughs can emerge from the most uncanny places. We are structured in a way that makes it easy for anyone, regardless of hierarchy or discipline, to propose a change to our platform, any of which could eventually lead to a scientific breakthrough.

Today we are over 60 full-time AAViators. We continue to climb to meet the needs of every potential partner, and in the process hope to justify the accolades we have received. As we grow to 150 strong, we look forward to welcoming those with fresh ideas interested in catalyzing our Collective Innovation.

If you are looking for a hard-working, team-focused company where you can connect your creative ideas to real-world impact, contact me directly or take a look at our Careers page!

What are AAVs? And what makes them good candidates for gene therapy?

By: Christopher Reardon, Stephen Malina, and Eryney Marrogi

Introduction

In the first part of our three-part series introducing AAV as a gene therapy vector, we talked about basic AAV vector biology. In this post, we’re going to take a step back to answer the question of “Why AAV?” and look at some opportunities in the AAV engineering space.

Why AAV?

Viral vectors are one of the three main classes of gene therapy delivery vehicles. The other two are lipid nanoparticles (LNPs), now famous for their role in mRNA vaccines, and plasmid electroporation. Relative to these other options, viral vectors have the advantages of higher delivery efficiency, better targeting, and a long clinical history starting with their use in vaccines and, more recently, in gene therapy.

Three main viral vector classes dominate the gene therapy landscape [1]: lentiviruses, adenoviruses, and AAVs. With respect to their use in gene therapy, we can assess viral vectors in terms of their safety, production robustness and scalability, ability to target specific tissues (tropism), and packaging capacity. Each makes various trade-offs with respect to these characteristics.

Lentiviruses are single-stranded RNA viruses originally derived from natural retroviruses. When used for gene therapy, lentiviruses have the advantage of a large packaging capacity (~9 kilobases) but present challenges related to genome integration [2]. Adenoviruses are double-stranded DNA viruses best known for causing the common cold. The newest generation of adenoviruses can package large transgenes up to ~36kb in length and combine high transduction efficiency with scalable viral production. However, they tend to trigger strong immune responses in humans, which limits their applicability for gene therapy, particularly in immunocompromised patients [1].

Compared to adenovirus and lentivirus, AAV offers a low immune profile, relative safety, long-lasting expression in non-dividing cells, and a long scientific and clinical history. Two AAV-based gene therapies — Luxturna® and Zolgensma® — have already been approved by the FDA, and many more [4] are in various stages of clinical trials. However, AAV can only package ~4.7 kilobases of linear single-stranded DNA, meaning packaging full genes encoding large proteins (such as dystrophin) currently requires gene size optimization and/or spreading a gene across multiple vectors.

Opportunities in AAV engineering

While it’s impressive how well natural variants of AAV work for gene therapy, they lack certain features that would make them even more useful. As mentioned above, they can only package ~4.7kb, so fitting transgenes larger than 4.7kb into a vector requires clever, but error-prone, strategies. In addition, although different natural variants do weakly target certain tissue types, this targeting has not been optimized for precision. Finally, although the AAV capsid does not trigger a strong immune response, natural immunity to it can develop, which makes repeated dosing difficult by increasing the risk of potential side effects and reducing dosing efficiency.

To overcome these challenges, researchers have focused on improving the AAV capsid protein and designing promoters that modulate expression of the delivered gene. The first of these, capsid design, involves modifying the capsid gene to change its function. Existing work in this category has focused on:

  • Increasing packaging capacity beyond ~4.7kb,
  • Targeting (and de-targeting) specific tissues such as the brain [3] or liver, and
  • Improving the ability of the capsid to evade the immune system, even after repeated administration.

However, any approach to solving these problems must grapple with narrowing the space of possible capsid variants. Our goal in design is to find a set of promising variants that is small enough to test in vitro/in vivo (scale) while maximizing the fraction of capsids in this set that are viable (efficiency). Achieving this poses a substantial challenge given that even a single VP1 monomer has 20^735 possible variants. Traditional approaches typically optimize for either scale or efficiency, but not both.
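To get a feel for that scale, a quick back-of-the-envelope calculation:

import math

# 20 possible amino acids at each of the ~735 positions of a VP1 monomer.
log10_space = 735 * math.log10(20)
print(f"20^735 ≈ 10^{log10_space:.0f}")  # ≈ 10^956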

 

For instance, directed evolution and unbiased random mutagenesis approaches test as many as 10^10 variants, which, while large, amounts to a drop in the bucket of an incredibly vast space that is many orders of magnitude larger. Additionally, in the case of these “massive” random mutagenesis libraries, the majority of tested variants end up nonviable, amounting to an extremely inefficient search. On the other hand, rational design approaches make targeted mutations to small sub-regions of the capsid based on hard-won biological knowledge, increasing efficiency at the cost of scale. Rational design excels at discovering small local perturbations to the capsid that increase fitness, but it lacks the scale and throughput to identify mutations that take advantage of unexpected, potentially nonlinear interactions between mutations in disjoint capsid sub-regions.

At Dyno, we hope to find the right balance of efficiency and scale by combining machine-guided design with high-throughput screening. Although our high-throughput screens typically test between 10^4 and 10^5 variants per library, as we’ve shown in previous work, our machine learning models increase the efficiency of these libraries by vastly increasing the proportion of viable variants [5, 6]. By directly modeling the relationship between sequence and function, we can further guide our search toward capsids that not only produce but also have optimized desired properties.

Conclusion

Taking a step back, it’s important to put AAV engineering in context with respect to the larger project of improving our ability to control cellular outcomes and behavior. Allowing ourselves to speculate a little, we can think of AAV as an impressive but imperfect starting point for a future generic in vivo gene delivery system. Looked at through this lens, any deficiency of AAV that prevents it from safely delivering genes (repeatedly) in all patients with arbitrary tissue and cell type specificity in a low dosing regime presents an opportunity for AAV engineering.

As we said in our prior post, this wouldn’t be a company blog post if we didn’t mention that we’re currently hiring for a range of technical and non-technical roles. If you’re excited about working with us, please apply and, for the authors’ sake, mention that our blog post played a role in getting you excited about Dyno!

Special thanks to: Adam Poulin-Kerstein, Alex Brown, Cherry Gao, Eric Kelsic, Heikki Turunen, Jeff Gerold, and Sam Sinai for helpful comments. 

[1]: Bulcha, J.T., Wang, Y., Ma, H. et al. Viral vector platforms within the gene therapy landscape. Sig Transduct Target Ther 6, 53 (2021). https://doi.org/10.1038/s41392-021-00487-6

[2]: Milone, M.C., O’Doherty, U. Clinical use of lentiviral vectors. Leukemia 32, 1529–1541 (2018). https://doi.org/10.1038/s41375-018-0106-0 

[3]: Ravindra Kumar, S., Miles, T.F., Chen, X. et al. Multiplexed Cre-dependent selection yields systemic AAVs for targeting distinct brain cell types. Nat Methods 17, 541–550 (2020). https://doi.org/10.1038/s41592-020-0799-7

[4]: Kuzmin, D.A., et al. The clinical landscape for AAV gene therapies. Nat Rev Drug Discov (2021).

[5]: Sinai, S., et al. Generative AAV capsid diversification by latent interpolation. bioRxiv (2021).

[6]: Bryant, D.H., Bashir, A., Sinai, S. et al. Deep diversification of an AAV capsid protein by machine learning. Nat Biotechnol 39, 691–696 (2021). https://doi.org/10.1038/s41587-020-00793-4

What are AAVs? And what makes them good candidates for gene therapy?

By: Stephen Malina, Eryney Marrogi, and Christopher Reardon.

In two previous posts, we introduced gene therapy, a method for curing genetic diseases by providing healthy copies of defective genes, and Adeno-associated virus (AAV) capsids, the gene therapy delivery system Dyno focuses on. In those posts, we also discussed how natural variants of AAV did not evolve for the specialized functions to which we now seek to apply them, which is why Dyno is applying machine learning and high-throughput techniques to better engineer AAV.

This post (the first in a three-part series on AAV) provides an overview of AAV as a gene therapy vector, focusing primarily on the genetic and protein structure of the AAV capsid. Before diving in, we want to note upfront that this series is not intended to be a comprehensive overview of AAV research. Instead, it’s better thought of as an overview of selected AAV knowledge we at Dyno have learned and found helpful while working in this space. While all readers are welcome, in writing this post we’ve tried to optimize for readers who have a basic biology background but no specific knowledge of virus biology or AAV, and who are eager to learn. In addition to summarizing AAV biology and engineering basics, we’ve given our perspective on some promising directions in AAV engineering and a sneak peek at Dyno’s AAV engineering workflow.

With that, let’s dive in.

AAV Vector Biology

Genome

Viruses evolved to infect cells and leverage their cellular machinery to make copies of themselves, and AAV is no exception. To accomplish this, AAV first infects a cell, allowing its genetic material to be shuttled into the nucleus for transcription. Second, cellular machinery translates AAV’s viral genome into proteins, which assemble into a viable viral shell with a copy of the viral genome packaged inside it. What distinguishes AAV from many other viruses is 1) its small genome size and 2) its inability to replicate itself in the absence of a helper virus, hence why the third step, “replicate exponentially and destroy the cell,” is missing.

As a single-stranded DNA virus, AAV’s genome comprises 4.7 kb of linear DNA. This short DNA amazingly codes for at least 9 distinctly expressed proteins, which span a total unrolled length of over 12 kb. To achieve this, the genome takes advantage of staggered and alternative start sites and splicing to create multiple proteins. In practice, this means that the same DNA nucleotides often simultaneously encode segments of multiple proteins in the AAV genome. AAV’s small size is a double-edged sword from our perspective as AAV engineers. On one hand, its simplicity makes understanding its function more tractable. On the other hand, its heavy use of DNA-overlap tricks makes engineering more difficult, because a single nucleotide mutation can impact multiple proteins.
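As a toy illustration (the sequence below is invented, not real AAV), reading the same nucleotides from start sites offset by one base yields entirely different codons, which is why a single substitution can alter two protein segments at once:

dna = "ATGGCTAGCAATCTG"

def codons(seq, start):
    # Split the sequence into 3-base codons beginning at `start`.
    return [seq[i:i + 3] for i in range(start, len(seq) - 2, 3)]

print(codons(dna, 0))  # ['ATG', 'GCT', 'AGC', 'AAT', 'CTG']
print(codons(dna, 1))  # ['TGG', 'CTA', 'GCA', 'ATC']

# A single mutation at position 4 (C -> T) changes a codon in *both* frames.
mutated = dna[:4] + "T" + dna[5:]
print(codons(mutated, 0))  # ['ATG', 'GTT', 'AGC', 'AAT', 'CTG']
print(codons(mutated, 1))  # ['TGG', 'TTA', 'GCA', 'ATC']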

As mentioned, AAV lacks the full set of genes required to make copies of itself in a cell (it is replication incompetent). In addition to relying on the genetics and cellular machinery of its host organism, it needs the genes of the Adenovirus (hence the name adeno-associated virus) in order to replicate. This replication incompetence makes AAV useful for gene therapy. Gene therapy vectors are intended to deliver therapeutic genes without making copies of themselves. Whereas other vectors require careful engineering to handicap their ability to replicate in vivo, with AAV we get this property automatically.

Among the genes involved in AAV’s functions, the two most critical are rep and cap. The rep gene encodes four proteins and plays a key role in enabling AAV replication. For the purpose of gene therapy, rep is only used during the AAV production stage. The cap gene, short for capsid, encodes three capsid proteins, VP1, VP2, and VP3, which combine to form the shell of the AAV capsid (see the next section for additional details). In addition to its coding regions, the ends of the AAV ssDNA sequence are flanked by two inverted terminal repeats (ITRs), made up of repeated sequences that self-complement, allowing the structure to fold and provide stability to each end of the genome as a defense against degradation. The ITRs also play a key role in integration into and rescue from host cells and in loading of the genome into the AAV capsid particle, and they even act as promoters for second-strand synthesis and protein expression (source).

When converting a natural AAV into a gene therapy vector, its genome is dissected, manipulated, and reassembled to make room to include therapeutic genes (transgenes). Research into the basic biology of AAV has enabled transformation of the natural AAV genome into a safe and effective gene therapy vector.

 

Structure

 

Model of AAV with all monomers assembled (left) and a single composite of the three AAV capsid monomers (VP1/2/3) (right), with coloration by secondary structure.

AAV’s prominence in gene therapy can also be understood by examining its structure (depicted above). Of its two genes, the cap gene has the larger influence on the structure-function relationship. To start, cap expresses the structural proteins VP1, VP2, and VP3, which interact to form the viral capsid from sixty copies of the VP monomers. Beyond contributing to the iconic icosahedral form, cap plays a critical role in determining tropism, the virus’s ability to infect a particular cell or organ type. Because the capsid protein mediates attachment to the cellular receptors involved in tissue tropism, manipulation of the cap gene appears to be the best route to the selective tropism needed for gene therapy applications.

 

Function

AAV engineering focuses heavily on the interaction of AAV with host cells in an attempt to manipulate viral entry, i.e. transduction. While each AAV serotype might interact with unique receptor proteins, all serotypes (natural variants) follow the same general mechanism (depicted diagrammatically below) for entering and transducing a cell. For example, one of the most studied serotypes, AAV2, identifies viable target cells through the cellular receptor heparan sulfate proteoglycan (HSPG). Once the capsid finds a valid binding site, AAV2 enters the cell via receptor-mediated endocytosis through clathrin-coated pits, and eventually escapes its endosome somewhere within the cytoplasm to attempt entry into the nucleus. Finally, the viral vector enters the nucleus through the nuclear pore complex, where transcription takes place [1].

After establishing itself within the nucleus, AAV relies on host-cell mechanisms for both genome replication and, in the case of engineered viruses, transgene expression (conversion into proteins). When designing capsids for therapeutic use, engineering the capsid can help the AAV find a target host cell, and engineering other components of the AAV can further improve the therapeutic. For example, AAV constructs can be selectively expressed within specific cell types through careful promoter selection, allowing increased control of engineered viruses in a therapeutic context. The right promoter, within the context of AAV, can dictate when and where an AAV construct starts expressing, ultimately contributing to making the therapy safe and effective. Promoter selection, coupled with capsid engineering, is our main lever for making AAV a powerful tool for gene therapy.

Source: Engineering adeno-associated virus vectors for gene therapy [2]

Conclusion

Over its 50-year history, the field of AAV biology has built up an impressive edifice of knowledge about AAV’s genome, structure, function, and engineering. We’ve only scratched the surface of each of these topics in this post, but we have hopefully provided enough of an overview to help you understand the potential of engineered AAV as a gene therapy vector, along with some of the challenges the field needs to overcome in order to realize this potential.

Finally, this wouldn’t be a company blog post if we didn’t mention that we’re currently hiring for a range of technical and non-technical roles. If you’re excited about working with us, please apply and, for the authors’ sake, mention that our blog post played a role in getting you excited about Dyno!

Special thanks to: Adam Poulin-Kerstein, Alex Brown, Cherry Gao, Eric Kelsic, Heikki Turunen, Jeff Gerold, and Sam Sinai for helpful comments. 

Bibliography

[1]: Martini, S. V., P. R. M. Rocco, and M. M. Morales. “Adeno-associated virus for cystic fibrosis gene therapy.” Brazilian Journal of Medical and Biological Research 44 (2011): 1097-1104.

[2]: Li, C., Samulski, R.J. Engineering adeno-associated virus vectors for gene therapy. Nat Rev Genet 21, 255–272 (2020). https://doi.org/10.1038/s41576-019-0205-4

 

At Dyno, we are empowering a diverse team of the best problem-solvers to drive cutting-edge science toward improving patient health.

This is easy to say but impossible to do without innovating on how biotech companies approach work. At Dyno, we believe that by nurturing a healthy work experience that promotes courage, openness, respect, focus, and commitment, our employees, or AAViators, will improve patient health in a state of joy. Today, as a 50-employee company, we work in a Scrum-inspired Agile framework, which we will scale as we grow to 150 employees over the next few years.

We seek to improve patient health by optimizing gene vectors (e.g., capsids) so our partners can deliver safer and more efficacious gene therapies. We do this through a closed-loop process that combines machine learning with in vivo and in vitro experimentation: with each iteration, we get closer to capsids that move the needle on what is possible in gene therapy.

While it would be nice to simply dream success into reality, in practice it requires effective teamwork. The typical biotech approach is traditional project management: a sequential process of initiation, planning, execution, monitoring and controlling, and finally project closure. This approach requires having done similar work in the past, so one can anticipate most of what is needed to be successful, as well as having stable technology to rely on. While this works really well in manufacturing and construction, it often falls short in high-tech industries.

Starting in the 1980s, the pitfalls of traditional project management began to be examined, as described in The New New Product Development Game. What followed was the slow adoption of Agile work practices across a growing set of industries involved in complex work. Complex means the work you are undertaking likely hasn’t been done before, not everything you need to do to be successful is known, and you still need to develop technology that doesn’t currently exist. Agile, practiced through a lightweight framework called Scrum, overcomes complexity by guiding teams to approach work with an empirical, incremental, and lean mindset. As teams routinely apply what they learn and prioritize what to do next, they are able to minimize wasted effort, quickly take advantage of new learnings, and complete projects that have never been done before.

Since our work at Dyno is highly complex, traditional project management, although more familiar, just doesn’t fit the nature of our work or our AAViators’ expectations. By the time a project manager finishes drafting one detailed plan, the team will have learned something that significantly alters that plan. What’s worse is the psychological toll continuous change takes on a team using traditional project management, an approach that aims to minimize the occurrence of change! The result is that change begins to feel like failure, so much so that it’s tempting to try to prevent change, ultimately leading to poorer outcomes.

Next, I’ll highlight our approach with its key roles and responsibilities.

We plan, execute, and learn in two-week increments called Sprints, which begin with planning … But before we focus on planning the next two weeks, we take a step back and refine our roadmap together. This roadmap visualizes how we think we can achieve our company goals over the next 12-18 months. We find it important to understand where our efforts today will take us and to have confidence that we are balancing our desire for success with a sustainable experience. After we commit to our revised roadmap, every team, across both R&D and corporate, does Sprint Planning together. We are open about the goals we are committing ourselves to accomplish over the next two weeks, while accounting for known constraints like taking time to vacation or develop ourselves. We make these commitments extremely clear by documenting “Definitions of Done”, brief statements of what it means for each deliverable to be officially completed.

We execute … Every workday during the Sprint, each team meets for 15 minutes to see how the Sprint is progressing and adapts as needed; at the end of each day, all team leads meet for another 15 minutes to discuss any cross-team impacts that need to be raised and resolved so that Sprint goals can be achieved. Individuals and teams have significant flexibility over the course of the Sprint to accomplish what they committed to during planning.

We learn … At the end of the Sprint, every team conducts a Sprint Review to determine what was done, what was not done and why, and to decide what they should do next. This reinforces accountability in a healthy way. The Sprint Review is followed by a Sprint Retrospective, where each team inspects how they are working together and commits to doing something to improve teamwork in the next Sprint. A Review and Retrospective are also conducted at the company level to ensure broad transparency and commitment to evolving how Scrum is used at Dyno.

Of course, our approach would never work without the focused involvement of three key roles: the Team, which collectively determines the work that needs to be done; the Product Owner, who leads the team and prioritizes work; and the Agile Captain, who facilitates the framework while promoting teamwork. The Agile Captain is a volunteer from the team; some stick with the role, while other teams rotate it. Either way, the role is vital for team success in our Agile framework.

Our approach requires even the most involved AAViators to spend less than 10% of their time in framework events, an important metric we respect as we look for ways to improve how we work together. We constantly ask ourselves: how can we make our framework better while keeping the time required under 10%? Additionally, when teams have unsuccessful Sprints, it helps psychologically to know that they can start afresh in the next Sprint, now with the added benefit of their learnings. Finally, each Sprint presents an opportunity to pause, take stock of the current situation, and refocus the team’s and company’s efforts on the most important priorities. We get 24 opportunities a year to do this; that far exceeds what you could expect within a traditional project management framework.

Since our first Sprint, our framework has continually evolved to achieve our desired outcomes. We do not expect the way we work today to enable success in a company with 150 employees; however, we have absolute confidence that we can evolve our framework and maintain a healthy work experience which promotes courage, openness, respect, focus, and commitment.

We’re excited about the progress we’ve made thus far, and enthusiastically welcome others to join us on our awesome journey! Come learn more at the upcoming 8 June 2021 webinar, Biotech and Scrum: Rethinking How Biotech Innovates in the 21st Century.

Please join us at the 2021 Dyno Therapeutics ASGCT After-Party, May 13th beginning at 7pm!

The event is hosted in an ohyay virtual space that can be accessed using this link. You’ll be prompted to create a free account before popping into our curated venue. Come say hi, play a round of chess, set up in any number of cozy spots for a longer conversation, or help solve a jigsaw puzzle – we would love to see you!

Be sure to continue following our story on twitter @Dyno_Tx and on our blog.

Dyno’s Data Science team hosted its first class of interns amidst the socially distant heat of summer 2020. Five outstanding individuals joined us to help tackle challenges in computational biology, machine learning, software engineering, and, of course, remote company building.  

These individuals left lasting contributions to our science and our culture. This post is an opportunity to hear their stories.

Our 2020 class of summer interns consisted of Stephen Malina (Graduate student at Columbia University), Tiwalayo Aina (rising senior at MIT), Stewart Slocum (rising senior at Johns Hopkins University), Jeff Chen (rising junior at MIT), and Maxwell Kazman (rising junior at Georgia Tech). Since interning at Dyno, Stephen decided to join us full time, Tiwa and Stewy have decided to go to graduate school, and Jeff and Max are continuing their undergraduate studies. 

What brought you to Dyno? How did it compare to what you expected?

Stewy: Going into my senior year and looking towards the future, I wanted to spend my summer with a smaller company in a machine learning role. Dyno was exciting to me for two reasons: because the problem space is interesting and impactful, and because I liked the company’s attitude. This summer, I worked mainly on sequence proposal strategies for the AAV cap gene, but also got to do some supervised model development and build ML infrastructure. Dyno’s work is on the frontiers of machine learning and biology, which gave me the chance to think deeply about how to do things that haven’t been done before. I had an absolute blast! I learned a ton and had a great time collaborating with others, especially my mentor and the other interns. The environment was very open, and I had a lot of fun working with such capable people. 

Jeff: I’ve been interested in the application of CS in the biotech space, much more seriously over the last year or so. So when I learned about the novelty of Dyno’s data and experimental process, I was excited to have the chance to be a part of Dyno, especially in its high-growth cycle. 

Max: Before I started my internship, I was expecting an environment similar to an academic research lab, as many AAViators came from this background. However, the culture at Dyno reminds me much more of a lean tech startup and everyone is excited to play a part in bringing our technology to the market. Much of the work is very academically rigorous and exploratory, but I also got to work on engineering and infrastructure problems, something I wasn’t expecting as much. Working at Dyno has been super exciting and has exposed me to a variety of interesting projects and challenges.

Tiwa: Coming from a background in quantitative finance, I had some experience building models, and I knew I wanted to apply that experience to another field. After meeting some data scientists from Dyno, I was introduced to Dyno’s mission—in particular, I learned about the novelty and difficulty of the problems Dyno aims to solve, and I was immediately interested. The experience was just as challenging as I anticipated and enhanced by the opportunity to meet with mentors and collaborate with other interns.


What did you enjoy most about being here at Dyno? 

Stephen: Besides the people, one thing I really like about Dyno is that we’re working on a problem that I feel is both important and part of a larger societal project in which I’m personally very invested. Regarding the former, I’m optimistic (rationally so, I think) about the prospect of engineering better AAVs to help prevent and cure genetic disease. When I let myself indulge my fantastical side, I imagine a future in which gene therapy is so safe and cheap that it’s ubiquitous and widely applied towards improving human health. Regarding the societal project I mentioned, I believe that understanding and learning to engineer biology is one of the most exciting long-term goals of the 21st century, and I view Dyno as one of the companies building a toolkit for doing this more efficiently and safely. 

Stewy: I really enjoyed being part of a company like Dyno that pushes boundaries and tries to define what is possible instead of just accepting it. Dyno’s mission is incredibly ambitious – it’s risky and seriously hard. But it feels great to be part of something you believe in, and I am confident that ML-guided design will be key to the future of gene therapy.

Jeff: I definitely enjoyed the science and the people the most. Giving a data scientist direction and rich data is like setting him or her loose in a playground. It was very exciting for me to be able to make my own novel discoveries here and integrate my work into the Dyno pipeline. As for the people, it was great to be surrounded by so many smart people who felt driven in what they do. That environment is refreshing. 

Tiwa: The thing I enjoy most here at Dyno is the academic rigor that many events have. I find it really cool that the whole company will listen to a person’s presentation on research they did. There are also Journal Clubs where we review academic literature in machine learning. We even had a lecture series where wet lab biologists would teach the data scientists about interesting topics in biology!


What was the most difficult aspect of working at Dyno?

Stewy: There was a lot of paper reading and Wikipedia scouring over the first couple of weeks as I tried to orient myself in the context of ML-guided sequence design. Afterwards, the challenge was to do things that weren’t in papers. Dyno’s business is to try to solve unsolved problems, which is usually pretty difficult, but I always felt supported enough to get help when I needed it. I also appreciate my mentor’s reminders not to bite off more than I could chew, and to focus on what I could.

Tiwa: The open-ended nature of the work is the most difficult aspect of working at Dyno! The problems AAViators work on haven’t been solved before, so you have to think really hard about what will/won’t work. Thinking creatively about how to approach problems with no answer key is really hard (though the work wouldn’t be nearly as fun if it weren’t so challenging!)

Jeff: The most difficult aspect of working at Dyno was the lack of social interaction due to the COVID restrictions that forced many people in the company to work from home. There are a lot of cool people at Dyno whom I wish I could have talked with in a more casual context. I missed having coffee-chat breaks with employees and relaxed lunches. These are interactions that, from past experience, not only stimulate friendship and a sense of belonging, but also creativity and work efficiency. So it was difficult being surrounded by all of these driven people and not being able to get as close as I’d like.

Stephen: At Dyno, the ground truth about how good our AAVs are is quite expensive to determine and can take on the order of months. Instead, we have to figure out ways to validate our methods using proxies that hopefully reflect the actual results of experiments.

Max: There were many steep learning curves I had to overcome when I started at Dyno, which made it challenging for me to progress with my project at the start. Catching up with the literature and learning about the tools and techniques Dyno uses were certainly not trivial. However, I enjoyed the learning process and I had a lot of help to overcome these learning curves. Reaching out for help is something I could have done more often, and would have lowered some of these barriers.


How did you grow professionally while you were here?

Max: This internship has been an incredibly valuable experience for me professionally. Because Dyno is a small company, everyone (including interns!) has the opportunity to observe or participate in many aspects of the business. This includes listening in on partnership presentations or interview presentations for potential hires, as well as discussions regarding the company culture and business goals. I learned a lot about how the company as a whole operates. On the technical side of things, I learned a lot about the infrastructure and design of our pipelines, and explored new ideas that could be useful long after I leave. The level of involvement I had in bigger design questions was unique and extremely valuable.

Stephen: I’ve worked as a traditional software engineer, but this was my first time working full-time on machine learning in an industry setting, especially a biotech one. As a result, I learned a lot about how to build, validate, and share ML models in an industry setting. In particular, I spent a lot of time working on model validation, thinking about it, and learning from other Dyno team members how to validate our models in lieu of ground-truth measurements of their new predictions. I have a longer-term goal of becoming the type of person who can reliably and productively think independently, and I feel that the Dyno DS/ML team members and environment moved me closer to that goal by giving me the time and support to seek solutions on my own and explore outside my comfort zone. 

Stewy: I came into this summer having taken courses and done some projects involving machine learning, but I hadn’t worked on any real-world ML systems. Having the opportunity to do that this summer was hugely valuable. Knowing how to deal with data imbalances and spot silent training bugs, seeing through the hype and getting a feel for what works and what doesn’t – you don’t get these kinds of things from reading papers, only from working on real systems. I still have a lot to learn, but after this summer, I feel much more confident in my ability as a machine learning practitioner.

Jeff: I grew substantially in my ability to do research and make experimental decisions. Because I was given a large degree of independence, I was able to really stretch my creativity and navigate my own decision tree. Learning how to fail and fail quickly, how to collaborate in directions outside your expertise, and how to brainstorm experiment design: these are all skills that I felt I lacked in my undergraduate research experience. 

Tiwa: In addition to improving my machine learning skills, I grew professionally by chatting with people all over the company, cultivating my network of both data scientists and biologists. Nobody minded taking the time to speak to me about their role, background, and goals. Additionally, I attended internal career panels which gave me a better sense of the important nuances underpinning pursuing careers in ML/AI.


What’s your advice for people who are interested in interning at Dyno? 

Max: Working at Dyno can be a super fulfilling experience. Come ready to learn a lot, and also struggle a lot. Being independent is important to facilitate the learning process, but that doesn’t mean you shouldn’t ask for help often. Be ready to tackle hard problems where no one knows what the answer should be, but also have a lot of fun doing it. Make sure to reach out for help, ask questions, read, collaborate often, and have fun!

Tiwa: One piece of advice I would give is to always be ready to learn. The importance of this is clear when you’re stuck and you’re looking for feedback/advice, but it really applies in general when interning here. I think that a hunger to learn more is a big part of the culture at Dyno; the experience will be even more valuable if you’re always willing to ask questions and expose yourself to learning opportunities.

Stewy: Dyno is a great place to be. Every person I met really cared about me, and I will always appreciate that. But every person I met also really cared about the work they were doing, and I think that it’s important to share that excitement. If you are considering working at Dyno for the summer, I suggest you read up and learn about what the company is doing, get excited about it, and then hit me up!

If reading this post makes you excited about interning at a startup that aims to help millions of people live better by enabling gene therapy with tools at the cutting edge of machine learning and synthetic biology, consider applying through our careers page.

The Adeno-associated Virus (AAV) is a naturally occurring, replication-deficient virus whose capsid is widely considered the frontrunner for solving the delivery problem in gene therapy. These viruses are known to be harmless to humans and are relatively simple to manipulate. One well-known drawback of the natural capsids currently used for delivery, however, is that many patients with pre-existing immunity to the virus (due to previous natural exposure) may be ineligible for life-changing treatment.  

In previous work (published in Science), we validated the use of computational models in conjunction with high-throughput experiments to design better liver-targeting variants of naturally occurring AAV capsids. In that work we focused primarily on single edits to the capsid, and hypothesized that the effect of a combination of single mutations, at least when the total number of edits is limited, can be approximated by the sum of the effects of the individual mutations. Through this approach we validated that a model-guided method can lead to more efficient design of better capsids for more effective liver targeting. 
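To make the additive assumption concrete, here is a minimal Python sketch. The mutation labels, effect values, and function are hypothetical stand-ins for illustration, not the models or measurements from the paper.

```python
# Minimal sketch of the additive approximation described above. The mutation
# labels and effect values are illustrative toys, not real measurements.

# Measured effect of each single mutation relative to the wild-type capsid,
# e.g. a log-enrichment score from a hypothetical high-throughput screen.
single_mutation_effects = {
    "K532R": 0.40,   # beneficial (toy value)
    "A563S": 0.15,   # mildly beneficial (toy value)
    "T581I": -0.25,  # deleterious (toy value)
}

def predict_combined_effect(mutations, wild_type_score=0.0):
    """Additive model: the predicted score of a multi-mutant is the
    wild-type score plus the sum of the single-mutation effects."""
    return wild_type_score + sum(single_mutation_effects[m] for m in mutations)

# Under the additive assumption, combining the two beneficial mutations
# is predicted to score 0.40 + 0.15 = 0.55:
print(predict_combined_effect(["K532R", "A563S"]))
```

As the next paragraph explains, this approximation breaks down once a design accumulates more than a handful of edits, because mutations interact.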

The paradigm of measuring the effects of mutations independently and combining the best ones no longer works as we attempt to modify the capsid beyond a handful of mutations. Making capsids with many changes relative to natural variants increases our chances of treating the thousands of potential recipients of gene therapy by evading pre-existing immunity. To introduce a large number of changes to the capsid sequence without breaking its essential abilities, a wholly new approach was needed, which our latest study in Nature Biotechnology aims to address. Our goal was to design highly diverse AAV capsids, for which we used much more advanced machine learning models trained on more complex datasets. The work was the result of years of collaboration between teams at Dyno, Harvard’s Wyss Institute, and Google Research.

To test these methods, we focused on a representative region of the capsid (positions 560-588, seen in pink in the fully assembled virus, the hexamer assembly, and the individual subunit in the figure above) that has both surface-exposed and buried residues (generally speaking, surface-exposed residues are known to be more mutation-tolerant). This region is also well known for the presence of immunogenic structures, as well as its role in tissue targeting. Our aim was to introduce as many mutations as we could in this 28 amino-acid region, including substitutions and insertions, the latter being a less common type of mutation in nature. When we started this study, it was unknown whether machine learning models would be reliable for predicting the effects of mutations in variants beyond 5-10 edits from the original sequence. We expected this was possible, however, based on analyzing the diversity of sequences that have been isolated from natural sources. In this region, the average difference between two AAV serotypes is 12 amino acids (often with few or no insertions). Nonetheless, we pushed the models to propose sequences with up to 29 substitutions and insertions. 
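For readers curious how such mutation distances are counted, the sketch below computes an edit distance, where substitutions and insertions/deletions each count as one edit. Both 28-29 amino-acid strings are made-up stand-ins, not real serotype sequences.

```python
# Sketch: counting how "far" a variant is from a reference sequence, where
# substitutions and insertions/deletions each count as one edit. Both
# sequences below are made-up stand-ins, not real serotype regions.

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of substitutions,
    insertions, and deletions needed to turn sequence a into sequence b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (cost 0 on match)
            ))
        prev = curr
    return prev[-1]

reference = "NGSGQNQQTLKFSVAGPSNMAVQGRNYI"   # toy 28-aa region
variant   = "NGTGQNAQTLKFSVAGPSSMAVQGRWYIQ"  # toy variant with an insertion
print(edit_distance(reference, variant))     # 5: four substitutions + one insertion
```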

Using the naturally observed level of diversity as a benchmark, we set our goal to generate diversity beyond that observed in nature, while maintaining the capsid’s viability. After screening billions of potential sequences in silico using machine learning models, we settled on ~200,000 designed variants, which we experimentally tested for viability. Of those, approximately 110,000 produced viable viruses (many of our attempts were deep in sequence space, where it is very hard to propose viable viruses). About 57,000 variants were farther than 12 mutations away from the AAV2 serotype. By generating more than two thousand sequences that were 25 or more mutations away, we decisively demonstrated the power of machine learning models to design diverse synthetic capsid sequences. 
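As a rough illustration of what this screen-then-select step looks like, here is a toy sketch: propose many variants of a reference region, score each with a placeholder viability model, and keep only the high-scoring designs. The model, sequences, and threshold are all hypothetical stand-ins; the actual study screened billions of candidates with trained neural networks.

```python
# Toy sketch of model-guided in silico screening: propose many variants of a
# reference region, score each with a (placeholder) viability model, and keep
# only high-scoring designs. Everything here is a hypothetical stand-in.
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def toy_viability_model(seq: str) -> float:
    """Placeholder for a trained sequence-to-viability model."""
    rng = random.Random(seq)  # deterministic toy score per sequence
    return rng.random()

def propose_variant(reference: str, n_mutations: int) -> str:
    """Substitute n_mutations randomly chosen positions in the reference."""
    seq = list(reference)
    for pos in random.sample(range(len(seq)), n_mutations):
        seq[pos] = random.choice(AMINO_ACIDS)
    return "".join(seq)

reference = "NGSGQNQQTLKFSVAGPSNMAVQGRNYI"  # toy 28-aa region, not real AAV2
candidates = [propose_variant(reference, 12) for _ in range(10_000)]

# Keep only candidates the model scores above a viability threshold.
selected = [s for s in candidates if toy_viability_model(s) > 0.95]
print(f"{len(selected)} of {len(candidates)} candidates pass the filter")
```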

In this study, largely conducted before Dyno’s official launch, we report one of the largest AI-driven protein design efforts published to date and validate the utility of these techniques for capsid engineering. The success of this approach bolstered our confidence in Dyno’s foundational science. Building upon this foundation, we have established infrastructure and machine learning techniques at Dyno to expand and optimize the AAV repertoire for multiple traits (including in vivo targeting of challenging tissues), multiple serotypes, and at a larger scale. This study is just the beginning of our endeavor.  

This work was a multi-year collaboration between Dyno co-founders Eric Kelsic, Sam Sinai, and George Church; colleagues at Harvard’s Wyss Institute, including Nina Jain and Pierce Ogden; and members of the Google Accelerated Science team, including Patrick Riley, co-first authors Drew Bryant and Ali Bashir, and co-corresponding author Lucy Colwell. 
