Selected Research

MBE: model-based enrichment estimation and prediction for differential sequencing data

Akosua Busia and Jennifer Listgarten
Genome Biology 2023

We introduce model-based enrichment (MBE) to overcome key shortcomings of current approaches to differential analysis using high-throughput sequencing data. MBE is based on sound theoretical principles, is easy to implement, and can trivially make use of advances in modern-day machine learning classification architectures or related innovations.

Optimal trade-off control in machine learning-based library design, with application to adeno-associated virus for gene therapy

Danqing Zhu, David H. Brookes, Akosua Busia, Ana Carneiro, Clara Fannjiang, Galina Popova, David Shin, Kevin C. Donohue, Edward F. Chang, Tomasz J. Nowakowski, Jennifer Listgarten, David V. Schaffer
Science Advances 2024

We develop and showcase a machine learning-based method for systematically designing more effective adeno-associated virus capsid libraries: ones that have broadly good packaging capabilities while being as diverse as possible. Such carefully designed libraries stand to significantly increase the chance of success in engineering any property of interest.

A view of Estimation of Distribution Algorithms through the lens of Expectation-Maximization

David H. Brookes, Akosua Busia, Clara Fannjiang, Kevin Murphy, Jennifer Listgarten
Genetic and Evolutionary Computation Conference (GECCO) 2020

We show that a large class of Estimation of Distribution Algorithms (EDAs), including, but not limited to, Covariance Matrix Adaptation, can be written as a Monte Carlo Expectation-Maximization (EM) algorithm, and as exact EM in the limit of infinite samples. Because EM sits on a rigorous statistical foundation and has been thoroughly analyzed, this connection provides a new coherent framework with which to reason about EDAs.

A deep learning approach to pattern recognition for short DNA sequences

Akosua Busia, George E. Dahl, Clara Fannjiang, David H. Alexander, Elizabeth Dorfman, Ryan Poplin, Cory Y. McLean, Pi-Chuan Chang, Mark DePristo
bioRxiv 2019

Inferring properties of biological sequences, such as determining the species-of-origin of a DNA sequence or the function of an amino-acid sequence, is a core task in many bioinformatics applications. These tasks are often solved using string matching to map query sequences to labeled database sequences or via Hidden Markov Model-like pattern matching. In the current work we describe and assess a deep learning approach which trains a deep neural network (DNN) to predict database-derived labels directly from query sequences. We demonstrate that this DNN performs at or above state-of-the-art levels on a difficult, practically important problem: predicting species-of-origin from short reads of 16S ribosomal DNA.

Next-Step Conditioned Deep Convolutional Neural Networks Improve Protein Secondary Structure Prediction

Akosua Busia, Navdeep Jaitly
Joint 25th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB) and 16th European Conference on Computational Biology (ECCB) 2017, Poster

Recently developed deep learning techniques have significantly improved the accuracy of various speech and image recognition systems. We adapt some of these techniques to create a chained convolutional architecture with next-step conditioning for improving performance on protein sequence prediction problems.

Around the Web

Google Brain Residency Program - 7 months in and looking ahead

Stanford Firestone Thesis Award

Stanford seniors’ thesis projects garner university medals

Dean Richard Saller, Percy Chirinos, Akosua Busia, and Russ Altman at Sterling Award Ceremony

Congratulations to Sterling Award winners