What is ESM Metagenomic Atlas, and How is it Changing Drug Discovery?


Part of what drives the drug development process towards new discoveries and treatment advances is the ability of pharmaceutical sponsors and contract research organizations (CROs) to embrace change. The past decade of drug research has seen significant leaps in machine learning tools that streamline the discovery process. Because a sponsor’s or CRO’s chance of success starts with the drug candidate they originally choose, researchers are increasingly looking to artificial intelligence (AI) technology to improve efficiency. In particular, AI-based protein structure prediction databases like the ESM Metagenomic Atlas are becoming more widely accessible, but have they brought any changes to the drug discovery process? Read on to find out what the ESM Metagenomic Atlas means for the future of drug development!

Why the Interest in Protein Structure Prediction?

Many new protein structure prediction models, including the one behind the ESM Metagenomic Atlas, have captured the interest of leading pharma companies, biotech startups, and biology researchers since the release of AlphaFold by DeepMind. For over 50 years, scientists have had only expensive, complex, and time-consuming technology to help them determine the 3D structures of target proteins. Considering how much we have yet to understand about the role of proteins in supporting life, gleaning functional information from a protein’s predicted shape with AI modeling is a far cheaper and more accessible preliminary alternative to costly traditional methods like X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy.

What is the ESM Metagenomic Atlas?

The ESM (Evolutionary Scale Modeling) Metagenomic Atlas was developed by Meta, the California-based tech company, following the release of a similar database from DeepMind. However, whereas the AlphaFold Protein Structure Database contains roughly 200 million predicted structures, Meta’s ESM Atlas contains predicted structures for over 617 million proteins found in microorganisms from environmental sources such as soil, seawater, and the human gut. Their approach, named ESMFold, uses the ESM-2 large language model (LLM), which was trained on known protein sequences written in the 20-letter amino acid alphabet. After training, the model generated its entire database of protein structure predictions over a period of only 2 weeks. As of March 2023, the original ESM Atlas has been updated to its second version.
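
To put the scale in perspective, the two figures quoted above (617 million structures, 2 weeks) imply an average prediction rate. The short Python sketch below only does that arithmetic. It assumes exactly 14 days of continuous running, and the function name is made up for illustration. The result is an aggregate across whatever hardware was used, not a per-machine speed.

```python
# Back-of-the-envelope throughput implied by the figures quoted in the article.
# Assumption: "2 weeks" means exactly 14 days of continuous prediction.
STRUCTURES = 617_000_000          # predicted structures in the atlas (from the article)
DAYS = 14                         # "2 weeks" (from the article)
SECONDS_PER_DAY = 24 * 60 * 60


def structures_per_second(total: int, days: int) -> float:
    """Average number of structures predicted per second over the whole run."""
    return total / (days * SECONDS_PER_DAY)


rate = structures_per_second(STRUCTURES, DAYS)
print(f"about {rate:.0f} structures per second on average")
```

At roughly 500 structures per second on average, the run is far beyond what experimental structure determination could deliver, which is the point of the comparison with traditional methods.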

Has ESM Changed Drug Discovery?

The Meta AI team highlights several steps forward that the ESM Metagenomic Atlas has brought to 3D structure prediction for drug discovery. Specifically, ESM takes the concept of AlphaFold and centers it on metagenomics, a newer field that applies gene sequencing to identify previously unknown proteins from bacteria, viruses, and other organisms that have never been studied. Furthermore, ESMFold offers even faster prediction, estimated to be about 60 times faster than AlphaFold on shorter sequences.

Not only does this speed enable a larger-scale database of structure predictions, but it also gives the ESM Metagenomic Atlas the potential to provide a glimpse of unknown proteins, which may later expand our understanding of evolutionary history, untreatable diseases, and environmental science. Its underlying LLM prediction model may also let researchers quickly assess how mutations in the primary sequence will affect a protein’s tertiary or quaternary structure, which is currently beyond AlphaFold2’s scope.
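
As a rough illustration of the mutation assessment described above, a first step would be to generate the mutated sequences that a structure predictor is then asked to fold. The Python sketch below enumerates every single-residue substitution of a sequence. It is a minimal sketch under stated assumptions: the helper name and the toy sequence are invented for this example, and it does not call ESMFold or any other predictor.

```python
# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_point_mutants(sequence: str):
    """Yield (label, mutant_sequence) for every single-residue substitution.

    Labels use the common wild-type/position/new-residue form, e.g. "M1A",
    with positions counted from 1. Hypothetical helper, for illustration only.
    """
    for pos, wild_type in enumerate(sequence):
        for residue in AMINO_ACIDS:
            if residue != wild_type:
                label = f"{wild_type}{pos + 1}{residue}"
                yield label, sequence[:pos] + residue + sequence[pos + 1:]


toy_sequence = "MKTAYIAK"  # made-up 8-residue sequence, not a real protein
mutants = dict(single_point_mutants(toy_sequence))
print(len(mutants), "single-point mutants")
```

Each mutant sequence could then be submitted to a prediction model and its predicted structure compared with that of the original sequence. Because the number of mutants grows as 19 times the sequence length, fast prediction matters for this kind of scan.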

In the bigger picture, however, the significance of Meta’s ESM Metagenomic Atlas closely resembles AlphaFold’s impact on drug discovery. That is, it introduces another open-access option for scientists to visualize previously unknown protein structures, but it’s too early for this technology to be considered a game-changer for pharmaceutical companies. As a representation tool, the ESM Metagenomic Atlas is certainly a welcome way for drug researchers to gain more information about the role of proteins in disease and biology, but no new drug discovered using these prediction databases has yet reached approval.

Nonetheless, revealing the physical properties of a target protein provides essential information about its role in disease pathophysiology. Therefore, AI-powered protein prediction technology offers a faster, cheaper, and simpler method for characterizing protein targets and driving the future development of novel therapies.

The Future of ESM Metagenomic Atlas in Drug Design

The ESM Metagenomic Atlas is quickly coming to be seen as an exciting new tool for exploring the “dark matter” of biology, as it covers proteins beyond those we know of from plant and animal life. Despite the scale this database has demonstrated, especially in such a short period compared with other prediction software, the ESM Metagenomic Atlas is still not as accurate as AlphaFold or traditional experimental structure determination techniques. However, it has opened the world of drug discovery to an even larger expanse of potential protein targets, which could drive innovation in medicine, science, and research in the future.

Vial, The CRO Powered by Technology

As machine learning tools continue to develop, preclinical and potentially clinical phases of drug development will evolve simultaneously. Vial is a full-service CRO that recognizes the role of technology in the future of drug development and is paving the way for modernized clinical research through digital innovation. Trusted by leading sponsors, our specialized teams deliver shorter study timelines, quality affordable services, and a clinical trial experience that puts you first. Contact a team member today to discover how we can help!
