Clinical research at biotech and pharma companies involves drug discovery and development, a process that is complex, expensive, and time-consuming, with hurdles such as low efficacy, off-target delivery, and a high attrition rate. Advances in artificial intelligence (AI) and big data, together with computer-aided drug design (CADD) that integrates AI algorithms, can address these traditional drug design and development challenges and lay the foundation for future rational drug design and discovery. From decentralization to biosimulation, contract research organizations (CROs) and medical research institutions leverage AI to support, enhance, and transform clinical research. So, how can AI help select drug targets?
AI is a field that combines computer science with large datasets to enable machine-driven problem-solving. AI encompasses machine learning (ML), and deep learning (DL) is in turn a subset of ML. One of the early efforts to apply DL to drug discovery was Merck's 2012 quantitative structure-activity relationship (QSAR) ML challenge. Its findings revealed that DL models showed significantly better predictivity than traditional ML approaches on 15 absorption, distribution, metabolism, and excretion (ADME) and toxicity datasets for drug candidates developed at Merck. With further advances in neural network approaches, DL has since been widely applied to drug discovery. One example is a DL model that predicts drug-target interactions, developed using 15,524 drug-target pairs obtained from the DrugBank database. Below, we explain how AI helps in the drug discovery process.
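To make the drug-target interaction idea more concrete, here is a minimal Python sketch of how a pair-based predictor can be framed. It is not the DL model mentioned above: it swaps in plain logistic regression, and the drug fingerprints, target descriptors, and interaction labels are hypothetical toy values. Each drug-target pair is encoded as the outer product of the two feature vectors so that a linear model can learn which combinations interact.

```python
import math

def pair_features(drug: list[int], target: list[int]) -> list[float]:
    """Encode a drug-target pair as the outer product of the two feature vectors.

    The outer product lets a linear model learn drug-target *combinations*,
    which plain concatenation cannot. A trailing 1.0 acts as the bias term.
    """
    return [float(d * t) for d in drug for t in target] + [1.0]

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic-regression weights with full-batch gradient descent."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def predict(w, x):
    """Predicted probability that the encoded pair interacts."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)))

# Hypothetical toy fingerprints (drugs), descriptors (targets), and labels.
drugs = {"d1": [1, 1, 0], "d2": [0, 1, 1]}
targets = {"t1": [1, 0], "t2": [0, 1]}
labels = {("d1", "t1"): 1, ("d1", "t2"): 0, ("d2", "t1"): 0, ("d2", "t2"): 1}

X = [pair_features(drugs[d], targets[t]) for d, t in labels]
y = list(labels.values())
w = train_logistic(X, y)
```

A DL model would replace the hand-built outer product with learned representations of drugs and targets, but the framing (score a pair, train on known interacting and non-interacting pairs) is the same.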
How AI Helps in Drug Discovery
Artificial neural networks and DL algorithms have helped modernize drug discovery. In drug discovery, AI can
- be used effectively in drug design, chemical synthesis, drug screening, polypharmacology, and drug repurposing
- recognize hit and lead compounds
- expedite drug target validation
- optimize drug structure design.
Using ML and DL algorithms, virtual screening (VS) of compounds from chemical libraries (more than 106 million compounds) becomes simpler and faster. Separately, we explored how GPT-4, a generative AI model, could potentially impact drug discovery.
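One common form of VS is ligand-based similarity search, which prioritizes compounds that resemble a known active. The sketch below assumes each compound's fingerprint is already available as a set of "on" bit indices (in practice these would come from a cheminformatics toolkit such as RDKit); the compound names, bit sets, and 0.5 threshold are hypothetical. ML and DL models typically replace this simple similarity score with a learned activity prediction.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def screen(query: set[int], library: dict[str, set[int]],
           threshold: float = 0.5) -> list[tuple[str, float]]:
    """Rank library compounds by similarity to a known active, keeping those above threshold."""
    scored = ((name, tanimoto(query, fp)) for name, fp in library.items())
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

# Hypothetical fingerprints for one known active and a tiny library.
known_active = {1, 4, 7, 9, 12}
library = {
    "cmpd_A": {1, 4, 7, 9, 13},
    "cmpd_B": {2, 3, 5},
    "cmpd_C": {1, 4, 7, 9, 12, 20},
}
hits = screen(known_active, library)
```

Here `cmpd_C` ranks first, `cmpd_A` second, and `cmpd_B`, which shares no bits with the query, is filtered out.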
How AI Helps in Drug Target Identification
Target identification is the first step in drug discovery. Targets, e.g., genes involved in disease pathophysiology, are identified through gene expression analysis, genome-wide association studies (GWAS), identification of risk genes, and data mining of published literature.
Gene expression analysis is widely used in drug target identification to understand disease mechanisms.
- Microarray and RNA-seq technologies have generated a large amount of gene expression data. Researchers can determine the target genes responsible for different conditions by analyzing gene expression signatures.
- Example: Using ML and gene expression data, researchers discovered novel biomarkers and potential drug targets for rare soft tissue sarcoma.
- Large repositories with gene expression data include the NCBI Gene Expression Omnibus (GEO), The Cancer Genome Atlas (TCGA), and ArrayExpress.
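As a rough illustration of expression-signature analysis, the sketch below ranks genes by how strongly their expression differs between disease and control samples, using log2 fold change and Welch's t statistic. The gene names and expression values are hypothetical and assumed to be log2-transformed already. Real studies typically rely on dedicated packages such as limma or DESeq2 and correct for multiple testing; ML models can then be trained on the resulting signatures.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for the difference in means of groups a and b."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))

def rank_genes(disease, control):
    """Return (gene, log2 fold change, t) tuples sorted by |t|, largest first.

    Inputs map gene -> log2-transformed expression values per sample, so the
    log2 fold change is simply the difference of the group means.
    """
    rows = []
    for gene in disease:
        lfc = mean(disease[gene]) - mean(control[gene])
        rows.append((gene, lfc, welch_t(disease[gene], control[gene])))
    return sorted(rows, key=lambda row: abs(row[2]), reverse=True)

# Hypothetical log2 expression values: 4 disease and 4 control samples per gene.
disease = {
    "GENE_A": [8.1, 8.4, 7.9, 8.3],
    "GENE_B": [5.0, 5.2, 4.9, 5.1],
    "GENE_C": [6.0, 6.5, 5.5, 6.1],
}
control = {
    "GENE_A": [6.0, 6.2, 5.9, 6.1],
    "GENE_B": [5.1, 4.9, 5.0, 5.2],
    "GENE_C": [6.2, 5.8, 6.4, 5.9],
}
ranked = rank_genes(disease, control)
```

With these values, `GENE_A` (log2 fold change of about 2.1) ranks first, while `GENE_B` shows essentially no change.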
Genes linked with disease-associated genetic loci.
- GWAS determine associations between genomic variants and particular complex disorders, revealing disease-associated genetic loci; the genes linked to these loci are potential therapeutic targets.
- Example: ML analysis combining the GWAS Catalog with gene expression, epigenomic, and methylation data helped identify disease-associated genetic loci as sources of potential targets.
- Repositories with GWAS data include GWAS Central and the NHGRI-EBI GWAS Catalog.
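The core statistical step behind a simple GWAS is an association test at each variant. The sketch below runs a Pearson chi-square test on a 2x2 table of allele counts in cases versus controls; the counts are hypothetical. Production pipelines (e.g., PLINK) usually fit regression models with covariates such as ancestry, so treat this as an illustration of the idea only.

```python
import math

def allelic_chi2(case_alt: int, case_ref: int,
                 ctrl_alt: int, ctrl_ref: int) -> tuple[float, float]:
    """Pearson chi-square test (1 df) on a 2x2 table of allele counts.

    Returns (chi-square statistic, p-value). For 1 degree of freedom the
    upper-tail probability equals erfc(sqrt(chi2 / 2)).
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [sum(r) for r in table]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2, math.erfc(math.sqrt(chi2 / 2))

# Hypothetical variant: 1,000 cases and 1,000 controls (2,000 alleles each).
chi2, p = allelic_chi2(case_alt=600, case_ref=1400, ctrl_alt=400, ctrl_ref=1600)
```

For this hypothetical variant the statistic is about 53.3 and the p-value falls far below the conventional genome-wide significance threshold of 5 x 10^-8, so the surrounding locus would be examined for candidate target genes.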
Risk genes, i.e., specific genes with mutations that can cause serious diseases, are also promising therapeutic targets.
- Risk genes can be identified by analyzing genome and exome sequencing data.
- Example: Using big data and AI, researchers developed a supervised ML-based tool that identified driver genes related to cancer.
- Public repositories with sequencing data include the Sequence Read Archive (SRA), the National Cancer Institute Genomic Data Commons (NCI GDC), and TCGA.
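A simple signal used when prioritizing risk or driver genes from sequencing data is mutation recurrence: genes mutated more often than expected for their length stand out. The sketch below computes mutations per kilobase of coding sequence per sample and flags genes above a multiple of an assumed background rate. It is a frequency heuristic, not the supervised ML tool mentioned above; the gene names, lengths, mutation calls, background rate, and 3x cutoff are all hypothetical. In a supervised approach, scores like this would be one input feature among many.

```python
from collections import Counter

def mutation_rates(mutations, cds_length_kb, n_samples):
    """Mutations per kb of coding sequence per sample, for each gene.

    `mutations` is a list of (sample_id, gene) nonsilent mutation calls.
    """
    counts = Counter(gene for _sample, gene in mutations)
    return {gene: counts[gene] / (length * n_samples)
            for gene, length in cds_length_kb.items()}

def candidate_drivers(rates, background_rate, min_fold=3.0):
    """Genes whose rate is at least `min_fold` times the background, highest first."""
    flagged = [gene for gene, rate in rates.items() if rate >= min_fold * background_rate]
    return sorted(flagged, key=lambda gene: rates[gene], reverse=True)

# Hypothetical inputs: 10 sequenced samples, three genes.
cds_length_kb = {"GENE_X": 1.2, "GENE_Y": 10.0, "GENE_Z": 2.0}
mutations = (
    [(f"s{i}", "GENE_X") for i in range(6)]
    + [(f"s{i}", "GENE_Y") for i in range(3)]
    + [("s0", "GENE_Z")]
)
rates = mutation_rates(mutations, cds_length_kb, n_samples=10)
drivers = candidate_drivers(rates, background_rate=0.05)
```

`GENE_Y` carries three mutations but is long, so its per-kilobase rate stays low; only the short, recurrently mutated `GENE_X` is flagged.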
Published literature can be analyzed for target identification. Data mining of PubMed, the major repository of published biomedical literature, can help identify targets for different disorders.
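Literature mining often starts with co-occurrence: how many abstracts mention both a candidate gene and the disease of interest. The sketch below counts such co-mentions with whole-word matching; the abstracts, gene symbols, and disease term are hypothetical. A real pipeline would retrieve abstracts from PubMed (for example, through NCBI's E-utilities) and use named-entity recognition rather than exact string matching.

```python
import re
from collections import Counter

def cooccurrence(abstracts: list[str], genes: list[str], disease: str) -> Counter:
    """Count abstracts in which each gene symbol co-occurs with the disease term."""
    counts = Counter()
    for text in abstracts:
        if disease.lower() not in text.lower():
            continue  # only abstracts that mention the disease are considered
        for gene in genes:
            # Whole-word, case-sensitive match so a symbol is not matched inside a longer token.
            if re.search(rf"\b{re.escape(gene)}\b", text):
                counts[gene] += 1
    return counts

# Hypothetical abstracts and gene symbols.
abstracts = [
    "GENE_A is overexpressed in sarcoma and promotes proliferation.",
    "Loss of GENE_B was observed in sarcoma samples; GENE_A levels were unchanged.",
    "GENE_B regulates cardiac development.",
    "A GENE_A inhibitor reduced tumor growth in sarcoma xenografts.",
]
counts = cooccurrence(abstracts, ["GENE_A", "GENE_B", "GENE_C"], "sarcoma")
```

Raw co-mention counts favor well-studied genes, so they are usually normalized or combined with other evidence before a gene is treated as a candidate target.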
How AI Helps in Drug Design and Development
AI has also been applied to different areas of drug design and development, from peptide synthesis to molecule design, VS to molecular docking, QSAR to drug repositioning, and molecular pathway identification to polypharmacology.
- In this step, researchers look for drugs and/or drug-like molecules that interact with the target(s) found during target identification and elicit the desired response.
- ML and DL algorithms screen extensive chemical databases such as PubChem, a freely accessible chemical database; the open-access databases ChEMBL and DrugBank; the Library of Integrated Network-Based Cellular Signatures (LINCS) L1000 repository, which holds gene expression signatures of human cell lines; and the Protein Data Bank (PDB), a freely accessible online repository of three-dimensional structures of proteins, DNA, and RNA.
- Example: Xu et al. (2020) combined ML and molecular docking to identify inhibitors of the SARS-CoV-2 3CL proteinase. The authors obtained the crystal structure of the 3CL proteinase from the PDB.
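One way to combine ML with docking, sketched below under stated assumptions, is to use an ML model's predicted activity as a pre-filter and then rank the remaining compounds by docking score (more negative scores, in kcal/mol, indicate stronger predicted binding). This is not necessarily the workflow Xu et al. used; the compound names, probabilities, docking scores, and cutoffs are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    ml_prob: float     # hypothetical ML-predicted probability that the compound is active
    dock_score: float  # hypothetical docking score in kcal/mol (more negative = better)

def prioritize(candidates, min_prob=0.7, top_k=2):
    """Keep compounds the ML model favors, then rank them by docking score."""
    passed = [c for c in candidates if c.ml_prob >= min_prob]
    passed.sort(key=lambda c: c.dock_score)  # ascending: most negative score first
    return [c.name for c in passed[:top_k]]

candidates = [
    Candidate("cmpd_A", 0.90, -8.1),
    Candidate("cmpd_B", 0.40, -10.2),
    Candidate("cmpd_C", 0.80, -9.0),
    Candidate("cmpd_D", 0.75, -6.5),
]
shortlist = prioritize(candidates)
```

Note that `cmpd_B` has the best docking score but is dropped by the ML filter. The order of the two steps is a design choice; some workflows dock first and then rescore the poses with ML.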
Application of AI in Cancer Drug Discovery
AI is used to identify new anticancer targets and discover novel drugs from biological networks. Because cancer is so complex, researchers struggle to comprehensively understand its pathogenesis. As a result, most targeted drugs are based on experimentally validated hypotheses about possible mechanisms of carcinogenesis and can have severe undesired side effects.
How AI Helps
AI algorithms can tackle the complexity of cancer, which arises from interactions between genes and their products within biological networks, helping researchers better understand carcinogenesis and identify new anticancer targets. You et al. (2022) found that ML-based biological network analysis can efficiently handle high-throughput, heterogeneous, and complex molecular data and mine features and relationships in biological networks. These algorithms have two significant advantages: feature learning and detection (using sophisticated neural network architectures to connect features of biological networks and characterize their relationships) and the ability to integrate large, diverse datasets effectively.
Novel anticancer target identification
ML-based biological network analysis is applied to interrogate large, complex datasets and identify reliable, potentially novel targets.
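A simple baseline for network-based target prioritization is to rank genes by their centrality in an interaction network, on the assumption that highly connected genes are more influential. The sketch below implements PageRank on an undirected network in plain Python; the genes and edges are hypothetical, and the ML-based methods described by You et al. learn far richer features than a single centrality score. Libraries such as NetworkX provide a ready-made `pagerank` function.

```python
def pagerank(edges, damping=0.85, iters=100):
    """PageRank scores for an undirected network given as (gene, gene) edges."""
    neighbors = {}
    for a, b in edges:
        # Undirected network: record each edge in both directions.
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    n = len(neighbors)
    rank = {node: 1.0 / n for node in neighbors}
    for _ in range(iters):
        # Teleport term, then each node splits its damped score among its neighbors.
        new = {node: (1.0 - damping) / n for node in neighbors}
        for node, nbrs in neighbors.items():
            share = damping * rank[node] / len(nbrs)
            for nb in nbrs:
                new[nb] += share
        rank = new
    return rank

# Hypothetical interaction network with one highly connected gene.
edges = [
    ("GENE_HUB", "GENE_1"), ("GENE_HUB", "GENE_2"),
    ("GENE_HUB", "GENE_3"), ("GENE_HUB", "GENE_4"),
    ("GENE_1", "GENE_2"),
]
scores = pagerank(edges)
ranked = sorted(scores, key=scores.get, reverse=True)
```

Because every node in an undirected edge list has at least one neighbor, no probability mass is lost and the scores sum to 1; `GENE_HUB` ranks first as the best-connected gene.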
CROs Supporting Drug Discovery
For more on AI in drug discovery, check out this article on GPT-4 by Vial, a CRO powered by technology. Vial is a next-generation CRO that delivers faster, more efficient clinical trials that enable scientists to develop effective therapies to improve patients’ quality of life, slow disease progression, or even cure the disease. Contact a team member to learn how Vial can help with your next clinical trial.