Chemogenomics

Chemogenomics, or chemical genomics, is the systematic screening of targeted chemical libraries of small molecules against individual drug target families (e.g., GPCRs, nuclear receptors, kinases, proteases) with the ultimate goal of identifying novel drugs and drug targets. Typically, some members of a target family have been well characterized: their function has been determined, and compounds that modulate that function (ligands in the case of receptors, inhibitors of enzymes, or blockers of ion channels) have been identified. Other members of the target family may have unknown function and no known ligands, and hence are classified as orphan receptors. By identifying screening hits that modulate the activity of the less well characterized members of the target family, the function of these novel targets can be elucidated. Furthermore, the hits for these targets can be used as a starting point for drug discovery. The completion of the human genome project has provided an abundance of potential targets for therapeutic intervention. Chemogenomics strives to study the intersection of all possible drugs and all of these potential targets.

A common method to construct a targeted chemical library is to include known ligands of at least one and preferably several members of the target family. Since a portion of ligands that were designed and synthesized to bind to one family member will also bind to additional family members, the compounds contained in a targeted chemical library should collectively bind to a high percentage of the target family.

Strategy
Chemogenomics integrates target and drug discovery by using active compounds, which function as ligands, as probes to characterize proteome functions. The interaction between a small compound and a protein induces a phenotype. Once the phenotype is characterized, the protein can be associated with a molecular event. Compared with genetics, chemogenomics techniques modify the function of a protein rather than the gene. Chemogenomics can also observe the interaction, and its reversibility, in real time. For example, the modification of a phenotype can be observed only after addition of a specific compound and can be interrupted after its withdrawal from the medium.

Currently, there are two experimental chemogenomic approaches: forward (classical) chemogenomics and reverse chemogenomics. Forward chemogenomics attempts to identify drug targets by searching for molecules that give a certain phenotype in cells or animals, while reverse chemogenomics aims to validate phenotypes by searching for molecules that interact specifically with a given protein. Both approaches require a suitable collection of compounds and an appropriate model system for screening the compounds and for identifying biological targets and biologically active compounds in parallel. The biologically active compounds discovered through forward or reverse chemogenomics are known as modulators because they bind to and modulate specific molecular targets; they could therefore be used as ‘targeted therapeutics’.

Forward Chemogenomics
In forward chemogenomics, also known as classical chemogenomics, a particular phenotype is studied and small compounds interacting with this function are identified. The molecular basis of the desired phenotype is unknown. Once the modulators have been identified, they are used as tools to look for the protein responsible for the phenotype. For example, a loss-of-function phenotype could be an arrest of tumor growth. Once compounds that lead to a target phenotype have been identified, the next step is to identify the gene and protein targets. The detailed steps of forward chemogenomics can be described as follows:


 * 1) The cells or organisms are dispensed and cultured in multiwell plates. Solutions of single ligands from stock plates are added to different wells.
 * 2) After incubation, an aliquot is transferred from the donor plate to a new recipient plate and the ligand-target binding assay is carried out. The effects of a compound can be identified by several methods: functional assays (to measure cellular activities, such as cell division); marker assays (to identify specific molecular events that act as surrogate transcriptional and post-transcriptional markers for phenotypic changes of interest, such as reporter-gene assays); and imaging-based assays (to capture further morphological changes).
 * 3) The end point is usually a spectroscopic readout. The final data are processed through in silico quality control and structure-activity relationship (SAR) analysis.
 * 4) Active compounds that achieve the desired phenotypic change are then selected, and their molecular targets are identified by methods such as phage display, affinity chromatography and microarrays.
 * 5) In profiling experiments, the proteins or RNAs identified are analyzed in reference to mock treatment to assess the global molecular drug signature.
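
The readout and quality-control steps above can be illustrated with a minimal sketch of hit calling against mock-treated wells. The source does not specify a scoring rule; the z-score cutoff, the function name call_hits and all well values below are assumptions made for illustration only.

```python
from statistics import mean, stdev


def call_hits(readouts, mock_wells, threshold=3.0):
    """Return compounds whose readout deviates from the mock-treated wells.

    The deviation is a z-score relative to the mock wells; the cutoff of
    3 standard deviations is an assumed, adjustable threshold.
    """
    mu = mean(mock_wells)
    sigma = stdev(mock_wells)
    hits = {}
    for compound, value in readouts.items():
        z = (value - mu) / sigma
        if abs(z) >= threshold:
            hits[compound] = z
    return hits


# Hypothetical spectroscopic readouts (arbitrary units).
mock_wells = [100.0, 102.0, 98.0, 101.0, 99.0]
readouts = {"cmpd_1": 99.5, "cmpd_2": 60.0, "cmpd_3": 104.0}

hits = call_hits(readouts, mock_wells)
for compound, z in hits.items():
    print(f"{compound}: z = {z:.1f}")
```

With these made-up values only cmpd_2 passes the cutoff. In practice the scoring rule and threshold would be chosen to suit the assay.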

The main challenge of forward chemogenomics strategy lies in designing phenotypic assays that lead immediately from screening to target identification.

Reverse Chemogenomics
In reverse chemogenomics, small compounds that perturb the function of an enzyme in the context of an in vitro enzymatic test will be identified. Once the modulators have been identified, the phenotype induced by the molecule is analyzed in a test on cells or on whole organisms. This method will identify or confirm the role of the enzyme in the biological response. The detailed steps of reverse chemogenomics can be described as follows:


 * 1) Reverse chemogenomics is normally carried out using a cell-free binding system. The target protein or the libraries are immobilized on assay plates or dispensed in multiwell plates, and then the study compounds or target proteins are added in solution.
 * 2) Several technologies are used to detect ligand-target binding: fluorescence-based detection, ligand-induced conformational target stabilization and mass spectrometry.
 * 3) Readout data are then analyzed through in silico quality control and structure-activity relationship (SAR) analysis to look for active compounds.

Reverse chemogenomics used to be virtually identical to the target-based approaches that have been applied in drug discovery and molecular pharmacology over the past decade. This strategy is now enhanced by parallel screening and by the ability to perform lead optimization on many targets that belong to one target family.

Recent Advances
Due to prohibitive time and cost limitations of experimental approaches, two computational approaches are frequently used in chemogenomics: ligand-based virtual screening and docking. These approaches take advantage of the myriad of molecular databases currently available and continuously growing. These approaches provide “hits” that are more likely to produce valuable results in subsequent experimental testing.

Ligand-based virtual screening compares candidate ligands to the known ligands of a target protein to predict binding. This approach follows the chemogenomics principle that similar targets bind similar ligands, and it is therefore limited when the number of known ligands is small. Docking is a molecular modeling approach that predicts the preferred orientation of a ligand relative to a target by dynamic simulation. It requires the 3-D structure of the target protein and is therefore limited when that structure is not known.
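
As a minimal sketch of ligand-based virtual screening, candidates can be ranked by their similarity to known ligands. The example below assumes each molecule is already represented as a set of substructure feature identifiers and uses the Tanimoto coefficient as the similarity measure; the fingerprints, names and the "best match" scoring rule are hypothetical choices, not taken from the source.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) coefficient between two sets of feature identifiers."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_candidates(known_ligands, candidates):
    """Score each candidate by its highest similarity to any known ligand."""
    scores = {
        name: max(tanimoto(fp, ref) for ref in known_ligands.values())
        for name, fp in candidates.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical fingerprints: each integer stands for one substructure feature.
known_ligands = {
    "ligand_A": frozenset({1, 2, 3, 4}),
    "ligand_B": frozenset({2, 3, 5, 8}),
}
candidates = {
    "cand_1": frozenset({1, 2, 3, 9}),
    "cand_2": frozenset({6, 7, 10}),
    "cand_3": frozenset({2, 3, 5, 8}),
}

ranking = rank_candidates(known_ligands, candidates)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

The sketch also shows the limitation noted above: with few known ligands there are few reference fingerprints, so dissimilar but active candidates would score poorly.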

Cao et al. (2013) expanded on the ligand-based virtual screening approach by including drug-target binding affinities in a random forest model. The inhibition constant of a drug (Ki) quantitatively describes the degree to which the drug binds to the target protein. The model they proposed has the following advantages:
 * 1) Directly encodes the drug-target pairs within a pharmaceutical space
 * 2) Uses the Ki to overcome the false assumption that unknown interactions are non-interactions
 * 3) Is not limited by 3-D structure
 * 4) Can find multi-target drugs by recognizing the groups of proteins targeted by a particular ligand.

The current limitation is that, without 3-D information, hits that contradict the principle linking structural similarity to functional similarity would not be eliminated.
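
The idea of encoding drug-target pairs and labelling them with Ki can be sketched as follows. This is not the descriptor set or pipeline used by Cao et al.; the feature vectors, names and Ki values are hypothetical, and the sketch stops at building training rows that a regression model such as a random forest could then be fitted to. The conversion pKi = -log10(Ki in mol/L) is the standard definition.

```python
import math


def pki_from_ki(ki_nanomolar: float) -> float:
    """Convert an inhibition constant in nM to pKi = -log10(Ki in mol/L)."""
    return -math.log10(ki_nanomolar * 1e-9)


def encode_pair(drug_features, target_features):
    """Represent one drug-target pair as a single concatenated feature vector."""
    return list(drug_features) + list(target_features)


# Hypothetical descriptors and Ki measurements, for illustration only.
drugs = {"drug_1": [1, 0, 1], "drug_2": [0, 1, 1]}
targets = {"target_X": [0.2, 0.7], "target_Y": [0.9, 0.1]}
measured_ki_nm = {("drug_1", "target_X"): 10.0, ("drug_2", "target_Y"): 1000.0}

# Only measured pairs become training rows; unmeasured pairs are left out
# rather than being labelled as non-interactions.
rows = [
    (encode_pair(drugs[d], targets[t]), pki_from_ki(ki))
    for (d, t), ki in measured_ki_nm.items()
]

for features, pki in rows:
    print(features, round(pki, 2))
```

Because the label is a continuous affinity rather than a binary flag, pairs without a measurement do not have to be treated as negatives.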

Furthermore, a large shift in approach is occurring away from the “one target, one drug” paradigm to a “multiple targets” approach. Knowledge from systems biology and systems chemistry is creating a need to refine drug discovery strategies to include an awareness of multiple interactions from both ligand and protein perspectives. Chemical biology studies in model organisms including yeast, chickens, and mice are revealing the complexity of the underlying networks. Drug side effects are thought to be at least partially due to unintentional ligand binding, supporting the many-to-many nature of ligand-target interactions. By incorporating more polypharmacology and chemical biology data, prediction programs are expected to become more accurate over time. As such, a goal of chemogenomics is to identify all possible ligands for all possible targets.

Brown & Okuno (2012) have proposed a drug design concept called “Chemical Genomics-Based Drug Design” (CGBDD) for incorporating system-level multi-interaction networks connecting chemistry and biology. G protein-coupled receptors (GPCRs) are a focus because they are involved in many high-level physiological functions and are an example of a protein family that can bind a large number of ligands. The GPCR-ligand database (GLIDA) helps catalog these interactions to demonstrate polypharmacology and allow identification of “promiscuous” compounds. Currently there are 39,000 interactions listed in the GLIDA database. The overall goal is to create a technique that leverages an interaction database to construct models with sufficient predictive performance for more than just GPCRs.
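
To illustrate how an interaction catalog can expose “promiscuous” compounds, the sketch below counts the distinct targets of each ligand in a list of ligand-target pairs. It does not use the GLIDA schema or data; the pairs, the function names and the threshold of three targets are assumptions made for illustration.

```python
from collections import defaultdict


def targets_per_ligand(interactions):
    """Group (ligand, target) pairs into a ligand -> set of targets mapping."""
    mapping = defaultdict(set)
    for ligand, target in interactions:
        mapping[ligand].add(target)
    return dict(mapping)


def promiscuous_ligands(interactions, min_targets=3):
    """Return ligands that bind at least `min_targets` distinct targets."""
    mapping = targets_per_ligand(interactions)
    return sorted(
        ligand for ligand, bound in mapping.items() if len(bound) >= min_targets
    )


# Hypothetical ligand-receptor pairs; duplicates are counted once.
interactions = [
    ("lig_1", "receptor_A"),
    ("lig_1", "receptor_B"),
    ("lig_1", "receptor_C"),
    ("lig_2", "receptor_A"),
    ("lig_2", "receptor_A"),
    ("lig_3", "receptor_B"),
    ("lig_3", "receptor_D"),
]

flagged = promiscuous_ligands(interactions)
print(flagged)
```

The same mapping, inverted to list ligands per target, gives the protein-side view of the many-to-many network.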

Possible benefits of the systems approach to chemogenomics are to weed out the “promiscuous” ligands and to locate “master” ligands that can provide desirable clinical effects on several targets without acting on unwanted targets and creating undesired side effects. Drug repurposing evaluates existing approved drugs for new clinical uses. Similarly, drug rescue evaluates ligands that failed to meet efficacy expectations for one target in clinical trials against new clinical uses. By eliminating poor candidates early and anticipating otherwise unpredictable side effects, these predictive tools can reduce the costs of clinical trials.

Applications
Both experimental and computational chemogenomic approaches are useful in identifying potential additional targets for both existing and “virtual” compounds.

Understanding Ligand Binding
Chemogenomics has recently been used to increase understanding of ligand binding to the histamine receptor subfamily of the GPCR protein family. Thanks to the recent availability of the proteins’ crystal structures, Kooistra et al. (2013) were able to compare existing ligand affinity data, existing receptor mutagenesis studies, and amino acid sequence analyses to the structural analyses of GPCR-ligand interactions. Both molecular and structural determinants of ligand affinity and selectivity were identified for all histamine receptors.

Determining Mode of Action
An interesting recent application of chemogenomics is identifying the mode of action (MOA) of traditional Chinese medicine (TCM) and Ayurveda. Compounds contained in traditional medicines are usually more soluble than synthetic compounds, have “privileged structures” (chemical structures that are more frequently found to bind in different living organisms), and have more comprehensively known safety and tolerance factors. This makes them especially attractive as a resource for lead structures when developing new molecular entities. Given databases containing the chemical structures of compounds used in alternative medicine along with their phenotypic effects, in silico analysis may help determine MOA. Mohd Fauzi et al. (2013) demonstrated this by predicting ligand targets that were relevant to known phenotypes for traditional medicines. In a case study for TCM, the therapeutic class of “toning and replenishing medicine” was evaluated. Therapeutic actions (or phenotypes) for that class include anti-inflammatory, antioxidant, neuroprotective, hypoglycemic, immunomodulatory, antimetastatic, and hypotensive activity. Sodium-glucose transport proteins and PTP1B (an insulin signaling regulator) were identified as targets linked to the suggested hypoglycemic phenotype. The case study for Ayurveda involved anti-cancer formulations. In this case, the target prediction program enriched for targets directly connected to cancer progression, such as steroid-5-alpha-reductase, and synergistic targets like the efflux pump P-gp. These target-phenotype links can help identify novel MOAs.

Beyond TCM and Ayurveda, chemogenomics can be applied early in drug discovery to determine a compound’s mechanism of action and take advantage of genomic biomarkers of toxicity and efficacy for application to Phase I and II clinical trials.

Identifying New Therapeutic Agents-Targets
The recent study by Bhattacharjee et al. (2013) demonstrates how chemogenomics profiling can be used to identify totally new therapeutic targets, in this case for new antibacterial agents. The study capitalized on the availability of an existing ligand library for an enzyme called murD that is used in the peptidoglycan synthesis pathway. Relying on the chemogenomics similarity principle, the researchers mapped the murD ligand library to other members of the mur ligase family (murC, murE, murF, murA, and murG) to identify new targets for the known ligands. Ligands identified would be expected to be broad-spectrum Gram-negative inhibitors in experimental assays since peptidoglycan synthesis is exclusive to bacteria. Structural and molecular docking studies revealed candidate ligands for murC and murE ligases.

Identifying Missing Genes in Biological Pathway
Thirty years after the structure of diphthamide was determined, Su et al. (2012) used chemogenomics to discover the enzyme responsible for the final step of its synthesis. Diphthamide is a posttranslationally modified histidine residue found on translation elongation factor 2 (eEF-2). The first two steps of the biosynthesis pathway leading to diphthine were known, but the enzyme responsible for the amidation of diphthine to diphthamide remained a mystery. The researchers capitalized on Saccharomyces cerevisiae cofitness data, which represent the similarity of growth fitness under various conditions between any two different deletion strains. Under the assumption that strains lacking the diphthamide synthetase gene should have high cofitness with strains lacking other diphthamide biosynthesis genes, they identified ylr143w as the strain with the highest cofitness to all other strains lacking known diphthamide biosynthesis genes. Subsequent experimental assays confirmed that YLR143W was required for diphthamide synthesis and was the missing diphthamide synthetase. Additional information from these authors on yeast cofitness data can be found here: http://chemogenomics.stanford.edu/supplements/cofitness/.
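
The cofitness reasoning can be sketched in a few lines. The example assumes that cofitness is measured as the Pearson correlation between the fitness profiles of two deletion strains across conditions, which is one plausible reading of "similarity of growth fitness"; the strain names and fitness values are invented and do not come from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long fitness profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rank_by_mean_cofitness(profiles, known_strains):
    """Rank the remaining strains by mean cofitness with the known strains."""
    candidates = [s for s in profiles if s not in known_strains]
    scores = {
        s: sum(pearson(profiles[s], profiles[k]) for k in known_strains)
        / len(known_strains)
        for s in candidates
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical fitness scores of deletion strains across five conditions.
profiles = {
    "known_1": [-2.0, 0.1, -1.5, 0.0, -1.8],
    "known_2": [-1.8, 0.0, -1.6, 0.2, -2.0],
    "cand_a": [-1.9, 0.2, -1.4, 0.1, -1.7],
    "cand_b": [0.3, -1.2, 0.1, -0.9, 0.2],
}

ranking = rank_by_mean_cofitness(profiles, ["known_1", "known_2"])
for strain, score in ranking:
    print(f"{strain}: {score:.2f}")
```

A strain that tracks the known pathway mutants across conditions, like cand_a here, becomes the leading candidate for the missing gene and would then be tested experimentally.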

Tools and Resources
ChemMapper is a bioinformatics tool created to address molecular similarity searches. It is a web-based tool for exploring the target pharmacology and chemical relationships of any given small molecule via a fast 3-D similarity method in which the similarity calculation is driven by hybrid information on molecular shape and chemotype features.

The NIH Chemical Genomics Center was founded in 2008. Its goal is to “translate the discoveries of the Human Genome Project in biological and disease insight and ultimately new therapeutics for human disease through small molecule assay development, high-throughput screening, cheminformatics and chemistry.” Pre-clinical research tools are available on their website including informatics tools and small molecule, compound, and probe databases like PubChem, a freely accessible database of small organic molecules and their activities against biological assays.