Overview of Rational Drug Design

© Copyright 2017 Herbert J. Bernstein

Drug Discovery versus Drug Design

For most of history, new medications have been introduced to medical practice by hit-or-miss experimentation: trying new uses for existing drugs, testing minerals, plants, animal products and other substances found in nature, or synthesizing new substances. One had to extract or synthesize thousands of potential leads and slowly and patiently try them in vitro (in the lab) or in vivo (in animals and human subjects) to discover useful activity while, one hoped, not injuring or killing the subjects. This is the process of drug discovery. See https://en.wikipedia.org/wiki/Drug_discovery.

Ligands and Targets

The action of most drugs involves at least two molecules: the drug itself and the "target", a molecule in a biological pathway whose normal action the drug either inhibits or promotes. In many cases the drug is a ligand that binds in an active site on the target. Usually the drug is a small molecule and the target is a macromolecule, most commonly a G-protein-coupled receptor (GPCR) or a kinase. This is a rendering of a portion of the surface of Protein Data Bank entry 5F19, "The Crystal Structure of Aspirin Acetylated Human Cyclooxygenase-2" by Lucido, M.J., Orlando, B.J., Vecchio, A.J., Malkowski, M.G. (2016) Biochemistry 55: 1226-1238.

As the three-dimensional structures of an increasing number of small molecules and macromolecules became known, it became increasingly feasible to move from screening in vitro or in vivo to at least a preliminary screening in silico (in a computer), if only to reject the least promising leads. The combination of highly automated laboratory techniques, large databases of annotated chemical and biological data, and computational chemistry has resulted in high-throughput screening and a move toward the ability to design drugs that may exist only as a computer model.

Representing Molecules in Computers

In order to work with molecules in computers, we need some representation of those molecules. Internally, computers work only with numbers. We can put letters, words and other information into computers by assigning a distinct number to each letter, word or item of information, so a good starting point is to represent molecules as strings of letters and digits, i.e. to describe each molecule in words. We may lose some detail in doing so, especially in terms of fine-grained dynamics and steric constraints, but it is a good start.

For example, consider aspirin, as shown by pubchem.ncbi.nlm.nih.gov. This simple, but important, molecule consists of 9 carbon atoms, 8 hydrogen atoms and 4 oxygen atoms, so we can describe it by chemical formula as C9H8O4, but that tells us almost nothing about where the various atoms are placed relative to one another. Perhaps we should use a formula that puts related atoms near one another in the formula, as in this NIOSH formula for aspirin, CH3COOC6H4COOH. That helps, but with a little more effort we can use the SMILES representation, CC(=O)OC1=CC=CC=C1C(=O)O, or the InChI representation, InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12), to give us enough information for an accurate representation as a 2-dimensional chemical diagram.

If we need more detail, we can record the actual 3-dimensional coordinates of the atoms, using CIF, as in COD entry 1515581, which is obtained from the Crystallography Open Database, http://www.crystallography.net/cod/1515581.html. There are many more representations of chemical information. A good starting point is to consider the Crystallographic Information Framework (CIF), the Chemical Markup Language (CML), SMILES and InChI.
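As a minimal sketch of treating a formula as a string, the following Python fragment counts the atoms in the two aspirin formulas quoted above and checks that they describe the same composition. The function name count_atoms and the regular expression are illustrative assumptions, not part of any standard chemistry library, and the parser handles only flat formulas without parentheses, charges or isotopes.

```python
import re
from collections import Counter

# An element symbol (capital letter, optional lowercase letter)
# followed by an optional count; a missing count means 1.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def count_atoms(formula: str) -> Counter:
    """Count atoms in a flat formula such as C9H8O4.

    Assumption: no parentheses, charges or isotope labels.
    """
    counts = Counter()
    for symbol, digits in TOKEN.findall(formula):
        counts[symbol] += int(digits) if digits else 1
    return counts


molecular = count_atoms("C9H8O4")          # plain chemical formula
condensed = count_atoms("CH3COOC6H4COOH")  # NIOSH-style condensed formula

print(dict(molecular))
print("same composition:", molecular == condensed)
```

Both strings give the same atom counts, which illustrates the point made above: the condensed formula adds information about grouping, but a simple count discards it again.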

For CIF, look at the documentation and software at http://www.iucr.org/resources/cif/ and the original paper on CIF at http://scripts.iucr.org/cgi-bin/paper?es0164

For CML look at http://www.xml-cml.org/ and the paper at http://pubs.rsc.org/en/content/articlehtml/2001/nj/b008780g

For SMILES look at the Wikipedia article at https://en.wikipedia.org/wiki/Simplified_molecular-input_line-entry_system and the paper at http://pubs.acs.org/doi/pdf/10.1021/ci00057a005. There is a good tutorial on the rules of SMILES at http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html

For InChI see the Wikipedia article at https://en.wikipedia.org/wiki/International_Chemical_Identifier and the paper at http://jcheminf.springeropen.com/articles/10.1186/s13321-015-0068-4 and the YouTube video at https://www.youtube.com/watch?v=rAnJ5toz26c
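To see how the SMILES and InChI strings for aspirin carry the same composition, here is a rough sketch that counts the non-hydrogen atoms in the SMILES string and compares them with the formula layer of the InChI string. It uses only the Python standard library. The function names are hypothetical, the SMILES tokenizer covers only atoms of the organic subset written outside square brackets, and the InChI handling assumes that the formula is the text between the first two slashes, as it is in this example. A real project would normally rely on a cheminformatics toolkit instead.

```python
import re
from collections import Counter

SMILES = "CC(=O)OC1=CC=CC=C1C(=O)O"
INCHI = "InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)"

# Organic-subset atoms written outside square brackets.
# Two-letter symbols come first so that "Cl" is not read as "C".
ORGANIC_ATOM = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")


def heavy_atoms_from_smiles(smiles: str) -> Counter:
    """Count non-hydrogen atoms; bracket atoms such as [Na+] are not handled."""
    if "[" in smiles:
        raise ValueError("bracket atoms are outside the scope of this sketch")
    # Lowercase (aromatic) symbols are folded into their element symbol.
    return Counter(sym.capitalize() for sym in ORGANIC_ATOM.findall(smiles))


def formula_layer(inchi: str) -> str:
    """Return the text between the first two slashes (assumed to be the formula)."""
    return inchi.split("/")[1]


def heavy_atoms_from_formula(formula: str) -> Counter:
    """Count non-hydrogen atoms in a flat formula such as C9H8O4."""
    counts = Counter()
    for symbol, digits in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if symbol != "H":
            counts[symbol] += int(digits) if digits else 1
    return counts


from_smiles = heavy_atoms_from_smiles(SMILES)
from_inchi = heavy_atoms_from_formula(formula_layer(INCHI))

print(formula_layer(INCHI))
print("heavy atoms agree:", from_smiles == from_inchi)
```

Hydrogens are left out of the comparison because SMILES usually leaves them implicit, while InChI states them in the formula and hydrogen layers.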

Differences in Representing Small Molecule Ligands and Macromolecule Targets

Much rational drug design software is based on knowledge of three-dimensional atomic models of molecules. If you have accurate observations of element types and relative positions of atoms, you can make good estimates of bonding patterns and charges. With some techniques, you can even observe charge densities in addition to atomic positions. However, the larger a molecule gets, the more difficult it becomes to estimate the positions of individual atoms. Therefore you may not have accurate observations of individual atom positions for the larger macromolecules. Instead, what you are more likely to be able to observe are aggregations of atoms into groups or residues.

Therefore the primary representations of ligands are likely to be in terms of individual atoms (see the periodic table [Mendeleev, Dmitrii. "The relation between the properties and atomic weights of the elements" Journal of the Russian Chemical Society 1 (1869): 60-77] in https://en.wikipedia.org/wiki/Periodic_table), while the primary representations of macromolecules are likely to be in terms of sequences of residues, either amino acids (see https://en.wikipedia.org/wiki/Amino_acid) or the nucleotides of nucleic acids (see https://en.wikipedia.org/wiki/Nucleic_acid). This difference makes the representation of ligands simpler than the representation of macromolecules.
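The contrast between the two kinds of representation can be sketched with two small Python data structures: a ligand recorded atom by atom and a protein recorded as a sequence of residues. The class names, the field names and the three-residue peptide are hypothetical, chosen only for illustration, and atomic coordinates are deliberately left out. The three-letter to one-letter mapping is the standard amino acid code.

```python
from dataclasses import dataclass, field

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "ala": "A", "arg": "R", "asn": "N", "asp": "D", "cys": "C",
    "glu": "E", "gln": "Q", "gly": "G", "his": "H", "ile": "I",
    "leu": "L", "lys": "K", "met": "M", "phe": "F", "pro": "P",
    "ser": "S", "thr": "T", "trp": "W", "tyr": "Y", "val": "V",
}


@dataclass
class Ligand:
    """A small molecule recorded atom by atom (coordinates omitted here)."""
    name: str
    elements: list = field(default_factory=list)  # one element symbol per atom


@dataclass
class Protein:
    """A macromolecule recorded as a sequence of residues."""
    name: str
    residues: list = field(default_factory=list)  # three-letter residue names

    def sequence(self) -> str:
        """Return the residues as a string of one-letter codes."""
        return "".join(THREE_TO_ONE[r.lower()] for r in self.residues)


# Aspirin, C9H8O4, listed one atom at a time.
aspirin = Ligand("aspirin", ["C"] * 9 + ["H"] * 8 + ["O"] * 4)

# A made-up tripeptide, used only to show the residue-level view.
peptide = Protein("hypothetical tripeptide", ["Met", "Lys", "Val"])

print(len(aspirin.elements), "atoms listed individually")
print(peptide.sequence())
```

The ligand needs one entry per atom, while the protein needs one entry per residue, each of which stands for a whole group of atoms.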

The Periodic Table

Interactive dynamic periodic table from http://www.ptable.com

The periodic table helps us to understand which elements are likely to bond to which other elements, forming compounds that will interact in predictable ways, an essential part of rational drug design.

Table of Amino Acids

This table of amino acids (from the RasMol Manual, http://www.openrasmol.org/doc/) helps us to understand how the most common residues in proteins will interact.

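(* = the residue belongs to the set, . = it does not)

Residues:       ala arg asn asp cys glu gln gly his ile leu lys met phe pro ser thr trp tyr val
Predefined Set  A   R   N   D   C   E   Q   G   H   I   L   K   M   F   P   S   T   W   Y   V
acidic          .   .   .   *   .   *   .   .   .   .   .   .   .   .   .   .   .   .   .   .
acyclic         *   *   *   *   *   *   *   *   .   *   *   *   *   .   .   *   *   .   .   *
aliphatic       *   .   .   .   .   .   .   *   .   *   *   .   .   .   .   .   .   .   .   *
aromatic        .   .   .   .   .   .   .   .   *   .   .   .   .   *   .   .   .   *   *   .
basic           .   *   .   .   .   .   .   .   *   .   .   *   .   .   .   .   .   .   .   .
buried          *   .   .   .   *   .   .   .   .   *   *   .   *   *   .   .   .   *   .   *
charged         .   *   .   *   .   *   .   .   *   .   .   *   .   .   .   .   .   .   .   .
cyclic          .   .   .   .   .   .   .   .   *   .   .   .   .   *   *   .   .   *   *   .
hydrophobic     *   .   .   .   .   .   .   *   .   *   *   .   *   *   *   .   .   *   *   *
large           .   *   .   .   .   *   *   .   *   *   *   *   *   *   .   .   .   *   *   .
medium          .   .   *   *   *   .   .   .   .   .   .   .   .   .   *   .   *   .   .   *
negative        .   .   .   *   .   *   .   .   .   .   .   .   .   .   .   .   .   .   .   .
neutral         *   .   *   .   *   .   *   *   *   *   *   .   *   *   *   *   *   *   *   *
polar           .   *   *   *   *   *   *   .   *   .   .   *   .   .   .   *   *   .   .   .
positive        .   *   .   .   .   .   .   .   *   .   .   *   .   .   .   .   .   .   .   .
small           *   .   .   .   .   .   .   *   .   .   .   .   .   .   .   *   .   .   .   .
surface         .   *   *   *   .   *   *   *   *   .   .   *   .   .   *   *   *   .   *   .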
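Because each predefined set is simply a collection of residues, the sets map naturally onto set operations. The sketch below encodes a few rows of the RasMol table as Python sets of one-letter codes and combines them. The variable names are illustrative assumptions, and the memberships are those of the RasMol manual's predefined sets as best we can read them, so they should be checked against the manual before being relied on.

```python
# All twenty standard residues, by one-letter code.
ALL_RESIDUES = set("ARNDCEQGHILKMFPSTWYV")

# A few rows of the predefined-set table, written as sets.
acidic = set("DE")
basic = set("RHK")
aromatic = set("HFWY")
cyclic = set("HFPWY")
hydrophobic = set("AGILMFPWYV")

# Some rows of the table can be derived from others.
charged = acidic | basic              # union: acidic or basic
acyclic = ALL_RESIDUES - cyclic       # complement of cyclic
polar = ALL_RESIDUES - hydrophobic    # complement of hydrophobic

# Combined queries, e.g. residues that are both aromatic and hydrophobic.
print(sorted(aromatic & hydrophobic))
print(sorted(charged))
```

Writing the sets this way also makes it easy to spot relationships in the table, for example that the charged row is the union of the acidic and basic rows.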


Rational Drug Design Processes

For some desirable characteristics of a drug, see admet.html