Proteinogenic amino acid

Proteinogenic amino acids are amino acids that are precursors to proteins and are incorporated into proteins cotranslationally, that is, during translation. There are 23 proteinogenic amino acids, but only 21 are encoded by the nuclear genes of eukaryotes. Of the 23, selenocysteine and pyrrolysine are incorporated into proteins during translation by distinct mechanisms that recode stop codons, and N-formylmethionine is often the initial amino acid of proteins in bacteria, mitochondria, and chloroplasts, but is often removed post-translationally. The other 20 are directly encoded by the universal genetic code. Humans can synthesize 11 of these 20 from each other or from other molecules of intermediary metabolism. The other 9 must be consumed (usually as their protein derivatives) in the diet and are therefore called essential amino acids. The essential amino acids are histidine, isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, and valine.

The word proteinogenic means "protein building". Proteinogenic amino acids can be condensed into a polypeptide (the subunit of a protein) through a process called translation (the second stage of protein biosynthesis, part of the overall process of gene expression).

In contrast, non-proteinogenic amino acids are either not incorporated in proteins (like GABA, L-DOPA, or triiodothyronine), or are not produced directly and in isolation by standard cellular machinery (like hydroxyproline and selenomethionine). The latter often result from posttranslational modification of proteins.

The proteinogenic amino acids have been found to be related to the set of amino acids that can be recognized by ribozyme auto-aminoacylation systems. Thus, non-proteinogenic amino acids would have been excluded by the contingent evolutionary success of nucleotide-based life forms. Other reasons have been offered to explain why certain specific non-proteinogenic amino acids are not generally incorporated into proteins: for example, ornithine and homoserine cyclize against the peptide backbone and fragment the protein, giving it a relatively short half-life, while others, such as the arginine analog canavanine, are toxic because they can be mistakenly incorporated into proteins.

Non-proteinogenic amino acids are incorporated in nonribosomal peptides, which are not produced by the ribosome during translation.

Structures
The following illustrates the structures and abbreviations of the 21 amino acids that are directly encoded for protein synthesis by the genetic code of eukaryotes. The structures given below are standard chemical structures, not the typical zwitterion forms that exist in aqueous solutions.

Non-specific abbreviations
Sometimes the specific identity of an amino acid cannot be determined unambiguously. Certain protein sequencing techniques do not distinguish among certain pairs. Thus, the following codes are used:
 * Asx (B) is "asparagine or aspartic acid"
 * Glx (Z) is "glutamic acid or glutamine"
 * Xle (J) is "leucine or isoleucine"
In addition, the symbol X is used to indicate an amino acid that is completely unidentified.
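The ambiguity codes can be treated as sets of candidate residues. The following is a minimal sketch in Python that lists every concrete sequence consistent with an ambiguous one. The names `AMBIGUOUS` and `expand` are hypothetical, and treating X as "any of the 20 standard amino acids" is an assumption made for illustration.

```python
from itertools import product

# One-letter symbols of the 20 standard amino acids.
STANDARD = "ACDEFGHIKLMNPQRSTVWY"

# Ambiguity codes mapped to the residues they may stand for.
AMBIGUOUS = {
    "B": "DN",      # Asx: aspartic acid (D) or asparagine (N)
    "Z": "EQ",      # Glx: glutamic acid (E) or glutamine (Q)
    "J": "IL",      # Xle: isoleucine (I) or leucine (L)
    "X": STANDARD,  # assumption: unidentified means any standard residue
}

def expand(sequence):
    """Return all concrete sequences consistent with an ambiguous sequence."""
    # Each position contributes either its own symbol or its candidate set.
    choices = [AMBIGUOUS.get(symbol, symbol) for symbol in sequence]
    return ["".join(combo) for combo in product(*choices)]

print(expand("ABZ"))  # ['ADE', 'ADQ', 'ANE', 'ANQ']
```

The number of candidates grows multiplicatively with each ambiguous position, so a sequence containing several X symbols is better kept in its ambiguous form than expanded.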

Chemical properties
Following is a table listing the one-letter symbols, the three-letter symbols, and the chemical properties of the side-chains of the standard amino acids. The masses listed are based on weighted averages of the elemental isotopes at their natural abundances. Note that forming a peptide bond results in elimination of a molecule of water, so the mass of an amino acid unit within a protein chain is reduced by 18.01524 Da.
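The water-loss rule can be checked numerically. The following is a minimal sketch in Python; the function names are hypothetical, and the average masses used for glycine and alanine are approximate illustrative values rather than entries copied from the table.

```python
WATER_MASS = 18.01524  # Da, average mass of the water lost per peptide bond

# Approximate average masses of the free amino acids in Da (illustrative).
FREE_MASS = {"Gly": 75.067, "Ala": 89.094}

def residue_mass(name):
    """Mass of an amino acid unit within a chain: free mass minus one water."""
    return FREE_MASS[name] - WATER_MASS

def chain_mass(names):
    """Mass of a linear chain of n amino acids, which has n - 1 peptide bonds."""
    if not names:
        raise ValueError("a chain needs at least one amino acid")
    total_free = sum(FREE_MASS[name] for name in names)
    return total_free - (len(names) - 1) * WATER_MASS

print(round(residue_mass("Gly"), 3))
print(round(chain_mass(["Gly", "Ala"]), 3))
```

A chain of n amino acids loses only n - 1 waters, because the two terminal units keep the hydrogen and hydroxyl that an internal unit would give up.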

General chemical properties

Side chain properties
Note: The pKa values of amino acids are typically slightly different when the amino acid is inside a protein. Protein pKa calculations are sometimes used to calculate the change in the pKa value of an amino acid in this situation.

Gene expression and biochemistry
* UAG is normally the amber stop codon, but encodes pyrrolysine if a PYLIS element is present.

** UGA is normally the opal (or umber) stop codon, but encodes selenocysteine if a SECIS element is present.

† The stop codon is not an amino acid, but is included for completeness.

†† UAG and UGA do not always act as stop codons (see above).

‡ An essential amino acid cannot be synthesized in humans and must, therefore, be supplied in the diet. Conditionally essential amino acids are not normally required in the diet, but must be supplied exogenously to specific populations that do not synthesize them in adequate amounts.
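The context-dependent reading of UAG and UGA described in the notes above can be sketched as a simple lookup. This is a deliberately simplified illustration in Python: the function name and the boolean flags are hypothetical, and real recoding depends on the position and structure of the SECIS or PYLIS element and on dedicated translation factors, none of which is modeled here.

```python
# The three stop codons of the standard genetic code and their common names.
STOP_CODONS = {"UAA": "ochre", "UAG": "amber", "UGA": "opal"}

def read_stop_codon(codon, secis_present=False, pylis_present=False):
    """Return 'Sec', 'Pyl', or 'Stop' for a stop codon in a given context."""
    if codon not in STOP_CODONS:
        raise ValueError(f"{codon!r} is not a stop codon")
    if codon == "UGA" and secis_present:
        return "Sec"   # selenocysteine
    if codon == "UAG" and pylis_present:
        return "Pyl"   # pyrrolysine
    return "Stop"      # default reading: terminate translation

print(read_stop_codon("UGA"))                      # Stop
print(read_stop_codon("UGA", secis_present=True))  # Sec
print(read_stop_codon("UAG", pylis_present=True))  # Pyl
```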

Mass spectrometry
In mass spectrometry of peptides and proteins, it is useful to know the masses of the residues. The mass of the peptide or protein is the sum of the residue masses plus the mass of water.
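As a worked illustration of this sum, the following minimal Python sketch computes the monoisotopic mass of a short peptide. The function name is hypothetical, and only three residues are included; their monoisotopic residue masses are commonly tabulated values, rounded to five decimal places.

```python
WATER_MONO = 18.01056  # Da, monoisotopic mass of water

# Monoisotopic residue masses in Da for a few residues (rounded).
RESIDUE_MONO = {
    "G": 57.02146,  # glycine
    "A": 71.03711,  # alanine
    "S": 87.03203,  # serine
}

def peptide_mass(sequence):
    """Sum of the residue masses plus the mass of one water."""
    return sum(RESIDUE_MONO[residue] for residue in sequence) + WATER_MONO

print(round(peptide_mass("GAS"), 5))  # 233.10116
```

The single added water accounts for the hydrogen on the N-terminus and the hydroxyl on the C-terminus of the unmodified peptide.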

§ Monoisotopic mass

Stoichiometry and metabolic cost in cell
The following table lists the abundance of amino acids in E. coli cells and the metabolic cost (in ATP) of synthesizing them. Negative numbers indicate that the metabolic processes are energetically favorable and do not cost the cell net ATP. Note that the abundance of amino acids includes amino acids in both free form and polymerized form (proteins).

Life based on alternative proteinogenic sets
The set of proteinogenic amino acids used by known life on Earth appears, according to current knowledge, to have been arbitrarily selected by evolution from many hundreds of possible alpha-type amino acids. Xenobiology studies hypothetical life forms that could be constructed from alternative sets using expanded genetic codes. Miller-type experiments on artificial abiogenesis show that alpha-type amino acids predominate in water-based 'primordial soups' but beta-type amino acids dominate when there is less water. Both alpha- and beta-based sets could form the basis for alternative protein constructions and life forms.