Proteins: Post translational modification

I am a physicist trying to understand some protein chemistry for a small project. Basically, amino acids combine to form proteins, and after the primary structure has formed, some chemical modification may occur, such as phosphorylation or glycosylation (post-translational modification). In the case of the enzyme carbonic anhydrase, a zinc ion is added post-translationally as a cofactor. (The amino acid sequence can be seen here; scroll down to the 'sequence' title.)

  1. Is the attachment of zinc regarded as a type of post-translational modification?

  2. When carbonic anhydrase is denatured, is the zinc ion released in the medium?

  3. Will the amino acid sequence (primary structure) be changed after denaturation? If yes, are amino acids removed after denaturation, if yes, which?

  4. I guess Zinc will be removed after denaturation. Is the remaining structure the primary protein structure?

1) Is the attachment of zinc regarded as a type of post-translational modification?

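It is not really considered a post-translational modification, because the zinc ion is not covalently attached to the protein in the way a phosphate or sugar group is. The zinc is held by coordination to amino acid side chains (three histidine residues in carbonic anhydrase), not by adsorption.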

2) When carbonic anhydrase is denatured, is the zinc ion released in the medium?

Yes, but it depends on the extent of denaturation too. Metal ions can be removed using chelating ligands like EDTA, without actually denaturing the protein.

3) Will the amino acid sequence (primary structure) be changed after denaturation? If yes, are amino acids removed after denaturation, if yes, which?

Denaturation does not involve cleavage of covalent bonds, so the amino acid sequence does not change.

4) I guess Zinc will be removed after denaturation. Is the remaining structure the primary protein structure?

Yes; just the polypeptide.

Different Modes of Post-Translational Modifications in Proteins: 6 Modes

Polypeptide chains, like RNA transcripts, are also modified after their synthesis. This additional processing is termed post-translational modification.

These types of post-translational modifications are important in achieving the functional status specific to any given protein. Because the final 3D structure of the molecule is closely related to its specific function, the folding of the protein is also important.

These complex biochemical processes are briefly described here for an understanding about overall process:

1. The N-terminal and C-terminal amino acids are usually removed or modified. The initial formylmethionine residue in bacterial polypeptides is usually removed enzymatically. In eukaryotes, the initial methionine residue is removed and the amino group of the N-terminal residue is chemically modified.

2. Individual amino acid residues are sometimes modified, e.g., phosphate may be added to the hydroxyl groups of certain amino acids such as tyrosine. The process of phosphorylation is extremely important in regulating several cellular activities and is a result of the action of enzymes called kinases. In other proteins, methyl group may be added enzymatically.

3. In some proteins, carbohydrate side chains are sometimes added. Covalently added carbohydrates form a class of molecules called glycoproteins having antigenic properties.

4. Sometimes polypeptide chains are trimmed to make active protein molecules, e.g., insulin is produced as a larger precursor and then trimmed to a 51-amino-acid molecule.

5. At one end of some proteins (typically the N-terminus), a sequence of up to 30 amino acids is found that plays an important role in directing the protein to the location in the cell where it becomes functional. This is called a signal sequence, and it determines the final destination of the protein in the cell.

The Mechanism of Protein Synthesis

Just as with mRNA synthesis, protein synthesis can be divided into three phases: initiation, elongation, and termination. The process of translation is similar in bacteria, archaea and eukaryotes.

Translation Initiation

In general, protein synthesis begins with the formation of an initiation complex. The small ribosomal subunit will bind to the mRNA at the ribosomal binding site. Soon after, the methionine-tRNA will bind to the AUG start codon (through complementary binding with its anticodon). This complex is then joined by large ribosomal subunit. This initiation complex then recruits the second tRNA and thus translation begins.

Translation begins when a tRNA anticodon recognizes a codon on the mRNA. The large ribosomal subunit joins the small subunit, and a second tRNA is recruited. As the mRNA moves relative to the ribosome, the polypeptide chain is formed. Entry of a release factor into the A site terminates translation and the components dissociate.

Bacterial vs Eukaryotic initiation

In E. coli mRNA, a sequence upstream of the first AUG codon, called the Shine-Dalgarno sequence (AGGAGG), interacts with a rRNA molecule. This interaction anchors the 30S ribosomal subunit at the correct location on the mRNA template. Stop for a moment to appreciate the repetition of a mechanism you've encountered before. In this case, getting a protein complex to associate - in proper register - with a nucleic acid polymer is accomplished by aligning two antiparallel strands of complementary nucleotides with one another. We also saw this in the function of telomerase.
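The positional relationship described above can be sketched in a few lines of Python. This is only an illustration: the function name and the spacer window (4 to 12 nucleotides between the Shine-Dalgarno sequence and the AUG) are assumptions chosen for the example, not values given in the text, and real ribosome binding sites often deviate from the AGGAGG consensus.

```python
import re

SHINE_DALGARNO = "AGGAGG"  # consensus sequence quoted in the text


def find_sd_start_pairs(mrna, min_gap=4, max_gap=12):
    """Return (sd_index, aug_index) pairs where an AUG lies a short
    spacer downstream of a Shine-Dalgarno match.

    min_gap and max_gap are illustrative assumptions, not measured values.
    """
    mrna = mrna.upper().replace("T", "U")  # accept DNA-style input too
    pairs = []
    for match in re.finditer(SHINE_DALGARNO, mrna):
        sd_end = match.end()
        for gap in range(min_gap, max_gap + 1):
            start = sd_end + gap
            if mrna[start:start + 3] == "AUG":
                pairs.append((match.start(), start))
    return pairs


# Hypothetical transcript: leader, SD sequence, 6-nt spacer, start codon.
example = "CC" + "AGGAGG" + "UAACAU" + "AUGAAA"
print(find_sd_start_pairs(example))
```

An AUG with no upstream match is simply not reported, which mirrors the idea that the anchoring interaction, not the AUG alone, positions the 30S subunit.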

Instead of binding at the Shine-Dalgarno sequence, the eukaryotic initiation complex recognizes the 7-methylguanosine cap at the 5' end of the mRNA. A cap-binding protein (CBP) assists the movement of the ribosome to the 5' cap. Once at the cap, the initiation complex tracks along the mRNA in the 5' to 3' direction, searching for the AUG start codon. Many eukaryotic mRNAs are translated from the first AUG, but this is not always the case. According to Kozak's rules, the nucleotides around the AUG indicate whether it is the correct start codon. Kozak's rules state that the following consensus sequence must appear around the AUG of vertebrate genes: 5'-gccRccAUGG-3'. The R (for purine) indicates a site that can be either A or G, but cannot be C or U. Essentially, the closer the sequence is to this consensus, the higher the efficiency of translation.
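As a rough illustration of "closeness to the consensus", the sketch below scores the ten nucleotides around a candidate AUG against 5'-gccRccAUGG-3'. The scoring scheme (fraction of matching positions, all weighted equally) and the function name are assumptions made for this example. It is not a validated predictor of translation efficiency, and real positions differ in how much they matter.

```python
KOZAK_CONSENSUS = "GCCRCCAUGG"  # from the text; R means A or G
AUG_OFFSET = 6                  # position of the A of AUG within the consensus


def kozak_score(mrna, aug_index):
    """Fraction of consensus positions matched around the AUG at aug_index."""
    mrna = mrna.upper().replace("T", "U")
    start = aug_index - AUG_OFFSET
    window = mrna[start:start + len(KOZAK_CONSENSUS)] if start >= 0 else ""
    if len(window) < len(KOZAK_CONSENSUS) or window[6:9] != "AUG":
        raise ValueError("need an AUG with 6 nt upstream and 1 nt downstream")
    matches = 0
    for base, expected in zip(window, KOZAK_CONSENSUS):
        if expected == "R":
            matches += base in "AG"   # purine position
        else:
            matches += base == expected
    return matches / len(KOZAK_CONSENSUS)


print(kozak_score("GCCACCAUGG", 6))  # perfect match to the consensus
print(kozak_score("UUUUUUAUGU", 6))  # only the AUG itself matches
```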

Translation Elongation

During translation elongation, the mRNA template provides specificity. As the ribosome moves along the mRNA, each mRNA codon comes into 'view', and specific binding with the corresponding charged tRNA anticodon is ensured. If mRNA were not present in the elongation complex, the ribosome would bind tRNAs nonspecifically. Note again the use of base pairing between two antiparallel strands of complementary nucleotides to bring and keep our molecular machine in register and in this case also to accomplish the job of "translating" between the language of nucleotides and amino acids.

The large ribosomal subunit consists of three compartments: the A site binds incoming charged tRNAs (tRNAs with their attached specific amino acids), the P site binds charged tRNAs carrying amino acids that have formed bonds with the growing polypeptide chain but have not yet dissociated from their corresponding tRNA, and the E site which releases dissociated tRNAs so they can be recharged with another free amino acid.

Elongation proceeds with charged tRNAs entering the A site and then shifting to the P site followed by the E site with each single-codon "step" of the ribosome. Ribosomal steps are induced by conformational changes that advance the ribosome by three bases in the 3' direction. The energy for each step of the ribosome is donated by an elongation factor that hydrolyzes GTP. Peptide bonds form between the amino group of the amino acid attached to the A-site tRNA and the carboxyl group of the amino acid attached to the P-site tRNA. The formation of each peptide bond is catalyzed by peptidyl transferase, an RNA-based enzyme that is integrated into the 50S ribosomal subunit. The energy for each peptide bond formation is derived from GTP hydrolysis, which is catalyzed by a separate elongation factor. The amino acid bound to the P-site tRNA is also linked to the growing polypeptide chain. As the ribosome steps across the mRNA, the former P-site tRNA enters the E site, detaches from the amino acid, and is expelled. The ribosome moves along the mRNA, one codon at a time, catalyzing each process that occurs in the three sites. With each step, a charged tRNA enters the complex, the polypeptide becomes one amino acid longer, and an uncharged tRNA departs. This process occurs rapidly in the cell: the E. coli translation apparatus takes only 0.05 seconds to add each amino acid, meaning that a 200-amino acid polypeptide could be translated in just 10 seconds.
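The timing estimate at the end of the paragraph is simple arithmetic, shown here as a short Python sketch. It assumes a constant elongation rate and ignores initiation, termination, and pausing. The function names are invented for the example.

```python
SECONDS_PER_RESIDUE = 0.05  # E. coli figure quoted in the text


def translation_time(n_residues, seconds_per_residue=SECONDS_PER_RESIDUE):
    """Seconds needed to add n_residues at a constant elongation rate."""
    if n_residues < 0:
        raise ValueError("n_residues must be non-negative")
    return n_residues * seconds_per_residue


def residues_per_second(seconds_per_residue=SECONDS_PER_RESIDUE):
    """Elongation rate implied by the time per residue."""
    return 1.0 / seconds_per_residue


print(translation_time(200))      # the 200-residue example from the text
print(residues_per_second())      # amino acids added per second
```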

Many antibiotics inhibit bacterial protein synthesis. For example, tetracycline blocks the A site on the bacterial ribosome, and chloramphenicol blocks peptidyl transfer. What specific effect would you expect each of these antibiotics to have on protein synthesis?

The Genetic Code

To summarize what we know to this point, the cellular process of transcription generates messenger RNA (mRNA), a mobile molecular copy of one or more genes with an alphabet of A, C, G, and uracil (U). Translation of the mRNA template converts nucleotide-based genetic information into a protein product. Protein sequences consist of 20 commonly occurring amino acids; therefore, it can be said that the protein alphabet consists of 20 letters. Each amino acid is defined by a three-nucleotide sequence called the triplet codon. The relationship between a nucleotide codon and its corresponding amino acid is called the genetic code. Because the mRNA "alphabet" has only four "letters", there are a total of 64 (4 × 4 × 4) possible codons; with only 20 amino acids to specify, many amino acids are encoded by more than one codon.
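The counting argument can be checked directly. The sketch below enumerates every triplet over the four-letter mRNA alphabet and compares word lengths. It is illustrative only, and the helper name is an assumption.

```python
from itertools import product

BASES = "ACGU"          # mRNA alphabet from the text
N_AMINO_ACIDS = 20      # commonly occurring amino acids


def count_words(length):
    """Number of distinct nucleotide 'words' of the given length."""
    return len(BASES) ** length


# Every possible triplet codon, built explicitly.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

for length in (1, 2, 3):
    enough = count_words(length) >= N_AMINO_ACIDS
    print(length, count_words(length), "enough" if enough else "too few")
print(len(codons))
```

Two-letter words give only 16 combinations, fewer than the 20 amino acids, so three is the shortest word length that can cover them. That leaves more codons than amino acids.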

Three of the 64 codons terminate protein synthesis and release the polypeptide from the translation machinery. These triplets are called stop codons. Another codon, AUG, also has a special function. In addition to specifying the amino acid methionine, it also serves as the start codon to initiate translation. The reading frame for translation is set by the AUG start codon near the 5' end of the mRNA. The genetic code is universal. With a few exceptions, virtually all species use the same genetic code for protein synthesis, which is powerful evidence that all life on Earth shares a common origin.

This figure shows the genetic code for translating each nucleotide triplet, or codon, in mRNA into an amino acid or a termination signal in a nascent protein. (credit: modification of work by NIH)

Redundant, not Ambiguous

The information in the genetic code is redundant: multiple codons code for the same amino acid. For example, using the chart above, you can find four different codons that code for valine; likewise, there are six codons that code for leucine. But the code is not ambiguous: if you were given a codon, you would know definitively which amino acid it codes for, because each codon codes for only one specific amino acid. For example, GUU will always code for valine, and AUG will always code for methionine. This is important, because you will be asked to translate an mRNA into a protein using a codon chart like the one shown above.
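Redundancy and non-ambiguity map neatly onto a dictionary: many keys (codons) may share one value (amino acid), but each key has exactly one value. The sketch below builds the standard genetic code from a compact 64-letter string in UCAG order, with one-letter amino acid codes and `*` for stop. The variable and function names are assumptions made for the example.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_for(amino_acid):
    """All codons that specify the given one-letter amino acid (or '*')."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


print(codons_for("V"))     # valine
print(codons_for("L"))     # leucine
print(CODON_TABLE["GUU"])  # one codon, one amino acid
```

Looking up a codon always returns a single amino acid (not ambiguous), while looking up an amino acid may return several codons (redundant).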

Translation Termination

Termination of translation occurs when a stop codon (UAA, UAG, or UGA) is encountered. When the ribosome encounters the stop codon, no tRNA enters the A site. Instead, a protein known as a release factor binds to the complex. This interaction destabilizes the translation machinery, causing the release of the polypeptide and the dissociation of the ribosome subunits from the mRNA. After many ribosomes have completed translation, the mRNA is degraded so the nucleotides can be reused in another transcription reaction.
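Putting initiation, elongation, and termination together, a minimal translation routine might look like the sketch below. It makes simplifying assumptions: translation starts at the first AUG found, the standard genetic code applies, and the input contains only A, C, G, U (or T). The function name is hypothetical.

```python
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate from the first AUG until a stop codon or the end of the mRNA."""
    mrna = mrna.upper().replace("T", "U")
    start = mrna.find("AUG")              # the start codon sets the reading frame
    if start == -1:
        return ""                         # no start codon, no protein
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":             # UAA, UAG, or UGA: release the chain
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("GGAUGGUUUUAUAAGG"))  # AUG GUU UUA UAA -> Met-Val-Leu
```

Nucleotides after the stop codon are never read, just as no tRNA enters the A site once a stop codon is reached.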

What are the benefits and drawbacks to translating a single mRNA multiple times?

Coupling between Transcription and Translation

As discussed previously, bacteria and archaea do not need to transport their RNA transcripts between a membrane-bound nucleus and the cytoplasm. The RNA polymerase is therefore transcribing RNA directly into the cytoplasm. Here ribosomes can bind to the RNA and begin the process of translation, in some instances while transcription is still occurring. The coupling of these two processes, and even mRNA degradation, is facilitated not only because transcription and translation happen in the same compartment but also because both of the processes happen in the same direction: synthesis of the RNA transcript happens in the 5' to 3' direction and translation reads the transcript in the 5' to 3' direction. This "coupling" of transcription with translation occurs in both bacteria and archaea and is, in fact, essential for proper gene expression in some instances.

Multiple polymerases can transcribe a single bacterial gene while numerous ribosomes concurrently translate the mRNA transcripts into polypeptides. In this way, a specific protein can rapidly reach a high concentration in the bacterial cell.

Protein Sorting

In context of a protein synthesis Design Challenge we can also raise the question/problem of how proteins get to where they are supposed to go. We know that some proteins are destined for the plasma membrane, others in eukaryotic cells need to be directed to various organelles, some proteins, like hormones or nutrient scavenging proteins, are intended to be secreted by cells while others may need to be directed to parts of the cytosol to serve structural roles. How does this happen?

Since various mechanisms have been uncovered, the details of this process are not easily summarized in a brief paragraph or two. However, some key common elements of all mechanisms can be mentioned. First, is the need for a specific "tag" that can provide some molecular information about where the protein of interest is destined. This tag usually takes the form of a short string of amino acids - a so called signal peptide - that can encode information about where the protein is intended to end up. The second required component of the protein sorting machinery must be a system to actually read and sort the proteins. In bacterial and archaeal systems this usually consists of proteins that can identify the signal peptide during translation, bind to it, and direct the synthesis of the nascent protein to the plasma membrane. In eukaryotic systems, the sorting is by necessity more complex, and involves a rather elaborate set of mechanisms of signal recognition, protein modification, and trafficking of vesicles between organelles or the membrane. These biochemical steps are initiated in the endoplasmic reticulum and further "refined" in the Golgi apparatus where proteins are modified and packaged into vesicles bound for various parts of the cell.

Some of the various specific mechanisms may be discussed by your instructor in class. The key for all students is to appreciate the problem and to have a general idea of the high-level requirements that cells have adopted to solve it.

Post-translational Protein Modification

After translation individual amino acids may be chemically modified. These modifications add chemical variation and new properties that are rooted in the chemistries of the functional groups that are being added. Common modifications include phosphate groups, methyl, acetate, and amide groups. Some proteins, typically targeted to membranes will be lipidated - a lipid will be added. Other proteins will be glycosylated - a sugar will be added. Another common post-translational modification is cleavage or linking of parts of the protein itself. Signal-peptides may be cleaved, parts may be excised from the middle of the protein, or new covalent linkages may be made between cysteine or other amino acid side chains. Nearly all modifications will be catalyzed by enzymes and all change the functional behavior of the protein.

Section Summary

mRNA is used to synthesize proteins by the process of translation. The genetic code is the correspondence between the three-nucleotide mRNA codon and an amino acid. The genetic code is "translated" by the tRNA molecules, which associate a specific codon with a specific amino acid. The genetic code is degenerate because 64 triplet codons in mRNA specify only 20 amino acids and three stop codons. This means that more than one codon corresponds to an amino acid. Almost every species on the planet uses the same genetic code.

The players in translation include the mRNA template, ribosomes, tRNAs, and various enzymatic factors. The small ribosomal subunit binds to the mRNA template. Translation begins at the initiating AUG on the mRNA. The formation of bonds occurs between sequential amino acids specified by the mRNA template according to the genetic code. The ribosome accepts charged tRNAs, and as it steps along the mRNA, it catalyzes bonding between the new amino acid and the end of the growing polypeptide. The entire mRNA is translated in three-nucleotide "steps" of the ribosome. When a stop codon is encountered, a release factor binds and dissociates the components and frees the new protein.

Post-Translational Modifications to Regulate Protein Function

Hening Lin, Jintang Du, and Hong Jiang, Department of Chemistry and Chemical Biology, Cornell University, Ithaca, New York

Protein post-translational modifications (PTM) are very important to regulate protein function and to control numerous important biological processes. Here a brief review of commonly found enzyme-catalyzed PTM is given. These PTM include modifications that occur on protein side chains and those that involve protein backbones. The introduction of different PTM is followed by a summary of the molecular basis for the regulation of protein function by PTM. The focus is then given to a few major PTM that play important roles in eukaryotes, such as phosphorylation, methylation, acetylation, glycosylation, ubiquitylation, and proteolysis. For each modification, a description will be given about the residues modified, the enzymatic reaction mechanisms, the major known biological functions, and its relevance to human diseases. At the end, we discuss challenges in identifying new pathways regulated by known PTM and discovering new PTM.

The central dogma of molecular biology, DNA is transcribed to mRNA which is then translated to proteins, implies the importance of proteins. After all, it is the proteins that carry out most of the biological functions of a cell. Thus controlling transcription and translation are very important, as they ultimately control what proteins are synthesized in cells and thus control the properties of cells. However, one should not overlook what happens to proteins after they are synthesized. Many chemical modifications can occur to proteins after translation. Collectively, these modifications are called post-translational modifications (PTM). PTM are very important in regulating protein function, which is reflected by the large number of genes devoted to catalyzing PTM. For example, in the human genome (with less than 30,000 genes total), more than 500 kinases catalyze protein phosphorylation (1), and more than 500 proteases catalyze the hydrolytic cleavage of proteins (2). Deregulation in PTM is the cause of various human diseases, as will be explained later in specific PTM sections. Here, a brief review is given on different types of PTM and on how PTM regulate protein function. Some basic principles will be highlighted so that readers who are unfamiliar with PTM can have a quick but comprehensive understanding of PTM. The recent book on PTM by Professor Walsh from Harvard Medical School provides a more complete description of PTM (3). Where appropriate, references on specific PTM will also be given in different sections for additional information. The abbreviations used are cataloged in Table 1 to help readers who are not familiar with the biological language.

Types of post-translational modifications

PTM can be enzyme-catalyzed and thus controlled carefully, or they can be nonenzymatic with less control. For example, protein glycation during hyperglycemia is a nonenzymatic PTM that accounts for some symptoms of diabetes (4). Protein nitrosylation on Cys residues is another nonenzymatic PTM that can affect protein function (5). Coordination by metal ions can also be considered as a PTM. For many proteins, metal binding is crucial for maintaining the correct structure or the enzymatic activity (6). Here, the focus will be given to enzyme-catalyzed PTM. Figures 1 and 2 show many commonly found enzyme-catalyzed PTM. (3)

As can be observed from Fig. 1, most PTM happen to protein side chains. Typically, the side chains involved are nucleophilic, such as Cys (palmitoylation, isoprenylation, disulfide bond formation, ADP-ribosylation), Lys (acetylation, methylation, ubiquitinylation), Arg (methylation, ADP-ribosylation), Asp/Glu [methylation, poly(ADP-ribosyl)ation], Ser/Thr (phosphorylation, O-glycosylation), and Tyr (phosphorylation). Weaker nucleophiles are also used, such as the side chain amide nitrogen in Asn (in N-glycosylation), the C-2 position of Trp (in C-glycosylation), and the C-2 position of His (in diphthamide). In amidation reactions catalyzed by transglutaminases and polyglutamylation/polyglycylation reactions that happen to Glu residues, the ε-NH2 from Lys or α-NH2 from Glu/Gly acts as the nucleophile, whereas the side chain of Gln or Glu serves as the electrophile. In addition, several amino acid side chains can be oxidized, such as Pro, Lys, Asn, Tyr, Trp, and Cys, to give oxidized amino acids.

A few PTM reactions also involve changes in the protein backbone. These reactions include the hydrolytic cleavage of the peptide backbone by proteases, the anchoring of proteins to glycosylphosphatidylinositol (GPI) or cholesterol, and the C-terminal amide formation by oxidative cleavage of glycine residues. Some PTM involve changes in both the side chain and the main chain, such as the formation of the 4-methylidene-imidazole-5-one (MIO) prosthetic group in deaminases and aminomutases, the formation of the fluorophore in GFP (green fluorescent protein), and the formation of pyruvamide in decarboxylases (Fig. 2).

Figure 1. Major enzyme-catalyzed PTM that modify protein side chains.

Figure 2. A few PTM that involve protein backbone.

Table 1. List of abbreviations

A tyrosine kinase encoded from abl (Abelson) gene, the fusion protein ABL-BCR is involved in inhibition of apoptosis in chronic myelogenous leukemia cells

Adenylate cyclase, converts ATP to cyclic AMP

Acyl carrier protein, found in fatty acid synthases and polyketide synthases, functions to carry the elongating fatty acyl chain

A disintegrin and metalloprotease, a family of proteases that hydrolyze off extracellular portions of transmembrane proteins

Apoptotic protease activation factor-1, a cytosolic protein involved in cell death or apoptosis, interacts with cytochrome c to activate caspase 9

Acyltransferase, found in fatty acid synthases and polyketide synthases, adds a malonyl group to the holo form of the ACP domain

Named from B-cell lymphoma 2, an antiapoptotic protein

A protein encoded from breakpoint cluster region gene, has serine/threonine kinase activity. Fusion with abl protein causes leukemia

3’-5’-cyclic adenosine monophosphate

Caspase recruitment domain, mediates the formation of larger protein complexes via direct interactions between individual cards, involved in the regulation of caspase activation and apoptosis

Coactivator-associated arginine(R) methyltransferase 1, methylates Arg17 and Arg26 residues on Histone H3

Ubiquitously expressed homolog of Cbl, a mammalian protein involved in cell signaling and protein ubiquitination, named after Casitas B-lineage Lymphoma

CREB binding protein, a transcriptional co-activating protein

Cellular differentiation marker 2, a cell adhesion protein found on the surface of T cells and natural killer cells

Cell-division kinases, serine/threonine kinases, activated by association with cyclins and involved in regulation of the cell cycle, transcription and mRNA processing

Chromodomain helicase DNA-binding protein 1, interacts with methylated Lys4 on Histone H3

A protein named from circadian locomotor output cycles kaput gene, regulating circadian rhythm

Chronic myelogenous leukemia, a form of leukemia characterized by the increased and unregulated growth of predominantly myeloid cells in the bone marrow and the accumulation of these cells in the blood

cAMP response element binding proteins, as transcription factors, bind to certain sequences called cAMP response elements (CRE) in DNA and thereby increase or decrease the transcription of certain genes

Cytochrome c, a small heme protein associated with the inner membrane of the mitochondria and released in response to pro-apoptotic stimuli

Death effector domain, a protein interaction domain found in inactive procaspases and proteins that regulate caspase activation in the apoptosis cascade

Dehydratase, found in fatty acid synthases and polyketide synthases, dehydrates the β-OH of the acyl thioester

Deoxyribonuclease, catalyzes the hydrolytic cleavage of phosphodiester linkages in the DNA backbone

Enoylreductase, found in fatty acid synthases and polyketide synthases, reduces the enoyl of enoyl thioester to the saturated thioester

Extracellular signal-regulated kinase, activates many transcription factors and some downstream protein kinases, involved in functions including the regulation of meiosis, mitosis, and postmitotic functions in differentiated cells

One of the serine proteases of the coagulation system

Flavin adenine dinucleotide

Fas-associated protein with death domain, connects the Fas-receptor and other death receptors to caspase-8 through its death domain to form the death inducing signaling complex during apoptosis

Forkhead-associated domain, a phosphospecific protein-protein interaction motif involved in checkpoint control of the cell cycle

A yeast transcriptional adaptor that has histone acetyltransferase activity

Green fluorescent protein

G protein-coupled receptor, a transmembrane receptor that senses molecules outside the cell and activates inside signal transduction pathways and cellular responses

Glycosylphosphatidylinositol, a glycolipid that can be attached to the C-terminus of a protein during post-translational modification

Glycogen phosphorylase kinase, a serine/threonine-specific protein kinase which activates glycogen phosphorylase by phosphorylation

Growth factor receptor-bound protein 2, an adaptor protein involved in signal transduction/cell communication

Histone deacetylases, remove acetyl groups from ε-N-acetyl lysine residues on histones

Human DOT1-like protein, methylates histone H3 at Lys79. (DOT1: Yeast disruptor of telomeric silencing-1)

Homologous to E6-AP C terminus, mediates E2 binding and ubiquitination

Hypoxia inducible factor, a transcription factor that responds to changes in available oxygen in the cellular environment, specifically to decreases in oxygen or hypoxia

Heterogeneous nuclear ribonucleoproteins, which form complexes with pre-mRNA and mRNA and shuttle between the nucleus and the cytoplasm

Heterochromatin protein 1, binds to heterochromatin and interacts with numerous partner proteins to organize the higher-order structure of heterochromatin

Immunoglobulin G, one antibody isotype

Inhibitor of NF-κB kinase, which phosphorylates the inhibitor of NF-κB for proteasomal degradation to release NF-κB dimers to translocate to the nucleus and activate transcription of target genes

Inositol pyrophosphate, a proposed physiologic phosphate donor

JmjC domain-containing histone demethylase

Jumonji domain-containing, a novel demethylase signature motif

Ketoreductase, found in fatty acid synthases and polyketide synthases, reduces the β-ketoacyl thioester

Ketosynthase, found in fatty acid synthases and polyketide synthases, carries out the C-C bond-forming chain elongation step

Lysine-specific demethylase 1, demethylates histone H3 at lysine 9

Mitogen-activated protein kinase, serine/threonine-specific protein kinases that respond to extracellular stimuli (mitogens) and regulate various cellular activities, such as gene expression, mitosis, differentiation, and cell survival/apoptosis

MAPK/ERK kinase, activates a MAP kinase or ERK through phosphorylation

Monocytic leukemia zinc finger protein, a histone acetyltransferase implicated in leukemogenic and other tumorigenic processes, regulates expression of genes required for proliferation and repopulation of potential of stem cells in the hematopoietic compartment

Nicotinamide adenine dinucleotide

Nerve growth factor β1, a secreted protein which induces the differentiation and survival of particular target neurons, belonging to the neurotrophin protein family

Protein Arg deiminases, hydrolyzes the guanidine side chain of Arg residues to citrulline residues in proteins

Poly(ADP-ribose) polymerase-1, catalyzes the transfer of poly ADP-ribose to substrate proteins by using NAD as substrate, involved in cellular response to DNA damage and DNA metabolism

Protein kinase A, a family of kinases whose activity depends on the level of cyclic AMP, involved in the regulation of glycogen, sugar, and lipid metabolism

Protein Arg(R) methyltransferase, catalyzes the transfer of methyl group from S-adenosylmethionine to the guanidino nitrogen atoms of arginine residues

RIP-associated ICH-1/CED-3 homologous protein with a death domain, functions as an adaptor in recruiting the death protease ICH-1 to the TNFR-1 signaling complex (ICH: Ice and ced-3 homolog; TNFR: tumor necrosis factor receptor)

Really interesting new gene. RING proteins are components of ubiquitin E3 enzyme complexes.

Vorinostat, suberoylanilide hydroxamic acid, brand name Zolinza, a class of agents known as histone deacetylase inhibitors, as a drug for the treatment of cutaneous T cell lymphoma (a type of skin cancer)

Skp1-Cullin-F Box, a multi-protein complex catalyzing the ubiquitylation of proteins destined for proteasomal degradation

Suppressor of variegation-Enhancer of zeste-Trithorax. SET domains have methyltransferase activity.

A novel human SET domain-containing protein, which specifically methylates H4 at Lys20

A novel human SET domain-containing protein, which specifically methylates H3 at Lys4

Src homology 2, a phosphotyrosine-recognition protein domain of about 100 amino acid residues first identified as a conserved sequence region among the oncoproteins Src and Fps

Protein homologs of both the Drosophila protein, mothers against decapentaplegic (MAD), and the C. elegans protein SMA, as signal-activated transcription factors regulated by the TGF-β superfamily

Proteins containing SET and MYND domain. MYND encoded mynd (myosin) gene, which have histone methyltransferase activity

Small nuclear ribonucleoproteins, combining with pre-mRNA and various proteins to form spliceosomes that remove introns from pre-mRNA

Son of sevenless, a guanine nucleotide exchange factor that activates Ras

Signal transducers and activators of transcription, proteins which are involved in the development and function of the immune system

Small ubiquitin-like modifier, a family of small proteins that are covalently attached to and detached from other proteins in cells to modify their functions

TATA box-binding protein-associated factor 10, a component of the general transcription factor complex TFIID and the TATA box-binding protein (TBP)-free TAF-containing complex

Transcription intermediary factor 2, a transcriptional coregulatory protein which contains several nuclear receptor interacting domains and an intrinsic histone acetyltransferase activity

Transcription factor, a protein that binds to specific region of DNA by DNA binding domains and mediates the transcription from DNA to RNA

Transforming growth factor β1, a secreted protein that performs many cellular functions, including the control of cell growth, cell proliferation, cell differentiation and apoptosis, belonging to the transforming growth factor beta superfamily of cytokines

Trans Golgi network, a part of the Golgi apparatus in cells

Tumor necrosis factor receptor-associated protein with death domain, an adapter protein that recruits other proteins to the cytoplasmic TNF (tumor necrosis factor) receptor complex, involved in apoptosis

Ubiquitin activating protein

Ubiquitin binding associated domain, one class of ubiquitin binding domains

Ubiquitin binding domain, which binds mono- or polyubiquitin

Ubiquitin-specific protease, hydrolyzes both linear and branched Ub modifications

As with all other chemical species, protein structure determines protein function. PTM can regulate protein function because they can change protein structure. The structure change introduced by PTM can be local and small. For example, methylation of Lys residues makes the side chain more hydrophobic without changing protein backbone conformation significantly [at least based on crystal structures in which methylated and unmethylated histone peptides are bound by another protein (7)], whereas phosphorylation can change the backbone conformation within a limited region of a protein by charge-pairing with nearby Arg residues or by interacting with main chain NH and helical dipole (8). In contrast, some PTM can alter protein overall structure more dramatically, such as the proteolytic cleavage of proteins into smaller fragments, or the addition of protein tags like ubiquitin. These structure changes, small or big, are the basis for the biological functions of different PTM and typically lead to one or more of the consequences described below.

Changing protein structure to turn on/off catalytic activity of enzymes

The best-known PTM that is widely used to regulate enzymatic activity is phosphorylation. Phosphorylation regulates the activity of many enzymes by different mechanisms. For example, glycogen phosphorylase is activated allosterically by phosphorylation at Ser14, whereas Escherichia coli isocitrate dehydrogenase is inhibited by phosphorylation because the phosphate group blocks substrate access to the active site (9). The most interesting and important catalytic activity regulated by phosphorylation is protein kinase activity. Most protein kinases are activated by phosphorylation of Thr/Tyr residue(s) in the activation segment. The structural changes induced by phosphorylation, which are illustrated in Fig. 3 with ERK (extracellular signal-regulated kinase), convert the inactive kinases to active kinases (8). The regulation of protein kinase activity by phosphorylation bears enormous biological significance because protein phosphorylation is important in signal transduction, and the control of downstream kinase activity via phosphorylation by upstream kinases is one major method to propagate signals to downstream partners, as will be elaborated later.

Proteolysis is another way to control enzymatic activity, although unlike phosphorylation, the change in activity is irreversible. Many proteases are synthesized as inactive precursors (zymogens) that have to be cleaved by proteolysis to become active. These precursors include proteases that are secreted into digestive tracts or lysosomes, the catalytically active β subunits in the eukaryotic 20S proteasome that are activated by self-cleavage (10), and the effector caspases involved in apoptosis that are activated by initiator caspase-mediated cleavage (11).

Figure 3. Structure of ERK2 in both unphosphorylated (inactive) and phosphorylated (active) states. (a) ERK2 in the unphosphorylated state (figure made using PDB 1ERK); residues Thr183 and Tyr185 in the activation segment are labeled. (b) ERK2 in the phosphorylated state (ERK2-P2, figure made using PDB 2ERK); the two phosphorylated residues, pThr183 and pTyr185, are labeled. (c) Superposition of ERK2 and ERK2-P2.

Changing protein structure to create or to mask recognition motifs

Many PTM exert their biological functions by creating recognition motifs to recruit binding partners (12) or by masking recognition motifs to disrupt existing interactions. Phosphorylated Ser/Thr residues can be recognized by proteins that contain 14-3-3 domains, FHA (forkhead-associated) domains, SMAD [protein homologs of both the Drosophila protein, mothers against decapentaplegic (MAD), and the Caenorhabditis elegans protein SMA] domains, and several other domains (13). Phosphorylated Tyr residues can recruit proteins that contain SH2 (Src homology 2) domains and PTB (phosphotyrosine binding) domains (14). Acetyl Lys residues can be recognized by proteins with bromodomains (15, 16), and methylated Lys residues can be recognized by proteins with chromodomains and Tudor domains (17). The ubiquitin and ubiquitin-like protein tags can also be recognized by various protein domains that mediate the biological function of modification with these protein tags (18, 19). The structures of a few domains dedicated to recognition of post-translationally modified residues are shown in Fig. 4. Typically, domains that recognize post-translationally modified residues are specific in that they recognize not only the modified residue, but also the local structure in which the residue resides. The specific recognition of PTM in different contexts is the key to understanding many biological consequences of PTM, as will be explained in more detail in particular PTM sections later.
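The pairing of modified residues with the domains that read them can be summarized as a small lookup table. The sketch below is only an illustration of the paragraph above: the dictionary keys and the function names `readers_for` and `marks_read_by` are hypothetical labels chosen here, and the table lists only the domains named in the text, not a complete catalog.

```python
# Reader domains named in the text, keyed by the modified residue they recognize.
# The key strings are informal labels chosen for this sketch.
READER_DOMAINS = {
    "pSer/pThr": ["14-3-3", "FHA", "SMAD"],
    "pTyr": ["SH2", "PTB"],
    "AcLys": ["bromodomain"],
    "MeLys": ["chromodomain", "Tudor"],
}


def readers_for(mark):
    """Return the reader domains listed for a mark (empty list if none is listed)."""
    return READER_DOMAINS.get(mark, [])


def marks_read_by(domain):
    """Return every mark whose listed readers include the given domain."""
    return [mark for mark, domains in READER_DOMAINS.items() if domain in domains]


if __name__ == "__main__":
    print(readers_for("pTyr"))
    print(marks_read_by("Tudor"))
```

A table like this captures only which residue is read. As the text notes, real reader domains also depend on the local sequence context around the modified residue.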

In addition to creating recognition motifs to recruit proteins, a few PTM can also increase interaction with other species, such as the lipid bilayer of different cellular membranes. These modifications include the formation of GPI-anchored proteins (20), protein myristoylation on the α-amino group of the N-terminal Gly (21), protein C-terminal prenylation on Cys residues (22), and protein palmitoylation on Cys residues that are close to the membrane surface (23). These lipid modifications occur on many signaling proteins, which include G protein-coupled receptors and small G proteins, and they play important roles in signal transduction and membrane trafficking (24).

Figure 4. Structures of a few dedicated domains that recognize post-translationally modified residues. (a) SH2 domain of v-Src in complex with pTyr peptide (pTyr-Val-Pro-Met-Leu). Residues Arg12, Arg32, Ser34, Thr36, and Lys60 from the SH2 domain interact with pTyr (figure made using PDB 1SHA). (b) Bromodomain of yeast histone acetyltransferase Gcn5 in complex with AcLys peptide (histone H4 residues 15-29, AK(Ac)RHRKILRNSIQGI). Bromodomain residues Pro351, Gln354, Tyr364, Met372, Val399, and Asn407 interact with AcLys (some of the interactions are mediated by water molecules; figure made using PDB 1E6I). (c) Chromodomain of HP1 in complex with histone H3 Me3Lys9 (figure made using PDB 1KNE). Chromodomain residues Tyr24, Trp45, Tyr48, and Glu52 bind Me3Lys. (d) UBA domain of Cbl-b in complex with ubiquitin (figure made using PDB 2OOB). UBA domain residues Asp933, Ala937, Met940, Phe946, and Lys950 interact with ubiquitin residues Leu8, Ile44, Ala46, Gly47, Gln49, His68, and Val70. UBA: ubiquitin binding associated.

Adding functional groups to allow catalysis

Typically, proteins are formed with the most common 20 amino acids, which only offer a limited number of choices of functional groups for catalyzing different reactions. The limit in the number of functional groups is complemented by the use of various coenzymes or cofactors, many of which are attached covalently to the corresponding enzymes. One class of PTM with this function is the addition of “swinging arm” prosthetic groups (biotin, phosphopantetheine, and lipoic acid) to proteins (25). Biotin is used as a carrier of CO2 in carboxylation reactions, and the disulfide bond in the lipoyl group is used as an electron carrier and acyl carrier in 2-keto acid dehydrogenases. The phosphopantetheine group provides a thiolate as the carrier of acyl chains and is used in fatty acid synthases, polyketide synthases, and nonribosomal peptide synthetases (26). Although a thiolate side chain can also be provided by Cys, the longer phosphopantetheine can shuttle the acyl chains to different catalytic domains, which allows multiple reactions to occur in sequence on the acyl chains (Fig. 5). This “swinging arm” catalysis, which is also enabled by biotinylation and lipoylation, cannot be achieved by natural proteinogenic amino acids with shorter side chains.

Another type of PTM provides new functional groups for enzyme catalysis by oxidation of side chains. These include TOPA (2,4,5-trihydroxyphenylalanine) quinone in amine oxidases (Fig. 6), tryptophan tryptophylquinone in methylamine dehydrogenase (27), and formylglycine in sulfatases (28). Main chain modifications can also generate prosthetic groups for enzyme catalysis, such as the MIO group in His/Phe ammonia-lyases (29, 30) and Tyr aminomutases (31), and the pyruvoyl group in decarboxylases (Fig. 6) (32). The formation of these cofactors by PTM extends the catalytic power of enzymes greatly, which enables them to catalyze chemistry that is difficult with just the side chains of the 20 amino acids commonly found in proteins.

Figure 5. Fatty acid biosynthesis catalyzed by fatty acid synthases. The growing acyl chain is tethered to the phosphopantetheinylated ACP domain, which enables it to undergo cycles of condensation, ketone reduction, dehydration, and enoyl reduction catalyzed by different domains. AT, acyltransferase; ACP, acyl-carrier protein; KS, ketosynthase; KR, ketoreductase; DH, dehydratase; ER, enoylreductase.

Figure 6. Post-translationally generated cofactors provide functional groups to allow catalysis. The mechanisms of TOPA quinone in amine oxidases, MIO in deaminases, and pyruvamide in decarboxylases are shown.

Locking proteins into the correct structures or increasing protein stability

The major type of PTM that has this function is protein disulfide bond formation (33). Disulfide bonds are more stable thermodynamically than the reduced thiols in an oxidizing environment. In eukaryotes, proteins that enter the secretory pathway start to form disulfide bonds once they are translocated into the endoplasmic reticulum (ER) lumen, which is an oxidizing environment. These disulfide bonds help to stabilize the desired protein structure by locking the protein in a certain conformation, and perhaps assist protein folding too. Many secreted proteins later undergo proteolysis in the Golgi to give smaller fragments (see the proteolysis section below). In this case, disulfide bonds also serve to link the fragments covalently to maintain a certain structure. One textbook example is insulin, which is produced as a single peptide chain that later undergoes several proteolysis steps; the mature insulin consists of two chains connected via two disulfide bonds (Fig. 7) (34). The light and heavy chains of antibodies are likewise connected by disulfide bonds. Another PTM that can increase protein stability is glycosylation. For example, erythropoietin N-glycosylation has been found to increase its in vivo lifetime (35), probably because the carbohydrate modifications block the action of tissue proteases.

Figure 7. Maturation of insulin. Insulin is synthesized as preproinsulin that contains an N-terminal signal sequence. After translocating into the ER, the signal sequence is cleaved off by the signal peptidase and the resulting proinsulin folds into a stable conformation. Three disulfide bonds are formed between cysteine side chains. The connecting sequence (Chain C) is cleaved off in the Golgi by proprotein convertases to form the mature and active insulin molecule, which is then secreted.

Exploration of major PTM

In this section, a few major PTM will be explored in more detail. For each PTM discussed, a brief introduction to the PTM reaction and the enzymes catalyzing the reaction will be given. A few biological processes that involve the PTM will be explained to demonstrate the important function of the PTM in biology.

Protein phosphorylation typically occurs on Ser, Thr, and Tyr residues (Fig. 1), although His and Asp residues can also be phosphorylated, as in bacterial two-component signal transduction systems. The universal phosphate donor is adenosine triphosphate (ATP, Fig. 8), and the reaction is catalyzed by more than 500 kinases in humans. Many kinases are Ser/Thr specific, some are Tyr specific, and some have dual specificity. It was reported that inositol pyrophosphate (IP7) can also serve as a phosphate donor in protein phosphorylation (36). However, the reaction is not enzyme catalyzed and its physiologic relevance is not proven yet.

The large number of protein kinases in the human genome reflects that this PTM is widely occurring and regulates numerous biological processes. The best understood function is signal transduction, because phosphorylation of proteins can turn ON/OFF catalytic activity or create recognition motifs to recruit other protein partners, thus allowing the signal to propagate. In accord with its role in signal transduction, protein phosphorylation is reversible so that the signaling process can be terminated as needed. The removal of the phosphate group is catalyzed by phosphatases (Fig. 8).

Figure 8. Kinase-catalyzed phosphorylation and phosphatase-catalyzed dephosphorylation reactions. (a) Catalytic mechanism of protein kinases. (b) Catalytic mechanism of bimetallic pSer/pThr or dual specificity protein phosphatases. (c) Catalytic mechanism of pTyr phosphatases.

Two signaling processes will be discussed here to illustrate how protein phosphorylation can play a critical role in cell signaling. A more detailed description of these two signaling processes can be found in the Molecular Cell Biology textbook by Lodish et al. (34). The first one, which is shown in Fig. 9, involves protein kinase A (PKA), which can be activated by cyclic AMP (cAMP) (37). PKA at resting state exists as an inactive tetramer that consists of two copies of a regulatory subunit and two copies of the catalytic subunit. Hormones that signal through G protein-coupled receptors can activate the trimeric G protein, which in turn can activate an effector enzyme, adenylate cyclase (38). Adenylate cyclase catalyzes the formation of cAMP from ATP (39), which results in an increase in cAMP concentration. Binding of cAMP to the regulatory subunits of PKA dissociates the inactive tetramer, which releases the catalytic subunit of PKA. The catalytic subunit can then be activated by phosphorylation at the activation loop. Activated PKA can phosphorylate many different substrates and produce both short-term and long-term effects. Short-term effects come from the change of the catalytic activities of substrate proteins on phosphorylation by PKA. The substrates of PKA include proteins involved in glycogen synthesis and degradation, such as glycogen phosphorylase kinase and glycogen synthase (40). Phosphorylation of these proteins by PKA leads to activation of glycogen degradation and inhibition of glycogen synthesis. Long-term effects come from changes in gene transcription. PKA can affect transcription by phosphorylating CREB (cAMP response element binding protein) and other transcription factors (41). On phosphorylation, CREB can bind to specific regions of the chromosomal DNA, and it can recruit the basal transcription machinery via CBP (CREB binding protein)/p300 to activate the transcription of certain genes.

Figure 9. The signaling process that involves G protein-coupled receptors (GPCR) and PKA. (1) Binding of hormone produces a conformational change in the GPCR; (2) GPCR binds to Gs protein; (3) GDP bound to Gs is replaced by GTP and the β and γ subunits of Gs dissociate from the α subunit; (4) Gsα subunit binds to adenylate cyclase (AC), which activates the synthesis of cAMP (4a); the hormone tends to dissociate, and hydrolysis of GTP to GDP causes Gsα to dissociate from adenylate cyclase and bind to Gβγ, which regenerates a conformation of Gs that can be activated by a GPCR-hormone complex (4b); (5) dissociation of regulatory subunits (R) from PKA as cAMP concentration increases; (6) subsequent activation of the catalytic subunits (C) by phosphorylation in the activation loop generates the fully active kinase; (7) activated PKA can phosphorylate glycogen phosphorylase kinase (GPK) and other enzymes, which leads to activation of glycogen degradation and inhibition of glycogen synthesis; and (8) PKA can affect transcription by phosphorylating the transcription factor CREB.

The second example of a cell-signaling process that involves protein phosphorylation is receptor tyrosine kinase signaling (Fig. 10) (42). Receptor tyrosine kinases are transmembrane proteins with an extracellular ligand-binding domain and an intracellular tyrosine kinase domain. Ligand binding to the extracellular domain triggers receptor dimerization and/or activation, so that the intracellular catalytic domains from two receptor protein molecules can phosphorylate each other at the activation segment. This transphosphorylation activates the catalytic domain so that it can phosphorylate other Tyr residues in the receptor and other substrate proteins. These phosphorylated Tyr residues then recruit protein-binding partners that contain SH2 or PTB domains that recognize specific phosphorylated Tyr residues. One of the proteins recruited is Grb2 (growth factor receptor-bound protein 2), which contains an SH2 domain. Grb2 in turn recruits Sos (son of sevenless), which is a guanine nucleotide exchange factor for the G protein Ras. Sos catalyzes the exchange of Ras-bound GDP (guanosine-5’-diphosphate) for GTP (guanosine-5’-triphosphate), which converts Ras to the activated form. Activated Ras can bind to and activate Raf, which is the most upstream kinase in the MAP kinase (mitogen-activated protein kinase) cascade (43). By phosphorylation of MEK (MAPK/ERK kinase, a dual specificity MAP kinase kinase) on the activation segment, Raf activates MEK, which in turn phosphorylates and activates ERK. Activated ERK can phosphorylate many transcription factors, which leads to changes in gene transcription and ultimately cell division/differentiation.
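The order of events in this pathway can be written down as a simple linear chain. The sketch below is a deliberately coarse illustration, not a kinetic model: it treats recruitment and activation alike as "upstream switches on downstream", and the names `CASCADE` and `propagate` are hypothetical. It shows why blocking one component (for example, with an inhibitor) stops everything downstream of it.

```python
# Ordered (upstream, downstream) steps taken from the paragraph above.
# Recruitment and activation are both simplified to "switches on".
CASCADE = [
    ("ligand", "receptor tyrosine kinase"),
    ("receptor tyrosine kinase", "Grb2"),
    ("Grb2", "Sos"),
    ("Sos", "Ras"),
    ("Ras", "Raf"),
    ("Raf", "MEK"),
    ("MEK", "ERK"),
]


def propagate(start, blocked=frozenset()):
    """Return the components reached from `start`, stopping at any blocked one."""
    if start in blocked:
        return []
    reached = [start]
    for upstream, downstream in CASCADE:
        # A step fires only if its upstream component is the last one reached.
        if upstream == reached[-1] and downstream not in blocked:
            reached.append(downstream)
    return reached


if __name__ == "__main__":
    print(propagate("ligand"))                    # full pathway
    print(propagate("ligand", blocked={"MEK"}))   # signal stops at Raf
```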

The two examples mentioned above illustrate the basic principles of how protein phosphorylation serves specific biological purposes. Although different kinases might be involved in diverse pathways, the molecular mechanism for the regulation of protein function by phosphorylation is similar: by changing protein structure, phosphorylation can turn ON/OFF the catalytic activity of a protein, or create/mask recognition motifs for binding by other molecules.

The 500 or so protein kinases in the human genome regulate numerous biological processes. Consequently, deregulation of protein phosphorylation can lead to various diseases, among which cancer is the most prominent one. Accordingly, kinase inhibitors are being sought for treating various cancers. One of the best understood examples is chronic myeloid leukemia (CML), which is caused by a chromosomal abnormality that fuses the kinase ABL (encoded by the Abelson gene) with another protein, BCR (encoded by the breakpoint cluster region gene) (44). The BCR-ABL fusion protein was shown to be sufficient to cause chronic myeloid leukemia in mice. Imatinib mesylate (Gleevec; Novartis Pharmaceuticals, East Hanover, NJ) is a clinically used BCR-ABL inhibitor to treat CML. The receptor tyrosine kinase and MAP kinase-signaling pathways mentioned above are key pathways that regulate cell proliferation and differentiation; frequently, tumor cells have mutations in proteins involved in these pathways (45). These pathways have thus been studied intensively in the search for cancer drugs. Other kinases, such as cyclin-dependent kinases (CDKs), have also been targeted for therapeutics (46). In addition, because phosphatases reverse the effects of kinases, mutations in phosphatases have been implicated in human diseases such as cancer, diabetes, and neurologic disorders (47).

Figure 10. Receptor tyrosine kinase signaling process and the activation of MAP kinase. (1) Binding of hormone to the receptor causes activation of the kinase activity of the receptor, which leads to phosphorylation of Tyr residues; (2) pTyr residues recruit GRB2, which in turn recruits Sos; (3) Sos promotes exchange of GTP for GDP in Ras, which leads to the active Ras-GTP complex. Then, Sos dissociates from the active Ras; (4) active Ras binds to and activates the kinase Raf (4a) and hormone can dissociate from the receptor (4b); (5) activated Raf phosphorylates and activates MEK; (6) activated MEK phosphorylates and activates MAP kinase; (7) activated MAP kinase can phosphorylate transcription factors (TF); and (8) phosphorylated transcription factors then bind to DNA and lead to changes in gene transcription and ultimately cell division/differentiation.

Acetylation of Lys residues is a very well known PTM because of histone acetylation, which is involved in transcriptional regulation of genes. The acetyl group comes from acetyl-CoA, and typically, the acetyl acceptor is a Lys residue (Fig. 11). Histone acetylation correlates with transcription activation, and accordingly, histone acetyltransferases (HATs) are normally multidomain proteins associated with transcription activator/coactivator complexes (48). The correlation of histone acetylation with transcription activation can be explained by the relaxation of the chromatin structure on histone acetylation and the recruitment of other proteins via acetyl Lys. In eukaryotic cells, chromosomal DNA wraps around core histone octamers consisting of two copies each of histone H2A, H2B, H3, and H4 (49). The complex formed between the histone octamer and the DNA associated with it is called a nucleosome. Nucleosomes can pack into a more condensed structure. Evidence suggests that the tight packing suppresses transcription, whereas transcription activation correlates with relaxed chromatin structure. The N-terminal tails of the histones have many Lys and Arg residues, among other residues, that can be modified post-translationally. No detailed structural information is available to explain how histone tail modification affects nucleosome packing. However, intuitively, masking the positive charges on histones by Lys acetylation can decrease the interaction with negatively charged DNA, which loosens the chromatin structure (50). In addition, acetylated Lys residues can be recognized by proteins that contain bromodomains (Fig. 4) (16, 51), which serve to recruit other proteins (including chromatin remodeling complexes) that help to activate the transcription of the gene.
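The charge-masking argument can be made concrete with a little arithmetic. The sketch below counts formal side-chain charges on the histone H4 peptide quoted in the Figure 4 caption (residues 15-29, AKRHRKILRNSIQGI). It rests on simplifying assumptions that are not from the text: Lys and Arg count as +1, Asp and Glu as -1, His and the peptide termini are ignored, and an acetylated Lys counts as 0. The function name `side_chain_charge` is hypothetical.

```python
def side_chain_charge(seq, acetylated=()):
    """Formal side-chain charge of a peptide (one-letter code).

    Assumptions of this sketch: Lys/Arg = +1, Asp/Glu = -1, His and termini
    ignored, acetylated Lys = 0. `acetylated` holds 0-based Lys positions.
    """
    acetylated = set(acetylated)
    for i in acetylated:
        if seq[i] != "K":
            raise ValueError(f"position {i} is {seq[i]!r}, not Lys")
    charge = 0
    for i, residue in enumerate(seq):
        if residue == "K":
            if i not in acetylated:
                charge += 1
        elif residue == "R":
            charge += 1
        elif residue in "DE":
            charge -= 1
    return charge


if __name__ == "__main__":
    h4_tail = "AKRHRKILRNSIQGI"  # H4 residues 15-29, from the Figure 4 caption
    print(side_chain_charge(h4_tail))                  # unmodified peptide
    print(side_chain_charge(h4_tail, acetylated=[1]))  # Lys16 acetylated
```

Under these assumptions each acetylation removes exactly one positive charge from the tail, which is the intuition behind the weakened histone-DNA interaction described above.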

Histone acetylation not only affects transcription, but also affects other processes that involve DNA, such as nucleosome assembly, heterochromatin formation, and DNA repair (52). The acetylation/deacetylation of different Lys residues can have different biological effects. For example, histone H4 Lys5, 8, and 12 acetylation are involved in nucleosome assembly, H4 Lys16 acetylation does not affect nucleosome assembly but is involved in transcription activation (52), whereas H3 Lys56 acetylation has been shown recently to promote genome stability and DNA repair in yeast (53, 54).

Proteins other than histones can also be modified by Lys acetylation. Many transcription factors, cytoskeleton proteins, metabolic enzymes, and signaling proteins are acetylated (55). Transcription factors are known to be substrates of HATs, whereas the enzymes responsible for the acetylation of nonnuclear proteins in many cases are not well known (55). The number of proteins known to be regulated by acetylation will continue to increase as methods to detect protein acetylation improve. Acetylation of nonhistone proteins can change protein-protein interaction, regulate enzymatic activity, and increase protein stability by suppressing ubiquitinylation (55).

Lys acetylation can be reversed by the action of deacetylases. Many deacetylases are Zn-dependent enzymes that use Zn2+ in the active site to activate water molecules to hydrolyze the amide bond (Fig. 11) (56). Recently, another type of deacetylase that is nicotinamide adenine dinucleotide (NAD)-dependent, also known as sirtuins, has been identified (57, 58). Their unique ability to couple NAD degradation to Lys deacetylation (Fig. 11) suggests that this type of enzyme can sense the metabolic state (for example, NAD concentration) of the cell and use that information to regulate the acetylation state and thus the function of the substrate proteins.

In addition to Lys side chain acetylation, protein N-termini can also be acetylated (59). In eukaryotic cells, the first residue, Met, in most proteins is cleaved by N-terminal methionine aminopeptidase. The newly released N-terminal amino group is then acetylated. This modification can happen co-translationally before the mature peptide chain is released from the ribosome. The function of this modification in most cases is still not understood, although deletion of the genes involved in this modification has clear phenotypes (59).

Because of the involvement of protein Lys acetylation in regulation of transcription, protein-protein interaction, enzymatic activity, and protein stability, the deregulation of protein acetylation has been associated with many diseases, such as cancer and neurodegeneration (60, 61). Frequently, mutations in histone acetyltransferases are found in cancer (60). Chromosomal abnormalities that generate fusions of acetyltransferases are known to lead to acute myeloid leukemia. These abnormalities include the fusions of MOZ (monocytic leukemia zinc finger protein) with CBP (CREB binding protein) or p300, and fusion of MOZ with the transcription factor TIF2 (transcription intermediary factor 2) (60). MOZ, CBP, p300, and TIF2 all contain histone acetyltransferase domains. Presumably, the generation of these aberrant fusion proteins disrupts the normal gene transcription profile, which leads to leukemia. Deregulation of histone deacetylases is also suggested to be associated with cancer (61). A histone deacetylase inhibitor, SAHA (Vorinostat; Merck & Co., Inc., Whitehouse Station, NJ), was approved by the Food and Drug Administration recently for treatment of cutaneous T-cell lymphoma (62).

Figure 11. (a) Lys acetylation catalyzed by acetyltransferases. (b) Mechanism of Zn-dependent HDAC-catalyzed deacetylation. (c) Mechanism of sirtuin-catalyzed deacetylation.

Although methylation can happen to several different residues (3, 63), most attention has been given to protein Lys/Arg methylation because the methylation of Lys/Arg in histones controls gene transcription. For Lys and Arg methylation, multiple methyl groups can be added to the same Lys or Arg residue (Fig. 12). The methyl group comes from S-adenosyl methionine (SAM), which is a versatile small molecule that is used in many enzymatic transformations (64). Almost all Lys methyltransferases belong to the SET (suppressor of variegation-Enhancer of zeste-Trithorax) family of methyltransferases, whereas the protein Arg methyltransferases belong to a different class (65-67). Both histone Lys/Arg methylation and acetylation are associated with transcription regulation. In contrast to histone acetylation, which usually correlates with transcription activation, histone methylation can lead either to transcription activation or to suppression (17, 68). According to current understanding, the effect of histone methylation is mediated by proteins that are recruited by methylated Lys or Arg residues. Tudor domains and chromodomains are known to recognize methylated Lys/Arg residues via both charge interaction and cation-π interaction (69-73). The methylated Lys/Arg residue is more hydrophobic and sterically bulkier than free Lys/Arg, and it can be differentiated by the domains that recognize methylated Lys/Arg residues (69, 74). Sequences that surround the methylated Lys residues are also read by the chromodomains and Tudor domains (69-71). This finding explains why different Lys residues could recruit different proteins on methylation and thus have different biological effects. For example, H3K4 methylation activates transcription by recruiting chromodomain helicase DNA-binding protein 1 (CHD1) specifically in yeast, whereas H3K9 methylation represses transcription by recruiting heterochromatin protein 1 (HP1) (75-77).

Nonhistone proteins are also known to be methylated on Lys residues; these include transcription factors, such as p53 (78-80) and TAF10 (TATA box-binding protein-associated factor 10) (81), and translation factors (63). The p53 protein can be methylated by different methyltransferases [Set9 (78), Smyd2 (79), and Set8 (80)] on different Lys residues (Lys372, 370, and 382, respectively). These different methylation events either activate or repress p53 activity. Arg methylation has been found frequently in nonhistone proteins. For example, PRMT1 has been reported to methylate the transcription factor STAT1 (signal transducers and activators of transcription) (82), PRMT4/CARM1 [coactivator-associated arginine (R) methyltransferase 1] can methylate CBP/p300 (83), and heterogeneous nuclear ribonucleoproteins (hnRNPs) and small nuclear ribonucleoproteins (snRNPs) that are involved in pre-mRNA splicing are also Arg methylated (67). The biological functions of these Lys/Arg methylations in most cases can also be explained by the effect of methylation to block or create interaction with other proteins or nucleic acids.

Compared with acetylation, methylation is more stable. For this reason, it was thought that methylation could be a permanent epigenetic mark. The recent discovery of two types of Lys demethylases suggests that methylation is also a reversible PTM. The first Lys demethylase discovered is LSD1 (lysine-specific demethylase 1), which is a FAD (flavin adenine dinucleotide)-dependent enzyme similar to amine oxidases (Fig. 12) (84). It is believed that LSD1 uses a two-electron oxidation mechanism and thus cannot demethylate trimethylated Lys residues (85). The second type of Lys demethylase, which contains the JmjC (Jumonji domain-containing) domain, is a nonheme Fe(II)-dependent enzyme that is capable of one-electron oxidation, and thus it can demethylate trimethylated Lys residues (86). The effect of Arg methylation was proposed to be reversed by protein Arg deiminase 4 (PAD4), which generates citrulline via demethylimination (87, 88). However, later studies indicate that PAD4 as well as other PAD enzymes do not catalyze demethylimination with appreciable rates in vitro (88-91). A recent report showed that Arg methylation can be truly reversed by JmjC domain-containing demethylases, which suggests that PADs are probably not required for Arg demethylation (92). Thus, both Lys and Arg methylation are reversible modifications.
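The difference in substrate scope between the two Lys demethylase classes can be expressed as a small rule. The sketch below encodes only what the paragraph above states: LSD1 cannot act on trimethylated Lys, whereas JmjC-type enzymes can. Treating each class as acting on every lower methylation state is a simplifying assumption, since individual enzymes have narrower preferences, and the names `MAX_METHYL_STATE`, `can_demethylate`, and `demethylate_fully` are hypothetical.

```python
# Highest Lys methylation state (number of methyl groups) each demethylase
# class is taken to act on in this sketch.
MAX_METHYL_STATE = {"LSD1": 2, "JmjC": 3}


def can_demethylate(enzyme, methyl_groups):
    """True if the enzyme class can remove a methyl group from this state."""
    if not 1 <= methyl_groups <= 3:
        raise ValueError("methyl_groups must be 1, 2, or 3")
    return methyl_groups <= MAX_METHYL_STATE[enzyme]


def demethylate_fully(enzyme, methyl_groups):
    """Remove methyl groups one at a time while the enzyme can act.

    Returns the number of methyl groups left on the Lys residue.
    """
    while methyl_groups > 0 and can_demethylate(enzyme, methyl_groups):
        methyl_groups -= 1
    return methyl_groups


if __name__ == "__main__":
    print(demethylate_fully("LSD1", 3))  # trimethyl-Lys is untouched
    print(demethylate_fully("JmjC", 3))  # trimethyl-Lys can be fully demethylated
```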

Similar to Lys acetylation, abnormality in Lys methylation has been considered a contributing factor to cancer (93, 94). A decrease in H3 Lys9 and H4 Lys20 trimethylation is found in cancer cells. Both H3 Lys9 and H4 Lys20 methylation are associated with heterochromatin formation. Presumably, the decrease in methylation leads to defects in heterochromatin formation, which in turn lead to chromosomal instability and tumor formation (93). Histone methyltransferase fusion proteins generated from chromosomal translocation are found frequently in leukemia and are thought to contribute to the development of leukemia. For example, the H3 Lys79 methyltransferase hDOT1L (human DOT1-like protein) fusion found in mixed lineage leukemia is sufficient to cause leukemic transformation (95). The close association of methylation and cancer suggests that protein methyltransferases and demethylases can be potential therapeutic targets.

Figure 12. (a) Lys/Arg N-methylation; (b) mechanism of FAD-dependent LSD1-catalyzed Lys demethylation; (c) mechanism of Fe-dependent JHDM (JmjC domain-containing histone demethylase)-catalyzed demethylation.

In eukaryotic cells, many membrane and secreted proteins (i.e., proteins that transit through the ER and Golgi secretory pathway) are glycosylated. Glycosylation can occur on Asn residues (N-glycosylation, Fig. 13), on Ser/Thr and post-translationally hydroxylated Lys and Pro residues (O-glycosylation, Fig. 14), or on Trp residues (C-glycosylation, Fig. 14). N-glycosylation is a complicated process that involves three stages: 1) the formation of the donor substrate with 14 sugar units (Glc3Man9GlcNAc2-PP-dolichol), which occurs on both the cytosolic and the luminal faces of the ER (96); 2) the transfer of the tetradecasaccharyl group to the Asn residues found in the consensus sequence Asn-X-Ser/Thr, which occurs in the ER (97); and 3) the hydrolytic removal of the terminal sugar residues on the tetradecasaccharide, the addition of more sugar units (Fig. 13) (98), and the sulfation and phosphorylation of the carbohydrate moieties in the ER and Golgi (99). The later trimming steps can generate different sets of N-linked carbohydrates, such as the high-mannose type glycans, the complex type glycans, and the hybrid type glycans (Fig. 13) (99). Each stage requires multiple proteins. For example, up to nine proteins are required for the transfer of the tetradecasaccharyl group in yeast (100).
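The consensus sequence Asn-X-Ser/Thr lends itself to a simple sequence scan. The sketch below is only an illustration: the function name and the test peptide are hypothetical, and excluding Pro at the X position is a commonly cited refinement that is assumed here rather than taken from the text. A match marks a candidate site only, not proof that the Asn is glycosylated.

```python
import re

# Consensus from the text: Asn-X-Ser/Thr. Excluding Pro at X is an assumption
# (a commonly cited refinement), not something the text states.
# The lookahead makes overlapping sequons visible.
SEQUON = re.compile(r"(?=N[^P][ST])")

def find_sequons(sequence: str) -> list[int]:
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]

# Hypothetical peptide, for illustration only.
print(find_sequons("MKNVSAANPTLNGTQ"))  # [3, 12]; Asn8 is skipped because X is Pro
```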

Figure 13. Protein N-glycosylation. (1) The formation of the donor substrate with 14 sugar units (Glc3Man9GlcNAc2-PP-dolichol); (2) the reaction scheme that shows the transfer of the tetradecasaccharyl group to the Asn residues found in the consensus sequence Asn-X-Ser/Thr in proteins; (3) hydrolytic removal of the terminal sugar residues on the tetradecasaccharide and addition of more sugar units in the ER and Golgi. OSTase, oligosaccharyltransferase.

Figure 14. O- and C-glycosylation reactions. UDP, uridine diphosphate.

Unlike N-glycosylation, O-glycosylation starts with the addition of a single sugar residue, which can be followed by the addition of more sugars (101). As with N-glycosylation, most O-glycosylation occurs on proteins that transit through the ER and Golgi. However, the addition of a single GlcNAc residue to Ser/Thr is a type of O-glycosylation that occurs on cytosolic proteins (102). This cytosolic O-glycosylation has drawn much attention recently because it can regulate the activity of the substrate proteins, especially because it can compete with protein phosphorylation for the same Ser/Thr on substrate proteins (103).

C-glycosylation is the addition of a single mannosyl group to the indole C-2 position of Trp residues of membrane and secreted proteins (104). The Trp residue that is C-mannosylated resides in a consensus Trp-X-X-Trp sequence, in which the first Trp is C-mannosylated. About a dozen proteins in humans are C-mannosylated. The enzyme that catalyzes the modification has not been cloned yet, and currently, the function of this modification is not clear.

The large number of enzymes involved in protein glycosylation and the fact that the complicated N-glycosylation pathway is conserved throughout eukaryotic species suggest that glycosylation has important functions. Deficiency in protein glycosylation causes several diseases in humans, such as lysosomal storage diseases (105), congenital disorders of glycosylation, and leukocyte adhesion deficiency II (106). In addition, changes in glycosylation patterns are associated with cancer and inflammation (107). Protein glycosylation can serve several different biological purposes. One purpose is to help proteins that transit through the secretory pathway to fold correctly. In particular, the removal of the glucose residue by glucosidase II and the reglucosylation in the ER are well known to help secreted proteins fold and to ensure that only correctly folded proteins are secreted (Fig. 15) (108). Protein O-fucosyltransferase I, which modifies Notch protein, was reported to have chaperone activity that helps Notch folding and secretion, and this chaperone activity is independent of its catalytic activity (109). Glycosylation is also important for sorting secreted proteins. For example, the phosphorylation of Man on N-glycan (Fig. 16) creates a recognition signal for sorting lysosomal proteins to the lysosome. Glycosylation is also believed to increase protein stability, as has been shown for erythropoietin mentioned earlier. Glycosylation is also proposed to affect ligand-receptor interaction and thus regulate cell-cell signaling. However, a detailed molecular understanding of the effect of glycosylation on ligand-receptor interaction is hard to obtain in most cases. In two well-studied cases, human CD2 (cellular differentiation marker 2) and IgG (immunoglobulin G), N-glycosylation is found to affect the interaction with their ligands or receptors. Structural data show that the carbohydrate portion does not contact the binding partner directly. Instead, glycosylation affects the binding by changing the conformation of the glycosylated proteins (110-112).

Figure 15. N-glycosylation helps secreted protein to fold correctly in the ER.

Figure 16. Phosphorylation of Man on N-glycan. UMP, uridine monophosphate.

Ubiquitin is an abundant small protein (76 amino acids) found in all eukaryotes. It can be conjugated to many proteins covalently and regulates important biological processes. The addition of ubiquitin to substrate proteins goes through an E1-E2-E3 enzymatic cascade (Fig. 17) (113). E1, which is also called ubiquitin-activating protein (UAP), uses ATP to adenylate the C-terminal Gly of ubiquitin and then captures the activated ubiquitin with a Cys residue in the active site. Most eukaryotic species have only one E1 enzyme, which is responsible for activating all the ubiquitin molecules needed. The ubiquitin-E1 conjugate is then recognized by several dozen E2 enzymes, which capture ubiquitin from E1 via a transthiolation reaction. The ubiquitin-conjugated E2 enzymes are then recognized by many different E3 enzymes, which recruit the substrate proteins and transfer ubiquitin from E2 to Lys residues of the substrate proteins, either directly or indirectly (Fig. 17). Two major families of E3 enzymes exist: the RING (really interesting new gene) E3s and the HECT (homologous to E6AP C terminus) E3s. The Pfam database lists more than 400 RING proteins and 70 HECT proteins. Many E3s form complexes with other proteins. One well-understood E3 complex is the SCF (Skp1-Cullin-F Box) RING E3, for which a crystal structure was reported (114). In humans, multiple Cullins and multiple F Box proteins exist (115). Considering the different combinations, the number of possible E3 complexes can be much larger than the number of E3 enzymes (3). E3s decide which substrate proteins get ubiquitylated; thus, the large number of E3s and E3 complexes reflects the diverse substrate proteins that must be recognized.

Ubiquitin itself has 7 Lys residues (Lys6, 11, 27, 29, 33, 48, and 63) that can be used for ubiquitin attachment, which leads to polyubiquitylation of substrate proteins. Polyubiquitin chains assembled via different Lys residues have different biological functions (116), as will be explained later. Which Lys residue is used in the polyubiquitin chain is controlled by the specific E3 involved. E3 presumably also controls the length of the polyubiquitin chain, although the detailed chain assembly mechanism is still not clear (117). Ubiquitylation can be reversed by the action of ubiquitin-specific proteases (UBPs). About 60 UBPs exist in the human genome, which presumably recognize different types of ubiquitin modifications at various cellular locations (118).
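The seven Lys positions, the 76-residue length, and the C-terminal Gly can be checked directly against the sequence. In the sketch below, the human ubiquitin sequence is quoted from memory of public databases and is not given in the text, so treat it as an assumption and verify it against a curated entry before reuse. The helper name is hypothetical.

```python
# Human ubiquitin, written in blocks of ten residues for easy counting.
# Assumption: sequence quoted from memory of public databases; verify before reuse.
UBIQUITIN = (
    "MQIFVKTLTG" "KTITLEVEPS" "DTIENVKAKI" "QDKEGIPPDQ"
    "QRLIFAGKQL" "EDGRTLSDYN" "IQKESTLHLV" "LRLRGG"
)

def residue_positions(sequence: str, residue: str) -> list[int]:
    """Return the 1-based positions at which `residue` occurs."""
    return [i for i, aa in enumerate(sequence, start=1) if aa == residue]

lysines = residue_positions(UBIQUITIN, "K")
print(len(UBIQUITIN), UBIQUITIN[-1], lysines)
# 76 G [6, 11, 27, 29, 33, 48, 63]
```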

The biological function of ubiquitylation was recognized originally as targeting proteins to the proteasome for degradation. The importance of this function can be illustrated by many examples. In cell division, progression through the cell cycle is driven by cell division kinases, the activities of which are controlled by a group of proteins called cyclins. Different cyclins function only at certain stages of the cell cycle. Then, they must be degraded, which requires polyubiquitylation by specific E3 enzymes (119). Aberration in the ubiquitylation and degradation of cyclins is associated with cancer. Misfolded proteins must be degraded by the ubiquitin and proteasome system. Aggregation of misfolded proteins is known to cause neurodegeneration, such as Parkinson’s disease (116). Ubiquitylation and proteasomal degradation of proteins are also important for other biological processes, such as the hypoxia response and the circadian clock. Ubiquitylation is required for the degradation of hypoxia inducible factor (HIF) upon its hydroxylation at high oxygen levels (120). Maintaining the circadian clock requires the ubiquitylation and degradation of proteins that inhibit the CLOCK (a protein named from the circadian locomotor output cycles kaput gene) transcription factor (121).

It is becoming clear that the biological function of ubiquitylation is not limited to proteasomal degradation. Other functions have been discovered, such as promoting membrane protein endocytosis, targeting membrane proteins to the lysosome for degradation, and regulating cytoplasm/nuclear shuttling (116, 122). It is now generally believed that polyubiquitylation via Lys48 of ubiquitin is a signal for proteasomal degradation, and this action requires minimally 4 ubiquitin units in the chain (123). In contrast, monoubiquitylation, multiple monoubiquitylation on different Lys residues of substrate proteins, and polyubiquitylation via Lys63 of ubiquitin typically signal proteasome-independent pathways (116). How can so many different functions be achieved? The diverse sets of ubiquitin binding domains (UBDs) provide the molecular explanation to this question (19). Presumably, different UBDs recognize different types of ubiquitin modifications (monoubiquitylation vs. polyubiquitylation, and Lys48-linked vs. Lys63-linked polyubiquitylation, for example), and thus they mediate different functional consequences of ubiquitylation. The UBDs on the yeast proteins Rad23, Rpn10, and Dsk2 recognize the Lys48-linked polyubiquitin chain and deliver the modified substrate proteins to the 26 S proteasome (124). The UBDs on the vacuolar proteins recognize monoubiquitylation or the Lys63-linked polyubiquitin chain on membrane proteins, which mediates their sorting into the lysosome or vacuole. Binding of the Lys63-linked polyubiquitin chain on inhibitor of NF-κB kinase (IKK) by other proteins has been proposed to activate IKK and thus turn on NF-κB signaling (116). The recognition of ubiquitin by UBDs can also explain some “unusual” functions of protein ubiquitylation. For example, Lys48-linked polyubiquitylation of the yeast transcription factor Met4p does not signal for proteasomal degradation but instead inactivates the transcription factor, because Met4p has an in-cis UBD that binds the ubiquitin chain and thus inactivates itself and blocks the proteasomal pathway (125).
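The general rules stated above (Lys48-linked chains of at least four units signal proteasomal degradation; monoubiquitylation and Lys63-linked chains typically signal proteasome-independent pathways) can be written as a small lookup. This is a deliberately crude sketch with a hypothetical function name: it ignores which UBD actually reads the tag, so it cannot reproduce exceptions such as Met4p.

```python
from typing import Optional

def predicted_fate(linkage: Optional[int], chain_length: int) -> str:
    """Map a ubiquitin modification to the typical outcome described in the text.

    linkage: ubiquitin Lys used to build the chain (e.g. 48 or 63),
             or None for monoubiquitylation.
    chain_length: number of ubiquitin units attached at the site.
    """
    if linkage == 48 and chain_length >= 4:
        return "proteasomal degradation"
    if linkage == 63 or (linkage is None and chain_length == 1):
        return "proteasome-independent signaling"
    return "not covered by the rules in the text"

print(predicted_fate(48, 4))    # proteasomal degradation
print(predicted_fate(63, 5))    # proteasome-independent signaling
print(predicted_fate(None, 1))  # proteasome-independent signaling
```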

To generalize the function of ubiquitylation, we can say that ubiquitin is an “information-rich protein tag” that can be read by different proteins that contain UBDs (3), and the exact consequence of ubiquitylation is determined by how the tag is recognized. Besides ubiquitin, eukaryotic cells also have about a dozen known ubiquitin-like protein tags, with SUMO being the best studied one. In addition, many proteins have built-in ubiquitin-like domains. The logic that underlies the biological functions of these ubiquitin-like proteins/domains will likely be the same as what is learned from ubiquitin (3).

Figure 17. Ubiquitylation catalyzed by the E1, E2, E3 cascade.

Hydrolytic cleavage of proteins by proteases is an irreversible PTM. The large number (more than 500) of proteases in the human genome indicates that proteolysis occurs often. Proteases can be classified into four types based on catalytic mechanisms (Fig. 18): Ser/Thr proteases, Cys proteases, Asp proteases, and metalloproteases.

Figure 18. Catalytic mechanism of different proteases.

At first glance, proteolysis may seem to be an uncontrolled destruction process like the digestion of food proteins in the gut. In fact, proteolysis in cells is under tight regulation. Even proteases secreted to the digestive tract must be controlled to avoid self-destruction. Typically, proteases are made in inactive forms (zymogens) that can be activated by proteolysis. Inside eukaryotic cells, two major locations exist for proteolytic degradation of unwanted proteins: the 26 S proteasome and the lysosome (126, 127). Access to the two degradation compartments is controlled tightly. The lysosome is an acidic membrane organelle that contains many proteases and is responsible for degradation of endocytosed membrane proteins, such as activated receptor tyrosine kinases and G protein-coupled receptors that are ubiquitylated and sorted to the lysosome (described in the ubiquitylation section). The lysosome can also degrade endocytosed or phagocytosed bacterial and viral proteins (128). In autophagy, the lysosome is responsible for degrading cellular organelles and some cytosolic protein complexes (126). The 26 S proteasome (Fig. 19) has a 20 S degradation chamber that consists of four rings arranged as αββα (129). In eukaryotes, each α ring has seven different α subunits, and each β ring has seven different β subunits. Three β subunits are catalytically active Thr proteases that are responsible for the degradation of substrate proteins. By forming this chamber, the active sites of the proteases are buried inside the chamber to avoid proteolysis of proteins that should not be digested. Access to the degradation chamber is controlled by the 19 S regulatory complex that caps both ends of the degradation chamber. The regulatory complex contains subunits that recognize polyubiquitylated substrates, subunits that recycle the ubiquitin tag, and subunits that use ATP hydrolysis to unfold and translocate the protein into the degradation chamber.
Degradation of unwanted proteins by the 26 S proteasome or lysosome in a timely fashion is very important. For example, cyclins that activate cell division kinases have to be polyubiquitylated and degraded by the proteasome at specific times to drive cell cycle progression (119). Degradation of activated membrane receptors in the lysosome is important to avoid overstimulation (130, 131). Misfolded proteins must be degraded by the proteasome or lysosome (in autophagy). Failure to do so is thought to contribute to neurodegenerative disorders such as Parkinson’s disease and Alzheimer’s disease (126).

Figure 19. The eukaryotic 26 S proteasome. The subunit composition of the 19 S regulatory particle of Saccharomyces cerevisiae is shown on the left. The α and β rings of the 20 S proteasome, each of which consists of seven different subunits, are included to indicate how the base of the 19 S complex is linked to the core 20 S protease complex. The crystal structure of the 20 S degradation chamber is shown in both side and top views (figure made using PDB 1RYP).

In addition to the “destructive” proteolysis processes in the proteasome and lysosome, many “constructive” proteolysis processes occur in cells. In both prokaryotes and eukaryotes, secreted proteins contain a signal peptide at the N-terminus that directs them to the secretory pathway. This signal peptide must be cleaved later by signal peptidases (typically serine proteases) so that the protein can transit further in the secretory pathway (132). Many secreted proteins, which include insulin, TGFβ1 (transforming growth factor β1), nerve growth factor β1, albumin, Factor IX, insulin receptor, and Notch, also contain a propeptide that is cleaved by proprotein convertases in the Golgi (133). Selective proteolysis also occurs at the cell membrane in signal transduction processes. Notch protein, on binding to its ligand Delta/Jagged (membrane proteins on neighboring cells), is cleaved by one of the ADAM (a disintegrin and metalloprotease) proteins at a site close to the transmembrane region. This cleavage activates Notch for regulated intramembrane proteolysis, which cuts within the membrane-spanning region of Notch and releases the intracellular domain of Notch from the cytoplasmic membrane. Then, the intracellular domain translocates into the nucleus, where it acts as a transcription factor to turn on genes required for development (Fig. 20) (134). Regulated intramembrane proteolysis is catalyzed by the membrane protein complex called presenilin that contains Asp protease subunits. Presenilin is also responsible for cleavage of the amyloid-β precursor protein in Alzheimer’s disease. This kind of proteolysis-triggered proteolysis signaling occurs often. Similar signaling pathways are also present in bacteria. For example, the release of the transcription factor σE is achieved via the sequential cleavage of the membrane protein RseA by DegS (a Ser protease) and YaeL (a Zn protease) (135).

Figure 20. Four proteolysis events for Notch that lead to the release of an active transcription factor. TGN, trans Golgi network.

Figure 21. (a) Domain structures of mammalian caspases; (b) the caspase cascades and the initiation of apoptosis. Apaf-1, apoptotic protease activating factor-1; Cyto c, cytochrome c; FADD, Fas-associated protein with death domain; LS, large subunit; RAIDD, RIP-associated ICH-1/CED-3 homologous protein with a death domain; RIP, receptor-interacting protein; TRADD, tumor necrosis factor receptor-associated protein with death domain; SS, small subunit.

Similar to the MAP kinase cascades for protein phosphorylation, protease cascades exist, in which downstream proteases are activated by the action of upstream proteases (3). One of the most famous cascades is the caspase cascade that leads to apoptosis (Fig. 21) (11, 136). Caspases are Cys proteases that cleave the amide bond specifically after an Asp residue. Two types of caspases exist: initiator caspases (Caspase 2, 8, 9, 10) and effector caspases (Caspase 3, 6, 7). Both initiator and effector caspases are produced in zymogen forms. Initiator caspases use their N-terminal DED (death effector domain) and CARD (caspase recruitment domain) domains to interact with other proteins to receive apoptosis signals. The signals cause the dimerization of the initiator caspases and activate them so that they can cleave themselves and the effector caspases after specific Asp residues. Cleavage by the initiator caspases activates the effector caspases, which then cleave their substrate proteins to carry out apoptosis. The substrate proteins of effector caspases include the inhibitor of caspase-activated DNase (deoxyribonuclease), Bcl2 (named from B-cell lymphoma 2, an antiapoptotic protein), and PARP-1 (poly(ADP-ribose) polymerase-1, an enzyme catalyzing protein poly(ADP-ribosyl)ation and required for DNA repair). Cleavage of the DNase inhibitor by effector caspases activates the catalytic activity of the DNase, resulting in the fragmentation of chromosomal DNA, which is a hallmark of apoptosis. The caspase cascade and apoptosis are very important for the development and homeostasis of metazoans. Decreased ability of cells to undergo apoptosis will lead to cancer, whereas too much apoptosis can lead to autoimmune diseases (137).
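The rule that caspases cleave the amide bond after an Asp residue can be illustrated with a toy in-silico digest. This sketch is an oversimplification under stated assumptions: the function name and peptide are hypothetical, and the code cuts after every Asp, whereas the text notes that real caspases cleave only after specific Asp residues.

```python
def cleave_after_asp(sequence: str) -> list[str]:
    """Split a sequence after every Asp (D); each D ends its fragment.

    Simplification: every Asp is treated as a cleavage site. Real caspases
    cut only after specific Asp residues, so this overestimates cleavage.
    """
    fragments, start = [], 0
    for i, aa in enumerate(sequence):
        # A C-terminal Asp has no amide bond after it, so it is not a cut site.
        if aa == "D" and i < len(sequence) - 1:
            fragments.append(sequence[start:i + 1])
            start = i + 1
    fragments.append(sequence[start:])
    return fragments

# Hypothetical peptide, for illustration only.
print(cleave_after_asp("GDEVDGVK"))  # ['GD', 'EVD', 'GVK']
```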

Figure 22. Shokat’s (144) “bump and hole” method to identify substrates for kinases.

Identifying new pathways regulated by known PTM and discovering new PTM

The brief description above on a few major PTM demonstrates clearly that PTM can regulate many important biological processes. So far, a fairly good understanding of many aspects of PTM has been obtained. What remaining challenges must be addressed?

One direction is to figure out the molecular details of many of the biological processes that are regulated by PTM. Structural biology and biochemistry are needed to answer questions such as what structural changes are induced by a particular PTM and how the structural changes lead to changes in activity or recognition by binding partners. Much progress has been made in this direction, but more remains to be figured out. For example, in protein ubiquitylation, no structural details about E1 exist, it is not clear how the polyubiquitin chain is made (117), and it is not clear how the specificities of different ubiquitin binding domains are achieved (19).

Another direction is to identify the proteome that is modified by a specific PTM. Advances in protein identification by mass spectrometry (MS) have greatly facilitated studies in this direction, and much effort has been invested. Generally, an affinity purification method is used to enrich proteins that are modified by a specific PTM, and then these proteins are identified by MS. For example, phosphotyrosine-specific antibodies have been used to enrich proteins that are modified on Tyr residues, and metal affinity columns have been used to isolate all phosphopeptides (138). These isolated phosphoproteins/peptides can then be identified by MS. A His6 tag has been fused to the N-terminus of ubiquitin and used to isolate ubiquitylated proteins that are then identified by MS (139). GlcNAc with an azide group attached has been used to label proteins that are O-GlcNAc modified, and then a biotin tag is conjugated to the modified protein via Staudinger ligation. O-GlcNAc modified proteins can be pulled out using streptavidin beads and identified using MS. Using this method, close to 200 O-GlcNAc modified proteins were identified (140). A clever method to detect protein S-acylation has been reported recently (141).

These proteomic studies have provided much information. However, to understand the function of a PTM in cell physiology completely, it is desirable to know which enzyme is responsible for the modification of a particular substrate protein. With the availability of bioinformatics tools and completed genome sequences, it is now relatively straightforward to identify all the enzymes in a genome that share similar biochemical function. For example, we now know that the human genome contains more than 500 protein kinases, more than 500 proteases, and more than 400 ubiquitin E3s. But without knowing what substrate proteins they modify, it will be very difficult (if not impossible) to understand their biological functions on a molecular level. Currently, no efficient and reliable method exists to identify the substrate proteins for an enzyme. A straightforward method is to make a library of short peptides and try to identify consensus sequences that are recognized by an enzyme (142, 143). The disadvantage is that the structure of a short peptide may be different from the structure of the same sequence present in a folded protein. Thus, the reliability of this method must be validated by other methods. Shokat and coworkers (144) have used a clever approach to identify kinase substrates (Fig. 22). This approach uses a bulky ATP analog that can be used only by a kinase mutant as a cosubstrate. By incubating 32P-labeled ATP analog and the kinase mutant with cell extract, the substrate proteins of the specific kinase can be labeled. Identification of the substrate proteins may be difficult, though, because the radiolabeled substrate proteins cannot be enriched/purified easily for identification by MS. It is not clear whether this method can be applied easily to other PTM enzymes.
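The peptide library approach amounts to aligning the peptides an enzyme accepts and asking which residues recur at each position. The following is a minimal sketch of that idea, not the procedure used in the cited studies: the function name, the 75% threshold, and the peptide hits are all hypothetical.

```python
from collections import Counter

def consensus(peptides: list[str], min_fraction: float = 0.75) -> str:
    """Per-position consensus of equal-length peptides.

    A position is reported as 'x' when no single residue is present in at
    least `min_fraction` of the peptides (threshold chosen arbitrarily here).
    """
    if not peptides or len({len(p) for p in peptides}) != 1:
        raise ValueError("need at least one peptide, all of equal length")
    result = []
    for column in zip(*peptides):
        residue, count = Counter(column).most_common(1)[0]
        result.append(residue if count / len(peptides) >= min_fraction else "x")
    return "".join(result)

# Hypothetical hits from a peptide library screen (not real data).
hits = ["RRASL", "RKASV", "RRGSL", "RRASI"]
print(consensus(hits))  # RRASx
```

As the text cautions, a consensus derived from short peptides may not reflect how the same sequence is presented in a folded protein.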

Parallel to the efforts to identify substrate proteins for a particular enzyme, the activity-based small molecule probes pioneered by Cravatt and coworkers can facilitate the identification of the biological functions of an enzyme that catalyzes protein post-translational modifications (145). The major advantage of this type of probe is that it can potentially detect enzymes that are in the active state, and thus it can provide snapshots of the enzymes that are active at different developmental stages or in different types of cells. Among enzymes that catalyze PTM, probes have so far been developed for studying proteases (145, 146), kinases (147), pTyr phosphatases (148), and protein Arg deiminases (149).

Perhaps a more challenging question is how we can discover new PTM reactions. In principle, there are analytic tools that can be used to research this topic. One such tool is top-down FT-MS, which determines the molecular weight of the whole protein with high accuracy. By comparing the obtained tandem MS (MS/MS) result with the expected MS/MS result, post-translational modifications can be identified (150). Crystallography can also discover new PTM, if a protein expressed in the proper host can be crystallized. Some rare modifications or protein side chains were discovered this way (151). However, the success of using these methods would require that a significant portion of the protein population is modified and the modification is stable. This condition cannot be met by all PTM. Thus, discovering new PTM poses a great challenge to chemical biologists. Undoubtedly, new PTM reactions are waiting to be discovered and the identification of these new PTM, together with the identification of new pathways that are regulated by known PTM, will advance our understanding about the molecular logic of living systems.
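The idea of spotting a modification from an accurate mass can be sketched as a comparison between an observed mass and the mass expected from the unmodified sequence. This is a simplified illustration that works on intact masses only, whereas the text describes comparing MS/MS results. The function name, tolerance, and example masses are hypothetical; the mass shifts are standard monoisotopic values.

```python
# Monoisotopic mass shifts (Da) for three PTM discussed in the text.
PTM_MASS_SHIFTS = {
    "phosphorylation": 79.96633,  # +HPO3
    "acetylation": 42.01057,      # +C2H2O
    "methylation": 14.01565,      # +CH2
}

def match_mass_shift(expected_mass: float, observed_mass: float,
                     tolerance: float = 0.01) -> list[str]:
    """Return the PTM whose mass shift explains observed - expected.

    Only a single modification of one type is considered, which is a
    simplification; `tolerance` is in Da and chosen arbitrarily.
    """
    delta = observed_mass - expected_mass
    return [name for name, shift in PTM_MASS_SHIFTS.items()
            if abs(delta - shift) <= tolerance]

# Hypothetical masses, for illustration only.
print(match_mass_shift(10000.000, 10079.966))  # ['phosphorylation']
print(match_mass_shift(10000.000, 10100.000))  # []
```

High mass accuracy matters here: three methyl groups add 42.04695 Da, only about 0.036 Da more than a single acetyl group, so a loose tolerance could not tell trimethylation from acetylation.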

1. International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature 2001 409:860-921.

2. Puente XS, Sanchez LM, Overall CM, Lopez-Otin C. Human and mouse proteases: a comparative genomic approach. Nat. Rev. Genet. 2003 4:544-558.

3. Walsh CT. Posttranslational Modification of Proteins: Expanding Nature’s Inventory. 2005. Roberts and Company Publishers, Englewood, CO.

4. Ahmed N, Thornalley PJ. Advanced glycation endproducts: what is their relevance to diabetic complications? Diabetes Obes. Metab. 2007 9:233-245.

5. Hess DT, Matsumoto A, Kim SO, Marshall HE, Stamler JS. Protein S-nitrosylation: purview and parameters. Nat. Rev. Mol. Cell Biol. 2005 6:150-166.

6. Lippard SJ, Berg JM. Principles of Bioinorganic Chemistry. 1994. University Science Books. Mill Valley, CA.

7. Ruthenburg AJ, et al. Histone H3 recognition and presentation by the WDR5 module of the MLL1 complex. Nat. Struct. Mol. Biol. 2006 13:704-712.

8. Johnson LN, Lewis RJ. Structural basis for control by phosphorylation. Chem. Rev. 2001 101:2209-2242.

9. Johnson LN, Barford D. The effects of phosphorylation on the structure and function of proteins. Annu. Rev. Biophys. Biomol. Struct. 1993 22:199-232.

10. Chen P, Hochstrasser M. Autocatalytic subunit processing couples active site formation in the 20S proteasome to completion of assembly. Cell 1996 86:961-972.

11. Riedl SJ, Shi Y. Molecular mechanisms of caspase regulation during apoptosis. Nat. Rev. Mol. Cell Biol. 2004 5:897-907.

12. Pawson T, Nash P. Assembly of cell regulatory systems through protein interaction domains. Science 2003 300:445-452.

13. Yaffe MB, Elia AEH. Phosphoserine/threonine-binding domains. Curr. Opin. Cell Biol. 2001 13:131-138.

14. Yaffe MB. Phosphotyrosine-binding domains in signal transduction. Nat. Rev. Mol. Cell Biol. 2002 3:177-186.

15. Yang XJ. Lysine acetylation and the bromodomain: a new partnership for signaling. Bioessays 2004 26:1076-1087.

16. Mujtaba S, Zeng L, Zhou MM. Structure and acetyl-lysine recognition of the bromodomain. Oncogene 2007 26:5521-5527.

17. Daniel JA, Pray-Grant MG, Grant PA. Effector proteins for methylated histones. Cell Cycle 2005 4:919-926.

18. Hurley JH, Lee S, Prag G. Ubiquitin-binding domains. Biochem. J. 2006 399:361-372.

19. Harper JW, Schulman BA. Structural complexity in ubiquitin recognition. Cell 2006 124:1133-1136.

20. Pittet M, Conzelmann A. Biosynthesis and function of GPI proteins in the yeast Saccharomyces cerevisiae. Biochim. Biophys. Acta 2007 1771:405-420.

21. Farazi TA, Waksman G, Gordon JI. The biology and enzymology of protein N-myristoylation. J. Biol. Chem. 2001 276:39501-39504.

22. McTaggart S. Isoprenylated proteins. Cell. Mol. Life Sci. 2006 63:255-267.

23. Linder ME, Deschenes RJ. Palmitoylation: policing protein stability and traffic. Nat. Rev. Mol. Cell Biol. 2007 8:74-84.

24. Resh MD. Trafficking and signaling by fatty-acylated and prenylated proteins. Nat. Chem. Biol. 2006 2:584-590.

25. Perham RN. Swinging arms and swinging domains in multifunctional enzymes: catalytic machines for multistep reactions. Annu. Rev. Biochem. 2000 69:961-1004.

26. Fischbach MA, Walsh CT. Assembly-line enzymology for polyketide and nonribosomal peptide antibiotics: logic, machinery, and mechanisms. Chem. Rev. 2006 106:3468-3496.

27. Schwartz B, Klinman JP. Mechanisms of biosynthesis of protein-derived redox cofactors. Vitam. Horm. 2001 61:219-239.

28. Ghosh D. Human sulfatases: a structural perspective to catalysis. Cell. Mol. Life Sci. 2007 64:2013-2022.

29. Schwede TF, Retey J, Schulz GE. Crystal structure of histidine ammonia-lyase revealing a novel polypeptide modification as the catalytic electrophile. Biochemistry 1999 38:5355-5361.

30. Calabrese JC, Jordan DB, Boodhoo A, Sariaslani S, Vannelli T. Crystal structure of phenylalanine ammonia lyase: multiple helix dipoles implicated in catalysis. Biochemistry 2004 43:11403-11416.

31. Christenson SD, Liu W, Toney MD, Shen B. A novel 4-methylideneimidazole-5-one-containing tyrosine aminomutase in enediyne antitumor antibiotic C-1027 biosynthesis. J. Am. Chem. Soc. 2003 125:6062-6063.

32. Poelje PD, Snell EE. Pyruvoyl-dependent enzymes. Annu. Rev. Biochem. 1990 59:29-59.

33. Kadokura H, Katzen F, Beckwith J. Protein disulfide bond formation in prokaryotes. Annu. Rev. Biochem. 2003 72:111-135.

34. Lodish H, et al. Molecular Cell Biology. 2007. W.H. Freeman & Co Ltd, New York.

35. Takeuchi M, Kobata A. Structures and functional roles of the sugar chains of human erythropoietins. Glycobiology 1991 1:337-346.

36. Saiardi A, Bhandari R, Resnick AC, Snowman AM, Snyder SH. Phosphorylation of proteins by inositol pyrophosphates. Science 2004 306:2101-2105.

37. Skalhegg BS, Tasken K. Specificity in the cAMP/PKA signaling pathway. Differential expression, regulation, and subcellular localization of subunits of PKA. Front. Biosci. 2000 5:678-693.

38. Pierce KL, Premont RT, Lefkowitz RJ. Seven-transmembrane receptors. Nat. Rev. Mol. Cell Biol. 2002 3:639-650.

39. Hurley JH. Structure, mechanism, and regulation of mammalian adenylyl cyclase. J. Biol. Chem. 1999 274:7599-7602.

40. Krebs EG, Beavo JA. Phosphorylation-dephosphorylation of enzymes. Annu. Rev. Biochem. 1979 48:923-959.

41. Daniel PB, Walker WH, Habener JF. Cyclic AMP signaling and gene regulation. Annu. Rev. Nutr. 1998 18:353-383.

42. Schlessinger J. Cell signaling by receptor tyrosine kinases. Cell 2000 103:211-225.

43. Avruch J. MAP kinase pathways: the first twenty years. Biochim. Biophys. Acta 2007 1773:1150-1160.

44. Melo JV, Barnes DJ. Chronic myeloid leukaemia as a model of disease evolution in human cancer. Nat. Rev. Cancer 2007 7:441-453.

45. Roberts PJ, Der CJ. Targeting the Raf-MEK-ERK mitogen-activated protein kinase cascade for the treatment of cancer. Oncogene 2007 26:3291-3310.

46. Shchemelinin I, Sefc L, Necas E. Protein kinase inhibitors. Folia Biol. (Praha) 2006 52:137-148.

47. Bialy L, Waldmann H. Inhibitors of protein tyrosine phosphatases: next-generation drugs? Angew. Chem. Int. Ed. Engl. 2005 44:3814-3839.

48. Roth SY, Denu JM, Allis CD. Histone acetyl transferases. Annu. Rev. Biochem. 2001 70:81-120.

49. Luger K, Mader AW, Richmond RK, Sargent DF, Richmond TJ. Crystal structure of the nucleosome core particle at 2.8 Å resolution. Nature 1997 389:251-260.

50. Verdone L, Caserta M, Mauro ED. Role of histone acetylation in the control of gene expression. Biochem. Cell Biol. 2005 83:344-353.

51. Yang XJ. Lysine acetylation and the bromodomain: a new partnership for signaling. Bioessays 2004 26:1076-1087.

52. Shahbazian MD, Grunstein M. Functions of site-specific histone acetylation and deacetylation. Annu. Rev. Biochem. 2007 76:75-100.

53. Han J, et al. Rtt109 acetylates histone H3 lysine 56 and functions in DNA replication. Science 2007 315:653-655.

54. Driscoll R, Hudson A, Jackson SP. Yeast Rtt109 promotes genome stability by acetylating histone H3 on lysine 56. Science 2007 315:649-652.

55. Yang XJ, Gregoire S. Metabolism, cytoskeleton and cellular signalling in the grip of protein Nε- and O-acetylation. EMBO Rep. 2007 8:556-562.

56. Grozinger CM, Schreiber SL. Deacetylase enzymes: biological functions and the use of small-molecule inhibitors. Chem. Biol. 2002 9:3-16.

57. Imai SI, Armstrong CM, Kaeberlein M, Guarente L. Transcriptional silencing and longevity protein Sir2 is an NAD-dependent histone deacetylase. Nature 2000 403:795-800.

58. Sauve AA, Wolberger C, Schramm VL, Boeke JD. The biochemistry of sirtuins. Annu. Rev. Biochem. 2006 75:435-465.

59. Polevoda B, Sherman F. N-terminal acetyltransferases and sequence requirements for N-terminal acetylation of eukaryotic proteins. J. Mol. Biol. 2003 325:595-622.

60. Timmermann S, Lehrmann H, Polesskaya A, Harel-Bellan A. Histone acetylation and disease. Cell. Mol. Life Sci. 2001 58:728-736.

61. Varier RA, Swaminathan V, Balasubramanyam K, Kundu TK. Implications of small molecule activators and inhibitors of histone acetyltransferases in chromatin therapy. Biochem. Pharmacol. 2004 68:1215-1220.

62. Marks PA, Breslow R. Dimethyl sulfoxide to vorinostat: development of this histone deacetylase inhibitor as an anticancer drug. Nat. Biotechnol. 2007 25:84-90.

63. Polevoda B, Sherman F. Methylation of proteins involved in translation. Mol. Microbiol. 2007 65:590-606.

64. Fontecave M, Atta M, Mulliez E. S-adenosylmethionine: nothing goes to waste. Trends Biochem. Sci. 2004 29:243-249.

65. Kouzarides T. Histone methylation in transcriptional control. Curr. Opin. Genet. Dev. 2002 12:198-209.

66. Schubert HL, Blumenthal RM, Cheng X. Many paths to methyltransfer: a chronicle of convergence. Trends Biochem. Sci. 2003 28:329-335.

67. Bedford MT, Richard S. Arginine methylation: an emerging regulator of protein function. Mol. Cell 2005 18:263-272.

68. Bannister AJ, Kouzarides T. Reversing histone methylation. Nature 2005 436:1103-1106.

69. Jacobs SA, Khorasanizadeh S. Structure of HP1 chromodomain bound to a lysine 9-methylated histone H3 tail. Science 2002 295:2080-2083.

70. Huyen Y, et al. Methylated lysine 79 of histone H3 targets 53BP1 to DNA double-strand breaks. Nature 2004 432:406-411.

71. Flanagan JF, et al. Double chromodomains cooperate to recognize the methylated histone H3 tail. Nature 2005 438:1181-1185.

72. Cote J, Richard S. Tudor domains bind symmetrical dimethylated arginines. J. Biol. Chem. 2005 280:28476-28483.

73. Sprangers R, Groves MR, Sinning I, Sattler M. High-resolution X-ray and NMR structures of the SMN tudor domain: conformational variation in the binding site for symmetrically dimethylated arginine residues. J. Mol. Biol. 2003 327:507-520.

74. Friesen WJ, Massenet S, Paushkin S, Wyce A, Dreyfuss G. SMN, the product of the spinal muscular atrophy gene, binds preferentially to dimethylarginine-containing protein targets. Mol. Cell 2001 7:1111-1117.

75. Bannister AJ, et al. Selective recognition of methylated lysine 9 on histone H3 by the HP1 chromo domain. Nature 2001 410:120-124.

76. Nakayama J-I, Rice JC, Strahl BD, Allis CD, Grewal SIS. Role of histone H3 lysine 9 methylation in epigenetic control of heterochromatin assembly. Science 2001 292:110-113.

77. Lachner M, O’Carroll D, Rea S, Mechtler K, Jenuwein T. Methylation of histone H3 lysine 9 creates a binding site for HP1 proteins. Nature 2001 410:116-120.

78. Chuikov S, et al. Regulation of p53 activity through lysine methylation. Nature 2004 432:353-360.

79. Huang J, et al. Repression of p53 activity by Smyd2-mediated methylation. Nature 2006 444:629-632.

80. Shi X, et al. Modulation of p53 function by SET8-mediated methylation at lysine 382. Mol. Cell 2007 27:636-646.

81. Kouskouti A, Scheer E, Staub A, Tora L, Talianidis I. Gene-specific modulation of TAF10 function by SET9-mediated methylation. Mol. Cell 2004 14:175-182.

82. Mowen KA, et al. Arginine methylation of STAT1 modulates IFNα/β-induced transcription. Cell 2001 104:731-741.

83. Xu W, et al. A transcriptional switch mediated by cofactor methylation. Science 2001 294:2507-2511.

84. Shi Y, et al. Histone demethylation mediated by the nuclear amine oxidase homolog LSD1. Cell 2004 119:941-953.

85. Stavropoulos P, Blobel G, Hoelz A. Crystal structure and mechanism of human lysine-specific demethylase-1. Nat. Struct. Mol. Biol. 2006 13:626-632.

86. Whetstine JR, et al. Reversal of histone lysine trimethylation by the JMJD2 family of histone demethylases. Cell 2006 125:467-481.

87. Wang Y, et al. Human PAD4 regulates histone arginine methylation levels via demethylimination. Science 2004 306:279-283.

88. Thompson PR, Fast W. Histone citrullination by protein arginine deiminase: is arginine methylation a green light or a roadblock? ACS Chem. Biol. 2006 1:433-441.

89. Kearney PL, et al. Kinetic characterization of protein arginine deiminase 4: a transcriptional corepressor implicated in the onset and progression of rheumatoid arthritis. Biochemistry 2005 44:10570-10582.

90. Raijmakers R, et al. Methylation of arginine residues interferes with citrullination by peptidylarginine deiminases in vitro. J. Mol. Biol. 2007 367:1118-1129.

91. Hidaka Y, Hagiwara T, Yamada M. Methylation of the guanidino group of arginine residues prevents citrullination by peptidylarginine deiminase IV. FEBS Lett. 2005 579:4088-4092.

92. Chang B, Chen Y, Zhao Y, Bruick RK. JMJD6 is a histone arginine demethylase. Science 2007 318:444-447.

93. Fraga MF, Esteller M. Towards the human cancer epigenome: a first draft of histone modifications. Cell Cycle 2005 4:1377-1381.

94. Shi Y, Whetstine JR. Dynamic regulation of histone lysine methylation by demethylases. Mol. Cell 2007 25:1-14.

95. Okada Y, et al. hDOT1L links histone methylation to leukemogenesis. Cell 2005 121:167-178.

96. Burda P, Aebi M. The dolichol pathway of N-linked glycosylation. Biochim. Biophys. Acta 1999 1426:239-257.

97. Yan A, Lennarz WJ. Unraveling the mechanism of protein N-glycosylation. J. Biol. Chem. 2005 280:3121-3124.

98. Roth J. Protein N-glycosylation along the secretory pathway: relationship to organelle topography and function, protein quality control, and cell interactions. Chem. Rev. 2002 102:285-304.

99. Kornfeld R, Kornfeld S. Assembly of asparagine-linked oligosaccharides. Annu. Rev. Biochem. 1985 54:631-664.

100. Knauer R, Lehle L. The oligosaccharyltransferase complex from yeast. Biochim. Biophys. Acta 1999 1426:259-273.

101. Peter-Katalinic J. Methods in enzymology: O-Glycosylation of proteins. Methods Enzymol. 2005 405:139-171.

102. Zachara NE, Hart GW. The emerging significance of O-GlcNAc in cellular regulation. Chem. Rev. 2002 102:431-438.

103. Love DC, Hanover JA. The hexosamine signaling pathway: deciphering the “O-GlcNAc Code”. Sci. STKE 20051re13.

104. Furmanek A, Hofsteenge J. Protein C-mannosylation: facts and questions. Acta Biochim Pol. 2000 47:781-789.

105. Neufeld EF. Lysosomal storage diseases. Annu. Rev. Biochem. 1991 60:257-280.

106. Schachter H. Congenital disorders involving defective N-glycosylation of proteins. Cell. Mol. Life Sci. 2001 58:1085-1104.

107. Dube DH, Bertozzi CR. Glycans in cancer and inflammation - potential for therapeutics and diagnostics. Nat. Rev. Drug Discov. 2005 4:477-488.

108. Trombetta ES, Parodi AJ. Quality control and protein folding in the secretory pathway. Annu. Rev. Cell Dev. Biol. 2003 19:649-676.

109. Okajima T, Xu A, Lei L, Irvine KD. Chaperone activity of protein O-fucosyltransferase 1 promotes notch receptor folding. Science 2005 307:1599-1603.

110. Wyss DF, et al. Conformation and function of the N-linked glycan in the adhesion domain of human CD2. Science 1995 269:1273-1278.

111. Krapp S, Mimura Y, Jefferis R, Huber R, Sondermann P. Structural analysis of human IgG-Fc glycoforms reveals a correlation between glycosylation and structural integrity. J. Mol. Biol. 2003 325:979-989.

112. Sondermann P, Huber R, Oosthuizen V, Jacob U. The 3.2-Å crystal structure of the human IgG1 Fc fragment-FcγRIII complex. Nature 2000 406:267-273.

113. Pickart CM. Mechanisms underlying ubiquitination. Annu. Rev. Biochem. 2001 70:503-533.

114. Zheng N, et al. Structure of the Cul1-Rbx1-Skp1-F boxSkp2 SCF ubiquitin ligase complex. Nature 2002 416:703-709.

115. Cardozo T, Pagano M. The SCF ubiquitin ligase: insights into a molecular machine. Nat. Rev. Mol. Cell Biol. 2004 5:739-751.

116. Mukhopadhyay D, Riezman H. Proteasome-independent functions of ubiquitin in endocytosis and signaling. Science 2007 315:201-205.

117. Hochstrasser M. Lingering mysteries of ubiquitin-chain assembly. Cell 2006 124:27-34.

118. Wing SS. Deubiquitinating enzymes-the importance of driving in reverse along the ubiquitin-proteasome pathway. Int. J. Biochem. Cell Biol. 2003 35:590-605.

119. Reed SI. Ratchets and clocks: the cell cycle, ubiquitylation and protein turnover. Nat. Rev. Mol. Cell Biol. 2003 4:855-864.

120. Schofield CJ, Ratcliffe PJ. Oxygen sensing by HIF hydroxylases. Nat. Rev. Mol. Cell Biol. 2004 5:343-354.

121. Gallego M, Virshup DM. Post-translational modifications regulate the ticking of the circadian clock. Nat. Rev. Mol. Cell Biol. 2007 8:139-148.

122. Salmena L, Pandolfi PP. Changing venues for tumour suppression: balancing destruction and localization by monoubiquitylation. Nat. Rev. Cancer 2007 7:409-413.

123. Thrower JS, Hoffman L, Rechsteiner M, Pickart CM. Recognition of the polyubiquitin proteolytic signal. EMBO J. 2000 19:94-102.

124. Madura K. Rad23 and Rpn10: perennial wallflowers join the melee. Trends Biochem. Sci. 2004 29:637-640.

125. Flick K, Raasi S, Zhang H, Yen JL, Kaiser P. A ubiquitin-interacting motif protects polyubiquitinated Met4 from degradation by the 26S proteasome. Nat. Cell Biol. 2006 8:509-515.

126. Rubinsztein DC. The roles of intracellular protein-degradation pathways in neurodegeneration. Nature 2006 443:780-786.

127. Ciechanover A. Intracellular protein degradation: from a vague idea, through the lysosome and the ubiquitin-proteasome system, and onto human diseases and drug targeting (Nobel lecture). Angew. Chem. Int. Ed. 2005 44:5944-5967.

128. Luzio JP, Pryor PR, Bright NA. Lysosomes: fusion and function. Nat. Rev. Mol. Cell Biol. 2007 8:622-632.

129. Voges D, Zwickl P, Baumeister W. The 26S proteasome: a molecular machine designed for controlled proteolysis. Annu. Rev. Biochem. 1999 68:1015-1068.

130. Marmor MD, Yarden Y. Role of protein ubiquitylation in regulating endocytosis of receptor tyrosine kinases. Oncogene 2004 23:2057-2070.

131. Shenoy SK. Seven-transmembrane receptors and ubiquitination. Circ. Res. 2007 100:1142-1154.

132. Paetzel M, Karla A, Strynadka NCJ, Dalbey RE. Signal peptidases. Chem. Rev. 2002 102:4549-4580.

133. Rockwell NC, Krysan DJ, Komiyama T, Fuller RS. Precursor processing by Kex2/Furin proteases. Chem. Rev. 2002 102:4525-4548.

134. Fortini ME. γ-Secretase-mediated proteolysis in cell-surface-receptor signalling. Nat. Rev. Mol. Cell Biol. 2002 3:673-684.

135. Young JC, Hartl FU. A stress sensor for the bacterial periplasm. Cell 2003 113:1-2.

136. Yan N, Shi Y. Mechanisms of apoptosis through structural biology. Annu. Rev. Cell. Dev. Biol. 2005 21:35-56.

137. Thompson CB. Apoptosis in the pathogenesis and treatment of disease. Science 1995 267:1456-1462.

138. Kalume DE, Molina H, Pandey A. Tackling the phosphoproteome: tools and strategies. Curr. Opin. Chem. Biol. 2003 7:64-69.

139. Peng J, et al. A proteomics approach to understanding protein ubiquitination. Nat. Biotechnol. 2003 21:921-926.

140. Nandi A, et al. Global identification of O-GlcNAc-modified proteins. Anal. Chem. 2006 78:452-458.

141. Roth AF, et al. Global analysis of protein palmitoylation in yeast. Cell 2006 125:1003-1013.

142. Songyang Z, et al. Use of an oriented peptide library to determine the optimal substrates of protein kinases. Curr. Biol. 1994 4:973-982.

143. Obata T, et al. Peptide and protein library screening defines optimal substrate motifs for AKT/PKB. J. Biol. Chem. 2000 275:36108-36115.

144. Ubersax JA, et al. Targets of the cyclin-dependent kinase Cdk1. Nature 2003 425:859-864.

145. Evans MJ, Cravatt BF. Mechanism-based profiling of enzyme families. Chem. Rev. 2006 106:3279-3301.

146. Love KR, Catic A, Schlieker C, Ploegh HL. Mechanisms, biology and inhibitors of deubiquitinating enzymes. Nat. Chem. Biol. 2007 3:697-705.

147. Patricelli MP, et al. Functional interrogation of the kinome using nucleotide acyl phosphates. Biochemistry 2007 46:350-358.

148. Kumar S, et al. Activity-based probes for protein tyrosine phosphatases. Proc. Natl. Acad. Sci. U.S.A. 2004 101:7943-7948.

149. Luo Y, Knuckley B, Bhatia M, Pellechia PJ, Thompson PR. Activity-based protein profiling reagents for protein arginine deiminase 4 (PAD4): synthesis and in vitro evaluation of a fluorescently labeled probe. J. Am. Chem. Soc. 2006 128:14468-14469.

150. Sze SK, Ge Y, Oh H, McLafferty FW. From the cover: Top-down mass spectrometry of a 29-kDa protein for characterization of any posttranslational modification to within one residue. Proc. Natl. Acad. Sci. U.S.A. 2002 99:1774-1779.

151. Ermler U, Grabarse W, Shima S, Goubeaud M, Thauer RK. Crystal structure of methyl-coenzyme M reductase: the key enzyme of biologic methane formation. Science 1997 278:1457-1462.



Post-translational modifications are the chemical modifications a polypeptide chain receives after it is translated that convert it to the mature protein.

A protein is made up of a chain of amino acids, also known as a polypeptide. During translation, 20 different amino acids can be incorporated into a polypeptide chain. Once a polypeptide is created, post-translational modifications can help convert it into a mature protein and extend its range of functions. Many post-translational modifications take place in the endoplasmic reticulum and the Golgi apparatus.

Post-translational modifications include the addition of biochemical functional groups, a change in the chemical nature of an amino acid, a change in protein structure (such as the formation of disulfide bonds), or proteolysis that cuts part of the polypeptide chain. Some very common post-translational modifications that involve the addition of a functional group to a protein include glycosylation, lipidation, ubiquitination, and phosphorylation.

Proteins that are destined to be embedded in the plasma membrane often undergo glycosylation or lipidation. Glycosylation is the addition of a carbohydrate group to a protein, which can act as a cell surface marker, while lipidation is the addition of a lipid to a protein, which can help anchor that protein to the cell membrane. In ubiquitination, the small protein ubiquitin is attached to another protein of interest, often marking it for degradation by the proteasome. Protein degradation removes damaged proteins and is also a normal part of regulating protein activity in the cell.

Phosphorylation and dephosphorylation are common modifications used in biochemical signal transduction. In eukaryotic proteins, phosphorylation occurs most commonly on the hydroxyl groups of serine, threonine and tyrosine, which makes it a highly specific modification. Other residues, such as histidine and aspartate, can also be phosphorylated, notably in bacterial signaling systems.
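To make the residue specificity concrete, here is a minimal Python sketch that lists the serine, threonine and tyrosine positions in a sequence and computes the mass added by phosphorylation. The function names and the example sequence are hypothetical, and finding one of these residues says nothing about whether it is actually phosphorylated in a cell. The mass value is the standard monoisotopic shift for adding HPO3 (about 79.966 Da).

```python
# Residues most commonly phosphorylated in eukaryotic proteins.
PHOSPHO_RESIDUES = {"S", "T", "Y"}

# Monoisotopic mass added per phosphate group (HPO3), in daltons.
PHOSPHO_MASS_SHIFT = 79.96633


def candidate_phosphosites(sequence):
    """Return (1-based position, residue) for every S, T or Y in the sequence."""
    return [
        (position, residue)
        for position, residue in enumerate(sequence.upper(), start=1)
        if residue in PHOSPHO_RESIDUES
    ]


def mass_shift(n_phosphates):
    """Total mass increase (Da) for a given number of phosphate groups."""
    return n_phosphates * PHOSPHO_MASS_SHIFT


# Hypothetical example sequence (one-letter amino acid codes).
print(candidate_phosphosites("MKSAYTLE"))
print(round(mass_shift(2), 5))
```

The mass shift is what makes phosphorylation detectable by mass spectrometry: each added phosphate raises the peptide mass by the same fixed amount.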

It may be necessary for enzymes to cut the polypeptide chain to remove single amino acids or even entire regions of the polypeptide before it can become a functional protein. For example, the first amino acid in a polypeptide (methionine) is often removed, as this usually corresponds to the start codon that initiated translation.
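The removal of the initiator methionine can be sketched in a few lines of Python. The rule used here is a common simplification that is not stated in the text above: methionine aminopeptidase tends to remove the first methionine when the second residue has a small side chain (Gly, Ala, Ser, Thr, Cys, Pro or Val). The function name and example sequences are hypothetical.

```python
# Assumed rule of thumb: initiator Met is removed when residue 2 is small.
SMALL_RESIDUES = set("GASTCPV")


def remove_initiator_met(sequence):
    """Return the sequence with the N-terminal Met removed if the rule applies."""
    seq = sequence.upper()
    if len(seq) >= 2 and seq[0] == "M" and seq[1] in SMALL_RESIDUES:
        return seq[1:]
    return seq


print(remove_initiator_met("MASK"))  # second residue is Ala (small)
print(remove_initiator_met("MKLV"))  # second residue is Lys (not small)
```

Real processing depends on the organism and on the protein context, so this should be read as an illustration of the idea and not as a predictor.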

In addition to the standard 20 amino acids, there are also many other non-standard amino acids that can be formed by post-translational modifications.

Practice Questions

MCAT Official Prep (AAMC)

Practice Exam 1 C/P Section Passage 5 Question 24

Biology Question Pack, Vol. 1 Question 64

Section Bank B/B Section Passage 6 Question 42

Practice Exam 2 B/B Section Passage 3 Question 15

Practice Exam 4 B/B Section Passage 5 Question 27

Practice Exam 4 B/B Section Passage 9 Question 50

Key Points

• During protein synthesis, 20 different amino acids can be incorporated into a polypeptide chain to become a protein.

• Post-translational modifications can extend a protein's range of functions.

• Common modifications include phosphorylation (the addition of phosphate groups, using ATP as the donor), ubiquitination (which can mark a protein for destruction) and glycosylation (the addition of carbohydrate chains to form glycoproteins).

• Post-translational modifications can involve a functional group being added to a protein, a chemical change in amino acids, a structural change in the protein (such as the formation of disulfide bonds), or proteolysis.

• In eukaryotic proteins, phosphorylation occurs most commonly on serine, threonine and tyrosine, making it a highly specific modification.

Post-translational modification: The chemical modification of a protein after its translation.

Amino acid: Any organic compound containing both an amino and a carboxylic acid functional group.

Disulfide bond: A covalent bond formed between the thiol groups of two cysteine residues.

Proteolysis: The breakdown of a protein into smaller polypeptides or amino acids, usually carried out by an enzyme.

Glycosylation: The addition of a carbohydrate group to a protein.

Lipidation: The addition of a lipid to a protein.

Ubiquitination: The addition of a ubiquitin protein to another protein.

Phosphorylation: The addition of a phosphoryl group to a protein.

Proteasome: A protein complex in the cell that degrades proteins.

Protein phosphorylation in hPSCs

Protein phosphorylation and signaling cascades

Similar to protein glycosylation, protein phosphorylation is involved in the regulation of a broad spectrum of cellular processes and states. The phosphorylation state of proteins in typical eukaryotic cells is mainly determined by the activity of protein kinases and phosphatases on their substrates. The covalent conjugation of phosphate groups to proteins frequently alters protein function by inducing conformational changes or by affecting protein-protein and enzyme-substrate interactions. Many kinases and phosphatases are themselves phosphorylation substrates, thereby forming mutually dependent and hierarchically regulated signaling loops and cascades 72. Cell fate determination in hPSCs strongly depends on the balance between pluripotency and differentiation signaling. As shown in Figure 4, many signaling pathways critically involved in embryonic development and in the modulation of gene expression for cellular pluripotency and differentiation are initiated by the activation of growth factor receptors that are receptor tyrosine kinases (RTKs; e.g., FGFR and IGF1R) or receptor serine/threonine kinases (e.g., TGFβR and BMPR1/2). Notably, these signaling pathways frequently crosstalk with each other, and the steady state of cellular pluripotency rests on an intricate yet delicately balanced molecular interaction network 73,74.

Figure 4. Cell signaling pathways governed by protein phosphorylation and critically involved in embryonic development and the regulation of pluripotent states in PSCs. Many growth factor receptors on the cell surface are receptor kinases. Upon ligand binding, these receptor kinases are fully activated and phosphorylate downstream intracellular kinases to initiate phosphorylation signaling cascades that frequently regulate the translocation and activity of several transcription factors (e.g., Myc, β-Catenin and Smad proteins) and the expression of pluripotency- or differentiation-associated genes. These signal transduction pathways are highly interactive with each other and influenced by other proteins (e.g., HSPG) on the cell surface or in the microenvironment. Several protein phosphatases (e.g., PTEN and Shp2) that negatively control protein phosphorylation also play critical roles in the modulation of this signaling network and the differentiation potential of PSCs.

Regulation of pluripotency by protein phosphorylation and dephosphorylation

In the stem cell field, many efforts have been made to dissect the signaling networks regulated by protein phosphorylation in hPSCs and to understand how they function as a whole to regulate cellular pluripotency and differentiation. Advances in protein mass spectrometry have enabled the global, quantitative analysis of dynamic changes in phosphorylated proteins in cells. Several recent studies used phosphoproteomic approaches to systematically investigate phosphorylated proteins in hPSCs. Swaney et al. 75 identified more than 11,000 unique phosphopeptides that corresponded to more than 10,000 non-redundant phosphorylation sites in hESCs. Five of these phosphorylation sites were localized to POU5F1 (also known as OCT4) and SOX2 75. Van Hoof et al. 76 discovered that the phosphorylation state of about 50% of the phosphorylation sites they identified was dynamically regulated and changed rapidly in hESCs in response to the induction of differentiation. These phosphorylation sites included three consecutive serine residues that flank an upstream SUMOylation site and regulate the phosphorylation-dependent SUMOylation of SOX2 76. Moreover, a comparison between the proteomes and phosphoproteomes of a small number of hESCs and hiPSCs revealed functionally associated differences in protein expression and phosphorylation between these two types of hPSCs, possibly related to residual regulatory characteristics of the somatic cells used to generate the hiPSCs 10. It is therefore plausible that protein phosphorylation modulates pluripotency in hPSCs by acting on key factors that are essential for pluripotency, in addition to numerous signal transduction molecules. Indeed, several reports suggest that protein phosphorylation acting directly on POU5F1, NANOG, SOX2, KLF4 and MYC may affect the function of these transcription factors in the regulation of cellular pluripotency 77.
Variations in protein expression and in the phosphorylation state of different hPSC lines may affect their responses to environmental stimuli. Like glycoproteins, phosphoproteins appear to convey information about the pluripotent state of hPSCs. Specific types of protein phosphorylation are less likely to be identified as “pluripotency-associated” biomarkers because protein phosphorylation is structurally less complex than protein glycosylation. However, it is likely that the phosphoproteome, or a subset of phosphoproteins, could provide a sensitive and useful biomarker for monitoring pluripotency and differentiation in hPSCs.

It is clear that both kinases and phosphatases play critical roles in the proper operation of cell signaling mediated by protein phosphorylation. Unlike the many kinases that have been well studied in somatic cells and hPSCs, the importance of protein phosphatases in the regulation of cellular pluripotency is less appreciated. Despite the overwhelming amount of attention that has been focused on kinases in mammalian PSCs, protein phosphatases (alkaline phosphatase in particular) remain among the earliest-discovered and most commonly used biomarkers for cellular pluripotency 78,79, indicating the potential functional significance of protein phosphatases in PSCs. Indeed, emerging data have shown that several phosphatases (e.g., PTEN and Shp2) are important for the differentiation capacity and lineage specification of human and murine PSCs. Moreover, suppression of these protein phosphatases inhibits hPSC exit from the pluripotent state during differentiation 80,81,82. These studies also illustrate how phosphatases affect cellular pluripotency by altering protein phosphorylation in various signaling pathways, and they establish a strong rationale for developing strategies to stabilize pluripotency by specifically interfering with the activity of certain phosphatases.

Phosphorylation signaling is potentially influenced by genetic variations and proteoglycans in hPSCs

Numerous studies have suggested that the expression and activity of many protein kinases and phosphatases can be influenced by single nucleotide polymorphisms (SNPs) or rare point mutations in human genomes. These genetic variations are functionally associated with the differential regulation of signal transduction and with unequal susceptibility to a variety of disorders among individuals 83-91. A global analysis of SNP marks in hPSC genomes revealed that the duplication or deletion of several genes (e.g., NRAS, AKT3, RASA3 and DUSP15) involved in phosphorylation signaling networks frequently occurs in hPSCs during cellular reprogramming and long-term culture 92. Although correlations between the differentiation capacity of hPSCs and these genetic variations have not been systematically examined, it is plausible that cellular pluripotency and differentiation propensity differ among hPSC lines partly because of intrinsic genetic variation that alters cell signaling mediated by protein phosphorylation.

As mentioned earlier, protein glycosylation and extracellular proteoglycans are critical for modulating growth factors and plasma membrane-bound receptor kinases to which they bind (Figure 4). This suggests that protein glycosylation and phosphorylation are highly interactive in hPSCs, and that the perturbation of glycomodifications or glycoprotein expression on the cell surface may be frequently accompanied by drastic changes in phosphorylation signaling networks and the pluripotent state.


Reversible phosphorylation is one of the most important and well-studied post-translational modifications. Occurring most commonly on threonine, serine and tyrosine residues, phosphorylation plays critical roles in the regulation of many cellular processes, including the cell cycle, growth, apoptosis and differentiation. The identification and characterization of phosphorylation sites is therefore crucial for understanding signaling events. Mass spectrometry (MS) of phosphopeptides obtained from tryptic protein digests has become a powerful tool for this characterization (Stensballe, A., et al., 2000). However, samples generally need to be significantly enriched for phosphopeptide content to compensate for low abundance, poor ionization, and suppression effects (Zhou, W., et al., 2000).
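As an illustration of what a tryptic digest produces, the following Python sketch performs a simplified in-silico digestion and keeps the peptides that contain at least one serine, threonine or tyrosine. It assumes the textbook trypsin rule (cleave after lysine or arginine unless the next residue is proline) and ignores missed cleavages. The function names and example sequence are hypothetical and do not come from any vendor protocol.

```python
def tryptic_digest(sequence):
    """Split a protein sequence after K or R, except when followed by P."""
    seq = sequence.upper()
    peptides = []
    start = 0
    for i, residue in enumerate(seq):
        is_last = i == len(seq) - 1
        if residue in "KR" and not is_last and seq[i + 1] != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    peptides.append(seq[start:])  # remaining C-terminal peptide
    return peptides


def possible_phosphopeptides(peptides):
    """Keep peptides that contain at least one S, T or Y residue."""
    return [p for p in peptides if any(residue in "STY" for residue in p)]


# Hypothetical example sequence (one-letter amino acid codes).
peptides = tryptic_digest("MKSAYTLERPGSKAA")
print(peptides)
print(possible_phosphopeptides(peptides))
```

In a real sample only a small fraction of such peptides actually carries a phosphate, which is why an enrichment step usually precedes MS analysis.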

Immobilized metal affinity chromatography (IMAC) has been commonly used for purification of phosphorylated compounds. Our PHOS-Select™ Iron Affinity Gel is prepared with a novel iron [Fe(III)] chelate matrix based on our proprietary (patent pending) NTA analog chelate ligand. That matrix provides high capacity affinity binding of molecules containing phosphate groups, making these products ideal for the enrichment of phosphopeptides from protein tryptic digests, or small organic phosphocompounds (e.g. adenosine 5’-monophosphate). They can also be used for direct transfer of phosphocompounds for analysis by HPLC or mass spectrometry.


PTMs are chemical modifications of a protein after translation and have a wide range of effects on the function and structure of the target proteins. They occur on almost all proteins, and many protein domains are modified on multiple amino acids by diverse modifications. The function of a modified protein is often strongly affected by these modifications, which play important roles in a myriad of cellular processes. There is strong evidence that disruptions in PTMs can lead to various diseases. Hence, increased knowledge about the potential PTMs of a target protein may improve our understanding of the molecular processes in which it takes part. High-throughput experimental methods for the discovery of PTMs are labor-intensive and time-consuming, so there is a pressing need for computational methods and tools to predict PTMs. A considerable amount of PTM data is available from publicly accessible databanks, which are valuable resources for mining patterns to train new models for PTM prediction. In recent years, many computational methods have been developed for this purpose. However, the assessment of these methods shares some common weaknesses, and such methods should be evaluated more critically. Considering the diversity of PTMs and the new PTMs reported every few years on the one hand, and the advancement of machine learning algorithms on the other, this field is likely to attract more attention in the future.
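Many sequence-based PTM predictors start by describing each candidate site as a fixed-length window of the residues around it, which is then turned into features for a machine learning model. The Python sketch below shows only that first step, under assumptions chosen for illustration: serine, threonine and tyrosine as targets, a symmetric flank, and "-" as the padding symbol at the sequence ends. The function name and parameters are hypothetical and do not reproduce any specific published tool.

```python
def site_windows(sequence, targets="STY", flank=3, pad="-"):
    """Return (1-based position, window) for each target residue.

    Each window has length 2 * flank + 1 and is centered on the target;
    positions beyond the sequence ends are filled with the pad symbol.
    """
    seq = sequence.upper()
    padded = pad * flank + seq + pad * flank
    windows = []
    for i, residue in enumerate(seq):
        if residue in targets:
            # Index i in seq corresponds to index i + flank in padded,
            # so the window starts at index i in padded.
            windows.append((i + 1, padded[i:i + 2 * flank + 1]))
    return windows


# Hypothetical example sequence with a flank of two residues.
for position, window in site_windows("MKSAYTLE", flank=2):
    print(position, window)
```

Each window would then be encoded numerically and labeled as modified or unmodified from database annotations before a model is trained and evaluated.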

Protein Post-translational Modification Analysis

The human proteome is considered to be significantly more complex than the human genome. While the genome comprises 20,000-25,000 genes, the proteome is estimated at over 1 million proteins. Proteomic diversity is further increased by protein post-translational modifications (PTMs).

PTMs are the covalent and generally enzymatic modifications of proteins during or after protein biosynthesis. They change the polypeptide chain by adding distinct chemical moieties to amino acid residues. PTMs underlie the control of intricate cellular processes such as cell division, growth, differentiation, signaling and regulation. They are also involved in the maintenance of protein structure and integrity, the regulation of metabolism and defense processes, cellular recognition and changes in morphology. Consequently, analysis of protein post-translational modifications, including the types of modification and the modified sites, is particularly important for the study of cell biology and for disease diagnostics and prevention.

With years of experience and advanced experimental equipment, Creative Proteomics provides a variety of PTM analysis services to assist your scientific research.

[Structural biology of post-translational modifications of proteins]

A majority of proteins encoded in genomes of limited size are post-translationally diversified by covalent modifications such as glycosylation and ubiquitination. Although recent advances in structural proteomics have enabled high-throughput structure determination of proteins, structural analyses of post-translationally modified proteins remain challenging because of the lack of appropriate determination methods. Therefore, we developed methodologies for characterizing the post-translational modifications of proteins from the structural viewpoint, focusing especially on glycosylation and ubiquitination. For instance, we established a systematic method for structural glycomics to address broader issues, including glycosylation profiling and 3D structure analyses of glycoproteins. Our stable-isotope-assisted NMR techniques in conjunction with X-ray crystallographic approach provide valuable information at the atomic level on conformations, dynamics, and interactions of glycoproteins such as antibody and proteins involved in the ubiquitin-proteasome system. These studies provide the structural basis for improved efficacy of therapeutic antibodies on defucosylation of their Fc glycans and mechanistic insights into ubiquitination reactions in glycoprotein-fate determination in cells. These approaches will allow new possibilities for structural studies on post-translationally modified proteins of clinical, pathological, and pharmaceutical interests.
