Why do membrane proteins at lower temperatures contain more alpha helices than beta sheets?

Why do organisms found at low temperatures have membrane proteins with a higher percentage of alpha helices compared to beta sheets?

The reason could be the greater mobility of the alpha helix compared with the more rigid beta sheet, which allows the enzyme to remain functional at low temperature. The proteins of such organisms also tend to have fewer hydrophobic regions and fewer covalent bonds. (Source: "Molecular basis of cold adaptation", Salvino D'Amico, Paule Claverie, Tony Collins, Daphné Georlette, Emmanuelle Gratia, Anne Hoyoux, Marie-Alice Meuwis, Georges Feller and Charles Gerday.)

The paper doesn't refer to the alpha helix explicitly; however, alpha helices are known to allow more flexibility than beta sheets, so it is reasonable to argue that an increase in alpha-helix content could be one of several strategies used to obtain more flexible enzymes, alongside the other means cited in the paper.

This point is also made in Brock Biology of Microorganisms (2015), p. 161:

Several cold-active enzymes whose structure is known show a greater content of α-helix and lesser content of β-sheet secondary structure (Section 4.14) than do enzymes that show little or no activity in the cold. Because β-sheet secondary structures tend to be more rigid than α-helices, the greater α-helix content of cold-active enzymes allows these proteins greater flexibility for catalyzing their reactions at cold temperatures.

Unfortunately, no reference is provided for this claim.

Linkers in Biomacromolecules

Tejas M. Gupte, ..., Sivaraj Sivaramakrishnan, in Methods in Enzymology, 2021

1.1 α-helices are a dominant structural element in proteins

α-helices, β-sheets and random coils are the most common elements of secondary structure in proteins. α-helices are formed and maintained by backbone interactions parallel to the primary axis of the helix. These interactions are hydrogen bonds between the carbonyl oxygen and amino nitrogen of the i-th and (i + 4)-th amino acids. The side chains of all residues in the α-helix are directed outwards and away from the helical axis, and the occurrence of polar or charged side chains in the helix can facilitate additional interactions with other side chains in the helix or with other elements outside of the helical structure, imparting further stability (Pauling, Corey, & Branson, 1951). Consequently, α-helices are the most commonly occurring secondary structure, representing 30% of the structure of the average globular protein (Pace & Scholtz, 1998). In globular domains, α-helices can also pack with β-sheets in different arrangements. Among these, orientation of the α-helix along the strands of β-sheets is energetically most favored, followed in stability by a perpendicular orientation of the helical axis to the β-strands (Chou, Némethy, Rumsey, Tuttle, & Scheraga, 1985). α-helices are also the core component of coiled-coil domains and of transmembrane bundles. Coiled-coil domains consist of two to seven α-helices, where the separate helices stabilize each other through hydrophobic patches of side chain interactions following a “knobs-into-holes” motif (Crick, 1953). Multiple α-helices composed primarily of hydrophobic amino acids can also give rise to transmembrane bundles, through protein-protein and protein-lipid interactions (Lee, 2003).
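The i to i + 4 hydrogen-bonding pattern can be made concrete by listing which residues pair up in an idealized helix. This is a minimal sketch that assumes a perfectly regular helix in which every possible i to i + 4 bond forms; the function name and residue numbers are hypothetical, and real helices often fray at their ends.

```python
def helix_hbond_pairs(first: int, last: int) -> list[tuple[int, int]]:
    """Return (C=O residue, N-H residue) pairs for an idealized alpha-helix.

    Assumes the backbone C=O of every residue i in [first, last - 4] accepts a
    hydrogen bond from the backbone N-H of residue i + 4 (hypothetical helper).
    """
    # range() is empty when the segment is shorter than five residues.
    return [(i, i + 4) for i in range(first, last - 3)]


if __name__ == "__main__":
    # A hypothetical helix spanning residues 1-10.
    for acceptor, donor in helix_hbond_pairs(1, 10):
        print(f"C=O of residue {acceptor} ... H-N of residue {donor}")
```

A ten-residue segment therefore has six backbone hydrogen bonds in this idealized picture, because the last four N-H groups and the first four C=O groups have no partner inside the segment.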

CH450 and CH451: Biochemistry - Defining Life at the Molecular Level

Proteins are one of the most abundant organic molecules in living systems and have the most diverse range of functions of all macromolecules. Proteins may be structural, regulatory, contractile, or protective; they may serve in transport, storage, or membranes; or they may be toxins or enzymes. Each cell in a living system may contain thousands of different proteins, each with a unique function. Their structures, like their functions, vary greatly. They are all, however, polymers of alpha amino acids, arranged in a linear sequence and connected together by covalent bonds.

Alpha Amino Acid Structure

The major building blocks of proteins are called alpha (α) amino acids. As their name implies, they contain a carboxylic acid functional group and an amine functional group. The alpha designation indicates that these two functional groups are separated from one another by a single carbon atom, the alpha carbon. In addition to the amine and the carboxylic acid, the alpha carbon is also attached to a hydrogen and one additional group that can vary in size and length. In the diagram below, this group is designated as an R-group. Within living organisms there are 20 amino acids used as protein building blocks, and they differ from one another only at the R-group position. The basic structure of an amino acid is shown below:

Figure 2.1 General Structure of an Alpha Amino Acid

There are a total of 20 alpha amino acids that are commonly incorporated into protein structures (Figure 2.2). The different R-groups have different characteristics based on the nature of the atoms in their functional groups. Some R-groups contain predominantly carbon and hydrogen and are very nonpolar, or hydrophobic. Others contain polar uncharged functional groups such as alcohols, amides, and thiols. A few amino acids are basic (containing amine functional groups) or acidic (containing carboxylic acid functional groups); these amino acids are capable of carrying full charges and can form ionic interactions. Each amino acid can be abbreviated using a three-letter and a one-letter code.

Figure 2.2 Structure of the 20 Alpha Amino Acids used in Protein Synthesis. R-groups are indicated by circled/colored portion of each molecule. Colors indicate specific amino acid classes: Hydrophobic – Green and Yellow, Hydrophilic Polar Uncharged – Orange, Hydrophilic Acidic – Blue, Hydrophilic Basic – Rose.

Nonpolar (Hydrophobic) Amino Acids

The nonpolar amino acids can largely be subdivided into two more specific classes, the aliphatic amino acids and the aromatic amino acids. The aliphatic amino acids (glycine, alanine, valine, leucine, isoleucine, and proline) contain hydrocarbon side chains, ranging from the simplest, glycine (whose R-group is a single hydrogen atom), to the larger branched chains of valine, leucine, and isoleucine. Proline is also classified as an aliphatic amino acid but has special properties, as its hydrocarbon chain is cyclized with the backbone amine, creating a unique 5-membered ring structure. As we will see in the next section covering primary structure, proline can significantly alter the three-dimensional structure of the protein due to the rigidity of this ring when it is incorporated into the polypeptide chain, and it is commonly found in regions of the protein where folds or turns occur.

The aromatic amino acids (phenylalanine, tyrosine, and tryptophan), as their name implies, contain aromatic functional groups within their structure, making them largely nonpolar and hydrophobic due to their high carbon/hydrogen content. However, it should be noted that hydrophobicity and hydrophilicity represent a sliding scale, and each of the different amino acids can have different physical and chemical properties depending on its structure. For example, the hydroxyl group present in tyrosine increases its reactivity and solubility compared to that of phenylalanine.

Methionine, one of the sulfur-containing amino acids, is usually classified among the nonpolar, hydrophobic amino acids because its terminal methyl group creates a thioether functional group, which generally does not form a strong permanent dipole and has low solubility.

Polar (Hydrophilic) Amino Acids

The polar, hydrophilic amino acids can be subdivided into three major classes: the polar uncharged, the acidic, and the basic amino acids. Within the polar uncharged class, the side chains contain heteroatoms (O, S, or N) that are capable of forming permanent dipoles within the R-group. These include the hydroxyl- and sulfhydryl-containing amino acids (serine, threonine, and cysteine) and the amide-containing amino acids (glutamine and asparagine). Two amino acids, glutamic acid (glutamate) and aspartic acid (aspartate), constitute the acidic amino acids and contain side chains with carboxylic acid functional groups capable of fully ionizing in solution. The basic amino acids, lysine, arginine, and histidine, contain amine functional groups that can be protonated to carry a full charge.

Many of the amino acids with hydrophilic R-groups can participate in the active sites of enzymes. An active site is the part of an enzyme that directly binds to a substrate and carries out a reaction. Protein enzymes contain catalytic groups consisting of amino acid R-groups that promote the formation and breaking of bonds. The amino acids that play a significant role in the binding specificity of the active site are usually not adjacent to each other in the primary structure but are brought together by folding into the tertiary structure, as you will see later in the chapter.

Proteins built from these amino acids can be hundreds of amino acids long. Thus, for simplicity's sake, the 20 amino acids used for protein synthesis have both three-letter and one-letter code abbreviations (Table 2.1). These abbreviations are commonly used to write out protein sequences for bioinformatic and research purposes.
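As an illustration of how these abbreviations are used in bioinformatics, the sketch below converts between hyphen-separated three-letter sequences and one-letter strings. The code mappings are the standard ones for the 20 proteinogenic amino acids; the function names and the hyphen separator are assumptions made for illustration.

```python
# Standard three-letter -> one-letter codes for the 20 proteinogenic amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
# Reverse lookup built from the same table so the two never disagree.
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def three_to_one(sequence: str, sep: str = "-") -> str:
    """Convert e.g. 'Met-Gly-Ser' to 'MGS' (raises KeyError on unknown codes)."""
    return "".join(THREE_TO_ONE[code.capitalize()] for code in sequence.split(sep))


def one_to_three(sequence: str, sep: str = "-") -> str:
    """Convert e.g. 'MGS' to 'Met-Gly-Ser' (raises KeyError on unknown letters)."""
    return sep.join(ONE_TO_THREE[letter.upper()] for letter in sequence)


if __name__ == "__main__":
    print(three_to_one("Met-Gly-Ser"))
    print(one_to_three("MGS"))
```

The one-letter form is what most sequence databases and alignment tools use, since it keeps long sequences compact.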

Table 2.1 α-Amino Acid Abbreviations

Thought Question: Tryptophan contains an amine functional group; why isn’t tryptophan basic?

Answer: Tryptophan contains an indole ring structure that includes the amine functional group. However, due to the proximity and electron-withdrawing nature of the aromatic ring structure, the lone pair of electrons on the nitrogen is unavailable to accept a proton. Instead, it is involved in forming pi bonds within several of the different resonance structures possible for the indole ring. Figure 2.3A shows four of the possible resonance structures for indole. Conversely, within the imidazole ring structure found in histidine, there are two nitrogen atoms: one is involved in the formation of resonance structures (Nitrogen #1 in Figure 2.3B) and cannot accept a proton, while the other (Nitrogen #3) has a lone pair of electrons that is available to accept a proton.

Figure 2.3 Comparison of the Availability of the Lone Pair of Electrons on Nitrogen to Accept a Proton in the Indole and Imidazole Ring Structures. (A) Four resonance structures of the indole ring, demonstrating that the lone pair of electrons on the nitrogen is involved in the formation of pi bonds. (B) The imidazole ring structure has one nitrogen (1) that is involved in resonance structures (not shown) and is not available to accept a proton, while the second nitrogen (3) has a lone pair of electrons available to accept a proton, as shown.

Work It Out on Your Own:

Given the example above, describe using a chemical diagram, why the amide nitrogen atoms found in asparagine and glutamine are not basic.

Alpha Amino Acids are Chiral Molecules

If you examine the structure of the alpha carbon within each of the amino acids, you will notice that all of the amino acids except glycine are chiral molecules (Figure 2.4). A chiral molecule is one that is not superimposable on its mirror image. Like left and right hands, which have a thumb and fingers in the same order but are mirror images and not the same, chiral molecules have the same groups attached in the same order but are mirror images and not the same. The mirror-image versions of chiral molecules have nearly identical physical properties, making them very difficult to tell apart or to separate. Because of this, they are given a special stereoisomer name, enantiomers, and in fact the compounds themselves are given the same name! Enantiomers do differ in the way that they rotate plane-polarized light and in the way that they react and interact with biological molecules. Molecules that rotate plane-polarized light in the right-handed (clockwise) direction are called dextrorotatory and are labeled (+), while those that rotate it in the left-handed direction are called levorotatory and are labeled (-). For amino acids and sugars, however, the D- and L- letter designations are assigned by comparing the arrangement of groups around the chiral carbon with that of the reference molecule glyceraldehyde, not by the direction of rotation. The D- and L- forms of alanine are shown in Figure 2.4B.

Although most amino acids can exist in both left- and right-handed forms, life on Earth is made almost exclusively of left-handed (L) amino acids. Proteinogenic amino acids incorporated into proteins by ribosomes are always in the L-form. Some bacteria can incorporate D-amino acids into non-ribosomally synthesized peptides, but the use of D-amino acids in nature is rare. Interestingly, when we discuss the structure of sugars in Chapter XX, we will find that the sugars incorporated into carbohydrate structures are almost exclusively in the D-form. No one knows why this is the case. However, Drs. John Cronin and Sandra Pizzarello have shown that, of the amino acids that fall to Earth from space on meteorites, more are in the L-form than the D-form. Thus, the fact that we are made predominantly of L-amino acids may be due to amino acids from space.

Why do amino acids in space favor the L-form? No one really knows, but it is known that radiation can also exist in left- and right-handed (circularly polarized) forms. The Bonner hypothesis proposes that the predominant forms of radiation in space (e.g., from a rotating neutron star) could lead to the selective formation of homochiral molecules, such as L-amino acids and D-sugars. This is still speculative, but recent findings from meteorites make this hypothesis much more plausible.

Figure 2.4 Amino Acid Chirality. Except for the simplest amino acid, glycine, all of the other amino acids that are incorporated into protein structures are chiral. (A) The chirality of the core alpha amino acid structure, shown with a generic R-group. (B) The D- and L-alanine enantiomer pair; the upper diagram shows the ball-and-stick model and the lower diagram shows the line structure.

Note that the D- and L-designations do not describe the direction in which a molecule rotates plane-polarized light (for example, L-alanine is actually dextrorotatory), and they are not the same as the absolute configuration descriptors. An absolute configuration refers to the spatial arrangement of the atoms of a chiral molecular entity (or group) and its stereochemical description, e.g., R or S, referring to Rectus or Sinister, respectively.

Absolute configurations for a chiral molecule (in pure form) are most often obtained by X-ray crystallography. Alternative techniques include optical rotatory dispersion, vibrational circular dichroism, the use of chiral shift reagents in proton NMR, and Coulomb explosion imaging. Once the absolute configuration is obtained, the assignment of R or S is based on the Cahn–Ingold–Prelog priority rules, which are summarized in Figure 2.5. All of the chiral L-amino acids except cysteine have the S-configuration at the alpha carbon. In cysteine, the sulfur atom gives the R-group higher priority than the carboxylic acid functional group, so L-cysteine has the R-configuration even though its groups are arranged in space just like those of the other L-amino acids. Thus, the R- and S-designations do not always correspond with the D- and L-designations.

Figure 2.5 Absolute Configuration is Determined by the Rectus (R) and Sinister (S) Designations. In the Cahn–Ingold–Prelog system for naming chiral centers, the groups attached to the chiral center are ranked according to atomic number, with the highest atomic number receiving the highest priority (A in the diagram above) and the lowest atomic number receiving the lowest priority (D in the diagram above). The molecule is then oriented so that the lowest-priority group points away from the viewer. Next, the path from priority #1 to #2 to #3 (corresponding to A, B, and C above) is traced. If the path runs clockwise, the chiral center is given the R-designation, whereas if the path runs counterclockwise, it is given the S-designation.
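The orientation rule in Figure 2.5 can be turned into a small geometric calculation. This is a sketch under stated assumptions: the four substituents have already been ranked by the CIP rules (the ranking itself, including tie-breaking between groups attached through the same element, is not implemented), and 3D coordinates for them are available. The helper name cip_label and the coordinates are hypothetical. Instead of physically turning the molecule, it uses the sign of a triple product (a signed volume), which encodes the same clockwise or counterclockwise information as viewing the center with the lowest-priority group pointing away.

```python
from typing import Sequence

Vec = Sequence[float]


def _sub(a: Vec, b: Vec) -> tuple[float, float, float]:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Vec, b: Vec) -> tuple[float, float, float]:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cip_label(p1: Vec, p2: Vec, p3: Vec, p4: Vec) -> str:
    """Return 'R' or 'S' for a stereocenter (hypothetical helper).

    p1..p4 are 3D positions of the four substituent atoms, ALREADY ranked by
    the CIP rules (p1 = highest priority, p4 = lowest). The triple product of
    (p1 - p4), (p2 - p4), (p3 - p4) is negative when 1 -> 2 -> 3 runs
    clockwise as seen with p4 pointing away from the viewer, i.e. R.
    """
    v1, v2, v3 = _sub(p1, p4), _sub(p2, p4), _sub(p3, p4)
    signed_volume = _dot(v1, _cross(v2, v3))
    if abs(signed_volume) < 1e-9:
        raise ValueError("substituents are coplanar; not a stereocenter")
    return "R" if signed_volume < 0 else "S"


if __name__ == "__main__":
    # Made-up tetrahedral geometry: lowest priority (p4) along -z, away from a
    # viewer on the +z axis; p1 -> p2 -> p3 is arranged clockwise from there.
    p4 = (0.0, 0.0, -1.0)
    p1 = (0.0, 1.0, 1 / 3)
    p2 = (0.866, -0.5, 1 / 3)
    p3 = (-0.866, -0.5, 1 / 3)
    print(cip_label(p1, p2, p3, p4))  # clockwise -> R
    print(cip_label(p1, p3, p2, p4))  # swapping two groups -> S
```

Swapping any two substituents flips the sign of the signed volume, which mirrors the fact that exchanging two groups on a stereocenter converts one enantiomer into the other.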

Amino Acids are Zwitterions

In chemistry, a zwitterion is a molecule with two or more functional groups, of which at least one has a positive and one has a negative electrical charge and the net charge of the entire molecule is zero at a specific pH. Because they contain at least one positive and one negative charge, zwitterions are also sometimes called inner salts. The charges on the different functional groups balance each other out, and the molecule as a whole can be electrically neutral at a specific pH. The pH where this happens is known as the isoelectric point.

Unlike simple amphoteric compounds that may only form either a cationic or anionic species, a zwitterion simultaneously has both ionic states. Amino acids are examples of zwitterions (Figure 2.6). These compounds contain an ammonium and a carboxylate group, and can be viewed as arising via a kind of intramolecular acid–base reaction: The amine group deprotonates the carboxylic acid.

Figure 2.6 Amino Acids are Zwitterions. An amino acid contains both acidic (carboxylic acid fragment) and basic (amine fragment) centres. The isomer on the right is the zwitterionic form.

Because amino acids are zwitterions, and several also contain the potential for ionization within their R-groups, their charge state in vivo, and thus their reactivity, can vary depending on the pH, temperature, and solvation status of the local microenvironment in which they are located. The chart of standard pKa values for the amino acids is shown in Table 2.1 and can be used to predict the ionization/charge status of amino acids and their resulting peptides/proteins. However, it should be noted that the solvation status in the microenvironment of an amino acid can alter the relative pKa values of these functional groups and provide unique reactive properties within the active sites of enzymes (Table 2.1). A more in-depth discussion of the effects of desolvation will be given in Chapter XX discussing enzyme reaction mechanisms.

Table 2.1

As seen in Table 2.1, seven of the amino acids contain R-groups with ionizable side chains, and these are commonly found in the active sites of enzymes. Recall that the pKa is defined as the pH at which the ionized and unionized forms of an ionizable functional group within a molecule exist in equal concentrations. Thus, as the pH shifts above or below a group's pKa value, the balance between the ionized and unionized forms shifts to favor one state over the other. Figure 2.7 shows the various R-groups in their unionized and ionized states and the state favored either above or below each pKa value.
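The relationship between pH, pKa, and the two forms of a group is the Henderson–Hasselbalch equation, a standard result not written out in the text above. Rearranging it gives the fraction of a group in its deprotonated form (here "protonated" and "deprotonated" are used instead of charges, because an acid such as -COOH and a base such as -NH3+ carry different charges in each form):

$$
\mathrm{pH} = \mathrm{p}K_a + \log_{10}\frac{[\text{deprotonated}]}{[\text{protonated}]}
\quad\Longrightarrow\quad
f_{\text{deprotonated}} = \frac{[\text{deprotonated}]}{[\text{protonated}] + [\text{deprotonated}]} = \frac{1}{1 + 10^{\,\mathrm{p}K_a - \mathrm{pH}}}
$$

At pH = pKa the fraction is exactly 1/2; one pH unit above the pKa it is 10/11 (about 91%), and one unit below it is 1/11 (about 9%), which is why a group is usually treated as essentially fully in one form once the pH is a couple of units away from its pKa.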

Figure 2.7 Ionizable Functional Groups in Common Amino Acids. Within all amino acids both the carboxylic acid functional group (C-terminus), and the amine functional group (N-terminus) are capable of ionization. In addition, seven amino acids (aspartic acid, glutamic acid, arginine, histidine, lysine, tyrosine, and cysteine) also contain ionizable functional groups within their R-groups. The functional group’s favored states are shown either above or below their respective pKa values.

Typically, an ionizable group will favor the protonated state at pH values below its pKa and the deprotonated state at pH values above its pKa. Thus, pKa values can be used to help predict the overall charge states of amino acids and their resulting peptides/proteins within a defined environment. For example, consider the titration curve for the basic amino acid histidine (Figure 2.8). As the pH rises past each pKa, the corresponding group shifts to favor its deprotonated state. Thus, histidine progresses from an overall +2 charge at very low pH (fully protonated) to an overall -1 charge at very high pH (fully deprotonated).
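To make this prediction concrete, the sketch below applies the Henderson–Hasselbalch relationship to each ionizable group of histidine and sums the resulting charges. The pKa values (about 1.82 for the α-carboxyl, 6.00 for the imidazole side chain, and 9.17 for the α-amino group) are assumed typical textbook values for free histidine rather than numbers taken from Table 2.1, and the class and function names are hypothetical. Because the net charge falls steadily as the pH rises, the isoelectric point can be located by simple bisection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IonizableGroup:
    name: str
    pka: float
    charge_protonated: int    # charge while the group still carries its proton
    charge_deprotonated: int  # charge after the group has lost that proton


def fraction_protonated(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch: [protonated] / ([protonated] + [deprotonated])."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def net_charge(groups: list[IonizableGroup], ph: float) -> float:
    """Average net charge of the molecule at a given pH."""
    total = 0.0
    for g in groups:
        f = fraction_protonated(g.pka, ph)
        total += f * g.charge_protonated + (1.0 - f) * g.charge_deprotonated
    return total


def isoelectric_point(groups: list[IonizableGroup],
                      lo: float = 0.0, hi: float = 14.0,
                      tol: float = 1e-6) -> float:
    """pH of zero net charge, found by bisection (charge falls as pH rises)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(groups, mid) > 0:
            lo = mid  # still positive: the isoelectric point lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2


# Assumed typical textbook pKa values for free histidine (not from Table 2.1).
HISTIDINE = [
    IonizableGroup("alpha-carboxyl", 1.82, 0, -1),
    IonizableGroup("imidazole side chain", 6.00, +1, 0),
    IonizableGroup("alpha-amino", 9.17, +1, 0),
]

if __name__ == "__main__":
    for ph in (1.0, 4.0, 7.0, 12.0):
        print(f"pH {ph:4.1f}: net charge {net_charge(HISTIDINE, ph):+.2f}")
    print(f"estimated pI: {isoelectric_point(HISTIDINE):.2f}")
```

With these assumed values the net charge runs from close to +2 at very low pH to close to -1 at very high pH, and the estimated isoelectric point lands near pH 7.6, between the imidazole and α-amino pKa values.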

Figure 2.8 Ionization State of Histidine in Different pH Environments. (A) Titration curve of histidine from low pH to high pH. Each pKa (the midpoint of a buffering region, where the corresponding group is half deprotonated) is indicated. (B) Shows the favored ionization state of histidine following the passage of each pKa value.

Extra Practice:

Draw glutamic acid and predict the overall charge state of the amino acid at pH = 1, pH = 3, pH = 7, and pH = 12.

Cysteine and Disulfide Bond Formation

Cysteine is also a unique amino acid because its side chain can undergo a reversible oxidation-reduction (redox) reaction with other cysteine residues, creating a covalent disulfide bond in the oxidized state (Figure 2.9). Recall that molecules that are oxidized lose electrons and molecules that are reduced gain electrons. During biological redox reactions, hydrogen ions (protons) are often removed along with the electrons during oxidation and returned during reduction. Thus, when a biological reaction involves the loss or gain of hydrogen atoms (protons together with electrons), this is a good indication that a redox reaction is occurring.

Disulfide bonds are integral to the formation of the three-dimensional structure of proteins and can therefore strongly impact the function of the resulting protein. In cellular systems, disulfide bond formation and disruption are enzyme-mediated reactions and can be used as a mechanism to control protein activity. Disulfide bonds will be discussed in further detail in Section 2.xx of this chapter and in Chapter XX.

Figure 2.9 Cysteine can be Oxidized to Produce Disulfide Bonds. During disulfide bond formation, two cysteines are oxidized to form a cystine molecule. This requires the loss of two protons and two electrons.

2.2 Peptide Bond Formation and Primary Protein Structure

Within cellular systems, amino acids are linked together into proteins by a large enzyme complex that contains a mixture of RNA and proteins. This complex is called the ribosome. As the amino acids are linked together to form a specific protein, they are placed in a very specific order that is dictated by the genetic information contained within the messenger RNA (mRNA) molecule. This specific ordering of amino acids is known as the protein’s primary sequence. The translation mechanism used by the ribosome to synthesize proteins will be discussed in detail in Chapter XX. This chapter will focus only on the chemical reaction occurring during synthesis and the physical properties of the resulting peptides/proteins.

The primary sequence of a protein is linked together using dehydration synthesis (loss of water) reactions that combine the carboxylic acid of the upstream amino acid with the amine functional group of the downstream amino acid to form an amide linkage (Figure 2.10). The reverse reaction, hydrolysis, requires the incorporation of a water molecule to separate two amino acids and break the amide bond. Notably, the ribosome serves as the enzyme that mediates the dehydration synthesis reactions required to build protein molecules, whereas a class of enzymes called proteases is required for protein hydrolysis.
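Because each peptide bond releases one water molecule, the mass of a linear peptide can be estimated as the sum of its free amino acids minus one water per bond. The sketch below assumes average molecular masses computed from standard atomic weights for just four amino acids; the table and the function name are illustrative, not a complete reference.

```python
# Approximate average molecular masses (g/mol) of a few FREE amino acids and of
# water, from standard atomic weights. Extend the table for other residues.
FREE_AMINO_ACID_MASS = {
    "G": 75.067,   # glycine, C2H5NO2
    "A": 89.094,   # alanine, C3H7NO2
    "S": 105.093,  # serine, C3H7NO3
    "L": 131.175,  # leucine, C6H13NO2
}
WATER_MASS = 18.015  # H2O


def peptide_mass(sequence: str) -> float:
    """Mass of a linear peptide written in one-letter code.

    Each dehydration step joins two residues with one peptide bond and
    releases one water, so n residues lose (n - 1) waters in total.
    """
    if not sequence:
        raise ValueError("empty sequence")
    n_bonds = len(sequence) - 1
    total = sum(FREE_AMINO_ACID_MASS[aa] for aa in sequence)
    return total - n_bonds * WATER_MASS


if __name__ == "__main__":
    for seq in ("G", "GA", "GAS", "LSAG"):
        print(f"{seq}: {peptide_mass(seq):.3f} g/mol")
```

For example, the dipeptide Gly-Ala comes out at 75.067 + 89.094 - 18.015 = 146.146 g/mol, one water lighter than its two free amino acids combined.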

Within protein structures, the amide linkage between amino acids is known as the peptide bond. Subsequent amino acids are added onto the carboxylic acid terminus of the growing protein. Thus, proteins are always synthesized in a directional manner, starting with the amine end and ending with the carboxylic acid end. New amino acids are always added onto the carboxylic acid end, never onto the amine of the first amino acid in the chain. The directionality of protein synthesis is dictated by the ribosome and is known as N- to C-terminal synthesis.

Figure 2.10 Formation of the Peptide Bond. The addition of two amino acids to form a peptide requires dehydration synthesis.

As with the ring nitrogen of tryptophan discussed earlier, the nitrogen of an amide bond takes part in a resonance structure that does not allow its lone pair of electrons to act as a base (Figure 2.11).

Figure 2.11 Amide Resonance Structure. During amide resonance, the lone pair electrons from the nitrogen are involved in pi-bond formation with the carbonyl carbon forming the double bond. Thus, amide nitrogens are not basic. In addition, the C-N bond within the amide structure is fixed in space and cannot rotate due to the pi-bond character.

Instead, these lone pair electrons are involved in pi-bond formation with the carbonyl carbon. Furthermore, the C-N bond within the amide structure is fixed in space and cannot rotate due to its partial pi-bond character. This fixes the physical locations of the R-groups within the growing peptide in either the cis or the trans conformation. Because the R-groups can be quite bulky, they usually alternate on either side of the growing protein chain in the trans conformation. Proline is the notable exception: because of the cyclic structure of the proline R-group, the trans conformation of the peptide bond preceding proline creates steric hindrance comparable to that of the cis conformation (Figure 2.12), so the cis conformation is far more common at proline than at any other amino acid. Thus, proline residues can have a large impact on the 3-D structure of the resulting peptide.

Figure 2.12 Cis and Trans Conformation of Amino Acid R-Groups. The upper diagram displays the cis and trans conformations of two adjacent amino acids noted as X and Y, which indicate any of the 20 amino acids except proline. In the trans conformation, the R-group from amino acid X is rotated away, on the other side of the molecule from the R-group of amino acid Y. This conformation gives the least amount of steric hindrance compared with the cis conformation, where the R-groups are located on the same side and in close proximity to one another. In the lower diagram, any amino acid X is positioned upstream of a proline residue. Because the proline R-group is cyclized with the amide nitrogen in the backbone, the proline R-group is brought into closer proximity to the R-group from amino acid X when the bond adopts the trans conformation. Thus, proline adopts the cis conformation far more readily than the other amino acids do.

Proteins are very large molecules containing many amino acid residues linked together in a very specific order. Proteins range in size from 50 amino acids in length to the largest known protein, containing 33,423 amino acids. Macromolecules with fewer than 50 amino acids are known as peptides (Figure 2.13).

Figure 2.13 Peptides and Proteins are macromolecules built from long chains of amino acids joined together through amide linkages. The order and nature of amino acids in the primary sequence of a protein determine the folding pattern of the protein based on the surrounding environment of the protein (i.e., if it is inside the cell, it is likely surrounded by water in a very polar environment, whereas if the protein is embedded in the plasma membrane, it will be surrounded by very nonpolar hydrocarbon tails).

Due to the large pool of amino acids that can be incorporated at each position within the protein, there is an astronomical number of possible sequences that can be used to create novel protein structures! For example, think about a tripeptide made from this amino acid pool. At each position there are 20 different options that can be incorporated. Thus, the total number of possible tripeptides is $20 \times 20 \times 20 = 20^{3}$, which equals 8,000 different tripeptide options! Now think about how many options there would be for a small peptide containing 40 amino acids. There would be $20^{40}$ options, or a mind-boggling $1.1 \times 10^{52}$ potential sequences! Each of these options would vary in overall protein shape, as the nature of the amino acid side chains helps to determine the interactions of the protein with the other residues in the protein itself and with its surrounding environment.
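The counting argument above is simply the number of amino acid choices raised to the chain length. A minimal Python sketch (the function name is hypothetical) uses exact integer arithmetic, so the 40-residue count is not rounded until it is printed:

```python
def sequence_count(length: int, alphabet_size: int = 20) -> int:
    """Number of distinct linear sequences: alphabet_size ** length (exact)."""
    return alphabet_size ** length


if __name__ == "__main__":
    for n in (3, 40):
        count = sequence_count(n)
        print(f"{n}-residue sequences: {count:,} (about {float(count):.2e})")
```

Python integers have arbitrary precision, so 20 ** 40 is computed exactly (2^40 × 10^40, about 1.0995 × 10^52) before being converted to a float only for the rounded display.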

The character of the amino acids throughout the protein helps the protein fold and form its three-dimensional structure. It is this 3-D shape that is required for the functional activity of the protein (i.e., protein shape = protein function). For proteins found inside the watery environments of the cell, hydrophobic amino acids will often be found on the inside of the protein structure, whereas water-loving hydrophilic amino acids will be on the surface, where they can hydrogen bond and interact with water molecules. Proline is unique because it has the only R-group that forms a cyclic structure with the amine functional group in the main chain. This cyclization is what allows proline to adopt the cis conformation in the backbone far more readily than other amino acids. This shift in structure often means that prolines are positions where bends or directional changes occur within the protein. Methionine is unique in that it serves as the starting amino acid for almost all of the many thousands of proteins known in nature. Cysteines contain thiol functional groups and thus can be oxidized with other cysteine residues to form covalent disulfide bonds within the protein structure. Disulfide bridges add stability to the 3-D structure and are often required for correct protein folding and function (Figure 2.14).

Figure 2.14 Disulfide Bonds. Disulfide bonds are formed between two cysteine residues within a peptide or protein sequence or between different peptide or protein chains. In the example above the two peptide chains that form the hormone insulin are depicted. Disulfide bridges between the two chains are required for the proper function of this hormone to regulate blood glucose levels.

Protein Shape and Function

The primary structure of each protein leads to the unique folding pattern that is characteristic for that specific protein. In summary, the primary sequence is the linear order of the amino acids as they are linked together in the protein chain (Figure 2.15). In the next section, we will discuss protein folding that gives rise to secondary, tertiary and sometimes quaternary protein structures.

Figure 2.15 Primary protein structure is the linear sequence of amino acids.


2.3 Secondary Protein Structure

In the previous section, we noted the rigidity created by the C-N bond in the amide linkage when amino acids are joined with one another and learned that this causes the peptide bond to favor the trans conformation (proline is the main exception: the peptide bond preceding proline adopts the cis conformation far more often than other peptide bonds do). This rigidity within the protein backbone limits the folding potential and patterns of the resulting protein. However, the bonds attached to the α-carbon can freely rotate and contribute to the flexibility and unique folding patterns seen within proteins. To evaluate the possible rotation patterns that can arise around the α-carbon, the torsion angles Phi (Φ) and Psi (ψ) are commonly measured. The torsion angle Phi (Φ) measures the rotation around the α-carbon – nitrogen bond by evaluating the angle between the two neighboring carbonyl carbons when you are looking directly down the α-carbon – nitrogen bond into the plane of the paper (Figure 2.16). Conversely, the torsion angle Psi (ψ) measures the rotation around the α-carbon – carbonyl carbon bond by evaluating the angle between the two neighboring nitrogen atoms when you are looking directly down the α-carbon – carbonyl carbon bond (Figure 2.16).

Figure 2.16 Phi (Φ) and Psi (ψ) Torsion Angles. (A) The Phi (Φ) torsion angle is a measure of the rotation around the bond between the α-carbon and the amide nitrogen. It is measured as the angle between the two carbonyl carbon atoms adjacent to the bond, shown in the lower panel. (B) The Psi (ψ) torsion angle is a measure of the rotation around the bond between the α-carbon and the carbonyl carbon. It is measured as the angle between the two nitrogen atoms adjacent to the bond, shown in the lower panel.
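To make the Phi and Psi definitions concrete, here is a minimal sketch that computes a torsion (dihedral) angle from the Cartesian coordinates of four consecutive backbone atoms. It assumes you already have atomic coordinates (for example, read from a structure file; parsing is not shown) and uses the common convention in which the angle runs from −180° to +180° and is positive for a clockwise rotation when viewed down the central bond. Phi uses the atoms C(i−1), N(i), Cα(i), C(i); Psi uses N(i), Cα(i), C(i), N(i+1). All function names are hypothetical helpers, not part of any library.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Torsion angle in degrees (-180..180) about the p1-p2 bond.

    Positive when, looking down the p1 -> p2 bond, the front bond (p1-p0)
    must turn clockwise to eclipse the back bond (p2-p3).
    """
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal to the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal to the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi(c_prev: Vec, n: Vec, ca: Vec, c: Vec) -> float:
    """Phi: rotation about N-Calpha, from C(i-1), N(i), Calpha(i), C(i)."""
    return dihedral(c_prev, n, ca, c)


def psi(n: Vec, ca: Vec, c: Vec, n_next: Vec) -> float:
    """Psi: rotation about Calpha-C, from N(i), Calpha(i), C(i), N(i+1)."""
    return dihedral(n, ca, c, n_next)


if __name__ == "__main__":
    # Idealized test geometry (not real protein coordinates).
    print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))))  # trans arrangement
```

Using `atan2` rather than an arccosine keeps the sign of the angle, which is what distinguishes, for example, right-handed from left-handed helical regions of the Ramachandran plot.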

While the bonds around the α-carbon can rotate freely, the favored torsion angles are limited to a smaller subset of possibilities as neighboring atoms avoid conformations that have high steric hindrance associated with them. G.N. Ramachandran created computer models of small peptides to determine the stable conformations of the Phi (Φ) and Psi (ψ) torsion angles. With his results, he created what is known as the Ramachandran Plot, which graphically displays the overlap regions of the most favorable Phi (Φ) and Psi (ψ) torsion angles (Figure 2.17).

Figure 2.17 The Ramachandran Plot. Favorable and highly favorable Phi (Φ) and Psi (ψ) torsion angles are indicated in yellow and red, respectively. Torsion angles for common secondary protein structures are indicated.
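A rough way to read a Ramachandran plot computationally is to compare an observed (Φ, ψ) pair with idealized reference values for each secondary structure. The sketch below assigns a residue to the nearest reference point, measuring differences around the circle so that −179° and +179° count as close. The reference angles are commonly quoted textbook values, and the 50° cutoff is an arbitrary assumption; real allowed regions are broad, irregular areas, so this is an illustration rather than a substitute for a proper secondary-structure assignment program.

```python
import math

# Commonly quoted idealized (phi, psi) values in degrees (assumed reference points).
REFERENCE_ANGLES = {
    "right-handed alpha helix": (-57.0, -47.0),
    "3-10 helix": (-49.0, -26.0),
    "parallel beta sheet": (-119.0, 113.0),
    "antiparallel beta sheet": (-139.0, 135.0),
    "left-handed alpha helix": (57.0, 47.0),
}


def angular_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees (0..180)."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def nearest_conformation(phi: float, psi: float, max_distance: float = 50.0) -> str:
    """Label a (phi, psi) pair with the closest reference conformation.

    Returns "other" if no reference point lies within max_distance degrees.
    """
    best_label, best_dist = "other", float("inf")
    for label, (ref_phi, ref_psi) in REFERENCE_ANGLES.items():
        dist = math.hypot(angular_difference(phi, ref_phi),
                          angular_difference(psi, ref_psi))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= max_distance else "other"


if __name__ == "__main__":
    print(nearest_conformation(-60, -45))
    print(nearest_conformation(-120, 130))
    print(nearest_conformation(60, 40))
```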

Within each protein, small regions may adopt specific, repeating folding patterns. These specific motifs or patterns are called secondary structure. Two of the most common secondary structural features are the alpha helix and the beta-pleated sheet (Figure 2.18). Within these structures, intramolecular interactions, especially hydrogen bonding between the backbone amine and carbonyl functional groups, are critical to maintaining the 3-dimensional shape.

Figure 2.18 Secondary Structural Features in Protein Structure. The right-handed alpha helix and beta-pleated sheet are common structural motifs found in most proteins. They are held together by hydrogen bonding between the amine and the carbonyl oxygen within the amino acid backbone.

The Alpha Helix

For alpha helical structures, the right-handed helix is very common, whereas left-handed helices are very rare. This is due to the Phi (Φ) and Psi (ψ) torsion angles required to obtain the left-handed alpha helical structure: for the L-amino acids that make up proteins, these angles fall in a less favorable region of the Ramachandran plot, where the side chain is more likely to clash sterically with the backbone. Thus, left-handed helices are not very common in nature.

For the right-handed alpha helix, every helical turn has 3.6 amino acid residues (Figure 2.19). The R groups (the variable groups) of the polypeptide protrude outward from the α-helix chain. The polypeptide backbone forms a repeating helical structure that is stabilized by hydrogen bonds between a carbonyl oxygen and an amine hydrogen. These hydrogen bonds form at regular intervals, between the carbonyl oxygen of one amino acid and the amine hydrogen of the amino acid four residues further along the chain, and cause the polypeptide backbone to form a helix. Each amino acid advances the helix along its axis by 1.5 Å. Because each turn of the helix is composed of 3.6 amino acids, the pitch of the helix is 3.6 × 1.5 Å = 5.4 Å. On average, a helix in a protein contains about ten to eleven amino acid residues. Different amino acids have different propensities for forming α-helices. Amino acids that prefer to adopt helical conformations in proteins include methionine, alanine, leucine, glutamate and lysine. Proline and glycine have almost no tendency to form helices.

Figure 2.19 Structure of the Right-handed Alpha Helix. (A) Ball and Stick Model Side View. A total of 3.6 amino acids are required to form one turn of an α-helix. Hydrogen bonding between each carbonyl oxygen and the N-H group of the amino acid four residues further along stabilizes the helical structure. On the structure shown, the black atoms are the alpha carbons, grey are carbonyl carbons, red are oxygen, blue are nitrogen, green are R-groups, and light purple are hydrogen atoms. (B) Expanded Side View Linear Structure and Space-Filling Model. (C) Expanded Top View Linear Structure and Space-Filling Model.

Image A modified from: Maksim. Images B and C from: Henry Jakubowski.
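The helix geometry described above can be checked with a few lines of arithmetic. This sketch uses the idealized values from the text (3.6 residues per turn, 1.5 Å rise per residue); residues are numbered from 0, and the length estimate simply multiplies the residue count by 1.5 Å, both of which are simplifying assumptions.

```python
from typing import List, Tuple

# Idealized right-handed alpha-helix parameters quoted in the text.
RESIDUES_PER_TURN = 3.6
RISE_PER_RESIDUE_ANGSTROM = 1.5
DEGREES_PER_RESIDUE = 360.0 / RESIDUES_PER_TURN  # about 100 degrees of rotation per residue


def helix_pitch() -> float:
    """Axial distance covered by one full turn, in angstroms (3.6 x 1.5)."""
    return RESIDUES_PER_TURN * RISE_PER_RESIDUE_ANGSTROM


def helix_turns(n_residues: int) -> float:
    """Number of helical turns spanned by n residues."""
    return n_residues / RESIDUES_PER_TURN


def helix_length(n_residues: int) -> float:
    """Rough axial length in angstroms, assuming 1.5 A of rise for every residue."""
    return n_residues * RISE_PER_RESIDUE_ANGSTROM


def backbone_hbonds(n_residues: int) -> List[Tuple[int, int]]:
    """(carbonyl residue, amide residue) pairs: C=O of residue i bonds N-H of residue i+4."""
    return [(i, i + 4) for i in range(max(0, n_residues - 4))]


if __name__ == "__main__":
    print(f"pitch = {helix_pitch():.1f} A")
    print(f"11 residues span about {helix_turns(11):.2f} turns")
    print(f"10-residue helix is roughly {helix_length(10):.0f} A long")
    print("hydrogen-bonded pairs in a 6-residue helix:", backbone_hbonds(6))
```

For an 11-residue helix this gives about 3.06 turns, matching the roughly three turns given in the key points that follow.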

Key Points about the Alpha Helix:

  • The alpha helix is more compact than the fully extended polypeptide chain with phi/psi angles of 180°.
  • In proteins, the average number of amino acids in a helix is 11, which gives 3 turns.
  • The left-handed alpha helix, although allowed from inspections of a Ramachandran plot, is rarely observed, since the amino acids used to build protein structure are L-amino acids and are biased towards forming the right-handed helix. When left-handed helices do form, they are often critical for the correct protein folding, protein stability, or are directly involved in the formation of the active site.

Figure 2.20 Left-Handed Alpha Helix Structure. In this diagram the left-handed alpha helix, shown in yellow, is part of a hairpin turn within the protein structure and is stabilized by two disulfide bridges shown in yellow.

  • The core of the helix is packed tightly. There are no holes or pores in the helix.
  • All the R-groups extend outward and away from the helix axis. The R-groups can be hydrophilic or hydrophobic, and can be localized in specific positions on the helix to form amphipathic regions on the protein, or fully hydrophobic helices may extend through the plasma membrane as shown in Figure 2.21.

Figure 2.21 Positioning of the R-Groups within Alpha Helical Structures. R-groups may be positioned within the alpha helix to create amphipathic regions within the protein, where hydrophilic residues are positioned on one side of the helix and hydrophobic residues on the other, as shown in the side view (A) or top-down views (B & C). R-groups may also be fully hydrophobic within alpha helices that span the plasma membrane as shown in (D).

  • Some amino acids are more commonly found in alpha helices than others. Here are the amino acids that are typically NOT found in alpha helical structures: Gly is too small and conformationally flexible to be found with high frequency in alpha helices, while Pro is too rigid and lacks the backbone amide hydrogen needed to take part in the helix's regular hydrogen-bonding pattern. Pro often disrupts helical structure by causing bends in the protein. Some amino acids with side chains that can H-bond (Ser, Asp, and Asn) and aren’t too long appear to act as competitors of main chain H-bond donors and acceptors, and destabilize alpha helices. Early branching R-groups, such as Val and Ile, destabilize the alpha helix due to steric interactions of the bulky side chains with the helix backbone.
  • Summary of amino acids' propensities for alpha helices (and beta structure as well)
  • Alpha keratins, the major component of hair, skin, fur, beaks, and fingernails, are almost all alpha helix.
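The amphipathic arrangement of R-groups described in these key points is often quantified with the hydrophobic moment introduced by Eisenberg and colleagues: each residue is placed on a helical wheel 100° further around than the previous one, and the hydrophobicity vectors are summed. The sketch below is a simplified version that assumes a binary hydrophobicity scale (1 for the nonpolar residues listed in `HYDROPHOBIC`, 0 otherwise) instead of a published numeric scale; the residue set and example sequences are assumptions for illustration.

```python
import math

# Assumed, simplified binary scale: 1 for these nonpolar residues, 0 otherwise.
# Published analyses use graded numeric hydrophobicity scales instead.
HYDROPHOBIC = set("AVILMFWC")

DEGREES_PER_RESIDUE = 100.0  # 3.6 residues per turn


def hydrophobic_moment(sequence: str) -> float:
    """Mean hydrophobic moment of a sequence modeled as an ideal alpha helix.

    Residue k contributes a vector of length h_k at angle k * 100 degrees
    around the helix axis. The result is the length of the vector sum divided
    by the number of residues: near 0 when hydrophobic residues are spread
    evenly around the helix, larger when they cluster on one face.
    """
    if not sequence:
        return 0.0
    x = y = 0.0
    for k, residue in enumerate(sequence.upper()):
        h = 1.0 if residue in HYDROPHOBIC else 0.0
        angle = math.radians(k * DEGREES_PER_RESIDUE)
        x += h * math.cos(angle)
        y += h * math.sin(angle)
    return math.hypot(x, y) / len(sequence)


if __name__ == "__main__":
    amphipathic = "LKKLLKKLLKKL"  # Leu spaced i, i+3, i+4 along the chain
    uniform = "LLLLLLLLLLLL"      # hydrophobic all the way around
    print(round(hydrophobic_moment(amphipathic), 2))
    print(round(hydrophobic_moment(uniform), 2))
```

The Leu-Lys pattern keeps the leucines on one face of the helix and gives a clearly larger moment than the all-leucine sequence, whose hydrophobic side chains point in every direction.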


The Beta Pleated Sheet:

In the β-pleated sheet, the “pleats” are formed by hydrogen bonding between atoms on the backbone of the polypeptide chain. The R groups are attached to the α-carbons and extend above and below the folds of the pleat in the trans conformation. The pleated segments align parallel or antiparallel to each other, and hydrogen bonds form between the partially positive hydrogen attached to the backbone amide nitrogen and the partially negative oxygen atom in the carbonyl group of the peptide backbone (Figure 2.21).

Figure 2.21 Beta-Pleated Sheet Structure. The β-pleated sheet can be oriented in the parallel or antiparallel orientation, shown in (A) above with the β-pleated sheet represented by the red ribbon arrows. The direction of the arrow indicates the orientation of the protein, with the arrow running in the N- to C- direction. Hydrogen bonding between the backbone carbonyl and the backbone amine functional groups stabilizes both the antiparallel (B left) and the parallel (B right) β-pleated sheet structures.

Other Secondary Structure Motifs:

Other important secondary structures include turns, loops, hairpins and flexible linkers. There are many different classifications of turns within protein structure, including α-turns, β-turns, γ-turns, δ-turns and π-turns. β-turns (the most common form) typically contain four amino acid residues (Fig 2.22). Proline and glycine are commonly found in turn motifs: proline's rigid ring (and its greater tendency to adopt the cis peptide conformation) favors sharp conformational bends, while the minimal glycine side chain allows for tighter packing of the amino acids to favor the turn structure.

Figure 2.22 Schematic of Type I and II β-turns.
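β-turns are usually classified by the Phi and Psi angles of their two central residues (i+1 and i+2), with the backbone C=O of residue i hydrogen bonded to the N-H of residue i+3. The sketch below checks observed angles against commonly quoted idealized values for type I and type II turns and their mirror-image types I' and II'. The reference values and the ±30° tolerance are assumptions of this sketch; published classification schemes allow somewhat different deviations.

```python
# Commonly quoted idealized (phi, psi) angles, in degrees, for residues i+1
# and i+2 of a four-residue beta-turn (an assumption of this sketch).
TURN_TYPES = {
    "I": ((-60.0, -30.0), (-90.0, 0.0)),
    "II": ((-60.0, 120.0), (80.0, 0.0)),
}
# Primed types are mirror images: every angle changes sign.
TURN_TYPES.update({
    name + "'": ((-phi1, -psi1), (-phi2, -psi2))
    for name, ((phi1, psi1), (phi2, psi2)) in list(TURN_TYPES.items())
})

TOLERANCE_DEGREES = 30.0  # assumed maximum deviation allowed for each angle


def angle_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def classify_beta_turn(phi1, psi1, phi2, psi2, tolerance=TOLERANCE_DEGREES):
    """Return the first turn type whose four reference angles all match, else None."""
    observed = (phi1, psi1, phi2, psi2)
    for name, ((r_phi1, r_psi1), (r_phi2, r_psi2)) in TURN_TYPES.items():
        reference = (r_phi1, r_psi1, r_phi2, r_psi2)
        if all(angle_difference(o, r) <= tolerance
               for o, r in zip(observed, reference)):
            return name
    return None


if __name__ == "__main__":
    print(classify_beta_turn(-65, -25, -95, 10))     # close to type I
    print(classify_beta_turn(-55, 125, 85, 5))       # close to type II
    print(classify_beta_turn(-120, 130, -120, 130))  # extended strand, no turn
```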

An ω-loop is a catch-all term for a longer, extended or irregular loop without fixed internal hydrogen bonding. A hairpin is a special case of a turn, in which the direction of the protein backbone reverses and the flanking secondary structure elements interact. For example, a beta hairpin connects two hydrogen-bonded, antiparallel β-strands. Turns are sometimes found within flexible linkers or loops connecting protein domains. Linker sequences vary in length and are typically rich in polar uncharged amino acids. Flexible linkers allow connecting domains to freely twist and rotate to recruit their binding partners via protein domain dynamics.


2.4 Supersecondary Structure and Protein Motifs

In between the secondary structure and tertiary structure of proteins are larger 3-dimensional features that have been identified in multiple different protein structures. They are known as supersecondary structure and as protein motifs. Supersecondary structure is usually composed of two secondary structures linked together by a turn and includes helix-turn-helix, helix-loop-helix, α-α corners, β-β corners, and β-hairpin-β (Figure 2.23).

Figure 2.23 Examples of Supersecondary Structures. (A) β-hairpin-β structures are characterized by a sharp hairpin turn that does not disrupt the hydrogen bonding of the two β-pleated sheet structures. (B) Proposed helix-turn-helix structure of the Taspase1 protein, (C) α-α corner structure present in the Myoglobin protein.

Protein motifs are more complex arrangements of secondary and supersecondary structural elements that recur in many different protein structures.

Beta strands have a tendency to twist in a right-handed direction to help minimize conformational energy. This leads to the formation of interesting structural motifs found in many types of proteins. Two of these structures include twisted sheets or saddles as well as beta barrels (Figure 2.24).

Figure 2.24 Common Beta Strand Structural Motifs. (A) Right-handed Twisted Sheet Top and Side View, (B) Beta Barrel Side View, and (C) Beta Barrel Top View

Structural motifs can serve particular functions within proteins such as enabling the binding of substrates or cofactors. For example, the Rossmann fold is responsible for binding to nucleotide cofactors such as nicotinamide adenine dinucleotide (NAD⁺) (Figure 2.25). The Rossmann fold is composed of six parallel beta strands that form an extended beta sheet. The first three strands are connected by α-helices resulting in a beta-alpha-beta-alpha-beta structure. This pattern is duplicated once to produce an inverted tandem repeat containing six strands. Overall, the strands are arranged in the order of 321456 (1 = N-terminal, 6 = C-terminal). Five-stranded Rossmann-like folds are arranged in the order 32145. The overall tertiary structure of the fold resembles a three-layered sandwich wherein the filling is composed of an extended beta sheet and the two slices of bread are formed by the connecting parallel alpha helices.

Figure 2.25 The Rossmann Fold. (A) Structure of Nicotinamide Adenine Dinucleotide (NAD⁺). (B) Cartoon diagram of the Rossmann Fold (helices A-F red and strands 1-6 yellow) from E. coli malate dehydrogenase enzyme. The NAD⁺ cofactor is shown binding as the space-filling molecule. (C) Schematic diagram of the six-stranded Rossmann fold.

One of the features of the Rossmann fold is its cofactor binding specificity. The most conserved segment of Rossmann folds is the first beta-alpha-beta segment. Since this segment is in contact with the ADP portion of dinucleotides such as FAD, NAD and NADP, it is also called an “ADP-binding βαβ fold”.

Interestingly, similar structural motifs do not always have a common evolutionary ancestor and can arise by convergent evolution. This is the case with the TIM Barrel, a conserved protein fold consisting of eight α-helices and eight parallel β-strands that alternate along the peptide backbone. The structure is named after triosephosphate isomerase, a conserved metabolic enzyme. TIM barrels are one of the most common protein folds. One of the most intriguing features among members of this class of proteins is that, although they all exhibit the same tertiary fold, there is very little sequence similarity between them. At least 15 distinct enzyme families use this framework to generate the appropriate active site geometry, always at the C-terminal end of the eight parallel beta-strands of the barrel.

Figure 2.26 The TIM Barrel. TIM barrels are considered α/β protein folds because they include an alternating pattern of α-helices and β-strands in a single domain. In a TIM barrel the helices and strands (usually 8 of each) form a solenoid that curves around to close on itself in a doughnut shape, topologically known as a toroid. The parallel β-strands form the inner wall of the doughnut (hence, a β-barrel), whereas the α-helices form the outer wall of the doughnut. Each β-strand connects to the next adjacent strand in the barrel through a long right-handed loop that includes one of the helices, so that the ribbon N-to-C coloring in the top view (A) proceeds in rainbow order around the barrel. The TIM barrel can also be thought of, then, as made up of 8 overlapping, right-handed β-α-β super-secondary structures, as shown in the side view (B).

Although the ribbon diagram of the TIM Barrel shows a hole in the protein’s central core, the amino acid side chains are not shown in this representation (Figure 2.26). The protein’s core is actually tightly packed, mostly with bulky hydrophobic amino acid residues although a few glycines are needed to allow wiggle room for the highly constrained center of the 8 approximate repeats to fit together. The packing interactions between the strands and helices are also dominated by hydrophobicity and the branched aliphatic residues valine, leucine, and isoleucine comprise about 40% of the total residues in the β-strands.
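The composition figure above is a simple ratio. Assuming the residues that lie in the β-strands have already been identified (for example, from a secondary-structure assignment), this sketch computes the fraction that are valine, leucine or isoleucine. The two strand sequences in the usage example are invented for illustration and are not taken from a real TIM barrel.

```python
from typing import Iterable

# The branched aliphatic residues named in the text, as one-letter codes.
BRANCHED_ALIPHATIC = frozenset("VLI")


def branched_aliphatic_fraction(strand_segments: Iterable[str]) -> float:
    """Fraction of residues across all strand segments that are Val, Leu or Ile."""
    residues = "".join(strand_segments).upper()
    if not residues:
        return 0.0
    count = sum(1 for r in residues if r in BRANCHED_ALIPHATIC)
    return count / len(residues)


if __name__ == "__main__":
    strands = ["VAIVGS", "LIAGLT"]  # invented strand sequences for illustration
    print(f"{branched_aliphatic_fraction(strands):.0%}")
```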

As our knowledge continues to increase about the myriad of structural motifs found in nature’s treasure trove of protein structures, we continue to gain insight into how protein structure is related to function and are better enabled to characterize newly acquired protein sequences using in silico technologies.


2.5 Tertiary and Quaternary Protein Structure

The complete 3-dimensional shape of the entire protein (or sum of all the secondary structural motifs) is known as the tertiary structure of the protein and is a unique and defining feature for that protein (Figure 2.27). Primarily, the interactions among R groups create the complex three-dimensional tertiary structure of a protein. The nature of the R groups found in the amino acids involved can counteract the formation of the hydrogen bonds described for standard secondary structures such as the alpha helix. For example, R groups with like charges are repelled by each other and those with unlike charges are attracted to each other (ionic bonds). Uncharged nonpolar side chains can form hydrophobic interactions. Interaction between cysteine side chains can lead to the formation of disulfide linkages.

Figure 2.27 Tertiary Protein Structure. The tertiary structure of proteins is determined by a variety of chemical interactions. These include hydrophobic interactions, ionic bonding, hydrogen bonding and disulfide linkages.

All of these interactions, weak and strong, determine the final three-dimensional shape of the protein. When a protein loses its three-dimensional shape, it is usually no longer functional.

In nature, some proteins are formed from several polypeptides, also known as subunits, and the interaction of these subunits forms the quaternary structure. Weak interactions between the subunits help to stabilize the overall structure. For example, insulin (a globular protein) has a combination of hydrogen bonds and disulfide bonds that cause it to be mostly clumped into a ball shape. Insulin starts out as a single polypeptide and loses some internal sequences during cellular processing, forming two chains held together by disulfide linkages as shown in Figure 2.14. Six of these insulin molecules (arranged as three dimers) are then grouped together, forming an inactive hexamer (Figure 2.28). The hexamer form of insulin is a way for the body to store insulin in a stable and inactive conformation so that it is available for release and reactivation in the monomer form.

Figure 2.28 The Insulin Hormone is a Good Example of Quaternary Structure. Insulin is produced and stored in the body as a hexamer (a unit of six insulin molecules), while the active form is the monomer. The hexamer is an inactive form with long-term stability, which serves as a way to keep the highly reactive insulin protected, yet readily available.

Predicting the folding pattern of a protein based on its primary sequence is an extremely difficult task due to the inherent flexibility of amino acid residues, which can be utilized to form different secondary features. As described by Fujiwara, et al., the SCOP classification (Structural Classification of Proteins) and SCOPe (the extended version) are major databases providing detailed and comprehensive descriptions of known protein structures. SCOP classification is based on hierarchical levels: the first two levels, family and superfamily, describe near and far evolutionary relationships, whereas the third, fold, describes geometrical relationships and structural motifs within the protein. Within the fold classification scheme, most proteins are assigned to one of four structural classes: (1) all α-helix, (2) all β-sheet, (3) α/β, for proteins in which α-helices and β-strands are interspersed, and (4) α + β, for proteins in which α-helices and β-strands are largely segregated into separate regions.
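SCOP classes are assigned by expert curation, not by a formula. Still, the difference between the four classes can be illustrated with a toy heuristic working on a per-residue secondary-structure string, where H marks helix, E marks strand and any other character marks coil (a DSSP-like convention assumed here). The 10% content threshold, the three-residue minimum segment length and the 50% alternation cutoff are arbitrary assumptions chosen for illustration.

```python
import itertools
from typing import List


def secondary_structure_elements(ss: str, min_length: int = 3) -> List[str]:
    """Collapse a per-residue string into an ordered list of 'H'/'E' elements.

    Runs of helix (H) or strand (E) shorter than min_length are ignored,
    as are coil residues (any other character).
    """
    elements = []
    for state, run in itertools.groupby(ss):
        if state in ("H", "E") and len(list(run)) >= min_length:
            elements.append(state)
    return elements


def structural_class(ss: str, min_fraction: float = 0.10,
                     alternation_cutoff: float = 0.5) -> str:
    """Toy assignment of a secondary-structure string to a SCOP-style class."""
    if not ss:
        raise ValueError("empty secondary-structure string")
    helix = ss.count("H") / len(ss)
    strand = ss.count("E") / len(ss)
    if helix < min_fraction and strand < min_fraction:
        return "unassigned"  # too little regular structure for this heuristic
    if strand < min_fraction:
        return "all-alpha"
    if helix < min_fraction:
        return "all-beta"
    elements = secondary_structure_elements(ss)
    if len(elements) < 2:
        return "unassigned"
    # Count how often consecutive elements switch between helix and strand.
    switches = sum(a != b for a, b in zip(elements, elements[1:]))
    if switches / (len(elements) - 1) >= alternation_cutoff:
        return "alpha/beta"  # helices and strands interspersed
    return "alpha+beta"      # helices and strands largely segregated


if __name__ == "__main__":
    for ss in ("HHHHHHHHHH",
               "EEEEE--EEEEE",
               "EEEE-HHHH-EEEE-HHHH-EEEE-HHHH",
               "HHHHHHHH-HHHHHHHH-EEEEE-EEEEE-EEEEE"):
        print(f"{ss:38s} {structural_class(ss)}")
```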

Based on their shape, function and location, proteins can be characterized broadly as fibrous, globular, membrane, or disordered.

Fibrous Proteins

Fibrous Proteins are characterized by elongated protein structures. These types of proteins often aggregate into filaments or bundles forming structural scaffolds in biological systems. Within animals, the two most abundant fibrous protein families are α-keratin and collagen.


α-keratin is the key structural element making up hair, nails, horns, claws, hooves, and the outer layer of skin. Due to its tightly wound structure, it can function as one of the strongest biological materials and has various uses in mammals, from predatory claws to hair for warmth. α-keratin is synthesized through protein biosynthesis, utilizing transcription and translation, but as the cell matures and is full of α-keratin, it dies, creating a strong non-vascular unit of keratinized tissue.

The first sequences of α-keratins were determined by Hanukoglu and Fuchs. These sequences revealed that there are two distinct but homologous keratin families which were named as Type I keratin and Type II keratins. There are 54 keratin genes in humans, 28 of which code for type I, and 26 for type II. Type I proteins are acidic, meaning they contain more acidic amino acids, such as aspartic acid, while type II proteins are basic, meaning they contain more basic amino acids, such as lysine. This differentiation is especially important in α-keratins because in the synthesis of its sub-unit dimer, the coiled coil, one protein coil must be type I, while the other must be type II (Figure 2.29). Even within type I and II, there are acidic and basic keratins that are particularly complementary within each organism. For example, in human skin, K5, a type II α-keratin, pairs primarily with K14, a type I α-keratin, to form the α-keratin complex of the epidermis layer of cells in the skin.

Coiled-coil dimers then assemble into protofilaments, a very stable, left-handed superhelical motif which further multimerizes, forming filaments consisting of multiple copies of the keratin monomers (Figure 2.29). The major force that keeps the coiled-coil structures associated with one another is the hydrophobic interaction between apolar residues along the keratins' helical segments.

Figure 2.29. Formation of an Intermediate Filament. Intermediate filaments are composed of an α-keratin superhelical complex. Initially, two keratin monomers (A) form a coiled coil dimer structure (B). Two coiled coil dimers join to form a staggered tetramer (C), the tetramers start to join together (D), ultimately forming a sheet of eight tetramers (E). The sheet of eight tetramers is then twisted into a left-handed helix forming the final intermediate filament (E). An electron micrograph of the intermediate filament is shown in the upper left-hand corner.


The fibrous protein collagen is the most abundant protein in mammals, making up 25% to 35% of the whole-body protein content. It is found predominantly in the extracellular space within various connective tissues in the body. Collagen contains a unique quaternary structure of three protein strands wound together to form a triple helix. It is mostly found in fibrous tissues such as tendons, ligaments, and skin.

Depending upon the degree of mineralization, collagen tissues may be rigid (bone), compliant (tendon), or have a gradient from rigid to compliant (cartilage). It is also abundant in corneas, blood vessels, the gut, intervertebral discs, and the dentin in teeth. In muscle tissue, it serves as a major component of the endomysium. Collagen constitutes one to two percent of muscle tissue and accounts for 6% of the weight of strong, tendinous muscles. The fibroblast is the most common cell that creates collagen. Gelatin, which is used in food and industry, is collagen that has been irreversibly hydrolyzed. In addition, partially and fully hydrolyzed collagen powders are used as dietary supplements. Collagen has many medical uses in treating complications of the bones and skin.

The name collagen comes from the Greek (kólla), meaning “glue”, and suffix -gen, denoting “producing”. This refers to the compound’s early use in the process of boiling the skin and tendons of horses and other animals to obtain glue.

Over 90% of the collagen in the human body is type I. However, as of 2011, 28 types of collagen have been identified, described, and divided into several groups according to the structure they form. The five most common types are:

  • Type I: skin, tendon, vasculature, organs, bone (main component of the organic part of bone)
  • Type II: cartilage (main collagenous component of cartilage)
  • Type III: reticulate (main component of reticular fibers), commonly found alongside type I
  • Type IV: forms basal lamina, the epithelium-secreted layer of the basement membrane
  • Type V: cell surfaces, hair, and placenta

Here we will focus on the unique attributes of Collagen Type I. Collagen Type I has an unusual amino acid composition and sequence:

  • Glycine is found at almost every third residue.
  • Proline makes up about 17% of collagen.
  • Collagen contains two uncommon derivative amino acids not directly inserted during translation. These amino acids are found at specific locations relative to glycine and are modified post-translationally by different enzymes, both of which require vitamin C as a cofactor (Figure 2.30).
    • Hydroxyproline derived from proline
    • Hydroxylysine derived from lysine – depending on the type of collagen, varying numbers of hydroxylysines are glycosylated (mostly having disaccharides attached).

Figure 2.30. Hydroxylation of Proline and Lysine During the Post-Translational Modification of Collagen Type I. The enzymes prolyl hydroxylase and lysyl hydroxylase are required for the hydroxylation of proline (A) and lysine (B) residues, respectively. (Note: While position 3 is shown above, prolyl residues may alternatively be hydroxylated at the 4-position.) The hydroxylase enzymes modify amino acid residues after they have been incorporated into the protein as a post-translational modification and require vitamin C (ascorbate) as a cofactor. (C) Further modification of the hydroxylysine residues by glycosylation can lead to the incorporation of the disaccharide (galactose-glucose) at the hydroxy oxygen.
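The compositional rules listed above are easy to test on a sequence. Because hydroxyproline and hydroxylysine are created after translation, the genetically encoded sequence still shows proline and lysine at those positions, so this sketch works on ordinary one-letter codes. The example fragment is hypothetical, and the strict every-third-residue check is a simplification: real collagen chains follow the Gly-X-Y pattern only across their triple-helical region, so a practical analysis would scan that region or tolerate exceptions.

```python
def glycine_every_third(sequence: str, start: int = 0) -> bool:
    """True if every third residue from `start` onward is glycine (Gly-X-Y repeats)."""
    positions = range(start, len(sequence), 3)
    return len(positions) > 0 and all(sequence[i].upper() == "G" for i in positions)


def residue_fraction(sequence: str, residue: str) -> float:
    """Fraction of the sequence made up of one residue (one-letter code)."""
    if not sequence:
        return 0.0
    return sequence.upper().count(residue.upper()) / len(sequence)


if __name__ == "__main__":
    fragment = "GPPGPAGPPGEKGAP"  # hypothetical Gly-X-Y fragment, not a real collagen sequence
    print(glycine_every_third(fragment))
    print(f"proline fraction: {residue_fraction(fragment, 'P'):.0%}")
```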

Most collagen forms in a similar manner. The synthesis process for Collagen Type I is described below and showcases the complexity of protein folding and processing (Figure 2.31).

1. Inside the cell
  1. Two types of alpha chains, alpha-1 and alpha-2, are formed during translation on ribosomes along the rough endoplasmic reticulum (RER). These peptide chains (known as preprocollagen) have registration peptides on each end and a signal peptide.
  2. Polypeptide chains are released into the lumen of the RER.
  3. Signal peptides are cleaved inside the RER and the chains are now known as pro-alpha chains.
  4. Hydroxylation of lysine and proline amino acids occurs inside the lumen. This process is dependent on ascorbic acid (vitamin C) as a cofactor.
  5. Glycosylation of specific hydroxylysine residues occurs.
  6. A triple helix is formed inside the endoplasmic reticulum from two alpha-1 chains and one alpha-2 chain.
  7. Procollagen is shipped to the Golgi apparatus, where it is packaged and secreted by exocytosis.
2. Outside the cell
  1. Registration peptides are cleaved by procollagen peptidase, forming tropocollagen.
  2. Multiple tropocollagen molecules form collagen fibrils via covalent cross-linking (aldol reaction) by lysyl oxidase, which links hydroxylysine and lysine residues. Multiple collagen fibrils form into collagen fibers.
  3. Collagen may be attached to cell membranes via several types of protein, including fibronectin, laminin, fibulin and integrin.

Figure 2.31. Synthesis of Collagen Type I. Polypeptide chains are synthesized in the endoplasmic reticulum and released into the lumen where they are hydroxylated and glycosylated. The procollagen triple helix is formed and transported through the Golgi apparatus where it is further processed. Procollagen is secreted into the extracellular matrix where it is cleaved into tropocollagen. Tropocollagen assembles into a collagen fibril where crosslinking and hydrogen bonding occur to form the final collagen fiber.

Vitamin C deficiency causes scurvy, a serious and painful disease in which defective collagen prevents the formation of strong connective tissue. Gums deteriorate and bleed, teeth are lost, skin discolors, and wounds do not heal. Prior to the 18th century, this condition was notorious among long-duration military, particularly naval, expeditions during which participants were deprived of foods containing vitamin C.

An autoimmune disease such as lupus erythematosus or rheumatoid arthritis may attack healthy collagen fibers. Cortisol stimulates degradation of collagen into amino acids, suggesting that stress can worsen these disease states.

Many bacteria and viruses secrete virulence factors, such as the enzyme collagenase, which destroys collagen or interferes with its production.


Globular Proteins

Globular proteins or spheroproteins are spherical (“globe-like”) proteins and are one of the common protein types. Globular proteins are somewhat water-soluble (forming colloids in water), unlike fibrous or membrane proteins. There are multiple fold classes of globular proteins, since there are many different architectures that can fold into a roughly spherical shape.

The term globin can refer more specifically to proteins including the globin fold. The globin fold is a common three-dimensional fold in proteins and defines the globin-like protein superfamily (Figure 2.32). This fold typically consists of eight alpha helices, although some proteins have additional helix extensions at their termini. The globin fold is found in its namesake globin protein families: hemoglobins and myoglobins, as well as in phycocyanins. Because myoglobin was the first protein whose structure was solved, the globin fold was the first protein fold discovered. Since the globin fold contains only helices, it is classified as an all-alpha protein fold.

Figure 2.32 The Globin Fold. (A) An example of the globin fold, the oxygen-carrying protein myoglobin (PDB ID 1MBA) from the mollusc Aplysia limacina. (B) Structure of the tetrameric hemoglobin protein containing a total of four globin folds.

The term globular protein is quite old (dating probably from the 19th century) and is now somewhat archaic given the hundreds of thousands of known proteins and the more elegant and descriptive vocabulary of structural motifs. The spherical structure is induced by the protein’s tertiary structure. The molecule’s apolar (hydrophobic) amino acids are buried in the molecule’s interior, whereas polar (hydrophilic) amino acids are exposed on the surface, allowing dipole-dipole interactions with the solvent, which explains the molecule’s solubility.

Unlike fibrous proteins, which play a predominantly structural role, globular proteins can act as:

  • Enzymes, by catalyzing organic reactions taking place in the organism under mild conditions and with great specificity. Esterases are one example of such enzymes.
  • Messengers, by transmitting messages to regulate biological processes. This function is performed by hormones, e.g., insulin.
  • Transporters of other molecules through membranes.
  • Stores of amino acids.
  • Regulatory roles are also performed by globular proteins rather than fibrous proteins.
  • Structural proteins, e.g., actin and tubulin, which are globular and soluble as monomers, but polymerize to form long, stiff fibers.

Many of the proteins detailed in later chapters fall into this class of proteins.

      Membrane Proteins

      Membrane proteins are proteins that are part of, or interact with, biological membranes. They include: 1) integral membrane proteins, which are part of or permanently anchored to the membrane, and 2) peripheral membrane proteins, which are attached temporarily to the membrane via integral proteins or the lipid bilayer. Integral membrane proteins are further classified as transmembrane proteins, which span the membrane, or integral monotopic proteins, which are attached to only one side of the membrane.

      Membrane proteins, like soluble globular proteins, fibrous proteins, and disordered proteins, are common. As a sign of their importance in medicine, membrane proteins are the targets of over 50% of all modern medicinal drugs. It is estimated that 20–30% of all genes in most genomes encode membrane proteins. Compared with other classes of proteins, membrane protein structures remain difficult to determine. This is largely because it is hard to establish experimental conditions that preserve the correct conformation of the protein in isolation from its native environment (Figure 2.33).

      Membrane proteins perform a variety of functions vital to the survival of organisms:

      • Membrane receptor proteins relay signals between the cell’s internal and external environments.
      • Transport proteins move molecules and ions across the membrane. They can be categorized according to the Transporter Classification database.
      • Membrane enzymes may have many activities, such as oxidoreductase, transferase or hydrolase.
      • Cell adhesion molecules allow cells to identify and interact with each other; examples include proteins involved in the immune response.

      Figure 2.33 Schematic representation of transmembrane proteins. 1. a single transmembrane α-helix (bitopic membrane protein) 2. a polytopic transmembrane α-helical protein 3. a polytopic transmembrane β-sheet protein. The membrane is represented in light-brown.

      Integral membrane proteins are permanently attached to the membrane. Such proteins can be separated from the biological membranes only using detergents, nonpolar solvents, or sometimes denaturing agents. They can be classified according to their relationship with the bilayer:

      • Integral polytopic proteins are transmembrane proteins that span the membrane more than once. They may have different transmembrane topologies and have one of two structural architectures:
        • helix bundle proteins, which are present in all types of biological membranes
        • beta barrel proteins, which are found only in outer membranes of Gram-negative bacteria, and outer membranes of mitochondria and chloroplasts.
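        Helix-bundle proteins can often be spotted from sequence alone, because a transmembrane α-helix needs a long run of hydrophobic residues. The sketch below is a minimal Kyte-Doolittle hydropathy scan. It assumes the commonly used 19-residue window and a mean-hydropathy cut-off of 1.6. The function names and the toy sequence are hypothetical, and this is a rough screen rather than a topology predictor. β-barrel proteins, whose short strands alternate hydrophobic and polar residues, are generally missed by this kind of scan.

```python
# Kyte-Doolittle hydropathy scale (one-letter amino acid codes)
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window; list index = window start (0-based)."""
    values = [KD[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def candidate_tm_windows(seq, window=19, threshold=1.6):
    """Start positions of windows hydrophobic enough to be a possible TM helix."""
    return [i for i, h in enumerate(hydropathy_profile(seq, window))
            if h > threshold]


if __name__ == "__main__":
    # Hypothetical toy sequence: a 25-residue leucine stretch flanked by lysines
    toy = "K" * 10 + "L" * 25 + "K" * 10
    print(candidate_tm_windows(toy))
```

        For the toy sequence, every window that overlaps at least 14 of the leucines passes the cut-off. The reported start positions therefore bracket the hydrophobic stretch.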

        Figure 2.34 Schematic representation of the different types of interaction between monotopic membrane proteins and the cell membrane. 1. interaction by an amphipathic α-helix parallel to the membrane plane (in-plane membrane helix) 2. interaction by a hydrophobic loop 3. interaction by a covalently bound membrane lipid (lipidation) 4. electrostatic or ionic interactions with membrane lipids.

        Peripheral membrane proteins are temporarily attached either to the lipid bilayer or to integral proteins by a combination of hydrophobic, electrostatic, and other non-covalent interactions. Peripheral proteins dissociate following treatment with a polar reagent, such as a solution with an elevated pH or high salt concentrations.

        Integral and peripheral proteins may be post-translationally modified, with added fatty acid, diacylglycerol or prenyl chains, or GPI (glycosylphosphatidylinositol), which may be anchored in the lipid bilayer.

        Disordered Proteins

        An intrinsically disordered protein (IDP) is a protein that lacks a fixed or ordered three-dimensional structure (Figure 2.35). IDPs cover a spectrum of states from fully unstructured to partially structured and include random coils, (pre-)molten globules, and large multi-domain proteins connected by flexible linkers. They constitute one of the main types of protein (alongside globular, fibrous and membrane proteins).

        Figure 2.35 Conformational flexibility in SUMO-1 protein (PDB:1a5r). The central part shows relatively ordered structure. Conversely, the N- and C-terminal regions (left and right, respectively) show ‘intrinsic disorder’, although a short helical region persists in the N-terminal tail. Ten alternative NMR models were morphed. Secondary structure elements: α-helices (red), β-strands (blue arrows).

        The discovery of IDPs has challenged the traditional protein structure paradigm, which holds that protein function depends on a fixed three-dimensional structure. Over the last twenty years, this dogma has been increasingly questioned by evidence from various branches of structural biology suggesting that protein dynamics may be highly relevant for such systems. Despite their lack of stable structure, IDPs are a very large and functionally important class of proteins. In some cases, IDPs can adopt a fixed three-dimensional structure after binding to other macromolecules. Overall, IDPs differ from structured proteins in many ways and tend to have distinct properties in terms of function, structure, sequence, interactions, evolution and regulation.

        From the 1930s to the 1950s, the first protein structures were solved by protein crystallography. These early structures suggested that a fixed three-dimensional structure might be generally required to mediate the biological functions of proteins. When stating that proteins have just one uniquely defined configuration, Mirsky and Pauling did not recognize that Fischer’s ‘Lock and Key’ model (1894) would have supported their thesis. These publications solidified the paradigm that the sequence determines the structure, which, in turn, determines the function of proteins. In 1950, Karush wrote about ‘Configurational Adaptability’, contradicting the prevailing assumptions and research dating back to the 19th century. He was convinced that proteins have more than one configuration at the same energy level and can choose one when binding to other substrates. In the 1960s, Levinthal’s paradox suggested that a systematic conformational search by a long polypeptide is unlikely to yield a single folded protein structure on biologically relevant timescales (i.e. seconds to minutes). Curiously, relatively rapid and efficient refolding can be observed in vitro for many (small) proteins or protein domains. As stated in Anfinsen’s dogma from 1973, the fixed 3D structure of these proteins is uniquely encoded in their primary structure (the amino acid sequence). It is also kinetically accessible and stable under a range of (near) physiological conditions, and can therefore be considered the native state of such “ordered” proteins.

        During the subsequent decades, however, many large protein regions could not be assigned in X-ray datasets. This indicated that they occupy multiple positions, which average out in electron density maps. The lack of fixed, unique positions relative to the crystal lattice suggested that these regions were “disordered”. Nuclear magnetic resonance spectroscopy of proteins also demonstrated the presence of large flexible linkers and termini in many solved structural ensembles. It is now generally accepted that proteins exist as an ensemble of similar structures, with some regions more constrained than others. Intrinsically Unstructured Proteins (IUPs) occupy the extreme end of this spectrum of flexibility, whereas IDPs also include proteins with considerable local structural tendency or flexible multidomain assemblies. These highly dynamic disordered regions of proteins have subsequently been linked to functionally important phenomena such as allosteric regulation and enzyme catalysis.

        For many disordered proteins, the binding affinity for their receptors is regulated by post-translational modification. It has therefore been proposed that the flexibility of disordered proteins accommodates the different conformational requirements for binding both the modifying enzymes and the receptors. Intrinsic disorder is particularly enriched in proteins implicated in cell signaling, transcription and chromatin remodeling.

        Flexible linkers

        Disordered regions are often found as flexible linkers or loops connecting domains. Linker sequences vary greatly in length but are typically rich in polar uncharged amino acids. Flexible linkers allow the connecting domains to freely twist and rotate to recruit their binding partners via protein domain dynamics. They also allow their binding partners to induce larger scale conformational changes by long-range allostery.

        Linear motifs

        Linear motifs are short disordered segments of proteins that mediate functional interactions with other proteins or other biomolecules (RNA, DNA, sugars, etc.). Many roles of linear motifs are associated with cell regulation, for instance the control of cell shape, the subcellular localisation of individual proteins and regulated protein turnover. Post-translational modifications such as phosphorylation often tune the affinity of individual linear motifs for specific interactions, sometimes by several orders of magnitude. Unlike globular proteins, IDPs do not have spatially disposed active pockets. Nevertheless, in 80% of the IDPs (~3 dozen) subjected to detailed structural characterization by NMR, there are linear motifs termed PreSMos (pre-structured motifs). These are transient secondary structural elements primed for target recognition. In several cases, these transient structures have been shown to become full and stable secondary structures, e.g., helices, upon target binding. Hence, PreSMos are the putative active sites in IDPs.
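        Because linear motifs are so short, motif resources such as the ELM database describe them as regular expressions over the one-letter amino acid code. The sketch below scans a sequence for one such pattern. The PxxP pattern (a proline, any two residues, then a proline), which is associated with SH3-domain binding, is used purely as an illustration. The example sequences are hypothetical.

```python
import re

# Illustrative pattern: "P..P" means proline, any two residues, proline.
PXXP = "P..P"


def find_motif(seq, pattern):
    """Return (start, match) pairs for every occurrence, including overlaps."""
    # Wrapping the pattern in a zero-width lookahead lets re.finditer
    # report matches that overlap one another.
    return [(m.start(), m.group(1))
            for m in re.finditer(f"(?=({pattern}))", seq.upper())]


if __name__ == "__main__":
    print(find_motif("AAPLSPKPAAP", PXXP))
```

        A regular-expression hit is only a candidate site. Real motif definitions are usually more specific, and whether a motif binds also depends on context such as its modification state.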

        Coupled folding and binding

        Many unstructured proteins undergo transitions to more ordered states upon binding to their targets. The coupled folding and binding may be local, involving only a few interacting residues, or it may involve an entire protein domain. It was recently shown that coupled folding and binding allows the burial of a large surface area, which fully structured proteins could achieve only if they were much larger. Moreover, certain disordered regions might serve as “molecular switches” that regulate biological functions. They do so by switching to an ordered conformation upon molecular recognition events such as small-molecule binding, DNA/RNA binding or ion interactions.

        Disorder in the bound state (fuzzy complexes)

        Intrinsically disordered proteins can retain their conformational freedom even when they bind specifically to other proteins. The structural disorder in the bound state can be static or dynamic. In fuzzy complexes, structural multiplicity is required for function, and manipulating the bound disordered region changes activity. The conformational ensemble of the complex is modulated via post-translational modifications or protein interactions. The specificity of DNA-binding proteins often depends on the length of fuzzy regions, which is varied by alternative splicing. Intrinsically disordered proteins adopt many different structures in vivo according to the cell’s conditions, creating a structural or conformational ensemble.

        Therefore, their structures are strongly function-related. However, only a few proteins are fully disordered in their native state. Disorder is mostly found in intrinsically disordered regions (IDRs) within an otherwise well-structured protein. The term intrinsically disordered protein (IDP) therefore includes proteins that contain IDRs as well as fully disordered proteins.

        The existence and kind of protein disorder are encoded in the amino acid sequence. In general, IDPs are characterized by a low content of bulky hydrophobic amino acids and a high proportion of polar and charged amino acids, a property usually referred to as low hydrophobicity. This composition favors interactions with water. Furthermore, a high net charge promotes disorder because of electrostatic repulsion between like-charged residues. As a result, disordered sequences cannot bury a sufficient hydrophobic core to fold into stable globular proteins. In some cases, hydrophobic clusters in disordered sequences provide clues for identifying the regions that undergo coupled folding and binding (see Coupled folding and binding, above).
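        The two sequence properties named above, low mean hydropathy and high net charge, can be combined into a simple charge-hydropathy test. The sketch below is a minimal version in the style of the Uversky charge-hydropathy plot. It rests on three assumptions: Kyte-Doolittle values rescaled to the range 0 to 1, a net charge counted only from K, R, D and E, and the boundary ⟨H⟩ = (⟨R⟩ + 1.151)/2.785 as it is commonly quoted. The original method also smooths hydropathy over a short window; that step is omitted here. Treat the constants as assumptions to check and the result as a coarse whole-sequence indicator.

```python
# Kyte-Doolittle hydropathy scale (one-letter amino acid codes)
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(seq):
    """Mean Kyte-Doolittle hydropathy, rescaled from [-4.5, 4.5] to [0, 1]."""
    return sum((KD[aa] + 4.5) / 9.0 for aa in seq) / len(seq)


def mean_net_charge(seq):
    """|(K + R) - (D + E)| / length: a crude neutral-pH estimate (ignores His and termini)."""
    positive = sum(aa in "KR" for aa in seq)
    negative = sum(aa in "DE" for aa in seq)
    return abs(positive - negative) / len(seq)


def looks_disordered(seq):
    """True if the sequence falls on the disordered side of the assumed boundary."""
    seq = seq.upper()
    h = mean_hydropathy(seq)
    r = mean_net_charge(seq)
    # Assumed boundary line between folded and disordered sequences
    boundary = (r + 1.151) / 2.785
    return h < boundary


if __name__ == "__main__":
    # Hypothetical extremes: a highly charged poly-Glu and a hydrophobic poly-Leu
    print(looks_disordered("E" * 20), looks_disordered("L" * 20))
```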

        Many disordered proteins reveal regions without any regular secondary structure. These regions can be termed flexible, in contrast to structured loops: the latter are rigid and contain only one set of Ramachandran angles, whereas IDPs involve multiple sets of angles. The term flexibility is also used for well-structured proteins, but there it describes a different phenomenon. Flexibility in structured proteins is bound to an equilibrium state, whereas in IDPs it is not. Many disordered proteins also reveal low-complexity sequences, i.e. sequences with an over-representation of a few residues. Low-complexity sequences are a strong indication of disorder, but the reverse is not necessarily true: not all disordered proteins have low-complexity sequences. Disordered proteins have a low content of predicted secondary structure.
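        One simple way to quantify “over-representation of a few residues” is the Shannon entropy of the residue composition in a sliding window. The fewer distinct residues a window uses, the lower its entropy. The sketch below is an illustration under assumed settings: a 12-residue window and a 2.2-bit cut-off, both hypothetical. Dedicated low-complexity filters such as SEG use their own complexity measures and thresholds.

```python
import math
from collections import Counter


def shannon_entropy(segment):
    """Shannon entropy (bits) of the residue composition of a segment."""
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in Counter(segment).values())


def low_complexity_windows(seq, window=12, max_bits=2.2):
    """Start indices of windows whose composition entropy is at or below max_bits."""
    seq = seq.upper()
    return [i for i in range(len(seq) - window + 1)
            if shannon_entropy(seq[i:i + window]) <= max_bits]


if __name__ == "__main__":
    # Hypothetical sequence: a glutamine-rich stretch followed by a mixed one
    seq = "Q" * 12 + "ACDEFGHIKLMN"
    print(low_complexity_windows(seq))
```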


        2.6 Protein Folding, Denaturation and Hydrolysis

        Protein folding is the physical process by which a polypeptide chain acquires its native three-dimensional structure, usually a biologically functional conformation, in an expeditious and reproducible manner (Figure 2.36). When a protein is translated from a sequence of mRNA into a linear chain of amino acids, it exists as an unfolded polypeptide, or random coil. This chain lacks any stable (long-lasting) three-dimensional structure (left-hand side of Figure 2.36). Folding begins even during translation: as the ribosome synthesizes the polypeptide chain, the chain starts to fold into its three-dimensional structure. Amino acids interact with each other to produce a well-defined three-dimensional structure, the folded protein (right-hand side of the figure), known as the native state. The resulting three-dimensional structure is determined by the amino acid sequence, or primary structure (Anfinsen’s dogma).

        Figure 2.36 Protein Before and After Folding

        The correct three-dimensional structure is essential to function. However, some parts of functional proteins may remain unfolded or, as in the case of IDPs, remain flexible, so protein dynamics is also important. Failure to fold into the native structure generally produces inactive proteins, but in some instances misfolded proteins have modified or toxic functionality. Several neurodegenerative and other diseases are believed to result from the accumulation of misfolded proteins, such as the amyloid fibrils found in Alzheimer’s patients.

        Folding is a spontaneous process that is mainly guided by hydrophobic interactions, the formation of intramolecular hydrogen bonds and van der Waals forces, and it is opposed by conformational entropy. Folding often begins co-translationally, so that the N-terminus of the protein begins to fold while the C-terminal portion is still being synthesized by the ribosome. However, a protein molecule may fold spontaneously either during or after biosynthesis. Although these macromolecules may be regarded as “folding themselves”, the process also depends on the solvent (water or lipid bilayer), the salt concentration, the pH, the temperature, and the possible presence of cofactors and molecular chaperones. A protein’s folding is also limited by the restricted backbone bending angles, or conformations, that are possible, as described by the Ramachandran plot.

        Figure 2.37 Hydrophobic collapse. In the compact fold (to the right), the hydrophobic amino acids (shown as black spheres) collapse toward the center to become shielded from aqueous environment.

        2. Differences between CD spectroscopy of soluble and membrane proteins

        2.1 Effects of fold characteristics of membrane proteins on spectral properties

        Fig. 1 Circular dichroism spectra typical of membrane proteins composed of different secondary structural types: predominantly antiparallel alpha-helical bundle (red: a sodium channel pore 48); predominantly beta-barrel (blue: BTUB outer membrane cobalamin transporter 23); mixed helical, beta sheet and unordered structure (green: WZA translocon for capsular polysaccharides 23). The CD spectra correspond to PCDDB 47 IDs CD0004012000, CD0000102000 and CD0000128000, respectively. Inset are the crystal structures (PDB IDs 4F4L, 1NQE and 2J58, respectively) of these proteins, depicted in the same colour scheme.

        2.2 Effects of environmental and physical characteristics of membrane proteins on spectral properties

        The dielectric constant (∼1–2) of the hydrophobic core of a detergent micelle or phospholipid bilayer in which a membrane protein is embedded is considerably lower than that of water (∼80). This can cause both bathochromic and hypsochromic shifts in the CD spectra of proteins measured in these environments, compared with the spectra of proteins that have similar secondary structures but are in aqueous solution. 19,20 The extent and nature of the shift depend on which electronic transition in the peptide is examined, and on the location of the peptide bond relative to the membrane environment. The directions and magnitudes of the shifts are ultimately related to changes in the energy gap between the ground and excited states of the transitions. As a result, peak positions for the same type of secondary structure can differ substantially between aqueous solution and membranes. 21 Because the n → π* and π → π* transitions are affected differently (Fig. 2), the shifts are not uniform across the spectrum and cannot be corrected simply by shifting the entire spectrum. Such shifts in peak positions can have significant effects on secondary structure analyses. They tend to produce inaccurate results when standard deconvolution methods are used with the commonly used reference datasets derived from soluble proteins. 21

        Fig. 2 Demonstration of spectral shifts observed for each of the different electronic transitions in membrane protein spectra relative to those in soluble protein spectra. Membrane proteins (black spectra) and soluble proteins (grey spectra) were selected to have matching secondary structures in each case. 21 Left: Predominantly helical proteins. Right: Predominantly beta sheet proteins. In both examples the arrows indicate the peak positions (in black and grey, respectively) for the membrane and soluble proteins. It is notable that not all of the peaks shift in the same direction, nor to the same extent.
        Fig. 3 Diagram indicating the nature of the phenomenon of light scattering. $I_0$ is the light incident onto the scattering sample (red circles represent membrane particles). $I_t$ is the transmitted light that impinges on the detector and is used to measure the light absorbed by the sample. $I_s$ is light scattered in a direction that does not intersect with the detector, and contributes additional “apparent” (but not actual) absorbed light. θ is the acceptance angle of the detector and describes the angle of the scattered light that impinges on the detector.
        Fig. 4 Light scattering effects in CD spectra: 14,15 effects of changing the detector acceptance angle (θ). Left: Sample (bacteriorhodopsin in octyl glucoside micelles) that does not exhibit scattering, measured at two different values of θ (2 degrees: dashed line and 90 degrees: solid line). It can be seen that the spectra are essentially identical. Right: Sample (bacteriorhodopsin in purple membranes) that exhibits scattering (2 degrees: dashed line and 90 degrees: solid line). In this case the spectra are very different, both in magnitudes and peak positions of the wavelength maxima/minima.

        The simplest methods involve reducing the size of the particles so that they are much smaller (∼1/10) than the wavelengths of the UV light used in the investigation. 14 Most detergent micelles are too small to exhibit substantial scattering in the far UV region of the spectrum. However, the conformation of a protein in a micelle may differ from that in a membrane. This is because of the packing constraints imposed by the relative size, shape and charge of the detergent head groups and by the length and composition of the hydrophobic tail groups. 25 Hence, whilst the dimensions of micelles may be ideal for eliminating scattering, their physical characteristics, such as their geometry, may have a deleterious effect on the protein structure. Furthermore, the functions of many types of membrane proteins (e.g. ion channels) cannot be assessed in micelles, so there is no way to confirm that their structure is not perturbed.

        Small unilamellar vesicles (SUVs) 14,15,26 (∼25 nm diameter) also tend to produce little scattering in the far UV region, and provide another solution. Such samples can be produced by mechanical means such as sonication or extrusion, but in very small vesicles the protein structure may be distorted by the process of producing the vesicles or by the curvature of the membranes. Alternatively the protein can be examined in bicelles, or nanodiscs, which can also have small dimensions relative to the wavelength of light used for CD studies. Again, however, there may be issues associated with protein integrity in such environments, and because the intra- and extra-cellular surfaces are not in separate compartments, some proteins such as channels also cannot be functionally assayed in these types of samples.

        Alternatively, it has been suggested that scattering effects may not preclude all measurements in large unilamellar vesicles (LUVs). This conclusion was largely based on comparisons of a soluble protein in the presence and absence of lipid vesicles. 27 However, in the test samples, 27 the scattering particles (LUVs) were not themselves chiral, nor were the chiral objects (proteins) scatterers. The scattering would therefore be non-chiral and would naturally not influence the shape of the CD spectrum. 15 Hence this may not have been an entirely suitable test for a chiral membrane protein in a scattering lipid vesicle. Nevertheless, the authors concluded 27 that protein CD spectra obtained in the presence of LUVs could be used if the data were limited to wavelengths above 200 nm (or 215 nm, depending on the size of the LUVs). This may provide another option for avoiding the effects of light scattering. However, virtually all secondary structure analysis algorithms require data down to at least 190 nm (due to the number of eigenvectors of information present). 28,29 Quantitative analyses in LUVs would therefore not be accurate, although such data may be useful for the qualitative examination of large differences in more native-like environments.

        A simple physical solution to the scattering problem is to locate the sample as close to the instrument detector as possible. Light scattered at relatively large angles is then captured, which eliminates its apparent effect on the spectrum. The geometry of the detector acceptance angle (θ) is defined in Fig. 3. Different commercial CD instruments have different default θ angles, but most can be modified to enable large values of θ by moving the sample (or detector) so that the sample cell is adjacent to the detector face. Most SRCD beamlines have this type of geometry as their default. The effectiveness of this procedure depends on the geometry of the scattering object, with empty and filled spheres, filaments, and discs producing very different three-dimensional scattering profiles. However, for “empty spheres” such as lipid vesicles, the scattering is generally within 90 degrees of the forward direction, 14 an angle that can usually be achieved by appropriate positioning of the cell and detector.

        A practical issue when moving the detector to capture the light scattered by the sample is the geometry of the sample cell. All of the scattered light within a given angle θ must reach the detector and not be blocked by the side edges of the sample cell. This requires the use of circular rather than rectangular cells, because the sides of the latter would intercept some of the scattered light in the forward directions.
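        The benefit of moving the cell towards the detector can be estimated with elementary geometry. The sketch below makes four assumptions: an idealised point-like scattering volume on the optical axis, a circular detector face of a given radius, no obstruction by the cell walls, and θ measured from the forward beam direction. The text does not state whether θ is a half or a full angle, so the last point is an assumption. The radius and distances are hypothetical, not instrument specifications.

```python
import math


def acceptance_angle_deg(detector_radius_mm, distance_mm):
    """Largest scattering angle from the forward beam (degrees) that still hits
    a circular detector face, for a point scatterer on the optical axis."""
    # atan2 handles distance 0 (cell against the detector face) -> 90 degrees
    return math.degrees(math.atan2(detector_radius_mm, distance_mm))


if __name__ == "__main__":
    radius = 5.0  # hypothetical detector radius in mm
    for distance in (150.0, 20.0, 1.0, 0.0):  # hypothetical sample-detector distances
        print(f"{distance:6.1f} mm -> {acceptance_angle_deg(radius, distance):5.1f} deg")
```

        In this idealised geometry, the accepted cone widens towards the 90 degrees needed for vesicle scattering only as the distance approaches zero.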

        Fig. 5 Diagram indicating the nature of the phenomenon of absorption flattening. The top panel depicts an isotropic sample, whereas the bottom depicts a membrane sample. The small circles represent proteins, whilst the large circles represent membrane particles. $t$ is the cell pathlength and $I_0$ is the light incident on each sample. $I_I$ is the light transmitted by the isotropic sample, whereas $I_M$ is the light transmitted by the membrane sample. $I_M/I_I = q$ (the flattening coefficient). In the limit of one protein per membrane particle, $q = 1$.
        Fig. 6 Spectra depicting the effects of absorption flattening on the spectra of a membrane protein, bacteriorhodopsin, which has a primarily helical secondary structure. 14 Purple membrane fragments (large particles with low lipid-to-protein ratios, dotted line) where flattening is large, and small unilamellar vesicles (SUVs) (with high lipid-to-protein ratios, solid line) where flattening is negligible. These are compared with the spectrum calculated from the protein crystal structure (dashed line), which would correspond to an “unflattened” spectrum. It clearly matches the SUVs spectrum, but not the spectrum of the membrane fragments.

        The amount of flattening is proportional to the extent of the sample non-uniformity. This would be less problematic if the effect were uniform across the wavelength range of the spectrum, because it could then simply be compensated for by a scaling factor. However, this is not the case, because absorption (and thus flattening) is a function of the extinction coefficient of the sample at a given wavelength. Hence, the spectrum will not be uniformly flattened at all wavelengths. In an optically active sample, the extinction coefficients at a given wavelength differ for left circularly polarised light (CPL) and right CPL. As a consequence, the CD peaks are not only depressed with respect to samples at lower concentrations, but further distorted by this differential effect. Differential flattening will be more apparent for CD peaks with higher extinction coefficients (in most cases, the electronic transition at ∼190 nm). Put simply, the higher the absorbance, the greater the flattening. Not only is the overall spectral magnitude reduced, but the magnitudes of different peaks are reduced by different amounts, thereby distorting the shape of the spectrum. The extent of the flattening will also depend on the relative concentration of the proteins in the particles, with higher concentrations producing more flattening, and on the geometry of the particles. 31
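        The wavelength dependence of flattening can be illustrated with the classic Duysens-type expression for a dilute suspension of identical particles, q = (1 − 10^(−A_p))/(A_p ln 10), where A_p is the absorbance across a single particle. This idealised model is assumed here for illustration only. It is not necessarily the correction used in the work described, and the particle absorbances in the example are hypothetical. It reproduces the qualitative point above: q approaches 1 for weakly absorbing particles and falls as absorbance rises, so strongly absorbing wavelengths are flattened more.

```python
import math


def flattening_coefficient(particle_absorbance):
    """Idealised flattening q = (1 - 10**-A_p) / (A_p * ln 10); q -> 1 as A_p -> 0."""
    if particle_absorbance == 0:
        return 1.0
    x = particle_absorbance * math.log(10.0)
    # -expm1(-x) computes 1 - exp(-x) = 1 - 10**-A_p accurately for small x
    return -math.expm1(-x) / x


if __name__ == "__main__":
    # Hypothetical per-particle absorbances at a weakly and a strongly absorbing wavelength
    for label, a_p in (("weaker band", 0.1), ("stronger band", 1.0)):
        print(label, round(flattening_coefficient(a_p), 3))
```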

        An extreme example is the purple membrane containing the membrane protein bacteriorhodopsin, 10,30,31 in which the proteins are close-packed into two-dimensional crystals. Relative to a solution of isolated bacteriorhodopsin molecules, the spectrum of bacteriorhodopsin in purple membranes is not only much smaller, but the more intense peaks at ∼190 and 208 nm are significantly depressed relative to the less intensely absorbing peak at ∼222 nm. When compared with a dispersed sample of the protein in SUVs, or the back-calculated spectrum from the crystal structure, the spectrum of the highly concentrated protein patches is very different 14 (Fig. 6).

        The extent of the flattening ($q$) at any wavelength can be expressed as the ratio of the absorbance (or ellipticity, in the case of CD) of the protein in the membrane particle ($A_M$) to the absorbance (or ellipticity) of the same protein in a completely dispersed form ($A_I$):

        $$q = \frac{A_M}{A_I}$$

        Solutions/corrections for the phenomenon. The simplest means of correcting for this phenomenon would be to disperse the protein completely, with one protein per particle. If the protein can be incorporated into particles (detergent micelles, lipid vesicles, bicelles, amphipols, nanodiscs, or membrane fragments) that each contain a single protein, the dispersed condition can be met. 30,31 However, that condition is challenging to meet, even for SUVs. It cannot be achieved simply by sonicating larger particles, because decreasing the particle size whilst maintaining the lipid-to-protein ratio will not significantly affect the distribution of absorbing proteins. The complete elimination of flattening in SUVs requires lipid-to-protein molar ratios of around 2000 [see Section 2.3]. However, the CD signal from a sample with such a high lipid-to-protein ratio will generally be compromised by the phospholipid carbonyl groups, which absorb strongly at the wavelengths of interest. For membrane proteins, there is therefore often a trade-off between concerns about detergent effects and the spectral distortions produced by the particulate nature of lipid membranes.

        2.3 Effects of lipid-to-protein molar ratios

        However, the high light flux of SRCD beamlines [see Section 4.4] (which enables light penetration through relatively opaque samples) can mitigate these problems by enabling the measurement of SUVs with lipid-to-protein ratios of 250:1 or even as high as 2000:1. 6


        Molecular Biology 02: 'Thermodynamics of protein folding'

        Continued from lecture 01. The peptide bond is planar, so ω is always close to either 0° (cis) or 180° (trans). If you plot Φ against Ψ, you find that only a few clusters are well represented: a range of α-helix combinations, a β-sheet area, and a third, rarer area (called Lα and populated by left-handed α-helices). ω is usually found in the trans conformation because the cis form causes steric clashes between the side chains of consecutive residues. Proline is the exception: because its side chain is anchored to its own backbone nitrogen, the cis conformation becomes accessible for the peptide bond preceding it.
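        Φ, Ψ and ω are torsion (dihedral) angles, each defined by four consecutive backbone atoms. Φ is defined by C(i−1), N, Cα, C; Ψ by N, Cα, C, N(i+1); and ω by Cα, C, N(i+1), Cα(i+1). The sketch below computes a dihedral angle from four 3-D points with a standard atan2 formulation. It returns degrees between −180 and 180, so cis gives 0 and trans gives ±180. The coordinates in the example are hypothetical points chosen to produce cis and trans geometries, not atoms from a real structure.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3):
    """Torsion angle (degrees) about the p1-p2 bond defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Normalise the central bond so projections are simple dot products
    n = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / n, b1[1] / n, b1[2] / n)
    # Components of the outer bonds perpendicular to the central bond
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    p0, p1, p2 = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
    print(dihedral_deg(p0, p1, p2, (1.0, 0.0, 1.0)))   # cis-like geometry
    print(dihedral_deg(p0, p1, p2, (-1.0, 0.0, 1.0)))  # trans-like geometry
```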

        Secondary structure

        α-helices and β-sheets are two ways of allowing the backbone NH and C=O groups to form hydrogen bonds. α-helices contain 3.6 residues per turn; in other words, each residue spans 100° of rotation. Consecutive turns of an α-helix are separated by 5.4 Å (the pitch), which corresponds to a rise of 1.5 Å per residue. α-helices are almost exclusively right-handed. Viewed from above, a right-handed α-helix turns counter-clockwise as you go up, and a left-handed α-helix turns clockwise. Side chains point outward from the helix. You can plot where each residue falls around the helix using the 3.6 residues/turn rule. Doing so shows that amphipathic, half-buried helices have their hydrophobic residues on one side and their hydrophilic residues on the other. A fully buried helix will consist mostly of hydrophobic residues, and a fully exposed helix mostly of hydrophilic residues.
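        The helical-wheel argument can be made quantitative by placing each residue at 100° intervals and summing hydropathy vectors. This measure is usually called the hydrophobic moment; Eisenberg and colleagues introduced it using their own consensus scale. The sketch below substitutes the Kyte-Doolittle scale as an assumption and returns the magnitude of the vector sum. Residues spread evenly around the helix cancel out, whereas a hydrophobic face gives a large moment. The example sequences are hypothetical.

```python
import cmath
import math

# Kyte-Doolittle hydropathy scale (substituted here for Eisenberg's consensus scale)
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def wheel_angle_deg(position, degrees_per_residue=100.0):
    """Angular position of a residue on a helical wheel (3.6 residues per turn)."""
    return (position * degrees_per_residue) % 360.0


def hydrophobic_moment(seq, degrees_per_residue=100.0):
    """Magnitude of the sum of hydropathy vectors placed around the helix axis."""
    total = sum(KD[aa] * cmath.exp(1j * math.radians(i * degrees_per_residue))
                for i, aa in enumerate(seq.upper()))
    return abs(total)


if __name__ == "__main__":
    uniform = "L" * 18  # 18 residues = exactly 5 turns at 100 degrees per residue
    face = {0, 4, 7, 11, 14, 15}  # positions whose wheel angles lie at most 60 deg from 0
    amphipathic = "".join("L" if i in face else "K" for i in range(18))
    print(round(hydrophobic_moment(uniform), 6), round(hydrophobic_moment(amphipathic), 1))
```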

        In β-sheets, all potential backbone H-bonds are satisfied except on the outer edges of the two “flanking” strands at either end of the sheet. About 20% of β-sheets found in nature mix parallel and antiparallel strands; the other 80% are purely one or the other. β-sheets are not flat but pleated.

        Tertiary structure

        A single sheet or helix is not stable in water. Tertiary structure is the packing of these elements, and loops connecting them, onto each other.

        Thermodynamics of protein folding

        There are two fundamental problems in protein folding:

        1. Can we predict a protein’s structure from its sequence?

        2. Levinthal’s paradox: sampling all possible conformations of a polypeptide chain to find the lowest-energy state would take an astronomically long time rather than a few seconds, so how do proteins fold so quickly?
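Levinthal’s argument can be made concrete with back-of-envelope arithmetic. The three inputs below (100 residues, 3 backbone states per residue, 10^13 conformations sampled per second) are commonly used illustrative assumptions, not measured values.

```python
# Illustrative assumptions, not measured values.
N_RESIDUES = 100              # chain length
STATES_PER_RESIDUE = 3        # backbone conformations allowed per residue
SAMPLES_PER_SECOND = 1e13     # conformations tried per second
SECONDS_PER_YEAR = 3.156e7

n_conformations = STATES_PER_RESIDUE ** N_RESIDUES   # exact integer, about 5e47
seconds_needed = n_conformations / SAMPLES_PER_SECOND
years_needed = seconds_needed / SECONDS_PER_YEAR

print(f"conformations: {n_conformations:.2e}")        # 5.15e+47
print(f"exhaustive search: {years_needed:.1e} years")  # 1.6e+27 years
```

With these assumptions an exhaustive search would take on the order of 10^27 years, vastly longer than the age of the universe (about 1.4 × 10^10 years), which is why folding cannot proceed by random search.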

        As an example of why fold stability matters, consider the metalloprotease cleavage of Notch that (followed by a second cleavage by γ-secretase) releases the Notch intracellular domain (NICD), which then translocates to the nucleus and affects transcription. The proteolytic site of Notch is protected by Lin12/Notch repeats, which are connected to the EGF repeats that interact with Notch’s ligand. The ligand is believed to apply a force that unfolds this region, allowing cleavage. Mutations that destabilize this fold result in constitutive activation and can cause tumors.

        Thermodynamics can only describe whether a chemical reaction will occur spontaneously or not, not how fast it will occur (see Biochemistry 01).

        The energy of a system is its capacity to do work.

        $$\Delta U = q + w$$

        Where U is internal energy, q is the heat added to the system and w is the work done on the system.

        $$q = C\,(T_f - T_i)$$

        Where C is the heat capacity, T is temperature, and f and i mean final and initial.

        $$w = F\,\Delta x$$

        Where F is force and Δx is displacement along the x axis.

        If you dissolve urea in water to make a 4 M solution, it dissolves spontaneously and the solution becomes cold (just like guanidine, as I learned here).

        Gibbs free energy is defined as:

        $$G = H - TS$$

        Where G, H, T and S are Gibbs free energy, enthalpy, temperature and entropy respectively. At constant temperature, changes obey $\Delta G = \Delta H - T\Delta S$.

        If ΔG < 0 the reaction will proceed spontaneously.

        In the urea example, ΔH > 0 because energy is required to pull apart the interacting urea molecules, using heat from the water. Yet the reaction still occurs spontaneously because ΔS > 0 by a lot - the urea solution is much more entropic than urea and water separately.

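        For the reaction A + B ↔ C + D, we define:

        $$\Delta G = \Delta G^{\circ} + RT \ln \frac{[C][D]}{[A][B]}$$

        Where ΔG° is the standard free energy change, R is the gas constant and T is the absolute temperature.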

        ATP is a special molecule: its hydrolysis into ADP is spontaneous at physiological concentrations of the reactants and products, i.e. ΔG < 0 for this reaction:

        ATP + H2O → ADP + Pi

        Le Chatelier’s principle says you could drive the reaction in reverse, making ATP spontaneously, simply by increasing the concentrations of the products. However, [Pi] never gets high enough in the cell for ATP to be spontaneously generated from ADP. The unfavorable synthesis of ATP is instead coupled to favorable reactions, such as the flow of protons back across the inner mitochondrial membrane down their electrochemical gradient (see Biochemistry 08).
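The coupling argument can be checked numerically with ΔG = ΔG°' + RT ln([ADP][Pi]/[ATP]), taking the activity of water as 1. The sketch assumes the commonly quoted biochemical standard value ΔG°' ≈ −30.5 kJ/mol, a temperature of 310 K, and illustrative cytosolic concentrations; real concentrations vary by cell type.

```python
import math

R = 8.314e-3        # gas constant, kJ/(mol*K)
T = 310.0           # K, roughly body temperature
DG0_PRIME = -30.5   # kJ/mol, commonly quoted standard value for ATP hydrolysis


def delta_g_hydrolysis(atp: float, adp: float, pi: float) -> float:
    """Actual ΔG (kJ/mol) of ATP + H2O -> ADP + Pi at the given molar concentrations."""
    q = (adp * pi) / atp          # reaction quotient; water activity taken as 1
    return DG0_PRIME + R * T * math.log(q)


def pi_for_zero_delta_g(atp: float, adp: float) -> float:
    """[Pi] (M) at which ΔG = 0, i.e. where hydrolysis and synthesis are balanced."""
    q_eq = math.exp(-DG0_PRIME / (R * T))   # equilibrium constant
    return q_eq * atp / adp


# Illustrative (assumed) cytosolic concentrations in mol/L.
atp, adp, pi = 5e-3, 0.5e-3, 5e-3
print(f"ΔG = {delta_g_hydrolysis(atp, adp, pi):.1f} kJ/mol")             # -50.1
print(f"[Pi] needed for ΔG = 0: {pi_for_zero_delta_g(atp, adp):.1e} M")  # 1.4e+06
```

Under these assumptions ΔG is about −50 kJ/mol, and [Pi] would have to exceed 10^6 M, far beyond any physically possible concentration, before ATP synthesis from ADP and Pi became spontaneous.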


        Enthalpy is defined as:

        $$H = U + PV$$

        Where U, P and V are internal energy, pressure and volume.

        In physiological conditions, changes in pressure and volume are almost always negligible, so ΔH ≈ ΔU. In other words, in most biological systems, the change in enthalpy is essentially equal to the change in internal energy.

        People have developed molecular dynamics simulations of the fundamental atomic forces that determine a protein’s enthalpy (dihedral angles, van der Waals interactions, electrostatic interactions, etc.) and attempt to minimize the energy to determine a protein’s fold. But there are so many degrees of freedom that the computational expense prohibits running the simulation long enough to find the lowest-energy state. Still there are attempts, such as Folding@home, Foldit, and D. E. Shaw’s Anton. Anton, a special-purpose supercomputer, holds the record for the longest molecular dynamics simulation: it recomputed the forces on every atom at time steps of a few femtoseconds in order to simulate 1 millisecond of the protein’s movement. Obviously, Anton took far longer than a millisecond of real time to run that simulation.
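To put “time steps of a few femtoseconds” in perspective, the arithmetic below counts integration steps for a 1 ms simulation. The 2 fs time step is an assumption (a typical value for all-atom molecular dynamics), not a figure given in the text.

```python
# Assumed values: a 2 fs time step is typical for all-atom MD, but the text
# does not state the step Anton used.
TIME_STEP_S = 2e-15          # seconds per integration step
SIMULATED_TIME_S = 1e-3      # 1 millisecond of protein motion

n_steps = SIMULATED_TIME_S / TIME_STEP_S
# Each step requires recomputing the forces on every atom in the system.
print(f"{n_steps:.0e} integration steps")   # 5e+11 integration steps
```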


        Entropy is defined as:

        $$S = k_B \ln W$$

        Where $k_B$ is Boltzmann’s constant and W is the number of microstates that give rise to the macrostate of interest.

        My favorite explanation of this is that given by Richard Feynman. When I read it, I understood for the first time how physical entropy and information entropy are the same concept:

        So we now have to talk about what we mean by disorder and what we mean by order. … Suppose we divide the space into little volume elements. If we have black and white molecules, how many ways could we distribute them among the volume elements so that white is on one side and black is on the other? On the other hand, how many ways could we distribute them with no restriction on which goes where? Clearly, there are many more ways to arrange them in the latter case. We measure “disorder” by the number of ways that the insides can be arranged, so that from the outside it looks the same. The logarithm of that number of ways is the entropy. The number of ways in the separated case is less, so the entropy is less, or the “disorder” is less.

        — Richard Feynman, quoted here
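Feynman’s counting argument can be reproduced directly. The sketch assumes a hypothetical box of 20 volume elements (10 per half), at most one molecule per element, and 5 indistinguishable molecules of each colour; it counts the arrangements W with white confined to one half and black to the other, versus no restriction, and reports S/k_B = ln W.

```python
from math import comb, log


def ways_separated(cells_per_half: int, n_each: int) -> int:
    """White molecules confined to the left half and black to the right half."""
    return comb(cells_per_half, n_each) * comb(cells_per_half, n_each)


def ways_unrestricted(cells_per_half: int, n_each: int) -> int:
    """The same molecules placed anywhere in the whole box."""
    total_cells = 2 * cells_per_half
    # choose cells for the white molecules, then for the black ones
    return comb(total_cells, n_each) * comb(total_cells - n_each, n_each)


M, N = 10, 5   # hypothetical: 10 volume elements per half, 5 molecules of each colour
w_sep, w_mix = ways_separated(M, N), ways_unrestricted(M, N)
print(f"separated:    W = {w_sep}, S/k_B = ln W = {log(w_sep):.2f}")   # 63504, 11.06
print(f"unrestricted: W = {w_mix}, S/k_B = ln W = {log(w_mix):.2f}")   # 46558512, 17.66
```

The unrestricted case has roughly 700 times more arrangements, so its entropy is higher, exactly the “disorder” Feynman describes.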

        In biology, entropy is very often the driving force, for instance for the burial of hydrophobic protein domains. Imagine a water molecule at the center of a tetrahedron whose four corners are the directions in which it can hydrogen bond. The tetrahedron has four corners and the water has two hydrogens, so there are 4 choose 2 = 6 ways to orient the molecule. If a nonpolar group of a neighboring molecule occupies one corner of the tetrahedron, only three of the six orientations remain favorable (by still allowing hydrogen bonding). So

        $$\Delta S_{\text{hydrophobic}} = k_B \ln 3 - k_B \ln 6 = -k_B \ln 2 < 0,$$

        meaning that entropy has decreased.
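The tetrahedron argument reduces to a few lines of arithmetic. The sketch reads the three favorable orientations as those in which neither hydrogen points at the corner occupied by the nonpolar group (an interpretation of the text, giving 3 choose 2 = 3), and converts the per-molecule entropy change to molar units.

```python
from math import comb, log

K_B = 1.380649e-23      # Boltzmann constant, J/K
N_A = 6.02214076e23     # Avogadro constant, 1/mol

orientations_free = comb(4, 2)               # 2 hydrogens on 4 tetrahedral corners -> 6
orientations_next_to_nonpolar = comb(3, 2)   # hydrogens avoid the occupied corner -> 3

ds_molecule = K_B * log(orientations_next_to_nonpolar) - K_B * log(orientations_free)
ds_molar = ds_molecule * N_A                 # equals -R ln 2
print(f"ΔS = {ds_molar:.2f} J/(mol K) per ordered water")   # -5.76
```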

        Consider the mixing of epoxy resin and hardener into cured epoxy. This reaction has ΔS < 0 because the solid has fewer microstates than the liquids did. Yet the reaction occurs spontaneously at room temperature, so ΔH must be negative, and more negative than TΔS. Heat is therefore released; in fact, the reaction is extremely exothermic. Joe measured the temperature of “5-minute epoxy” and it rose from 21°C to >40°C at the 5 minute mark.

        An incorrect and simplistic view of protein folding is as follows. An unfolded protein has high configurational entropy but also high enthalpy because it has few stabilizing interactions. A folded protein has far less entropy, but also far less enthalpy. There is a tradeoff between H and S here. Note that because ΔG = ΔH - TΔS, increased temperature weights the S term more heavily, meaning that higher temperature favors unfolding.
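The temperature dependence in this simplistic model can be sketched with hypothetical numbers. The values of ΔH and ΔS for unfolding below are invented for illustration and treated as temperature-independent; as the next paragraph explains, a realistic treatment must include the solvent’s contribution.

```python
# Hypothetical unfolding parameters, assumed independent of temperature.
DH_UNFOLD = 300.0   # kJ/mol; > 0, unfolding breaks stabilizing interactions
DS_UNFOLD = 0.9     # kJ/(mol*K); > 0, the unfolded chain has more conformations


def dg_unfold(temp_k: float) -> float:
    """ΔG of unfolding in kJ/mol; negative means unfolding is spontaneous."""
    return DH_UNFOLD - temp_k * DS_UNFOLD


t_m = DH_UNFOLD / DS_UNFOLD          # where ΔG = 0 (about 333 K here)
for temp in (300.0, t_m, 360.0):
    print(f"T = {temp:.1f} K: ΔG_unfold = {dg_unfold(temp):+.1f} kJ/mol")
```

Because the TΔS term grows with temperature, ΔG of unfolding changes sign at T = ΔH/ΔS: below it the folded state is favored, above it the unfolded state is.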

        That entire explanation only considers the energy of the protein and not that of the solvent. In fact, hydrophobic domains of a protein constrain the possible configurations of surrounding water (see explanation above), and so their burial upon folding increases the water’s entropy. Moreover, it turns out that the hydrogen bonding of polar residues and the backbone is satisfied both in an unfolded state (by water) and in a folded state (by each other). Therefore enthalpy is “zero sum,” and protein folding is driven almost entirely by entropy.

        One technique for measuring protein stability is differential scanning calorimetry (DSC). Two cells, one containing only buffer and the other containing buffer plus protein, are heated at the same rate, and the instrument records the difference in the heat each cell needs to keep rising in temperature. Eventually the protein reaches its melting temperature Tm, where it is 50% folded and 50% unfolded and ΔG = 0. Around Tm, unfolding absorbs heat, so the protein-containing cell requires extra heat to keep pace with the buffer-only cell; this excess heat appears as a peak centered near Tm.
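A two-state model shows why the DSC signal peaks at Tm. Assuming (hypothetically) a temperature-independent unfolding enthalpy and no heat-capacity change, the fraction unfolded follows from K = exp(−ΔG/RT), and the excess heat absorbed per kelvin is ΔH times the change in fraction unfolded per kelvin. The parameter values are illustrative only.

```python
import math

R = 8.314e-3        # gas constant, kJ/(mol*K)
DH_UNFOLD = 300.0   # kJ/mol, hypothetical; assumed temperature-independent
T_M = 333.0         # K, hypothetical melting temperature


def fraction_unfolded(temp_k: float) -> float:
    """Two-state model: K = exp(-ΔG/RT), with ΔG = ΔH(1 - T/Tm) since ΔS = ΔH/Tm."""
    dg = DH_UNFOLD * (1.0 - temp_k / T_M)
    k = math.exp(-dg / (R * temp_k))
    return k / (1.0 + k)


def excess_heat_capacity(temp_k: float) -> float:
    """Extra heat absorbed by unfolding, kJ/(mol*K): ΔH * d(fraction unfolded)/dT.

    Uses the van 't Hoff relation d(ln K)/dT = ΔH/(R T^2).
    """
    f = fraction_unfolded(temp_k)
    return DH_UNFOLD ** 2 * f * (1.0 - f) / (R * temp_k ** 2)


temps = [300.0 + 0.1 * i for i in range(701)]   # scan from 300 to 370 K
peak_t = max(temps, key=excess_heat_capacity)
print(f"fraction unfolded at Tm: {fraction_unfolded(T_M):.2f}")
print(f"excess heat capacity peaks near {peak_t:.1f} K")
```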

        Another technique for measuring protein stability is single-molecule atomic force microscopy, which measures the force required to unfold the protein.

        Common denaturants are urea and guanidine hydrochloride. Amazingly, we still do not know how they work. It is thought that they stabilize all constituent parts of the unfolded protein. Guanidine may surround those unfavorable hydrophobic domains of the protein but then expose its own hydrophilic side to water, so that the movement of the water is not constrained.

        About Eric Vallabh Minikel

        Eric Vallabh Minikel is on a lifelong quest to prevent prion disease. He is a scientist based at the Broad Institute of MIT and Harvard.

        We often think of proteins as nutrients in the food we eat or the main component of muscles, but proteins are also microscopic molecules inside of cells that perform diverse and vital jobs. With the Human Genome Project complete, scientists are turning their attention to the human “proteome,” the catalog of all human proteins. This work has shown that the world of proteins is a fascinating one, full of molecules with such intricate shapes and precise functions that they seem almost fanciful.

        A protein’s function depends on its shape, and when protein formation goes awry, the resulting misshapen proteins cause problems that range from bad, when proteins neglect their important work, to ugly, when they form a sticky, clumpy mess inside of cells. Current research suggests that the world of proteins is far from pristine. Protein formation is an error-prone process, and mistakes along the way have been linked to a number of human diseases.

        The wide world of proteins:

        There are 20,000 to over 100,000 unique types of proteins within a typical human cell. Why so many? Proteins are the workhorses of the cell. Each expertly performs a specific task. Some are structural, lending stiffness and rigidity to muscle cells or long thin neurons, for example. Others bind to specific molecules and shuttle them to new locations, and still others catalyze reactions that allow cells to divide and grow. This wealth of diversity and specificity in function is made possible by a seemingly simple property of proteins: they fold.

        Proteins fold into a functional shape

        A protein starts off in the cell as a long chain of, on average, 300 building blocks called amino acids. There are 20 standard types of amino acids (22 counting the rare selenocysteine and pyrrolysine), and their ordering determines how the protein chain will fold upon itself. When folding, two types of structures usually form first. Some regions of the protein chain coil up into slinky-like formations called “alpha helices,” while other regions fold into zigzag patterns called “beta sheets,” which resemble the folds of a paper fan. These two structures can interact to form more complex structures. For example, in one protein structure, several beta sheets wrap around themselves to form a hollow tube with a few alpha helices jutting out from one end. The tube is short and squat such that the overall structure resembles snakes (alpha helices) emerging from a can (beta sheet tube). A few other protein structures with descriptive names include the “beta barrel,” the “beta propeller,” the “alpha/beta horseshoe,” and the “jelly-roll fold.”

        These complex structures allow proteins to perform their diverse jobs in the cell. The “snakes in a can” protein, when embedded in a cell membrane, creates a tunnel that allows traffic into and out of cells. Other proteins form shapes with pockets called “active sites” that are perfectly shaped to bind to a particular molecule, like a lock and key. By folding into distinct shapes, proteins can perform very different roles despite being composed of the same basic building blocks. To draw an analogy, all vehicles are made from steel, but a racecar’s sleek shape wins races, while a bus, dump truck, crane, or Zamboni are each shaped to perform their own unique tasks.

        Why does protein folding sometimes fail?

        Folding allows a protein to adopt a functional shape, but it is a complex process that sometimes fails. Protein folding can go wrong for three major reasons:

        1: A person might possess a mutation that changes an amino acid in the protein chain, making it difficult for a particular protein to find its preferred fold or “native” state. This is the case for inherited mutations, for example, those leading to cystic fibrosis or sickle cell anemia. These mutations are located in the DNA sequence or “gene” that encodes one particular protein. Therefore, these types of inherited mutations affect only that particular protein and its related function.

        2: On the other hand, protein folding failure can be viewed as an ongoing and more general process that affects many proteins. When proteins are created, the machine that reads the genetic instructions (copied from DNA into messenger RNA) and assembles the long chains of amino acids can make mistakes. Scientists estimate that this machine, the ribosome, makes mistakes in as many as 1 in every 7 proteins! These mistakes can make the resulting proteins less likely to fold properly.

        3: Even if an amino acid chain has no mutations or mistakes, it may still not reach its preferred folded shape simply because proteins do not fold correctly 100% of the time. Protein folding becomes even more difficult if the conditions in the cell, like acidity and temperature, change from those to which the organism is accustomed.
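The “1 in 7 proteins” figure can be related to a per-residue error rate with simple probability. The sketch assumes that misincorporation errors occur independently at each position and uses the 300-residue average length quoted earlier; both are simplifications.

```python
def fraction_with_error(error_rate: float, length: int) -> float:
    """Probability that a chain of `length` residues has at least one wrong amino acid,
    assuming errors occur independently with probability `error_rate` per residue."""
    return 1.0 - (1.0 - error_rate) ** length


def error_rate_for(fraction: float, length: int) -> float:
    """Invert the formula above: per-residue rate giving `fraction` erroneous chains."""
    return 1.0 - (1.0 - fraction) ** (1.0 / length)


rate = error_rate_for(1 / 7, 300)     # "1 in 7 proteins", average length 300
print(f"implied per-residue error rate: {rate:.1e}")     # 5.1e-04
print(f"check: {fraction_with_error(rate, 300):.3f}")   # 0.143
```

In other words, an error in only about 1 of every 2,000 amino acids added is enough to leave roughly 1 in 7 average-length proteins with at least one mistake.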

        A failure in protein folding causes several known diseases, and scientists hypothesize that many more diseases may be related to folding problems. There are two completely different problems that occur in cells when their proteins do not fold properly.

        One type of problem, called “loss of function,” results when not enough of a particular protein folds properly, causing a shortage of “specialized workers” needed to do a specific job. For example, imagine that a properly folded protein is perfectly shaped to bind a toxin and break it into less toxic byproducts. Without enough of the properly folded protein available, the toxin will build up to damaging levels. As another example, a protein may be responsible for metabolizing sugar so that the cell can use it for energy. The cell will grow slowly due to lack of energy if not enough of the protein is present in its functional state. The reason the cell gets sick, in these cases, is due to a lack of one specific, properly folded, functional protein. Cystic fibrosis, Tay-Sachs disease, Marfan syndrome, and some forms of cancer are examples of diseases that result when one type of protein is not able to perform its job. Who knew that one type of protein among tens of thousands could be so important?

        Proteins that fold improperly may also impact the health of the cell regardless of the function of the protein. When proteins fail to fold into their functional state, the resulting misfolded proteins can be contorted into shapes that are unfavorable to the crowded cellular environment. Most proteins possess sticky, “water-hating” amino acids that they bury deep inside their core. Misfolded proteins wear these inner parts on the outside, like a chocolate-covered candy that has been crushed to reveal a gooey caramel center. These misfolded proteins often stick together forming clumps called “aggregates.” Scientists hypothesize that the accumulation of misfolded proteins plays a role in several neurological diseases, including Alzheimer’s, Parkinson’s, Huntington’s, and Lou Gehrig’s (ALS) disease, but scientists are still working to discover exactly how these misfolded, sticky molecules inflict their damage on cells.

        One misfolded protein deserves special attention. The “prion” protein, which causes Creutzfeldt-Jakob disease in humans and bovine spongiform encephalopathy (“mad cow disease”) in cattle, is an example of a misfolded protein gone rogue. Not only is this protein irreversibly misfolded, but it also converts normally folded copies of the same protein into its twisted state.

        How do our cells protect themselves from misfolded proteins?

        Recent research shows that protein misfolding happens frequently inside of cells. Fortunately, cells are accustomed to coping with this problem and have several systems in place to refold or destroy aberrant protein formations.

        Chaperones are one such system. Appropriately named, they accompany proteins through the folding process, improving a protein’s chances of folding properly and even allowing some misfolded proteins the opportunity to refold. Interestingly, chaperones are proteins themselves! There are many different types of chaperones. Some cater specifically to helping one type of protein fold, while others act more generally. Some chaperones are shaped like large hollow chambers and provide proteins with a safe space, isolated from other molecules, in which to fold. Production of several chaperones is boosted when a cell encounters high temperatures or other conditions making protein folding more difficult, thus earning these chaperones the alias, “heat shock proteins.”

        Another line of cell defense against misfolded proteins is called the proteasome. If misfolded proteins linger in the cell, they will be targeted for destruction by this machine, which chews up proteins and spits them out as short fragments (peptides) that are further broken down into amino acids. The proteasome is like a recycling center, allowing the cell to reuse amino acids to make more proteins. The proteasome itself is not one protein but many acting together. Proteins frequently interact to form larger structures with important cellular functions. For example, the tail of a human sperm is built from many types of proteins, including dynein motor proteins, that work together to make the tail beat and propel the sperm forward.

        Future research about protein folding and misfolding:

        Why is it that some misfolded proteins are able to evade systems like chaperones and the proteasome? How can sticky misfolded proteins cause the neurodegenerative diseases listed above? Do some proteins misfold more often than others? These questions are at the forefront of current research seeking to understand basic protein biology and the diseases that result when protein folding goes awry.

        The wide world of proteins, with its great assortment of shapes, bestows cells with capabilities that allow for life to exist and allow for its diversity (e.g., the differences between eye, skin, lung or heart cells, and the differences between species). Perhaps for this reason, the word “protein” comes from the Greek word “proteios,” meaning “of primary importance.”

        –Contributed by Kerry Geiler, a 4th year Ph.D student in the Harvard Department of Organismic and Evolutionary Biology

        Results and discussion

        Amino acid propensities for the α-helical or β-strand conformation

        For individual amino acids, a $P_\alpha$ of <0.9 denotes an α-helix breaker, a $P_\alpha$ of >1.1 denotes an α-helix-favored amino acid, and values between 0.9 and 1.1 denote that the amino acid is neutral in this regard [31]. The same principle applies to $P_\beta$. The amino acid propensities calculated using our dataset ($P_\alpha^{i}$ and $P_\beta^{i}$) are shown in Table 2. Their standard deviations ranged from 0.001 to 0.004. The results are in good agreement with previous reports [1, 6, 10].
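The excerpt does not restate how the propensities were computed. Below is a minimal sketch, assuming the common Chou-Fasman-style definition (a residue type’s frequency within a structural state divided by its frequency overall), together with the 0.9/1.1 thresholds quoted above. The toy sequence and state string are hypothetical; real propensity studies pool many proteins.

```python
from collections import Counter


def propensities(sequence: str, states: str, state: str = "H") -> dict[str, float]:
    """Propensity of each residue type for one secondary-structure state.

    P(i) = (share of residue type i among residues in `state`)
           / (share of residue type i among all residues)
    `states` is a per-residue string of state codes aligned with `sequence`
    (e.g. "H" helix, "E" strand, "C" other).
    """
    if len(sequence) != len(states):
        raise ValueError("sequence and states must have equal length")
    overall = Counter(sequence)
    in_state = Counter(aa for aa, s in zip(sequence, states) if s == state)
    n_total, n_state = len(sequence), sum(in_state.values())
    if n_state == 0:
        raise ValueError(f"no residues in state {state!r}")
    return {aa: (in_state[aa] / n_state) / (overall[aa] / n_total) for aa in overall}


def classify(p: float) -> str:
    """Thresholds quoted in the text: <0.9 breaker, >1.1 favored, otherwise neutral."""
    if p < 0.9:
        return "breaker"
    if p > 1.1:
        return "favored"
    return "neutral"


# Hypothetical toy data: 4 Ala all in a helix, 4 Gly all outside it.
p = propensities("AAAAGGGG", "HHHHCCCC")
print(p)                                         # {'A': 2.0, 'G': 0.0}
print({aa: classify(v) for aa, v in p.items()})  # {'A': 'favored', 'G': 'breaker'}
```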

        We also calculated the amino acid propensities for exposed and buried residues ($P_{\mathrm{exp}}^{i}$ and $P_{\mathrm{bur}}^{i}$) in the secondary structural elements (Table 2). For α-helices, the three mean propensities $P_\alpha^{i}$, $P_{\alpha,\mathrm{exp}}^{i}$ and $P_{\alpha,\mathrm{bur}}^{i}$ show similar trends. On the other hand, the mean propensities for exposed residues ($P_{\beta,\mathrm{exp}}^{i}$) and buried residues ($P_{\beta,\mathrm{bur}}^{i}$) in β-strands differ significantly (Table 2). It is especially interesting that Lys and Arg, but not the two other charged residues, Asp and Glu, are preferred as exposed residues in β-strands. Not surprisingly, all charged amino acids are disfavored as buried residues in β-strands: the buried regions of β-strands disfavor charged amino acids, whereas α-helices can tolerate them.

        As previously reported in statistical studies, charged amino acids (including Lys and Arg) yield low values for $P_\beta$ [1, 6, 10, 13], which is in agreement with the mean propensities, $P_\beta^{i}$, determined in the present work. Our results, however, show that Lys and Arg have relatively high $P_{\beta,\mathrm{exp}}$ values as exposed residues, but this property is masked when comparing mean propensities. In our dataset, the fraction of exposed residues in β-strands is low (29%) compared to α-helices (46%). Most residues in β-strands are buried inside proteins and covered by α-helices or loop regions; exposed residues are thus less frequently encountered in β-strands, and their contributions to the mean $P_\beta^{i}$ are therefore small. Jiang and coworkers [10] have suggested that the hydrophobicities of amino acid side chains are the key determinant of β-sheet structures, but our data suggest that this holds for buried residues but not for exposed residues in β-sheet structures. Minor and Kim [27] measured the propensity of the 20 amino acids for β-sheet formation in a variant of the IgG-binding domain from protein G, which has four antiparallel β-strands. Amino acid substitutions were made at a guest site on the solvent-exposed surface of the center strand. The propensities from those experiments show a strong correlation with the logarithmic $P_{\beta,\mathrm{exp}}^{i}$ values obtained here (R = 0.82), but a weaker correlation with our logarithmic $P_{\beta,\mathrm{bur}}^{i}$ values (R = 0.63). Furthermore, there is poor correlation between the propensities determined by Minor and Kim [27] and those of Chou and Fasman [1]. These results show that the preference for β-strands differs between exposed and buried sites.
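The excerpt does not say which correlation coefficient was used for comparisons such as R = 0.82. A minimal sketch, assuming Pearson’s R on log-transformed propensities; note that the choice of logarithm base does not matter, because rescaling one variable by a positive constant leaves R unchanged. The input values below are hypothetical.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples of at least 2 values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sx * sy)


def log_propensity_r(scale_a: list[float], scale_b: list[float]) -> float:
    """Correlate two propensity scales after a natural-log transform (values must be > 0)."""
    return pearson_r([math.log(p) for p in scale_a], [math.log(p) for p in scale_b])


# Base-invariance check with hypothetical values: log10 gives the same R as ln.
a, b = [0.5, 1.0, 1.4, 2.1], [0.7, 0.9, 1.6, 1.8]
r_ln = log_propensity_r(a, b)
r_log10 = pearson_r([math.log10(p) for p in a], [math.log10(p) for p in b])
print(math.isclose(r_ln, r_log10))   # True
```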

        Fold dependency of amino acid propensities for α-helices

        The propensities of amino acid i in the helical region of fold j, $P_\alpha^{ij}$, and in the β-strand region of fold j, $P_\beta^{ij}$, were calculated for 39 and 24 SCOP folds, respectively (Figure 1). Their standard deviations range from 0.01 to 0.05. With the exception of Met, Cys, Trp, Asn, Asp and His for $P_\alpha^{ij}$, and of Met, Pro and Cys for $P_\beta^{ij}$, the population of each amino acid differed (>90% confidence level) for more than one pair of folds.

        Amino acid propensities for each SCOP fold. Box plots of amino acid propensities for each SCOP fold for α-helices (A) and β-strands (B). Each box encloses the middle 50% of the data, with the median displayed as a line; the bottom and top of the box mark the 25th and 75th percentiles. The lines extending from the top and bottom of each box mark the minimum and maximum values within the data set that fall within an acceptable range. Any value outside this range, called an outlier, is displayed as an individual point. Underlining of certain residues (one-letter code) on the horizontal axis denotes that the Fisher-Irwin population proportion test indicated statistically significant differences in propensities between folds.
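The caption’s box-plot elements can be computed as follows. This sketch assumes Tukey’s convention, in which whiskers extend to the most extreme values within 1.5 × IQR of the box; the caption does not define its “acceptable range”, and quartile conventions also differ between plotting tools. The nine input values are hypothetical.

```python
import statistics


def box_plot_summary(values: list[float], whisker_k: float = 1.5) -> dict:
    """Box-plot statistics using Tukey's whisker convention (an assumption here)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1                                  # the box spans q1..q3 (middle 50%)
    low_fence, high_fence = q1 - whisker_k * iqr, q3 + whisker_k * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),    # most extreme values still inside the fences
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


# Hypothetical propensities of one residue across nine folds.
summary = box_plot_summary([0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 2.9])
print(summary["median"], summary["outliers"])
```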

        In particular, a wide range of $P_\alpha^{ij}$ values was obtained for the aromatic residues Phe (0.66–2.00) and Tyr (0.58–1.89), depending on fold type, and the mean propensity over all folds is approximately 1.0 for these amino acids (Figure 1A and Table 2). The propensities of the charged residues Lys (0.65–1.56) and Arg (0.80–1.71) also varied widely depending on the fold. On the other hand, in >80% of SCOP folds, Leu and Glu are favored in the α-helical conformation, whereas Val, Pro, Ser, Thr, Asn, Asp and Gly are disfavored. Ala is favored in the α-helical conformation in the majority of the folds (79%) but is disfavored in two folds (Protein kinase-like and 4-helical cytokines). In particular, the propensity of Ala for the "4-helical cytokines" fold is quite low ($P_\alpha^{ij}$ = 0.64). Met, Cys, Trp and His show no fold-type population difference at the >90% confidence level for any pair of folds, although their propensities vary widely among the various folds. Therefore, we did not further assess these amino acids.

        Richardson et al. showed that Ala is not favored at the ends of α-helices [7], suggesting that short α-helices, in which helix ends make up a larger share of the residues, would disfavor Ala. However, the mean α-helix length of the 4-helical cytokines fold is the third longest of the 39 folds (the longest and second longest being those of the "Ferritin-like" and "Four-helical up-and-down bundle" folds, respectively). We then calculated, for each amino acid, the correlation coefficient between the mean α-helix length and the amino acid propensity; all were smaller than 0.4. This result indicates that there is no strong relationship between the mean length of α-helices and the helical propensity of any amino acid.

        Engel et al. showed that most helices are amphiphilic [7, 12], suggesting that the propensities for α-helices depend on the exposed residue fraction. We therefore examined the correlations between the exposed residue fraction and the frequency of amino acids in α-helices. No amino acid showed a strong correlation (R < −0.7 or R > 0.7) between the exposed residue fraction and the amino acid frequency, although the charged residues Lys and Asp have relatively strong positive correlations ($R_K$ = 0.66, $R_D$ = 0.54). In contrast, the correlation coefficients of Glu and Arg (also charged amino acids) are small ($R_E$ = 0.26, $R_R$ = 0.07).

        Figure 2 also presents propensities for exposed and buried amino acids for each SCOP fold. For the exposed regions of α-helices (Figure 2A), fewer than ten amino acids show a population difference at the 90% confidence level for at least one pair of folds. This is probably because the dataset was limited to exposed residues. Glu ($P_{\alpha,\mathrm{exp}}^{ij}$: 1.0–1.92) is favored in exposed regions (Figure 2A), whereas Leu ($P_{\alpha,\mathrm{bur}}^{ij}$: 0.97–1.88) is favored in buried regions (Figure 2B) for more than 80% of the folds. Pro and Gly are extremely disfavored in both exposed and buried regions for more than 92% of the folds. The propensities of Ala in the exposed and buried regions of α-helices follow a similar tendency to $P_\alpha^{ij}$: Ala is favored in the α-helical conformation in exposed and buried regions for 72% and 79% of the folds, respectively, and disfavored in 8% and 13% of the folds, respectively. For the "4-helical cytokines" fold, the propensities of Ala in both exposed and buried regions are also low ($P_{\alpha,\mathrm{exp}}^{ij}$ = 0.72 and $P_{\alpha,\mathrm{bur}}^{ij}$ = 0.60). As for $P_\alpha^{ij}$, a wide range of $P_{\alpha,\mathrm{bur}}^{ij}$ values was obtained for the aromatic residues Phe and Tyr, depending on fold type (Figure 2B).

        Amino acid propensities for exposed and buried residues. Box plots of amino acid propensities for each SCOP fold for exposed (A) and buried (B) residues in α-helices and for exposed (C) and buried (D) residues in β-strands. The propensities for β-strands for Trp in the “PH domain-like barrel” SCOP fold and for Lys in the “Protein kinase-like” SCOP fold were out of range (4.3 in C and 3.8 in D, respectively) and are not shown. Underlining of certain residues on the horizontal axis denotes that the Fisher-Irwin population proportion test indicated statistically significant differences in propensities between folds.

        Fold dependency of amino acid propensities for β-strands

        As shown in Figure 1B, a wide range of $P_\beta^{ij}$ values was obtained for Trp (0.45–2.22), Thr (0.73–1.87), Lys (0.46–1.45) and Arg (0.51–1.42) depending on fold type. For Lys, although $P_\beta^{ij}$ was <0.9 in 18 of 24 folds (mean value of $P_\beta^{ij}$ = 0.79), three folds (the lipocalins fold, OB-fold, and protein kinase–like fold) yielded $P_\beta^{ij}$ values > 1.2, which differed from those of other folds at the 90% confidence level. These three folds are “all-β” or “α + β”, and all have largely exposed β-strands, whereas β-strands are usually covered by α-helical or loop regions, especially in “α/β” proteins (Table 1). It has long been thought that β-strands prefer hydrophobic residues [1, 6, 10]; however, it now appears that largely exposed β-sheet structures prefer hydrophilic residues such as Lys. In contrast, the four amino acids Val, Ile, Phe and Tyr are favored ($P_\beta^{ij}$ > 1.1) in β-strands of more than 80% of folds, with Val (1.40–2.68) and Ile (1.17–2.33) having particularly high propensities. The six amino acids Pro, Ala, Asn, Asp, Glu and Gly are disfavored ($P_\beta^{ij}$ < 0.9) in β-strands for more than 80% of folds, and Pro (0.16–0.71) and Asp (0.22–0.91) have quite low propensities.

        The exposed residue fractions ranged from about 10% to 46% across the 24 folds (Table 1), and Glu and Lys show strong positive correlations between their propensities and the exposed residue fraction of β-strands in each fold ($R_E$ = 0.76, $R_K$ = 0.73). Gln, Arg and Ile also have relatively strong correlations, although the correlation for Ile is negative ($R_Q$ = 0.67, $R_R$ = 0.5, $R_I$ = −0.68). In contrast to the strong positive correlation found for Glu, there is no correlation for the other negatively charged amino acid, Asp. The exposed residue fraction appears to be one of the major factors governing the charged amino acid composition of β-strands in each fold.

        For residues exposed in a β-strand (Figure 2C), a wide range of $P_{\beta,\mathrm{exp}}^{ij}$ values was obtained for Ser (0.42–1.69), Lys (0.84–1.58) and Arg (0.68–1.85). A wide range of $P_{\beta,\mathrm{bur}}^{ij}$ values was obtained for Cys (0.61–2.61), Phe (0.66–1.83), Tyr (0.64–1.92), Trp (0.31–1.77) and His (0.41–1.87) for residues buried in a β-strand (Figure 2D). $P_{\beta,\mathrm{exp}}^{ij}$ values of Val, Ile, Phe, Tyr, Trp and Thr are high ($P_{\beta,\mathrm{exp}}^{ij}$ > 1.1) for more than 75% of folds, indicating that these amino acids, which have β-branched or aromatic side chains, are favored in the exposed regions of β-strands in most fold types. In contrast, the amino acids disfavored in β-strands in all folds are Pro (0.22–0.87), Ala (0.28–0.70) and Gly (0.23–0.88) in exposed regions, and Pro (0.12–0.87) in buried regions. It is interesting that the $P_{\beta,\mathrm{exp}}^{ij}$ values of Ala are low for all folds ($P_{\beta,\mathrm{exp}}^{ij}$ < 0.7), indicating that an exposed position on a β-strand is extremely unfavorable for Ala, as it is for Pro and Gly. These strong tendencies support the view that backbone solvation is a major factor determining thermodynamic β-propensities [32].

        Correlations between amino acid propensities and SCOP fold

        To investigate the factors that determine the fold dependence of the amino acid propensities for the secondary structures, correlation coefficients were calculated using amino acid propensities obtained from 39 SCOP folds for α-helices (Figure 3A) and 24 SCOP folds for β-strands (Figure 3B). Figure 4, for example, shows the relationships between the propensities of Glu and Lys for α-helices and β-strands. Each data point represents a fold in which more than 2,000 residues are found in each of α-helices and β-strands. For β-strands (Figure 4B), these two amino acid propensities have a correlation coefficient of 0.70, which suggests that folds rich in Glu are likely to also be rich in Lys. In contrast, for α-helices (Figure 4A) no significant correlation was observed. For β-strands, “α/β” proteins (□ in Figure 4B) show low propensities for Glu and Lys, whereas lipocalins and OB-folds (both “all-β”, + in Figure 4B) show higher propensities for Glu and Lys. For “α+β” proteins (▵ in Figure 4B), there is no correlation between the propensities of Glu and Lys. The correlation coefficients for “all-β” proteins and “α/β” proteins are 0.83 and 0.86, respectively.

        Correlation coefficients between amino acid propensities. Correlation coefficients between amino acid propensities for α-helices (A) and β-strands (B). Strong negative correlations (R < −0.7) are indicated by dark blue, and positive correlations (R > 0.7) are indicated by dark red. Comparatively strong negative correlations (R < −0.5) are indicated by light blue and positive correlations (R > 0.5) by pink.

        Relationship between the amino acid propensities. Amino acid propensities, P, for Glu and Lys for each SCOP fold for α-helices (A) and β-strands (B). The SCOP classes are: all-α proteins ( ○ ), α/β proteins (□), α + β proteins (Δ) and all-β proteins (+).

        Overall, there are more strong correlations (R < −0.7 or R > 0.7) for β-strands than for α-helices (Figure 3). For example, four strong positive correlations and five strong negative correlations are observed for β-strands, but there are only two strong correlations for α-helices (Ala with Gly, and Tyr with Trp). Most of the positive correlations for β-strands involve pairs of amino acids with similar physicochemical characteristics (shown along the diagonal in Figure 3B), such as Val and Ile, Tyr and Trp, Ser and Gln/Thr/Asn, Asn and Thr, and Glu and Lys/Arg. In contrast, most of the negative correlations for β-strands involve pairs of amino acids with different physicochemical characteristics, such as Val and Tyr/Trp/Gln/Ser, Ile and Trp/Gln/Ser/Glu/Arg, Leu and Ser/Thr/Asn, Met and Asn, and Ala and Lys.
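The color bins and the count of strong pairs can be derived mechanically from a correlation matrix. This sketch rests on two assumptions. First, values falling exactly on a cutoff stay in the weaker bin, because the captions use strict inequalities. Second, the R values below are invented and are not taken from Figure 3.

```python
from typing import Dict, List, Tuple


def correlation_category(r: float) -> str:
    """Map R onto the color bins described in the heat-map captions."""
    if r < -0.7:
        return "dark blue"
    if r < -0.5:
        return "light blue"
    if r > 0.7:
        return "dark red"
    if r > 0.5:
        return "pink"
    return "uncolored"


def strong_pairs(corr: Dict[Tuple[str, str], float], cutoff: float = 0.7
                 ) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Split amino-acid pairs into strong positive (R > cutoff) and
    strong negative (R < -cutoff) correlations."""
    positive: List[Tuple[str, str]] = []
    negative: List[Tuple[str, str]] = []
    for (a, b), r in corr.items():
        pair = (min(a, b), max(a, b))  # order-independent label
        if r > cutoff:
            positive.append(pair)
        elif r < -cutoff:
            negative.append(pair)
    return sorted(positive), sorted(negative)


# Hypothetical upper-triangle entries of a beta-strand correlation matrix.
example = {("Val", "Ile"): 0.82, ("Tyr", "Trp"): 0.76, ("Val", "Tyr"): -0.74,
           ("Ala", "Gly"): 0.45, ("Leu", "Ser"): -0.71}
pos, neg = strong_pairs(example)
print(len(pos), "strong positive,", len(neg), "strong negative")
# 2 strong positive, 2 strong negative
print(correlation_category(-0.87))  # dark blue
```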

        Interestingly, the aromatic amino acid, Phe, shows low correlations with Trp and Tyr, for both α-helices and β-strands, although strong positive correlations between Trp and Tyr are observed for both α-helices and β-strands.

        Correlations between SCOP fold and propensities for exposed or buried amino acids

        We also calculated correlation coefficients for the amino acid propensities of exposed and buried residues for α-helices (Figure 5), β-strands (Figure 6) and other conformations (data not shown). Amino acid propensities for α-helices show two strong correlations when all residues are pooled (Figure 3A). However, there is no strong correlation for either exposed (Figure 5A) or buried (Figure 5B) α-helix residues. The strong positive correlation between Trp and Tyr seen for all residues was absent for exposed residues, but a weak positive correlation was observed for buried residues. This suggests that a fold that favors Trp in the interior of α-helices also tends to favor Tyr there. Again, Phe showed no correlation with Trp or Tyr for either exposed or buried residues. The positive correlations among Ser, Asn and Thr, and the negative correlations between Ser/Thr and Glu, were observed only for exposed residues. Although some new correlations appeared, their values were relatively low for α-helices. For other conformations, no strong correlation was observed for either exposed or buried residues.

        Figure 5. Correlation coefficients between α-helix propensities for exposed residues (A) and buried residues (B). Strong negative correlations (R < −0.7) are indicated by dark blue, and strong positive correlations (R > 0.7) by dark red. Comparatively strong negative correlations (R < −0.5) are indicated by light blue, and comparatively strong positive correlations (R > 0.5) by pink.

        Figure 6. Correlation coefficients between β-strand propensities for exposed residues (A) and buried residues (B). Strong negative correlations (R < −0.7) are indicated by dark blue, and strong positive correlations (R > 0.7) by dark red. Comparatively strong negative correlations (R < −0.5) are indicated by light blue, and comparatively strong positive correlations (R > 0.5) by pink.

        Correlations for buried amino acids in β-strands

        In contrast, for β-strands, most of the correlations shown in Figure 3B also appear as strong correlations for exposed (Figure 6A) or buried (Figure 6B) residues. The strong negative correlations between Val/Ile and Tyr/Trp/Gln were observed for buried but not exposed residues. In other words, a fold type that prefers Val or Ile does not prefer Tyr, Trp or Gln, especially at buried sites.

        We visually inspected buried β-strand residues in the SCOP fold “concanavalin A–like lectins/glucanases” (concanavalin A). Besides buried Tyr and Trp residues, we found many polar amino acids such as Gln, Ser and Thr, and charged amino acids such as Glu, Lys and Arg. These residues form H-bonds with one another that counterbalance their polarity in the hydrophobic environment. For the buried residues, we calculated the correlation coefficients between the combined frequencies of hydrophobic amino acids (Val, Ile and Leu) and of some polar amino acids (Table 3 and Figure 7). Correlation coefficients calculated from frequencies are the same as those calculated from propensities, and frequencies make the amino acid occurrences easier to interpret. The combined frequency of buried Trp, Tyr and Gln has a strong negative correlation (R = −0.87) with that of the hydrophobic amino acids Val, Ile and Leu. Adding Ser to the Trp, Tyr and Gln group strengthens the correlation to R = −0.93 (Figure 7). The correlation coefficients between individual amino acids of Val/Ile/Leu and of Tyr/Trp/Gln/Ser range only from −0.19 to −0.75. The stronger correlation of the combined frequencies therefore indicates a synergy in β-strands that does not exist for α-helices or other conformations (Table 3). This synergy suggests that amino acids within the same group can substitute for one another. For example, a fold type that prefers Leu at buried sites will also prefer Ile. Thus, at buried sites, fold types with many aliphatic residues (Val, Ile and Leu) contain few Tyr, Trp, Gln and Ser residues. Figure 7 also shows that “all-β” proteins tend to have a higher content of Tyr, Trp, Gln and Ser at buried sites, whereas “α/β” proteins have a higher content of aliphatic amino acids.
        The top six folds by content of buried Tyr, Trp, Gln and Ser in β-strands are all “all-β” proteins with two large β-sheets packed together: lipocalins, concanavalin A, 6-bladed beta-propeller (6-bb-propeller), galactose-binding domain-like (Gbd), double-stranded β-helix (DS β-helix) and immunoglobulin-like beta-sandwich (Ig) folds. Other “all-β” proteins, which consist of only one small β-sheet or a small β-barrel, have a small hydrophobic core. H-bonds between buried side chains may therefore be necessary in particular for the correct alignment of two large β-sheets.
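The synergy argument holds that a group total can correlate more strongly across folds than its individual members do. It can be explored with a sketch like the following. The per-fold frequencies are hypothetical, and one-letter codes are used for brevity (V, I, L for Val, Ile, Leu; W, Y, Q, S for Trp, Tyr, Gln, Ser).

```python
import math
from typing import Dict, Iterable, List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def combined_frequency(freqs: Dict[str, float], group: Iterable[str]) -> float:
    """Sum of the frequencies of the amino acids in a group (e.g. f_VIL)."""
    return sum(freqs.get(aa, 0.0) for aa in group)


# Hypothetical buried beta-strand frequencies for four folds.
folds: List[Dict[str, float]] = [
    {"V": 0.20, "I": 0.15, "L": 0.14, "W": 0.02, "Y": 0.03, "Q": 0.01, "S": 0.03},
    {"V": 0.17, "I": 0.13, "L": 0.12, "W": 0.03, "Y": 0.05, "Q": 0.02, "S": 0.04},
    {"V": 0.12, "I": 0.10, "L": 0.13, "W": 0.05, "Y": 0.07, "Q": 0.03, "S": 0.06},
    {"V": 0.10, "I": 0.08, "L": 0.10, "W": 0.06, "Y": 0.09, "Q": 0.04, "S": 0.08},
]
f_vil = [combined_frequency(f, "VIL") for f in folds]
f_wyqs = [combined_frequency(f, "WYQS") for f in folds]
print(f"R(f_VIL, f_WYQS) = {pearson_r(f_vil, f_wyqs):.2f}")
```

To test whether grouping strengthens the relationship, compare this value with the correlation of single amino acids, for example `[f["L"] for f in folds]` against `[f["Q"] for f in folds]`.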

        Figure 7. Relationship between the frequencies of buried residues in β-strands: buried Val, Ile and Leu ($f_{\mathrm{VIL}}$) versus buried Trp, Tyr, Gln and Ser ($f_{\mathrm{WYQS}}$). The SCOP classes are: α/β proteins (□), α + β proteins (Δ) and all-β proteins (+).

        Correlations for exposed amino acids in β-strands

        For exposed residues (Figure 6A), negative correlations were observed between Ile/Leu and Ser/Thr/Asn. The correlations of Ile with Thr and Asn, however, were not observed when exposed and buried residues were analyzed together (Figure 3B). Negative correlations were also observed between Glu and Ser/Asn and between Arg and Thr. We examined the correlations of the combined frequencies of these exposed amino acids in β-strands (Table 4). The results show strong correlations among the frequencies of certain hydrophobic (Ile, Leu), charged (Glu, Lys, Arg) and polar (Ser, Thr, Asn) amino acids in the exposed regions of β-strands. Interestingly, the frequencies of the hydrophobic (Ile, Leu) and charged (Glu, Lys, Arg) amino acids correlate negatively with those of the polar amino acids (Ser, Thr, Asn). What Ile, Leu, Glu, Lys and Arg have in common is a relatively long side chain containing several hydrophobic aliphatic carbons, whereas Ser, Thr and Asn have short side chains.
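Per-fold group fractions such as $f_{\mathrm{STN}}$ and $f_{\mathrm{ILEKR}}$ are simple counts over the exposed strand positions of a fold. The sketch below assumes each fold's exposed β-strand residues are available as a string of one-letter codes. That input format and the sequence are hypothetical.

```python
from collections import Counter


def group_fraction(residues: str, group: str) -> float:
    """Fraction of residues (one-letter codes) that belong to a group."""
    if not residues:
        raise ValueError("no residues given")
    counts = Counter(residues)
    # set() guards against counting a letter twice if the group repeats it
    return sum(counts[aa] for aa in set(group)) / len(residues)


# Hypothetical exposed beta-strand residues pooled over one fold.
exposed = "SSTNVKETSNIT"
print(f"f_STN = {group_fraction(exposed, 'STN'):.2f}")      # f_STN = 0.67
print(f"f_ILEKR = {group_fraction(exposed, 'ILEKR'):.2f}")  # f_ILEKR = 0.25
```

Computing these two fractions for every fold gives one point per fold in a plot of $f_{\mathrm{ILEKR}}$ against $f_{\mathrm{STN}}$.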

        Figure 8 shows a strong negative correlation between the combined frequencies of Ser, Thr and Asn and those of Ile, Leu, Glu, Lys and Arg (R = −0.90). In the exposed regions of β-strands, all “α/β” and all “α+β” proteins clearly prefer Ile, Leu, Glu, Lys and Arg and disfavor Ser, Thr and Asn. The fold types that prefer Ser, Thr or Asn have a relatively low content of Ile, Leu, Glu, Lys and Arg, and they are “all-β” proteins. Figure 8 also shows that the “all-β” folds are widely distributed. For two “all-β” SCOP folds, DS β-helix and OB-fold, Ile, Leu, Glu, Lys and Arg are preferred in the exposed regions of the β-strands. These fold types have twisted and bent β-strands. Some Cα atoms in these strands lie at the bottom of the narrow, deep valley formed by the twisted and bent strands (Figure 9D and E). At such positions, the short, polar side chains of Ser, Thr and Asn cannot reach the solvent, so amino acids with long side chains are favored. Much the same is true for “α/β” proteins (Figure 9F and G). Their β-sheet is twisted and covered by α-helices, leaving only narrow spaces through which residues at the ends of the β-strands can reach the solvent. In contrast, two SCOP folds, concanavalin A and single-stranded right-handed β-helix (SS β-helix), have a remarkably high content of Ser, Thr and Asn in the exposed regions of β-strands. Their β-sheets are largely exposed and flat (Figure 9A, B and C). Figure 9C shows that Ser, Asn and Thr dominate the flat β-sheet without making significant contact with one another. These results suggest that the amino acid composition of the exposed regions of β-strands influences whether β-sheets twist.

        Figure 8. Relationship between the frequencies of exposed residues in β-strands: exposed Ile, Leu, Glu, Lys and Arg ($f_{\mathrm{ILEKR}}$) versus exposed Ser, Thr and Asn ($f_{\mathrm{STN}}$). The SCOP classes are: α/β proteins (□), α + β proteins (Δ) and all-β proteins (+).

        Figure 9. Amino acid residues in β-strands of three folds: concanavalin A (A, B and C; PDB ID: 1IOA), DS β-helix (D and E; PDB ID: 1ODM) and TIM barrel (F and G; PDB ID: 1SFS). Residues in α-helices are colored magenta, and residues in β-strands yellow. In C, the side chains of β-strand residues are colored by atom type (nitrogen: blue, oxygen: red, carbon: grey).

        Wang et al. [33] showed that isolated β-strands in molecular dynamics simulations are not twisted, suggesting that the twist must be stabilized by inter-strand interactions. Another computer simulation study found that inter-strand interactions between side chains induce a twist and that β-branched side chains are important for twist formation [34]. On the other hand, Koh et al. [35] and Bosco et al. [36] used statistical analyses to show that β-sheet structure is mainly determined by the backbone and that the contribution of side chains is small. This implies that twisting is an inherent property of a polypeptide chain, so a β-strand should twist regardless of its amino acid sequence. However, some folds, such as the SCOP groups concanavalin A and SS β-helix, have a large, flat β-sheet. Previous studies have focused only on twisted β-strands, not on flat β-sheets. Our results suggest that the amino acid composition of the exposed regions of β-strands may be related to the twist and bend of the strands. Side-chain interactions would then also be an important factor in β-strand twisting. An intuitive explanation is that the long side chains of Leu, Ile, Lys, Arg and Glu in the exposed regions come close together and pack through hydrophobic interactions, which twists and/or bends the β-strands. In contrast, the side chains of Ser, Thr and Asn are short and have low hydrophobicity. The hydrophobic interactions between them are therefore weak, and the β-sheet stays flat. Strain within a β-sheet thus seems to be one of the major factors governing the fold-specific amino acid propensities of β-strands.

        The types of β-sheets and the amino acid propensity

        The folds can be classified into three types by their β-sheets: parallel, antiparallel and mixed. In the “all-β” and “α + β” classes, every fold used in this study has completely antiparallel β-sheets, except SS β-helix, which has completely parallel β-sheets. The folds of the “α/β” class have completely or mainly parallel β-sheets. The β-sheets of three folds, “Flavodoxin-like”, “NAD(P)-binding Rossmann-fold domains” and “TIM beta/alpha-barrel”, are completely parallel, whereas “Periplasmic binding protein-like II” and “Thioredoxin fold” have mixed β-sheets.
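The three-way classification rests on one rule. A sheet is parallel or antiparallel only if every neighboring strand pair has that orientation, and it is mixed otherwise. The sketch below assumes each sheet is described by a list of its adjacent strand-pair orientations. That input format and the example sheets are hypothetical.

```python
from typing import Iterable


def sheet_type(pair_orientations: Iterable[str]) -> str:
    """Classify a beta-sheet from the orientations of its adjacent strand pairs.

    Each item must be "parallel" or "antiparallel" (hypothetical input format).
    """
    kinds = set(pair_orientations)
    if not kinds:
        raise ValueError("a sheet needs at least one strand pair")
    unknown = kinds - {"parallel", "antiparallel"}
    if unknown:
        raise ValueError(f"unexpected orientation(s): {sorted(unknown)}")
    if kinds == {"parallel"}:
        return "parallel"
    if kinds == {"antiparallel"}:
        return "antiparallel"
    return "mixed"


# Hypothetical sheets described by their strand-pair orientations.
print(sheet_type(["parallel"] * 7))                  # parallel
print(sheet_type(["antiparallel", "antiparallel"]))  # antiparallel
print(sheet_type(["parallel", "antiparallel"]))      # mixed
```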

        For the exposed residues of β-strands (Figure 8), the folds of the “all-β” class are widely distributed, even though all of them except SS β-helix have completely antiparallel β-sheets. Furthermore, the amino acid compositions of the “α/β” folds differ from that of SS β-helix, although both have parallel β-sheets. Figure 7 likewise shows that the “all-β” folds are widely distributed, with SS β-helix in the center of the graph. The buried Val/Ile/Leu fractions ($f^{\beta\mathrm{bur}}_{\mathrm{VIL}}$) of the three “α/β” folds with completely parallel β-sheets are also widely distributed (51.4, 47.2 and 42.7%).

        These results indicate that the correlations found in Figures 7 and 8 cannot be explained by β-sheet type. We therefore infer that the propensities do not depend on the type of β-sheet.

        Robustness of the dataset

        We checked the robustness of our results using folds with more than 1,500 but fewer than 2,000 residues, which were not included in the main dataset (six folds for α-helices and eight folds for β-strands). For β-strands, strong correlations were again observed for buried residues ($R_{\mathrm{WYQS,VIL}} = -0.81$) and for exposed residues ($R_{\mathrm{ILEKR,STN}} = -0.78$). For α-helices, there were no strong correlations for either buried residues ($R_{\mathrm{WYQS,VIL}} = -0.64$) or exposed residues ($R_{\mathrm{ILEKR,STN}} = -0.48$). These results agree with those obtained for folds with more than 2,000 residues, so the results presented here do not appear to depend on dataset selection.
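The size-based split behind this robustness check can be sketched as follows. The thresholds come from the text. The fold names and counts are hypothetical, and placing folds with exactly 2,000 residues in neither set is an assumption based on the strict wording.

```python
from typing import Dict, Tuple


def split_by_size(residue_counts: Dict[str, int], low: int = 1500,
                  high: int = 2000) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Split folds into a main set (more than `high` residues) and a check
    set (more than `low` but fewer than `high`). A fold with exactly
    `high` residues falls in neither set (assumed boundary handling)."""
    main = {fold: n for fold, n in residue_counts.items() if n > high}
    check = {fold: n for fold, n in residue_counts.items() if low < n < high}
    return main, check


# Hypothetical beta-strand residue counts per fold.
counts = {"fold_a": 4200, "fold_b": 1830, "fold_c": 950, "fold_d": 2000}
main, check = split_by_size(counts)
print(sorted(main), sorted(check))  # ['fold_a'] ['fold_b']
```

The same correlation analysis is then run on each set separately, and the two sets of coefficients are compared.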

        Part 7: Standalone questions and answers

        Question 1: Which of the following amino acids is naturally found in the R-configuration?

        Question 2: Why are hydrophobic residues often found on the interior of a protein?

        A) They lower the entropy of the system

        B) They are often less bulky than hydrophilic residues

        C) Their van der Waals interactions are stronger than hydrogen bonds

        D) The solvation layer is less ordered near hydrophilic residues

        Question 3: Which of these is a form of primary structure interaction?

        C) Interactions between N-H and C=O of the protein backbone

        D) Hydrophobic interactions

        Question 4: Researchers hope to design a polypeptide inhibitor for a site with the repeated sequence A-Q-E-K-K. Which of these inhibitor sequences is most likely to succeed?

        Question 5: Which of the following best describes the result of peptide bond hydrolysis?

        A) The amino group of one amino acid attacks the carboxyl group of another amino acid

        B) A water molecule is released into solution

        C) A carboxylate group and an amine group are produced from an amide group

        D) The protein has been denatured

        Answers to standalone questions

        Answer choice B is correct. R- and S-configuration refer to the chirality at the alpha carbon of the amino acid. Of the 19 chiral standard amino acids (all except glycine, which is achiral), every one has the S-configuration except cysteine. Cysteine is the exception because, under the Cahn–Ingold–Prelog priority rules, its –CH2SH side chain outranks the carboxyl group: sulfur has a higher atomic number than oxygen. This reverses the priority order and makes naturally occurring L-cysteine R (choice B is correct). The other amino acids are found in the S-configuration (choices A, C, and D are incorrect).

        Answer choice D is correct. From a thermodynamic standpoint, water in the solvation layer is more ordered next to hydrophobic residues than next to hydrophilic ones, because it has fewer ways to form hydrogen bonds. Burying hydrophobic residues releases this ordered water and increases the entropy of the system (choice D is correct). Hydrogen bonds are stronger than van der Waals interactions, and nonpolar residues cannot participate in hydrogen bonding (choice C is incorrect). The localization of hydrophobic residues has less to do with bulkiness than with entropy. In fact, some hydrophilic side chains, such as those of arginine and lysine, are quite bulky (choice B is incorrect). Hydrophobic residues lower the entropy of the system when they are on the exterior of the protein, not the interior (choice A is incorrect).
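The entropic argument in this answer can be summarized with the Gibbs free-energy relation. This is a qualitative sketch. The symbols are introduced for illustration only: $\Delta S_{\mathrm{burial}}$ stands for the entropy gained by water released from the ordered solvation layer when a hydrophobic side chain moves into the protein interior.

```latex
\Delta G_{\mathrm{burial}} = \Delta H_{\mathrm{burial}} - T\,\Delta S_{\mathrm{burial}},
\qquad
\Delta S_{\mathrm{burial}} > 0 \;\Longrightarrow\; -T\,\Delta S_{\mathrm{burial}} < 0
```

When $\Delta H_{\mathrm{burial}}$ is small, as is commonly assumed for the hydrophobic effect near room temperature, the negative $-T\,\Delta S_{\mathrm{burial}}$ term dominates and burial lowers $\Delta G$.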

        Answer choice B is correct. Peptide bonds link consecutive amino acids and therefore define the primary structure of a protein (choice B is correct). Interactions between the N-H and C=O groups of the protein backbone form alpha helices and beta sheets, which are forms of secondary structure (choice C is incorrect). Tertiary structure gives globular proteins their shape and arises from interactions between amino acid side chains, including hydrophobic side chains (choice D is incorrect). Disulfide bonds between cysteine residues contribute to both tertiary and quaternary structure (choice A is incorrect).

        Answer choice D is correct. To maximize attraction between the inhibitor and the site, each inhibitor residue must be complementary to the site residue it faces. The target site, with a repeated sequence of alanine-glutamine-glutamate-lysine-lysine, has nonpolar-polar-acidic-basic-basic residues. The desired inhibitor should therefore have nonpolar-polar-basic-acidic-acidic residues. This maximizes hydrophobic interactions between nonpolar residues, hydrogen bonding between polar residues, and ionic attractions between acidic and basic residues (choice D is correct). Placing alanine in the 4th and 5th positions would result in weak interactions with lysine (choice A is incorrect). Placing an acidic residue at position 1 would result in weak interactions with nonpolar alanine (choice B is incorrect). Tryptophan is not a charged residue, and so would not pair well with glutamate (choice C is incorrect).
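The pairing rule used in this answer can be written as a small lookup. This is a sketch with three caveats. The residue classes follow one common textbook grouping, and texts differ on a few residues. The scoring function is hypothetical. The candidate sequences are invented, because the original answer choices are not reproduced here. Position-by-position alignment of inhibitor and site is also assumed.

```python
from typing import Dict, List

# One common textbook grouping of the 20 standard amino acids (assumption;
# some texts place Gly, Cys, Tyr, Trp or His in a different class).
RESIDUE_CLASS: Dict[str, str] = {
    **{aa: "nonpolar" for aa in "GAVLIMFWP"},
    **{aa: "polar" for aa in "STCYNQ"},
    **{aa: "acidic" for aa in "DE"},
    **{aa: "basic" for aa in "KRH"},
}

# Class an inhibitor residue should have to pair well with a site residue:
# like with like for nonpolar/polar, opposite charge for acidic/basic.
COMPLEMENT: Dict[str, str] = {
    "nonpolar": "nonpolar",
    "polar": "polar",
    "acidic": "basic",
    "basic": "acidic",
}


def complementary_pattern(site: str) -> List[str]:
    """Desired residue class at each position of a hyphen-separated site."""
    return [COMPLEMENT[RESIDUE_CLASS[aa]] for aa in site.split("-")]


def complementarity_score(candidate: str, site: str) -> int:
    """Number of positions where the candidate has the desired class."""
    wanted = complementary_pattern(site)
    residues = candidate.split("-")
    if len(residues) != len(wanted):
        raise ValueError("candidate and site must have the same length")
    return sum(RESIDUE_CLASS[aa] == cls for aa, cls in zip(residues, wanted))


print(complementary_pattern("A-Q-E-K-K"))
# ['nonpolar', 'polar', 'basic', 'acidic', 'acidic']
print(complementarity_score("V-S-K-D-E", "A-Q-E-K-K"))  # 5
```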

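        Answer choice C is correct. Hydrolysis consumes a water molecule to cleave the amide (peptide) bond, producing a carboxylate group and an amine group (choice C is correct). Choice A describes peptide bond formation, which proceeds through nucleophilic attack of the amino group on the carboxyl group (choice A is incorrect). Peptide bond formation, not hydrolysis, is the dehydration reaction that releases one water molecule (choice B is incorrect). Hydrolysis breaks the primary structure. Denaturation, by contrast, disrupts secondary, tertiary, and quaternary structure while leaving the primary structure intact (choice D is incorrect).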
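Written as a reaction, hydrolysis is the reverse of the condensation that forms the peptide bond. The generic R groups below are placeholders for the two flanking parts of the chain.

```latex
\mathrm{R{-}C({=}O){-}NH{-}R'} + \mathrm{H_2O}
\;\longrightarrow\;
\mathrm{R{-}COOH} + \mathrm{H_2N{-}R'}
```

At physiological pH the acid is largely present as a carboxylate, R–COO⁻, which matches the wording of choice C. The amine is largely protonated.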
