Ouyang Projects

Predicting protein folding rates from amino acid sequences

Abstract

Protein folding speeds are known to vary over more than 8 orders of magnitude. Plaxco, Simons, and Baker first showed that folding speed correlates with the topology of the native protein. That and subsequent studies showed that if the native structure of a protein is known, its folding speed can be predicted reasonably well through a logarithmic correlation with the "localness" of the contacts in the protein. In the present work, we develop a related measure, the geometric contact number, N_alpha, which is the number of nonlocal contacts that are well-packed by a Voronoi criterion. First, we find that across 80 proteins, the largest such set studied to date, N_alpha is an excellent predictor of the folding speeds of both two-state fast-folders and more complex multi-state folders. This supports the view that folding occurs by a mechanism of zipping and assembly, in which shorter loops are entropically faster to form than longer ones. Second, we show that folding rates can also be predicted directly from amino acid sequences, without the need to know the native topology.
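As a rough illustration of the kind of correlation the abstract describes, the sketch below computes a Pearson coefficient and a least-squares line for ln(K_f) against N_alpha, using ten rows copied from Table 5. The choice of rows, the variable names, and the plain linear fit are assumptions made for illustration; this is not the authors' fitting procedure, and the result for a ten-row subset says nothing precise about the full 80-protein set.

```python
import math

# (N_alpha, ln K_f) pairs copied from Table 5.
# The selection of rows is arbitrary and only for illustration.
SAMPLE = [
    (2, 12.40),   # 1L2Y  Tryptophan cage
    (7, 11.51),   # 1VII  Villin head piece
    (10, 9.69),   # 2PDD  PSBD
    (19, 9.37),   # 1PIN  WW domain pin
    (43, 4.05),   # 1FMK  Src SH3
    (45, 2.10),   # 1SHG  Alpha-spectrin SH3
    (51, -1.06),  # 1PKS  PI3 kinase SH3
    (61, 4.21),   # 2AIT  Tendamistat
    (75, 1.06),   # 1TEN  TNfn3
    (80, 0.41),   # 1WIT  Twitchin
]


def pearson_and_fit(points):
    """Return (r, slope, intercept) for a least-squares line y = slope*x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return r, slope, intercept


r, slope, intercept = pearson_and_fit(SAMPLE)
print(f"r = {r:.2f}, ln(K_f) ~ {slope:.3f} * N_alpha + {intercept:.2f}")
```

A negative slope here means that proteins with more well-packed nonlocal contacts tend to fold more slowly, which is the direction of the trend stated in the abstract.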

Downloads

  • 80 protein dataset
    • Table 5: The set of 45 two-state proteins

      PDB      Protein                                   Structure class  Length  N_alpha  ln(K_f)  ref
      1L2Y     Tryptophan cage                           alpha            20      2        12.40    12
      1VII     Villin head piece                         alpha            36      7        11.51    14
      2PDD     PSBD (1ebd)                               alpha            43      10       9.69     15
      1PRB     Albumin binding domain                    alpha            53      10       12.90    16
      1BA5     Human TRF1 Myb domain                     alpha            53      4        5.91     17
      1IDY     c-Myb-transforming protein                alpha            54      2        8.73     17
      1FEX     Human RAP1 Myb domain                     alpha            59      11       8.19     17
      1BDD     B domain of protein A                     alpha            60      8        11.69    18
      2A3D     Alpha 3D                                  alpha            73      7        12.7     19
      1IMQ     Im9                                       alpha            86      39       7.28     20
      1LMB     Lambda-repressor                          alpha            87      29       8.50     21
      1ENH     Engrailed homeodomain                     alpha            54      4        10.53    56
      1PGB_b   Beta-hairpin of protein G                 beta             16      9        12.0     24
      1PIN     WW domain pin                             beta             32      19       9.37     25
      1E0M     WW prototype                              beta             37      22       8.85     26
      1E0L     WW domain FBP28                           beta             37      19       10.37    26
      1K9Q     WW domain YAP65                           beta             40      22       8.37     27
      1FMK     Src SH3                                   beta             57      43       4.05     28
      1SHG     Alpha-spectrin SH3                        beta             57      45       2.10     29
      1NYF     Fyn SH3                                   beta             58      43       4.54     30
      1PKS     PI3 kinase SH3                            beta             76      51       -1.06    31
      1C8C     Chromosomal protein Sso7d                 beta             64      39       6.95     32
      1PSE     Photosystem I accessory protein E (PSAE)  beta             69      43       1.17     33
      1C9O     Bc-Csp                                    beta             66      47       7.20     34
      1G6P     Tm-Csp                                    beta             66      50       6.30     34
      1CSP     Bs-CspB                                   beta             67      48       6.54     34
      1MJC     CspA                                      beta             69      47       5.23     35
      2AIT     Tendamistat                               beta             74      61       4.21     36
      1K8M     hbLBD of BCKD                             beta             87      59       -0.71    37
      1TEN     TNfn3                                     beta             89      75       1.06     38
      1FNF_9   FNfn9                                     beta             90      76       -0.92    39
      1WIT     Twitchin                                  beta             93      80       0.41     40
      1QTU     Oncoprotein P13MTCP1                      beta             115     70       -0.36    42
      1DIV_n   N-terminal of L9                          alpha/beta       56      25       6.61     43
      2PTL     Protein L (B1 domain)                     alpha/beta       62      36       4.10     44
      2CI2     Chymotrypsin inhibitor CI2                alpha/beta       65      35       3.87     45
      1RFA     Ras-binding domain of C-RAF-1             alpha/beta       78      49       7.0      46
      2HQI     MerP                                      alpha/beta       72      51       0.18     47
      1HDN     Phosphotransferase HPr                    alpha/beta       85      51       2.69     49
      1URN     U1A                                       alpha/beta       96      49       5.76     50
      2ACY     Common-type AcP                           alpha/beta       98      61       0.84     51
      1APS     Muscle AcP                                alpha/beta       98      64       -1.47    52
      1DIV_c   C-terminal of L9                          alpha/beta       93      52       0.0      53
      1N88     Ribosomal protein L23                     alpha/beta       96      49       3.0      54
      1FKB     FKBP12                                    alpha/beta       107     80       1.45     55

    • Table 6: The set of 35 multi-state proteins

      PDB      Protein                                   Structure class  Length  N_alpha  ln(K_f)  ref
      2ABD     Acyl-CoA binding protein ACBP             alpha            86      33       6.48     89
      2CRO     434 Cro protein                           alpha            65      26       5.35     57
      1UZC     FF domain from HYPA/FBP11                 alpha            69      17       8.68     58
      1CEI     Colicin E7 immunity protein               alpha            85      22       5.8      20
      1BRS     Barstar                                   alpha            90      43       3.37     59
      2A5E     P16 protein                               alpha            156     85       3.50     62
      1TIT     Twitchin Ig repeat 27                     beta             89      72       3.6      40
      1FNF_10  human tenascin (FNfn10)                   beta             93      75       5.48     40
      1HNG     CD2, 1st domain                           beta             96      73       1.8      64
      1ADW     Apo-pseudoazurin                          beta             123     84       0.64     65
      1EAL     ileal lipid binding protein (ILBP)        beta             127     80       1.3      66
      1IFC     IFABP from rat                            beta             131     94       3.4      67
      1OPA     CRBP II                                   beta             133     97       1.4      67
      1HCD     Hisactophilin                             beta             118     78       1.1      68
      1BEB     Bovine beta-lactoglobulin                 beta             156     103      -2.20    69
      1B9C     Green fluorescent protein                 beta             224     171      -2.76    70
      1I1B     Interleukin-1b                            beta             151     105      -4.01    91
      1PGB_ab  Protein G                                 alpha/beta       56      33       6.40     71
      1UBQ     Ubiquitin                                 alpha/beta       76      49       5.90     72
      1GXT     N-terminal domain of HypF                 alpha/beta       89      53       4.39     73
      1SCE     Cell cycle regulatory protein             alpha/beta       97      22       4.17     74
      1HMK     Goat alpha-lactalbumin                    alpha/beta       121     57       2.79     80
      3CHY     CheY                                      alpha/beta       128     62       1.0      76
      1HEL     Lysozyme (hen egg white)                  alpha/beta       129     69       1.25     77
      1DK7     GroEL apical domain                       alpha/beta       146     97       0.83     81
      1JOO     Staphylococcal nuclease                   alpha/beta       149     75       0.30     78
      2RN2     Ribonuclease HI                           alpha/beta       155     85       1.41     90
      1RA9     Dihydrofolate reductase DHFR              alpha/beta       159     96       -2.46    82
      1PHP_n   PGK.n                                     alpha/beta       175     97       2.30     84
      1PHP_c   PGK.c                                     alpha/beta       219     121      -3.44    85
      2BLM     Exo small beta-lactamase                  alpha/beta       260     136      -1.24    86
      1QOP_a   Tryptophan synthase alpha-subunit         alpha/beta       268     132      -2.5     87
      1QOP_b   Tryptophan synthase beta2-subunit         alpha/beta       392     218      -6.9     88
      1BTA     Barnase (G specific endonuclease)         alpha/beta       89      48       1.11     92
      1L63     Phage T4 lysozyme                         alpha/beta       162     64       4.10     93
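The ln(K_f) column can be related back to the "more than 8 orders of magnitude" statement in the abstract with a short calculation. The sketch below takes the largest and smallest ln(K_f) values listed in Tables 5 and 6 and converts their difference to powers of ten; it also converts one ln(K_f) value to a folding time. The function names are hypothetical, and the time conversion assumes that K_f is expressed in s^-1, which the tables do not state.

```python
import math

# Extreme values of ln(K_f) as listed in the two tables.
LN_KF_FASTEST = 12.90   # 1PRB, Albumin binding domain (Table 5)
LN_KF_SLOWEST = -6.9    # 1QOP_b, Tryptophan synthase beta2-subunit (Table 6)
LN_KF_TRP_CAGE = 12.40  # 1L2Y, Tryptophan cage (Table 5)


def orders_of_magnitude(ln_fast, ln_slow):
    """Convert a difference of natural logs into a number of decades (powers of ten)."""
    return (ln_fast - ln_slow) / math.log(10)


def folding_time_seconds(ln_kf):
    """Characteristic folding time 1/K_f, ASSUMING K_f is given in s^-1."""
    return 1.0 / math.exp(ln_kf)


span = orders_of_magnitude(LN_KF_FASTEST, LN_KF_SLOWEST)
trp_cage_time_us = folding_time_seconds(LN_KF_TRP_CAGE) * 1e6

print(f"rate span: {span:.1f} orders of magnitude")
print(f"Trp cage folding time: {trp_cage_time_us:.1f} microseconds (if K_f is in s^-1)")
```

The span comes out at about 8.6 decades, in line with the range quoted in the abstract, and the Trp cage value corresponds to a folding time of roughly 4 microseconds under the stated unit assumption.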

References

[12] Qiu, L. L., Pabit, S. A., Roitberg, A. E. & Hagen, S. J. (2002). Smaller and faster: the 20-residue Trp cage protein folds in 4 μs. J. Am. Chem. Soc. 124, 14548-14549.
[14] Kubelka, J., Eaton, W. A. & Hofrichter, J. (2003). Experimental tests of villin subdomain folding simulations. J. Mol. Biol. 329, 625-630.
[15] Spector, S. & Raleigh, D. P. (1999). Submillisecond folding of the peripheral subunit-binding domain. J. Mol. Biol. 293, 763-768.
[16] Wang, T., Zhu, Y. & Gai, F. (2004). Folding of a three-helix bundle at the folding speed limit. J. Phys. Chem. B 108, 3694-3697.
[17] Gianni, S. et al. (2003). Unifying features in protein-folding mechanisms. Proc. Natl. Acad. Sci. USA 100, 13286-13291.
[18] Myers, J. K. & Oas, T. G. (2001). Preorganized secondary structure as an important determinant of fast protein folding. Nature Struct. Biol. 8, 552-558.
[19] Zhu, Y. et al. (2003). Ultrafast folding of α3D: a de novo designed three-helix bundle protein. Proc. Natl. Acad. Sci. USA 100, 15486-15491.
[20] Ferguson, N., Capaldi, A. P., James, R., Kleanthous, C. & Radford, S. E. (1999). Rapid folding with and without populated intermediates in the homologous four-helix proteins Im7 and Im9. J. Mol. Biol. 286, 1597-1608.
[21] Burton, R. E., Huang, G. S., Daugherty, M. A., Fullbright, P. W. & Oas, T. G. (1996). Microsecond protein folding through a compact transition state. J. Mol. Biol. 263, 311-322.
[24] Munoz, V., Thompson, P. A., Hofrichter, J. & Eaton, W. A. (1997). Folding dynamics and mechanism of beta-hairpin formation. Nature 390, 196-199.
[25] Jager, M., Nguyen, H., Crane, J.C., Kelly, J.W. & Gruebele, M. (2001). The folding mechanism of a beta-sheet: the WW domain. J. Mol. Biol. 311, 373-393.
[26] Ferguson, N., Johnson, C.M., Macias, M., Oschkinat, H. & Fersht, A. R. (2001). Ultrafast folding of WW domains without structured aromatic clusters in the denatured state. Proc. Natl. Acad. Sci. USA 98, 13002-13007.
[27] Crane, J.C., Koepf, E.K., Kelly, J.W. & Gruebele, M. (2000). Mapping the transition state of the WW domain beta sheet. J. Mol. Biol. 298, 283-292.
[28] Grantcharova, V. P. & Baker, D. (1997). Folding dynamics of the src SH3 domain. Biochemistry 36, 15685-15692.
[29] Viguera, A., Martinez, J., Filimonov, V., Mateo, P. & Serrano, L. (1994). Thermodynamic and kinetic-analysis of the SH3 domain of spectrin shows a two-state folding transition. Biochemistry 33, 2142-2150.
[30] Plaxco, K. W., Guijarro, J. I., Morton, C. J., Pitkeathly, M., Campbell, I.D. & Dobson, C. M. (1998). The folding kinetics and thermodynamics of the Fyn-SH3 domain. Biochemistry 37, 2529-2537.
[31] Guijarro, J. I., Morton, C. J., Plaxco, K. W., Campbell, I. D. & Dobson, C. M. (1998). Folding kinetics of the SH3 domain of PI3 kinase by real-time NMR combined with optical spectroscopy. J. Mol. Biol. 276, 657-667.
[32] Guerois, R. & Serrano, L. (2000). The SH3-fold family: experimental evidence and prediction of variations in the folding pathways. J. Mol. Biol. 304, 967-982.
[33] Bowers, P. & Baker, D. unpublished results.
[34] Perl, D., Welker, C., Schindler, T., Schroder, K., Marahiel, M. A., Jaenicke, R. & Schmid, F. X. (1998). Conservation of rapid two-state folding in mesophilic, thermophilic and hyperthermophilic cold shock proteins. Nature Struct. Biol. 5, 229-235.
[35] Reid, K. L., Rodriguez, H. M., Hillier, B. J. & Gregoret, L. M. (1998). Stability and folding properties of a model beta-sheet protein, Escherichia coli CspA. Protein Sci. 7, 470-479.
[36] Schonbrunner, N., Koller, K.-P. & Kiefhaber, T. (1997). Folding of the disulfide-bonded beta-sheet protein tendamistat: rapid two-state folding without hydrophobic collapse. J. Mol. Biol. 268, 526-538.
[37] Naik, M. T., Chang, Y. C. & Huang, T. H. (2002). Folding kinetics of the lipoic acid-bearing domain of human mitochondrial branched chain alpha-ketoacid dehydrogenase complex. FEBS Lett. 530, 133-138.
[38] Viguera, A. R., Serrano, L. & Wilmanns, M. (1996). Different folding transition states may result in the same native structure. Nature Struct. Biol. 3, 874-880.
[39] Plaxco, K. W., Spitzfaden, C., Campbell, I. D. & Dobson, C. M. (1997). A comparison of the folding kinetics and thermodynamics of two homologous fibronectin type III modules. J. Mol. Biol. 270, 763-770.
[40] Clarke, J., Cota, E., Fowler, S. B. & Hamill, S. J. (1999). Folding studies of immunoglobulin-like beta-sandwich proteins suggest that they share a common folding pathway. Structure 7, 1145-1153.
[42] Roumestand, C., Boyer, M., Guignard, L., Barthe, P. & Royer, C. A. (2001). Characterization of the folding and unfolding reactions of a small beta-barrel protein of novel topology, the MTCP1 oncogene product P13. J. Mol. Biol. 312, 247-259.
[43] Kuhlman, B., Luisi, D. L., Evans, P. A. & Raleigh, D. P. (1998). Global analysis of the effects of temperature and denaturant on the folding and unfolding kinetics of the N-terminal domain of the protein L9. J. Mol. Biol. 284, 1661-1670.
[44] Kim, D. E., Fisher, C. & Baker, D. (2000). A breakdown of symmetry in the folding transition state of protein L. J. Mol. Biol. 298, 971-984.
[45] Jackson, S. E. & Fersht, A. R. (1991). Folding of chymotrypsin inhibitor-2: 1. Evidence for a two-state transition. Biochemistry 30, 10428-10435.
[46] Vallee-Belisle, A., Turcotte, J. F. & Michnick, S. W. (2004). raf RBD and ubiquitin proteins share similar folds, folding rates and mechanisms despite having unrelated amino acid sequences. Biochemistry 43, 8447-8458.
[47] van Nuland, N. A. J., Meijberg, W., Warner, J., Forge, V., Scheek, R. M., Robillard, G. T., & Dobson, C. M. (1998). Slow cooperative folding of a small globular protein Hpr. Biochemistry 37, 622-637.
[49] Aronsson, G., Brorsson, A.-C., Sahlman, L. & Jonsson, B.-H. (1997). Remarkably slow folding of a small protein. FEBS Lett. 411, 359-364.
[50] Silow, M. & Oliveberg, M. (1997). High-energy channeling in protein folding. Biochemistry 36, 7633-7637.
[51] Taddei, N., Chiti, F., Paoli, P., Fiaschi, T., Bucciantini, M., Stefani, M., Dobson, C.M. & Ramponi, G. (1999). Thermodynamics and kinetics of folding of common-type acylphosphatase: Comparison to the highly homologous muscle isoenzyme. Biochemistry 38, 2135-2142.
[52] Chiti, F., Taddei, N., White, P. M., Bucciantini, M., Magherini, F., Stefani, M. & Dobson, C. M. (1999). Mutational analysis of acylphosphatase suggests the importance of topology and contact order in protein folding. Nature Struct. Biol. 6, 1005-1009.
[53] Sato, S. & Raleigh, D. P. (2002). pH-dependent stability and folding kinetics of a protein with an unusual alpha-beta topology: The C-terminal domain of the ribosomal protein L9. J. Mol. Biol. 318, 571-582.
[54] Hedberg, L. & Oliveberg, M. (2004). Scattered Hammond plots reveal secondary level of site-specific information in protein folding: phi'(beta++). Proc. Natl. Acad. Sci. USA 101, 7606-7611.
[55] Main, E. R. G., Fulton, K. F. & Jackson, S. E. (1999). Folding pathway of FKBP12 and characterization of the transition state. J. Mol. Biol. 291, 429-444.
[56] Mayor, U., Johnson, C. M., Daggett, V. & Fersht, A. R. (2000). Protein folding and unfolding in microseconds to nanoseconds by experiment and simulation. Proc. Natl. Acad. Sci. USA 97, 13518-13522.
[57] Laurents, D. V., Corrales, S., Elias-Arnanz, M., Sevilla, P., Rico, M. & Padmanabhan, (2000). Folding kinetics of phage 434 Cro protein. Biochemistry 39, 13963-13973.
[58] Jemth, P., Gianni, S., Day, R., Li, B., Johnson, C. M., Daggett, V. & Fersht, A. R. (2004). Demonstration of a low-energy on-pathway intermediate in a fast-folding protein by kinetics, protein engineering, and simulation. Proc. Natl. Acad. Sci. USA 101, 6450-6455.
[59] Schreiber, G. & Fersht, A. R. (1993). The refolding of cis-peptidylprolyl and trans-peptidylprolyl isomers of barstar. Biochemistry 32, 11195-11203.
[62] Tang, K. S., Guralnick, B. J., Wang, W. K., Fersht, A. R. & Itzhaki, L. S. (1999). Stability and folding of the tumour suppressor protein p16. J. Mol. Biol. 285, 1869-1886.
[64] Parker, M. J., Dempsey, C. E., Lorch, M. & Clarke, A. R. (1997). Acquisition of native beta-strand topology during the rapid collapse phase of protein folding. Biochemistry 36, 13396-13405.
[65] Reader, J. S., Van Nuland, N. A. J., Thompson, G. S., Ferguson, S. J., Dobson, C.M. & Radford, S. E. (2001). A partially folded intermediate species of the beta-sheet protein apo-pseudoazurin is trapped during proline-limited folding. Protein Sci. 10, 1216-1224.
[66] Dalessio, P. M. & Ropson, I. J. (2000). beta-sheet proteins with nearly identical structures have different folding intermediates. Biochemistry 39, 860-871.
[67] Burns, L. L., Dalessio, P. M. & Ropson, I. J. (1998). Folding mechanism of three structurally similar beta-sheet proteins. Proteins 33, 107-118.
[68] Liu, C. S., Gaspar, J. A., Wong, H. J. & Meiering, E. M. (2002). Conserved and nonconserved features of the folding pathway of hisactophilin, a beta-trefoil protein. Protein Sci. 11, 669-679.
[69] Kuwajima, K., Yamaya, H. & Sugai, S. (1996). The burst-phase intermediate in the refolding of beta-lactoglobulin studied by stopped-flow circular dichroism and absorption spectroscopy. J. Mol. Biol. 264, 806-822.
[70] Enoki, S. & Kuwajima, K. unpublished results.
[71] McCallister, E. L., Alm, E. & Baker, D. (2000). Critical role of beta-hairpin formation in protein G folding. Nature Struct. Biol. 7, 669-673.
[72] Khorasanizadeh, S., Peters, I. D. & Roder, H. (1996). Evidence for a three-state model of protein folding from kinetic analysis of ubiquitin variants with altered core residues. Nature Struct. Biol. 3, 193-205.
[73] Calloni, G. et al. (2003). Comparison of the folding processes of distantly related proteins: Importance of hydrophobic content in folding. J. Mol. Biol. 330, 577-591.
[74] Schymkowitz, J. W. H., Rousseau, F., Irvine, L. R. & Itzhaki, L. S. (2000). The folding pathway of the cell-cycle regulatory protein p13suc1: clues for the mechanism of domain swapping. Struct. Fold. Des. 8, 89-100.
[76] Munoz, V., Lopez, E. M., Jager, M. & Serrano, L. (1994). Kinetic characterization of the chemotactic protein from Escherichia coli, CheY: Kinetic analysis of the inverse hydrophobic effect. Biochemistry 33, 5858-5866.
[77] Kiefhaber, T. (1995). Kinetic traps in lysozyme folding. Proc. Natl. Acad. Sci. USA 92, 9029-9033.
[78] Maki, K., Cheng, H., Dolgikh, D. A., Shastry, M. C. R. & Roder, H. (2004). Early events during folding of wild-type staphylococcal nuclease and a single-tryptophan variant studied by ultrarapid mixing. J. Mol. Biol. 338, 383-400.
[80] Saeki, K., Arai, M., Yoda, T., Nakao, M. & Kuwajima, K. (2004). Localized nature of the transition-state structure in goat alpha-lactalbumin folding. J. Mol. Biol. 341, 589-604.
[81] Golbik, R., Zahn, R., Harding, S. E. & Fersht, A. R. (1998). Thermodynamic stability and folding of GroEL minichaperones. J. Mol. Biol. 276, 505-515.
[82] Parker, M. J. & Marqusee, S. (1999). The cooperativity of burst phase reactions explored. J. Mol. Biol. 293, 1195-1210.
[84] Parker, M. J., Spencer, J. & Clarke, A. R. (1995). An integrated kinetic analysis of intermediates and transition states in protein folding reactions. J. Mol. Biol. 253, 771-786.
[85] Parker, M. J., Sessions, R. B., Badcoe, I. G. & Clarke, A. R. (1996). The development of tertiary interactions during the folding of a large protein. Folding & Design 1, 145-156.
[86] Santos, J. et al. (2004). Folding of an abridged beta-lactamase. Biochemistry 43, 1715-1723.
[87] Ogasahara, K. & Yutani, K. (1994). Unfolding-refolding kinetics of the tryptophan synthase alpha subunit by CD and fluorescence measurements. J. Mol. Biol. 236, 1227-1240.
[88] Ogasahara, K. & Yutani, K. (1990). An early immunoreactive folding intermediate of the tryptophan synthase beta 2 subunit is a "molten globule". FEBS Lett. 263, 51-56.
[89] Teilum, K., Maki, K., Kragelund, B. B., Poulsen, F. M. & Roder, H. (2002). Early kinetic intermediate in the folding of acyl-CoA binding protein detected by fluorescence labeling and ultrarapid mixing. Proc. Natl. Acad. Sci. USA 99, 9807-9812.
[90] Raschke, T. M. & Marqusee, S. (1997). The kinetic folding intermediate of ribonuclease H resembles the acid molten globule and partially unfolded molecules detected under native conditions. Nature Struct. Biol. 4, 298-304.
[91] Finke, J. M. & Jennings, P. A. (2002). Interleukin-1b folding between pH 5 and 7: experimental evidence for three-state folding behavior and robust transition state positions late in folding. Biochemistry 41, 15056-15067.
[92] Ikura, T. & Fersht, A. R. (2001). [Folding mechanism and folding rate]. Tanpakushitsu Kakusan Koso 46, 1553-1559.
[93] Parker, M. J. & Marqusee, S. (1999). The cooperativity of burst phase reactions explored. J. Mol. Biol. 293, 1195-1210.