Protein binding site and function prediction

How protein structure determines protein function is a fundamental problem in molecular biology. We study the geometry, shape, and physicochemical texture of binding surfaces to understand how they work and how the functional roles of proteins can be predicted. We have built a protein surface library of >2 million surfaces for most structures in the Protein Data Bank, which provides analytical measurements of the volume and area of surface pockets and interior voids of proteins. By assessing similarity in binding pocket residues, shape, and orientation, we have developed a method to discover functional relationships between similar binding surfaces. This method has uncovered novel examples of related protein functions on protein structures of different folds or different classes. In collaboration with Dr. Andrej Jochimiak, we have applied this method to infer the functions of protein structures solved by the structural genomics project.

We are also studying how binding surface mutations and residue conservation relate to disease-causing single nucleotide polymorphisms (SNPs). Our work showed that nonsynonymous SNPs have important structural characteristics, and that these characteristics can be associated with the SNPs that affect the phenotype of the individuals carrying them.

Protein-protein interactions

Protein-protein interactions are involved in most cellular processes, including signal transduction, immune response, metabolism, and cell cycle control. We are interested in identifying structural features of the binding surfaces involved in protein-protein interactions. With our collaborators, we recently developed the concept of complemented pockets, which are strong structural features indicative of "hot spots" of protein-protein interactions. Our work also points to possible new directions for protein docking by locating those important pockets that pre-exist in uncomplemented protein structures.

Protein and peptide design

In protein design, one seeks to identify amino acid sequences that are compatible with a desired structural template but incompatible with any other structural template. A key ingredient for success in protein design is an effective scoring function that discriminates good sequences from incompatible sequences and sequences that do not fold. We have developed a theoretical framework for designing optimal nonlinear scoring functions in the form of Gaussian mixtures that can identify >200 native nonhomologous proteins simultaneously from >3 million sequence-structure decoys.
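As a rough illustration of what a scoring function "in the form of Gaussian mixtures" can look like, the sketch below scores a feature vector as a weighted sum of Gaussian kernels centered on reference points. It is only a toy under stated assumptions, not the framework described above: the feature vectors, centers, weights, kernel width `sigma`, and the convention that a higher score means more native-like are all hypothetical choices made for this example.

```python
import math

def gaussian_kernel(u, v, sigma):
    """Gaussian kernel exp(-|u - v|^2 / (2 * sigma^2)) between two feature vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def score(features, centers, weights, sigma=1.0):
    """Nonlinear score: weighted sum of Gaussians centered on reference vectors.

    In this sketch (an assumption), positive weights sit on native-like
    reference vectors and negative weights on decoy-like ones, so a higher
    score means "more native-like".
    """
    return sum(w * gaussian_kernel(features, c, sigma)
               for c, w in zip(centers, weights))

# Hypothetical two-dimensional toy data, for illustration only.
toy_centers = [(1.0, 0.0), (0.0, 1.0)]   # one native-like, one decoy-like
toy_weights = [1.0, -1.0]

native_like_score = score((1.0, 0.0), toy_centers, toy_weights)
decoy_like_score = score((0.0, 1.0), toy_centers, toy_weights)
print(native_like_score > 0.0, decoy_like_score < 0.0)
```

In a real setting the centers and weights would be fitted so that every native sequence-structure pair outscores its decoys; that fitting step is deliberately left out here.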

A promising strategy in developing therapeutics against disease is to design protein-like peptide drugs that can interfere with undesired protein-protein interactions. An effective approach is to screen combinatorial libraries of peptides displayed on phages. However, once the length of the peptide goes beyond 7 residues, it is difficult to search exhaustively over all possible sequences. We have developed a weighted method that can introduce beneficial bias into the sequences of a designed phage library. We have also developed a universal codon schema that helps to enrich active peptides by 1000-fold when constructing a phage library.
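The difficulty of exhaustive search beyond 7 residues follows from simple counting: with 20 standard amino acids there are 20^n distinct peptides of length n. The short sketch below tabulates this growth and compares it with a library capacity. The capacity of 10^9 distinct clones is an assumed, illustrative figure rather than a value taken from this work, and the function names are hypothetical.

```python
AMINO_ACIDS = 20  # standard amino acid alphabet

# Assumed library capacity (distinct clones), used only for illustration.
ASSUMED_LIBRARY_SIZE = 10 ** 9

def sequence_space(length):
    """Number of distinct peptides of the given length."""
    return AMINO_ACIDS ** length

def coverage_upper_bound(length, library_size=ASSUMED_LIBRARY_SIZE):
    """Largest fraction of sequence space a library could hold,
    assuming every clone in the library is unique."""
    return min(1.0, library_size / sequence_space(length))

for n in range(5, 11):
    print(n, sequence_space(n), coverage_upper_bound(n))
```

Under this assumed capacity, a library can in principle hold every 6-mer, most 7-mers, and only a few percent of 8-mers, which is why biasing the library toward promising sequences becomes attractive for longer peptides.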

Membrane proteins

Membrane proteins are an important class of proteins where structural information is relatively sparse compared to soluble proteins. We have characterized various physicochemical properties of residues in transmembrane helices, including propensity for lipid interactions (joint work with Dr. DeGrado and colleagues), propensity for interstrand interactions, and propensity for higher order interactions. Our work also showed that almost all TM helices contain hydrogen bonds, and that spatial motifs we discovered, such as serine zippers and polar clamps, play important roles in TM helix assembly.

Biophysics of protein structures and protein structure prediction

Tight packing has been regarded as an important property of native protein structures. We have developed methods for the quantitative characterization of protein packing. We identified a specific scaling relationship between protein packing and protein chain length, and showed that this scaling relationship is due to generic properties of compact chain polymers rather than evolution. This conclusion was drawn using a state-of-the-art sequential Monte Carlo sampling method. Using this sampling method, we have developed simple lattice models to study the critical effects of chirality and side chain inflexibility in reducing the entropy of protein folding. We are developing simplified protein models that combine local sequence and contact propensities for protein structure prediction.
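To give a flavor of chain-growth sampling on a lattice, the sketch below grows self-avoiding chains on a two-dimensional square lattice one monomer at a time and reweights them so that averages are taken uniformly over chain conformations. It uses plain Rosenbluth-style growth, the simplest form of sequential importance sampling, and omits the resampling steps that a full sequential Monte Carlo method would add. The lattice, the observable (squared radius of gyration), and all function names are assumptions chosen for illustration, not the models used in this work.

```python
import random

MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))  # square-lattice steps

def grow_chain(n_monomers, rng):
    """Grow one self-avoiding chain; return (sites, weight).

    The weight is the product of the number of free neighbor sites at each
    growth step. A chain that gets trapped is returned with weight 0.
    """
    sites = [(0, 0)]
    occupied = {(0, 0)}
    weight = 1.0
    while len(sites) < n_monomers:
        x, y = sites[-1]
        free = [(x + dx, y + dy) for dx, dy in MOVES
                if (x + dx, y + dy) not in occupied]
        if not free:
            return sites, 0.0
        weight *= len(free)
        nxt = rng.choice(free)
        sites.append(nxt)
        occupied.add(nxt)
    return sites, weight

def radius_of_gyration_sq(sites):
    """Mean squared distance of the monomers from their centroid."""
    n = len(sites)
    cx = sum(x for x, _ in sites) / n
    cy = sum(y for _, y in sites) / n
    return sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in sites) / n

def weighted_mean_rg2(n_monomers, n_samples, seed=0):
    """Weighted estimate of the mean squared radius of gyration."""
    rng = random.Random(seed)
    numerator = denominator = 0.0
    for _ in range(n_samples):
        sites, w = grow_chain(n_monomers, rng)
        numerator += w * radius_of_gyration_sq(sites)
        denominator += w
    return numerator / denominator

for length in (5, 10, 20, 40):
    print(length, weighted_mean_rg2(length, 2000))
```

Repeating the estimate over a range of chain lengths, as in the final loop, is the kind of calculation from which a scaling relationship with chain length can be read off.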

Evolution of protein structure and folding

We are also interested in protein evolution. A controversy in protein folding is whether the folding core of a protein is conserved. Using a maximum-likelihood method, we have shown that the folding core of proteins, which is important for folding kinetics, is not particularly conserved. We have also reconstructed ancestral folding cores, which are predicted to lead to similar folding behavior even though these ancestral cores have never appeared in proteins from extant species. We are continuing our study of the effects of evolution on protein folding and protein function.