3D chromosome folding in the cell nucleus

The 3D folding of chromosomes and the organization of the cell nucleus play key roles in many fundamental cellular processes, such as regulation of gene expression, DNA replication, and cellular specialization. Malfunctions in chromosome folding are likely related to many diseases, such as cancer. Exciting advances in chromosome conformation capture techniques such as Hi-C, as well as in single-cell measurements, have provided a wealth of information on chromatin organization.

To better understand the nature of chromosome organization in cells, we are developing models and computational tools to study 3D ensembles of chromatin. Our study showed that simple principles of self-avoiding polymer chains in the spatial confinement of the cell nucleus can give rise to experimentally measured scaling rules, such as how looping probability and spatial distance vary with genomic separation. With a few simple biological landmarks, our model can reproduce genome-wide Hi-C measurements of budding yeast, including both inter- and intra-chromosome interactions, at 15 kb resolution. Our model also uncovered the spatial proximity relationship among fragile sites in yeast determined independently by genetic analysis. Our 3D ensemble model also predicted functional interactions and uncovered a novel site of 3-body interactions in the alpha-globin locus.
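The idea of self-avoiding chains under spatial confinement can be illustrated with a toy sketch. The code below is not our production model: it assumes a simple cubic lattice, a spherical confinement of hypothetical radius `radius`, and Rosenbluth-weighted chain growth as the sampling scheme. All function names are illustrative.

```python
import random

# The six unit steps on a simple cubic lattice.
STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def grow_confined_chain(n_beads, radius, rng):
    """Grow one self-avoiding lattice chain inside a sphere centered at the origin.

    Returns (chain, weight); chain is None if growth reached a dead end.
    """
    chain = [(0, 0, 0)]
    occupied = {chain[0]}
    weight = 1.0
    r2 = radius * radius
    while len(chain) < n_beads:
        x, y, z = chain[-1]
        free = []
        for dx, dy, dz in STEPS:
            site = (x + dx, y + dy, z + dz)
            inside = site[0] ** 2 + site[1] ** 2 + site[2] ** 2 <= r2
            if inside and site not in occupied:
                free.append(site)
        if not free:
            return None, 0.0  # trapped: no unoccupied neighbor inside the sphere
        weight *= len(free)  # Rosenbluth weight corrects the bias of growth
        nxt = rng.choice(free)
        chain.append(nxt)
        occupied.add(nxt)
    return chain, weight


def sample_ensemble(n_chains, n_beads, radius, seed=0):
    """Collect n_chains successfully grown chains with their weights."""
    rng = random.Random(seed)
    ensemble = []
    while len(ensemble) < n_chains:
        chain, w = grow_confined_chain(n_beads, radius, rng)
        if chain is not None:
            ensemble.append((chain, w))
    return ensemble


def mean_sq_distance(ensemble, s):
    """Weighted mean squared spatial distance between beads s steps apart."""
    num = den = 0.0
    for chain, w in ensemble:
        for i in range(len(chain) - s):
            a, b = chain[i], chain[i + s]
            num += w * sum((p - q) ** 2 for p, q in zip(a, b))
            den += w
    return num / den
```

Evaluating `mean_sq_distance` over a range of separations `s` for an ensemble from `sample_ensemble` gives a rough picture of how spatial distance grows with separation along the chain and then levels off once the chain feels the confinement.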

We are further developing theory, algorithms, and software tools to study chromosome folding. We can now reconstruct large ensembles of single-cell 3D chromatin conformations from population Hi-C measurements. Surprisingly, our results revealed that a small set of specific interactions (5–6% of measured Hi-C frequencies in Drosophila) is sufficient to drive chromatin folding, giving rise to many of the observed Hi-C topological features. Our models reveal details of how varying single-cell domain boundaries become fixed throughout development in Drosophila, with strongly preferred positioning at binding sites of insulator complexes.

We are also working on uncovering the functional landscape of higher-order many-body chromatin interactions (Genome Biology, in press). Although emerging evidence suggests that many-body spatial interactions play important roles in condensing super-enhancer regions into a cohesive transcriptional apparatus in cells, the nature of these interactions remains uncharacterized. Our computational model (CHROMATIX) can identify significant many-body interactions from structural ensembles reconstructed from Hi-C data. This overcomes the limitation imposed by the pairwise and population-averaged nature of Hi-C studies. For a diverse set of highly active transcriptional loci with at least 2 super-enhancers, we have constructed detailed many-body functional landscapes. Our results further revealed that epigenetic marks such as DNase accessibility, POLR2A binding, and decreased H3K27me3 are predictive of regions enriched in many-body interactions.

Stochastic model of phenotype switching and cellular fate

How do cells decide their fate? This is a fundamental question in biology that underlies key events such as cell differentiation and phenotype switching, and has great implications for understanding stem cell development, viral infection, and cancer development. A fundamental framework for modeling the relevant networks of molecular interactions at the mesoscopic scale is that of chemical reaction kinetics. Viewed in terms of reaction trajectories, this problem is often modeled using the Stochastic Simulation Algorithm (also called the Gillespie algorithm). Viewed in terms of probability density, the problem is governed by the chemical master equation, which is often approached using the Fokker-Planck equation after certain approximations.
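As a minimal illustration of the trajectory view, the sketch below applies the Stochastic Simulation Algorithm to a toy birth-death network (synthesis of a molecule X at constant rate, degradation proportional to copy number). The network and the parameter names `k_syn` and `k_deg` are assumptions chosen for illustration, not a model taken from our studies.

```python
import random


def gillespie_birth_death(k_syn, k_deg, x0, t_end, rng):
    """Simulate one trajectory of: 0 -> X (rate k_syn), X -> 0 (rate k_deg * x)."""
    t, x = 0.0, x0
    times, counts = [t], [x]
    while True:
        a_syn = k_syn        # propensity of synthesis
        a_deg = k_deg * x    # propensity of degradation
        a_total = a_syn + a_deg
        if a_total == 0.0:
            break            # no reaction can fire any more
        # The waiting time to the next reaction is exponential with rate a_total.
        t += rng.expovariate(a_total)
        if t > t_end:
            break
        # Choose which reaction fires, with probability proportional to propensity.
        if rng.random() * a_total < a_syn:
            x += 1
        else:
            x -= 1
        times.append(t)
        counts.append(x)
    return times, counts
```

Calling `gillespie_birth_death(10.0, 1.0, 0, 50.0, random.Random(1))` returns the jump times and copy numbers of one trajectory. Many such trajectories are needed to estimate the probability density, which is the quantity the chemical master equation describes directly.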

We have developed a state-of-the-art algorithm for exact computation of the probability landscape of stochastic reaction networks. The ACME method we developed can now be used to study a broad class of problems that were previously inaccessible to rigorous computational investigation. Applications of our method include mechanistic understanding of the origin of system stability against perturbation, robustness against genetic mutations, the regulatory mechanisms of cell fate switching, and heritable epigenetic states. These questions can be answered through analysis of the probability landscape computed using the ACME method, as shown in a study of phage lambda. This approach has also shown promise for designing new strategies to address the latency problem in combating viral infection, as we showed in our studies of HIV.
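To make the notion of a probability landscape concrete, here is a toy sketch that is not the ACME method itself. It computes the steady-state distribution of a one-species birth-death network directly on a truncated state space, using the balance of probability flux between neighboring states. The network, the truncation bound `n_max`, and the function name are assumptions made for illustration.

```python
import math


def birth_death_landscape(k_syn, k_deg, n_max):
    """Steady-state probabilities p(0..n_max) for 0 -> X (k_syn), X -> 0 (k_deg * n)."""
    weights = [1.0]
    for n in range(n_max):
        # At steady state the flux between states n and n + 1 balances:
        #   k_syn * p(n) = k_deg * (n + 1) * p(n + 1)
        weights.append(weights[-1] * k_syn / (k_deg * (n + 1)))
    total = sum(weights)
    return [w / total for w in weights]


def poisson_pmf(mean, n):
    """Poisson probability, the known steady state of the untruncated network."""
    return math.exp(-mean) * mean ** n / math.factorial(n)
```

For this simple network the result can be checked against the Poisson distribution with mean `k_syn / k_deg`, provided `n_max` is large enough that the truncated tail is negligible. Networks with several species and nonlinear reactions have no such closed form, which is where direct computation of the landscape becomes valuable.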

We are extending our model so that discrete probability flux can be formulated and studied. This approach has uncovered the phenomenon of stochastic oscillation in the toggle switch under weak binding conditions.

Protein binding site and function prediction

How protein structure determines protein function is a fundamental problem in molecular biology. We study the geometry, shape, and physicochemical texture of binding surfaces to understand how they work and how the functional roles of proteins can be predicted. We have built a protein surface library of >2 million surfaces for most structures in the Protein Data Bank, which provides analytical measurements of the volume and area of surface pockets and interior voids of proteins. By assessing similarity in binding pocket residues, shape, and orientation, we have developed a method to discover functional relationships between similar binding surfaces. This method can uncover novel examples of related protein functions on protein structures of different folds or different classes. In collaboration with Dr. Andrej Jochimiak, we have applied this method to infer functions of protein structures solved by the structural genomics project.

We are also studying how binding surface mutations and residue conservation relate to disease-causing single nucleotide polymorphisms (SNP analysis). Our work showed that important structural characteristics are associated with those nonsynonymous SNPs that affect the phenotype of the individuals carrying them.

Protein-protein interactions

Protein-protein interactions are involved in most cellular processes, including signal transduction, immune response, metabolism, and cell cycle control. We are interested in identifying structural features of the binding surfaces of protein-protein interactions. With our collaborators, we recently developed the concept of complemented pockets, which are strong structural features indicative of "hot spots" of protein-protein interactions. Our work also points to possible new directions for protein docking by locating those important pockets that pre-exist in uncomplemented protein structures.

In protein design, one seeks to identify amino acid sequences that are compatible with a desired structural template but incompatible with any other structural template. A key ingredient for success in protein design is an effective scoring function that discriminates good sequences from incompatible sequences and sequences that do not fold. We have developed a theoretical framework for designing optimal nonlinear scoring functions in the form of Gaussian mixtures that can identify >200 native nonhomologous proteins simultaneously from >3 million sequence-structure decoys.

A promising strategy in developing therapeutics against disease is to design protein-like peptide drugs that can interfere with undesired protein-protein interactions. An effective strategy is to screen combinatorial libraries of peptides displayed on phages. However, once the length of the peptide goes beyond 7 residues, it is difficult to search all possible sequences exhaustively. We have developed a weighted method that can introduce beneficial bias into the sequences of a designed phage library. We have also developed a universal codon schema that helps to enrich active peptides by 1000-fold in constructing a phage library.
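The difficulty of exhaustive search follows from simple counting, sketched below under the assumption of 20 standard amino acids at each position. The library size used in the usage note is a hypothetical figure for illustration, not a measurement from our work.

```python
def peptide_sequence_count(length, alphabet_size=20):
    """Number of distinct peptide sequences of the given length."""
    return alphabet_size ** length


def max_library_coverage(length, library_size):
    """Upper bound on the fraction of sequence space a library can contain.

    Assumes every clone in the library carries a distinct peptide, which is
    optimistic; real libraries contain duplicates and codon bias.
    """
    return min(1.0, library_size / peptide_sequence_count(length))
```

For 7 residues there are 20^7 = 1.28 × 10^9 possible sequences, and each additional residue multiplies that number by 20. A hypothetical library of 10^9 distinct clones could therefore hold at most about 78% of all 7-residue peptides and under 4% of all 8-residue peptides, which is why biasing the library toward promising sequences matters.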

Membrane protein

Membrane proteins are an important class of proteins for which structural information is relatively sparse compared to soluble proteins. We have characterized various physicochemical properties of residues in transmembrane (TM) helices, including propensity for lipid interactions (joint work with Dr. DeGrado and colleagues), propensity for interstrand interactions, and propensity for higher-order interactions. Our work also showed that almost all TM helices contain hydrogen bonds, and that spatial motifs we discovered, such as the serine zipper and polar clamp, play important roles in TM helix assembly.

Biophysics of protein structures and protein structure prediction

Tight packing has been thought to be an important property of native protein structures. We have developed methods for quantitative characterization of protein packing. We identified a specific scaling relationship between protein packing and protein chain length, and showed that this scaling relationship is due to generic properties of compact chain polymers rather than evolution. This conclusion was drawn using a state-of-the-art sequential Monte Carlo sampling method. Using this sampling method, we have developed simple lattice models to study the critical effects of chirality and side chain inflexibility in reducing the entropy of protein folding. We are developing simplified protein models that combine local sequence and contact propensities for protein structure prediction.

Evolution of protein structure and folding

We are also interested in protein evolution. A controversy in protein folding is whether the protein folding core is conserved. We have shown using a maximum-likelihood method that the folding core of proteins important for kinetics is not particularly conserved. We have also reconstructed ancestral folding cores, which are predicted to lead to similar folding behavior even though these ancestral cores have never appeared in proteins from extant species. We are continuing our study of the effects of evolution on protein folding and protein function.