Readme file for supplementary materials of the paper:

Empirical potential function for simplified protein models: Combining contact and local sequence-structure descriptors, by Jinfeng Zhang, Rong Chen and Jie Liang, to be published in Proteins: Structure, Function, and Bioinformatics, 2005.

angle_cluster.dat contains the clustered discrete-state angles. Detailed descriptions are given in the file.

calsp.txt contains the potential function CALSP with 610 descriptors. The file has only one line, which holds the 610 coefficients for the 610 descriptors. The first 400 numbers are for local sequence-structure descriptors (this is the same for RCALSP1 and RCALSP2) and the last 210 numbers are for contact descriptors.

rcalsp1.txt contains the potential function RCALSP1 with 400 local sequence-structure descriptors and 55 contact descriptors.

rcalsp2.txt contains the potential function RCALSP2 with 400 local sequence-structure descriptors and 165 contact descriptors in three contact order bins. Please see the paper for details.

The index of each descriptor is determined as follows.

For the first 400 local sequence-structure descriptors (from 0 to 399):

    N = (5*s_i + a_i)*20 + (5*s_{i+1} + a_{i+1}),

where N is the index of the descriptor, s_i and s_{i+1} are the discrete states (0-3) at positions i and i+1, respectively, and a_i and a_{i+1} are the geometrically simplified alphabet types (0-4) at positions i and i+1, respectively.

For contact descriptors (starting from 400):

    N = 20 * R_i + R_j,

where N is the index of the descriptor, and R_i and R_j are the integer representations of the amino acid types at positions i and j, respectively. Residues i and j form a contact in this case.

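As an illustration only, the local sequence-structure index formula and the 400/210 split of the CALSP coefficients described above might be sketched in Python as follows. The function names are hypothetical and not part of the supplementary materials, and the sketch assumes that the 610 numbers in calsp.txt are separated by whitespace, which should be checked against the actual file.

```python
def local_descriptor_index(s_i, a_i, s_next, a_next):
    """Return N = (5*s_i + a_i)*20 + (5*s_{i+1} + a_{i+1}), in the range 0-399.

    s_i, s_next: discrete states (0-3) at positions i and i+1.
    a_i, a_next: geometrically simplified alphabet types (0-4) at i and i+1.
    """
    for s in (s_i, s_next):
        if not 0 <= s <= 3:
            raise ValueError(f"discrete state must be 0-3, got {s}")
    for a in (a_i, a_next):
        if not 0 <= a <= 4:
            raise ValueError(f"alphabet type must be 0-4, got {a}")
    return (5 * s_i + a_i) * 20 + (5 * s_next + a_next)


def split_calsp_line(line):
    """Split the single line of calsp.txt into (local, contact) coefficients.

    Assumption: the 610 coefficients are whitespace-separated.
    """
    coeffs = [float(tok) for tok in line.split()]
    if len(coeffs) != 610:
        raise ValueError(f"expected 610 coefficients, got {len(coeffs)}")
    # First 400: local sequence-structure descriptors; last 210: contact descriptors.
    return coeffs[:400], coeffs[400:]
```

For example, local_descriptor_index(1, 2, 0, 3) evaluates to (5*1 + 2)*20 + (5*0 + 3) = 143.
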
The amino acids and their integer representations used in determining the descriptor index are as follows.

Amino acid integer representation for the 20-letter alphabet, used for contact descriptors in CALSP:

    A  0    C  1    D  2    E  3    F  4
    G  5    H  6    I  7    K  8    L  9
    M 10    N 11    P 12    Q 13    R 14
    S 15    T 16    V 17    W 18    Y 19

Simplified alphabet based on contact propensities, where the 20-letter amino acid alphabet is reduced to an alphabet with 10 letters, used for RCALSP1 and RCALSP2:

    A  8    C  0    D  1    E  1    F  6
    G  9    H  4    I  5    K  2    L  5
    M  6    N  3    P  4    Q  3    R  2
    S  3    T  3    V  5    W  7    Y  7

Geometrically simplified alphabet with 5 letters, used for local sequence-structure descriptors in all potential functions:

    A  1    C  0    D  4    E  1    F  0
    G  2    H  1    I  0    K  1    L  0
    M  0    N  4    P  3    Q  1    R  1
    S  1    T  1    V  0    W  0    Y  0
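
For convenience, the three alphabets above could be written as lookup tables and combined with the local sequence-structure index formula. This is a minimal sketch: the names CONTACT_20, CONTACT_10, GEOMETRIC_5 and local_index_from_residues are hypothetical and do not appear in the supplementary materials, and the integer values are copied directly from the lists above.

```python
# One-letter amino acid codes in the order used by the lists above.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# 20-letter alphabet for contact descriptors in CALSP (A=0, C=1, ..., Y=19).
CONTACT_20 = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

# 10-letter alphabet based on contact propensities (RCALSP1 and RCALSP2).
CONTACT_10 = dict(zip(
    AMINO_ACIDS,
    [8, 0, 1, 1, 6, 9, 4, 5, 2, 5, 6, 3, 4, 3, 2, 3, 3, 5, 7, 7],
))

# 5-letter geometrically simplified alphabet (local sequence-structure descriptors).
GEOMETRIC_5 = dict(zip(
    AMINO_ACIDS,
    [1, 0, 4, 1, 0, 2, 1, 0, 1, 0, 0, 4, 3, 1, 1, 1, 1, 0, 0, 0],
))


def local_index_from_residues(res_i, state_i, res_next, state_next):
    """Local descriptor index N (0-399) from residue letters and discrete states (0-3)."""
    a_i = GEOMETRIC_5[res_i]
    a_next = GEOMETRIC_5[res_next]
    return (5 * state_i + a_i) * 20 + (5 * state_next + a_next)
```

For example, a glycine in discrete state 2 followed by a proline in discrete state 0 gives local_index_from_residues("G", 2, "P", 0) = (5*2 + 2)*20 + (5*0 + 3) = 243.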