+++++++++++++++++++++++++++++++++++++++++
|              Rna3Dmotif               |
|        Model Catalogue Compare        |
|                                       |
|   by Mahassine Djelloul, April 2010   |
+++++++++++++++++++++++++++++++++++++++++

***************
* A. CONTENTS *
***************

The Rna3Dmotif package contains six independent programs in the form of shell scripts:

1. Catalog
2. Listing
3. Lecns
4. Ljcns
5. Simlocmot
6. Simintmot

These programs take as input the annotated base pair interactions, base stacking
interactions and backbone connectivity relations of an RNA structure. These
annotations were downloaded (April 3, 2010) as a zip archive
"FR3D_AnalyzedStructures_All.zip" and uncompressed in the folder "Rna3Dmotif/DATA/".

The base pair interactions follow the Leontis-Westhof nomenclature.
Base stacking interactions are restricted in Rna3Dmotif to the s55 and s33 types.
Backbone connectivity relations are used to infer phosphodiester connections
between adjacent bases.
For more details on these interactions, visit:
http://rna.bgsu.edu/FR3D/AnalyzedStructures/All/

In addition to the annotated interactions (base pair, stacking and connectivity),
the programs Catalog and Listing require PDB files (corresponding to the structures
to be processed). These PDB files must be downloaded from the Protein Data Bank
website (http://www.pdb.org/pdb/home/home.do) and saved in the folder
"Rna3Dmotif/DATA/".
The other programs (i.e. Lecns, Ljcns, Simlocmot and Simintmot) do not require PDB files.

***************************
* B. DETAILED DESCRIPTION *
***************************

1. Catalog
------------
Produces the catalogue of secondary structure elements of a set of RNA structures.
The input is a text file, say "sample.txt", with one (PDB_ID, Chain_ID) pair per
line, the two fields being separated by a tab character:

PDB_ID\tabChain_ID\n
PDB_ID\tabChain_ID\n
PDB_ID\tabChain_ID\n

Example:
>>>>>>>>>
1S72 0
1S72 9
2AVY A
2TRA A
>>>>>>>>>

To produce the catalogue, make sure that the PDB files corresponding to your
structures are available in the folder "Rna3Dmotif/DATA/", then type:

./Catalog.sh sample.txt

This should create a folder "CATALOGUE" containing an html file "index.html" and
two folders, DESC and VIEW3D.
To browse the catalogue, move to the folder CATALOGUE and double-click on the file
"index.html".

2. Listing
----------
Produces the listing of interaction motifs of a set of RNA structures.
The input is a text file, say "sample.txt", with one (PDB_ID, Chain_ID) pair per
line, the two fields being separated by a tab character:

PDB_ID\tabChain_ID\n
PDB_ID\tabChain_ID\n
PDB_ID\tabChain_ID\n

Example:
---------
1S72 0
1S72 9
2AVY A
2TRA A
---------

To produce the listing, make sure that the PDB files corresponding to your
structures are available in the folder "Rna3Dmotif/DATA/", then type:

./Listim.sh sample.txt

This should create a folder "LISTING" containing an html file "index.html" and
two folders, DESC and VIEW3D.
To browse the listing, move to the folder LISTING and double-click on the file
"index.html".

3. Lecns
--------
This program extracts the LECNS of two secondary structure elements given by
their PDB_ID, Chain_ID and Catalogue Identifiers.
Example:

./Lecns.sh 1S72 9 7 1J5E A 62

The output is a mapping of the bases and base pairs which looks like this:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
LECNS of 1S72.9.7 [eilmm] and 1J5E.A.62 [eilmm]: [eilmm]

Bases mapping:
  77_A || 889_A
 104_A || 908_A
  79_U || 891_U
 103_A || 907_A
  78_G || 890_G
  80_A || 892_A
 102_G || 906_G
  76_G || 888_G
 105_A || 909_A
 101_G || 905_U
  81_C || 893_C
 106_C || 910_C
  75_G || 887_G

Base-pairs mapping:
  77_A--C/C ---  78_G ||  889_A--C/C --- 890_G
  77_A--H/h t-- 104_A ||  889_A--H/h t-- 908_A
 104_A--C/C --- 105_A ||  908_A--C/C --- 909_A
  79_U--W/H t-- 103_A ||  891_U--W/H t-- 907_A
  78_G--S/H c--  79_U ||  890_G--S/H c-- 891_U
  79_U--C/C ---  80_A ||  891_U--C/C --- 892_A
 103_A--C/C --- 104_A ||  907_A--C/C --- 908_A
  78_G--C/C ---  79_U ||  890_G--C/C --- 891_U
  80_A--H/S t-- 102_G ||  892_A--H/S t-- 906_G
  80_A--C/C ---  81_C ||  892_A--C/C --- 893_C
 102_G--C/C --- 103_A ||  906_G--C/C --- 907_A
  76_G--C/C ---  77_A ||  888_G--C/C --- 889_A
  76_G--S/H t-- 105_A ||  888_G--S/H t-- 909_A
 105_A--C/C --- 106_C ||  909_A--C/C --- 910_C
 101_G--C/C --- 102_G ||  905_U--C/C --- 906_G
  75_G--C/C ---  76_G ||  887_G--C/C --- 888_G
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

4. Ljcns
--------
Like Lecns, this program extracts the LJCNS of two interaction motifs given by
their PDB_ID, Chain_ID and Listing Identifiers.

Example:

./Ljcns.sh 2AW4 B 49 2AVY A 31

The output is a mapping of the bases and base pairs which looks like this:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
LJCNS of 2AW4.B.49 [ffg] and 2AVY.A.31 [ffg]: [ffg]

Bases mapping:
1837_C ||1218_C
1927_A ||1015_G
1836_C ||1217_C
1928_A ||1016_A
1904_G || 988_G

Base-pairs mapping:
1837_C--S/s c--1927_A || 1015_G--s/S c--1218_C
1927_A--C/C ---1928_A || 1015_G--C/C ---1016_A
1836_C--C/C ---1837_C || 1217_C--C/C ---1218_C
1836_C--s/S c--1928_A || 1016_A--S/s c--1217_C
1904_G--s/S t--1928_A ||  988_G--s/S t--1016_A
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

5. Simlocmot
------------
This program takes as input two structures given by their PDB_ID and Chain_ID and outputs:
- on the standard output (i.e. the screen), the detailed list of all possible
  LECNS of their secondary structure elements,
- in a spreadsheet file "Summary_Simlocmot*.xls", the summary of these LECNS.

Example:
If you type the command:

./Simlocmot.sh 1S72 9 1J5E A

you will produce, on the standard output, the detailed mappings of all possible
LECNS of pairs of secondary structure elements belonging to the two input structures.

*** Note that if you want to redirect the standard output to a file "result.txt", you should type: ***
***            ./Simlocmot.sh 1S72 9 1J5E A > result.txt                                           ***

In the file "Summary_Simlocmot_1S72_9_1J5E_A.xls", only the names and the
non-canonical labels are listed. The first 10 lines should look like this:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
LECNS of 1J5E.A.2 cgmm and 1J5E.A.32 eghiim : gm
LECNS of 1J5E.A.2 cgmm and 1J5E.A.59 cjmm : cm
LECNS of 1J5E.A.2 cgmm and 1J5E.A.62 eilmm : mm
LECNS of 1J5E.A.2 cgmm and 1J5E.A.67 effilmm : mm
LECNS of 1J5E.A.4 hhiiimm and 1J5E.A.17 iiii : ii
LECNS of 1J5E.A.4 hhiiimm and 1J5E.A.38 ijm : im
LECNS of 1J5E.A.4 hhiiimm and 1J5E.A.42 imm : im
LECNS of 1J5E.A.4 hhiiimm and 1J5E.A.53 bggimm : im
LECNS of 1J5E.A.4 hhiiimm and 1J5E.A.57 imm : im
LECNS of 1J5E.A.4 hhiiimm and 1J5E.A.59 cjmm : mm
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

You can then open this file with a spreadsheet program and sort the data by the
last column to get the clusters of potential recurrent local motifs having the
same non-canonical common labels.
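
As an alternative to a spreadsheet program, the same sort can be done from the
command line. The sketch below assumes that the common label is the last
whitespace-separated field of each summary line, as in the excerpt above. The
function name sort_by_label is only illustrative and is not part of Rna3Dmotif.

```shell
# Sketch: order the lines of a summary file by their common label.
# Assumption: the common label is the last whitespace-separated field.
sort_by_label() {
    # $1 = summary file
    # Prefix each non-empty line with its last field, sort on the result,
    # then drop the prefix again.
    awk 'NF { print $NF "\t" $0 }' "$1" | LC_ALL=C sort | cut -f2-
}
```

For instance: sort_by_label Summary_Simlocmot_1S72_9_1J5E_A.xls > sorted.txt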
Example: after sorting "Summary_Simlocmot_1S72_9_1J5E_A.xls", the lines:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
LECNS of 1S72.9.7 eilmm and 1J5E.A.67 effilmm : eilm
LECNS of 1J5E.A.62 eilmm and 1J5E.A.67 effilmm : eilm
LECNS of 1S72.9.7 eilmm and 1J5E.A.62 eilmm : eilmm
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
correspond to a sarcin-ricin cluster (eilm) composed of the secondary structure
elements 1S72.9.7, 1J5E.A.62 and 1J5E.A.67.

6. Simintmot
------------
This program takes as input two structures given by their PDB_ID and Chain_ID and outputs:
- on the standard output (i.e. the screen), the detailed list of all possible
  LJCNS of their interaction motifs,
- in a spreadsheet file "Summary_Simintmot*.xls", the summary of these LJCNS.

Example:
If you type the command:

./Simintmot.sh 2AVY A 3EOH A

you will produce, on the standard output, the detailed mappings of all possible
LJCNS of pairs of interaction motifs belonging to the two input structures.

*** Note that if you want to redirect the standard output to a file "result.txt", you should type: ***
***            ./Simintmot.sh 2AVY A 3EOH A > result.txt                                           ***

In the file "Summary_Simintmot_2AVY_A_3EOH_A.xls", only the names and the
non-canonical labels are listed.
You can then open this file with a spreadsheet program and sort the data by the
last column to get the clusters of potential recurrent interaction motifs having
the same non-canonical common labels.
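
To see which common labels occur most often before sorting the whole file, the
summary lines can also be counted per label. This is a minimal sketch under the
same assumption as the file excerpts in this document (the common label is the
last whitespace-separated field). The function name count_labels is hypothetical
and not part of Rna3Dmotif.

```shell
# Sketch: count how many summary lines carry each common label.
# Assumption: the common label is the last whitespace-separated field.
count_labels() {
    # $1 = summary file
    # Prints "<count> <label>", most frequent label first; ties are
    # ordered alphabetically by label.
    awk 'NF { n[$NF]++ } END { for (l in n) print n[l], l }' "$1" |
        LC_ALL=C sort -k1,1nr -k2,2
}
```

For instance: count_labels Summary_Simintmot_2AVY_A_3EOH_A.xls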
Example: after sorting, the first 21 lines should look like this:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
LJCNS of 2AVY.A.13 ffg and 2AVY.A.32 ffg : ffg
LJCNS of 2AVY.A.21 fffgg and 3EOH.A.4 ffg : ffg
LJCNS of 2AVY.A.21 fffgg and 2AVY.A.32 ffg : ffg
LJCNS of 2AVY.A.21 fffgg and 2AVY.A.35 ffg : ffg
LJCNS of 2AVY.A.13 ffg and 2AVY.A.35 ffg : ffg
LJCNS of 2AVY.A.13 ffg and 2AVY.A.41 ffg : ffg
LJCNS of 2AVY.A.21 fffgg and 3EOH.A.7 ffg : ffg
LJCNS of 2AVY.A.21 fffgg and 2AVY.A.41 ffg : ffg
LJCNS of 2AVY.A.35 ffg and 2AVY.A.41 ffg : ffg
LJCNS of 2AVY.A.32 ffg and 2AVY.A.41 ffg : ffg
LJCNS of 2AVY.A.32 ffg and 2AVY.A.35 ffg : ffg
LJCNS of 2AVY.A.13 ffg and 3EOH.A.7 ffg : ffg
LJCNS of 2AVY.A.13 ffg and 3EOH.A.4 ffg : ffg
LJCNS of 3EOH.A.4 ffg and 3EOH.A.7 ffg : ffg
LJCNS of 2AVY.A.35 ffg and 3EOH.A.7 ffg : ffg
LJCNS of 2AVY.A.41 ffg and 3EOH.A.4 ffg : ffg
LJCNS of 2AVY.A.32 ffg and 3EOH.A.4 ffg : ffg
LJCNS of 2AVY.A.35 ffg and 3EOH.A.4 ffg : ffg
LJCNS of 2AVY.A.32 ffg and 3EOH.A.7 ffg : ffg
LJCNS of 2AVY.A.13 ffg and 2AVY.A.21 fffgg : ffg
LJCNS of 2AVY.A.41 ffg and 3EOH.A.7 ffg : ffg
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
They correspond to a potential A-minor cluster (ffg).
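
The distinct members of a cluster (for example the three elements of the
sarcin-ricin cluster in section 5) can be collected from a summary file as
sketched below. The sketch assumes the line layout of the excerpts above: the
two motif identifiers are the 3rd and 6th whitespace-separated fields and the
common label is the last one. The function name cluster_members is hypothetical
and not part of Rna3Dmotif.

```shell
# Sketch: list the distinct motif identifiers sharing a given common label.
# Assumption: fields 3 and 6 hold the identifiers, the last field the label.
cluster_members() {
    # $1 = summary file, $2 = common label (e.g. ffg or eilm)
    awk -v label="$2" '$NF == label { print $3; print $6 }' "$1" |
        LC_ALL=C sort -u
}
```

For instance: cluster_members Summary_Simintmot_2AVY_A_3EOH_A.xls ffg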