Structural Genomics @CNAG · CRG
Software developed in our group
TADbit is a Python library that processes 3C-based data to detect Topologically Associating Domains (TADs), model their 3D structure, and analyze them.
TADkit is a visualizer for 3D genomic data that helps researchers easily annotate 3D models generated by TADbit.
- TCAMS-TB dataset
The TCAMS-TB dataset includes a series of GSK compounds and their predicted targets in Mycobacterium tuberculosis. This effort is part of the Tropical Disease Initiative.
- TDI Kernel
The TDI Kernel is a database of protein structure models and predicted small-molecule binding sites for 10 tropical disease genomes. This effort is part of the Tropical Disease Initiative.
- Omidios (a.k.a. SeqProfCod)
Omidios is a database of the pre-calculated likely impact of single nucleotide polymorphisms (SNPs) in the human genome.
- SARA (now hosted at the Capriotti Lab)
SARA (Structure Alignment of Ribonucleic Acids) is a fully automated method for aligning two RNA structures by using a unit-vector root mean square (URMS) strategy.
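The URMS idea can be illustrated with a small sketch. It assumes the two RNA chains have already been matched residue by residue (SARA itself also finds that alignment, which is not shown) and uses one atom per nucleotide; which atom SARA actually uses is an assumption left to the caller. Consecutive atoms are turned into unit vectors, and the RMSD between the two sets of unit vectors is taken after an optimal rotation found with the Kabsch algorithm. The function names `unit_vectors` and `urms` are hypothetical and not part of SARA.

```python
import numpy as np


def unit_vectors(coords: np.ndarray) -> np.ndarray:
    """Turn an (N, 3) trace (one atom per residue) into (N-1, 3) unit
    vectors pointing from each atom to the next."""
    diffs = np.diff(coords, axis=0)
    return diffs / np.linalg.norm(diffs, axis=1, keepdims=True)


def urms(coords_a: np.ndarray, coords_b: np.ndarray) -> float:
    """Unit-vector RMSD between two traces whose residues are already
    matched one-to-one (same length, same order).

    All unit vectors start at the origin, so only an optimal rotation
    is needed (no translation). It is found with the Kabsch algorithm.
    """
    u = unit_vectors(np.asarray(coords_a, dtype=float))
    v = unit_vectors(np.asarray(coords_b, dtype=float))
    if u.shape != v.shape:
        raise ValueError("traces must have the same number of atoms")
    h = u.T @ v                                # 3x3 covariance matrix
    left, _, right_t = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation.
    d = 1.0 if np.linalg.det(right_t.T @ left.T) >= 0 else -1.0
    rotation = right_t.T @ np.diag([1.0, 1.0, d]) @ left.T
    u_rotated = u @ rotation.T                 # rotate every unit vector
    return float(np.sqrt(np.mean(np.sum((u_rotated - v) ** 2, axis=1))))
```

Because the unit vectors discard bond lengths and absolute positions, a rigidly rotated and translated copy of a chain scores (numerically) zero, while chains with different local backbone directions score higher.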
BBibTeX is a PHP script that helps you browse and manage your BibTeX files as a MySQL database.
- Toy models for 3DGenomics assessment
All supplementary data files for the article by Trussart et al. (2015) in NAR.
- RNA structure sets
Series of RNA datasets used to develop and benchmark the SARA program.
Series of RNA datasets used for the RNA structural space analysis.
Additionally, an initial version of the set of alignments and their derived probability density functions for MODELLER can be downloaded here. Users should be aware that these files will most likely change in the near future.
- EvP models sets
The evaluation of the EvPs in model assessment was based on a set of 4,444 structural models divided into 1,877 correct and 2,567 incorrect models. A correct model was defined as one that superimposed at least 30% of its C-α atoms within 3.5 Å of the native structure, implying a proper fold assignment and a relatively accurate sequence/structure alignment. Incorrect models (i.e., superimposing less than 15% of the C-α atoms within 3.5 Å) were built using either a wrong fold or the correct fold with a large fraction of misalignments. All PDB-formatted files for the models in the dataset can be downloaded here (~67 MB).
All calculated EvPs can be downloaded here.
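The correct/incorrect thresholds above can be written as a small labeling rule. This sketch assumes the per-atom C-α distances between model and native structure have already been measured after superposition, and treats "within 3.5 Å" as an inclusive bound (an assumption). Models with 15% to just under 30% of atoms within the cutoff fall in neither class. The function names are hypothetical, and the fold and alignment criteria mentioned in the text are not checked here.

```python
from typing import Optional, Sequence

CUTOFF_ANGSTROM = 3.5   # distance cutoff after superposition (from the text)
CORRECT_MIN = 0.30      # >= 30% of C-alpha atoms within cutoff -> correct
INCORRECT_MAX = 0.15    # < 15% of C-alpha atoms within cutoff -> incorrect


def fraction_within(distances: Sequence[float], cutoff: float = CUTOFF_ANGSTROM) -> float:
    """Fraction of C-alpha atoms whose model-to-native distance
    (measured after superposition) is at most `cutoff` angstroms."""
    if len(distances) == 0:
        raise ValueError("no C-alpha distances given")
    return sum(1 for d in distances if d <= cutoff) / len(distances)


def classify_model(distances: Sequence[float]) -> Optional[str]:
    """Label a model 'correct', 'incorrect', or None when it falls
    between the two thresholds and fits neither class."""
    f = fraction_within(distances)
    if f >= CORRECT_MIN:
        return "correct"
    if f < INCORRECT_MAX:
        return "incorrect"
    return None
```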
- SNP annotation sets
Two datasets were extracted from the SWISS-PROT subset of the UniProt database. SWISS-PROT classifies protein variants as disease related (i.e., with pathological effects), polymorphism (i.e., with no effect on human health) or unclassified. The SP-Dec05 dataset was derived from the SWISS-PROT release 48 (Dec 2005) and the SP-Dec06 dataset includes only mutations from protein sequences deposited in SWISS-PROT from January to November 2006 (release 51). The SP-Dec05 and SP-Dec06 datasets included a total of 8,987 and 2,008 protein variants, respectively.
- Functional annotation sets
Four different testing sets were selected to evaluate the accuracy of the AnnoLite and AnnoLyze programs: a) a set of non-redundant functionally annotated chains (annotation), b) a set of non-redundant protein structures co-crystallized with small ligands (ligand), c) a set of non-redundant protein structures co-crystallized with other protein structures or domains (partner), and d) a set of non-redundant protein structures for localizing binding sites (localize).
- MODPIPE set
A total of 168,632 comparative models were calculated by our automated comparative modeling protocol MODPIPE for the PDB-select40 list (6,877 sequences as of March 2005). All models shorter than 100 residues or longer than 250 residues were removed from the testing set, reducing it to 80,593 models for 4,011 different sequences. Binning the models in the MODPIPE set by RMSD to the native structure shows that ~5% of models are within 1 Å (very good models), ~13% are within 1-3 Å (good models), ~20% are within 3-5 Å (acceptable models), and ~62% superimpose onto the native structure with an RMSD >5 Å (bad models). All scores for the models in this set generated for the SVMod paper can be found here (~31 MB).
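The length filter and RMSD binning described above can be sketched as follows. The `Model` record and the function names are hypothetical, and the handling of values exactly on a boundary (100 or 250 residues; 1, 3 or 5 Å) is an assumption, since the text does not specify it.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable

BINS = ("very good", "good", "acceptable", "bad")


@dataclass
class Model:
    length: int   # number of residues in the model
    rmsd: float   # RMSD to the native structure, in angstroms


def passes_length_filter(model: Model) -> bool:
    """Keep models of 100 to 250 residues (inclusive bounds assumed)."""
    return 100 <= model.length <= 250


def rmsd_bin(rmsd: float) -> str:
    """Assign a model to one of the four RMSD classes."""
    if rmsd <= 1.0:
        return "very good"
    if rmsd <= 3.0:
        return "good"
    if rmsd <= 5.0:
        return "acceptable"
    return "bad"


def summarize(models: Iterable[Model]) -> Dict[str, float]:
    """Percentage of length-filtered models falling in each RMSD bin."""
    kept = [m for m in models if passes_length_filter(m)]
    if not kept:
        return {name: 0.0 for name in BINS}
    counts = Counter(rmsd_bin(m.rmsd) for m in kept)
    return {name: 100.0 * counts[name] / len(kept) for name in BINS}
```

Applied to the filtered MODPIPE set, a summary of this kind would correspond to the ~5/13/20/62% split reported above.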
- MOULDER set
Twenty target/template pairs of protein sequences with known structures, ranging from 81 to 340 residues in length, were randomly selected from the Fischer set of remotely related homologs. The 20 targets do not share significant structural similarity with each other. For each target, the structural template specified by the Fischer set was used. MOULDER and MODELLER were then used to create 300 different target-template alignments per target. These 300 alignments spanned approximately 0 to 100% in both the native overlap and the fraction of correctly aligned positions with respect to the CE structure-based alignment. A comparative model was built from each target-template alignment using the default parameters of the model routine in MODELLER. Thus, the final decoy set consisted of 300 models for each of the 20 targets (6,000 models in total). All scores for the models in this set generated for the SVMod paper can be found here (~4 MB).
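One of the two measures used to spread the alignments is the fraction of correctly aligned positions with respect to the CE structure-based alignment. A minimal sketch of that measure follows, assuming the common definition: the fraction of residue pairs in the reference alignment that the test alignment reproduces exactly. The gap character and the function names are hypothetical and not taken from MOULDER or MODELLER; the native-overlap measure, which requires building and superimposing a model, is not shown.

```python
from typing import Dict, Mapping

GAP = "-"


def to_mapping(target_row: str, template_row: str) -> Dict[int, int]:
    """Convert a gapped pairwise alignment into {target index: template index}
    for every column where both sequences have a residue (0-based)."""
    if len(target_row) != len(template_row):
        raise ValueError("alignment rows must have the same length")
    mapping: Dict[int, int] = {}
    i = j = 0  # residue counters for target and template
    for a, b in zip(target_row, template_row):
        if a != GAP and b != GAP:
            mapping[i] = j
        if a != GAP:
            i += 1
        if b != GAP:
            j += 1
    return mapping


def alignment_accuracy(test: Mapping[int, int], reference: Mapping[int, int]) -> float:
    """Fraction of residue pairs in the reference (e.g. CE structure-based)
    alignment that the test alignment reproduces exactly."""
    if not reference:
        raise ValueError("reference alignment has no aligned pairs")
    correct = sum(1 for t, s in reference.items() if test.get(t) == s)
    return correct / len(reference)
```

For example, a test alignment that shifts the last aligned target residues by one template position keeps only the pairs before the shift, so its accuracy drops accordingly.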
- IMP: A program that contributes to the comprehensive structural characterization of biomolecules.
- MODELLER: A program for comparative modeling of protein three-dimensional structures.
- ModBase: A database of comparative protein structure models.
- ModWeb: A server for automatic comparative protein structure prediction.
- LigBase: A database of ligand binding proteins aligned to structural templates.
- Patcher: A program for binding site localization in protein structures.
- DBAli: A comprehensive database of pairwise and multiple structure alignments.
c/ Baldiri Reixac, 4. PCB - Tower I, 10th floor,
Barcelona 08028, Spain ::
Tel. +34 934 033 743 ::
Fax. +34 934 037 279
2017 © SGL :: last modified on October 15, 2015