Software developed in our group

Software

TADbit
TADbit is a Python library that deals with 3C-based data to detect Topologically Associating Domains (TADs), model their 3D structure and analyze them.
TADdyn
TADdyn is a Python library for modeling and exploring single or time-series 3C-based data.
TADpole
TADpole is a computational tool designed to identify and analyze the entire hierarchy of Topologically Associating Domains (TADs) in intra-chromosomal interaction matrices.
TADkit
TADkit is a visualizer for 3D genomic data that helps researchers easily annotate 3D models generated by TADbit.
Binless
Binless is a tool for resolution-independent normalization of Hi-C data.

Databases

TCAMS-TB dataset
The TCAMS-TB dataset includes a series of GSK compounds and their predicted targets in Mycobacterium tuberculosis. This effort was part of the Tropical Disease Initiative.
TDI Kernel
The TDI Kernel is a database of protein structure models and predicted binding sites of small molecules for 10 Tropical Disease Genomes. This effort was part of the Tropical Disease Initiative.
Omidios (a.k.a. SeqProfCod)
Omidios is a database of the precalculated likely impact of Single Nucleotide Polymorphisms in the human genome.

Servers

SARA (now hosted at the Capriotti Lab)
SARA (Structure Alignment of Ribonucleic Acids) is a fully automated method for aligning two RNA structures by using a unit-vector root mean square (URMS) strategy.
BBibTeX
BBibTeX is a PHP script that helps in browsing and managing your BibTeX files as a MySQL database.

Data sets

Toy models for 3DGenomics assessment
All supplementary data files for the article Trussart et al. 2015 in NAR.

RNA structure sets
Series of RNA datasets used to develop and benchmark the SARA program.

Set	RNA chains	Alignments
PDBNov06	2,179	-
NR95	277	38,226
NR95-HR	51	1,275
NR95-SCOR	60	1,770
OPT	141	300
RAND	300	44,850
RNA09	451	101,475
BgALI	451	50,995
FSCOR	419	-
R-FSCOR	192	-
T-FSCOR	227	-
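
For several of the sets listed for SARA, the number of alignments equals n(n-1)/2 for n chains, which is the count an all-against-all pairwise comparison would produce. The source does not state how the alignment sets were built, so this is only an observation from the listed numbers; the helper name `all_vs_all_pairs` is hypothetical. A minimal Python sketch of the check:

```python
def all_vs_all_pairs(n_chains: int) -> int:
    """Number of unordered pairs among n_chains items: n * (n - 1) / 2."""
    return n_chains * (n_chains - 1) // 2


# (RNA chains, alignments) copied from the listing for the SARA sets.
listed = {
    "NR95": (277, 38226),
    "NR95-HR": (51, 1275),
    "NR95-SCOR": (60, 1770),
    "RAND": (300, 44850),
    "RNA09": (451, 101475),
}

# True where the listed alignment count equals the all-against-all pair count.
matches = {
    name: all_vs_all_pairs(chains) == alignments
    for name, (chains, alignments) in listed.items()
}
```

Sets such as OPT (141 chains, 300 alignments) and BgALI (451 chains, 50,995 alignments) do not follow this pattern, so they were presumably assembled by a different selection.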

Series of RNA datasets used for the RNA structural space analysis.

Set	RNA chains	Alignments
RNA09	451	101,475
NR-RNA09	451	50,995
HA-RNA09	114	589
FSCOR	419	87,571

Additionally, an initial version of the set of alignments and their derived probability density functions for MODELLER can be downloaded here. Users should be aware that these files will most likely change in the near future.

EvP models sets
The evaluation of the EvPs in model assessment was based on a set of 4,444 structural models divided into 1,877 correct and 2,567 incorrect models. A correct model was defined as one that superimposed at least 30% of its Cα atoms within 3.5 Å of the real structure, and was therefore based on a proper fold assignment and a relatively accurate sequence/structure alignment. Incorrect models (i.e., superimposing less than 15% of the Cα atoms within 3.5 Å) were built using a wrong fold, or using the correct fold but with a large fraction of misalignments. All PDB-formatted files for each of the models in the dataset can be downloaded here (~67Mb).
All calculated EvPs can be downloaded here.
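
The correct/incorrect definition for the EvP models can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used to build the set: the function names are hypothetical, the per-atom Cα distances are assumed to come from a superposition computed elsewhere, and the "unassigned" label for the 15-30% range is an assumption, since the text only defines the two classes.

```python
from typing import Sequence

CUTOFF_ANGSTROM = 3.5     # distance cutoff from the text
CORRECT_MIN = 0.30        # at least 30% of C-alpha atoms within the cutoff
INCORRECT_MAX = 0.15      # less than 15% of C-alpha atoms within the cutoff


def fraction_within_cutoff(distances: Sequence[float]) -> float:
    """Fraction of C-alpha atoms whose model-to-native distance is within the cutoff."""
    if not distances:
        raise ValueError("at least one distance is required")
    close = sum(1 for d in distances if d <= CUTOFF_ANGSTROM)
    return close / len(distances)


def classify_model(distances: Sequence[float]) -> str:
    """Label a model using the thresholds quoted in the text."""
    fraction = fraction_within_cutoff(distances)
    if fraction >= CORRECT_MIN:
        return "correct"
    if fraction < INCORRECT_MAX:
        return "incorrect"
    # Assumption: the text does not say how models in between are handled.
    return "unassigned"
```

For example, a model with 4 of 10 Cα atoms within 3.5 Å would be labeled "correct", and one with 1 of 10 would be labeled "incorrect".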
SNP annotation sets
Two datasets were extracted from the SWISS-PROT subset of the UniProt database. SWISS-PROT classifies protein variants as disease related (i.e., with pathological effects), polymorphism (i.e., with no effect on human health) or unclassified. The SP-Dec05 dataset was derived from the SWISS-PROT release 48 (Dec 2005) and the SP-Dec06 dataset includes only mutations from protein sequences deposited in SWISS-PROT from January to November 2006 (release 51). The SP-Dec05 and SP-Dec06 datasets included a total of 8,987 and 2,008 protein variants, respectively.
Functional annotation sets
Four different testing sets were selected to evaluate the accuracy of the AnnoLite and AnnoLyze programs: a) a set of non-redundant functionally annotated chains (annotation), b) a set of non-redundant protein structures co-crystallized with small ligands (ligand), c) a set of non-redundant protein structures co-crystallized with other protein structures or domains (partner), and d) a set of non-redundant protein structures for localizing binding sites (localize).
MODPIPE set
A total of 168,632 comparative models were calculated by our automated comparative modeling protocol MODPIPE for the PDB-select40 list (6,877 sequences as of March 2005). All models shorter than 100 residues or longer than 250 residues were removed from the testing set. This length restriction reduced the set size to 80,593 models for 4,011 different sequences. The RMSD binning of the models in the MODPIPE set shows that ~5% of models are within 1 Å RMSD of the native structure (very good models), ~13% are within 1-3 Å RMSD (good models), ~20% are within the RMSD range 3-5 Å (acceptable models), and ~62% superimpose on the native structure with an RMSD >5 Å (bad models). All scores for models in this set generated for the SVMod paper can be found here (~31Mb).
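
The length filter and RMSD binning described for the MODPIPE set can be sketched in Python as follows. The function names are hypothetical, and the handling of values that fall exactly on a bin boundary (1, 3 or 5 Å) is an assumption, since the text does not say which bin they belong to.

```python
from collections import Counter
from typing import Iterable


def passes_length_filter(n_residues: int) -> bool:
    """Keep models of 100 to 250 residues, as described in the text."""
    return 100 <= n_residues <= 250


def rmsd_bin(rmsd: float) -> str:
    """Map an RMSD to the native structure (in angstroms) to a quality bin."""
    if rmsd < 0:
        raise ValueError("RMSD cannot be negative")
    # Assumption: upper boundaries are inclusive.
    if rmsd <= 1.0:
        return "very good"
    if rmsd <= 3.0:
        return "good"
    if rmsd <= 5.0:
        return "acceptable"
    return "bad"


def bin_fractions(rmsds: Iterable[float]) -> dict:
    """Fraction of models falling in each quality bin."""
    counts = Counter(rmsd_bin(r) for r in rmsds)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}
```

Applied to the RMSD values of the full filtered set, `bin_fractions` would be expected to give roughly the proportions quoted above.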
MOULDER set
Twenty target/template pairs of protein sequences with known structures, ranging from 81 to 340 residues in length, were randomly selected from the Fischer set of remotely related homologs. The 20 targets do not share significant structural similarity with each other. For each of the 20 targets, the structural template specified by the Fischer set was used as the template. MOULDER (see above) was used with MODELLER to create 300 different target-template alignments for each pair. The 300 alignments ranged uniformly from approximately 0 to 100% in both native overlap and correctly aligned positions with respect to the CE structure-based alignment. A comparative model was built from each target-template alignment using the default parameters for the model routine in MODELLER. Thus, the final decoy set consisted of 300 models for each of the 20 targets (6,000 models in total). All scores for models in this set generated for the SVMod paper can be found here (~4Mb).

Other contributions

IMP: A program for contributing to a comprehensive structural characterization of biomolecules.
MODELLER: A program for comparative modeling of protein three-dimensional structures.
ModBase: A database of comparative protein structure models.
ModWeb: A server for automatic comparative protein structure prediction.
LigBase: A database of ligand binding proteins aligned to structural templates.
Patcher: A program for binding site localization in protein structures.
DBAli: A comprehensive database of pairwise and multiple structure alignments.

Structural Genomics @CNAG · CRG