Life sciences data pertain to the following six themes:

 

 

Modeling and Molecular Dynamics (MD) study of important drug targets

Existing and generated datasets for further analysis (e.g. MD trajectories of oncogenic proteins with mutations relevant to the SEEM area), workflows for simulating a protein with commonly used codes (e.g. NAMD, GROMACS), repository of analysis tools for MD trajectories.

Computer-aided drug design (CADD)

Databases of interesting targets for CADD relevant to the SEEM region, existing and generated datasets for further analysis, workflows for CADD with commonly used programs (e.g. Glide, AutoDock), workflows/analysis tools for post-processing CADD results.

Analysis of Next Generation DNA sequencing data

Existing and generated datasets for further analysis (e.g. a database with genotype and phenotype data about patients with specific disorders); workflows/pipelines for genomic data processing that identify genetic mutations causing rare diseases in families and genetic variants contributing to complex diseases such as autism and cancer; an easily accessible workflow that lets non-computing users perform analyses; workflows to identify how the thermodynamic pattern of the genome influences fundamental processes of the cell such as transcription and RNA processing; repositories of analysis tools for NGS datasets.

All Next Generation DNA sequencing data will follow the EU guidelines and regulations on anonymization and data sharing:

  1. “Practice Guidelines for the Evaluation of Pathogenicity and the Reporting of Sequence Variants in Clinical Molecular Genetics”, Association for Clinical Genetic Science (2013)
  2. “Practice guidelines for Targeted Next Generation Sequencing Analysis and Interpretation”, Association for Clinical Genetic Science (2015)
  3. “Guidelines for diagnostic next-generation sequencing”, European Journal of Human Genetics (2016) 24, 2–5
 

Synchrotron data analysis

Existing and generated datasets for further analysis, Workflows for determining protein structures using infrared microspectroscopy and bioinformatics software, Workflows for processing large structural data segments, Repository of analysis tools for processing large structural data segments.

Image processing for biological applications

Workflow for image processing for biological applications, databases of medically relevant images.

Computational simulation of DNA and RNA

Existing and generated datasets for further analysis, workflows for simulating DNA or RNA with commonly used codes (e.g. NAMD, GROMACS), repository of analysis tools for MD trajectories.



Available Modeling and Molecular Dynamics data

Description: This dataset contains Molecular Dynamics simulation trajectories of the RAR-RXRa dimer with RXRa in its normal (wild-type) form and in the mutated S427F form. The input and output files are in GROMACS format.

Use Licence: Free upon contacting the authors
Contact: Zoe Cournia, Ioannis Galdadas

Description: This dataset contains Molecular Dynamics simulation trajectories of the RXRa tetramer with RXRa in its normal (wild-type) form and in three variations of the mutated S427F form. The input and output files are in GROMACS format.

Access datasets

Charge: Free upon contacting the authors
Contact: Zoe Cournia, Ioannis Galdadas

 

Access datasets

Charge: CC0 1.0 Universal
Contact: Bigovic, Miljan - Zecevic, Zarko

Description: Retinoic acid receptors (RARs) and retinoid X nuclear receptors (RXRs) are ligand-dependent transcriptional modulators that exert their biological action by forming functional heterodimers. RXR acts as an obligate dimer partner in many signalling pathways, and gene regulation by rexinoids depends on the ligated state of the specific heterodimeric partner. One of these dimers is formed with the retinoic acid receptor (RAR), a nuclear receptor that acts as a transcription factor. The dimer formation is protective in cancer. A single point mutation on RXRa, S427F, which is found in 5% of patients with bladder cancer, is located exactly at the dimerization interface; however, its mechanism of action is unknown. To address the effect of the mutation on the dimerization of RXRa and RAR, we performed MD simulations to understand how the change of serine to phenylalanine at position 427 suppresses the dimer function and is thus implicated in cancer.
This dataset contains Molecular Dynamics simulation trajectories of the normal (wild-type, WT) PI3Ka and the mutant H1047R PI3Ka. We provide two different datasets:
1) MD simulations of only the catalytic subunit of PI3Ka (p110a), without the regulatory subunit (p85a), in the WT and mutant H1047R forms (PI3Ka-with-p110a-without-p85a). These simulations are given in five independent replicate trajectories. The input and output files are in NAMD format. These simulations are described in Gkeka et al.: http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003895
2) MD simulations of both the catalytic subunit of PI3Ka (p110a) and the regulatory subunit (p85a), in the WT and mutant H1047R forms (PI3Ka-with-p110a-with-p85a). One simulation set is performed in complex with the PI3Ka inhibitor PIK-108, and another simulation set without an inhibitor (apo form). The apo forms are provided in both double and single precision for comparison. The input and output files are in GROMACS format. These simulations are described in Gkeka et al.: http://pubs.acs.org/doi/abs/10.1021/jp506423e

Access datasets

Charge: Free upon citing the publications http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003895 and http://pubs.acs.org/doi/abs/10.1021/jp506423e
Contact: Zoe Cournia, Paraskevi Gkeka
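
The mutation labels used in these descriptions (S427F, H1047R, E545K) follow the usual one-letter convention: wild-type residue, sequence position, mutant residue. As a minimal sketch, and not part of the datasets themselves, such a label could be decoded as follows. The names `PointMutation` and `parse_mutation` are hypothetical helpers introduced only for this illustration.

```python
import re
from typing import NamedTuple

# Standard one-letter to three-letter amino acid codes.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


class PointMutation(NamedTuple):
    """Hypothetical container for a single-residue substitution."""
    wild_type: str
    position: int
    mutant: str

    def describe(self) -> str:
        return (f"{THREE_LETTER[self.wild_type]}{self.position}"
                f" -> {THREE_LETTER[self.mutant]}")


_LABEL = re.compile(r"([A-Z])(\d+)([A-Z])")


def parse_mutation(label: str) -> PointMutation:
    """Split a label such as 'S427F' into residue, position, residue."""
    match = _LABEL.fullmatch(label)
    if match is None:
        raise ValueError(f"not a point-mutation label: {label!r}")
    wild_type, position, mutant = match.groups()
    if wild_type not in THREE_LETTER or mutant not in THREE_LETTER:
        raise ValueError(f"unknown amino acid code in {label!r}")
    return PointMutation(wild_type, int(position), mutant)


if __name__ == "__main__":
    for label in ("S427F", "H1047R", "E545K"):
        print(label, "=", parse_mutation(label).describe())
```

For example, `parse_mutation("S427F").describe()` gives `Ser427 -> Phe`, matching the serine-to-phenylalanine change described for RXRa.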

Description: This dataset contains Molecular Dynamics simulation trajectories of the drug gemcitabine inside the pores of the Metal-Organic Frameworks IRMOF-74-III and the functionalized OH-IRMOF-74-III. The input and output files are in GROMACS format.

Access datasets

Charge: Free with citation of publication
Contact: Zoe Cournia, Ioannis Galdadas

Description: This dataset contains Molecular Dynamics simulation trajectories of the full-length E545K oncogenic mutant of PI3Ka. These simulations are provided in four independent replicate trajectories (Run 1 and Replicates 1, 2, 3). We also provide an MD simulation of an artificial mutant in which residues E545, K382, Q546, E542 and R358 have all been mutated to alanine (ALA-mut.tar).

Access datasets

Charge: Free upon contacting the authors
Contact: Hari Leontiadou, Ioannis Galdadas, Zoe Cournia

Description: This dataset contains Molecular Dynamics simulation trajectories of the full-length wild-type (normal) PI3Ka. These simulations are provided in two independent replicate trajectories (Run 1 and Replicate 1).

Access datasets

Charge: Free upon contacting the authors
Contact: Hari Leontiadou, Ioannis Galdadas, Zoe Cournia
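
Several of the datasets above ship their input and output files in GROMACS format. As a rough illustration of what a GROMACS coordinate file (.gro) contains, here is a minimal pure-Python reader. It assumes the standard fixed-column .gro layout (title line, atom count, one line per atom, box line) and ignores optional velocity columns. The two-atom sample is hypothetical and is not taken from any of the datasets.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Atom:
    resid: int                        # residue number
    resname: str                      # residue name, e.g. "SER"
    name: str                         # atom name, e.g. "CA"
    index: int                        # atom number
    xyz: Tuple[float, float, float]   # position in nm


def parse_gro(text: str):
    """Parse the text of one .gro frame into (title, atoms, box)."""
    lines = text.splitlines()
    title = lines[0].strip()
    n_atoms = int(lines[1])
    atoms: List[Atom] = []
    for line in lines[2:2 + n_atoms]:
        # Columns: resid (5), resname (5), atom name (5), atom number (5),
        # then x, y, z as three 8-character fields.
        atoms.append(Atom(
            resid=int(line[0:5]),
            resname=line[5:10].strip(),
            name=line[10:15].strip(),
            index=int(line[15:20]),
            xyz=(float(line[20:28]), float(line[28:36]), float(line[36:44])),
        ))
    box = tuple(float(v) for v in lines[2 + n_atoms].split())
    return title, atoms, box


def format_gro_atom(resid, resname, name, index, x, y, z):
    """Write one atom line in the same fixed-column layout."""
    return "%5d%-5s%5s%5d%8.3f%8.3f%8.3f" % (resid, resname, name, index, x, y, z)


# Hypothetical sample frame, generated so the columns are exact.
SAMPLE = "\n".join([
    "Hypothetical two-atom example",
    "%5d" % 2,
    format_gro_atom(1, "SER", "N", 1, 1.0, 2.0, 3.0),
    format_gro_atom(1, "SER", "CA", 2, 1.1, 2.05, 3.02),
    "%10.5f%10.5f%10.5f" % (5.0, 5.0, 5.0),
])

if __name__ == "__main__":
    title, atoms, box = parse_gro(SAMPLE)
    print(title, len(atoms), box)
```

For real trajectory analysis, a dedicated library or the simulation package's own tools would normally be used; the sketch only shows the file structure.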

Available Biological Computational simulation of DNA and RNA data

All data will be confirmed as non-confidential in agreement with the research groups that provide them.

 

 

| Application Acronym | Regional Community Dataset | Level of preservation | Level of access | Data Type/Format |
|---|---|---|---|---|
| MD-Sim | MD trajectories of oncogenic proteins with mutations relevant to the SEEM area | Medium term | Open | Simulation |
| DICOMNetwork | Generalized statistical datasets. Patient dataset available after special permission or relevant anonymization of data. | Long term | Restricted | Image / XML, JSON |
| CNCADD | Produce and share parameter sets relevant to the community | Long term | Open | Simulation |
| PSOMI | Datasets with molecule synthesis results. | Medium term | Open | Experimental / PDB, GRO, NAMD, PSF, PDF |
| SQP-IRS | Biological dataset. Computational vs. experimental database for protein secondary structures. Crystallographic vs. spectroscopic database for selected targeted proteins | Medium term | Restricted | Simulation and experimental |
| THERMOGENOME | Datasets with data for thermodynamic stability of RNA/DNA and DNA/DNA duplexes for all transcripts, exons, introns, 5'-UTRs, 3'-UTRs for Homo sapiens (human), A. thaliana, C. elegans, D. melanogaster, D. rerio. | Long term | Restricted | Experimental |
| MDSMS | The molecular dynamics simulation of mixed systems | Medium term | Open | Simulation / PDB, TRR, XTC, DCD |
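
The application table above can also be held as structured records, which makes it easy to list, for instance, the openly accessible datasets. The following is a minimal sketch under that assumption. The `CatalogEntry` class and `select` function are hypothetical names introduced for illustration, and the values are copied from the table.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class CatalogEntry:
    acronym: str
    preservation: str   # "Medium term" or "Long term"
    access: str         # "Open" or "Restricted"
    data_type: str


# Values transcribed from the application table.
CATALOG = [
    CatalogEntry("MD-Sim", "Medium term", "Open", "Simulation"),
    CatalogEntry("DICOMNetwork", "Long term", "Restricted", "Image / XML, JSON"),
    CatalogEntry("CNCADD", "Long term", "Open", "Simulation"),
    CatalogEntry("PSOMI", "Medium term", "Open",
                 "Experimental / PDB, GRO, NAMD, PSF, PDF"),
    CatalogEntry("SQP-IRS", "Medium term", "Restricted",
                 "Simulation and experimental"),
    CatalogEntry("THERMOGENOME", "Long term", "Restricted", "Experimental"),
    CatalogEntry("MDSMS", "Medium term", "Open",
                 "Simulation / PDB, TRR, XTC, DCD"),
]


def select(entries: Iterable[CatalogEntry],
           access: Optional[str] = None,
           preservation: Optional[str] = None) -> List[str]:
    """Return acronyms of entries matching the given filters."""
    return [
        e.acronym for e in entries
        if (access is None or e.access == access)
        and (preservation is None or e.preservation == preservation)
    ]


if __name__ == "__main__":
    print("Open:", select(CATALOG, access="Open"))
    print("Restricted, long term:",
          select(CATALOG, access="Restricted", preservation="Long term"))
```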