Quantifying the content of Biomedical Semantic
Resources as a core for Drug Discovery
Platforms
Ali Hasnain and Dietrich Rebholz-Schuhmann
May 2017
Agenda
• Introduction
• Motivation
• Ontologies
– Biomedical Ontologies
– Drugs and Chemical Compound Ontologies
– Upper level Ontologies
• Data Repositories/Databases for Drug Discovery
– Gene, Gene Expression and Protein Databases
– Pathway databases
– Chemical and Structure Databases
– Disease Specific Databases for Prevention
– Literature databases
• Life Sciences Linked Open Data Cloud
– Linked Open Drug Data (LODD)
– Bio2RDF
– LinkedLifeData
• Related Work
• Conclusion
Introduction
• Biomedical data exists as ontologies, repositories, and
other open data resources, e.g. Life Science Linked
Open Data (LS-LOD), that are relevant in the context of
Drug Discovery and Cancer Chemoprevention.
• The analysis gives an overview of which resources
have to be considered and what amount of data requires
integration, and provides the opportunity to tailor
semantic solutions to specific needs in terms of size
and performance.
Motivation
• We live in a world of data

Linked Data for Cancer Chemoprevention
• Linked Data is needed because biomedical data is heterogeneous
and spread across multiple sources
[Figure: hypothesis-generation funnel. Labels: ~2000 small molecules, ~100 molecules, ~5 molecules testable in the lab, ~10 interesting pathways. Inputs: literature, in silico models, browsing databases, Linked Data.]
Heterogeneous Data – Multiple Data sources
• DrugBank
• DailyMed
• ChEBI, KEGG
• Reactome
• SIDER
• BioPAX
• Medicare
Biomedical Data Integration
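[Figure: RDF graph integrating NCBI and Reactome records for EGFR]
NCBI:
– nih:EGFR nci:has_description "epidermal growth factor receptor"
– nih:EGFR nih:sequence "CCCCGGCGCAGCGCGGCCGCAGCAGCCTCCGCCCCCCGCACGGTGTGAGCGCCCGACGCGGCCGAGGCGG …"
– nih:EGFR nih:organism "Homo sapiens"
– nih:EGFR nih:interacts nih:EGF
– nih:EGF nih:organism "Homo sapiens"
Reactome:
– rea:EGFR rea:keyword rea:Membrane
– rea:EGFR rea:keyword rea:Receptor
– rea:EGFR rea:keyword rea:Transferase
Link between the two sources:
– nih:EGFR sameAs rea:EGFR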
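The integration idea on this slide can be illustrated with a minimal sketch in plain Python. It assumes triples are stored as (subject, predicate, object) tuples and that the link between the two sources is expressed as `owl:sameAs`; the slide only says "sameAs", and the helper name `describe` is hypothetical. It is not the authors' implementation, only a toy showing how a single identity link lets one query see both sources.

```python
# Toy triple store: each triple is (subject, predicate, object).
# Identifiers mirror the slide; "owl:sameAs" as the linking predicate is an assumption.
ncbi = [
    ("nih:EGFR", "nci:has_description", "epidermal growth factor receptor"),
    ("nih:EGFR", "nih:organism", "Homo sapiens"),
    ("nih:EGFR", "nih:interacts", "nih:EGF"),
]
reactome = [
    ("rea:EGFR", "rea:keyword", "rea:Membrane"),
    ("rea:EGFR", "rea:keyword", "rea:Receptor"),
    ("rea:EGFR", "rea:keyword", "rea:Transferase"),
]
links = [("nih:EGFR", "owl:sameAs", "rea:EGFR")]


def describe(entity, triples, links):
    """Collect (predicate, object) pairs for an entity and its direct sameAs aliases."""
    aliases = {entity}
    for s, p, o in links:
        # Only direct links are followed; a real store would compute the full closure.
        if p == "owl:sameAs" and entity in (s, o):
            aliases.update((s, o))
    return sorted((p, o) for s, p, o in triples if s in aliases)


merged = describe("nih:EGFR", ncbi + reactome, links)
for predicate, obj in merged:
    print(predicate, obj)
```

Querying `nih:EGFR` returns the NCBI annotations together with the Reactome keywords, which is the practical benefit of publishing the identity link.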
Ontologies
The ontologies fall into three main categories:
1. Biomedical ontologies are mainly used by biomedical
applications and define the basic biological structures
(e.g. genes, pathways).
2. Drugs and Chemical Compound Ontologies relate
to clinical drugs and their active ingredients.
3. Upper level ontologies describe general concepts that
many biomedical ontologies share.

[Figure: Ontology spectrum by Jimeno-Yepes et al. [1]]
[1]: Antonio Jimeno-Yepes, Ernesto Jiménez-Ruiz, Rafael Berlanga, and Dietrich Rebholz-Schuhmann. Use of shared lexical resources for efficient ontological engineering. In Semantic Web Applications and Tools for Life Sciences Workshop (SWAT4LS). CEUR WS Proceedings, volume 435, pages 93–136, 2008
Biomedical Ontologies (selected)
• Advancing Clinico-Genomic Trials on Cancer (ACGT) Master Ontology (MO)
– data exchange in oncology, integration of clinical and molecular data
• Biological Pathway Exchange (BioPAX)
– metabolic, biochemical, transcription regulation, protein synthesis, signal transduction
pathways
• Experimental Factor Ontology (EFO)
– enhance and promote consistent annotation, automatic annotation to integrate external
data
• Gene Ontology (GO)
– for describing biological processes, molecular functions and cellular components of gene
products
• Medical Subject Headings (MeSH)
– hierarchical structure for indexing, cataloguing, and searching for biomedical/ health-related
data.
• Microarray Gene Expression Data Ontology (MGED)
– the biological sample, the treatment sample and the micro-array chip technology in the
experiment
• National Cancer Institute (NCI) Thesaurus
– integrates molecular and clinical cancer-related information to integrate, retrieve and relate
concepts
• Ontology for Biomedical Investigations (OBI)
– designs, protocols, instrumentation, materials, processes, data in biological & biomedical
investigations
Drugs and Chemical Compound Ontologies (selected)
• RxNorm
– standard names for clinical drugs (active drug ingredient, dosage
strength, physical form) and links
Generic and Upper Ontologies (selected)
• Basic Formal Ontology (BFO)
– formalise entities such as 3D enduring objects and comprehending
processes
• OBO Relation Ontology (RO)
– formal definitions of basic relations that cross-cut the biomedical domain
• Provenance Ontology (PROVO)
– provides classes, properties and restrictions for provenance information
Statistical overview of implementation details of Ontologies (selected)
Ontology | Category | Year* | Topic | Implementation | Classes | Properties | Individuals | Depth
ACGT-MO | Biomedical | 2008 | Cancer | OWL/CSV/RDF/XML | 1769 | 260 | 61 | 18
BioPAX | Biomedical | 2010 | Pathways | OWL/CSV/RDF/XML | 68 | 96 | 0 | 4
EFO | Biomedical | 2015 | Experimental Factors | OWL/CSV/RDF/XML | 18596 | 35 | 0 | 14
GO | Biomedical | 2016 | Genomics and Proteomics | OWL/CSV/RDF/XML | 4419 | 9 | 0 | 16
MeSH | Biomedical | 2009 | Health | RDF/TTL/CSV | 252375 | 38 | 0 | 15
MGED | Biomedical | 2009 | Microarray Experiment | OWL/CSV/RDF/XML | 233 | 121 | 698 | 8
NCIT | Biomedical | 2007 | Clinical care | OWL/CSV/RDF/XML | 118167 | 173 | 45715 | 16
OBI | Biomedical | 2008 | Experimental Data | OWL/CSV/RDF/XML | 2932 | 106 | 178 | 16
UMLS | Biomedical | 1993 | Biomedical/Health | RDF | 3221702 | - | - | -
RxNorm | Drugs | 1993 | Clinical Drugs | OWL/CSV/RDF/XML | 118555 | 46 | 0 | 0
BFO | Generic | 2003 | Genuine Upper Ontology | OWL/CSV/RDF/XML | 35 | 0 | 0 | 5
RO | Generic | 2005 | Relations used in all OBO ontologies | OWL/CSV/RDF/XML | - | - | - | -
PROVO | Generic | 2012 | PROV Data Model | OWL/CSV/RDF/XML | 30 | 50 | 4 | 3
*Statistics as of Aug 2016, as listed at BioPortal. Year specifies when the most recent version was produced. “-” means information not available.
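One way to read the ontology statistics is to compare how many classes each ontology defines per property. The sketch below is only an illustration using the class and property counts from the table; the ratio itself is not a metric proposed in the slides, and the function name is hypothetical.

```python
# (classes, properties) per ontology, copied from the statistical overview table.
# UMLS and RO are omitted (counts not available); BFO is omitted because it
# lists 0 properties, which would make the ratio undefined.
ontology_stats = {
    "ACGT-MO": (1769, 260),
    "BioPAX": (68, 96),
    "EFO": (18596, 35),
    "GO": (4419, 9),
    "MeSH": (252375, 38),
    "MGED": (233, 121),
    "NCIT": (118167, 173),
    "OBI": (2932, 106),
    "RxNorm": (118555, 46),
    "PROVO": (30, 50),
}


def classes_per_property(stats):
    """Return (name, ratio) pairs sorted from most to fewest classes per property."""
    ratios = [(name, classes / props) for name, (classes, props) in stats.items()]
    return sorted(ratios, key=lambda pair: pair[1], reverse=True)


ranked = classes_per_property(ontology_stats)
for name, ratio in ranked:
    print(f"{name:8s} {ratio:10.1f}")
```

Large terminologies such as MeSH and RxNorm end up at the top of the ranking, while relation-heavy models such as BioPAX and PROVO have fewer classes than properties.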

[Figure: Classes vs. Properties plot (selected ontologies). Series: classes, properties. Ontologies shown: ACGT-MO, BioPAX, EFO, GO, MeSH, MGED, NCIT, OBI, UMLS, RxNorm, BFO, RO, PROVO. The plotted values are the class and property counts from the statistical overview of ontologies.]
Public Data Repositories for Drug Discovery
• The databases are separated into the following
categories:
– Gene, Gene Expression and Protein Databases for
gene and protein annotations as well as the expression
levels and related clinical data.
– Pathway Databases denoting the protein interactions and
the overall functional outcomes.
– Chemical and Structure Databases including Biological
Activities for the information related to drugs and other
chemicals including also toxicity observations and clinical
trials.
– Disease Specific Databases for Prevention which
deliver content specific to the prevention of cancer.
– Literature Databases
Gene, Gene Expression and Protein Databases
• GenBank
– over 65 B nucleotide bases in more than 61 M sequences
• ArrayExpress
– 65060 experiments, 1'973'776 assays; annotated data for gene
expression from biological experiments
• Gene Expression Omnibus (GEO)
– 3'848 datasets; gene expression for specific studies
• Universal Protein Resource (UniProt)
– 63'686'057 sequences, 21'364'768'379 amino acids;
classifications, cross-references, annotation of proteins
• Protein Data Bank (PDB)
– 118280 biological structures; evidence of experimentally
validated protein structures
• Protein Database
– 30'047 protein entries, 41'327 PPIs; translated coding regions
from GenBank, TPA, SwissProt, PIR, PRF, UniProt and PDB
Pathway Databases
• Kyoto Encyclopedia of Genes and Genomes (KEGG)
– 432'883 pathway maps, 153'776 hierarchies; genome
sequencing and high-throughput experimental technologies
• Reactome
– 9'386 proteins; pathway data for signalling,
transcriptional regulation, translation, apoptosis and others
• WikiPathways
– 2'475 pathways; complementing e.g. KEGG, Reactome,
Pathway Commons
• cPath: Pathway Database Software
– 31'698 pathways, 1'151'476 interactions; pathway
visualisation, analysis and modelling

Chemical and Structure Databases including
Biological Activities
• Chemical Compounds Database (Chembase)
– 150'000 pages; compounds, their physical and chemical properties, mass spectra
• Chemical Entities of Biological Interest (ChEBI)
– 48'296 compounds; natural and synthetic atom, molecule, ion, radical, conformer
• DrugBank
– 8,261 drugs, 4,164 targets, 243 enzymes, 118 transporters; drug (chemical,
pharmaceutical) and drug target (sequence, structure, pathway) data
• PubChem
– 89'124'401 compounds; compound neighbouring, sub/superstructure, bioactivity
data
• Aggregated Computational Toxicology Resource (ACToR)
– more than 500 public sources; environmental chemicals searchable by name and
structure
• ClinicalTrials
– 213'868 studies; offers information for locating clinical trials for diseases and
conditions
• TOXicology Data NETwork (TOXNET)
– toxicology, hazardous chemicals, environmental health and related areas
Disease Specific Databases for Prevention
• Colon Chemoprevention Agents Database (CCAD)
– 1,137 agents and literature data for colon chemoprevention in humans, rats,
mice
• Dietary Supplements Labels Database
– 5'000 brands of dietary supplements to compare label ingredients in different
brands. Links to other databases such as MedlinePlus and PubMed
• REPAIRtoire Database
– DNA damage links, pathways, proteins for DNA repair, diseases related to
mutations
Literature Databases
• PubMed
– journal citations, i.e. the primary source of information for biomedical researchers
• PubMed Dietary Supplement Subset
– dietary supplement literature including vitamin, mineral, botanical/herbal
supplements
Statistical overview of implementation details of libraries and databases (selected)
Database | Category | Year* | Topic | Implementation | Size/Stats
PubMed | Literature | 1996 | Biomedical Literature | WebBased/CSV | 11 M journal citations
PDSS | Literature | 1999 | Citations of dietary supplement | WebBased | X
DSLD | Chemoprevention | 2013 | Ingredients of dietary supplement | WebBased | > 5000 selected brands
ClinicalTrials | Toxicity | 2000 | Clinical Trials | WebBased | 213,868 studies
TOXNET | Toxicity | 1987 | Toxicology Database | WebBased | X
ACToR | Compound | 2008 | Chemical Toxicity Data | WebBased | >500 public sources
DrugBank | Compound | 2008 | Drug Data | WebBased/LOD | 8206 drugs
ChEBI | Compound | X | Small Molecular entities | WebBased/LOD | 48,296 compounds
PubChem | Compound | 2004 | Compound Structure | WebBased/LOD | 89,124,401 compounds
ChemSpider | Chemical | 2007 | Compound Structure | WebBased | >40 million structures
KEGG | Pathway | 1995 | Genomic, Chemical, systemic | WebBased/LOD | 432883 pathway maps
Reactome | Pathway | 2003 | Pathways | WebBased | 9386 proteins
WikiPathways | Pathway | 2007 | Biological pathways | WebBased | 2475 pathways
cPath | Pathway | 2005 | Biological pathways | Desktop/WebBased | 31698 pathways
UniProt | Protein | 2002 | Protein Sequence | WebBased/LOD | 63686057 sequences
PDB | Protein | 1971 | 3D structural data of Proteins | WebBased/LOD | 30,047 protein
*Statistics as of Aug 2016. Year specifies when the most recent version was produced. “X” means information not available.
Life Sciences Linked Open Data Cloud
• Linked biomedical datasets relevant in a Cancer Chemoprevention
and drug discovery scenario:
– Linked Open Drug Data (LODD)
• A set of linked datasets relevant to Drug Discovery, including
DrugBank, LinkedCT, DailyMed, Diseasome, SIDER, STITCH,
Medicare, RxNorm, ClinicalTrials.gov, NCBI Entrez Gene and OMIM.
– Bio2RDF
• An open-source project that uses Semantic Web technologies to build
and provide the largest network of Linked Data for the Life Sciences.
It contains multiple linked biological databases, including pathway
databases (e.g. KEGG), PDB and several NCBI databases.
– LinkedLifeData
• A semantic data integration platform for the biomedical domain
containing 5 billion RDF statements from various sources including
UniProt, PubMed, EntrezGene and 20 more.

The Linked Open Data Cloud
“Life sciences will drive adoption of the Semantic Web, just as high-energy physics
drove the early Web.”
- Sir Tim Berners-Lee, 2005
[Figure: the Linked Open Data Cloud, highlighting proteins, molecules, genes and diseases]
Meaningful Biomedical Correlation
[Figure: LS-LOD datasets grouped under four concepts (:Protein, :Molecule, :Gene, :Disease). Datasets shown: UniProt, PDB, Pfam, PROSITE, ProDom, UniRef, UniParc, DailyMed, DrugBank, ChEMBL, PubChem, KEGG, Gene Ontology, GeneID, Affymetrix, HomoloGene, MGI, Diseasome, SIDER.]
Statistical overview of datasets involved in LS-LOD, Bio2RDF and LLD (selected)
Dataset | Category | Year* | Topic | Size/Coverage
DrugBank | LODD | 2010 | Drugs | 766920 triples, 4800 drugs
LinkedCT | LODD | X | Clinical Trials | 25 M triples, 106000 trials
DailyMed | LODD | 2010 | Drugs | 1604983 triples, >36K products
DBpedia | LODD | 2009 | Drugs/Diseases/Proteins | 218M triples, 2300 drugs, 2200 proteins
Diseasome | LODD | 2010 | Diseases/Genes | 91182 triples, 2600 genes
SIDER | LODD | 2010 | Diseases/Side Effects | 192515 triples, 63K effects, 1737 genes
STITCH | LODD | 2010 | Chemicals/Proteins | 7.5 M chemicals, 0.5 M proteins
ChEMBL | LODD | 2010 | Assay/Proteins/Organisms | 130 M triples
Affymetrix | Bio2RDF | 2014 | Microarrays | 8694237 triples, 6679943 entities
BioModels | Bio2RDF | 2014 | Biological/mathematical models | 2380009 triples, 188308 entities
BioPortal | Bio2RDF | 2014 | Biological/biomedical entities | 19920395 triples, 2199594 entities
KEGG | Bio2RDF | 2014 | Genes | 50197150 triples, 6533307 entities
PharmGKB | Bio2RDF | 2014 | Genotypes/Phenotypes | 278049209 triples, 25325504 entities
PubMed | Bio2RDF | 2014 | Citations | 5005343905 triples, 412593720 entities
Taxonomy | Bio2RDF | 2014 | Taxonomy | 21310356 triples, 1147211 entities
LLD | LLD | 2014 | Drugs, Chromosomes | 10192641644 statements
*Statistics as of Aug 2016 (source: DataHub). Year specifies when the most recent version was produced. “X” means information not available.
Triples vs. Unique Entities (selected LS-LOD datasets)
Dataset | # of triples | # of unique entities
affymetrix | 86942371 | 6679943
biomodels | 2380009 | 188380
bioportal | 19920395 | 2199594
chembl | 409942525 | 50061452
clinicaltrials | 98835804 | 7337123
ctd | 326720894 | 19768641
dbsnp | 8801487 | 530538
drugbank | 3672531 | 316950
genage | 73048 | 6995
gendr | 11663 | 1129
goa | 97520151 | 5950074
hgnc | 3628205 | 372136
homologene | 7189769 | 869985
interpro | 2323345 | 176579
iproclass | 3306107223 | 364255265
irefindex | 48781511 | 3110993
kegg | 50197150 | 6533307
linkedspl | 2174579 | 59776
lsr | 55914 | 5032
mesh | 7323864 | 305401
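To judge what amount of data a platform would need to integrate, the triple and entity counts can be summed over the datasets a use case actually needs. The sketch below is a rough illustration only: the figures are read off the "Triples vs. Unique Entities" chart in dataset order, and both the choice of four datasets and the function name are hypothetical.

```python
# (triples, unique entities) for a few LS-LOD datasets, read off the
# "Triples vs. Unique Entities" chart in dataset order.
dataset_stats = {
    "chembl": (409942525, 50061452),
    "clinicaltrials": (98835804, 7337123),
    "drugbank": (3672531, 316950),
    "kegg": (50197150, 6533307),
}


def integration_size(selected, stats):
    """Sum triples and unique entities over the selected datasets."""
    triples = sum(stats[name][0] for name in selected)
    entities = sum(stats[name][1] for name in selected)
    return triples, entities


# Hypothetical selection for a drug-discovery use case.
selected = ["drugbank", "kegg", "chembl", "clinicaltrials"]
total_triples, total_entities = integration_size(selected, dataset_stats)
print(f"{total_triples:,} triples over {total_entities:,} entities")
print(f"{total_triples / total_entities:.1f} triples per entity on average")
```

The entity total is an upper bound, since entities shared between datasets would be counted once per dataset.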

Related Work (selected)
• Zeginis et al. [2] proposed a “meet-in-the-middle” approach to develop
the semantic model relevant for cancer chemoprevention. Relevant
data was analysed bottom-up, starting from the domain, whereas a
top-down approach was used to collect ontologies, vocabularies
and data models.
• Hasnain et al. [3] proposed the Linked Biomedical Dataspace (to access
and use biomedical resources relevant for cancer chemoprevention)
with four components:
– a) knowledge extraction,
– b) link creation,
– c) query execution and
– d) knowledge publishing.
[2]: Zeginis, D., et al.: A collaborative methodology for developing a semantic model for interlinking Cancer Chemoprevention linked-data sources. Semantic Web (2013)
[3]: Hasnain, A., et al.: Linked Biomedical Dataspace: Lessons Learned integrating Data for Drug Discovery. In: International Semantic Web Conference (In-Use Track), October 2014
Conclusion
• In this paper we introduce and classify different tiers of biomedical data
relevant to the Cancer Chemoprevention and Drug Discovery domain.
• This involves ontologies, databases and Life Science Linked Open Data in
the Healthcare, Life Sciences and Biomedical domains.
• We classify ontologies into three main classes:
– i) Biomedical Ontologies (e.g. EFO, OBI, GO),
– ii) Drugs and Chemical Compound Ontologies (e.g. RxNorm) and
– iii) Generic and Upper Ontologies (e.g. BFO, RO, PROV).
• Similarly, we categorise libraries and databases into five categories:
– (i) Gene, Gene Expression and Protein Databases,
– (ii) Pathway databases,
– (iii) Chemical and Structure Databases including Biological Activities,
– (iv) Disease Specific Databases for Prevention, and
– (v) Literature databases.
Thank you

Quantifying the content of biomedical semantic resources as a core for drug discovery platforms

  • 1. Quantifying the content of Biomedical Semantic Resources as a core for Drug Discovery Platforms Ali Hasnain and Dietrich Rebholz-Schuhmann May 2017
  • 2. Agenda • Introduction • Motivation • Ontologies • Biomedical Ontologies • Drugs and Chemical Compound Ontologies • Upper level Ontologies • Data Repositories/ Databases for Drug Discovery • Gene, Gene Expression and Protein Databases • Pathway databases • Chemical and Structure Databases • Disease Specific Databases for Prevention • Literature databases • Life Sciences Linked Open Data Cloud • Linked Open Drug Data (LODD) • Bio2RDF • LinkedLifeData • Related Work • Conclusion 2
  • 3. Introduction • Biomedical data exists as ontologies, repositories, and other open data resources, e.g. Life Science Linked Open Data (LS-LOD), relevant in the context of Drug Discovery and Cancer Chemoprevention. • The analysis gives an overview of which resources have to be considered and how much data requires integration, and provides the opportunity to tailor semantic solutions to specific needs in terms of size and performance.
  • 4. Motivation • We live in a world of data
  • 5. Linked Data for Cancer Chemoprevention • Because biomedical data is heterogeneous and spread across multiple sources. [Figure: hypothesis generation over Linked Data, drawing on literature, in silico models and browsed databases; stage labels: ~2000 small molecules, ~100 molecules, ~10 interesting pathways, ~5 molecules testable in the lab]
  • 6. Heterogeneous Data – Multiple Data Sources • DrugBank, DailyMed, ChEBI, KEGG, Reactome, SIDER, BioPAX, Medicare
  • 7. Biomedical Data Integration [Figure: RDF graph with two linked records. NCBI: nih:EGFR with nci:has_description "epidermal growth factor receptor", nih:organism Homo sapiens, nih:sequence CCCCGGCGCAGCGCGGCCGCAGCA GCCTCCGCCCCCCGCACGGTGTGA GCGCCCGACGCGGCCGAGGCGG …, and nih:interacts nih:EGF. Reactome: rea:EGFR with rea:keyword rea:Membrane, rea:Receptor and rea:Transferase. The two EGFR nodes are connected by sameAs.]
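The integration idea on this slide, two sources describing the same entity and joined through a sameAs link, can be sketched in a few lines of plain Python. This is only an illustrative sketch: the triples are transcribed from the slide's labels, the exact edge layout is an assumption, and `merge_by_same_as` is a hypothetical helper rather than part of any library or of the authors' tooling.

```python
from collections import defaultdict

# Triples transcribed from the slide's example (edge layout is an assumption).
NCBI = [
    ("nih:EGFR", "nci:has_description", "epidermal growth factor receptor"),
    ("nih:EGFR", "nih:organism", "Homo sapiens"),
    ("nih:EGFR", "nih:interacts", "nih:EGF"),
]
REACTOME = [
    ("rea:EGFR", "rea:keyword", "rea:Membrane"),
    ("rea:EGFR", "rea:keyword", "rea:Receptor"),
    ("rea:EGFR", "rea:keyword", "rea:Transferase"),
]
# sameAs links between identifiers that denote the same entity.
SAME_AS = [("nih:EGFR", "rea:EGFR")]


def merge_by_same_as(triples, same_as):
    """Group triples by subject after mapping sameAs-linked identifiers to one name.

    Hypothetical helper: it handles simple pairs only, not chains of sameAs links.
    """
    canonical = {}
    for left, right in same_as:
        root = canonical.get(left, left)
        canonical[left] = root
        canonical[right] = root
    merged = defaultdict(set)
    for s, p, o in triples:
        merged[canonical.get(s, s)].add((p, canonical.get(o, o)))
    return merged


merged = merge_by_same_as(NCBI + REACTOME, SAME_AS)
for predicate, value in sorted(merged["nih:EGFR"]):
    print(predicate, value)
```

After the merge, the NCBI description and the Reactome keywords sit under a single identifier, which is the effect the sameAs link is meant to achieve. A production setup would more likely use an RDF store and SPARQL than in-memory tuples.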
  • 8. Ontologies • These ontologies fall into three main categories: (1) Biomedical ontologies are mainly used by biomedical applications and define the basic biological structures (e.g. genes, pathways, etc.). (2) Drugs and Chemical Compound ontologies relate to clinical drugs and their active ingredients. (3) Upper-level ontologies describe general concepts that many biomedical ontologies share.
  • 9. Ontology spectrum by Jimeno et al. [1] • [1]: Antonio Jimeno-Yepes, Ernesto Jiménez-Ruiz, Rafael Berlanga, and Dietrich Rebholz-Schuhmann. Use of shared lexical resources for efficient ontological engineering. In Semantic Web Applications and Tools for Life Sciences Workshop (SWAT4LS), CEUR WS Proceedings, volume 435, pages 93–136, 2008.
  • 10. Biomedical Ontologies (selected) • Advancing Clinico-Genomic Trials on Cancer (ACGT) Master Ontology (MO) – data exchange in oncology, integration of clinical and molecular data • Biological Pathway Exchange (BioPAX) – metabolic, biochemical, transcription regulation, protein synthesis, signal transduction pathways • Experimental Factor Ontology (EFO) – enhance and promote consistent annotation, automatic annotation to integrate external data • Gene Ontology (GO) – for describing biological processes, molecular functions and cellular components of gene products • Medical Subject Headings (MeSH) – hierarchical structure for indexing, cataloguing, and searching for biomedical/health-related data • Microarray Gene Expression Data Ontology (MGED) – the biological sample, the treatment sample and the microarray chip technology in the experiment • National Cancer Institute (NCI) Thesaurus – integrates molecular and clinical cancer-related information to integrate, retrieve and relate concepts • Ontology for Biomedical Investigations (OBI) – designs, protocols, instrumentation, materials, processes, data in biological & biomedical investigations
  • 11. Drugs and Chemical Compound Ontologies (selected)
    • RxNorm – standard names for clinical drugs (active drug ingredient, dosage strength, physical form) and links
    Generic and Upper Ontologies (selected)
    • Basic Formal Ontology (BFO) – formalise entities such as 3D enduring objects and comprehending processes
    • OBO Relation Ontology (RO) – formal definitions of basic relations that cross-cut the biomedical domain
    • Provenance Ontology (PROVO) – provides classes, properties and restrictions for provenance information
  • 12. Statistical overview of implementation details of Ontologies (selected)
    Ontology | Category | Year* | Topic | Implementation | Classes | Properties | Individuals | Depth
    ACGT-MO | Biomedical | 2008 | Cancer | OWL/CVC/RDF/XML | 1769 | 260 | 61 | 18
    BioPAX | Biomedical | 2010 | Pathways | OWL/CVC/RDF/XML | 68 | 96 | 0 | 4
    EFO | Biomedical | 2015 | Experimental Factors | OWL/CVC/RDF/XML | 18596 | 35 | 0 | 14
    GO | Biomedical | 2016 | Genomics and Proteomic | OWL/CVC/RDF/XML | 4419 | 9 | 0 | 16
    MeSH | Biomedical | 2009 | Health | RDF/TTL/CSV | 252375 | 38 | 0 | 15
    MGED | Biomedical | 2009 | Microarray Experiment | OWL/CVC/RDF/XML | 233 | 121 | 698 | 8
    NCIT | Biomedical | 2007 | Clinical care | OWL/CVC/RDF/XML | 118167 | 173 | 45715 | 16
    OBI | Biomedical | 2008 | Experimental Data | OWL/CVC/RDF/XML | 2932 | 106 | 178 | 16
    UMLS | Biomedical | 1993 | Biomedical/Health | RDF | 3221702 | - | - | -
    RxNorm | Drugs | 1993 | Clinical Drugs | OWL/CVC/RDF/XML | 118555 | 46 | 0 | 0
    BFO | Generic | 2003 | Genuine Upper Ontology | OWL/CVC/RDF/XML | 35 | 0 | 0 | 5
    RO | Generic | 2005 | Relations used in all OBO ontologies | OWL/CVC/RDF/XML | - | - | - | -
    PROVO | Generic | 2012 | PROV Data Model | OWL/CVC/RDF/XML | 30 | 50 | 4 | 3
    *Statistics as of Aug 2016, as listed at BioPortal. Year specifies when the most recent version was produced. “-” means information not available.
  • 13. Classes vs. Properties plot (Selected Ontologies)
    [Figure: bar chart comparing the number of classes and the number of properties for ACGT-MO, BioPAX, EFO, GO, MeSH, MGED, NCIT, OBI, UMLS, RxNorm, BFO, RO and PROVO; the values are those listed in the statistical overview on slide 12.]
  • 14. Public Data Repositories for Drug Discovery
    • The databases are separated into the following categories:
      – Gene, Gene Expression and Protein Databases for gene and protein annotations as well as the expression levels and related clinical data.
      – Pathway Databases denoting the protein interactions and the overall functional outcomes.
      – Chemical and Structure Databases including Biological Activities for the information related to drugs and other chemicals, including toxicity observations and clinical trials.
      – Disease Specific Databases for Prevention, which deliver content specific to the prevention of cancer.
      – Literature Databases
  • 15. Gene, Gene Expression and Protein Databases
    • GenBank – over 65 B nucleotide bases in more than 61 M sequences
    • ArrayExpress – 65060 experiments, 1'973'776 assays; annotated data for gene expression from biological experiments
    • Gene Expression Omnibus (GEO) – 3'848 datasets; gene expression for specific studies
    • Universal Protein Resource (UniProt) – 63'686'057 sequences, 21'364'768'379 amino acids; classifications, cross-references, annotation of proteins
    • Protein Data Bank (PDB) – 118280 biological structures; evidence of experimentally validated protein structures
    • Protein Database – 30'047 protein entries, 41'327 PPIs; translated coding regions from GenBank, TPA, SwissProt, PIR, PRF, UniProt and PDB
  • 16. Pathway Databases
    • Kyoto Encyclopedia of Genes and Genomes (KEGG) – 432'883 pathway maps, 153'776 hierarchies; genome sequencing and high-throughput experimental technologies
    • Reactome – 9'386 proteins and pathway data for signalling, transcriptional regulation, translation, apoptosis and others
    • Wikipathways – 2'475 pathways, complementing e.g. KEGG, Reactome, Pathway Commons
    • cPath: Pathway Database Software – 31'698 pathways, 1'151'476 interactions; pathway visualisation, analysis and modelling
  • 17. Chemical and Structure Databases including Biological Activities
    • Chemical Compounds Database (Chembase) – 150'000 pages; compounds, their physical and chemical properties, mass spectra
    • Chemical Entities of Biological Interest (ChEBI) – 48'296 compounds; natural and synthetic atom, molecule, ion, radical, conformer
    • DrugBank – 8,261 drugs, 4,164 targets, 243 enzymes, 118 transporters; drug (chemical, pharmaceutical), drug target (sequence, structure, pathway)
    • PubChem – 89'124'401 compounds; compound neighbouring, sub/superstructure, bioactivity data
    • Aggregated Computational Toxicology Resource (ACToR) – more than 500 public sources; environmental chemicals searchable by name and structure
    • ClinicalTrials – 213'868 studies; offers information for locating clinical trials for diseases and conditions
    • TOXicology Data NETwork (TOXNET) – toxicology, hazardous chemicals, environmental health and related areas
  • 18. Disease Specific Databases for Prevention
    • Colon Chemoprevention Agents Database (CCAD) – 1,137 agents and literature data for colon chemoprevention in humans, rats, mice
    • Dietary Supplements Labels Database – 5'000 brands of dietary supplements, to compare label ingredients in different brands; links to other databases such as MedlinePlus and PubMed
    • REPAIRtoire Database – DNA damage links, pathways, proteins for DNA repair, diseases related to mutations
    Literature Databases
    • PubMed – journal citations, i.e. the primary source of information for biomedical researchers
    • PubMed Dietary Supplement Subset – dietary supplement literature including vitamin, mineral, botanical/herbal supplements
  • 19. Statistical overview of implementation details of libraries and databases (selected)
    Database | Category | Year* | Topic | Implementation | Size/Stats
    PubMed | Literature | 1996 | Biomedical Literature | WebBased/CSV | 11 M journal citations
    PDSS | Literature | 1999 | Citations of dietary supplement | WebBased | X
    DSLD | Chemoprevention | 2013 | Ingredients of dietary supplement | WebBased | >5000 selected brands
    ClinicalTrials | Toxicity | 2000 | Clinical Trials | WebBased | 213,868 studies
    TOXNET | Toxicity | 1987 | Toxicology Database | WebBased | X
    ACToR | Compound | 2008 | Chemical Toxicity Data | WebBased | >500 public sources
    DrugBank | Compound | 2008 | Drug Data | WebBased/LOD | 8206 drugs
    ChEBI | Compound | X | Small Molecular entities | WebBased/LOD | 48,296 compounds
    PubChem | Compound | 2004 | Compound Structure | WebBased/LOD | 89,124,401 compounds
    ChemSpider | Chemical | 2007 | Compound Structure | WebBased | >40 million structures
    KEGG | Pathway | 1995 | Genomic, Chemical, systemic | WebBased/LOD | 432883 pathway maps
    Reactome | Pathway | 2003 | Pathways | WebBased | 9386 proteins
    Wikipathway | Pathway | 2007 | Biological pathways | WebBased | 2475 pathways
    cPath | Pathway | 2005 | Biological pathways | Desktop/WebBased | 31698 pathways
    Uniprot | Protein | 2002 | Protein Sequence | WebBased/LOD | 63686057 sequences
    PDB | Protein | 1971 | 3D structural data of Proteins | WebBased/LOD | 30,047 protein
    *Statistics as of Aug 2016. Year specifies when the most recent version was produced. “X” means information not available.
  • 20. Life Sciences Linked Open Data Cloud
    • Linked biomedical datasets relevant in a Cancer Chemoprevention and drug discovery scenario:
      – Linked Open Drug Data (LODD)
        • A set of linked datasets relevant to Drug Discovery, including DrugBank, LinkedCT, DailyMed, Diseasome, SIDER, STITCH, Medicare, RxNorm, ClinicalTrials.gov, NCBI Entrez Gene and OMIM.
      – Bio2RDF
        • An open-source project that uses Semantic Web technologies to build and provide the largest network of Linked Data for the Life Sciences. It contains multiple linked biological databases, including pathway databases such as KEGG, plus PDB and several NCBI databases.
      – LinkedLifeData
        • A semantic data integration platform for the biomedical domain containing 5 billion RDF statements from various sources including UniProt, PubMed, EntrezGene and 20 more.
  • 21. The Linked Open Data Cloud
    “Life sciences will drive adoption of the Semantic Web, just as high-energy physics drove the early Web.” - Sir Tim Berners-Lee, 2005
    [Figure: the Linked Open Data cloud, with regions labelled Proteins, Molecules, Genes and Diseases.]
  • 22. Meaningful Biomedical Correlation
    [Figure: datasets from the Linked Open Data cloud mapped to four concepts (:Protein, :Molecule, :Gene, :Disease). Datasets shown: UniProt, PDB, Pfam, PROSITE, ProDom, UniRef, UniParc, DailyMed, DrugBank, ChEMBL, PubChem, KEGG, Gene Ontology, GeneID, Affymetrix, HomoloGene, MGI, Diseasome, SIDER.]
  • 23. Statistical overview of datasets involved in LS-LOD, Bio2RDF and LLD (selected)
    Dataset | Category | Year* | Topic | Size/Coverage
    DrugBank | LODD | 2010 | Drugs | 766920 triples, 4800 drugs
    LinkedCT | LODD | X | Clinical Trials | 25 M triples, 106000 trials
    DailyMed | LODD | 2010 | Drugs | 1604983 triples, >36K products
    DBpedia | LODD | 2009 | Drugs/Diseases/Proteins | 218 M triples, 2300 drugs, 2200 proteins
    Diseasome | LODD | 2010 | Diseases/Genes | 91182 triples, 2600 genes
    SIDER | LODD | 2010 | Diseases/Side Effects | 192515 triples, 63K effects, 1737 genes
    STITCH | LODD | 2010 | Chemicals/Proteins | 7.5 M chemicals, 0.5 M proteins
    ChEMBL | LODD | 2010 | Assay/Proteins/Organisms | 130 M triples
    Affymetrix | Bio2RDF | 2014 | Microarrays | 8694237 triples, 6679943 entities
    BioModels | Bio2RDF | 2014 | Biological/mathematical models | 2380009 triples, 188308 entities
    BioPortal | Bio2RDF | 2014 | Biological/biomedical entities | 19920395 triples, 2199594 entities
    KEGG | Bio2RDF | 2014 | Genes | 50197150 triples, 6533307 entities
    PharmGKB | Bio2RDF | 2014 | Genotypes/Phenotypes | 278049209 triples, 25325504 entities
    PubMed | Bio2RDF | 2014 | Citations | 5005343905 triples, 412593720 entities
    Taxonomy | Bio2RDF | 2014 | Taxonomy | 21310356 triples, 1147211 entities
    LLD | LLD | 2014 | Drugs, Chromosomes | 10192641644 statements
    *Statistics as of Aug 2016 (source: DataHub). Year specifies when the most recent version was produced. “X” means information not available.
  • 24. Triples vs. Unique Entities (selected LS-LOD datasets)
    [Figure: bar chart of the number of triples and the number of unique entities per dataset. Values as plotted, in dataset order:]
    Dataset | # of triples | # of unique entities
    affymetrix | 86942371 | 6679943
    biomodels | 2380009 | 188380
    bioportal | 19920395 | 2199594
    chembl | 409942525 | 50061452
    clinicaltrials | 98835804 | 7337123
    ctd | 326720894 | 19768641
    dbsnp | 8801487 | 530538
    drugbank | 3672531 | 316950
    genage | 73048 | 6995
    gendr | 11663 | 1129
    goa | 97520151 | 5950074
    hgnc | 3628205 | 372136
    homologene | 7189769 | 869985
    interpro | 2323345 | 176579
    iproclass | 3306107223 | 364255265
    irefindex | 48781511 | 3110993
    kegg | 50197150 | 6533307
    linkedspl | 2174579 | 59776
    lsr | 55914 | 5032
    mesh | 7323864 | 305401
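One way to read the triples and unique-entity counts on slide 24 together is as an average number of statements per entity. The sketch below is an illustration rather than part of the original analysis: the four (triples, entities) pairs are copied from the slide under the assumption that the plotted values pair up in dataset order, and the function name `triples_per_entity` is hypothetical.

```python
# (number of triples, number of unique entities) for four datasets on slide 24.
# Assumption: the plotted values pair up in the order the datasets are listed.
STATS = {
    "kegg": (50197150, 6533307),
    "drugbank": (3672531, 316950),
    "mesh": (7323864, 305401),
    "iproclass": (3306107223, 364255265),
}


def triples_per_entity(stats):
    """Return the average number of triples per unique entity for each dataset."""
    return {name: triples / entities for name, (triples, entities) in stats.items()}


density = triples_per_entity(STATS)

# Print the datasets from most to least densely described.
for name, value in sorted(density.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:10s} {value:6.1f}")
```

A ratio like this gives a rough sense of how richly each entity is described, which may matter more than raw size when estimating the integration effort for a dataset.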
  • 25. Related Work (selected)
    • Zeginis et al. [2] proposed a “meet-in-the-middle” approach to develop the semantic model relevant for cancer chemoprevention. Relevant data was analysed in a bottom-up fashion from analysing the domain, whereas a top-down approach was considered to collect ontologies, vocabularies and data models.
    • Hasnain et al. [3] proposed the Linked Biomedical Dataspace (access and use of biomedical resources relevant for cancer chemoprevention) with the following components:
      – a) knowledge extraction,
      – b) link creation,
      – c) query execution and
      – d) knowledge publishing.
    [2]: Zeginis, D., et al.: A collaborative methodology for developing a semantic model for interlinking Cancer Chemoprevention linked-data sources. Semantic Web (2013)
    [3]: Hasnain, A., et al.: Linked Biomedical Dataspace: Lessons Learned integrating Data for Drug Discovery. In: International Semantic Web Conference (In-Use Track), October 2014 (2014)
  • 26. Conclusion
    • In this paper we introduce and classify different tiers of biomedical data relevant to the Cancer Chemoprevention and Drug Discovery domain.
    • This involves ontologies, databases and Life Science Linked Open Data in the Healthcare, Life Sciences and Biomedical domains.
    • We classify ontologies into three main classes:
      – i) Biomedical Ontologies (e.g. EFO, OBI, GO),
      – ii) Drugs and Chemical Compound Ontologies (e.g. RxNorm) and
      – iii) Generic and Upper Ontologies (e.g. BFO, RO, PROV).
    • Similarly, we categorise libraries and databases into five categories:
      – (i) Gene, Gene Expression and Protein Databases,
      – (ii) Pathway databases,
      – (iii) Chemical and Structure Databases including Biological Activities,
      – (iv) Disease Specific Databases for Prevention, and
      – (v) Literature databases.

Editor's Notes

  1. We live in a world of data!
  2. Link to next slide: Linked Data facilitates the assembly of complex queries and workflows.
  3. To discover which links our datasets could have to other data sources, we explored what types of data are published in the Linked Open Data cloud. What we found was a lot of messy data: looking at 8 datasets containing molecular data, their descriptions are very different. ChEBI calls molecules compounds, DrugBank calls them drugs, and DailyMed calls them drugs as well but uses a different identifier. Link: how do we start linking all of these datasets so that they can be made available in a unified query interface?
  4. EGFR: Epidermal growth factor receptor
  5. Biomedical Ontologies: Advancing Clinico-Genomic Trials on Cancer (ACGT) Master Ontology (MO); Biological Pathway Exchange (BioPAX); Experimental Factor Ontology (EFO); Gene Ontology (GO); Medical Subject Headings (MeSH); Microarray Gene Expression Data Ontology (MGED); National Cancer Institute (NCI) Thesaurus; Ontology for Biomedical Investigations (OBI); Unified Medical Language System (UMLS).
     Drugs and Chemical Compound Ontologies: RxNorm.
     Generic and Upper Ontologies: Basic Formal Ontology (BFO); OBO Relation Ontology (RO); Provenance Ontology (PROVO).
  6. Literature Databases: PubMed; PubMed Dietary Supplement Subset.
     Natural Sources of Chemoprevention Agents Databases: Dietary Supplements Labels Database.
     Toxicity and Efficacy Databases: ClinicalTrials; TOXicology Data NETwork (TOXNET).
     Biological Activity of Compounds Databases: Aggregated Computational Toxicology Resource (ACToR); DrugBank; Chemical Entities of Biological Interest (ChEBI); PubChem; REPAIRtoire Database.
     Gene Expression Databases: Cancer Gene Expression Database (CGED); ArrayExpress; Gene Expression Omnibus (GEO).
     Gene and DNA Databases: GenBank.
     Chemical and Physical Structure Databases: ChemSpider; Chemical Compounds Database (Chembase); Sigma-Aldrich; ChemDB.
     Disease Specific Compound Databases: Colon Chemoprevention Agents Database.
     Pathway Databases: Kyoto Encyclopedia of Genes and Genomes (KEGG); Reactome; Wikipathways; cPath: Pathway Database Software.
     Protein Databases: Universal Protein Resource (UniProt); Protein Data Bank (PDB); Protein Database.
  7. M: when data is catalogued, we can discover new links by cross-referencing with existing datasets -> once we identify these concepts, how do we actually query them together?