The biomedical research community is making large-scale data sources available to enable knowledge discovery, either from the data alone or from novel scientific experiments combined with existing knowledge. Increasingly, Semantic Web technologies such as ontologies and triple stores, and combinations thereof, are being developed and used. Both the amount and the complexity of the data are constantly increasing. Since the data sources are publicly available, their content can be quantified, giving an overview of what is accessible as well as of the state of the data representation relative to the existing content. For a better understanding of the existing data resources, i.e. to judge the distribution of data triples across concepts, data types and primary providers, we have performed a comprehensive analysis that delivers an overview of the content accessible to Semantic Web solutions. The analysis shows that information related to genes, proteins and chemical entities forms the core, whereas content related to diseases and pathways forms a smaller portion. Further data relates to dietary content and to specific questions such as cancer prevention and the toxicological effects of drugs.
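The triple-distribution analysis described above can be sketched as a simple aggregation over typed triples. The triples, prefixes and class names below are illustrative toy data, not the surveyed sources, which in practice would be streamed from public SPARQL endpoints or RDF dumps:

```python
from collections import Counter

# Toy RDF-style triples (subject, predicate, object); identifiers are
# illustrative placeholders, not curated assertions.
triples = [
    ("uniprot:P04637", "rdf:type", "core:Protein"),
    ("hgnc:11998", "rdf:type", "core:Gene"),
    ("chebi:28177", "rdf:type", "core:ChemicalEntity"),
    ("uniprot:P38398", "rdf:type", "core:Protein"),
    ("doid:1612", "rdf:type", "core:Disease"),
]

def triples_per_concept(triples):
    """Count rdf:type declarations per concept class."""
    return Counter(obj for _subj, pred, obj in triples if pred == "rdf:type")

distribution = triples_per_concept(triples)
```

Run over complete dumps, such a count per class (and, analogously, per data type and per provider) yields exactly the kind of distribution overview the analysis reports.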
Medical innovation calls for new collaboration models that bring together government, academia and industry.
Barriers to research and, ultimately, commercialization will be lowered by bringing in best practices from industry and academic settings.
The Hippocrates platform supports early drug development, extending from basic research to drug invention and commercialization, saving significant time and money.
The platform is designed to facilitate collaboration among stakeholders while taking advantage of the vast resources currently available on the web to generate and aggregate content based on the research needs of the end user.
This study aims to identify drug candidates that can be repurposed to treat three subtypes of leukemia by analyzing drug, protein, and disease interaction networks. The researchers gathered data on FDA-approved drugs and drugs in clinical trials for leukemia and related diseases, then constructed networks showing interactions between drugs, proteins, and diseases. The diseases most closely related to leukemia were identified, and their associated drugs were considered candidates for repurposing. The researchers developed a website to import and analyze the collected data and identify the most suitable drug candidates based on the interaction networks.
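A minimal sketch of this kind of network-based candidate ranking, assuming drugs are scored by how many of their protein targets are associated with the disease of interest; all drug, protein and disease associations below are hypothetical toy data, not the study's networks:

```python
# Hypothetical disease-protein and drug-target associations (toy data).
disease_proteins = {
    "leukemia_subtype_A": {"ABL1", "KIT", "FLT3"},
    "lymphoma": {"ABL1", "SYK"},
}
drug_targets = {
    "drug_X": {"ABL1", "KIT"},
    "drug_Y": {"SYK"},
    "drug_Z": {"EGFR"},
}

def rank_candidates(disease, disease_proteins, drug_targets):
    """Rank drugs by the number of targets shared with the disease's proteins."""
    assoc = disease_proteins[disease]
    scores = {drug: len(targets & assoc) for drug, targets in drug_targets.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

ranking = rank_candidates("leukemia_subtype_A", disease_proteins, drug_targets)
```

Real pipelines weight edges and propagate scores rather than counting raw overlaps, but the ranking principle is the same.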
INBIOMEDvision Workshop at MIE 2011. Victoria López
1) Personalized medicine currently faces challenges in processing large-scale genomic data, interpreting the functional effects of genomic variations, integrating systems-level data, and translating discoveries into medical practice.
2) Bioinformatics can help address these challenges through algorithms for mapping and aligning sequencing data, predicting functional effects, prioritizing genes, integrating multi-omics data into networks, and disseminating discoveries through databases to inform medical practice.
3) Fully realizing personalized medicine will require overcoming limitations of current approaches, validating computational predictions, and updating medical practice and education to routinely incorporate genomic information.
IUPHAR/MMV Guide to Malaria Pharmacology - BioMalPar XV
1) Researchers have created a new online resource called the IUPHAR/MMV Guide to Malaria Pharmacology (GtoMPdb) to curate information on antimalarial compounds and their molecular targets in Plasmodium.
2) The database currently contains 25 Plasmodium molecular targets and 57 antimalarial ligands that were manually curated from scientific literature.
3) A new customized online portal provides open access to the antimalarial data and allows browsing by parasite lifecycle stage, target species, and other features to help malaria research.
tranSMART Community Meeting 5-7 Nov 13 - Session 3: tranSMART a Data Warehouse for Translational Medicine at Takeda Pharmaceuticals International
Dave Marberg, Takeda
We have used the tranSMART platform to construct a warehouse containing data from several Takeda clinical trials, proprietary preclinical drug activity studies, 1600 Gene Expression Omnibus studies, and data from TCGA, CCLE, and other sources. All gene expression data has been globally normalized. We extended the tranSMART platform with a set of R function calls to enable cross-study queries and analysis via the rich toolset available in R. The utility of the data warehouse is exemplified by a study in which we built a predictive model for drug sensitivities. The model was trained on gene expression and IC50 data from cell lines and was found to correctly predict drug activity in oncology indications.
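The abstract does not state how the global normalization was done; one common choice for putting expression values from different studies on a comparable scale is per-gene z-scoring, sketched here with toy values:

```python
from statistics import mean, stdev

def zscore_normalize(values):
    """Z-score normalize one gene's expression values across samples,
    so that cross-study values share a mean of 0 and unit variance."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical expression values for one gene across three samples
expression = [2.0, 4.0, 6.0]
normalized = zscore_normalize(expression)
```

This is only one plausible normalization; quantile normalization or study-level batch correction are equally common in practice.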
Bioinformatics plays an important role in drug discovery and development by enabling target identification, rational drug design, compound refinement, and other processes. Key applications of bioinformatics include virtual screening of large compound libraries to identify potential drug leads, homology modeling of protein structures to inform drug design, and similarity searches to find analogs of existing drug molecules. The overall drug development process involves studying the disease, identifying drug targets, designing compounds, testing and refining candidates, and conducting clinical trials. Computational techniques expedite many steps but experimental validation is still needed.
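Similarity searching of the kind mentioned above is commonly based on the Tanimoto coefficient over molecular fingerprints; a minimal sketch with fingerprints encoded as sets of on-bits (toy data, not real molecules):

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0

# Hypothetical query fingerprint and a tiny compound library
query = {1, 4, 7, 9}
library = {
    "analog_1": {1, 4, 7, 8},
    "analog_2": {2, 3, 5},
}
hits = {name: tanimoto(query, fp) for name, fp in library.items()}
```

In virtual screening, compounds above a similarity threshold (often around 0.7 for 2D fingerprints) would be carried forward as candidate analogs.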
Integrative Everything, Deep Learning and Streaming Data
Joel Saltz
Workshop on Clusters, Clouds, and Data for Scientific Computing, September 6, 2018
The need to label information and segment regions in individual sensor data sources, and to create syntheses from multiple disparate data sources, spans many areas of science, biomedicine and technology. The rapid evolution of sensor technologies – from digital microscopes to UAVs – drives requirements in this area. I will describe a variety of use cases and technical challenges, as well as tools, algorithms and techniques developed by our group and collaborators.
This study aimed to identify existing drug therapies that could potentially be repurposed to treat gastric cancer. Differentially expressed genes from microarray analyses of gastric cancer tissue samples were submitted to the Connectivity Map database to identify candidate drugs that could reverse disease signatures. Vorinostat and trichostatin A, both histone deacetylase inhibitors, were predicted as potential therapies based on their expression profiles opposing those of gastric cancer tumors. Future work will integrate biological pathway knowledge to link predicted drugs to relevant pathways.
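A crude illustration of the signature-reversal idea behind Connectivity Map queries: a drug scores highly when it shifts disease-upregulated genes down and disease-downregulated genes up. Gene names and fold changes below are hypothetical, and the actual Connectivity Map score is rank-based rather than this simple sum:

```python
def reversal_score(disease_up, disease_down, drug_change):
    """Positive score when the drug moves disease-dysregulated genes
    in the opposite direction (a crude connectivity-style score)."""
    score = 0.0
    for g in disease_up:
        score -= drug_change.get(g, 0.0)   # reward down-regulation of up genes
    for g in disease_down:
        score += drug_change.get(g, 0.0)   # reward up-regulation of down genes
    return score

disease_up = {"MYC", "CCND1"}              # hypothetical disease signature
disease_down = {"CDKN1A"}
drug_change = {"MYC": -1.2, "CCND1": -0.8, "CDKN1A": 0.5}  # hypothetical drug profile
score = reversal_score(disease_up, disease_down, drug_change)
```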
United States Patent Application Publication
The present invention provides medicinally active extracts and fractions, and a method for preparing the same by extracting and fractionating constituents from the tissue of plant components of the Anoectochilus family. These active extracts and fractions are useful for preventing or inhibiting tumor growth.
Genomics and proteomics in drug discovery and development
This document discusses the role of genomics and proteomics in drug discovery and development. It explains that genomics and proteomics technologies can help identify new drug targets by comparing gene and protein expression between healthy and diseased cells. Proteomics in particular analyzes changes in protein levels and can quantify individual proteins using techniques like 2D gel electrophoresis and mass spectrometry. The integration of genomics and proteomics provides a more comprehensive understanding of biological systems and is improving the drug discovery process.
Establishing new medical uses for drugs that are already known, including approved drugs.
Drug repurposing consists of applying an active pharmaceutical ingredient that is already on the market to a new indication.
Drug repurposing is a promising approach, applied mainly to the treatment of both common and rare genetic diseases, and it also offers significant benefits to the pharmaceutical industry.
"At its simplest, drug repurposing is taking an existing drug and seeing whether it can be used as an effective treatment for another condition."
"Repurposing generally refers to studying drugs that are already approved to treat one disease or condition to see if they are safe and effective for treating other diseases."
Drug repurposing involves finding new uses for existing drugs to treat different diseases. It provides a more efficient and lower cost alternative to traditional drug development. Computational approaches like network-based, text mining, and semantic methods are used to discover novel drug-disease relationships for drug repurposing. These include identifying modules in biological networks, propagating information across networks, extracting relationships from literature, and constructing semantic networks to predict new associations. Drug repurposing reduces costs and risks compared to de novo drug development.
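The network-propagation idea mentioned above can be sketched as a random walk with restart: scores from seed nodes diffuse over the network while being repeatedly pulled back to the seeds. The three-node graph below is a toy example, not a real interactome:

```python
def propagate(adjacency, seeds, restart=0.5, iterations=50):
    """Random walk with restart over an undirected network given as an
    adjacency dict; returns a steady-state score per node."""
    nodes = list(adjacency)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(iterations):
        p = {
            n: (1 - restart) * sum(p[m] / len(adjacency[m]) for m in adjacency[n])
               + restart * p0[n]
            for n in nodes
        }
    return p

# Toy chain network A - B - C, seeded at A
network = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
scores = propagate(network, seeds={"A"})
```

Nodes closer to the seed retain more score, which is how propagation methods prioritize candidate drug-disease links over a large network.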
INTRODUCTION
A PERFECT THERAPEUTIC DRUG
DRUG DISCOVERY- HISTORY
MODERN DRUG DISCOVERY
BIOINFORMATICS IN DRUG DISCOVERY
DRUG DISCOVERY BASED ON BIOINFORMATIC TOOLS
BIOINFORMATICS IN COMPUTER-AIDED DRUG DISCOVERY
ECONOMICS OF DRUG DISCOVERY
CONCLUSION
REFERENCES
Phil Lorenzi discusses pathway analysis approaches and their uses in biomedical research and drug development. He compares strategies for analyzing the autophagy and apoptosis pathways, finding that integrating multiple methods provides the most comprehensive understanding. Lorenzi also provides examples of how pathway analysis could have predicted problems with COX-2 inhibitors and helped explain past failures of AKT inhibitors. He concludes that pathway analysis is consistent with approvals of EGFR, MEK, RANKL and PARP inhibitors and may support development of GLS inhibitors.
In silico repositioning of approved drugs for rare and neglected diseases
Neglected and rare diseases traditionally have not been the focus of large pharmaceutical company research as biotech and academia have primarily been involved in drug discovery efforts for such diseases. This area certainly represents a new opportunity as the pharmaceutical industry investigates new markets. One approach to speed up drug discovery is to examine new uses for existing approved drugs; this is termed drug repositioning or drug repurposing and has become increasingly popular in recent years. Analysis of the literature reveals that using high-throughput screening there have been many examples of FDA approved drugs found to be active against additional targets that can be used to therapeutic advantage for repositioning for other diseases. To date there are far fewer such examples where in silico approaches have allowed for the derivation of new uses. It is suggested that with current technologies and databases of chemical compounds (drugs) and related data, as well as close integration with in vitro screening data, improved opportunities for drug repurposing will emerge. In this publication a review of the literature will highlight several proof of principle examples from areas such as finding new inhibitors for drug transporters with 3D pharmacophores and uncovering molecules active against Mycobacterium tuberculosis (Mtb) using Bayesian models of compound libraries. Research into neglected or rare/orphan diseases can likely benefit from in silico drug repositioning approaches and accelerate drug discovery for these diseases.
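The Bayesian modeling of compound libraries mentioned above can be sketched as a Bernoulli naive Bayes over fingerprint bits with Laplace smoothing. This is a simplified stand-in for the Laplacian-corrected estimators used in practice, and the fingerprints below are toy data, not Mtb screening results:

```python
from math import log

def train_nb(actives, inactives, n_bits):
    """Per-bit log-likelihood ratios with Laplace smoothing (Bernoulli-style)."""
    weights = []
    for bit in range(n_bits):
        p_act = (sum(fp[bit] for fp in actives) + 1) / (len(actives) + 2)
        p_inact = (sum(fp[bit] for fp in inactives) + 1) / (len(inactives) + 2)
        weights.append(log(p_act / p_inact))
    return weights

def activity_score(fp, weights):
    """Sum the weights of set bits; higher scores suggest activity."""
    return sum(w for bit_set, w in zip(fp, weights) if bit_set)

actives = [[1, 1, 0], [1, 0, 0]]     # hypothetical active fingerprints
inactives = [[0, 0, 1], [0, 1, 1]]   # hypothetical inactive fingerprints
weights = train_nb(actives, inactives, 3)
```

Scoring an untested compound library with such weights and ranking the results is the essence of the repositioning screens described in the abstract.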
The document discusses the process of drug discovery, including target selection, lead discovery, medicinal chemistry, in vitro and in vivo studies, and clinical trials. Target selection involves identifying cellular or genetic targets involved in disease through techniques like genomics, proteomics, and bioinformatics. Lead discovery focuses on identifying small molecule modulators of protein function through methods like synthesis, combinatorial chemistry, assay development, and high-throughput screening. Medicinal chemistry then works to optimize these leads.
Enzymes as drug targets: curated pharmacological information in the 'Guide to...
Presented at the British Pharmacological Society Focused meeting in April 2015, this poster summarises the current coverage of our curation of enzyme drug targets and supplements our previous poster covering this target class
ISO/TS 20428 defines the composition and required data fields for structured clinical genomic sequencing reports. It covers variations identified from whole genome, exome, and targeted gene panel sequencing from human samples. The standard defines the composition of a report to include a summary, detailed pages with required and optional fields, and an overall interpretation. It requires fields for patient information, sample and variant details, and recommended treatments, and defines allowable values for the overall interpretation.
Poster presented at the Elixir All-Hands Meeting in Lisbon, June 2019. Gives a broad summary of Guide to Pharmacology activities in the last year. Emphasising new tools and our extension into malaria pharmacology.
This thesis presents the development of computational methods and tools using as input three-dimensional structures data of protein-ligand complexes. The tools are useful to mine, profile and predict data from protein-ligand complexes to improve the modeling and the understanding of the protein-ligand recognition. This thesis is divided into five sub-projects. In addition, unpublished results about positioning water molecules in binding pockets are also presented. I developed a statistical model, PockDrug, which combines three properties (hydrophobicity, geometry and aromaticity) to predict the druggability of protein pockets, with results that are not dependent on the pocket estimation methods. The performance of pockets estimated on apo or holo proteins is better than that previously reported in the literature (Publication I). PockDrug is made available through a web server, PockDrug-Server (http://pockdrug.rpbs.univ-paris-diderot.fr), which additionally includes many tools for protein pocket analysis and characterization (Publication II). I developed a customizable computational workflow based on the superimposition of homologous proteins to mine the structural replacements of functional groups in the Protein Data Bank (PDB). Applied to phosphate groups, we identified a surprisingly high number of phosphate non-polar replacements as well as some mechanisms allowing positively charged replacements. In addition, we observed that ligands adopted a U-shape conformation at nucleotide binding pockets across phylogenetically unrelated proteins (Publication III). I investigated the prevalence of salt bridges at protein-ligand complexes in the PDB for five basic functional groups. The prevalence ranges from around 70% for guanidinium to 16% for tertiary ammonium cations, in this latter case appearing to be connected to a smaller volume available for interacting groups. 
In the absence of strong carboxylate-mediated salt bridges, the environment around the basic functional groups studied appeared enriched in functional groups with acidic properties such as hydroxyl, phenol groups or water molecules (Publication IV). I developed a tool that allows the analysis of binding poses obtained by docking. The tool compares a set of docked ligands to a reference bound ligand (which may be a different molecule) and provides a graphic output that plots the shape overlap and a Jaccard score based on comparison of molecular interaction fingerprints. The tool was applied to analyse the docking poses of active ligands at the orexin-1 and orexin-2 receptors found as a result of a combined virtual and experimental screen (Publication V). The review of literature focusses on protein-ligand recognition, presenting different concepts and current challenges in drug discovery.
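The Jaccard comparison of molecular interaction fingerprints described in Publication V can be sketched as set overlap; the residue and interaction-type pairs below are illustrative, not taken from the thesis:

```python
def jaccard(fp_ref, fp_docked):
    """Jaccard score between two molecular interaction fingerprints,
    encoded as sets of (residue, interaction_type) pairs."""
    if not fp_ref and not fp_docked:
        return 1.0
    return len(fp_ref & fp_docked) / len(fp_ref | fp_docked)

# Hypothetical interaction fingerprints for a reference ligand and a docked pose
reference = {("ASP110", "hbond"), ("TYR215", "aromatic"), ("VAL130", "hydrophobic")}
pose = {("ASP110", "hbond"), ("VAL130", "hydrophobic"), ("SER103", "hbond")}
similarity = jaccard(reference, pose)
```

A pose that reproduces the reference ligand's key contacts scores near 1, which is why the score is useful for discriminating plausible from implausible docking poses.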
The document discusses online resources that can support open drug discovery systems. It outlines how pharmaceutical companies spend billions annually on R&D and how public domain data from sources like literature, patents and databases could provide high value. However, such data is difficult to integrate and navigate due to a lack of standards and interoperability between sources. The Open PHACTS project aims to address this by developing standards to semantically integrate drug discovery data from public and private sources.
Development of Tohoku Medical Megabank Integrated Database "dbTMM"
1) The Tohoku Medical Megabank Organization has developed an integrated database called "dbTMM" that combines health, genomic, and other data from over 1,000 participants.
2) The database aims to support genomic medicine by precisely stratifying populations using a combination of genetic, clinical, lifestyle, and other health factors.
3) Researchers can use the database to search for statistically significant differences in stratified groups and better understand targeted patient populations without reviewing raw individual data.
Ontologies for Semantic Normalization of Immunological Data
Yannick Pouliot
This document discusses using ontologies to semantically normalize immunological data from the Human Immune Profiling Consortium (HIPC). 57 ontologies covering domains like anatomy, disease, pathways were evaluated. Text from HIPC datasets and protocols was annotated using these ontologies, with the NCI Thesaurus, Medical Subject Headings, and Gene Ontology mapping to the most terms. Many failures were due to missing commercial reagent terms. The conclusions are that ImmPort, the HIPC data repository, could adopt ontology-based encoding with additions to ontologies and text pre-processing.
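The ontology-based annotation step can be sketched as dictionary matching of ontology labels against free text; the mini-ontology below is illustrative only (the identifiers are included for shape, not as curated mappings):

```python
import re

# Hypothetical mini-ontology: label -> identifier (illustrative placeholders)
ontology = {
    "natural killer cell": "CL:0000623",
    "influenza": "DOID:8469",
    "interferon": "NCIT:C20494",
}

def annotate(text, ontology):
    """Return sorted (label, id) pairs for ontology labels found in free text,
    matched case-insensitively at word boundaries."""
    lowered = text.lower()
    return sorted(
        (label, term_id)
        for label, term_id in ontology.items()
        if re.search(r"\b" + re.escape(label) + r"\b", lowered)
    )

hits = annotate("Natural killer cell counts after influenza vaccination", ontology)
```

Terms that fail to match, such as the commercial reagent names mentioned above, would surface as annotation gaps to be fixed by ontology additions or text pre-processing.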
Databases and pathways of genomics and proteomics
Sachin Kumar
The document discusses several databases related to human metabolism and pharmacology. It describes the contents and purpose of each database, including the Human Metabolome Database (HMDB), KEGG, MetaCyc, PubChem, DrugBank, the Therapeutic Target Database (TTD), PharmGKB, and Chemical Entities of Biological Interest (ChEBI). These databases contain chemical, clinical, molecular biology, pathway, and genomic data on human metabolites, drugs, and targets.
Guide to PHARMACOLOGY: a Web-Based Compendium for Research and Education
Chris Southan
This document summarizes a presentation about the IUPHAR/BPS Guide to PHARMACOLOGY (GtoPdb) database. The following key points are made:
- GtoPdb is an online resource containing information on over 8,000 ligands and their interactions with around 1,500 human protein targets. It has been used widely by researchers and educators since 2009.
- The database contains detailed information on drug targets like GPCRs, ion channels, and enzymes. It also provides data on ligands, drugs, interactions between ligands and targets, and related clinical information.
- Users can browse targets and ligands or search the database. Detailed target pages contain pharmacology data, mechanisms, and links
Predicting Drug Candidates' Safety: the Role and Usage of Knowledge Bases
Aureus Sciences
- Aureus Sciences builds knowledge bases for predicting drug candidates' safety, focusing on areas like drug-drug interactions, safety pharmacology, and off-target effects.
- They have developed large structured databases of chemical and bioactivity information from literature and provide applications and services to analyze the data for customers in drug development.
- Their predictive models and databases have been shown to accurately predict drug interactions and off-target effects, helping customers optimize drug safety assessment.
2011-10-11 Open PHACTS at BioIT World Europe
The document discusses the Innovative Medicines Initiative's Open PHACTS project, which aims to develop robust standards and apply them in a semantic integration platform ("Open Pharmacological Space") to integrate drug discovery data from various public and private sources. The project brings together partners from industry, academia, and non-profits to build an open infrastructure for linking drug discovery knowledge and supporting ongoing research. It outlines the technical approach, priorities, and initial progress on developing exemplar applications and a prototype "lash up" system.
Ontology-Driven Clinical Intelligence: A Path from the Biobank to Cross-Disea...
Remedy Informatics
The discovery of clinical insights through effective management and reuse of data requires several conditions to be optimized: Data need to be digital, data need to be structured, and data need to be standardized in terms of metadata and ontology. This presentation describes a bioinformatics system that combines a next-generation biobank management model mapped to applicable international standards and guidelines with a master ontology that controls all input and output and is able to add unique properties to meet the specialized needs of clinicians for cross-disease research.
Bioinformatics is an interdisciplinary field involving biology, computer science, mathematics and statistics. It addresses large-scale biological problems from a computational perspective. Common problems include modeling biological processes at the molecular level and making inferences from collected data. A bioinformatics solution typically involves collecting statistics from biological data, building a computational model, solving a computational problem, and testing the algorithm. Bioinformatics plays a role in areas like structural genomics, functional genomics and nutritional genomics. It is used for applications such as transcriptome analysis, drug discovery, cheminformatics analysis, and more. It is an important tool in fields like molecular medicine, gene therapy, microbial genome applications, antibiotic resistance, and evolutionary studies. Biological databases are important for organizing
This document summarizes big data in the life sciences sector and its strategic importance for stakeholders such as pharmaceutical and medical device companies. It discusses how capturing, storing, managing data flows and analyzing large amounts of information affects all aspects of organizations, particularly the discovery and research & development stages. Implementing a strategic shift towards big data approaches requires support from senior management and organization-wide implementation. Areas that can benefit include genomics, clinical research, epidemiology, public health, and understanding product effectiveness and health outcomes. Managing data generated across the entire value chain, from discovery to real-world use, has become vastly more challenging due to increasing data volumes.
The document discusses the intersection of precision medicine, biomarkers, and healthcare policy. It describes how biomarkers and -omics data can be used for precision medicine to improve diagnostic accuracy, deliver targeted therapies, and stratify patient populations. However, clinical validation of biomarkers now requires large datasets and years of studies due to regulatory and payer requirements. This has reduced incentives for diagnostic innovation. The document also discusses challenges around clinical interpretation of complex multi-omic tests, evolving medical training and workflows, and disconnects between patent and reimbursement policies.
This document discusses various types of informatics including bioinformatics, chemoinformatics, and pharmaceutical databases. It defines bioinformatics as the study of biological data, most often DNA and amino acid sequences, through acquisition, storage, analysis, and dissemination. It also discusses applications of bioinformatics such as genome annotation and drug design. Similarly, it defines chemoinformatics as focusing on the collection, storage, analysis, and manipulation of chemical data to aid in fields like drug design. The document also provides brief descriptions and examples of biological, chemical, and pharmaceutical databases.
This document discusses various types of informatics including bioinformatics, chemoinformatics, and pharmaceutical databases. It defines bioinformatics as the acquisition, storage, analysis and dissemination of biological data, especially DNA and protein sequences. Some applications of bioinformatics mentioned include genome annotation, protein structure prediction, and disease diagnostics. Chemoinformatics focuses on the collection, storage, analysis and manipulation of chemical data to aid in drug design and other chemistry fields. The document also discusses ADME, chemical, biochemical and pharmaceutical databases and their uses and tools.
Leveraging Publicly Accessible Clinical Trails Data Sharing, Dissemination an...Vaticle
In the broader realm of the advancement of science and the betterment of the human condition, there are several purported benefits for sharing clinical trials and research data. The scientific community has just begun to embrace open-access datasets to build their knowledge base, gain insight into new discoveries, and generate novel data-driven hypotheses that were not initially formulated in the studies. With the increasing amount of clinical trial data available, comes the need to leverage a multitude of shared datasets. Your knowledge base needs to facilitate discovery across research domains.
This talk highlights the data sharing, dissemination, and repurposing of clinical and molecular studies generated by government-funded research consortia. Further, we are building a new knowledge base resource, IMMGRAKN to facilitate translational discovery from crowd-sourced clinical trials data in ImmPort (www.immport.org), an NIH-NIAID funded open-access immunology database and analysis portal. The case studies demonstrating the use of IMMGRAKN will be discussed
Bioinformatics is the application of information technology to store, organize, and analyze vast amounts of biological data. It involves using mathematics, statistics, and computer science to understand and organize the information associated with biological macromolecules like DNA, RNA, and proteins. The goals of bioinformatics include uncovering biological information hidden in genetic sequences and using it to better understand areas like molecular medicine, drug development, and evolution. It utilizes tools like databases, algorithms, and data analysis to solve complex biological problems.
Bioinformatics Introduction and Use of BLAST ToolJesminBinti
Hi, I am Jesmin, studying MCSE. I think this file will help you if you want to know the basic information about Bioinformatics and the use of BLAST tool. The BLAST tool is the tool that matches the sequences of DNA,RNA and proteins.
Quantifying the content of biomedical semantic resources as a core for drug discovery platforms
1. Quantifying the Content of Biomedical Semantic Resources as a Core for Drug Discovery Platforms
Ali Hasnain and Dietrich Rebholz-Schuhmann
May 2017
2. Agenda
• Introduction
• Motivation
• Ontologies
• Biomedical Ontologies
• Drugs and Chemical Compound Ontologies
• Upper level Ontologies
• Data Repositories/ Databases for Drug Discovery
• Gene, Gene Expression and Protein Databases
• Pathway databases
• Chemical and Structure Databases
• Disease Specific Databases for Prevention
• Literature databases
• Life Sciences Linked Open Data Cloud
• Linked Open Drug Data (LODD)
• Bio2RDF
• LinkedLifeData
• Related Work
• Conclusion
3. Introduction
• Biomedical data exists as ontologies, repositories, and other open data resources, e.g. the Life Sciences Linked Open Data (LS-LOD) cloud, relevant in the context of drug discovery and cancer chemoprevention.
• The analysis gives an overview of which resources have to be considered and what amount of data requires integration, and provides the opportunity to tailor semantic solutions to specific needs in terms of size and performance.
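To make "quantifying content" concrete, the sketch below counts triples per predicate namespace in an N-Triples fragment, the kind of per-resource tally the analysis performs at much larger scale over LS-LOD dumps and SPARQL endpoints. The sample triples and URIs are illustrative only, not taken from any actual dump.

```python
# Minimal sketch: tally triples per predicate namespace in an
# N-Triples string. The sample data is hypothetical.
from collections import Counter

SAMPLE_NTRIPLES = """\
<http://bio2rdf.org/drugbank:DB00316> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://bio2rdf.org/drugbank_vocabulary:Drug> .
<http://bio2rdf.org/drugbank:DB00316> <http://purl.org/dc/terms/title> "Acetaminophen" .
<http://bio2rdf.org/drugbank:DB00316> <http://bio2rdf.org/drugbank_vocabulary:target> <http://bio2rdf.org/uniprot:P23219> .
"""

def namespace(uri: str) -> str:
    """Return the namespace part of a URI (up to the last '/' or '#')."""
    cut = max(uri.rfind('/'), uri.rfind('#'))
    return uri[:cut + 1]

def predicate_namespace_counts(ntriples: str) -> Counter:
    """Count triples per predicate namespace in an N-Triples string."""
    counts = Counter()
    for line in ntriples.splitlines():
        parts = line.split(None, 2)  # subject, predicate, rest
        if len(parts) < 2 or not parts[1].startswith('<'):
            continue
        predicate = parts[1].strip('<>')
        counts[namespace(predicate)] += 1
    return counts

counts = predicate_namespace_counts(SAMPLE_NTRIPLES)
for ns, n in counts.most_common():
    print(f"{n:6d}  {ns}")
```

Run over a full dump, the same tally gives the per-provider and per-concept triple distributions that the presentation summarises.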
5. Linked Data for Cancer Chemoprevention
• Biomedical data is heterogeneous and spread across multiple sources.
[Figure: hypothesis-generation funnel — literature, in-silico models and database browsing narrow ~2000 small molecules down to ~100 molecules and ~10 interesting pathways, and finally to ~5 molecules testable in the lab, with Linked Data as the connecting layer.]
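The funnel on this slide can be sketched as a simple filtering pipeline: a large candidate pool is successively narrowed by screens until only lab-testable molecules remain. The filter predicates below are placeholders tuned to reproduce the slide's rough proportions (~2000 → ~100 → ~5); real screens would query literature and in-silico models.

```python
# Illustrative funnel: ~2000 candidate small molecules narrowed to a
# handful testable in the lab. Filters are deterministic placeholders.
candidates = [f"molecule_{i}" for i in range(2000)]

def passes_literature_screen(m: str) -> bool:
    # Placeholder criterion: roughly 1 in 20 candidates survives.
    return int(m.split("_")[1]) % 20 == 0

def passes_insilico_model(m: str) -> bool:
    # Placeholder criterion: roughly 1 in 400 of the pool survives.
    return int(m.split("_")[1]) % 400 == 0

after_literature = [m for m in candidates if passes_literature_screen(m)]
testable = [m for m in after_literature if passes_insilico_model(m)]
print(len(candidates), len(after_literature), len(testable))  # 2000 100 5
```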
6. Heterogeneous Data – Multiple Data Sources
• DrugBank
• DailyMed
• ChEBI, KEGG
• Reactome
• SIDER
• BioPax
• Medicare
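Integrating these heterogeneous sources typically means joining per-source records on shared identifiers. The sketch below merges hypothetical DrugBank, ChEBI and SIDER records about one compound into a single view; all identifiers and field names are illustrative, not the real schemas of those databases.

```python
# Hypothetical per-source records, keyed on illustrative identifiers.
drugbank = {"DB00316": {"name": "Acetaminophen", "targets": ["PTGS1"]}}
chebi    = {"CHEBI:46195": {"formula": "C8H9NO2", "drugbank_id": "DB00316"}}
sider    = {"DB00316": {"side_effects": ["nausea", "rash"]}}

def integrate(drugbank_id: str) -> dict:
    """Merge per-source records into one view keyed on the DrugBank id."""
    record = dict(drugbank.get(drugbank_id, {}))
    # Join ChEBI entries via their cross-reference to DrugBank.
    for chebi_id, entry in chebi.items():
        if entry.get("drugbank_id") == drugbank_id:
            record["chebi_id"] = chebi_id
            record["formula"] = entry["formula"]
    # SIDER shares the DrugBank key directly in this sketch.
    record.update(sider.get(drugbank_id, {}))
    return record

print(integrate("DB00316"))
```

Linked Data replaces such ad-hoc joins with shared URIs and cross-references, which is precisely the motivation of the following slides.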
8. Ontologies
These ontologies fall into three main categories:
1. Biomedical ontologies are mainly used by biomedical applications and define the basic biological structures (e.g. genes, pathways).
2. Drugs and chemical compound ontologies relate to clinical drugs and their active ingredients.
3. Upper-level ontologies describe general concepts that many biomedical ontologies share.
9. Ontology Spectrum by Jimeno et al. [1]
[1]: Antonio Jimeno-Yepes, Ernesto Jiménez-Ruiz, Rafael Berlanga, and Dietrich Rebholz-Schuhmann. Use of shared lexical resources for efficient ontological engineering. In Semantic Web Applications and Tools for Life Sciences Workshop (SWAT4LS), CEUR WS Proceedings, volume 435, pages 93–136, 2008.
10. Biomedical Ontologies (selected)
• Advancing Clinico-Genomic Trials on Cancer (ACGT) Master Ontology (MO)
  – data exchange in oncology, integration of clinical and molecular data
• Biological Pathway Exchange (BioPAX)
  – metabolic, biochemical, transcription regulation, protein synthesis, and signal transduction pathways
• Experimental Factor Ontology (EFO)
  – enhance and promote consistent annotation; automatic annotation to integrate external data
• Gene Ontology (GO)
  – for describing biological processes, molecular functions and cellular components of gene products
• Medical Subject Headings (MeSH)
  – hierarchical structure for indexing, cataloguing, and searching for biomedical/health-related data
• Microarray Gene Expression Data Ontology (MGED)
  – the biological sample, the treatment sample and the microarray chip technology in the experiment
• National Cancer Institute (NCI) Thesaurus
  – integrates molecular and clinical cancer-related information to integrate, retrieve and relate concepts
• Ontology for Biomedical Investigations (OBI)
  – designs, protocols, instrumentation, materials, processes, and data in biological & biomedical investigations
11. Drugs and Chemical Compound Ontologies (selected)
• RxNorm
  – standard names for clinical drugs (active drug ingredient, dosage strength, physical form) and links

Generic and Upper Ontologies (selected)
• Basic Formal Ontology (BFO)
  – formalises entities such as 3D enduring objects and comprehending processes
• OBO Relation Ontology (RO)
  – formal definitions of basic relations that cross-cut the biomedical domain
• Provenance Ontology (PROVO)
  – provides classes, properties and restrictions for provenance information
12. Statistical Overview of Implementation Details of Ontologies (selected)

| Ontology | Category   | Year* | Topic                                | Implementation  | Classes | Properties | Individuals | Depth |
|----------|------------|-------|--------------------------------------|-----------------|---------|------------|-------------|-------|
| ACGT-MO  | Biomedical | 2008  | Cancer                               | OWL/CVC/RDF/XML | 1769    | 260        | 61          | 18    |
| BioPAX   | Biomedical | 2010  | Pathways                             | OWL/CVC/RDF/XML | 68      | 96         | 0           | 4     |
| EFO      | Biomedical | 2015  | Experimental Factors                 | OWL/CVC/RDF/XML | 18596   | 35         | 0           | 14    |
| GO       | Biomedical | 2016  | Genomics and Proteomics              | OWL/CVC/RDF/XML | 4419    | 9          | 0           | 16    |
| MeSH     | Biomedical | 2009  | Health                               | RDF/TTL/CSV     | 252375  | 38         | 0           | 15    |
| MGED     | Biomedical | 2009  | Microarray Experiment                | OWL/CVC/RDF/XML | 233     | 121        | 698         | 8     |
| NCIT     | Biomedical | 2007  | Clinical Care                        | OWL/CVC/RDF/XML | 118167  | 173        | 45715       | 16    |
| OBI      | Biomedical | 2008  | Experimental Data                    | OWL/CVC/RDF/XML | 2932    | 106        | 178         | 16    |
| UMLS     | Biomedical | 1993  | Biomedical/Health                    | RDF             | 3221702 | -          | -           | -     |
| RxNorm   | Drugs      | 1993  | Clinical Drugs                       | OWL/CVC/RDF/XML | 118555  | 46         | 0           | 0     |
| BFO      | Generic    | 2003  | Genuine Upper Ontology               | OWL/CVC/RDF/XML | 35      | 0          | 0           | 5     |
| RO       | Generic    | 2005  | Relations used in all OBO ontologies | OWL/CVC/RDF/XML | -       | -          | -           | -     |
| PROVO    | Generic    | 2012  | PROV Data Model                      | OWL/CVC/RDF/XML | 30      | 50         | 4           | 3     |

*Statistics as of Aug 2016, as listed at BioPortal. "Year" specifies when the most recent version was produced; "-" means information not available.
14. Public Data Repositories for Drug Discovery
• The databases are separated into the following
categories:
– Gene, Gene Expression and Protein Databases for
gene and protein annotations as well as the expression
levels and related clinical data.
– Pathway Databases denoting the protein interactions and
the overall functional outcomes.
– Chemical and Structure Databases including Biological
Activities for the information related to drugs and other
chemicals including also toxicity observations and clinical
trials.
– Disease Specific Databases for Prevention which
deliver content specific to the prevention of cancer.
– Literature Databases
15. Gene, Gene Expression and Protein Databases
• GenBank
– over 65 billion nucleotide bases in more than 61 million sequences
• ArrayExpress
– 65,060 experiments, 1,973,776 assays; annotated data for gene expression from biological experiments
• Gene Expression Omnibus (GEO)
– 3,848 datasets of gene expression for specific studies
• Universal Protein Resource (UniProt)
– 63,686,057 sequences, 21,364,768,379 amino acids; classifications, cross-references and annotation of proteins
• Protein Data Bank (PDB)
– 118,280 biological structures; evidence of experimentally validated protein structures
• Protein Database
– 30,047 protein entries, 41,327 PPIs; translated coding regions from GenBank, TPA, SwissProt, PIR, PRF, UniProt and PDB.
16. Pathway Databases
• Kyoto Encyclopedia of Genes and Genomes (KEGG)
– 432,883 pathway maps, 153,776 hierarchies; based on genome sequencing and high-throughput experimental technologies
• Reactome
– 9,386 proteins and pathway data for signalling, transcriptional regulation, translation, apoptosis and other processes
• WikiPathways
– 2,475 pathways complementing e.g. KEGG, Reactome and Pathway Commons
• cPath: Pathway Database Software
– 31,698 pathways, 1,151,476 interactions; pathway visualisation, analysis and modelling
17. Chemical and Structure Databases including Biological Activities
• Chemical Compounds Database (Chembase)
– 150,000 pages; compounds with their physical and chemical properties and mass spectra
• Chemical Entities of Biological Interest (ChEBI)
– 48,296 natural and synthetic compounds: atoms, molecules, ions, radicals, conformers
• DrugBank
– 8,261 drugs, 4,164 targets, 243 enzymes, 118 transporters; drug (chemical, pharmaceutical) and drug target (sequence, structure, pathway) data
• PubChem
– 89,124,401 compounds; compound neighbouring, sub/superstructure and bioactivity data
• Aggregated Computational Toxicology Resource (ACToR)
– more than 500 public sources; environmental chemicals searchable by name and structure
• ClinicalTrials
– 213,868 studies; offers information for locating clinical trials for diseases and conditions
• TOXicology Data NETwork (TOXNET)
– toxicology, hazardous chemicals, environmental health and related areas
18. Disease Specific Databases for Prevention
• Colon Chemoprevention Agents Database (CCAD)
– 1,137 agents and literature data for colon chemoprevention in humans, rats and mice
• Dietary Supplements Labels Database
– 5,000 brands of dietary supplements to compare label ingredients across brands; links to other databases such as MedlinePlus and PubMed
• REPAIRtoire Database
– DNA damage links, pathways and proteins for DNA repair; diseases related to mutations
Literature Databases
• PubMed
– journal citations; the primary source of information for biomedical researchers
• PubMed Dietary Supplement Subset
– dietary supplement literature including vitamin, mineral and botanical/herbal supplements
19. Statistical overview of implementation details of libraries and databases (selected)

Database | Category | Year* | Topic | Implementation | Size/Stats
PubMed | Literature | 1996 | Biomedical Literature | WebBased/CSV | 11 M journal citations
PDSS | Literature | 1999 | Citations of dietary supplements | WebBased | X
DSLD | Chemoprevention | 2013 | Ingredients of dietary supplements | WebBased | > 5,000 selected brands
ClinicalTrials | Toxicity | 2000 | Clinical Trials | WebBased | 213,868 studies
TOXNET | Toxicity | 1987 | Toxicology Database | WebBased | X
ACToR | Compound | 2008 | Chemical Toxicity Data | WebBased | > 500 public sources
DrugBank | Compound | 2008 | Drug Data | WebBased/LOD | 8,206 drugs
ChEBI | Compound | X | Small Molecular Entities | WebBased/LOD | 48,296 compounds
PubChem | Compound | 2004 | Compound Structure | WebBased/LOD | 89,124,401 compounds
ChemSpider | Chemical | 2007 | Compound Structure | WebBased | > 40 million structures
KEGG | Pathway | 1995 | Genomic, chemical, systemic | WebBased/LOD | 432,883 pathway maps
Reactome | Pathway | 2003 | Pathways | WebBased | 9,386 proteins
WikiPathways | Pathway | 2007 | Biological pathways | WebBased | 2,475 pathways
cPath | Pathway | 2005 | Biological pathways | Desktop/WebBased | 31,698 pathways
UniProt | Protein | 2002 | Protein Sequence | WebBased/LOD | 63,686,057 sequences
PDB | Protein | 1971 | 3D structural data of proteins | WebBased/LOD | 30,047 proteins

*Statistics as of Aug 2016. Year specifies when the most recent version was produced. “X” means information not available.
20. Life Sciences Linked Open Data Cloud
• Linked biomedical datasets relevant in a Cancer Chemoprevention
and drug discovery scenario:
– Linked Open Drug Data (LODD)
• Set of linked datasets relevant to Drug Discovery that includes data
from several datasets including Drugbank, LinkedCT, DailyMed,
Diseasome, SIDER, STITCH, Medicare, RxNorm, ClinicalTrials.gov,
NCBI Entrez Gene and OMIM.
– Bio2RDF
• Contains multiple linked biological databases, including pathway
databases such as KEGG, as well as PDB and several NCBI
databases. An open-source project that uses Semantic Web
technologies to build and provide the largest network of Linked
Data for the Life Sciences.
– LinkedLifeData
• A semantic data integration platform for the biomedical domain
containing 5 billion RDF statements from various sources including
UniProt, PubMed, EntrezGene and 20 more.
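Consuming these linked datasets typically starts with reconciling identifiers across sources. The following is a minimal plain-Python sketch, not a real ChEBI/DrugBank API: the record snippets and the `discover_links` helper are hypothetical, modelled on the compound/drug naming mismatch noted later in the slides. Records describing the same molecule are matched on a shared chemical key (here an InChIKey), and each match is emitted as an owl:sameAs link.

```python
# Sketch: discovering owl:sameAs links between two datasets by
# cross-referencing a shared chemical identifier (here: InChIKey).
# The records below are illustrative, not real ChEBI/DrugBank entries.

chebi = [  # ChEBI calls molecules "compounds"
    {"uri": "chebi:15365", "label": "acetylsalicylic acid",
     "inchikey": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"},
]
drugbank = [  # DrugBank calls the same things "drugs"
    {"uri": "drugbank:DB00945", "label": "Aspirin",
     "inchikey": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"},
]

def discover_links(a, b, key="inchikey"):
    """Return (uri_a, 'owl:sameAs', uri_b) for records sharing `key`."""
    index = {r[key]: r["uri"] for r in a}
    return [(index[r[key]], "owl:sameAs", r["uri"])
            for r in b if r[key] in index]

links = discover_links(chebi, drugbank)
print(links)  # [('chebi:15365', 'owl:sameAs', 'drugbank:DB00945')]
```

In practice such matching runs over full RDF dumps or SPARQL endpoints with dedicated link-discovery tooling, but the principle is the same: a shared key yields a cross-dataset link.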
21. The Linked Open Data Cloud
“Life sciences will drive adoption of the Semantic Web, just as high-energy physics
drove the early Web.”
- Sir Tim Berners-Lee, 2005
[LOD cloud diagram: proteins, molecules, genes, diseases]
25. Related Work (selected)
• Zeginis et al. [2] proposed a “meet-in-the-middle” approach to develop
the semantic model relevant for cancer chemoprevention. Relevant
data was analysed bottom-up from the domain, whereas a top-down
approach was used to collect ontologies, vocabularies and data models.
• Hasnain et al. [3] proposed the Linked Biomedical Dataspace (to access
and use biomedical resources relevant for cancer chemoprevention)
with the following components:
– a) knowledge extraction,
– b) link creation,
– c) query execution and
– d) knowledge publishing.
[2]: Zeginis, D., et al.: A collaborative methodology for developing a semantic model for interlinking Cancer Chemoprevention linked-data sources. Semantic Web (2013)
[3]: Hasnain, A., et al.: Linked Biomedical Dataspace: Lessons Learned integrating Data for Drug Discovery. In: International Semantic Web Conference (In-Use Track), October 2014
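The four components from Hasnain et al. compose naturally into a pipeline. The skeleton below is only an illustration of that composition under hypothetical data; every function body is a placeholder, not the authors' implementation.

```python
# Illustrative skeleton of the four Linked Biomedical Dataspace
# components; function bodies are placeholders, not the real system.

def knowledge_extraction(sources):
    """a) Turn raw source records into (subject, predicate, object) triples."""
    return [(s["id"], "rdfs:label", s["label"]) for s in sources]

def link_creation(triples):
    """b) Add cross-dataset links (here: a single hard-coded example)."""
    return triples + [("chebi:15365", "owl:sameAs", "drugbank:DB00945")]

def query_execution(triples, predicate):
    """c) Answer a simple pattern query over the merged triples."""
    return [t for t in triples if t[1] == predicate]

def knowledge_publishing(triples):
    """d) Serialise results for publication (N-Triples-like lines)."""
    return ["%s %s %s ." % t for t in triples]

sources = [{"id": "chebi:15365", "label": "acetylsalicylic acid"}]
published = knowledge_publishing(
    query_execution(link_creation(knowledge_extraction(sources)),
                    "owl:sameAs"))
print(published)  # ['chebi:15365 owl:sameAs drugbank:DB00945 .']
```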
26. Conclusion
• In this paper we introduce and classify different tiers of biomedical data
relevant to the Cancer Chemoprevention and Drug Discovery domain.
• This involves ontologies, databases and Life Science Linked Open Data in
the healthcare, life sciences and biomedical domains.
• We classify ontologies into three main classes:
– i) Biomedical Ontologies (e.g. EFO, OBI, GO),
– ii) Drugs and Chemical Compound Ontologies (e.g. RxNorm) and
– iii) Generic and Upper Ontologies (e.g. BFO, RO, PROV).
• Similarly we categorise libraries and databases in five categories:
– (i) Gene, Gene Expression and Protein Databases,
– (ii) Pathway databases,
– (iii) Chemical and Structure Databases including Biological Activities,
– (iv) Disease Specific Databases for Prevention, and
– (v) Literature databases.
Link to next slide: linked data facilitates complex queries and workflows to be assembled.
To discover which links our datasets could have to other data sources, we explored what types of data are published in the Linked Open Data cloud.
What we found was a lot of messy data. Looking at 8 datasets containing molecular data, their descriptions are very different: ChEBI calls molecules compounds, DrugBank calls them drugs, DailyMed also calls them drugs but uses a different identifier.
Link: how do we start linking all of these datasets so that they can be made available in a unified query interface?
EGFR: Epidermal growth factor receptor
BioMedical Ontologies:
Advancing Clinico-Genomic Trials on Cancer (ACGT) Master Ontology (MO)
Biological Pathway Exchange (BioPAX)
Experimental Factor Ontology (EFO)
Gene Ontology (GO)
Medical Subject Headings (MeSH)
Microarray Gene Expression Data Ontology (MGED)
National Cancer Institute (NCI) Thesaurus
Ontology for Biomedical Investigations (OBI)
Unified Medical Language System (UMLS)
Drugs and Chemical Compound Ontologies:
RxNorm
Generic and Upper Ontologies:
Basic Formal Ontology (BFO)
OBO Relation Ontology (RO)
Provenance Ontology (PROVO)
Literature Databases:
PubMed
PubMed Dietary Supplement Subset
Natural Sources of Chemoprevention Agents Databases:
Dietary Supplements Labels Database
Toxicity and Efficacy Databases:
ClinicalTrials
TOXicology Data NETwork (TOXNET)
Biological Activity of Compounds Databases
Aggregated Computational Toxicology Resource (ACToR)
DrugBank
Chemical Entities of Biological Interest (ChEBI)
PubChem
REPAIRtoire Database
Gene Expression Databases:
Cancer Gene Expression Database (CGED)
ArrayExpress
Gene Expression Omnibus (GEO)
Gene and DNA Databases:
GenBank
Chemical and Physical Structure Databases:
ChemSpider
Chemical Compounds Database (Chembase)
Sigma-Aldrich
ChemDB
Disease Specific Compound Databases:
Colon Chemoprevention Agents Database
Pathway Databases:
Kyoto Encyclopedia of Genes and Genomes (KEGG)
Reactome
Wikipathways
cPath: Pathway Database Software
Protein Databases:
Universal Protein Resource (UniProt)
Protein Data Bank (PDB)
Protein Database
M: when data is catalogued, we can discover new links by cross-referencing with existing datasets
-> once we identify these concepts, how do we actually query them together?
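One minimal answer to that question, sketched in plain Python over hypothetical triples: merge the triples of the catalogued sources, canonicalise URIs through the owl:sameAs links found by cross-referencing, and then match query patterns against the merged graph.

```python
# Sketch: querying two datasets "together" by merging their triples and
# resolving owl:sameAs before matching. Data is illustrative only.

triples = [
    ("chebi:15365", "type", "compound"),        # ChEBI's vocabulary
    ("drugbank:DB00945", "type", "drug"),       # DrugBank's vocabulary
    ("drugbank:DB00945", "target", "uniprot:P23219"),
    ("chebi:15365", "owl:sameAs", "drugbank:DB00945"),
]

def canonical_map(triples):
    """Map each linked URI to one canonical representative via owl:sameAs."""
    canon = {}
    for s, p, o in triples:
        if p == "owl:sameAs":
            canon[o] = canon.get(s, s)
    return canon

def query(triples, predicate):
    """Match a predicate over the merged graph, canonicalising subjects."""
    canon = canonical_map(triples)
    return sorted({(canon.get(s, s), p, o)
                   for s, p, o in triples if p == predicate})

print(query(triples, "target"))
# [('chebi:15365', 'target', 'uniprot:P23219')]
```

A production system would do this with a triple store and federated SPARQL rather than in-memory tuples, but the resolution step is the same idea.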