Repetitive DNA and next-generation sequencing: computational challenges and solutions

TJ Treangen, SL Salzberg - Nature Reviews Genetics, 2012 - nature.com
Repetitive DNA sequences are abundant in a broad range of species, from bacteria to
mammals, and they cover nearly half of the human genome. Repeats have always …

What are decision trees?

C Kingsford, SL Salzberg - Nature biotechnology, 2008 - nature.com

Bioinformatics challenges of new sequencing technology

M Pop, SL Salzberg - Trends in genetics, 2008 - cell.com
New DNA sequencing technologies can sequence up to one billion bases in a single day at
low cost, putting large-scale sequencing within the reach of many scientists. Many …

On comparing classifiers: Pitfalls to avoid and a recommended approach

SL Salzberg - Data mining and knowledge discovery, 1997 - Springer
An important component of many data mining projects is finding a good classification
algorithm, a process that requires very careful thought about experimental design. If not …

Genome sequence of the human malaria parasite Plasmodium falciparum

MJ Gardner, N Hall, E Fung, O White, M Berriman… - Nature, 2002 - nature.com
The parasite Plasmodium falciparum is responsible for hundreds of millions of cases of
malaria, and kills more than one million African children annually. Here we report an …

Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype

D Kim, JM Paggi, C Park, C Bennett… - Nature biotechnology, 2019 - nature.com
The human reference genome represents only a small number of individuals, which limits its
usefulness for genotyping. We present a method named HISAT2 (hierarchical indexing for …

The transcriptional landscape of the mammalian genome

P Carninci, T Kasukawa, S Katayama, J Gough… - Science, 2005 - science.org
This study describes comprehensive polling of transcription start and termination sites and
analysis of previously unidentified full-length complementary DNAs derived from the mouse …

FLASH: fast length adjustment of short reads to improve genome assemblies

T Magoč, SL Salzberg - Bioinformatics, 2011 - academic.oup.com
Motivation: Next-generation sequencing technologies generate very large numbers of short
reads. Even with very deep genome coverage, short read lengths cause problems in de …

StringTie enables improved reconstruction of a transcriptome from RNA-seq reads

M Pertea, GM Pertea, CM Antonescu, TC Chang… - Nature …, 2015 - nature.com
Methods used to sequence the transcriptome often produce more than 200 million short
sequences. We introduce StringTie, a computational method that applies a network flow …

The Genome Sequence of the Malaria Mosquito Anopheles gambiae

RA Holt, GM Subramanian, A Halpern, GG Sutton… - Science, 2002 - science.org
Anopheles gambiae is the principal vector of malaria, a disease that afflicts more than 500
million people and causes more than 1 million deaths each year. Tenfold shotgun sequence …