Repetitive DNA and next-generation sequencing: computational challenges and solutions
TJ Treangen, SL Salzberg - Nature Reviews Genetics, 2012 - nature.com
Repetitive DNA sequences are abundant in a broad range of species, from bacteria to
mammals, and they cover nearly half of the human genome. Repeats have always …
What are decision trees?
C Kingsford, SL Salzberg - Nature biotechnology, 2008 - nature.com
Bioinformatics challenges of new sequencing technology
M Pop, SL Salzberg - Trends in genetics, 2008 - cell.com
New DNA sequencing technologies can sequence up to one billion bases in a single day at
low cost, putting large-scale sequencing within the reach of many scientists. Many …
On comparing classifiers: Pitfalls to avoid and a recommended approach
SL Salzberg - Data mining and knowledge discovery, 1997 - Springer
An important component of many data mining projects is finding a good classification
algorithm, a process that requires very careful thought about experimental design. If not …
Genome sequence of the human malaria parasite Plasmodium falciparum
The parasite Plasmodium falciparum is responsible for hundreds of millions of cases of
malaria, and kills more than one million African children annually. Here we report an …
Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype
The human reference genome represents only a small number of individuals, which limits its
usefulness for genotyping. We present a method named HISAT2 (hierarchical indexing for …
The transcriptional landscape of the mammalian genome
This study describes comprehensive polling of transcription start and termination sites and
analysis of previously unidentified full-length complementary DNAs derived from the mouse …
FLASH: fast length adjustment of short reads to improve genome assemblies
T Magoč, SL Salzberg - Bioinformatics, 2011 - academic.oup.com
Motivation: Next-generation sequencing technologies generate very large numbers of short
reads. Even with very deep genome coverage, short read lengths cause problems in de …
StringTie enables improved reconstruction of a transcriptome from RNA-seq reads
Methods used to sequence the transcriptome often produce more than 200 million short
sequences. We introduce StringTie, a computational method that applies a network flow …
The Genome Sequence of the Malaria Mosquito Anopheles gambiae
Anopheles gambiae is the principal vector of malaria, a disease that afflicts more than 500
million people and causes more than 1 million deaths each year. Tenfold shotgun sequence …