Recommendations for Uniform Variant Calling of SARS-CoV-2 Genome Sequence across Bioinformatic Workflows
Figure 1. Flow chart of the platforms from each participating organization’s workflows at the time of analysis. Shown are the schematics for (A) Illumina platform variant calling and (B) Oxford Nanopore Technologies (ONT) variant calling. For each sequencing platform, the main steps of variant calling are each captured in a box: read retrieval, host removal, read trimming, alignment, variant calling, variant filtering, and variant normalization. For each step, the software used by each workflow is noted.
Figure 2. The impact of host contamination removal and primer trimming. (A) Removing host reads from an RNA-seq SARS-CoV-2 sequencing run (SRA run SRR12245095) reduced the potential for false-positive variant calls. In the top panel, additional mutations were present in aligned reads between positions 3049–3076 of NC_045512 when host reads were not removed. After excluding host reads (bottom panel), reads containing the mutations were no longer observed. (B) Allele frequencies of variants called after trimming primer sequences from aligned reads (corrected allele frequencies) are plotted against allele frequencies of the same variants called without primer trimming (uncorrected allele frequencies). Primer trimming increases the allele frequencies of most variants within primer-binding sites. Blue lines represent the allele-frequency (AF) thresholds used in this study to filter variant calls (AF ≥ 0.15) and to call consensus variants (AF ≥ 0.5).
Figure 3. The effect of Alternate Allele Depth and Alternate Allele Frequency on variant calling agreement across workflows and platforms. For each panel, calls made by all but one workflow (A–D) or both platforms (E–H) were considered true-positives, while calls made by only a single workflow (or technology) were considered false-positives; thus, the ROC AUCs cannot be directly compared between groups. For the right panels, points represent an Allele Frequency (AF) cut-off of 1 at the lower-leftmost point, and the cut-off decreases by 0.1 along the length of the line. For the left panels, the points represent a minimum Alternate Allele Depth (AltDP) going from 4,000 at the lower-leftmost point to 10 along each line. (A,B) Impact of AltDP and AF, respectively, on Illumina workflow accuracy and specificity across workflows. (C,D) Impact of AltDP and AF, respectively, on Illumina workflow accuracy and specificity across platforms. (E,F) Impact of AltDP and AF, respectively, on ONT workflow accuracy and specificity across workflows. (G,H) Impact of AltDP and AF, respectively, on ONT workflow accuracy and specificity across platforms.
Figure 4. Agreement across workflows with and without recommended parameters. (A–D) Agreement across workflows, without recommended parameters. (E–H) Agreement across workflows, with recommended parameters. (A,E) Agreement on Illumina SNP calls. (B,F) Agreement on Illumina InDel calls. (C,G) Agreement on Oxford Nanopore (ONT) SNP calls. (D,H) Agreement on ONT InDel calls. For each panel, the bars indicate the number of variants called by the groups, indicated by filled circles below, across the whole dataset.
Figure 5. Application of recommended parameters results in increased agreement across platforms. Graphical representation of the agreement between platforms for SNP (A) and InDel (B) calls without the application of recommended parameters. Panels (C) (SNP) and (D) (InDel) show the agreement between platforms after the application of the recommended parameters. For each panel, only those samples for which both Illumina and ONT platform data had at least one variant call that passed all of the filters were considered. The total height is normalized to the total number of calls made by each workflow, with the light blue portion indicating calls made on both platforms for a given sample, medium blue indicating calls made only for Illumina data, and dark blue indicating calls made only for ONT data.
Figure 6. Variant calling workflow recommendations. An outline of the recommendations for each step in a variant calling workflow, from read cleanup to variant filtering, is illustrated. Additionally, the benefits of implementing the recommendations at each step are noted.
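The concordance-based ROC construction described in the Figure 3 caption can be illustrated with a minimal sketch. It is not the authors' script. The function names, the tuple layout of a call, the toy variants, and the choice to leave calls with intermediate support unlabeled are all assumptions made for illustration. Only the labeling rule (all but one workflow counts as a true-positive, a single workflow as a false-positive) and the AF cut-off starting at 1 and decreasing in steps of 0.1 come from the caption; ending the sweep at 0 is also an assumption.

```python
from collections import defaultdict


def label_calls(calls, n_workflows):
    """Label each call by cross-workflow concordance.

    calls: iterable of (workflow, variant_key, allele_frequency) tuples
           (hypothetical layout, chosen for this sketch).
    Returns a list of (allele_frequency, is_true_positive) pairs.
    """
    if n_workflows < 3:
        raise ValueError("need at least 3 workflows to separate the two labels")
    by_variant = defaultdict(dict)
    for workflow, key, af in calls:
        by_variant[key][workflow] = af

    labeled = []
    for per_workflow in by_variant.values():
        support = len(per_workflow)
        if support >= n_workflows - 1:
            truth = True       # called by all but one workflow (or by all)
        elif support == 1:
            truth = False      # called by a single workflow only
        else:
            continue           # assumption: intermediate support stays unlabeled
        labeled.extend((af, truth) for af in per_workflow.values())
    return labeled


def roc_points(labeled, cutoffs):
    """Return (cutoff, false_positive_rate, true_positive_rate) per cutoff."""
    n_pos = sum(1 for _, truth in labeled if truth)
    n_neg = sum(1 for _, truth in labeled if not truth)
    points = []
    for cutoff in cutoffs:
        tp = sum(1 for af, truth in labeled if truth and af >= cutoff)
        fp = sum(1 for af, truth in labeled if not truth and af >= cutoff)
        tpr = tp / n_pos if n_pos else 0.0
        fpr = fp / n_neg if n_neg else 0.0
        points.append((cutoff, fpr, tpr))
    return points


# AF cut-offs from 1.0 downward in steps of 0.1 (the end point 0.0 is assumed).
AF_CUTOFFS = [round(1.0 - 0.1 * i, 1) for i in range(11)]

# Toy data for three hypothetical workflows; not taken from the study.
example_calls = [
    ("wf1", "C241T", 0.98), ("wf2", "C241T", 0.97), ("wf3", "C241T", 0.99),
    ("wf1", "A23403G", 0.55), ("wf2", "A23403G", 0.60),
    ("wf3", "G100A", 0.20),
    ("wf1", "T200C", 0.95),
]

if __name__ == "__main__":
    for cutoff, fpr, tpr in roc_points(label_calls(example_calls, 3), AF_CUTOFFS):
        print(f"AF >= {cutoff:.1f}: FPR={fpr:.2f} TPR={tpr:.2f}")
```

The same sweep would apply to the AltDP panels by passing alternate allele depths instead of allele frequencies and a descending list of depth cut-offs (4,000 down to 10 in the caption). Because the labels are derived from agreement rather than from a ground truth, the resulting curves describe consistency between workflows, not absolute accuracy.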
Abstract
1. Introduction
2. Results
2.1. Datasets
2.2. SARS-CoV-2 Bioinformatic Workflow Overview
2.3. Data Pre-Processing Impacts on Variant Calling
2.4. A Parsimony Normalization Method to Standardize Variant Reporting across All Workflows Allowed Vis-a-Vis Comparison of Variants
2.5. Within-Sample Allele Frequency and Per-Position Depth of Coverage Determine the Consistency of Variant Calls across Workflows
2.6. Filtering for Highly Supported Variant Calls Supports Cross-Workflow and Cross-Platform Agreement
3. Discussion
4. Methods
4.1. Dataset
4.2. BEI Resources
4.3. Galaxy Project
4.3.1. Variant Calling from Paired-End Illumina Short-Read Data
4.3.2. Variant Calling from Oxford Nanopore Long-Read Data
4.4. Gilead Sciences
4.4.1. Illumina
4.4.2. ONT
4.5. Los Alamos National Laboratory (EDGE-COVID19)
4.6. Lilly
Illumina
4.7. NCBI
4.7.1. Illumina
4.7.2. ONT
4.8. VIR
4.8.1. Illumina
4.8.2. ONT
5. Parsimony Script
6. Studying the Effects of Primer Trimming on Variant Calls and Apparent Allele Frequencies of Called Variants
7. Calculation of Receiver Operating Characteristic (ROC) Plots Based on Concordance across Workflows
8. Calculation of Receiver Operating Characteristic (ROC) Plots Based on Cross-Platform Agreement
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Connor, R.; Shakya, M.; Yarmosh, D.A.; Maier, W.; Martin, R.; Bradford, R.; Brister, J.R.; Chain, P.S.G.; Copeland, C.A.; di Iulio, J.; et al. Recommendations for Uniform Variant Calling of SARS-CoV-2 Genome Sequence across Bioinformatic Workflows. Viruses 2024, 16, 430. https://doi.org/10.3390/v16030430