Overview
You can generate consensus genomes for virus species with high abundance metrics (e.g., rPM) directly from the Sample Report page. This is useful when trying to obtain genomes or improve coverage for identified viruses. You may notice that reads aligning to a given species match more than one virus reference sequence. By generating consensus genomes, you can assemble all reads against a single reference genome listed in the report. Therefore, the consensus genome pipeline may improve coverage because reads are not split among multiple reference genomes. Additionally, all the quality- and host-filtered reads will be used for consensus genome generation (no subsampling). This contrasts with the mNGS pipeline, where reads are subsampled prior to identification of taxa.
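To make the coverage argument above concrete, here is a minimal toy sketch of why pooling reads onto one reference can raise coverage. It is not the pipeline's code: the function name `coverage_stats`, the 100 bp genome length, and the read intervals are all invented for illustration, and it assumes every read would align to the single chosen reference.

```python
def coverage_stats(reads, genome_length):
    """Return (breadth, mean_depth) for reads given as half-open (start, end) intervals."""
    depth = [0] * genome_length
    for start, end in reads:
        # Clip each read to the reference bounds and count it at every covered position.
        for pos in range(max(start, 0), min(end, genome_length)):
            depth[pos] += 1
    covered = sum(1 for d in depth if d > 0)
    return covered / genome_length, sum(depth) / genome_length


GENOME_LENGTH = 100  # hypothetical reference length in bp

# Hypothetical reads that were split between two similar references.
reads_on_ref_a = [(0, 30), (20, 50)]
reads_on_ref_b = [(40, 80), (70, 100)]

breadth_a, depth_a = coverage_stats(reads_on_ref_a, GENOME_LENGTH)
breadth_b, depth_b = coverage_stats(reads_on_ref_b, GENOME_LENGTH)

# Assumption: all reads can be aligned to one chosen reference.
breadth_pooled, depth_pooled = coverage_stats(
    reads_on_ref_a + reads_on_ref_b, GENOME_LENGTH
)

print(f"ref A only: breadth={breadth_a:.2f}, mean depth={depth_a:.2f}")
print(f"ref B only: breadth={breadth_b:.2f}, mean depth={depth_b:.2f}")
print(f"pooled:     breadth={breadth_pooled:.2f}, mean depth={depth_pooled:.2f}")
```

In this toy case each reference alone is only partly covered, while the pooled reads span the whole reference. With real data the gain depends on how well the reads actually align to the chosen reference.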
This consensus genome pipeline is a downstream application of the mNGS module. Therefore, it is NOT intended for data obtained through metagenomic sequencing with spiked primer enrichment (MSSPE) or PCR-based assays. This pipeline should be used for data generated through shotgun (or random) sequencing because it does not include a primer trimming step. If you are working with data obtained with MSSPE or PCR-based assays and/or would like to use your own reference sequence, upload data to the Viral Consensus Genome pipeline. If you are working with SARS-CoV-2 data obtained through MSSPE or PCR-based assays, please upload your data to our SARS-CoV-2 pipeline.
After reading this guide, you will be able to:
- Assemble consensus genomes for viruses listed in the Sample Report
- View coverage for generated genomes
Building a viral consensus genome from the Sample Report table
You can use the “Consensus Genome” feature available through the Sample Report table to assemble consensus genomes for abundant viral species in your data. This feature is only available for viral species with at least one contig aligning to the NCBI nucleotide (NT) database.
To build a viral consensus genome:
1. Find the viral species of interest in the Sample Report table.
2. Hover over the species name to view the available analysis icons and click the Consensus Genome icon. This icon will only be active for species eligible for consensus genome creation (i.e., those with at least one contig aligning to the NCBI NT database).
3. A dialog box will appear for you to choose a reference sequence. After choosing the reference accession from the dropdown menu, click “Create Consensus Genome”.
Reference Accession: Available reference sequences for consensus genome generation are the same as those shown in the coverage visualization. These reference accessions are collected through the mNGS pipeline when reads have a minimum of 36 bp aligning to the taxon and at least one contig aligns to the NCBI NT database. Not all available reference accessions will yield a consensus genome of the same quality. Therefore, keep the following criteria in mind when selecting a reference sequence:
- Coverage visualization - The best reference for consensus genome assembly will likely be the accession with the highest coverage (depth and breadth of coverage).
- Percent identity (% ID) - Generally, the higher the % identity between the reference and reads, the better the resulting consensus genome.
- Accession length - NCBI databases contain complete and partial genomes. When choosing a reference genome, it is essential to select a complete genome and not just a partial sequence.
- Current practices - It can also help to do a literature search to determine whether a specific reference genome is typically used. One caveat is that only the accessions shown in the coverage plots can be chosen as a reference genome.
4. Navigate to the Consensus Genome tab to view the progress of genome assembly.
5. Once the pipeline run is completed, you will see the genome coverage and assembly metrics. You can assess the quality of the assembled genome and download files.
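The reference selection criteria listed in step 3 can be sketched as a simple ranking. This is only an illustration of one way to weigh them, not a feature of the tool: the `CandidateReference` class, the accession labels, the metric values, and the priority order (complete genomes first, then breadth, depth, and % identity) are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class CandidateReference:
    accession: str            # hypothetical accession label
    breadth: float            # fraction of the reference covered (0 to 1)
    mean_depth: float         # average read depth across the reference
    percent_identity: float   # % identity between reads and reference
    is_complete_genome: bool  # True for a complete genome, False for a partial sequence


def rank_references(candidates):
    """Sort candidates: complete genomes first, then by breadth, depth, and % identity."""
    return sorted(
        candidates,
        key=lambda c: (c.is_complete_genome, c.breadth, c.mean_depth, c.percent_identity),
        reverse=True,
    )


# Made-up values, as if read off the coverage visualization.
candidates = [
    CandidateReference("ACC_PARTIAL", 0.98, 120.0, 99.1, False),
    CandidateReference("ACC_COMPLETE_1", 0.91, 85.0, 97.4, True),
    CandidateReference("ACC_COMPLETE_2", 0.76, 140.0, 98.8, True),
]

ranked = rank_references(candidates)
for c in ranked:
    print(c.accession, c.breadth, c.mean_depth, c.percent_identity)
```

Here the partial sequence ranks last despite its high coverage, reflecting the guidance to choose a complete genome. A literature search may still point to a different accession than a purely metric-based ranking would.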