Overview
The Coverage Visualization available through the sample report can be used to gain confidence in the detection of a particular microbe. Coverage visualizations are used to show the range and uniformity of sequencing coverage across a reference sequence (or accession) for a taxon identified in the sample. Coverage can be evaluated by both depth and breadth. Depth of coverage refers to the average read depth for each base pair of the accession, while the coverage breadth represents the percent of the reference accession length that is covered by at least one read or contig.
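As a rough illustration of the two definitions above, the following sketch computes coverage depth and breadth from a list of alignment intervals. It is a minimal example and not the CZ ID implementation: the function name `coverage_metrics` and the half-open, 0-based `(start, end)` interval format are assumptions made for illustration.

```python
def coverage_metrics(ref_length, alignments):
    """Return (average depth, breadth in percent) for one reference accession.

    ref_length: length of the reference accession in base pairs.
    alignments: list of (start, end) intervals, 0-based and half-open
                (hypothetical input format chosen for this sketch).
    """
    # Count how many reads or contigs cover each base of the reference.
    per_base = [0] * ref_length
    for start, end in alignments:
        for pos in range(max(start, 0), min(end, ref_length)):
            per_base[pos] += 1

    # Depth: average number of alignments covering each base.
    depth = sum(per_base) / ref_length
    # Breadth: percent of bases covered by at least one alignment.
    covered = sum(1 for count in per_base if count > 0)
    breadth = 100 * covered / ref_length
    return depth, breadth


# Two overlapping 50 bp alignments on a 100 bp reference.
depth, breadth = coverage_metrics(100, [(0, 50), (25, 75)])
print(depth, breadth)
```

In this toy case the two alignments cover 75 of the 100 bases, so breadth is 75% even though the average depth is 1.0. This shows why the two metrics need to be read together.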
You can use coverage information to evaluate matches to taxa listed in the sample report. A taxon that has high coverage depth or coverage at multiple loci across the reference sequence (high coverage breadth) is likely to be a true positive. Conversely, a taxon with reads mapping only to a small, localized region of the reference accession (low coverage breadth) may be the result of similarity within conserved genomic regions (e.g., 16S rRNA). Coverage is also affected by the type of microbe because genome sizes vary. Notably, since viral genomes are smaller than bacterial and eukaryotic genomes, viruses often have higher coverage breadth of their genomes.
Viewing the Coverage Visualization
To view the coverage visualization, hover over the taxon name of interest to display a set of analysis icons. Select the coverage visualization icon, and the Coverage Visualization panel will open at the bottom of the screen.
The schematic lines under the coverage plot represent the reference accession length (grey), contigs mapping to the reference accession (dark blue), and unassembled (or loose) reads that mapped to the accession (light blue). By clicking the blue lines, you can download or copy the contigs mapping to regions of interest in FASTA format.
Coverage Visualization Metrics
If you hover over a term in the Coverage Visualization panel, a tooltip with more information appears. Here is a summary of the provided metrics.
Metric | Description |
Reference NCBI Entry | The NCBI GenBank entry for the reference accession. |
Reference Length | Length in base pairs of the reference accession. |
Aligned Contigs | Total number of contigs for which this accession was the best match. |
Aligned Loose Reads | The number of unassembled reads for which this accession was the best match (only includes reads not assembled into contigs). |
Coverage Depth | The average read depth of aligned contigs and reads over the length of the accession. |
Coverage Breadth | The percentage of the accession that is covered by contigs and reads. |
Max Alignment Length | Length of the longest aligned region over all reads and contigs. |
Average Mismatched % | Percentage of aligned regions that are mismatches, averaged over all reads and contigs. |
Interpreting the Coverage Visualization
16S rRNA genes are often highly conserved within genera, so be careful when coverage is limited to a single genomic region (i.e., low coverage breadth). Low coverage breadth due to matches to highly conserved, non-coding regions can be a source of error in CZ ID species calls. In general:
- If you identify a taxon with low coverage depth but relatively high coverage breadth, you may conclude that the species was likely to be present in the sample.
- If you identify a taxon with high coverage depth in a small portion of the genome (potentially the 16S rRNA region) but low coverage breadth, we recommend running BLAST on the contig to verify whether it truly belongs to the 16S region. If it does, consider whether other species within that genus have greater coverage breadth. If another species within the genus has greater coverage breadth and a similar number of reads, it is possible that the reads assigned to the 16S region of the low-breadth taxon actually came from that other species.
Below we walk through coverage visualization examples.
High Coverage of Viral Genome
Here, we show high coverage of a complete viral genome. Given the smaller size of viral genomes (as compared to bacterial and fungal pathogens), it is common to see relatively high coverage. Based on the coverage, we are confident that this virus was present in the sample. We may not always obtain full viral genomes, but coverage across multiple regions of the genome provides confidence in the hit.
High Confidence Bacterial Hit
The visualization shows high coverage of a bacterial genome accession. The distributed coverage across the length of the accession provides higher confidence that this taxon was truly present in the sample and is not an artifact of alignment against conserved bacterial genomic regions.
Low Confidence Bacterial Hit
Finally, here is a low-confidence bacterial hit with relatively few reads, isolated to a small region of the genome. Note that the accession represents only a small segment of the genome (~3.5 kb) and the maximum alignment length is only 214 bp. These reads may be aligned to rRNA genes that are conserved across numerous taxa; use BLAST to confirm.
Accession Numbers
The Coverage Visualization panel also shows the alignment of reads against specific NCBI accessions. To see the NCBI accessions identified for a given taxonomy ID, click the downward chevron next to the accession title. Select any accession in the dropdown menu to view its coverage.
The Coverage Visualization panel will show coverage for up to 10 accessions representing a given taxonomy ID. All accessions represent unique isolates or sequences that have the same taxonomy ID. The accession information will help you determine which isolate was the closest match to the taxon in your sample. To prioritize accessions with longer contigs, CZ ID sorts accessions in the coverage visualization using the following formula:
Max contig length + Total contig length + Number of reads
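To make the formula concrete, here is a small sketch that scores and ranks a few made-up accessions. The function name `accession_sort_score`, the dictionary layout, and the example values are all hypothetical, and sorting from highest to lowest score is an assumption based on the stated goal of prioritizing accessions with longer contigs.

```python
def accession_sort_score(contig_lengths, num_reads):
    """Score = max contig length + total contig length + number of reads."""
    # An accession with no contigs contributes 0 for both contig terms.
    max_contig_length = max(contig_lengths, default=0)
    total_contig_length = sum(contig_lengths)
    return max_contig_length + total_contig_length + num_reads


# Made-up accessions for illustration only.
accessions = [
    {"name": "accession_A", "contig_lengths": [500, 300], "num_reads": 20},
    {"name": "accession_B", "contig_lengths": [1200], "num_reads": 5},
    {"name": "accession_C", "contig_lengths": [], "num_reads": 40},
]

# Assumption: higher scores are listed first.
ranked = sorted(
    accessions,
    key=lambda a: accession_sort_score(a["contig_lengths"], a["num_reads"]),
    reverse=True,
)
for accession in ranked:
    score = accession_sort_score(accession["contig_lengths"], accession["num_reads"])
    print(accession["name"], score)
```

Because contig length is counted twice (once as the maximum and once in the total), a single long contig outweighs many loose reads. In this example, accession_B ranks above accession_C despite having far fewer reads.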
You can use the accession information to see if reads mapped to multiple genomic regions across different accessions. In the example above, we see the coverage plot for a Taenia solium accession representing a partial mitochondrial genome sequence. By looking at other accessions in the Coverage Visualization panel, we learn that there are reads matching multiple regions (e.g., 18S and coding regions), which gives us more confidence that T. solium is a true positive.