The Coverage Visualization can be used to assess the strength of the evidence for the presence of a particular microbe. It shows the range and uniformity of sequencing coverage for an accession identified in the sample. Coverage can be evaluated by both depth and breadth. Coverage depth is the average read depth across the length of the accession, while coverage breadth is the percentage of the reference accession that is covered by at least one read or contig. A microbe that has high coverage depth, or coverage at multiple loci across the reference sequence (high coverage breadth), is likely to have been truly present in the sample. Conversely, a hit with low coverage of only a small portion of the genome may be the result of genomic similarity in a conserved region such as the 16S rRNA gene. Notably, since viral genomes are smaller than bacterial and eukaryotic genomes, viruses often have higher coverage of their genomes.
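To make the two definitions concrete, here is a minimal sketch of how depth and breadth could be computed from alignment coordinates. It is an illustration only, not the CZ ID pipeline's implementation: the function name `coverage_stats` and the input format (0-based, half-open `(start, end)` intervals, one per aligned read or contig) are assumptions made for this example.

```python
def coverage_stats(ref_length, intervals):
    """Return (mean depth, breadth %) for a reference of ref_length bases.

    intervals: hypothetical list of half-open (start, end) alignment
    coordinates, 0-based, one per aligned read or contig.
    """
    if ref_length <= 0:
        raise ValueError("ref_length must be positive")
    depth = [0] * ref_length
    for start, end in intervals:
        # Clip each alignment to the reference bounds before counting.
        for pos in range(max(start, 0), min(end, ref_length)):
            depth[pos] += 1
    covered = sum(1 for d in depth if d > 0)
    mean_depth = sum(depth) / ref_length        # average depth over the accession
    breadth_pct = 100.0 * covered / ref_length  # % of bases with depth >= 1
    return mean_depth, breadth_pct


# Two overlapping alignments spread over a 100 bp reference.
spread = coverage_stats(100, [(0, 50), (25, 75)])
# Thirty alignments stacked on the same 10 bp window.
stacked = coverage_stats(100, [(10, 20)] * 30)
print(spread)   # (1.0, 75.0)
print(stacked)  # (3.0, 10.0)
```

The second case has the higher depth but covers only 10% of the reference, which is the pattern to treat with caution when it falls on a conserved region.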
Viewing the Coverage Visualization
Select the coverage visualization icon next to Taenia solium. The coverage map will pop up at the bottom of the screen.
You can see that there is solid coverage over certain areas of the genome. We can see that the contigs are not isolated to one part of the genome, meaning there is a high likelihood this is a true hit. This might be enough information for you to be satisfied with your hit.
The dark blue coverage represents assembled contigs. The light blue coverage represents individual reads.
Interpreting the Visualization
High Coverage of Viral Genome
Here, we show high coverage of a complete viral genome. Given the smaller size of viral genomes (as compared to bacterial and fungal pathogens), it is common to see relatively high coverage. In this case, we are confident that the virus was present in the sample. We may not always obtain full viral genomes, but coverage across multiple regions of the genome increases confidence in the hit.
High Confidence Bacterial Hit
This shows coverage of a bacterial microbe (obtained from a cultured isolate with high coverage). The distributed coverage across the length of the genome accession provides higher confidence that this taxon was truly present in the sample and is not an artifact of alignment to conserved bacterial genomic regions.
Low Confidence Bacterial Hit
Finally, here is a low-confidence bacterial hit with relatively few reads, isolated to a small region of the genome. These reads may be aligned to rRNA genes that are conserved across numerous taxa.
The coverage visualization also helps you determine which NCBI accession was the closest match to the taxon in your sample. To learn more, we will explore the coverage visualization in the Patient 008 sample.
The Coverage Visualization shows the alignment of reads to a particular NCBI accession. To see the other accessions identified for that taxonomy ID, click the downward-facing arrow to the right of the title.
All accessions have the same taxonomy ID but have different genomes (for example, they each represent different strains belonging to the same species). We sort accessions in the coverage visualization by using the following formula:
Max contig length + Total contig length + Number of reads
This prioritizes accessions with longer contigs.
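The sorting formula above can be sketched as follows. This is an illustrative example rather than CZ ID's actual code: the dictionary keys, the accession IDs, and the choice to sort in descending order of score are assumptions.

```python
def accession_score(contig_lengths, num_reads):
    """Score = max contig length + total contig length + number of reads."""
    longest = max(contig_lengths, default=0)  # 0 when no contigs were assembled
    return longest + sum(contig_lengths) + num_reads


def rank_accessions(accessions):
    """Sort accessions from highest to lowest score (assumed order)."""
    return sorted(
        accessions,
        key=lambda acc: accession_score(acc["contig_lengths"], acc["num_reads"]),
        reverse=True,
    )


# Hypothetical accessions for a single taxonomy ID.
accessions = [
    {"id": "ACC_A", "contig_lengths": [1000, 500], "num_reads": 20},  # score 2520
    {"id": "ACC_B", "contig_lengths": [], "num_reads": 300},          # score 300
    {"id": "ACC_C", "contig_lengths": [2000], "num_reads": 5},        # score 4005
]
ranked_ids = [acc["id"] for acc in rank_accessions(accessions)]
print(ranked_ids)  # ['ACC_C', 'ACC_A', 'ACC_B']
```

Because contig lengths are counted in base pairs and enter the score twice (once as the maximum, once in the total), a single long contig outweighs a much larger number of unassembled reads.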
Metrics and Their Meanings
Hover over a term in the Coverage Visualization to see a tooltip with more information.
|Metric|Description|
|Reference NCBI Entry|The NCBI GenBank entry for the reference accession|
|Reference Length|Length in base pairs of the reference accession|
|Aligned Contigs|Total number of contigs for which this accession was the best match|
|Aligned Loose Reads|Number of reads for which this accession was the best match (only includes reads not assembled into contigs)|
|Coverage Depth|The average read depth of aligned contigs and reads over the length of the accession|
|Coverage Breadth|The percentage of the accession that is covered by contigs and reads|
|Max Alignment Length|Length of the longest aligned region over all reads and contigs|
|Average Mismatched %|Percentage of each aligned region that consists of mismatches, averaged over all reads and contigs|
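As a rough illustration of the last two metrics, the sketch below computes them from a list of alignments. The input format (`(aligned_length, mismatches)` pairs) and the function name are hypothetical, and averaging the per-alignment percentages is one reading of the tooltip definition, not a confirmed description of how CZ ID calculates the value.

```python
def alignment_summary(alignments):
    """Return (max alignment length, average mismatch %).

    alignments: hypothetical list of (aligned_length, mismatches) pairs,
    one per aligned read or contig.
    """
    if not alignments:
        return 0, 0.0
    max_len = max(length for length, _ in alignments)
    # Mismatch percentage of each alignment, then a plain (unweighted) mean.
    per_alignment_pct = [100 * mism / length for length, mism in alignments]
    avg_mismatch_pct = sum(per_alignment_pct) / len(per_alignment_pct)
    return max_len, avg_mismatch_pct


# One read with 2% mismatches, one contig with 1%, one perfect read.
summary = alignment_summary([(150, 3), (1200, 12), (100, 0)])
print(summary)  # (1200, 1.0)
```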
Note: 16S rRNA genes are often highly conserved within genera. Therefore, they can be a source of error in CZ ID hit-calling.
If you identify a taxon with lower coverage across a greater breadth of the genome, you may conclude that the species was likely to be present in the sample.
If you identify a taxon with high coverage of only a small portion of the genome (potentially the 16S rRNA region), we advise you to BLAST the contig to determine whether it truly belongs to the 16S region. If it does, consider whether other species within that genus have greater breadth of coverage across their entire genome. If another species within the genus has greater breadth of coverage and a similar number of reads, it is possible that the reads assigned to the 16S region of the first species actually came from that other species.
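The comparison described above can be expressed as a small check. This is a sketch under stated assumptions: the record fields, the species names, and especially the `read_tolerance` rule for what counts as a "similar number of reads" are hypothetical choices for illustration, not thresholds used by CZ ID.

```python
def better_supported_siblings(hit, siblings, read_tolerance=0.5):
    """List same-genus species with greater breadth and similar read counts.

    hit / siblings: hypothetical dicts with "name", "breadth_pct", "num_reads".
    read_tolerance: assumed cutoff; read counts are "similar" when they differ
    by at most this fraction of the larger count.
    """
    candidates = []
    for sib in siblings:
        wider = sib["breadth_pct"] > hit["breadth_pct"]
        larger = max(sib["num_reads"], hit["num_reads"])
        similar = abs(sib["num_reads"] - hit["num_reads"]) <= read_tolerance * larger
        if wider and similar:
            candidates.append(sib["name"])
    return candidates


# A hit confined to a small region, plus three other species in the genus.
hit = {"name": "species_x", "breadth_pct": 2.0, "num_reads": 100}
siblings = [
    {"name": "species_y", "breadth_pct": 40.0, "num_reads": 120},   # wider, similar reads
    {"name": "species_z", "breadth_pct": 60.0, "num_reads": 5000},  # wider, reads not similar
    {"name": "species_w", "breadth_pct": 1.0, "num_reads": 100},    # not wider
]
flagged = better_supported_siblings(hit, siblings)
print(flagged)  # ['species_y']
```

A non-empty result does not prove misassignment; it only marks hits where the BLAST check described above is worth doing.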