The total number of rows in a report can be overwhelming. This is because CZ ID defaults to showing you everything that has been discovered in your sample. Metagenomic next-generation sequencing is highly sensitive, picking up low levels of sequences associated with environmental contaminants that are often not relevant when looking for infecting microbes. Luckily, CZ ID's filtering functionality makes it easier to digest and understand the results.
Our first step is to filter out low-confidence hits. These are spurious hits that were assigned to the wrong taxa or are not abundant enough to have an impact.
At the top of the report table, there is a series of filters you can apply. We will start by focusing on the Categories and Threshold Filters.
The Categories filter gives you the ability to select which taxon categories (superkingdoms) you would like to view. For this sample, we see that most of the top hits are Eukaryotic. Let’s apply a filter to only view taxa in that Category.
When you select a filter, a blue tag will show up under the filters. You can remove the filter at any point by clicking on the X to the right of the tag.
Threshold Filters are a bit more complicated. They let you add several filters to the report page at once. These filters are combined using AND logic, so a taxon stays in the report only if it satisfies every threshold. You can add or remove multiple thresholds using the dropdown menu.
For the Patient 008 case, we want to remove some of the low-quality hits that we don’t trust. For this sample, we will apply the following thresholds: NT rPM >= 10 and NT L (alignment length) >= 35. Once you have specified your filters, select Apply. The report will automatically update and your filters will appear above the table.
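The AND logic described above can be illustrated with a minimal Python sketch. This is not CZ ID's implementation or API: the rows, taxon names, and field names (`nt_rpm`, `nt_l`) are hypothetical stand-ins for the NT rPM and NT L columns of a report.

```python
import operator

# Hypothetical report rows; taxa and values are made up for illustration.
rows = [
    {"taxon": "Taxon A", "nt_rpm": 152.4, "nt_l": 98.0},
    {"taxon": "Taxon B", "nt_rpm": 12.0, "nt_l": 31.5},
    {"taxon": "Taxon C", "nt_rpm": 3.1, "nt_l": 120.0},
    {"taxon": "Taxon D", "nt_rpm": 10.0, "nt_l": 35.0},
]

# Comparison operators a threshold may use.
OPS = {">=": operator.ge, "<=": operator.le}


def apply_thresholds(rows, thresholds):
    """Keep only rows that satisfy every (metric, op, value) threshold (AND logic)."""
    return [
        row
        for row in rows
        if all(OPS[op](row[metric], value) for metric, op, value in thresholds)
    ]


# The two thresholds used in this walkthrough: NT rPM >= 10 and NT L >= 35.
thresholds = [("nt_rpm", ">=", 10), ("nt_l", ">=", 35)]
kept = apply_thresholds(rows, thresholds)
print([row["taxon"] for row in kept])  # ['Taxon A', 'Taxon D']
```

Taxon B is dropped because its alignment length is too short, and Taxon C because its rPM is too low: failing any one threshold is enough to remove a row.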
With these filters, CZ ID brought the correct microbe to the top of the report: Taenia solium was confirmed to be the infecting agent in this patient. We will continue to explore the results below.
Reduce Noise in Your Report
Simple ways to reduce noise in your report:
Add a threshold filter of NT L (alignment length) > 50 bp. Short alignments (NT < 36 bp, NR < 10) are filtered out upstream, which removes most false positives, but depending on read length, a filter of 50 bp can remove additional false positives.
Review each category (Virus, Bacteria, Eukaryote) separately to find taxa of interest.
Add a threshold filter of NT rPM > 10. Note that some viral pathogens may be present at low levels, so you may choose to use a lower threshold (e.g., NT rPM > 1) for viruses.
Create a background model of negative controls of the same sample type and host to improve the relevance of the z-score to your dataset.
Concordance between NT and NR makes a hit more believable, so requiring NT r > 0 and NR r > 0 removes many spurious hits.
Check the coverage visualization at the species level to see whether coverage is adequate for the hit to be believable.
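Several of the suggestions above can be combined into a single pass over a report. The following Python sketch shows one way to do that, under stated assumptions: the rows, category labels, and field names (`nt_rpm`, `nt_l`, `nt_r`, `nr_r`) are hypothetical, and the code is an illustration rather than how CZ ID applies filters.

```python
# Hypothetical report rows; taxa, categories, and values are made up.
rows = [
    {"taxon": "Virus X", "category": "viruses", "nt_rpm": 2.5, "nt_l": 80, "nt_r": 12, "nr_r": 9},
    {"taxon": "Bacterium Y", "category": "bacteria", "nt_rpm": 4.0, "nt_l": 140, "nt_r": 30, "nr_r": 22},
    {"taxon": "Eukaryote Z", "category": "eukaryota", "nt_rpm": 55.0, "nt_l": 45, "nt_r": 400, "nr_r": 310},
    {"taxon": "Bacterium W", "category": "bacteria", "nt_rpm": 25.0, "nt_l": 110, "nt_r": 90, "nr_r": 0},
    {"taxon": "Eukaryote V", "category": "eukaryota", "nt_rpm": 300.0, "nt_l": 130, "nt_r": 2100, "nr_r": 1800},
]

# Lower NT rPM threshold for viruses, which may be present at low levels.
MIN_RPM = {"viruses": 1}
DEFAULT_MIN_RPM = 10
MIN_ALIGNMENT_LENGTH = 50  # NT L threshold in bp


def passes(row):
    """Return True if a row survives all of the noise-reduction rules."""
    min_rpm = MIN_RPM.get(row["category"], DEFAULT_MIN_RPM)
    return (
        row["nt_rpm"] > min_rpm
        and row["nt_l"] > MIN_ALIGNMENT_LENGTH
        and row["nt_r"] > 0  # hit must be supported by NT reads...
        and row["nr_r"] > 0  # ...and by NR reads (concordance)
    )


# Group surviving taxa by category so each category can be reviewed separately.
by_category = {}
for row in rows:
    if passes(row):
        by_category.setdefault(row["category"], []).append(row["taxon"])

print(by_category)  # {'viruses': ['Virus X'], 'eukaryota': ['Eukaryote V']}
```

In this example, Virus X survives only because of the lower viral rPM threshold, while Bacterium W is removed for lacking NR support even though its NT metrics are strong. The thresholds are starting points to adjust for your read length and sample type.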