Overview
Here we list steps to download results and data generated after running samples through the mNGS Nanopore pipeline. We also describe how to navigate to the pipeline visualization where you can see details regarding the analysis workflow and download intermediate files.
After reading this guide, you will be able to:
- Download reads and contigs representing taxa of interest
- Download raw and intermediate files
- Navigate to the pipeline visualization
Downloading Data for a Single Sample (Sample Report)
To download data for a single sample, navigate to the Sample Report page for that sample. Click the Download button on the right-hand side of the page and select the file of interest from the Download dropdown menu (the available files are listed in the table below).
Files available to download from the Sample Report page
Report Name | File type | Description |
Report Table | CSV | Report summarizing all detected taxa and associated QC metrics (e.g., bPM, number of contigs, % identity) |
Report Table with applied filters | CSV | Report Table for remaining taxa after applying filters |
Non-host reads | FASTA | Quality-filtered reads after subtracting host and human data |
Non-host contigs | FASTA | Contig sequences after assembling non-host reads |
Non-host contig summary | CSV | Report summarizing individual contig information, their matching taxon, and associated QC metrics (e.g., read count and bases per contig, contig length, coverage) |
Unmapped reads | FASTA | All reads that did not match sequences in the NCBI NT and NR databases, including those assembled into contigs that did not have matches |
Downloading Data for One or Multiple Samples (Bulk Download)
You can download reports for one or multiple samples from the Project page. Please note that files available through the Downloads page are deleted 7 days after the download is created.
To download data from the Project page, select the sample(s) for download and click the Download icon on the right-hand side of the page.
A dialog box will appear where you can select the file of interest. Select the file and click “Start Generating Download”.
Below we describe report and sequence data files available for download. Note that the file options differ from those available for download for a single sample through the Sample Report page.
Files available to download through the Samples page for a project of interest
Category | Filename | File type | Description |
Reports | Sample Metadata | Multiple report files (“.csv”) compressed as “tar.gz” | Includes metadata for selected samples. |
Reports | Sample Taxon Reports | Multiple report files (“.csv”) compressed as “tar.gz” | Individual reports for each selected sample summarizing all detected taxa and associated QC metrics (e.g., bPM, number of contigs, % identity). |
Reports | Combined Sample Taxon Reports | Report file (“.csv”) compressed as “tar.gz” | Values for metric of choice (e.g., NT bPM, NT b) at the species level for all taxa identified in selected samples. |
Reports | Sample Overview | Report file (“.csv”) compressed as “tar.gz” | QC metrics (e.g., percent of reads and bases passing QC) and other summary statistics for selected samples. |
Reports | Contig Summary Reports | Multiple report files (“.csv”) compressed as “tar.gz” | Reports summarizing individual contig information, their matching taxon, and associated QC metrics (e.g., read count and bases per contig, contig length, coverage) for selected samples. |
Sequence Data | Original Input Files | Multiple FASTQ sequence files compressed as “tar.gz” | If you originally uploaded selected samples, you can download the raw sequence files. |
Sequence Data | Reads (Non-host) | Multiple FASTQ or FASTA sequence files compressed as “tar.gz” | Quality-controlled reads for selected samples after subtracting host and human data. You can choose to download all the reads or reads associated with a taxon of interest. |
Sequence Data | Contigs (Non-host) | Multiple FASTA sequence files compressed as “tar.gz” | Contig sequences for selected samples after assembling non-host reads. You can choose to download all the contigs or contigs associated with a taxon of interest. |
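Since bulk downloads arrive as “tar.gz” archives, the following is a minimal Python sketch for unpacking one with the standard library. The function name `extract_bulk_download` and the paths passed to it are hypothetical, and the layout of files inside the archive is an assumption; inspect your own download before relying on it.

```python
import tarfile
from pathlib import Path


def extract_bulk_download(archive_path, dest_dir):
    """Unpack a bulk-download tar.gz archive and return the extracted file paths.

    Hypothetical helper: the archive layout is assumed, not documented here.
    """
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, mode="r:gz") as archive:
        safe_members = []
        for member in archive.getmembers():
            target = (dest / member.name).resolve()
            # Keep regular files only, and skip any entry that would land
            # outside the destination directory.
            if member.isfile() and dest in target.parents:
                safe_members.append(member)
        archive.extractall(dest, members=safe_members)
    return sorted((dest / member.name).resolve() for member in safe_members)
```

For example, `extract_bulk_download("sample_taxon_reports.tar.gz", "reports")` would place the individual report files in a local `reports` folder (both names are placeholders).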
The report files contain detailed information regarding sample quality and metrics for identified taxa. Below we describe results included within the Sample Overview, Sample Taxon Report, and Contig Summary Report.
Sample Overview Report
Category | Field | Description |
Pipeline Information | Sample_name | Name of the sample |
Pipeline Information | Uploader | Name of the person who uploaded the sample to CZ ID |
Pipeline Information | Upload_date | Date the sample was uploaded to CZ ID |
Pipeline Information | Overall_job_status | Complete or failed |
Pipeline Information | Runtime_seconds | Total CZ ID pipeline runtime reported in seconds |
Sample QC Metrics | Total_reads | Total number of reads |
Sample QC Metrics | Total_bases | Total number of bases |
Sample QC Metrics | Passed_filters | Number of reads remaining after applying QC filters, subtracting host and human data, and subsampling to 1 million reads |
Sample QC Metrics | Passed_filters_percent | Percentage of reads remaining after applying QC filters, subtracting host and human data, and subsampling to 1 million reads |
Sample QC Metrics | Subsampled_fraction_bases | Ratio of subsampled bases (up to 1 million reads) to total bases passing QC and host and human data filtration steps |
Sample QC Metrics | Bases_after_quality_filter_percent | Percentage of bases that passed QC filtering thresholds implemented through fastp |
Sample QC Metrics | Bases_after_quality_filter | Number of bases that passed QC filtering thresholds implemented through fastp |
Sample QC Metrics | Bases_after_minimap2_host_filtering | Number of bases remaining after subtracting host and human sequences |
Sample Metadata | Host_organism | Host organism from which the sample was collected |
Sample Metadata | Notes | Sample notes made through the web interface (e.g., annotations) |
Sample Metadata | Sample_type | Type of sample (e.g., environmental swab, water control, tissue) |
Sample Metadata | Nucleotide_type | RNA or DNA |
Sample Metadata | Collection_location | Location where the sample was originally collected |
Optional Metadata | Collection_date | Month and year the sample was originally collected |
Optional Metadata | Water_control | Is the sample a water control? (Yes or No) |
Optional Metadata | Other metadata | All other optional metadata (e.g., isolate, host age, library prep) uploaded with selected samples |
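As an illustration of how the Sample Overview fields might be used, here is a short Python sketch that flags samples with a low Passed_filters_percent. It assumes the CSV header names match the field names in the table above (compared case-insensitively); the function name `low_yield_samples` and the 10% threshold are hypothetical choices, not CZ ID recommendations.

```python
import csv


def low_yield_samples(overview_csv, min_percent=10.0):
    """Return (sample name, percent) pairs whose Passed_filters_percent is below min_percent.

    overview_csv is an open text stream of a Sample Overview report.
    Column names are assumed from the field table and lowercased before lookup.
    """
    flagged = []
    for row in csv.DictReader(overview_csv):
        row = {key.lower(): value for key, value in row.items() if key is not None}
        raw = (row.get("passed_filters_percent") or "").strip()
        if not raw:
            continue  # no value reported for this sample
        percent = float(raw)
        if percent < min_percent:
            flagged.append((row["sample_name"], percent))
    return flagged
```

A typical call would open the extracted report with `open(path, newline="")` and pass the file object to the function.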
Sample Taxon Report
*Note that values against NR only reflect alignments between contigs and their matching taxon (i.e., unassembled reads are not aligned against NR).
Category | Field | Description |
Taxon Information | Tax_id | NCBI taxon ID (sequences not assigned to a specific taxonomic classification will have negative taxon IDs) |
Taxon Information | Tax_level | Taxonomic level (e.g., species, genus) |
Taxon Information | Genus_tax_id | NCBI taxon ID for a given genus (sequences not assigned to a specific genus will have negative IDs) |
Taxon Information | Name | Organism species or genus name |
Taxon Information | Common_name | Common name for a given organism |
Taxon Information | Category | Type of organism or agent (e.g., bacteria, fungi, virus, plasmid) |
Taxon Information | Is_phage | Is this a virus that infects bacteria? (True or False) |
Match Metrics | nt_bpm, nr_bpm* | Bases per million (bpm): Number of bases within reads aligning to a given taxon, including those assembled into contigs that mapped to the taxon, in the NT or NR database per million bases sequenced. |
Match Metrics | nt_count, nr_count* | Number of reads that match a given taxon in the NT or NR database, including reads that assembled into contigs that match the taxon. |
Match Metrics | nt_percent_identity, nr_percent_identity* | Average percent identity between all the query sequences (contigs and unassembled reads) and their matching taxon in the NT or NR database. |
Match Metrics | nt_alignment_length, nr_alignment_length* | Average length of local alignments between all the query sequences (contigs and unassembled reads) and their matching taxon in the NT or NR database. Note that values against NR are reported in base pairs and only reflect local alignments between contig sequences and their matching taxon (unassembled reads are not aligned against NR). |
Match Metrics | nt_e_value, nr_e_value* | Average expect value (E-value) of alignments to the NT or NR database. |
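To make the bases-per-million definition concrete, the sketch below computes bpm from raw counts and ranks taxa in a Sample Taxon Report by a chosen metric. It is an illustration under stated assumptions: the denominator is taken to be the total number of bases sequenced, the CSV header names are assumed to match the field names above (lowercased before lookup), and the function names are hypothetical. The pipeline's own normalization may differ in detail.

```python
import csv


def bases_per_million(taxon_bases, total_bases):
    """Bases aligning to a taxon per million bases sequenced (assumed denominator)."""
    if total_bases <= 0:
        raise ValueError("total_bases must be positive")
    return taxon_bases * 1_000_000 / total_bases


def top_taxa(report_csv, metric="nt_bpm", limit=5):
    """Return up to `limit` (name, value) pairs ranked by `metric`, highest first."""
    ranked = []
    for row in csv.DictReader(report_csv):
        row = {key.lower(): value for key, value in row.items() if key is not None}
        # Negative IDs mark sequences without a specific taxonomic assignment.
        if int(row["tax_id"]) < 0:
            continue
        value = (row.get(metric) or "").strip()
        if value:
            ranked.append((row["name"], float(value)))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked[:limit]
```

For instance, 2,500 bases aligning to a taxon out of 50,000,000 bases sequenced gives `bases_per_million(2500, 50_000_000)`, which evaluates to 50.0.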
Contig Summary Report
Category | Field | Description |
Contig Information | Contig_name | Name of the contig |
Contig Information | Read_count | Number of reads that assembled into the contig |
Contig Information | Base_count | Number of bases within all the reads that assembled into the contig |
Taxon Information | NT.rank_taxid, NR.rank_taxid | NCBI taxonomic IDs for a given taxon in the NT or NR database, including IDs for various taxonomy ranks (species, genus, family, order, class, phylum, kingdom, superkingdom). Missing taxonomic IDs will get negative values (e.g., no species = -100, no genus = -200) |
Taxon Information | NT.Accession, NR.Accession | GenBank accession number for a given match in the NT or NR database |
Contig Match Metrics | NT.Percentage identity, NR.Percentage identity | Percent identity between contig sequence and its top match in the NT or NR database |
Contig Match Metrics | NT.Alignment Length, NR.Alignment Length | Length of local alignments between contig and its top match in the NT or NR database. Note that alignments against the NR database are reported in base pairs. |
Contig Match Metrics | NT.Number of mismatches, NR.Number of mismatches | Number of mismatches between contig and its top match in the NT or NR database within aligned regions |
Contig Match Metrics | NT.E-value, NR.E-value | Expect value (E-value) for alignment between contig and its top match in the NT or NR database |
Contig Match Metrics | NT.Bitscore, NR.Bitscore | Bit-score for alignment between contig and its top match in the NT or NR database. |
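The contig match metrics lend themselves to simple filtering. The following Python sketch keeps contigs whose NT match meets a minimum percent identity and alignment length. The default column names are copied from the table above and may not match the exact headers in your file, so they are exposed as parameters; the function name and both thresholds are hypothetical and purely illustrative.

```python
import csv


def confident_contigs(
    summary_csv,
    min_identity=90.0,
    min_length=500,
    name_col="Contig_name",
    identity_col="NT.Percentage identity",
    length_col="NT.Alignment Length",
):
    """Return names of contigs whose NT match passes both thresholds.

    summary_csv is an open text stream of a Contig Summary Report.
    Column names are assumptions taken from the field table.
    """
    kept = []
    for row in csv.DictReader(summary_csv):
        identity = (row.get(identity_col) or "").strip()
        length = (row.get(length_col) or "").strip()
        if not identity or not length:
            continue  # contig has no NT match metrics
        if float(identity) >= min_identity and float(length) >= min_length:
            kept.append(row[name_col])
    return kept
```

If your file uses different headers, pass them explicitly, for example `confident_contigs(handle, name_col="contig_name")`.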
Downloading Reads and Contigs Associated with Taxa
You can easily download read and contig sequences associated with taxa of interest. To download sequences:
1) Go to the Sample Report page and hover over a taxon of interest. This will enable the Download icon. You can download sequences associated with a genus or a specific species.
2) Click “Contigs (.fasta)” or “Reads (.fasta)” from the Download dropdown menu to download sequences associated with the taxon.
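Once a “Contigs (.fasta)” or “Reads (.fasta)” file is downloaded, it can be inspected with a few lines of Python. The sketch below is a minimal FASTA reader using only the standard library; the function names are hypothetical, and it assumes a plain, uncompressed FASTA file in which each record starts with a `>` header line.

```python
def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize_fasta(handle):
    """Return (number of records, total number of bases) for a FASTA stream."""
    count = 0
    total_bases = 0
    for _, sequence in read_fasta(handle):
        count += 1
        total_bases += len(sequence)
    return count, total_bases
```

Opening the downloaded file with `open(path)` and passing the file object to `summarize_fasta` gives a quick check that the download contains the expected number of sequences.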
Downloading Raw and Intermediate Files
You can download data files produced throughout the pipeline. You can find intermediate files through the Sample Report or Pipeline Visualization pages.
To view intermediate files through the Sample Report page:
1) Navigate to the Sample Report page for the sample of interest.
2) Click the Download button on the right-hand side of the page.
3) Select “View Results Folder” from the Download dropdown menu.
4) Once on the Results Folder page, scroll and/or search for the file of interest. Click on the file of interest to download.
Navigating to the Pipeline Visualization
You can view details regarding steps implemented for analyzing a given sample through the mNGS Nanopore pipeline visualization. You can also use the visualization to find and download intermediate files of interest.
To view the pipeline visualization:
1) Navigate to the Sample Report page for the sample of interest.
2) Click the Download button on the right-hand side of the page.
3) Select “View Pipeline Visualization” from the Download dropdown menu.
4) Click the “ONT mNGS Pipeline” button on the Pipeline Visualization page to open the visualization.
5) Click on a step of interest from the pipeline visualization. A panel will open on the right-hand side of the page with information about the step as well as input and output files. Simply click on a file of interest to download.