Overview
Interested in analyzing next-generation sequencing (NGS) data? Wondering if the CZ ID platform can be useful for you and your team? Here we describe the main goals and features of CZ ID.
What is CZ ID?
CZ ID is a free, no-code, and user-friendly platform for analyzing NGS data to detect infectious disease agents. This open source, cloud-based platform integrates bioinformatic tools that enable any lab to quickly identify microbial agents in their NGS datasets regardless of available computational resources. Analyses through CZ ID can be tailored towards pathogen detection and/or surveillance, microbiome composition characterization, and outbreak investigations.
CZ ID is made possible by open-source bioinformatics software that has been integrated into a single platform. Our team is grateful to all the open-source software developers whose tools are foundational to CZ ID and our shared vision for open science.
What type of analyses can be performed through CZ ID?
The CZ ID platform includes modules that will allow you to:
- Analyze metagenomic NGS (mNGS) data to identify microbes of interest (e.g., pathogens) in your samples. The mNGS module will enable you to easily explore complex metagenomic datasets through interactive outputs (see examples).
- Identify bacterial antimicrobial resistance (AMR) genes from metagenomic or whole genome sequence datasets. The AMR module will enable you to quickly identify AMR genes in your data through an interactive sample report (see example report).
- Assemble viral consensus genomes, including a SARS-CoV-2 specialized workflow, and explore genome coverage (see example coverage plot). The SARS-CoV-2 module links to Nextclade where you can easily view genome quality and phylogenetic placement.
Type of input data processed by CZ ID:
Module | Input Sequencing Data | Supported Sequencing Platforms |
Metagenomics (mNGS) | Shotgun (or random) data. Note: this module is NOT meant to process amplicon data (e.g., 16S, 18S, ITS) | Illumina & Nanopore |
Antimicrobial Resistance (AMR) | Shotgun (or random) and whole genome sequence data | Illumina |
SARS-CoV-2 | Metagenomic sequencing with spiked primer enrichment (MSSPE) and PCR-based data (e.g., ARTIC v3 protocol) | Illumina & Nanopore |
Viral Consensus Genomes | Target enrichment (e.g., MSSPE), PCR, whole genome, or metagenomic sequencing data | Illumina |
What is the general workflow for analyzing NGS data?
The main goal of CZ ID is to provide accessible and fast NGS data analyses that can quickly inform public health decision making. Therefore, CZ ID modules were designed to enable users to analyze NGS data in a matter of hours. After uploading data to a secure platform, users can view results and easily keep track of their samples through a user-friendly interface. All reports, data, and intermediate files produced throughout the pipelines can be downloaded.
Workflow overviews for CZ ID modules. Once NGS data are uploaded, this free, plug-and-play platform automatically generates results for metagenome analysis, antimicrobial resistance gene detection, and viral genome assembly.
With just a few clicks, users can explore and analyze results through interactive tables, genome coverage visualizations, and heatmaps (see examples below). Users can also build phylogenetic trees to compare taxa of interest across mNGS samples.
Example of mNGS sample report table:
- High-scoring Taxa: CZ ID implements ranking scores based on a given taxon's abundance in samples relative to controls and matches to both nucleotide (NT) and protein (NR) NCBI databases. High-scoring taxa are automatically highlighted in blue.
- Background Model: Create background models to calculate metrics that account for contamination (Score, Z-score).
- Filter Options: Filter the report table based on categories (e.g., bacteria, viruses) or metric thresholds (e.g., rPM > 100) to focus on relevant results.
- Metrics per Taxa: Reported metrics for matches found in the NT and NR databases including: Score, Z-score, reads per million (rPM), # of reads (r), # of contigs, # of reads assembled into contigs (contig r), average % identity of alignments (% id), average length of alignments (L), and E-value.
- Annotation Tag: Use tags to note if identified taxa are a "hit", "no hit", or "inconclusive".
- Details at the Species Level: By default, results are shown at the genus level. However, results at the species level can be viewed by clicking on the downward arrow located by genus names.
- Analysis Icons: Use these icons to view coverage, run BLAST, build trees, generate viral consensus genomes or download sequences associated with the taxon.
- Known Pathogen Flag: Flags highlight pathogenic taxa.
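To make two of the metrics above more concrete, here is a minimal Python sketch of how reads per million (rPM) and a background-based Z-score are commonly computed. This is an illustration under stated assumptions, not CZ ID's actual pipeline code: the function names are hypothetical, the background is represented simply as a list of rPM values from control samples, and the Z-score is the textbook definition (sample value minus background mean, divided by background standard deviation). The platform's exact calculation may differ.

```python
from statistics import mean, stdev


def reads_per_million(taxon_reads: int, total_reads: int) -> float:
    """Scale a taxon's read count to reads per million total reads (rPM)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return taxon_reads * 1_000_000 / total_reads


def background_z_score(sample_rpm: float, background_rpms: list[float]) -> float:
    """Textbook z-score of a sample rPM against rPM values from control samples.

    Assumption: the background model is summarized by the mean and sample
    standard deviation of the controls; at least two control values are needed.
    """
    mu = mean(background_rpms)
    sigma = stdev(background_rpms)
    if sigma == 0:
        raise ValueError("background rPM values have zero variance")
    return (sample_rpm - mu) / sigma


# Made-up numbers for illustration only.
rpm = reads_per_million(taxon_reads=250, total_reads=5_000_000)
z = background_z_score(rpm, background_rpms=[2.0, 4.0, 6.0])
print(f"rPM = {rpm:.1f}, Z-score = {z:.1f}")
```

A taxon whose rPM sits far above what is seen in the controls gets a large positive Z-score, which is the intuition behind using a background model to account for contamination.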
Heatmap example:
- Filters and View Options: Use filters and view options to customize heatmap based on categories (e.g., bacteria, viruses), metric thresholds (e.g., rPM > 100), taxonomic levels, and/or scale of interest.
- Pin Samples, Add Metadata, and/or Select Specific Taxa: Use these dropdown menus to further customize heatmap.
- Color Scale: Heatmap scale reflects relative abundance (rPM) by default.
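The category and threshold filters described above are applied in the CZ ID interface with a few clicks. For readers who want to reproduce the same kind of filtering on results they have downloaded, the following Python sketch shows the idea. The row layout and the field names (`taxon`, `category`, `rpm`) are hypothetical placeholders, not the column names of an actual CZ ID export, and the values are made up.

```python
def filter_taxa(rows, category=None, min_rpm=None):
    """Keep rows matching an optional category and an optional rPM threshold.

    The threshold is strict (rPM > min_rpm), mirroring a filter such as
    "rPM > 100".
    """
    kept = []
    for row in rows:
        if category is not None and row["category"] != category:
            continue
        if min_rpm is not None and row["rpm"] <= min_rpm:
            continue
        kept.append(row)
    return kept


# Hypothetical report rows for illustration only.
report = [
    {"taxon": "Taxon A", "category": "bacteria", "rpm": 850.0},
    {"taxon": "Taxon B", "category": "viruses", "rpm": 120.5},
    {"taxon": "Taxon C", "category": "viruses", "rpm": 12.0},
]

hits = filter_taxa(report, category="viruses", min_rpm=100)
print([row["taxon"] for row in hits])
```

Leaving both arguments unset returns every row, so the same helper can be used with one filter, both, or neither.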
Example of AMR sample report:
- Gene Information: Information is based on the antibiotic resistance ontology (ARO).
- Metrics: Metrics for contig and read alignments against CARD.
- Add Dropdown Menu: Use this menu to customize the table view by adding or removing columns.
- Filter Options: Use filters to narrow down results based on thresholds (e.g., minimum number of contigs matching AMR genes) and/or drug class of interest.
- Download Icon: Use icon by a gene of interest to download reads and/or contigs associated with the gene.
Example of SARS-CoV-2 consensus genome results:
- Assembly Metrics
- Coverage Stats
- Coverage Visualization
What are the main advantages of CZ ID?
Anyone can use CZ ID to streamline analysis of NGS data and generate data visualizations with just a few clicks. The tool integration offered by CZ ID has several advantages for users, including:
- No need to install multiple software packages or write code to analyze data.
- No need to transfer or reformat data between multiple NGS analysis tools (e.g., software used for host contamination removal, read quality control, read mapping, and sequence assembly).
- No need to worry about setting up an individual cloud-based solution for storing data.
The user-friendly interface allows you to:
- Easily share data and analyses across your team through a centralized platform.
- Download results and any intermediate data files produced throughout the pipeline.
- Explore and compare against other NGS datasets that are public on CZ ID.
- Keep track of software used throughout the pipeline (e.g., tool and code used for each step).
What about data privacy and ownership? Will your data be private?
Yes! Your data is always yours and, if you want to share it, you decide when you want it to be shared. Here are important things to note:
- You always own and control your data. Any research you create with your data is yours and only yours. CZ ID and its associated partners do not own your data and will never sell it.
- Uploaded raw sequence data is not shared with any other CZ ID user and will never be publicly released on the platform. CZ ID staff will only access data when specifically requested by a user (e.g., debugging requests).
- Sample metadata is shared with technical partners (Chan Zuckerberg Initiative, LLC; CZI) and Service Providers (e.g., AWS) that help operate and secure CZ ID. CZI and Service Providers are limited by our Privacy policy and will not use any data for any purpose beyond operating and securing CZ ID.
- CZ ID does not share any human sequence data (human data is filtered out during data pre-processing).
- You can delete your data from CZ ID at any time.
- See CZ ID Privacy policy and Terms of use for details.
Have questions or concerns about CZ ID?
Please reach out to our team by sending an email to help@czid.org.