Overview
This article lists the steps for uploading data to the mNGS Nanopore pipeline. After you upload data and the pipeline run completes, you can analyze the data to identify microbes of interest and download results. Click here to learn about the mNGS Nanopore pipeline.
Upload Data
CZ ID only accommodates one sequence file per sample when uploading Nanopore sequencing data to the pipeline. Multiple FASTQ files for a given sample will be automatically concatenated if filenames follow a certain format (see Automatic Concatenation of Nanopore Files for details). If there are multiple FASTQ files per sample and filenames do not follow the format recognized by the platform, you need to concatenate (or combine) them into a single file before upload. Once your sample files are ready, follow the steps listed below to upload data to the mNGS Nanopore pipeline.
- Step 1: Go to Upload Page
- Step 2: Specify Project
- Step 3: Select Analysis Type
- Step 4: Select Sequencing Files
- Step 5: Add Metadata
- Step 6: Review
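If your filenames do not follow the format recognized for automatic concatenation, the files for one sample can be combined locally before upload. The following is a minimal sketch, not an official CZ ID tool: the function names `concatenate_fastq` and `_ends_with_newline` are hypothetical, and the sketch assumes all inputs for a sample are either plain FASTQ or gzip-compressed FASTQ (not a mix) and that the output path is not one of the inputs.

```python
import shutil
from pathlib import Path


def _ends_with_newline(path):
    """Return True if the file is empty or its last byte is a newline."""
    if path.stat().st_size == 0:
        return True
    with open(path, "rb") as handle:
        handle.seek(-1, 2)  # move to the last byte of the file
        return handle.read(1) == b"\n"


def concatenate_fastq(parts, output_path):
    """Append each input file to output_path, in the order given.

    A byte-level copy is enough in both cases: plain FASTQ is line-based
    text, and back-to-back gzip members form a valid gzip stream.
    """
    parts = [Path(p) for p in parts]
    if not parts:
        raise ValueError("no input files given")
    gz_flags = {p.name.endswith(".gz") for p in parts}
    if len(gz_flags) != 1:
        raise ValueError("inputs mix compressed and uncompressed files")
    is_gzip = gz_flags.pop()

    with open(output_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
            # Keep records from running together when a plain-text
            # file lacks a trailing newline.
            if not is_gzip and not _ends_with_newline(part):
                out.write(b"\n")
    return Path(output_path)
```

For example, `concatenate_fastq(["run1_part1.fastq.gz", "run1_part2.fastq.gz"], "run1.fastq.gz")` would produce one file for the sample; the filenames here are placeholders.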
Step 1: Go to Upload Page
Log in to CZ ID using your email and password. Once logged in, you will see your name in the upper right-hand corner of the application. You will see a link to the Upload page next to your username.
- Upload Page Link: Click this link to open the Upload page
- Upload Steps: Upload is divided into three general stages: upload samples ("Samples"), add metadata ("Metadata"), and review the information ("Review"). The current stage is highlighted in blue.
Step 2: Specify Project
Select or create a project through the Select Project section. The project you select determines the pipeline and database versions used to run the samples, because the pipeline version for each analysis type and the NCBI Index Date are assigned when the project is created (see Pipeline Version and NCBI Index Date for details). Therefore, all samples within a project will run on the same major pipeline version and use the same NCBI Index Date.
- Create New Project: Use this link to create a new project. A dialog box will appear to enter the new project information.
- Project Dropdown Menu: Use the dropdown menu to upload samples to an existing project.
When creating a new project, you will need to add a project name, choose whether the project will be public within CZ ID or private, and provide a project description. Click the Create Project button to finish creating the new project.
Step 3: Select Analysis Type
The next step within the Select Samples page is to select the analysis type. Under “Analysis Type”, select Metagenomics from the main list and Nanopore from the sequencing platform options. You will be prompted to select the Guppy basecaller model used to generate the FASTQ files. The Analysis Type box will also specify the Pipeline Version and NCBI Index Date that will be used to run the samples.
- Sequencing Platform Options: You can select from short- (Illumina) or long-read (Nanopore) platforms. Select Nanopore.
- Pipeline Version: Specifies the mNGS Nanopore pipeline version that will be used to run samples.
- NCBI Index Date: Specifies the date on which the nucleotide (NT) and protein (NR) databases used to analyze samples were downloaded from NCBI.
- Guppy Basecaller Options: Select the model used to generate FASTQ files.
Guppy Basecaller
The Guppy basecaller model determines the steps used for assembly based on the expected sequence error rate. The Guppy basecaller options are the fast, high accuracy (hac), and super accuracy (super) models. We highly recommend using the “super” model during basecalling whenever possible. The “super” model has been shown to reduce sequence error rates compared to the other two models. If you don’t have access to the “super” model, the “hac” model can reduce error rates by ~2% relative to the “fast” model (see research article). The “fast” model should be avoided if possible.
Pipeline Version
The pipeline version that will be used to run uploaded samples can be seen once you select a project and an analysis type (i.e. Metagenomics - Nanopore). CZ ID uses a three-level pipeline versioning system where the first number indicates the major pipeline version followed by numbers that specify minor version and patch updates. For example, pipeline v1.2.15 refers to major pipeline version 1, minor pipeline version 2, and patch version 15.
The project’s pipeline version will be automatically assigned upon project creation based on the latest version available for each analysis type. This pipeline version pinning by project helps to ensure that all sample runs within a project are comparable*. For example, if your project is pinned to mNGS Nanopore pipeline v0.7.5, all new samples uploaded to that project will run on major pipeline v0. This system enables minor pipeline updates to be associated with the same major version while still allowing your results to be comparable.
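The three-level versioning scheme and the idea of pinning by major version can be sketched in a few lines. This is only an illustration of the rule described above, not CZ ID code: the names `PipelineVersion`, `parse_pipeline_version`, and `same_major` are hypothetical, and the sketch assumes version strings look like `v1.2.15` (the leading `v` is optional).

```python
import re
from typing import NamedTuple


class PipelineVersion(NamedTuple):
    major: int
    minor: int
    patch: int


# Accepts "v1.2.15" or "1.2.15": exactly three dot-separated integers.
_VERSION_RE = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def parse_pipeline_version(text):
    """Split a version string into its major, minor, and patch numbers."""
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a three-level version: {text!r}")
    return PipelineVersion(*(int(group) for group in match.groups()))


def same_major(version_a, version_b):
    """True when two versions share a major version number."""
    return (
        parse_pipeline_version(version_a).major
        == parse_pipeline_version(version_b).major
    )
```

Under this sketch, `parse_pipeline_version("v1.2.15")` yields major 1, minor 2, patch 15, and `same_major("v0.7.5", "v0.8.0")` is true because both runs fall under major version 0.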
You will see a Warning Icon if there is a new major pipeline version available. To use this new pipeline, you must create a new project.
*Note: Projects created before February 08, 2024 may include multiple major pipeline versions.
NCBI Index Date
The NCBI Index that will be used to process uploaded samples can be seen once you select a project and analysis type (i.e. Metagenomics - Nanopore). The NCBI Index Date indicates the date the NCBI NT and NR databases were downloaded for use by CZ ID. This index date can be used to find associated GenBank release numbers (see GenBank release notes). The downloaded databases are then compressed and indexed by CZ ID. Newer versions of the index will have the most up-to-date taxon information from NCBI.
The project’s NCBI Index Date will be automatically assigned upon project creation based on the latest version available. Each project is pinned to one NCBI Index Date to ensure that all sample runs within a project are comparable*. For example, if your project is pinned to NCBI Index Date 2021-01-22, all new samples uploaded to that project will run on Index 2021-01-22.
You will see a Warning Icon if there is a new NCBI Index available. To use this new Index, you must create a new project.
*Note: Projects created before February 08, 2024 may include multiple NCBI Index Dates.
Step 4: Select Sequencing Files
After specifying the analysis type, scroll down to the "Select Files" section to upload FASTQ (“.fastq” or “.fq”) or compressed FASTQ (“.fastq.gz” or “.fq.gz”) files directly from your computer. Click here if you have FASTA files.
- Upload Box: Drag and drop files into the provided box or click the link to use your file browser.
- Sample List: Sample sequencing files ready for upload will be listed here. Sample names will be based on the sequence filenames (see note regarding filenames below).
- Continue Button: After selecting files, use this button to continue to the Upload Metadata section.
Notes regarding sequencing files:
- If there are multiple FASTQ files per sample that do not follow the format recognized by the platform (see Automatic Concatenation of Nanopore Files for details), make sure to concatenate sequencing files into a single file prior to uploading sequences.
- Filenames must be no longer than 120 characters.
- Filenames can only contain letters from the English alphabet (A-Z, upper and lower case), numbers (0-9), periods (.), hyphens (-) and underscores (_). Spaces are not allowed.
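The filename rules above can be checked locally before you start an upload. The sketch below is an assumption-laden illustration rather than the platform's own validation: `filename_problems` is a hypothetical helper, and it assumes the accepted extensions are exactly the lowercase forms listed in this step.

```python
import re

MAX_FILENAME_LENGTH = 120
# Extensions as listed in this article; treating them as lowercase-only
# is an assumption of this sketch.
ALLOWED_EXTENSIONS = (".fastq", ".fq", ".fastq.gz", ".fq.gz")
# Letters, digits, periods, hyphens, and underscores only (no spaces).
_ALLOWED_CHARS = re.compile(r"[A-Za-z0-9._-]+")


def filename_problems(filename):
    """Return a list of human-readable problems; empty if none found."""
    problems = []
    if len(filename) > MAX_FILENAME_LENGTH:
        problems.append(f"longer than {MAX_FILENAME_LENGTH} characters")
    if not _ALLOWED_CHARS.fullmatch(filename):
        problems.append("contains characters other than A-Z, a-z, 0-9, '.', '-', '_'")
    if not filename.endswith(ALLOWED_EXTENSIONS):
        problems.append("extension is not .fastq, .fq, .fastq.gz, or .fq.gz")
    return problems
```

A name such as `sample_01.fastq.gz` passes all three checks, while `my sample.fastq` is flagged for the space.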
Step 5: Add Metadata
Add the appropriate sample metadata through the Upload Metadata page. There are six required metadata fields: Host Organism, Sample Type, Water Control, Nucleotide Type, Collection Date, and Collection Location (see Adding Metadata for details). You can enter metadata manually or upload a metadata file in comma-delimited format (".csv" file extension).
Manual Metadata Entry
Use the “Manual Input” tab (default). Fill in metadata information using the provided fields directly through the web interface. After entering information for all the required fields, click the Continue button to go to the Review section.
- Manual Input Tab: Use this tab (default) when entering metadata directly through the web interface.
- Metadata Table: Enter information for each column or field. By default, the required fields will be listed on the table. You can add additional columns through the Metadata Dropdown Menu.
- Metadata Dropdown Menu: Click the plus sign to view and add optional metadata fields to the Metadata Table.
Upload Metadata File
Prepare a metadata file locally on your computer by downloading a metadata template or by copying and pasting required metadata fields into a file. Save your metadata as a “comma-delimited” file and upload the file under the “CSV Upload” tab. If there are no errors, click the Continue button to go to the Review section.
- CSV Upload Tab: Select this tab to upload a metadata file.
- Metadata Template: Click the template link to download a CSV file that is already populated with sample names and metadata fields. Edit the file to include the appropriate metadata and save it. You are not required to use the provided template.
- Metadata File Upload Box: Use this box to upload the metadata file.
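Before uploading a metadata file, it can help to confirm that every required field is filled in for every sample. The following is a rough sketch under stated assumptions: `missing_metadata` is a hypothetical helper, and the column headers are assumed to match the six required field names exactly as written in this article. The headers in the downloaded template may differ, so adjust `REQUIRED_FIELDS` to match your file.

```python
import csv
import io

# Required fields as named in this article; the exact header spelling
# in a real metadata file is an assumption.
REQUIRED_FIELDS = [
    "Host Organism",
    "Sample Type",
    "Water Control",
    "Nucleotide Type",
    "Collection Date",
    "Collection Location",
]


def missing_metadata(csv_text):
    """Map each CSV row number to the required fields left empty in it.

    Row numbers count the header as row 1, assuming no field spans
    multiple lines.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    absent = [field for field in REQUIRED_FIELDS if field not in header]
    if absent:
        raise ValueError(f"missing required columns: {absent}")

    problems = {}
    for row_number, row in enumerate(reader, start=2):
        empty = [f for f in REQUIRED_FIELDS if not (row.get(f) or "").strip()]
        if empty:
            problems[row_number] = empty
    return problems
```

To check a file on disk, read it first, for example `missing_metadata(open("metadata.csv", newline="").read())`, where `metadata.csv` is a placeholder name. An empty result means every row has a value for each required field.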
Step 6: Review
Use the Review page to review the project, sample, and analysis information. The "Edit" links by each section can be used to edit project and sample information if you need to correct anything before upload. After reviewing sample and metadata information, please accept CZ ID's Terms of Service and Data Privacy Policy. Press Start Upload to begin the upload process to our server and kick off the analysis pipeline.
Note regarding host filtering:
"Host Subtraction" information is located below the table listing samples to be uploaded. This information tells you how your selection of host organism will affect the pipeline, specifically the host subtraction step. If CZ ID has the genome of the host organism, the pipeline will subtract out reads aligning to that genome. Regardless of your choice of host, the pipeline will always remove ERCCs and reads aligning to the human genome (reference: HG38 and T2T-CHM13 assemblies). If CZ ID does not have a genome that matches your host organism, you can request it by following the instructions in our FAQs.
After pressing Start Upload, you will see a modal showing the upload progress. DO NOT close the web page while the upload is in progress. Otherwise, the upload will be canceled and you will have to start your upload over.
Wait until you see the "Uploads completed!" message confirming that your samples have been uploaded successfully. Once you see the confirmation message, you can close your window or press "Go to Project" to navigate to the Project page, where you can view sample status and analyze results.