Overview
You can easily assemble SARS-CoV-2 consensus genomes through CZ ID. Here we list steps to upload data to the SARS-CoV-2 consensus genome pipeline, where you can assemble short- (Illumina) and long-read (Nanopore) data. See SARS-CoV-2 Pipeline for details about pipelines used to assemble SARS-CoV-2 genomes.
Upload SARS-CoV-2 Data
Follow the steps listed below to upload SARS-CoV-2 data and view assembled genomes:
- Step 1: Go to Upload Page
- Step 2: Specify Project
- Step 3: Select Analysis Type
- Step 4: Select Sequencing Files
- Step 5: Add Metadata
- Step 6: Review
- Step 7: View Genome Status and Report
Step 1: Go to Upload Page
Log in to CZ ID using your email and password. Once logged in, you will see your name in the upper right-hand corner of the application. You will see a link to the Upload page next to your username.
- Upload Page Link: Click this link to open the Upload page.
- Upload Steps: Upload is divided into three stages: uploading samples ("Samples"), adding metadata ("Metadata"), and reviewing the information ("Review"). The current stage is highlighted in blue.
Step 2: Specify Project
Select or create a project in the Select Project section. Your choice of project determines the pipeline version used to run the samples, because the pipeline version for every analysis type is assigned when the project is created (see Pipeline Version for details). As a result, all samples within a project run on the same major pipeline version.
- Create New Project: Use this link to create a new project. A dialog box will appear to enter the new project information.
- Project Dropdown Menu: Use the dropdown menu to upload samples to an existing project.
When creating a new project, you will need to add a project name, select if the project will be public within CZ ID or private, and provide a project description. Click the Create Project button to finish creating the new project.
Step 3: Select Analysis Type
Under “Analysis Type”, select SARS-CoV-2 Consensus Genome and specify the sequencing platform. The pipeline supports Illumina and Nanopore data.
If you select Illumina as the sequencing platform, you will be prompted to select a wet-lab protocol. Picking the correct protocol is a critical step because the protocol dictates the primers that will be removed during genome assembly. The pipeline version that will be used to run the samples will be specified here.
- Pipeline Version: Specifies the pipeline version that will be used to run samples.
- Wet-lab Protocol Dropdown Menu: Select a protocol from the dropdown options.
If you are uploading Nanopore data, you will be prompted to specify the wet-lab protocol and Medaka model. The pipeline version that will be used to run the samples will be specified here.
- Pipeline Version: Specifies the pipeline version that will be used to run samples.
- Clear Labs Option: Select Yes or No (default) to specify whether the FASTQ files were provided by Clear Labs. Clear Labs files have already gone through some QC, including read filtering based on length and trimming.
- Wet-lab Protocol Dropdown Menu: Select a protocol from the dropdown options.
- Medaka Model Dropdown Menu: Select a model from the dropdown options. The default model r941_min_high_g360 can be used if you are unsure of which Medaka model to select.
Notes regarding Medaka model options:
- Medaka models are named to indicate i) the pore type, ii) the sequencing device (MinION or PromethION), iii) the basecaller variant, and iv) the basecaller version ({pore}_{device}_{caller variant}_{caller version}). For example, the model named r941_min_fast_g303 should be used with data from MinION (or GridION) R9.4.1 flowcells using the fast Guppy basecaller version 3.0.3 (see Medaka model GitHub page for details).
- If you used a version of the Guppy basecaller without a corresponding Medaka model, the Medaka model with the highest version equal to or less than the Guppy basecaller version should be selected.
- Use the flow chart below to choose the correct Medaka model (click here if you can't see the figure below).
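The naming scheme and the fallback rule described in the notes above can be sketched in a few lines of Python. This is only an illustration, not part of CZ ID or Medaka: the helper names `parse_model` and `pick_model` are hypothetical, the candidate list is an example rather than the actual dropdown contents, and the sketch assumes four-part model names in which each digit after the "g" is one component of the Guppy version (so g303 means 3.0.3).

```python
from typing import List, NamedTuple, Optional, Tuple


class MedakaModel(NamedTuple):
    pore: str                 # e.g. "r941" for R9.4.1 flowcells
    device: str               # e.g. "min" (MinION/GridION) or "prom" (PromethION)
    variant: str              # basecaller variant, e.g. "fast" or "high"
    version: Tuple[int, ...]  # basecaller version, e.g. (3, 0, 3)


def parse_model(name: str) -> MedakaModel:
    """Split {pore}_{device}_{caller variant}_{caller version} into its parts."""
    parts = name.split("_")
    if len(parts) != 4:
        raise ValueError(f"expected four underscore-separated parts: {name!r}")
    pore, device, variant, version = parts
    if not (version.startswith("g") and version[1:].isdigit()):
        raise ValueError(f"unexpected basecaller version field: {version!r}")
    # Assumption: one digit per version component (g303 -> 3.0.3).
    return MedakaModel(pore, device, variant, tuple(int(d) for d in version[1:]))


def pick_model(candidates: List[str], pore: str, device: str,
               variant: str, guppy_version: str) -> Optional[str]:
    """Return the matching model with the highest version <= guppy_version."""
    target = tuple(int(p) for p in guppy_version.split("."))
    best_name, best_version = None, None
    for name in candidates:
        model = parse_model(name)
        if (model.pore, model.device, model.variant) != (pore, device, variant):
            continue
        if model.version <= target and (best_version is None or model.version > best_version):
            best_name, best_version = name, model.version
    return best_name


# Illustrative candidate list; the real dropdown options may differ.
example_models = ["r941_min_high_g303", "r941_min_high_g330", "r941_min_high_g360"]
print(pick_model(example_models, "r941", "min", "high", "3.4.5"))
```

If no candidate has a version at or below the Guppy version, `pick_model` returns `None`. In that case the flow chart remains the reference for choosing a model.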
Pipeline Version
The pipeline version that will be used to run uploaded samples can be seen once you select a project and analysis type (i.e. SARS-CoV-2 Consensus Genome). CZ ID uses a three-level pipeline versioning system where the first number indicates the major pipeline version followed by numbers that specify minor version and patch updates. For example, pipeline v1.2.15 refers to major pipeline version 1, minor pipeline version 2, and patch version 15.
The project’s pipeline version is assigned automatically when the project is created, based on the latest version available for each analysis type. Pinning the pipeline version by project helps ensure that all sample runs within a project are comparable. For example, if your project is pinned to SARS-CoV-2 Consensus Genome pipeline v3.4.18, all new samples uploaded to that project will run on major SARS-CoV-2 Consensus Genome pipeline v3. Minor and patch updates can still be applied within that major version while keeping your results comparable.
You will see a Warning Icon if a new major pipeline version is available. To use the new pipeline, you must create a new project.
*Note: Projects created before February 08, 2024 may include multiple major pipeline versions.
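To make the three-level versioning concrete, here is a minimal Python sketch that splits a version string into its major, minor, and patch parts and checks whether two runs share a major version. The names `parse_version` and `same_major` are hypothetical helpers written for this illustration; they are not part of CZ ID.

```python
from typing import NamedTuple


class PipelineVersion(NamedTuple):
    major: int
    minor: int
    patch: int


def parse_version(text: str) -> PipelineVersion:
    """Parse strings such as 'v1.2.15' or '1.2.15' into (major, minor, patch)."""
    parts = text.strip().lstrip("vV").split(".")
    if len(parts) != 3:
        raise ValueError(f"expected major.minor.patch, got {text!r}")
    return PipelineVersion(*(int(p) for p in parts))


def same_major(a: str, b: str) -> bool:
    """Runs are treated as comparable here when their major versions match."""
    return parse_version(a).major == parse_version(b).major


print(parse_version("v1.2.15"))
print(same_major("v3.4.18", "v3.5.0"))
```

Under this reading, v3.4.18 and v3.5.0 share major version 3, while v3.4.18 and v4.0.0 do not.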
Step 4: Select Sequencing Files
After specifying the analysis type, scroll down to "Select Files" to upload FASTQ (“.fastq” or “.fq”) or compressed FASTQ (“.fastq.gz” or “.fq.gz”) files directly from your computer or BaseSpace account. Click here if you have FASTA files.
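If you want to check a batch of filenames against the accepted extensions before uploading, a small Python sketch like the one below may help. The function names are hypothetical, and the case-insensitive matching is an assumption; the source does not say how CZ ID treats upper-case extensions.

```python
from typing import List, Tuple

# Extensions listed in this step.
ACCEPTED_EXTENSIONS = (".fastq", ".fq", ".fastq.gz", ".fq.gz")


def is_accepted_fastq(filename: str) -> bool:
    """Return True if the filename ends with one of the listed extensions."""
    # Assumption: compare case-insensitively.
    return filename.lower().endswith(ACCEPTED_EXTENSIONS)


def split_by_extension(filenames: List[str]) -> Tuple[List[str], List[str]]:
    """Separate filenames into (accepted, rejected) lists."""
    accepted = [f for f in filenames if is_accepted_fastq(f)]
    rejected = [f for f in filenames if not is_accepted_fastq(f)]
    return accepted, rejected


print(split_by_extension(["sample_R1.fastq.gz", "sample_R2.fq", "genome.fasta"]))
```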
Upload Files from Your Computer
Select the Your Computer tab to upload files directly from your computer.
- Your Computer Tab: Use this tab (default) to select sequencing files stored on your computer.
- Upload Box: Drag and drop files into the provided box or click the link to use your file browser.
- Sample List: Sequencing files ready for upload will be listed here. Sample names will be based on the sequence filenames (see file requirements).
- Continue Button: After selecting files, use this button to continue to the Upload Metadata section.
If you have sequencing files split over multiple lanes per sample, CZ ID will automatically detect files representing the same sample based on Illumina's naming convention and concatenate these files for you. For example, if you upload one paired-end sample split over three lanes, that sample will have six files. In the screenshot below you can see that CZ ID automatically detects that each file is part of the same sample.
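The grouping described above can be approximated locally with a short Python sketch, which is useful for checking how your own files would be matched before uploading. It assumes the common Illumina pattern `{SampleName}_S{n}_L{lane}_R{read}_001.fastq.gz`; the regular expression and the `group_by_sample` helper are hypothetical and are not CZ ID's actual detection logic.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Assumed Illumina-style filename: SampleName_S1_L001_R1_001.fastq.gz
ILLUMINA_PATTERN = re.compile(
    r"^(?P<sample>.+)_S\d+_L(?P<lane>\d{3})_(?P<read>R[12])_001\.(?:fastq|fq)(?:\.gz)?$"
)


def group_by_sample(filenames: List[str]) -> Dict[str, Dict[str, List[str]]]:
    """Group files by sample name and read direction, ordered by lane number."""
    groups = defaultdict(lambda: defaultdict(list))
    for name in filenames:
        match = ILLUMINA_PATTERN.match(name)
        if match is None:
            continue  # filenames that do not follow the pattern are skipped
        groups[match["sample"]][match["read"]].append((int(match["lane"]), name))
    # Sort each read group by lane so files would be concatenated in lane order.
    return {
        sample: {read: [n for _, n in sorted(files)] for read, files in reads.items()}
        for sample, reads in groups.items()
    }


files = [
    "sampleA_S1_L002_R1_001.fastq.gz", "sampleA_S1_L001_R1_001.fastq.gz",
    "sampleA_S1_L003_R1_001.fastq.gz", "sampleA_S1_L001_R2_001.fastq.gz",
    "sampleA_S1_L002_R2_001.fastq.gz", "sampleA_S1_L003_R2_001.fastq.gz",
]
for sample, reads in group_by_sample(files).items():
    print(sample, {read: len(names) for read, names in reads.items()})
```

With the six example files, all of them are assigned to a single sample, with three lane files for each read direction.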
Upload Files from BaseSpace
If your Illumina sequencing files are hosted on BaseSpace, you can pull them directly into CZ ID. Select the BaseSpace tab under the Select Files section to access your files. Click the Connect to BaseSpace button to launch the BaseSpace login page.
Use your credentials to log in to BaseSpace and select files for upload.
Once you have selected and reviewed the files you want to process, click the Continue button at the bottom of the screen to continue to the next step (Upload Metadata).
Step 5: Add Metadata
Add the appropriate sample metadata through the Upload Metadata page. There are six required metadata fields: Host Organism, Sample Type, Water Control, Nucleotide Type, Collection Date, and Collection Location (see Adding Metadata for details). You can enter metadata manually or upload a metadata file in comma-delimited format (".csv" file extension).
Manual Metadata Entry
Use the “Manual Input” tab (default). Fill in metadata information using the provided fields directly through the web interface. After entering information for all the required fields, click the Continue button to go to the Review section.
- Manual Input Tab: Use this tab (default) when entering metadata directly through the web interface.
- Metadata Table: Enter information for each column or field. By default, the required fields will be listed on the table. You can add additional columns through the Metadata Dropdown Menu.
- Metadata Dropdown Menu: Click the plus sign to view and add optional metadata fields to the Metadata Table.
Upload Metadata File
Use the "CSV Upload" tab to upload a comma-separated values (CSV) file with metadata. If there are no errors, click the Continue button to go to the Review section.
- CSV Upload Tab: Select this tab to upload a metadata file.
- Metadata Template: Click the template link to download a CSV file that is already populated with sample names and metadata fields. Edit the file to include the appropriate metadata and save it. You are not required to use the provided template.
- Metadata File Upload Box: Use this box to upload the metadata file.
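Before uploading a metadata file, it can be worth checking locally that every required field is filled in. The Python sketch below does this with the standard `csv` module. It assumes the CSV column headers match the six required field names exactly and that the first column is called "Sample Name"; both are assumptions, so compare them against the downloaded template. `find_missing_fields` is a hypothetical helper, not a CZ ID tool.

```python
import csv
import io
from typing import List, Tuple

# The six required fields named in this step; header spelling is assumed.
REQUIRED_FIELDS = [
    "Host Organism", "Sample Type", "Water Control",
    "Nucleotide Type", "Collection Date", "Collection Location",
]


def find_missing_fields(csv_text: str) -> List[Tuple[int, str]]:
    """Return (line_number, field) pairs for empty or absent required values."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # The header is line 1, so the first data row is line 2.
    for line_number, row in enumerate(reader, start=2):
        for field in REQUIRED_FIELDS:
            if not (row.get(field) or "").strip():
                problems.append((line_number, field))
    return problems


# Example values below are made up for illustration.
example_csv = "\n".join([
    "Sample Name," + ",".join(REQUIRED_FIELDS),
    'sample_1,Human,Nasopharyngeal Swab,No,RNA,2021-03,"California, USA"',
    'sample_2,Human,Nasopharyngeal Swab,No,RNA,,"California, USA"',
])
print(find_missing_fields(example_csv))
```

In the example, the second sample has no collection date, so the check reports that field for line 3 of the file.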
Step 6: Review
Use the Review page to review the project, sample, and analysis information. The "Edit" links by each section can be used to edit project and sample information if you need to correct anything before upload. After reviewing sample and metadata information, please accept CZ ID's Terms of Service and Data Privacy Policy. Press Start Upload to begin the upload process to our server and kick off the analysis pipeline.
After pressing Start Upload, you will see a modal showing the upload progress. DO NOT close the web page while the upload is in progress. Otherwise, the upload will be canceled and you will have to start your upload over.
Wait until you see the "Uploads completed!" message confirming that your samples have been uploaded successfully. Once you see this message, you can close your window or press "Go to Project" to navigate to the Project page, where you can view sample status and analyze results.
Step 7: View Genome Status and Report
You can see the status of your run by going to the Project Page of interest and selecting the Consensus Genome tab.
- Project Name
- Consensus Genome Tab
- Sample Status: Specifies sample progress. When the run is successfully completed, you will see a "Complete" status highlighted in green.
After the sample run has completed, click on the sample to view the genome report (see example report below). Assess the quality of the genome and/or download data.