Overview
You can upload data to CZ ID for consensus genome assembly through the CZ ID command line interface (CLI). After uploading data, consensus genomes will be automatically assembled against a user-provided reference sequence. You can then view and download the assembled genomes through the CZ ID web application. Note that you will have to complete your account profile the first time you log in to the web application.
Here we describe how to upload samples to CZ ID using a generalized CLI workflow to assemble viral consensus genomes. If you are interested in assembling SARS-CoV-2 consensus genomes, please see instructions for the SARS-CoV-2 specific workflow using the CLI. Below we list steps for assembling consensus genomes for any virus of interest using Mac and Windows operating systems (OS).
After reading this guide, you will be able to:
- Install the CZ ID CLI on your computer
- Set up a connection with your CZ ID account
- Upload data to CZ ID for consensus genome assembly using the CLI
- View consensus genomes in the CZ ID web application
Why Use the CLI?
Although uploading samples for viral consensus genome assembly through the CZ ID web application is straightforward, the CLI offers some advantages over the web interface. Uploading samples through the CLI may be faster than the web upload in some cases. The CLI also enables you to upload samples directly from systems with no user interface (e.g., remote servers) and may allow you to incorporate sample upload to CZ ID into automated workflows using other tools.
Install the CZ ID CLI and upload samples to CZ ID using macOS
Below we provide instructions to install CZ ID CLI on your Mac or Linux system and establish a connection with your CZ ID account. Note that you only need to perform these steps once. After setting up your connection, you only need to log in to CZ ID to work with samples through the CLI. We also describe how to upload files to CZ ID through the CLI for consensus genome assembly. The instructions are divided into five general steps:
Step 1: Install the CZ ID CLI
Step 2: Set up an initial connection with your CZ ID account
Step 3: Get your files ready
Step 4: Prepare your sample upload code (includes code templates)
Step 5: Upload samples to CZ ID for consensus genome assembly
Step 1: Install the CZ ID CLI
You can easily install the latest release for the CZ ID CLI using Homebrew. To do this:
- Download and install Homebrew on your computer by following steps 2 through 4. If you already have Homebrew on your computer, skip to step 5.
- Open your Terminal.
- Go to Homebrew and copy the installation command on the web page.
- Paste the command into your Terminal and continue with the installation by following the prompts. Make sure to run the last two commands listed in the instructions to add Homebrew to your PATH environment variables.
- After installing Homebrew, add the “chanzuckerberg tap” by typing the following command into your Terminal:
brew tap chanzuckerberg/tap
If everything is going well, you should see a “Tapping chanzuckerberg/tap” message.
- After adding the tap, install the CZ ID CLI package by typing the following into your Terminal:
brew install czid-cli
If everything is going well, you should see messages regarding the progress of package download and installation.
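To confirm that the installation worked, you can check that the czid command is visible on your PATH. The snippet below is a minimal sketch; the helper name check_on_path is our own for illustration and is not part of the CZ ID CLI.

```shell
# Report whether a command is available on the PATH.
# Usage: check_on_path COMMAND
# (check_on_path is a hypothetical helper, not part of the CZ ID CLI.)
check_on_path() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at: $(command -v "$1")"
    else
        echo "$1 not found on PATH" >&2
        return 1
    fi
}

# After installation, "czid" is the command to look for.
check_on_path czid || echo "Try opening a new Terminal window, then check again."
```

If the command is not found, the most likely cause is that Homebrew was not added to your PATH during its installation.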
MacOS Terminal highlighting CZ ID CLI installation commands using Homebrew (red arrows)
Step 2: Set up an initial connection with your CZ ID account from your device
Use your credentials to log in to CZ ID via CLI. To do this:
- Open your Terminal
- Type the following command:
czid login
- You will be provided a user code and directed to the web to log in to CZ ID with your username and password.
After typing "czid login" on your Terminal, you will be directed to the web. Here we overlaid web prompts over the CLI for visualization purposes. Look at your user code on the CLI and make sure it matches the one on the web page to confirm your device. A new page will appear where you will log in using your credentials.
Click “Accept” to authorize access to your CZ ID account. You will then see a message indicating that you are all set.
- Go back to the Terminal and accept the user agreement by entering the following:
czid accept-user-agreement
Note that you will not be prompted to accept the agreement automatically; simply type the command above. After you enter the command, the user agreement will be printed and you will be prompted to accept it.
Only after typing the "czid accept-user-agreement" command will you be able to see the user agreement and the prompt to accept the terms.
- You are all set to use the CZ ID CLI! Next time you need to use the CLI, simply log in to CZ ID and confirm your device.
After setting up your connection with CZ ID, you only need to log in and confirm your device before uploading samples to the platform. Here we overlaid web prompts over the CLI for visualization purposes. Look at your user code on the CLI and make sure it matches the one on the web page to confirm your device.
Step 3: Get your files ready
To upload files to CZ ID for genome assembly, you need to have your project information and files ready. Make sure that all your files are in the same directory/folder.
You will need the following files and information for your upload command (Step 4):
Project name: Uploaded samples will be organized under a project.
- Reference an existing project by using the project name of interest while uploading samples through the CLI.
- If you would like to create a new project, you have to create it within your account using the CZ ID web interface first and use the new project name while uploading samples through the CLI. See Project Selection within Upload Data through the Web App for details.
Sample name: If you are uploading only one sample, you should specify the sample name. Note that the same sample name should be included in your metadata file (see below).
Metadata file: Sample information should be provided in a comma-delimited file (“csv” file extension). See Metadata instructions and dictionary for details regarding metadata requirements and format.
- If you download metadata for samples on your CZ ID account, the metadata file will be already in the correct format.
- If you need to prepare a metadata file, we recommend using our Metadata template to generate your file. Not all metadata in the template is required. If you don’t have information for a given metadata field, simply leave it blank. Save your edited file as a comma-delimited file (“csv” file extension).
- Note that there are seven required metadata entries for samples, including:
- Sample Name
- Collection Location
- If possible, provide information specifying more than the country. However, don’t provide more than county-level information to protect personally identifiable information.
- Collection Date
- Nucleotide Type
- Sample Type
- Water Control
- Host Organism
- When uploading one sample at a time to CZ ID, make sure that the Sample Name in the metadata file matches the sample name provided in the upload command (see Step 4 regarding upload commands below).
Example metadata file for uploading a single sample.
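Before uploading, it can help to confirm that your metadata file contains all seven required fields. The function below is a minimal sketch of such a check. It assumes that the first row of the CSV is a plain, unquoted header that uses exactly the field names listed above; the function name check_metadata_header is our own and is not part of the CZ ID CLI.

```shell
# Check that the header row of a metadata CSV names every required column.
# Usage: check_metadata_header Your_metadata_file.csv
# Assumption: the header is unquoted and uses the field names from this guide.
check_metadata_header() {
    file=$1
    # Read the first line and strip a Windows carriage return if present.
    header=$(head -n 1 "$file" | tr -d '\r')
    missing=0
    for column in "Sample Name" "Collection Location" "Collection Date" \
                  "Nucleotide Type" "Sample Type" "Water Control" "Host Organism"; do
        # Wrap both sides in commas so that only whole column names match.
        case ",$header," in
            *",$column,"*) ;;
            *) echo "Missing required column: $column"; missing=1 ;;
        esac
    done
    return "$missing"
}
```

The function prints one line per missing column and returns a non-zero status if any are missing. It checks only the header row, not the values in each sample row.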
Sequencing platform: Specify “Illumina” as the sequencing platform given that, currently, this pipeline only supports assembly of Illumina reads. However, our team is working to extend the pipeline to include analysis of Nanopore reads. Stay tuned for updates!
Reference sequence file: Specify a reference sequence using a FASTA file or an accession ID.
Primer file (optional): Specify the file containing information about primer positions in BED format (“.bed” file extension).
Read files: Specify the file or files containing sequence reads.
- CZ ID supports the following file types: .fastq/.fq/.fastq.gz/.fq.gz.
- You can specify a single file for single-end reads or two files for paired-end reads.
- If you are uploading more than one sample at a time, you can specify the path to a directory containing read files. The CZ ID CLI will search the directory for read files and automatically upload supported file types (.fastq/.fq/.fastq.gz/.fq.gz). Sample names will be assigned using file names.
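If you plan to upload a whole directory, you may want to preview which files have a supported extension first. The function below is a rough sketch for that purpose. It is not how the CLI itself searches the directory, and it looks only at the top level of the folder; the name list_read_files is our own.

```shell
# List files in a directory whose names end in a supported read-file extension.
# Usage: list_read_files Path_to_samples_directory
# (list_read_files is a hypothetical helper, not part of the CZ ID CLI.)
list_read_files() {
    find "$1" -maxdepth 1 -type f \
        \( -name '*.fastq' -o -name '*.fq' -o -name '*.fastq.gz' -o -name '*.fq.gz' \) \
        | sort
}
```

Files with other extensions, such as the reference FASTA or the metadata CSV, are left out of the listing.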
Step 4: Prepare your sample upload code
Now that you have sample and file information ready, you can work on your code or command to upload sample files for genome assembly through the CLI. You will use this command in Step 5 (described below).
Write your upload command using a plain text editor. Below we provide code templates for uploading different types of files. You can copy the commands that suit your needs and edit accordingly using your text editor of choice. DO NOT USE MICROSOFT WORD or text editors that are not in plain text format because these programs will disrupt the required format and your code will not work.
You can use TextEdit, a built-in text editor on Mac OS, to work on your upload code. However, make sure to set the format to plain text before pasting the code template.
Upload code templates
Upload a sample with a reference sequence file in FASTA format:
czid consensus-genome upload-sample \
--project 'Your Project ID' \
--sample-name 'Your Sample Name' \
--metadata-csv 'Your_metadata_file.csv' \
--sequencing-platform 'Illumina' \
--reference-fasta 'Your_reference_sequence_file.fasta' \
--primer-bed 'Your_primer_file.bed' \
'Your_Sample_R1_file.fastq.gz' 'Your_Sample_R2_file.fastq.gz'
Upload multiple samples with a reference sequence file in FASTA format:
czid consensus-genome upload-sample \
--project 'Your Project ID' \
--metadata-csv 'Your_metadata_file.csv' \
--sequencing-platform 'Illumina' \
--reference-fasta 'Your_reference_sequence_file.fasta' \
--primer-bed 'Your_primer_file.bed' \
'Path_to_samples_directory'
Upload a sample with a reference sequence file in FASTA format (no primer BED file):
czid consensus-genome upload-sample \
--project 'Your Project ID' \
--sample-name 'Your Sample Name' \
--metadata-csv 'Your_metadata_file.csv' \
--sequencing-platform 'Illumina' \
--reference-fasta 'Your_reference_sequence_file.fasta' \
'Your_Sample_R1_file.fastq.gz' 'Your_Sample_R2_file.fastq.gz'
Upload multiple samples with a reference sequence file in FASTA format (no primer BED file):
czid consensus-genome upload-sample \
--project 'Your Project ID' \
--metadata-csv 'Your_metadata_file.csv' \
--sequencing-platform 'Illumina' \
--reference-fasta 'Your_reference_sequence_file.fasta' \
'Path_to_samples_directory'
Upload a sample with a reference sequence using an accession ID:
czid consensus-genome upload-sample \
--project 'Your Project ID' \
--sample-name 'Your Sample Name' \
--metadata-csv 'Your_metadata_file.csv' \
--sequencing-platform 'Illumina' \
--reference-accession 'Your_reference_sequence_accessionID' \
--primer-bed 'Your_primer_file.bed' \
'Your_Sample_R1_file.fastq.gz' 'Your_Sample_R2_file.fastq.gz'
Upload multiple samples with a reference sequence using an accession ID:
czid consensus-genome upload-sample \
--project 'Your Project ID' \
--metadata-csv 'Your_metadata_file.csv' \
--sequencing-platform 'Illumina' \
--reference-accession 'Your_reference_sequence_accessionID' \
--primer-bed 'Your_primer_file.bed' \
'Path_to_samples_directory'
Step 5: Upload your files to CZ ID for consensus genome assembly
Now that your upload code is ready, you can upload sample files to CZ ID for consensus genome assembly and view assembled genomes in the web application. To do this:
- Log in to your CZ ID CLI account by opening your Terminal and typing the following command:
czid login
- Set your directory to the folder containing sample files:
cd Path_to_directory
- Copy and paste the upload code you edited in Step 4 into your Terminal and press enter.
Example command for uploading a single paired-end sample
- Consensus genomes will be automatically assembled after uploading your files.
- Go to the CZ ID web interface and log in to your account (note that you will have to complete your account profile the first time you log in to the web application).
- Go to the Consensus Genomes tab within the Project page of interest to check on the status of your consensus genomes. Genome assembly may take a few minutes. The reference accession may not be listed in the Samples table for samples uploaded through the CLI. Keep an eye on the status of your samples. If your sample has a “Complete” status, your consensus genome is ready. If the sample status indicates “Running”, consensus genome assembly is in progress. If the status reads “Created”, sample upload is in progress. However, if the “Created” status doesn’t change to “Running” after a while, there might have been an error during file upload. Eventually, the platform will show a “FAILED” status for samples that had errors while uploading.
- Once “Complete”, see consensus genome details by clicking on a genome of interest. This is important to assess the quality of the genome. The taxon name may not be listed in the genome report for samples uploaded through the CLI.
Sample list (top panel) and consensus genome assembly details for the selected sample (bottom panel)
Install the CZ ID CLI and upload samples to CZ ID using Windows OS
Below we provide instructions to install CZ ID CLI on your Windows device and establish a connection with your CZ ID account. Note that you only need to perform these steps once. After setting up your connection, you only need to log in to CZ ID to work with samples through the CLI. We also describe how to upload files to CZ ID through the CLI for consensus genome assembly. The instructions are divided into five general steps:
Step 1: Install the CZ ID CLI
Step 2: Set up an initial connection with your CZ ID account
Step 3: Get your files ready
Step 4: Prepare your sample upload code (includes code templates)
Step 5: Upload samples to CZ ID for consensus genome assembly
Step 1: Install the CZ ID CLI
To install the CZ ID CLI on your Windows device you need to download the CZ ID CLI executable and run it on your computer. To do this:
- Find the latest CLI release on the CZ ID CLI GitHub page and download the compressed file named "czid-cli_windows_amd64.zip".
- Decompress or unzip the downloaded "czid-cli_windows_amd64.zip" file.
- Move the czid executable ("czid.exe") to your desired directory and copy the path to your clipboard. You can copy the path by right clicking on the file and selecting “Copy as path”.
- Add the "czid.exe" path to your environment variables by following steps 5 and 6.
- Search for “environment variables” using the File explorer and select “Edit the system environment variables”.
- A "System Properties" dialog box will appear where you can add the new path. You need to:
a) Select “Environment Variables…” under the “Advanced” tab;
b) Select or highlight “Path” under the options for “System variables” and click "Edit";
c) Click on “New” under the Edit environment variable dialog box;
d) Paste the path to the "czid.exe" file that you copied in step 3. Note that the file name should not be included in the path and you need to delete the quotation marks. Example:
C:\Users\UserX\Documents\CZID-CLI\czid-cli_windows_amd64\
e) Click “OK” on all the dialog boxes.
Step 2: Set up an initial connection with your CZ ID account from your device
Use your credentials to log in to CZ ID via CLI. To do this:
- Open a Command Prompt window
To open a Command Prompt window, search for “Command” using the File Explorer and click on the Command Prompt App.
- Type the following command:
czid login
- You will be provided a user code and directed to the web to log in to CZ ID with your username and password.
After typing "czid login" on the Command Prompt, you will be directed to the web. Here we overlaid web prompts over the CLI for visualization purposes. Look at your user code on the CLI and make sure it matches the one on the web page to confirm your device. A new page will appear where you will log in using your credentials. You will be ready to go once you enter your login information.
- Go back to the Command Prompt and accept the user agreement by entering the following:
czid accept-user-agreement
Note that you will not be prompted to accept the agreement automatically; simply type the command above. After you enter the command, the user agreement will be printed and you will be prompted to accept it.
Only after typing the "czid accept-user-agreement" command will you be able to see the user agreement and the prompt to accept the terms.
- You are all set to use the CZ ID CLI! Next time you need to use the CLI, simply log in to CZ ID and confirm your device.
After setting up your connection with CZ ID, you only need to log in and confirm your device before uploading samples to the platform. Here we overlaid web prompts over the CLI for visualization purposes. Look at your user code on the CLI and make sure it matches the one on the web page to confirm your device.
Step 3: Get your files ready
To upload files to CZ ID for genome assembly, you need to have your project information and files ready. Make sure that all your files are in the same directory/folder.
You will need the following files and information for your upload command:
Project name: Uploaded samples will be organized under a project.
- Reference an existing project by using the project name of interest while uploading samples through the CLI.
- If you would like to create a new project, you have to create it within your account using the CZ ID web interface first and use the new project name while uploading samples through the CLI. See Project Selection within Upload Data through the Web App for details.
Sample name: If you are uploading only one sample, you should specify the sample name. Note that the same sample name should be included in your metadata file (see below).
Metadata file: Sample information should be provided in a comma-delimited file (“csv” file extension). See Metadata instructions and dictionary for details regarding metadata requirements and format.
- If you download metadata for samples on your CZ ID account, the metadata file will be already in the correct format.
- If you need to prepare a metadata file, we recommend using our Metadata template to generate your file. Not all metadata in the template is required. If you don’t have information for a given metadata field, simply leave it blank. Save your edited file as a comma-delimited file (“csv” file extension).
- Note that there are seven required metadata entries for samples, including:
- Sample Name
- Collection Location
- If possible, provide information specifying more than the country. However, don’t provide more than county-level information to protect personally identifiable information.
- Collection Date
- Nucleotide Type
- Sample Type
- Water Control
- Host Organism
- When uploading one sample at a time to CZ ID, make sure that the Sample Name in the metadata file matches the sample name provided in the upload command (see Step 4 regarding upload commands below).
Example metadata file for uploading a single sample.
Sequencing platform: Specify “Illumina” as the sequencing platform given that, currently, this pipeline only supports assembly of Illumina reads. However, our team is working to extend the pipeline to include analysis of Nanopore reads. Stay tuned for updates!
Reference sequence file: Specify a reference sequence using a FASTA file or an accession ID.
Primer file (optional): Specify the file containing information about primer positions in BED format (“.bed” file extension).
Read files: Specify the file or files containing sequence reads.
- CZ ID supports the following file types: .fastq/.fq/.fastq.gz/.fq.gz.
- You can specify a single file for single-end reads or two files for paired-end reads.
- If you are uploading more than one sample at a time, you can specify the path to a directory containing read files. The CZ ID CLI will search the directory for read files and automatically upload supported file types (.fastq/.fq/.fastq.gz/.fq.gz). Sample names will be assigned using file names.
Step 4: Prepare your sample upload code
Now that you have sample and file information ready, you can work on your code or command to upload sample files for genome assembly through the CLI. You will use this command in Step 5 (described below).
Write your upload command using a plain text editor, such as Notepad. Below we provide code templates for uploading different types of files. You can copy the commands that suit your needs and edit accordingly using your text editor of choice. DO NOT USE MICROSOFT WORD or text editors that are not in plain text format because these programs will disrupt the required format and your code will not work.
You can use Notepad to edit your upload code. Make sure there are no hard line breaks (presses of the Enter key) between command line arguments.
Upload code templates
Each template below is a single long line of code, so make sure to copy all of the text. You can double-click on the template code to highlight all the text and then copy it. Drag your cursor to the right to scroll towards the end of the code line. Make sure everything is highlighted within the code block before copying the text to your clipboard.
Upload a sample with a reference sequence file in FASTA format:
czid consensus-genome upload-sample "Your_Sample_R1_file.fastq.gz" "Your_Sample_R2_file.fastq.gz" --project "Your Project ID" --sample-name "Your Sample Name" --metadata-csv "Your_metadata_file.csv" --sequencing-platform "Illumina" --reference-fasta "Your_reference_sequence_file.fasta" --primer-bed "Your_primer_file.bed"
Upload multiple samples with a reference sequence file in FASTA format:
czid consensus-genome upload-sample "Path_to_your_sample_directory" --project "Your Project ID" --metadata-csv "Your_metadata_file.csv" --sequencing-platform "Illumina" --reference-fasta "Your_reference_sequence_file.fasta" --primer-bed "Your_primer_file.bed"
Upload a sample with a reference sequence file in FASTA format (no primer BED file):
czid consensus-genome upload-sample "Your_Sample_R1_file.fastq.gz" "Your_Sample_R2_file.fastq.gz" --project "Your Project ID" --sample-name "Your Sample Name" --metadata-csv "Your_metadata_file.csv" --sequencing-platform "Illumina" --reference-fasta "Your_reference_sequence_file.fasta"
Upload multiple samples with a reference sequence file in FASTA format (no primer BED file):
czid consensus-genome upload-sample "Path_to_your_sample_directory" --project "Your Project ID" --metadata-csv "Your_metadata_file.csv" --sequencing-platform "Illumina" --reference-fasta "Your_reference_sequence_file.fasta"
Upload a sample with a reference sequence using an accession ID:
czid consensus-genome upload-sample "Your_Sample_R1_file.fastq.gz" "Your_Sample_R2_file.fastq.gz" --project "Your Project ID" --sample-name "Your Sample Name" --metadata-csv "Your_metadata_file.csv" --sequencing-platform "Illumina" --reference-accession "Your_reference_sequence_accessionID" --primer-bed "Your_primer_file.bed"
Upload multiple samples with a reference sequence using an accession ID:
czid consensus-genome upload-sample "Path_to_your_sample_directory" --project "Your Project ID" --metadata-csv "Your_metadata_file.csv" --sequencing-platform "Illumina" --reference-accession "Your_reference_sequence_accessionID" --primer-bed "Your_primer_file.bed"
Step 5: Upload your files to CZ ID for consensus genome assembly
Now that your upload code is ready, you can upload sample files to CZ ID for consensus genome assembly and view assembled genomes in the web application. To do this:
- Log in to your CZ ID CLI account by opening your Command Prompt and typing the following command:
czid login
- Set your directory to the folder containing sample files:
cd Path_to_directory
- Copy and paste the upload code you edited in Step 4 into your Command Prompt and press enter.
Example command for uploading a single paired-end sample
- Consensus genomes will be automatically assembled after uploading your files.
- Go to the CZ ID web interface and log in to your account (note that you will have to complete your account profile the first time you log in to the web application).
- Go to the Consensus Genomes tab within the Project page of interest to check on the status of your consensus genomes. Genome assembly may take a few minutes. The reference accession may not be listed in the Samples table for samples uploaded through the CLI. Keep an eye on the status of your samples. If your sample has a “Complete” status, your consensus genome is ready. If the sample status indicates “Running”, consensus genome assembly is in progress. If the status reads “Created”, sample upload is in progress. However, if the “Created” status doesn’t change to “Running” after a while, there might have been an error during file upload. Eventually, the platform will show a “FAILED” status for samples that had errors while uploading.
- Once “Complete”, see consensus genome details by clicking on a genome of interest. This is important to assess the quality of the genome. The taxon name may not be listed in the genome report for samples uploaded through the CLI.
Sample list (top panel) and consensus genome assembly details for the selected sample (bottom panel)
Troubleshooting tips
- If you get an error message indicating “No such file or directory” after executing the upload command, make sure that:
- The spelling in your code matches the specified file names
- The files are found in the correct directory
- There are no unexpected characters, such as curly quotation marks, between command line arguments (make sure you edit the code using a plain text editor)
- If you are having problems and getting unexpected error messages, some of which may refer to the CZ ID CLI GitHub page, make sure that you have the latest release of the CZ ID CLI. Check for release updates on the CZ ID CLI GitHub page.
- Find more details about the CLI on the CZ ID CLI GitHub page.
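On Mac or Linux, the first troubleshooting checks above can be partly automated before you run the upload command. The two functions below are a minimal sketch, assuming you saved your upload command in a plain text file; the function names are our own and are not part of the CZ ID CLI.

```shell
# Print every argument that is not an existing file or directory.
# Usage: report_missing_paths Your_metadata_file.csv Your_Sample_R1_file.fastq.gz
report_missing_paths() {
    missing=0
    for item in "$@"; do
        if [ ! -e "$item" ]; then
            echo "Not found: $item"
            missing=1
        fi
    done
    return "$missing"
}

# Count the bytes outside the plain ASCII range in a saved command file.
# Curly quotes pasted from a word processor are a common source of such bytes.
# Usage: count_non_ascii_bytes my_upload_command.txt
count_non_ascii_bytes() {
    LC_ALL=C tr -d '\000-\177' < "$1" | wc -c | tr -d ' '
}
```

Run report_missing_paths from the folder that contains your sample files, passing the same file names that appear in your upload command. A result other than 0 from count_non_ascii_bytes suggests that the saved command contains characters that a plain text editor would not have introduced.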