Overview
You can upload metagenomic (mNGS) data to CZ ID through the command line interface (CLI). After uploading data, samples will run through the mNGS Pipeline. You can then view identified taxa and perform downstream analysis through the Sample Report found in the CZ ID web application. Note that you will have to complete your account profile the first time you log in to the web application.
The CLI feature is available for short- and long-read data obtained with Illumina and Nanopore sequencers, respectively. Although uploading samples through the CZ ID web application is straightforward for Illumina and Nanopore data, the CLI offers some advantages over the web interface. Uploading samples through the CLI enables you to upload samples directly from systems with no user interface (e.g., remote servers) and may allow you to incorporate sample upload to CZ ID into automated workflows using other tools. Additionally, the CLI may be faster than the web upload in some cases.
Here we describe how to upload samples to CZ ID for mNGS analysis and provide troubleshooting tips. We list steps for uploading samples using Mac and Windows operating systems (OS). After reading this guide, you will be able to:
- Install the CZ ID CLI on your computer
- Set up a connection with your CZ ID account
- Upload short- or long-read data to CZ ID for mNGS analysis using the CLI
- View sample reports in the CZ ID web application
Install the CZ ID CLI and Upload Samples to CZ ID Using macOS
Below we provide instructions to install CZ ID CLI on your Mac or Linux system and establish a connection with your CZ ID account. Note that you only need to perform these steps once. After setting up your connection, you only need to log in to CZ ID to work with samples through the CLI. We also describe how to upload files to CZ ID through the CLI for mNGS analysis. The instructions are divided into five general steps:
Step 1: Install the CZ ID CLI
Step 2: Set up an initial connection with your CZ ID account
Step 3: Get files ready
Step 4: Prepare upload command (includes templates)
Step 5: Upload samples to CZ ID for mNGS analysis
Step 1: Install the CZ ID CLI
You can easily install the latest release for the CZ ID CLI using Homebrew.
To install the CZ ID CLI through Homebrew:
- Download and install Homebrew on your computer by following steps 2 through 4. If you already have Homebrew on your computer, go to step 5.
- Open your Terminal.
- Go to Homebrew and copy the installation command on the web page.
- Paste the command into your Terminal and continue with the installation by following the prompts. Make sure to run the last two commands listed in the instructions to add Homebrew to your PATH environment variable.
- After installing Homebrew, add the “chanzuckerberg tap” by typing the following command into your Terminal:
brew tap chanzuckerberg/tap
If everything is going well, you should see a “Tapping chanzuckerberg/tap” message.
- After adding the tap, install the CZ ID CLI package by typing the following into your Terminal:
brew install czid-cli
If everything is going well, you should see messages regarding the progress of package download and installation.
MacOS Terminal highlighting CZ ID CLI installation commands using Homebrew (red arrows)
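As a quick sanity check after installation, you can confirm that your shell is able to find the "czid" executable. The sketch below relies only on the standard shell builtin "command -v". The helper name "check_cli" is hypothetical and is not part of the CZ ID CLI.

```shell
#!/bin/sh
# check_cli is a hypothetical helper, not part of the CZ ID CLI.
# It reports whether a command name can be found on your PATH.
check_cli() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "not found: $1 (check that Homebrew is on your PATH)"
    return 1
  fi
}

# Report on the CZ ID CLI without stopping the script if it is missing.
check_cli czid || true
```

If the command is reported as not found, revisit the Homebrew PATH setup described above before continuing.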
Step 2: Set up Initial Connection with Your CZ ID Account
Use your credentials to log in to CZ ID via CLI. To do this:
- Open your Terminal
- Type the following command:
czid login
- You will be provided a user code and directed to the web to log in to CZ ID with your username and password.
After typing "czid login" on your Terminal, you will be directed to the web. Here we overlaid web prompts over the CLI for visualization purposes. Look at your user code on the CLI and make sure it matches the one on the web page to confirm your device. A new page will appear where you will log in using your credentials.
Click “Accept” to authorize access to your CZ ID account. You will then see a message indicating that you are all set.
- Go back to the Terminal and accept the user agreement by entering the following:
czid accept-user-agreement
Note that you will not be prompted automatically to accept the agreement; you must type the command above. After you enter the command, the user agreement will be printed and you will be prompted to accept it by typing "y" or "Y".
- You are all set to use the CZ ID CLI! Next time you need to use the CLI, simply log in to CZ ID and confirm your device.
After setting up your connection with CZ ID, you only need to log in and confirm your device before uploading samples to the platform. Here we overlaid web prompts over the CLI for visualization purposes. Look at your user code on the CLI and make sure it matches the one on the web page to confirm your device.
Step 3: Get Files Ready
To upload files, you need to have your project information and files ready. This information will be specified in your upload command (see step 4). Make sure that all your files are in the same directory or folder.
In your upload command, you will specify project name, sample name, and filenames for metadata and read files. See details below.
Project name: Uploaded samples will be organized under a project.
-
- Uploading to an existing project: Reference the project by using the project name of interest in your upload command while uploading samples through the CLI.
- Uploading to a new project: If you would like to create a new project, you have to create it within your account using the CZ ID web interface first and use the new project name while uploading samples through the CLI. See Project Selection within Upload Data through the Web App for details.
Sample name (optional): You can include a flag for sample name in your upload command.
- Uploading a single sample: If you are uploading only one sample, you can specify the sample name in your upload code. This is necessary if there are multiple sequencing files in the same folder.
- Uploading samples in bulk: If you are uploading samples in bulk, you should compile all the sequencing files to be uploaded in the same folder. You do not need to provide a sample name flag in the upload code for uploading a batch of samples.
- Note that for single-sample and bulk uploads, the sample names should be included in your metadata file and match the filenames in the specified folder, except for the part of the filename indicating the sequenced end for Illumina data (i.e., R1 or R2) and the file extension. For example, for filename “Sample1_S001_R1.fastq” you only need to specify “Sample1_S001” in your metadata file (see below).
Metadata file: Sample information should be provided in a comma-delimited file (“csv” file extension). See Metadata instructions and dictionary for details regarding metadata requirements and format.
-
- If you download metadata for samples on your CZ ID account, the metadata file will be already in the correct format.
- If you need to prepare a metadata file, we recommend using our Metadata template to generate your file. Not all metadata in the template is required. If you don’t have information for a given metadata field, simply leave it blank. Save your edited file as a comma-delimited file (“csv” file extension).
- Note that there are seven required metadata entries for samples:
- Sample Name
- Collection Location
- If possible, provide information specifying more than the country. However, don’t provide more than county-level information to protect personally identifiable information.
- Collection Date
- Nucleotide Type
- Sample Type
- Water Control
- Host Organism
- When uploading one sample at a time to CZ ID, make sure that the Sample Name in the metadata file matches the sample name provided in the upload command (see step 4 below).
Example metadata file for uploading a single sample. If uploading multiple samples, make sure sample names provided in the metadata file match filenames in the folder containing sequencing files.
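Before uploading, it can help to confirm that the header row of your metadata file contains the seven required entries. The following is a minimal sketch rather than an official tool: the helper name "missing_metadata_columns" is hypothetical, and it assumes the column headers are spelled exactly as in the list above and contain no quoted commas.

```shell
#!/bin/sh
# missing_metadata_columns is a hypothetical helper, not part of the CZ ID CLI.
# It prints each required column name that is absent from the CSV header row.
# Assumption: header names are spelled exactly as in the required-entry list
# and contain no quoted commas.
missing_metadata_columns() {
  # Read the first line and drop a Windows carriage return, if any.
  header=$(head -n 1 "$1" | tr -d '\r')
  for col in "Sample Name" "Collection Location" "Collection Date" \
             "Nucleotide Type" "Sample Type" "Water Control" "Host Organism"; do
    case ",$header," in
      *",$col,"*) ;;          # column present
      *) echo "$col" ;;       # column missing
    esac
  done
}

# Example: missing_metadata_columns 'Your_metadata_file.csv'
```

No output means that all seven column names were found. Note that a file saved with a byte-order mark would cause the first column to be reported as missing.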
Sequencing platform: Specify “Illumina” or "Nanopore" as the sequencing platform.
Guppy basecaller (Nanopore only): If you are analyzing Nanopore data, specify which basecalling model was used to generate the data. Options include: "hac", "super", or "fast". Click here for more details.
Read files: You will need to specify sequencing files by providing filenames (single sample upload) or the path to the directory containing sequencing files (bulk upload). Note the following regarding sequencing files:
-
- CZ ID mNGS pipelines support FASTQ formats, including: .fastq/.fq/.fastq.gz/.fq.gz
- The mNGS Illumina pipeline also supports FASTA format, including: .fasta/.fa/.fasta.gz/.fa.gz.
- File names must be no longer than 120 characters and can only contain letters from the English alphabet (A-Z, upper and lower case), numbers (0-9), periods (.), hyphens (-) and underscores (_). Spaces are not allowed.
- When uploading a single short-read sample (Illumina), you can specify a single file for single-end reads or two files for paired-end reads.
- For Nanopore data, you can only upload one sequencing file per sample. Therefore, if there are multiple FASTQ files associated with a given sample, make sure to concatenate into a single file prior to upload. This contrasts with the mNGS Illumina pipeline, where FASTQ files from multiple lanes associated with a single sample are automatically concatenated during upload.
- If you are uploading more than one sample at a time, you have to specify the path to a directory containing sequencing files.
- The CZ ID CLI will search the directory for read files and automatically upload supported file types (.fastq/.fq/.fasta/.fa/.fastq.gz/.fq.gz/.fasta.gz/.fa.gz).
- Sample names will be assigned using file names. Each sample name is the base name of the file with the file extension and the suffix specifying the sequenced end (e.g., _R1, _R2, _R1_001, and _R2_001) removed. For example, the sample name for file “Sample1_R1_001.fq” will be “Sample1”.
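The file name rules and the sample naming convention described above can be sketched in a few lines of shell. This is only an illustration of the rules as stated in this guide; the CLI's actual logic may differ. Both helper names ("valid_file_name" and "sample_name_from_file") are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the CZ ID CLI.

# valid_file_name succeeds if the name follows the rules above: at most
# 120 characters, using only letters, digits, periods, hyphens, underscores.
valid_file_name() {
  name=$1
  [ -n "$name" ] || return 1
  [ "${#name}" -le 120 ] || return 1
  # In the C locale, A-Z and a-z cover only unaccented English letters.
  if printf '%s' "$name" | LC_ALL=C grep -q '[^A-Za-z0-9._-]'; then
    return 1
  fi
}

# sample_name_from_file prints the sample name implied by a read file name:
# the file extension and any trailing sequenced-end suffix are removed.
sample_name_from_file() {
  base=$(basename "$1")
  base=${base%.gz}                      # optional compression suffix
  for ext in .fastq .fq .fasta .fa; do  # FASTQ/FASTA extension
    case $base in
      *"$ext") base=${base%"$ext"}; break ;;
    esac
  done
  for end in _R1_001 _R2_001 _R1 _R2; do  # sequenced-end suffix
    case $base in
      *"$end") base=${base%"$end"}; break ;;
    esac
  done
  printf '%s\n' "$base"
}

sample_name_from_file 'Sample1_R1_001.fq'   # prints Sample1
```

Running a check like this over a folder before a bulk upload makes it easier to spot file names that will not match the sample names in your metadata file.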
Step 4: Prepare Sample Upload Command
Now that you have sample and file information ready, you can prepare the command to upload sample files for mNGS analysis through the CLI. You will use this command in Step 5.
Write your upload command using a plain text editor. Below we provide code templates for uploading mNGS files. You can copy the commands that suit your needs and edit accordingly using your text editor of choice. DO NOT USE MICROSOFT WORD or text editors that are not in plain text format because these programs will disrupt the required format and your code will not work.
You can use TextEdit, a built-in text editor on Mac OS, to work on your upload code. However, make sure to set the format to plain text before pasting the code template.
Upload code templates for Mac
Illumina data: Upload a single sample by providing sample name and specifying sequencing files (e.g., paired-end data)
czid metagenomics upload-sample \
--project 'Your Project Name' \
--sample-name 'Your Sample Name' \
--metadata-csv 'Your_metadata_file.csv' \
--sequencing-platform 'Illumina' \
'Your_Sample_File_R1.fastq.gz' 'Your_Sample_File_R2.fastq.gz'
Illumina data: Upload multiple samples by specifying path to sequencing files
czid metagenomics upload-samples \
--project 'Your Project Name' \
--metadata-csv 'Your_metadata_file.csv' \
--sequencing-platform 'Illumina' \
'Path_to_samples_directory'
Nanopore data: Upload a single sample by providing sample name and specifying the sequencing file
czid metagenomics upload-sample \
--project 'Your Project Name' \
--sample-name 'Your Sample Name' \
--metadata-csv 'Your_metadata_file.csv' \
--sequencing-platform 'Nanopore' \
--guppy-basecaller-setting 'hac' \
'Your_Sample_File.fastq.gz'
Nanopore data: Upload multiple samples by specifying path to sequencing files
czid metagenomics upload-samples \
--project 'Your Project Name' \
--metadata-csv 'Your_metadata_file.csv' \
--sequencing-platform 'Nanopore' \
--guppy-basecaller-setting 'hac' \
'Path_to_samples_directory'
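Because only one sequencing file can be uploaded per Nanopore sample, you may need to combine several FASTQ files into one before running the Nanopore templates. The sketch below uses the standard "cat" command. The helper name "merge_fastq" and the file names in the example are hypothetical. It assumes all input files for a sample are in the same format (all plain FASTQ or all gzip-compressed); concatenated gzip files form a valid multi-member gzip file.

```shell
#!/bin/sh
# merge_fastq is a hypothetical helper, not part of the CZ ID CLI.
# Usage: merge_fastq OUTPUT INPUT [INPUT ...]
# Assumption: all inputs share one format, either all plain FASTQ or all
# gzip-compressed. Do not mix the two in a single merge.
merge_fastq() {
  if [ "$#" -lt 2 ]; then
    echo "usage: merge_fastq OUTPUT INPUT [INPUT ...]" >&2
    return 1
  fi
  out=$1
  shift
  # Refuse to overwrite an existing file by accident.
  if [ -e "$out" ]; then
    echo "merge_fastq: refusing to overwrite $out" >&2
    return 1
  fi
  cat "$@" > "$out"
}

# Example (hypothetical file names):
# merge_fastq Sample1.fastq.gz Sample1_part1.fastq.gz Sample1_part2.fastq.gz
```

Write the merged file to a different folder than the parts, or move the parts away afterwards, so that a bulk upload does not pick up both the parts and the merged file.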
Step 5: Upload Files to CZ ID for mNGS Analysis
Now that you have your sample information and upload command ready, you can upload sample files. To upload your data to CZ ID:
-
Log in to your CZ ID CLI account by opening your Terminal and typing the following command:
czid login
-
Set your directory to the folder containing sample files:
cd Path_to_directory
-
Copy and paste the upload code you edited in Step 4 into your Terminal and press enter.
Example command for uploading multiple samples (Nanopore).
- The mNGS pipeline will begin running automatically after uploading your files.
- Go to the CZ ID web interface and log in to your account (note that you will have to complete your account profile the first time you log in to the web application).
-
Go to the Project page of interest to check on the status of your sample.
Keep an eye on the status of your samples. A "Complete" status indicates that the mNGS pipeline run has successfully completed.
-
Once the mNGS run is "Complete", click on the sample to go to the Sample Report page. Click here to learn more about the Sample Report.
Sample list within project of interest (top panel) and Sample Report page for the selected sample (bottom panel)
Install the CZ ID CLI and Upload Samples to CZ ID Using Windows OS
Below we provide instructions to install CZ ID CLI on your Windows device and establish a connection with your CZ ID account. Note that you only need to perform these steps once. After setting up your connection, you only need to log in to CZ ID to work with samples through the CLI. We also describe how to upload files to CZ ID through the CLI for mNGS analysis. The instructions are divided into five general steps:
Step 1: Install the CZ ID CLI
Step 2: Set up an initial connection with your CZ ID account
Step 3: Get files ready
Step 4: Prepare upload command (includes templates)
Step 5: Upload samples to CZ ID for mNGS analysis
Step 1: Install the CZ ID CLI
To install the CZ ID CLI on your Windows device you need to download the CZ ID CLI executable and run it on your computer. To do this:
-
Find the latest CLI release on the CZ ID CLI GitHub page and download the compressed file named "czid-cli_windows_amd64.zip"
- Decompress or unzip the downloaded "czid-cli_windows_amd64.zip" file.
-
Move the czid executable ("czid.exe") to your desired directory and copy the path to your clipboard. You can copy the path by right clicking on the file and selecting “Copy as path”.
- Add the "czid.exe" path to your environment variables by following steps 5 and 6.
-
Search for “environment variables” using the Windows search bar and select “Edit the system environment variables”.
-
A "System Properties" dialog box will appear where you can add the new path.
You need to do the following through the System Properties dialog box:
a) Select “Environment Variables…” under the “Advanced” tab.
b) Select or highlight “Path” under the options for “System variables” and click "Edit".
c) Click on “New” under the Edit environment variable dialog box.
d) Paste the path to the "czid.exe" file that you copied in step 3. Delete the file name and the quotation marks so that only the folder path remains. Example: C:\Users\UserX\Documents\CZID-CLI\czid-cli_windows_amd64\
e) Click “OK” on all the dialog boxes.
Step 2: Set up an Initial Connection with Your CZ ID Account
Use your credentials to log in to CZ ID via CLI. To do this:
-
Open a Command Prompt window
To open a Command Prompt window, search for “Command” using the Windows search bar and click on the Command Prompt App.
-
Type the following command:
czid login
-
You will be provided a user code and directed to the web to log in to CZ ID with your username and password.
After typing "czid login" on the Command Prompt, you will be directed to the web. Here we overlaid web prompts over the CLI for visualization purposes. Look at your user code on the CLI and make sure it matches the one on the web page to confirm your device. A new page will appear where you will log in using your credentials. You will be ready to go once you enter your login information.
- Go back to the Command Prompt and accept the user agreement by entering the following:
czid accept-user-agreement
Note that you will not be prompted automatically to accept the agreement; you must type the command above. After you enter the command, the user agreement will be printed and you will be prompted to accept it by typing "y" or "Y".
- You are all set to use the CZ ID CLI! Next time you need to use the CLI, simply log in to CZ ID and confirm your device.
After setting up your connection with CZ ID, you only need to log in and confirm your device before uploading samples to the platform. Here we overlaid web prompts over the CLI for visualization purposes. Look at your user code on the CLI and make sure it matches the one on the web page to confirm your device.
Step 3: Get Files Ready
To upload files, you need to have your project information and files ready. This information will be specified in your upload command (see step 4). Make sure that all your files are in the same directory or folder.
In your upload command, you will specify project name, sample name, and filenames for metadata and read files. See details below.
Project name: Uploaded samples will be organized under a project.
-
- Uploading to an existing project: Reference the project by using the project name of interest in your upload command while uploading samples through the CLI.
- Uploading to a new project: If you would like to create a new project, you have to create it within your account using the CZ ID web interface first and use the new project name while uploading samples through the CLI. See Project Selection within Upload Data through the Web App for details.
Sample name (optional): You can include a flag for sample name in your upload command.
- Uploading a single sample: If you are uploading only one sample, you can specify the sample name in your upload code. This is necessary if there are multiple sequencing files in the same folder.
- Uploading samples in bulk: If you are uploading samples in bulk, you should compile all the sequencing files to be uploaded in the same folder. You do not need to provide a sample name flag in the upload code for uploading a batch of samples.
- Note that for single-sample and bulk uploads, the sample names should be included in your metadata file and match the filenames in the specified folder, except for the part of the filename indicating the sequenced end for Illumina data (i.e., R1 or R2) and the file extension. For example, for filename “Sample1_S001_R1.fastq” you only need to specify “Sample1_S001” in your metadata file (see below).
Metadata file: Sample information should be provided in a comma-delimited file (“csv” file extension). See Metadata instructions and dictionary for details regarding metadata requirements and format.
-
- If you download metadata for samples on your CZ ID account, the metadata file will be already in the correct format.
- If you need to prepare a metadata file, we recommend using our Metadata template to generate your file. Not all metadata in the template is required. If you don’t have information for a given metadata field, simply leave it blank. Save your edited file as a comma-delimited file (“csv” file extension).
- Note that there are seven required metadata entries for samples:
- Sample Name
- Collection Location
- If possible, provide information specifying more than the country. However, don’t provide more than county-level information to protect personally identifiable information.
- Collection Date
- Nucleotide Type
- Sample Type
- Water Control
- Host Organism
- When uploading one sample at a time to CZ ID, make sure that the Sample Name in the metadata file matches the sample name provided in the upload command (see step 4 below).
Example metadata file for uploading a single sample. If uploading multiple samples, make sure sample names provided in the metadata file match filenames in the folder containing sequencing files.
Sequencing platform: Specify "Illumina" or "Nanopore" as the sequencing platform.
Guppy basecaller (Nanopore only): If you are analyzing Nanopore data, specify which basecalling model was used to generate the data. Options include: "hac", "super", or "fast". Click here for more details.
Read files: You will need to specify sequencing files by providing filenames (single sample upload) or the path to the directory containing sequencing files (bulk upload). Note the following regarding sequencing files:
-
- CZ ID mNGS pipelines support FASTQ formats, including: .fastq/.fq/.fastq.gz/.fq.gz
- The mNGS Illumina pipeline also supports FASTA format, including: .fasta/.fa/.fasta.gz/.fa.gz.
- File names must be no longer than 120 characters and can only contain letters from the English alphabet (A-Z, upper and lower case), numbers (0-9), periods (.), hyphens (-) and underscores (_). Spaces are not allowed.
- When uploading a single short-read sample (Illumina), you can specify a single file for single-end reads or two files for paired-end reads.
- For Nanopore data, you can only upload one sequencing file per sample. Therefore, if there are multiple FASTQ files associated with a given sample, make sure to concatenate into a single file prior to upload. This contrasts with the mNGS Illumina pipeline, where FASTQ files from multiple lanes associated with a single sample are automatically concatenated during upload.
- If you are uploading more than one sample at a time, you have to specify the path to a directory containing sequencing files.
- The CZ ID CLI will search the directory for read files and automatically upload supported file types (.fastq/.fq/.fasta/.fa/.fastq.gz/.fq.gz/.fasta.gz/.fa.gz).
- Sample names will be assigned using file names. Each sample name is the base name of the file with the file extension and the suffix specifying the sequenced end (e.g., _R1, _R2, _R1_001, and _R2_001) removed. For example, the sample name for file “Sample1_R1_001.fq” will be “Sample1”.
Step 4: Prepare Sample Upload Command
Now that you have sample and file information ready, you can prepare the command to upload sample files for mNGS analysis through the CLI. You will use this command in Step 5.
Write your upload command using a plain text editor. Below we provide code templates for uploading mNGS files. You can copy the commands that suit your needs and edit accordingly using your text editor of choice. DO NOT USE MICROSOFT WORD or text editors that are not in plain text format because these programs will disrupt the required format and your code will not work.
You can use Notepad to edit your upload code. Make sure there are no hard line breaks (Enter key presses) between command-line arguments.
Upload code templates for Windows
Each template below is a single long line of code, so make sure to copy all the text. You can double-click on the template code to highlight all the text and then copy it. Drag your cursor to the right to scroll towards the end of the code line. Make sure everything is highlighted within the code block before copying the text to your clipboard.
Illumina data: Upload a single sample by providing sample name and specifying sequencing files (e.g., paired-end data)
czid metagenomics upload-sample "Your_Sample_file_R1.fastq.gz" "Your_Sample_file_R2.fastq.gz" --project "Your Project Name" --sample-name "Your Sample Name" --metadata-csv "Your_metadata_file.csv" --sequencing-platform "Illumina"
Illumina data: Upload multiple samples by specifying path to sample files
czid metagenomics upload-samples "Path_to_your_sample_directory" --project "Your Project Name" --metadata-csv "Your_metadata_file.csv" --sequencing-platform "Illumina"
Nanopore data: Upload a single sample by providing sample name and specifying the sequencing file
czid metagenomics upload-sample "Your_Sample_file.fastq.gz" --project "Your Project Name" --sample-name "Your Sample Name" --metadata-csv "Your_metadata_file.csv" --sequencing-platform "Nanopore" --guppy-basecaller-setting "hac"
Nanopore data: Upload multiple samples by specifying path to sample files
czid metagenomics upload-samples "Path_to_your_sample_directory" --project "Your Project Name" --metadata-csv "Your_metadata_file.csv" --sequencing-platform "Nanopore" --guppy-basecaller-setting "hac"
Step 5: Upload Files to CZ ID for mNGS Analysis
Now that you have your sample information and upload command ready, you can upload sample files. To upload your data to CZ ID:
- Log in to CZ ID by opening a Command Prompt window and typing the following command:
czid login
-
Set your directory to the folder containing sample files:
cd Path_to_directory
- Copy and paste the upload code you edited in Step 4 into the Command Prompt and press Enter.
Example command for uploading multiple samples (Nanopore).
- The mNGS pipeline will begin running automatically after uploading your files.
- Go to the CZ ID web interface and log in to your account (note that you will have to complete your account profile the first time you log in to the web application).
-
Go to the Project page of interest to check on the status of your sample.
Keep an eye on the status of your samples. A "Complete" status indicates that the mNGS pipeline run has successfully completed.
-
Once the mNGS run is "Complete", click on the sample to go to the Sample Report page. Click here to learn more about the Sample Report.
Sample list within project of interest (top panel) and Sample Report page for the selected sample (bottom panel)
Troubleshooting Tips
- If you get an error message indicating “No such file or directory” after executing the upload command, make sure that:
-
- The spelling in your code matches the specified file names
- The files are found in the correct directory
- There are no unexpected characters, such as curly quotes inserted by a word processor, between command line arguments (make sure you edit the code using a plain text editor)
-
- If you are having problems and getting unexpected error messages, some of which may refer to the CZ ID CLI GitHub page, make sure that you have the latest release of the CZ ID CLI. Check for release updates on the CZ ID CLI GitHub page.
- Find more details about the CLI on the CZ ID CLI GitHub page.
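If you suspect that your upload command picked up unexpected characters, you can save it to a plain text file and scan that file for anything outside printable ASCII. This is a minimal sketch using the standard "grep" command. The helper name "find_unexpected_chars" and the file name "upload_command.txt" are hypothetical.

```shell
#!/bin/sh
# find_unexpected_chars is a hypothetical helper, not part of the CZ ID CLI.
# It prints, with line numbers, each line of a text file that contains bytes
# outside printable ASCII and ordinary whitespace (for example curly quotes
# or non-breaking spaces). It succeeds only if at least one such line exists.
find_unexpected_chars() {
  # The C locale makes [:print:] mean the ASCII range from space to tilde.
  LC_ALL=C grep -n '[^[:print:][:space:]]' "$1"
}

# Example: find_unexpected_chars upload_command.txt
```

Any line that is printed should be retyped in a plain text editor, replacing curly quotes with straight quotes.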