Uploading Data
To analyze your samples and generate a report, you need to upload your samples to CZ ID.
To upload new samples to CZ ID, log in to the application using your email and password. Once logged in, you will see your name in the upper right-hand corner of the application. Next to your name, you will see an Upload link. This link will take you to the upload page.
The upload page provides users different settings and options for their sample upload.
Project Selection
Samples uploaded to CZ ID must belong to a project. You can upload samples to an existing project or create a new project.
Uploading to an Existing Project
To add samples to an existing project, select the project dropdown menu at the top of the page. You will be able to view your existing projects and select the project in which to store your new files. Once you select a project, its name will be visible.
Creating a New Project
If you would like to create a new project, select the blue + Create Project link below the project dropdown field. Selecting that link will display a new interface with project creation options.
Enter a name for your new project.
Project Sharing
Select if you would like your new project to be public or private.
- Private projects - Samples uploaded to private projects will remain private to you and your collaborators until you decide to share them with other researchers by making them public on CZ ID. CZ ID will not automatically change your samples from private to public. The decision of if and when to share samples is entirely yours.
- Public projects - Samples uploaded to public projects are discoverable by all CZ ID users. Note: raw sample data (genetic sequence files, e.g. FASTA/FASTQ) that have been uploaded to CZ ID are only available to the original uploader, regardless of whether your sample is public, private, or shared via a project. Raw data is not shared with any other CZ ID user, nor is it ever accessed by anyone working on CZ ID unless specifically requested by a user, such as to debug an issue.
You can also add a brief project description to help others understand the context of your project.
Click the blue Create Project button to save your new project and close the new project options. The name of your new project will be selected.
The next step is to select the analysis type. In this case, you will select Metagenomics.
Selecting Files
There are 2 ways to select the files you want to upload to CZ ID. You can upload files directly from your computer or pull samples from your Basespace account.
File Information
- Accepted file formats: fastq (.fq), fastq.gz (.fq.gz), fasta (.fa), fasta.gz (.fa.gz).
- CZ ID is optimized for files produced by short-read sequencing.
- CZ ID can process single files or paired reads.
- Paired files must be labeled with "_R1" or "_R2" at the end of the basename. CZ ID will automatically detect paired reads based on the naming convention.
- File names must be no longer than 120 characters and can only contain letters from the English alphabet (A-Z, upper and lower case), numbers (0-9), periods (.), hyphens (-) and underscores (_). Spaces are not allowed.
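If you want to check file names before uploading, the rules above can be sketched as a small local script. This is only an illustration of the listed rules, not CZ ID's own validation code: the function names are hypothetical, and it assumes lowercase extensions and that "end of the basename" means the part of the name before the file extension.

```python
import re

# Rules taken from the File Information list; helper names are illustrative.
# Assumption: extensions are lowercase.
ALLOWED_EXTENSIONS = (
    ".fastq.gz", ".fq.gz", ".fasta.gz", ".fa.gz",
    ".fastq", ".fq", ".fasta", ".fa",
)
ALLOWED_CHARS = re.compile(r"^[A-Za-z0-9._-]+$")
MAX_NAME_LENGTH = 120


def check_file_name(name):
    """Return a list of problems found in a file name (empty if none)."""
    problems = []
    if len(name) > MAX_NAME_LENGTH:
        problems.append("longer than 120 characters")
    if not ALLOWED_CHARS.match(name):
        problems.append("contains characters other than letters, numbers, '.', '-', '_'")
    if not name.endswith(ALLOWED_EXTENSIONS):
        problems.append("unsupported file extension")
    return problems


def split_read_suffix(name):
    """Return (basename, read), where read is 'R1', 'R2', or None."""
    base = name
    for ext in ALLOWED_EXTENSIONS:
        if base.endswith(ext):
            base = base[: -len(ext)]
            break
    # Assumption: the read label sits directly before the extension.
    for read in ("R1", "R2"):
        if base.endswith("_" + read):
            return base[: -len(read) - 1], read
    return base, None
```

For example, `check_file_name("my sample_R1.fq")` reports the space as a disallowed character, and `split_read_suffix("sample1_R1.fastq.gz")` separates the basename `sample1` from the read label `R1`.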
Upload from Your Computer
To upload files directly from your computer you can select them through our file browser or drag and drop them directly in CZ ID.
CZ ID will automatically name your samples based on your file name.
Once you have selected and reviewed the files you want to process, click the Continue button at the bottom of the screen.
If you have sequencing files split over multiple lanes per sample, CZ ID will automatically detect this based on Illumina's naming convention and concatenate these files for you. For example, the NovaSeq produces so much sequencing data that one sample may be split over 4 lanes, so a paired-end sample would produce 8 files. In the screenshot below you can see that CZ ID automatically detects that each file is part of the same sample.
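To preview how lane-split files might be grouped, here is a rough sketch based on the common Illumina FASTQ naming pattern `SampleName_S1_L001_R1_001.fastq.gz`. How CZ ID performs this detection internally is not described here, so the pattern and the `group_by_sample` helper are assumptions for illustration only.

```python
import re
from collections import defaultdict

# Assumed pattern: <sample>_L<3-digit lane>_<R1|R2>_<3-digit chunk>.<ext>
LANE_PATTERN = re.compile(
    r"^(?P<sample>.+)_L(?P<lane>\d{3})_(?P<read>R[12])_\d{3}\.(fastq|fq)(\.gz)?$"
)


def group_by_sample(file_names):
    """Group lane-split files by (sample, read) so each group can be concatenated."""
    groups = defaultdict(list)
    for name in file_names:
        match = LANE_PATTERN.match(name)
        if match is None:
            # Files that do not follow the lane pattern stay on their own.
            groups[(name, None)].append(name)
        else:
            groups[(match.group("sample"), match.group("read"))].append(name)
    # Sort each group so lanes appear in order (L001, L002, ...).
    return {key: sorted(files) for key, files in groups.items()}


# Hypothetical paired-end sample split over 4 lanes (8 files in total).
files = [
    f"sampleA_S1_L00{lane}_{read}_001.fastq.gz"
    for lane in range(1, 5)
    for read in ("R1", "R2")
]
grouped = group_by_sample(files)
```

Here `grouped` contains two entries, one for `R1` and one for `R2`, each listing the four lane files that belong to the same sample.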
Upload from Basespace
If your files are hosted on the Basespace cloud, you can pull them directly into CZ ID. Select the Upload from Basespace tab on the upload page to access your files. Select Connect to Basespace to launch the Basespace login page. Use your Basespace credentials to log in to the site and select your files for upload.
Once you have selected and reviewed the files you want to process, click the Continue button at the bottom of the screen.
Adding Metadata
You can add metadata to your samples by manually entering data through the interface or uploading an existing CSV spreadsheet.
We require 6 metadata fields when uploading samples but encourage our users to upload more. Metadata helps our users compare across samples and find meaningful patterns.
Required fields: Host Organism, Sample Type, Water Control, Nucleotide Type, Collection Date, Collection Location
You can learn everything you need to know about our metadata fields by looking at the metadata dictionary.
The host organism metadata field in CZ ID refers to the organism from which you collected your metagenomic sample. Your choice of host organism will determine which genome gets subtracted out during the host subtraction step in the pipeline. If your host organism maps to one of the available host genomes on CZ ID, reads aligning to that genome will be removed. The available host genomes are updated often and listed at the top of the Upload Metadata page. You will see "Host will not be subtracted" in the host organism dropdown menu if we do not have a genome for your chosen host organism.
Regardless of your choice of host, the pipeline will always remove ERCCs (synthetic RNA spike-ins) and reads aligning to the Human genome. If you are unsure which host to select or if your desired host is not in the available options, you can select “ERCC only” as the Host Organism, in which case no host subtraction will be performed.
Manual Data Entry
Fill out the metadata in the metadata table.
Host Organism: Organism from which the sample was collected. If the sample is a cultured isolate or does not contain any host reads, select “ERCC only”.
Sample Type: Tissue or site that most accurately describes sample. "Suggested" list is based on Host Organism selection.
Water Control: Whether or not the sample is a water control.
Nucleotide Type: RNA or DNA.
Collection Date: The month and year the sample was originally collected.
Collection Location: Location where the sample was originally collected. For privacy reasons, location data for human samples can only be collected on the state or county level.
Use the location input field to search for locations.
To add more metadata fields and fill them out manually, select the + in the right-hand corner of the manual entry table.
CSV Upload Instructions
- Review the fields in our metadata dictionary, where you will find definitions and format requirements. Take special note of the required fields, which you must provide when uploading a new sample.
- You can use your own CSV or copy your metadata into our CSV template.
If your entered Host Organism does not match a supported host genome, we will only subtract out ERCCs and the Human genome. You can read more about how to request a new genome to be added to CZ ID here.
- Make sure your column headers match our naming convention.
- Make sure your metadata values are in the correct format.
- Upload your CSV file.
- If there are errors, please make the necessary changes in your CSV and upload it again.
Tip: You can add or edit metadata in your projects at any time.
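Before uploading a metadata CSV, you could run a quick local check that the required fields are present and filled in. The sketch below assumes the column headers are spelled exactly like the required field names listed earlier; the authoritative header names and value formats are defined in the metadata dictionary, and `find_metadata_problems` is a hypothetical helper, not part of CZ ID.

```python
import csv
import io

# Required fields as listed in this guide. Assumption: CSV headers use these exact names.
REQUIRED_FIELDS = [
    "Host Organism",
    "Sample Type",
    "Water Control",
    "Nucleotide Type",
    "Collection Date",
    "Collection Location",
]


def find_metadata_problems(csv_text):
    """Return a list of missing required columns and empty required values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    headers = reader.fieldnames or []
    problems = [f"missing column: {field}" for field in REQUIRED_FIELDS if field not in headers]
    # Row 1 is the header, so data rows start at 2.
    for row_number, row in enumerate(reader, start=2):
        for field in REQUIRED_FIELDS:
            if field in row and not (row[field] or "").strip():
                problems.append(f"row {row_number}: empty value for {field}")
    return problems


# Illustrative CSV content; the "Sample Name" column and all values are made up.
example = (
    "Sample Name,Host Organism,Sample Type\n"
    "s1,,Plasma\n"
)
for problem in find_metadata_problems(example):
    print(problem)
```

This only checks for presence. It does not validate that each value is in the format the metadata dictionary requires.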
Reviewing Data
After you have added your metadata, click Continue to see a review of your samples and submitted metadata. If you see an issue, you can edit your projects and your samples before uploading.
Below the table of samples, you will see some information on how your selection of host organism will affect the pipeline, specifically the host subtraction step. If CZ ID has the genome of the host organism, we will subtract out reads aligning to that genome. Regardless of your choice of host, the pipeline will always remove ERCCs and reads aligning to the Human genome (hg38).
If CZ ID does not have a genome that matches your host organism, you can request it by following the instructions in our FAQs.
If your samples and metadata look correct, please accept CZ ID's Terms of Service and Data Privacy Policy. You can then start your upload by selecting Start Upload. This will upload your samples to our server and kick off the analysis pipeline.
Do not close the web page while your samples are uploading to our servers. If you do, the upload will be canceled and you will have to start your upload over.
You will see a confirmation when your samples have been uploaded successfully. Once you see the confirmation page, you can close your window or return to your project page. The CZ ID pipeline can take anywhere from 30 minutes to a couple of hours to finish running. You can see which pipeline step your sample is in by returning to the project page.
Once completed, your samples will be flagged as "Completed" on your Project Page. You can now explore your sample report. If you encounter issues, please get in touch by selecting "Contact Us" from the drop-down menu in the upper right-hand corner of your screen.