Overview
The CZ ID phylogenetic tree pipeline for metagenomic (mNGS) samples enables you to construct phylogenetic trees for organisms found within multiple samples and include reference sequences. This is useful when evaluating sequence similarity in the context of sample contamination or identification of potential outbreaks.
Here we outline steps to build and interpret phylogenetic trees for taxa identified within mNGS samples. After reading this guide, you will be able to:
- Select suitable samples for tree building
- Create phylogenetic trees from Project and Sample Report pages
- Color trees by metadata
- Download tree data
Selecting Samples for Phylogenetic Trees
When creating a phylogenetic tree for an organism of interest, CZ ID automatically detects samples containing the organism within your project and other projects across CZ ID. You then select which samples you would like to include in the analysis. When selecting samples, keep in mind that it is best to use samples with relatively high coverage of the taxon of interest. Low-coverage samples may not be suitable for tree building and the tree may fail (see Why a pairwise matrix instead of a tree?).
In addition to selected samples, a maximum of 10 reference sequences are automatically added from the NCBI database. These reference accessions are selected based on the highest number of reads aligning to sequences in the database. Given that the coverage visualization selectively weights contig matches, reference accessions for phylogenetic tree building might differ from those used for coverage visualization. See steps below for how to create phylogenetic trees from the Project and Sample Report pages.
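The selection rule described above (keep the accessions with the most aligned reads, up to a cap of 10) can be pictured as a simple ranking. The sketch below is an illustration only, not the pipeline's actual code: the function name, the accession labels, and the read counts are all hypothetical.

```python
def pick_reference_accessions(read_counts, max_refs=10):
    """Return up to max_refs accession labels, highest read count first.

    read_counts maps an accession label to the number of reads aligning to it.
    Both the mapping and this helper are hypothetical illustrations.
    """
    ranked = sorted(read_counts.items(), key=lambda item: item[1], reverse=True)
    return [accession for accession, _ in ranked[:max_refs]]


# Made-up accession labels and read counts, for illustration only.
example_counts = {"ACC_A": 120, "ACC_B": 45, "ACC_C": 300}
print(pick_reference_accessions(example_counts, max_refs=2))
```

Because the ranking here uses read counts alone, it can produce a different list than one that weights contig matches, which is why the references on a tree may differ from those in the coverage visualization.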
Creating a Phylogenetic Tree from a Project Page
To create a phylogenetic tree from a Project page:
1. Go to the Project page and select any of the samples to activate the More Actions icon on the right-hand side of the page. Note that you don’t have to select all the samples for the tree because the platform will automatically detect samples containing the taxon of interest (steps 5 and 6).
2. Click the More Actions icon and select "Create Phylogenetic Tree" from the dropdown menu.
3. A Phylogenetic Trees modal will appear with a list of trees previously created from your account and CZ ID Public projects. Click Create new tree.
4. During the first step, you will be prompted to specify a project and a taxon. Use the search boxes under the dropdown menus to find the project and taxon of interest, then click the Continue button.
5. The next step prompts you to select samples and add a name for the tree. Note that it may take a while (minutes) for the modal to load while the platform searches for samples containing the taxon of interest. When selecting samples, it is best to use samples with relatively high coverage breadth for the taxon of interest. Use the “Coverage Breadth” column in the modal to determine which samples to add (coverage breadth value is the coverage for the top accession). Low-coverage samples may not be suitable for tree building and the tree may fail (see Why a pairwise matrix instead of a tree?).
6. The final step prompts you to select samples from other projects. Use the provided text box to search for projects of interest. Note that it may take a while (minutes) for the modal to load while the platform searches for samples containing the taxon of interest. Keep in mind that it is best to select samples with relatively high coverage breadth for the taxon of interest. Click Create Tree to initiate tree building.
7. The modal will close automatically and you will see a confirmation message in the upper-right corner of the page indicating that your tree is being created. The number of samples and the length of the contigs affect how long the tree takes to build. It can take anywhere from 10 minutes to 2 hours.
8. Navigate to the Discovery page ("My Data") and click the Visualizations tab to view the status of your phylogenetic tree.
- Discovery page (My Data page)
- Visualizations tab
- Phylogenetic tree status
9. Once completed, click on the tree name of interest to view the tree. If the data was not suitable to build a tree, you will see a pairwise distance matrix instead of a tree.
Creating a Phylogenetic Tree from the Sample Report Table
To create a phylogenetic tree from the Sample Report Table:
1. Go to the Sample Report page and find the species of interest listed on the report table.
2. Hover over the species name to see analysis icons and click on the Phylogenetic Analysis icon.
3. A Phylogenetic Trees modal will appear with a list of trees previously created from your account. Click Create new tree.
4. You will be prompted to select samples and add a name for the tree. Note that it may take a while (minutes) for the modal to load while the platform searches for samples containing the taxon of interest. When selecting samples, it is best to use samples with relatively high coverage breadth for the taxon of interest. Use the “Coverage Breadth” column in the modal to determine which samples to add (coverage breadth value is the coverage for the top accession). Low-coverage samples may not be suitable for tree building and the tree may fail (see Why a pairwise matrix instead of a tree?).
5. The final step prompts you to select samples from other projects. Use the provided text box to search for projects of interest. Note that it may take a while (minutes) for the modal to load while the platform searches for samples containing the taxon of interest. Keep in mind that it is best to select samples with relatively high coverage breadth for the taxon of interest. Click Create Tree to initiate tree building.
6. The modal will close automatically and you will see a confirmation message in the upper-right corner of the page indicating that your tree is being created. The number of samples and the length of the contigs affect how long the tree takes to build. It can take anywhere from 10 minutes to 2 hours.
7. Navigate to the Discovery page ("My Data") and click the Visualizations tab to view the status of your phylogenetic tree.
- Discovery page (My Data page)
- Visualizations tab
- Phylogenetic tree status
8. Once completed, click on the tree name of interest to view the tree. If the data was not suitable to build a tree, you will see a pairwise distance matrix instead of a tree.
Coloring the Tree by Metadata
The tree is automatically labeled to distinguish samples by project name and reference sequences from the NCBI database. However, you can easily change this to highlight metadata on the tree by clicking on the “Color by” dropdown menu. From here, you can choose which metadata you would like to see on the tree. If you choose “location”, the tree branch and sample name colors will reflect the location where samples were collected. If you would like to return to the original tree, change “Color by” back to “Project name”.
Downloading Phylogenetic Tree Data
You can easily download tree data using the Download button on the right-hand side of the Phylogenetic Tree page.
Downloads available for the phylogenetic tree include:
- Tree file (.nwk): Tree file in Newick format. You can open this file in other software (such as MEGA) to view or edit the tree.
- Tree image (SVG format)
- Tree image (PNG format)
- SKA distance (TSV format): Download the tab-delimited file to view mismatches, Mash-like distances, number of SNPs, and SNP distances between samples. You can read about each of the metrics here.
- SKA variants (.aln): Use this file to view split kmer alignments.
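If you prefer to inspect the downloads programmatically, the following is a minimal Python sketch using only the standard library. It lists the leaf names in a Newick tree and ranks sample pairs by a distance column in the TSV file. The column names ("Sample 1", "Sample 2", "SNP distance"), the example data, and the helper names are assumptions for illustration; check the header row of your downloaded file and adjust the arguments accordingly. The Newick helper handles unquoted labels only.

```python
import csv
import io
import re


def newick_leaf_names(newick_text):
    """Return the leaf labels of a Newick tree (unquoted labels only)."""
    # A leaf label directly follows "(" or "," and ends at ":", ",", ")" or ";".
    return re.findall(r"[(,]\s*([^\s():,;]+)", newick_text)


def closest_pairs(tsv_text, distance_column="SNP distance",
                  sample_columns=("Sample 1", "Sample 2"), top_n=5):
    """Return the top_n sample pairs with the smallest value in distance_column.

    The default column names are assumptions; pass the names found in the
    header row of the downloaded file.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = sorted(reader, key=lambda row: float(row[distance_column]))
    first, second = sample_columns
    return [(row[first], row[second], float(row[distance_column]))
            for row in rows[:top_n]]


# Made-up stand-ins for the downloaded files; replace them with the text of
# your own .nwk and .tsv downloads.
example_newick = "((sample_A:0.01,sample_B:0.02):0.05,reference_1:0.10);"
example_tsv = (
    "Sample 1\tSample 2\tSNP distance\n"
    "sample_A\tsample_B\t0.002\n"
    "sample_A\treference_1\t0.015\n"
    "sample_B\treference_1\t0.013\n"
)

print(newick_leaf_names(example_newick))
print(closest_pairs(example_tsv, top_n=2))
```

For anything beyond a quick check (quoted labels, rerooting, plotting), a dedicated phylogenetics library or desktop tool is a better fit than the regular expression used here.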
Troubleshooting
The phylogenetic tree module searches for the specified taxon in all the samples in your projects and in other public projects across CZ ID. Therefore, the module can be slow, particularly the Phylogenetic Trees modal. Please be patient when adding information through the modal. Things to note:
- Trees that include older samples (samples uploaded before May 2024) may not work. This is because the tree module uses the latest NCBI database, implemented in May 2024, to find reference sequences (click here for information about the latest NCBI database).
- If the Continue button is inactive after selecting samples from your project (step 5 when setting up trees from a Project page or step 4 when creating trees from the Sample Report page), simply re-type the tree name. This should activate the Continue button.