Overview
The CZ ID phylogenetic tree pipeline for metagenomic (mNGS) samples enables you to easily construct phylogenetic trees for organisms found within multiple samples and include reference sequences. This is useful when evaluating sequence similarity in the context of sample contamination or identification of potential outbreaks. Here we outline steps to build and interpret phylogenetic trees for taxa identified within mNGS samples.
After reading this guide, you will be able to:
- Select samples that are suitable for tree building
- Create phylogenetic trees from Project and Sample Report pages
- Color tree by metadata
- Download tree data
Selecting samples for phylogenetic trees
When creating a phylogenetic tree for an organism of interest, CZ ID automatically detects samples containing the organism within your project and within public projects across CZ ID. You then select which samples to include in the analysis. When selecting samples, keep in mind that it is best to use samples with relatively high coverage of the taxon of interest. Low-coverage samples may not be suitable for tree building, and tree construction may fail (see Why a pairwise matrix instead of a tree?).
In addition to selected samples, a maximum of 10 reference sequences are automatically added from the NCBI database. These reference accessions are selected based on the highest number of reads aligning to sequences in the database. Given that the coverage visualization selectively weights contig matches, reference accessions for phylogenetic tree building might differ from those used for coverage visualization. See steps below for how to create phylogenetic trees from the Project and Sample Report pages.
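The reference selection rule described above (at most 10 accessions, ranked by the number of aligned reads) can be illustrated with a short Python sketch. This is not the CZ ID implementation: the function name, the input dictionary, and the accession IDs in the usage note are hypothetical, and the alphabetical tie-breaking is an assumption made only to keep the result deterministic.

```python
from typing import Dict, List

# Cap on automatically added references, as stated in this guide.
MAX_REFERENCES = 10


def pick_reference_accessions(
    read_counts: Dict[str, int], limit: int = MAX_REFERENCES
) -> List[str]:
    """Return up to `limit` accession IDs, highest aligned read count first.

    `read_counts` maps an accession ID to the number of reads aligned to it.
    Ties are broken alphabetically (an assumption for this sketch).
    Accessions with no aligned reads are never returned.
    """
    ranked = sorted(read_counts.items(), key=lambda item: (-item[1], item[0]))
    return [accession for accession, count in ranked[:limit] if count > 0]
```

For example, calling `pick_reference_accessions({"ACC_A": 120, "ACC_B": 45, "ACC_C": 300}, limit=2)` returns `["ACC_C", "ACC_A"]`, where the accession IDs are placeholders rather than real NCBI accessions.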
Creating a phylogenetic tree from a Project page
To create a phylogenetic tree from a Project page:
- Go to the Metagenomics tab under a Project page of interest
- (Optional) Select samples of interest. For example, you might select samples from a heatmap in which you identified a set of samples containing the taxon to be used for phylogenetic analysis. Selecting samples here is not required, because the platform automatically detects samples containing the taxon of interest (step 5).
- Click the More Actions icon on the right-hand side of the page and select "Create Phylogenetic Tree" from the dropdown menu.
- A Phylogenetic Trees modal will appear with a list of trees previously created from your account and from CZ ID Public projects. Click "Create new tree".
- You will be prompted to select the project, taxon, and samples of interest. Follow the prompts and click “Create Tree” when you are done making your selections. When selecting samples, keep in mind that it is best to use samples with relatively high coverage of the taxon of interest. Low-coverage samples may not be suitable for tree building, and tree construction may fail (see Why a pairwise matrix instead of a tree?). You can use the “Coverage Breadth” column in the modal to decide which samples to add (the coverage breadth value is the coverage for the top accession). Note that the modal may take a while to load while the platform searches for samples containing the taxon of interest.
- You should receive confirmation that your tree is being created! The time required depends on the number of samples and the length of the contigs. It can take anywhere from 10 minutes to 2 hours.
- Navigate to the “My data” page to view your phylogenetic tree under the Visualizations tab.
- Once the tree is complete, click on its name to view it. If the data was not suitable for building a tree, you will see a pairwise distance matrix instead.
Creating a phylogenetic tree from the Sample Report table
To create a phylogenetic tree from the Sample Report table:
- Go to the Sample Report page for a metagenomic sample of interest and find the species of interest listed on the report table.
- Hover over the species name to see analysis icons and click on the Phylogenetic Analysis icon.
- A phylogenetic tree modal will appear where you can select the samples to include in the analysis. Follow the prompts and click “Create Tree” when you are done making your selections. When selecting samples, keep in mind that it is best to use samples with relatively high coverage of the taxon of interest. Low-coverage samples may not be suitable for tree building, and tree construction may fail. When this happens, you will obtain a pairwise matrix instead (see Why a pairwise matrix instead of a tree?). You can use the “Coverage Breadth” column in the modal to decide which samples to add (the coverage breadth value is the coverage for the top accession). Note that the modal may take a while to load while the platform searches for samples containing the taxon of interest.
- You should receive confirmation that your tree is being created! The time required depends on the number of samples and the length of the contigs. It can take anywhere from 10 minutes to 2 hours.
- Navigate to the “My data” page to view your phylogenetic tree under the Visualizations tab.
- Once the tree is complete, click on its name to view it. If the data was not suitable for building a tree, you will see a pairwise distance matrix instead.
Coloring the tree by metadata
By default, the tree is labeled by project name and NCBI database. To highlight other metadata on the tree, click the “Color by” dropdown menu and choose the metadata field you would like the tree to show. For example, if you choose “location”, the tree branch and sample name colors will reflect the location where the samples were collected. To return to the original tree, change “Color by” back to “Project name”.
Downloading phylogenetic tree data
You can easily download tree data using the Download button on the right-hand side of the Phylogenetic Tree Visualization page.
Downloads available for the phylogenetic tree include:
- Tree file (.nwk): Tree file in Newick format. You can open this file in other software (such as MEGA) to view or edit the tree.
- Tree image (SVG format)
- Tree image (PNG format)
- SKA distance (TSV format): Download the tab-delimited file to view mismatches, Mash-like distances, number of SNPs, and SNP distances between samples. You can read about each of the metrics here.
- SKA variants (.aln): Use this file to view split k-mer alignments.
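If you want to inspect the downloaded files outside of CZ ID, the following Python sketch shows one possible starting point using only the standard library. It lists the leaf labels in a Newick string and ranks sample pairs from the SKA distance file by distance. The column names "Sample 1", "Sample 2", and "SNP distance" are assumptions about the TSV header, so check them against your own download and pass different names if needed. The Newick helper is a simplification that handles unquoted labels only.

```python
import csv
import io
import re
from typing import List, Tuple


def leaf_names(newick: str) -> List[str]:
    """List leaf labels in a Newick string (unquoted labels only).

    A leaf label directly follows "(" or ","; labels of internal nodes
    follow ")" and are therefore skipped.
    """
    matches = re.findall(r"[(,]\s*([^(),:;]+)", newick)
    return [name.strip() for name in matches if name.strip()]


def closest_pairs(
    tsv_text: str,
    distance_column: str = "SNP distance",  # assumed header name
    sample_columns: Tuple[str, str] = ("Sample 1", "Sample 2"),  # assumed
    top: int = 5,
) -> List[Tuple[str, str, float]]:
    """Return the `top` sample pairs with the smallest distance."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    first, second = sample_columns
    pairs = [
        (row[first], row[second], float(row[distance_column]))
        for row in reader
    ]
    return sorted(pairs, key=lambda pair: pair[2])[:top]
```

To use it, read each downloaded file into a string (for example with `open(path).read()`, where `path` is wherever you saved the file) and pass the text to the matching function. Pairs with the smallest distances are the most similar, which may be a useful first look when investigating contamination or a potential outbreak.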