Overview
This short article explains how to interpret phylogenetic trees created from metagenomic next-generation sequencing (mNGS) samples. To learn how the trees are generated, see the phylogenetic tree pipeline documentation.
Interpreting a Phylogenetic Tree
When interpreting CZ ID's phylogenetic trees, keep in mind that they are built with split kmer analysis (SKA). Sequences are compared through matching short stretches of sequence (i.e., split kmers). This matters for interpretation because the distances shown on the tree are “relative” genetic distances between matching split kmers, not distances between complete sequences. The more similar two sequences are, the more stretches of sequence are compared between them. This lets you quickly determine the general phylogenetic relationships among closely related samples and whether sequences are identical. However, the approach does not provide accurate genetic distances between sequences and cannot be used to compare divergent sequences.
Schematic showing how the relative distance between two sequences is calculated through SKA.
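The idea behind the schematic can be sketched in a few lines of Python. This is a simplified illustration, not CZ ID's pipeline or the SKA tool itself: it assumes a split kmer is two flanking arms of length `k` around one middle base, ignores reverse complements, and uses the hypothetical function names `split_kmers` and `relative_distance`. Only split kmers whose flanks match in both sequences are compared, and each differing middle base counts as one difference.

```python
def split_kmers(seq, k=3):
    """Map each (left arm, right arm) pair to the base found between the arms.

    Simplifying assumption: flanks seen with more than one middle base in the
    same sequence are treated as ambiguous and dropped.
    """
    seq = seq.upper()
    found = {}
    ambiguous = set()
    # Each window is k bases, one middle base, then k bases.
    for i in range(len(seq) - 2 * k):
        left = seq[i:i + k]
        middle = seq[i + k]
        right = seq[i + k + 1:i + 2 * k + 1]
        key = (left, right)
        if key in found and found[key] != middle:
            ambiguous.add(key)
        found[key] = middle
    return {key: base for key, base in found.items() if key not in ambiguous}


def relative_distance(seq_a, seq_b, k=3):
    """Return (differing middle bases, number of split kmers compared)."""
    kmers_a = split_kmers(seq_a, k)
    kmers_b = split_kmers(seq_b, k)
    # Only split kmers whose flanks occur in both sequences are compared.
    shared = kmers_a.keys() & kmers_b.keys()
    differences = sum(1 for key in shared if kmers_a[key] != kmers_b[key])
    return differences, len(shared)


reference = "ACGTTGCAAGGCTA"
variant = "ACGTTGCTAGGCTA"  # same sequence with a single base changed

print(relative_distance(reference, reference))
print(relative_distance(reference, variant))
```

With these toy sequences, the identical pair is compared over 8 split kmers with 0 differences, while the pair that differs by one base shares only 2 split kmers and shows 1 difference. This illustrates why fewer stretches are compared as sequences diverge, and why the resulting distance is relative rather than exact.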
The phylogenetic trees show the relative distance between taxon sequences. Samples separated by a short distance, or by none, are more similar to each other than to samples that cluster farther away in the tree.
In the example below, the FGS-G5-Boogie and FGS-K7-Boogie samples (Group A) are identical to each other (relative distance of 0), and the same is true of the FGS-O1-Ban samples (Group B).
However, the FGS-G5-Boogie and FGS-K7-Boogie samples (Group A) have a relative distance of 8 from the FGS-O1-Ban samples (Group B). This distance means that 8 nucleotide differences were found between the aligned split kmers.
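The same reading of the tree can be expressed as a small Python sketch that groups samples separated by a relative distance of 0. The distances are the ones quoted in the example; the table layout, the single label standing in for Group B, and the function name `identical_groups` are assumptions made for illustration and are not a CZ ID export format.

```python
# Pairwise relative distances from the example (Group B shown as one label).
distances = {
    frozenset({"FGS-G5-Boogie", "FGS-K7-Boogie"}): 0,
    frozenset({"FGS-G5-Boogie", "FGS-O1-Ban"}): 8,
    frozenset({"FGS-K7-Boogie", "FGS-O1-Ban"}): 8,
}


def identical_groups(distances):
    """Group samples whose pairwise relative distance is 0."""
    samples = sorted({name for pair in distances for name in pair})
    groups = []
    for name in samples:
        for group in groups:
            # Join a group only if identical to every member already in it.
            if all(distances[frozenset({name, member})] == 0 for member in group):
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


for group in identical_groups(distances):
    print(group)
```

Samples that end up in the same group are identical at the split kmers that were compared, which is not the same as being identical across their complete sequences.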