ERCCs
Commercially available External RNA Controls Consortium (ERCC) spike-in mixes, which contain 92 synthetic RNAs, can be used to assess experimental variability. ERCC read data can also be used to estimate the input RNA mass of samples whose concentrations were too low to quantify during library prep. The total input RNA (in picograms) for a sample can be estimated from ERCC spike-ins using the following formula:
input_pg = ercc_pg / ercc_reads * (total_reads - total_ercc_reads)
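As a worked illustration of the formula, the following Python sketch computes the estimate for a single sample. The function name `estimate_input_pg` and the example numbers are hypothetical. The sketch assumes that `ercc_reads` and `total_ercc_reads` in the formula refer to the same quantity (the total number of reads assigned to ERCC sequences) and that `ercc_pg` is the mass of ERCC spike-in added to the sample, in picograms.

```python
def estimate_input_pg(ercc_pg: float, ercc_reads: int, total_reads: int) -> float:
    """Estimate input RNA mass (pg) from ERCC spike-in read counts.

    Assumption: ercc_reads is the total number of ERCC reads in the sample,
    so the non-ERCC read count is total_reads - ercc_reads.
    """
    if ercc_reads <= 0:
        raise ValueError("ercc_reads must be positive to estimate input mass")
    non_ercc_reads = total_reads - ercc_reads
    # Multiply before dividing to limit floating-point rounding.
    return ercc_pg * non_ercc_reads / ercc_reads


# Hypothetical sample: 25 pg of ERCC spiked in, 50,000 ERCC reads, 1,050,000 total reads.
example_pg = estimate_input_pg(ercc_pg=25.0, ercc_reads=50_000, total_reads=1_050_000)
print(f"Estimated input RNA: {example_pg:.1f} pg")  # prints "Estimated input RNA: 500.0 pg"
```

The estimate is undefined when a sample has no ERCC reads, so the sketch raises an error in that case rather than returning a value.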
If you are interested in using ERCC and total read counts in your analysis, you can find this information in the Project and Sample Report pages. You can also download raw ERCC counts.
Finding ERCC Data in the Project Page
You can view ERCC and Total read counts for all samples in the Sample Table found in the Project page. Make sure ERCC Reads and Total Reads are selected to display in the table.
Finding ERCC Data in the Sample Report Page
ERCC information is displayed in the Sample Details panel on the Sample Report page. To open the Sample Details panel, go to the Sample Report of interest and click the "Sample Details" link on the right-hand side of the page.
The Sample Details panel will open on the right of the page. Find ERCC and Total Read counts within the Pipelines tab.
In a clean, high-quality sequenced sample, the read count for each ERCC sequence should track its spike-in concentration linearly. You can assess this correlation by looking at the ERCC Read Counts vs ERCC Spike-in Concentrations graph, which is found in the same Pipelines tab of the Sample Details panel that contains the ERCC read counts. To view the graph, click "ERCC Spike-in Counts".
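If you have downloaded raw ERCC counts and want to check this relationship outside the graph, one option is a correlation coefficient. The Python sketch below is a minimal example, not the method used to draw the graph. It computes Pearson's r on log10-transformed values, a choice made here because ERCC concentrations span several orders of magnitude. The function name `log_log_pearson` and all numbers are hypothetical. Actual spike-in concentrations come from the documentation for your ERCC mix.

```python
import math


def log_log_pearson(concentrations, read_counts):
    """Pearson r between log10(concentration) and log10(read count).

    ERCCs with a zero count or zero concentration are skipped because
    their logarithm is undefined.
    """
    pairs = [
        (math.log10(c), math.log10(n))
        for c, n in zip(concentrations, read_counts)
        if c > 0 and n > 0
    ]
    if len(pairs) < 2:
        raise ValueError("need at least two ERCCs with non-zero counts")
    xs, ys = zip(*pairs)
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("values must vary to compute a correlation")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical concentrations and counts for five ERCC sequences.
r = log_log_pearson([1, 10, 100, 1000, 10000], [3, 25, 310, 2800, 31000])
print(f"log-log Pearson r = {r:.3f}")
```

A value of r close to 1 is consistent with counts that track concentration. The threshold you accept is a judgment call for your own experiment.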
Downloading Raw ERCC Counts
You can download raw ERCC counts from the intermediate pipeline step where the counts were generated. The tool used to generate the ERCC counts depends on the pipeline version of the project (see Illumina Pipeline Updates for details). mNGS Illumina pipeline version 7 (v7) uses STAR to detect and count ERCC reads, whereas pipeline v8 (and later) uses Bowtie2.
mNGS Pipeline v7
For projects created before April 19, 2023 (mNGS pipeline v7), raw ERCC counts can be found in the reads_per_gene.star.tab file generated by STAR. This intermediate file includes host gene counts and ERCC read counts. You can find the reads_per_gene.star.tab file by going to the pipeline visualization and selecting the STAR step. Download the file from the Step Details panel that will open on the right-hand side of the page.
The STAR step operates on paired-end files, but it collapses each read pair and effectively treats the data as single-end throughout the pipeline. Therefore, if you are using paired-end data, you should multiply the ERCC counts found in the reads_per_gene.star.tab file by 2.
If you need to obtain the total ERCC read count programmatically, you can retrieve it from the reads_per_gene.star.tab results file as follows:
awk '/^ERCC/ {sum += $2} END {print 2*sum}' reads_per_gene.star.tab
That is, filter to all rows starting with "ERCC", sum the second column, then multiply by 2.
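For use inside a script, the same steps can be sketched in Python. The function name `total_ercc_reads_v7` and the `paired_end` flag are hypothetical, and the example rows are made-up counts in the whitespace-separated layout described above (identifier in the first column, count in the second).

```python
def total_ercc_reads_v7(lines, paired_end=True):
    """Sum the second column of rows starting with "ERCC".

    The sum is doubled for paired-end data because the v7 STAR step
    counts each read pair once.
    """
    total = 0
    for line in lines:
        if not line.startswith("ERCC"):
            continue  # skip host genes and STAR summary rows
        fields = line.split()  # whitespace split, like awk's default
        total += int(fields[1])
    return 2 * total if paired_end else total


# Made-up rows for illustration only.
example_rows = [
    "N_unmapped\t100\t100\t100\n",
    "ENSG00000000003\t5\t2\t3\n",
    "ERCC-00002\t10\t5\t5\n",
    "ERCC-00003\t7\t3\t4\n",
]
print(total_ercc_reads_v7(example_rows))  # prints 34
```

To run this on a downloaded file, pass an open file handle, for example `total_ercc_reads_v7(open("reads_per_gene.star.tab"))`. Set `paired_end=False` for single-end data.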
mNGS Pipeline v8
For projects created after April 19, 2023 (mNGS pipeline v8 and later), raw ERCC counts can be found in the bowtie2_ERCC_counts.tsv file generated by Bowtie2. This intermediate file includes only ERCC counts. You can find the bowtie2_ERCC_counts.tsv file by going to the pipeline visualization and selecting the ERCC Bowtie2 Filter step. Download the file from the Step Details panel that will open on the right-hand side of the page.