Overview
CZ ID's Viral Consensus Genome pipeline accepts BED files, which it uses to trim primers during consensus genome assembly. Include a primer BED file when sequences have been obtained through PCR-based methods or target enrichment, given that primer trimming increases the accuracy of read mapping against the reference genome. Here we explain when you need a BED file, what information it contains, and how to create one.
When Do I Need a BED File?
You should submit a BED file specifying primer positions when uploading viral genome data obtained through primer spiking for target enrichment (e.g., MSSPE) or PCR-based assays. The information is used to soft-clip primers from sequences after aligning reads to a reference sequence to maximize the accuracy of mutation detection. Note that if you obtained your sequences through shotgun sequencing alone, you DO NOT need a primer BED file.
You can provide a BED file while uploading data to the viral consensus genome pipeline.
What is a BED File?
A BED (Browser Extensible Data) file is a tab-delimited file that contains information about specific genomic regions (i.e., genome annotations). BED files (file extension ".bed") provide a simple format to define genomic regions by specifying coordinates and annotations. These files require three mandatory fields or columns and may have up to nine additional columns (see BED file optional fields for details).
Mandatory BED File Columns
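The mandatory columns provide the following information:
- Column 1: name of the reference sequence (genome, scaffold, or contig)
- Column 2: start position of the region (base zero format)
- Column 3: end position of the region (standard format)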
Examples:
Example 1: If you want to annotate or specify bases 1 through 30 of the VirusX genome, the BED file mandatory columns will read as follows in a single line (values separated by tabs):
VirusX	0	30
Explanation: Programs reading the BED file count bases from zero and treat the end position as excluded, so this line selects zero-based positions 0 through 29 of the VirusX genome (30 bases), not 0 through 30 (which would be 31 bases). Remember, when you look at a BED file, the start position (column 2) should be interpreted as one greater than the listed position (in this example "0+1"), whereas the end position (column 3) is listed in standard format (the end position is 30).
Example 2: If you want to annotate or specify more regions, simply add one line per annotation or feature. For example, to add a second region covering positions 80 to 100 to the example above, the file would read as follows:
VirusX	0	30
VirusX	79	100
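The coordinate conversion used in both examples can be sketched in a few lines of Python. This is only an illustration: the function name `to_bed_line` is hypothetical and not part of CZ ID or any library, and it assumes the positions you start from are in standard format (counting from 1, with the last base included).

```python
def to_bed_line(reference, first_base, last_base):
    """Convert a region given in standard (1-based, inclusive) positions
    into one three-column, tab-delimited BED line."""
    if first_base < 1 or last_base < first_base:
        raise ValueError("expected 1 <= first_base <= last_base")
    start = first_base - 1  # BED start: subtract one from the actual position
    end = last_base         # BED end: listed in standard format, unchanged
    return f"{reference}\t{start}\t{end}"


# Regions from Example 1 and Example 2 (VirusX is the placeholder genome name).
lines = [
    to_bed_line("VirusX", 1, 30),
    to_bed_line("VirusX", 80, 100),
]
print("\n".join(lines))
```

A convenient check is that the end position minus the start position equals the length of the region: 30 - 0 = 30 bases for the first line.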
Columns Included in Primer BED Files
In addition to the three mandatory columns, the BED file used for the Viral Consensus Genome pipeline needs to include columns providing primer information.
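The columns that are relevant when specifying primers include the following:
- Column 4: primer name
- Column 5: score (any number between 0 and 1000)
- Column 6: strand ("+" for forward primers, "-" for reverse primers)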
Let's review:
Below is an example of a BED file specifying 12 primers for the SARS-CoV-2 reference genome (accession number MN908947.3).
Based on the BED file above:
- Which genome region does Primer_CoV36 target?
- Is it a forward or reverse primer?
Answer: Primer_CoV36 is a reverse primer that aligns to positions 556 through 573 of the SARS-CoV-2 reference genome.
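To make the reasoning behind the answer explicit, here is a minimal Python sketch that interprets one six-column primer row. The row in the sketch is an assumption reconstructed from the answer above: the start (555), end (573), name, and strand follow from it, but the score value (60) is a placeholder chosen for illustration. The names `Primer` and `parse_primer_row` are hypothetical.

```python
from typing import NamedTuple


class Primer(NamedTuple):
    reference: str
    first_base: int   # standard format (1-based)
    last_base: int    # standard format, included in the region
    name: str
    score: float
    direction: str    # "forward" or "reverse"


def parse_primer_row(row):
    """Interpret one tab-delimited, six-column primer BED row."""
    fields = row.rstrip("\n").split("\t")
    if len(fields) < 6:
        raise ValueError("a primer row needs six tab-delimited columns")
    reference, start, end, name, score, strand = fields[:6]
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    direction = "forward" if strand == "+" else "reverse"
    # Column 2 is one less than the actual start; column 3 is used as listed.
    return Primer(reference, int(start) + 1, int(end), name, float(score), direction)


# Assumed row; the score of 60 is a placeholder, not taken from the original file.
row = "MN908947.3\t555\t573\tPrimer_CoV36\t60\t-"
primer = parse_primer_row(row)
print(f"{primer.name}: {primer.direction} primer, "
      f"positions {primer.first_base} through {primer.last_base}")
```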
How Do I Prepare a BED File?
Given that BED files have a tab-delimited format, you can create them in any spreadsheet program or text editor that can save files as tab-delimited text. If you don’t have primer position information or need to verify position coordinates, see the section How Do I Find Primer Positions? below.
To prepare your BED file:
1. Name the reference genome, scaffold, or contig of interest based on the description specified in the sequence FASTA file.
2. Prepare your file with six columns and no headers. The file should have one row per primer.
Columns should be in the following order:
- Reference name based on the sequence FASTA file
- Start position (base zero format, i.e., subtract one from the actual starting position)
- End position (standard format)
- Primer name
- Score (any number between 0 and 1000)
- Strand (“+” or “-”)
Preparing a BED file in Excel
3. Save your file as tab-delimited text (file extension ".txt") or tab-separated values (".tsv").
4. Rename your file extension as ".bed" (e.g., VirusX_primers.bed).
5. Your file is now ready to be uploaded to CZ ID.
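If you prefer scripting over a spreadsheet, the steps above can be sketched in Python using the standard `csv` module. This is one possible approach under stated assumptions, not a CZ ID tool: the primer names, positions, and scores below are made-up placeholders, and `write_primer_bed` is a hypothetical helper. Positions are entered in standard format and converted when the file is written.

```python
import csv

# Placeholder primers: (reference name, first base, last base, name, score, strand).
# Positions are in standard format (1-based, last base included).
primers = [
    ("VirusX", 1, 30, "VirusX_1_F", 60, "+"),
    ("VirusX", 80, 100, "VirusX_1_R", 60, "-"),
]


def write_primer_bed(path, primers):
    """Write one six-column, tab-delimited row per primer, with no header."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for reference, first_base, last_base, name, score, strand in primers:
            if not 1 <= first_base <= last_base:
                raise ValueError(f"{name}: expected 1 <= first base <= last base")
            if not 0 <= score <= 1000:
                raise ValueError(f"{name}: score must be between 0 and 1000")
            if strand not in ("+", "-"):
                raise ValueError(f"{name}: strand must be '+' or '-'")
            # Start position is written in base zero format (actual start minus one).
            writer.writerow([reference, first_base - 1, last_base, name, score, strand])


write_primer_bed("VirusX_primers.bed", primers)
```

Writing directly to a ".bed" file skips the save-and-rename steps, and the checks catch the most common mistakes (swapped coordinates, an out-of-range score, or a missing strand) before upload.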
How Do I Find Primer Positions?
If you have primer sequences, you can find primer positions and general primer stats by aligning your primers to the reference sequence using Primer-BLAST. One drawback of Primer-BLAST is that you can only search primer positions for one primer pair at a time. You can use Primer Map to search primer positions for multiple primers at the same time. Collect primer positions from the output to create your BED file.
Primer-BLAST interface highlighting relevant inputs and outputs for collecting primer information. In this example, we looked for primer positions using a SARS-CoV-2 sequence with accession number LC528233.2.
Primer Map interface highlighting inputs and outputs for collecting primer position coordinates. In this example, we looked for primer positions using a SARS-CoV-2 sequence with accession number LC528233.2.
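For a quick sanity check of coordinates collected from these tools, a primer can also be located by exact string matching. The Python sketch below is a simplified illustration, not a replacement for Primer-BLAST or Primer Map: it assumes primers match the reference perfectly (no mismatches or degenerate bases), reports only the first match, and uses a made-up toy reference and primers. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def locate_primer(reference_seq, primer_seq):
    """Return (bed_start, bed_end, strand) for the first exact match, or None.

    A primer found as written is reported as "+"; a primer whose reverse
    complement is found in the reference is reported as "-".
    """
    ref = reference_seq.upper()
    primer = primer_seq.upper()
    for strand, query in (("+", primer), ("-", reverse_complement(primer))):
        start = ref.find(query)  # str.find already returns a zero-based index
        if start != -1:
            return start, start + len(query), strand
    return None


# Toy sequences invented for this example.
reference = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
forward_primer = "GCCATTGTAATG"
reverse_primer = "CTATCGGGCACC"

for name, seq in (("toy_F", forward_primer), ("toy_R", reverse_primer)):
    hit = locate_primer(reference, seq)
    if hit is None:
        print(f"{name}: no exact match")
    else:
        start, end, strand = hit
        print(f"ToyRef\t{start}\t{end}\t{name}\t60\t{strand}")
```

Because Python string indices start at zero, the value returned by `find` can be used directly as the BED start position, and the start plus the primer length gives the end position.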