Percentage of target bases covered as a function of a coverage threshold. Y axis represents the % of target bases covered. X axis represents different coverage thresholds. Each bar represents the percentage of target bases covered at the given coverage threshold.
A typical target-enrichment NGS experiment results in ~90% of target bases covered at coverage >=1x. This value tends to decrease as the coverage threshold increases. How quickly the percentage drops as the threshold rises depends on the specific experimental design and results. A warning is issued if the percentage of bases with coverage >= 1.0x is less than 90% for any of the samples.
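The calculation behind this graph can be sketched as follows. This is a minimal illustration, not the report's actual code: the function names, the list-of-depths input and the default thresholds are assumptions made for the example; only the 90% limit at 1x comes from the text above.

```python
def percent_covered(coverages, thresholds=(1, 5, 10, 20, 30)):
    """Return {threshold: % of target bases with coverage >= threshold}.

    `coverages` holds one depth value per target base. The default
    thresholds are illustrative, not the ones used by the report.
    """
    total = len(coverages)
    if total == 0:
        raise ValueError("no target bases given")
    return {t: 100.0 * sum(1 for c in coverages if c >= t) / total
            for t in thresholds}


def low_coverage_warning(coverages, min_percent=90.0):
    """True if fewer than `min_percent` of target bases reach 1x."""
    return percent_covered(coverages, thresholds=(1,))[1] < min_percent


# Toy example: ten target bases, two of them uncovered.
depths = [0, 3, 12, 25, 40, 7, 0, 15, 31, 9]
result = percent_covered(depths)
warn = low_coverage_warning(depths)
```

In this toy input 8 of 10 bases reach 1x, so the warning condition is met.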
Percentage of target bases covered at coverage >=10x as a function of the number of mapped reads. Sequencing depth simulations were carried out by randomly selecting 0.0990336×10^6, 0.1980672×10^6, 0.2971008×10^6, 0.3961344×10^6 and 0.495168×10^6 reads from the bam file. For each of these sets of reads, the percentage of target positions covered at >=10x was calculated. This graph aims to give an idea of how much the percentage of target bases covered could be improved by resequencing.
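A depth simulation of this kind could look like the sketch below. It assumes reads are given as (start, end) half-open intervals on a single contig and target positions as a collection of integers; these representations, the function names and the fixed random seed are choices made for the example, not details taken from the report.

```python
import random


def percent_target_covered(reads, target_positions, min_cov=10):
    """% of target positions covered by at least `min_cov` reads."""
    depth = dict.fromkeys(target_positions, 0)
    for start, end in reads:            # half-open interval [start, end)
        for pos in range(start, end):
            if pos in depth:
                depth[pos] += 1
    return 100.0 * sum(1 for d in depth.values() if d >= min_cov) / len(depth)


def depth_simulation(reads, target_positions, sample_sizes, seed=0):
    """Return [(n_reads, % target covered at >=10x)] for each sample size.

    Sample sizes larger than the number of available reads are capped,
    so the last x-value never exceeds the number of reads.
    """
    rng = random.Random(seed)
    points = []
    for n in sample_sizes:
        n = min(n, len(reads))
        subset = rng.sample(reads, n)   # random selection without replacement
        points.append((n, percent_target_covered(subset, target_positions)))
    return points


# Toy example: 20 identical reads spanning a 10-base target.
toy_reads = [(0, 10)] * 20
curve = depth_simulation(toy_reads, range(10), sample_sizes=[5, 10, 20, 50])
```

With real data the per-base loop would be too slow; an interval-based coverage tool would be used instead. The sketch only shows the sampling logic.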
A flat curve on the right part indicates that resequencing will not improve the number of target bases covered at 10x. A warning is issued if the curve does not tend to saturation on the right side (slope between the last two points > 1e-05). If the maximum depth provided as input is greater than the number of reads in the bam file, the last x-value corresponds to the number of reads in the bam file.
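The saturation check can be expressed as a slope between the last two points of the curve. The sketch below assumes the slope is measured in percentage points per read; the text does not state the units, so treat that, the function names and the example values as assumptions.

```python
def saturation_slope(read_counts, percent_covered):
    """Slope between the last two points of the saturation curve."""
    if len(read_counts) < 2 or len(read_counts) != len(percent_covered):
        raise ValueError("need two or more matching points")
    dx = read_counts[-1] - read_counts[-2]
    if dx <= 0:
        raise ValueError("read counts must be strictly increasing")
    return (percent_covered[-1] - percent_covered[-2]) / dx


def not_saturated(read_counts, percent_covered, max_slope=1e-05):
    """True if the curve is still rising at its right end."""
    return saturation_slope(read_counts, percent_covered) > max_slope


# Hypothetical curves, not values from any real experiment.
reads = [100_000, 200_000, 300_000, 400_000, 500_000]
flat = [60.0, 75.0, 82.0, 85.0, 85.5]      # gains 0.5 points in the last step
rising = [40.0, 55.0, 68.0, 78.0, 88.0]    # gains 10 points in the last step
```

Here `not_saturated(reads, flat)` is false and `not_saturated(reads, rising)` is true, so only the second curve would trigger the warning under these assumptions.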
Overall percentage of reads on target:
Overall enrichment:
Bars represent the percentage of reads on-target per chromosome. The percentage for each bar was calculated relative to the total number of reads mapped to the corresponding chromosome. Enrichment was calculated as: (on-target reads per Kb)/(off-target reads per Kb).
In a typical experiment one may expect ~80% of reads mapping on-target. A warning is issued if the % of reads on-target is lower than 80% for any of the samples.
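The two summary values can be computed as in the sketch below. How the off-target size in Kb is obtained is not stated here; the example assumes it is the reference size minus the target size. The function name and the numbers are illustrative.

```python
def on_target_stats(on_reads, off_reads, target_bp, genome_bp):
    """Return (% reads on target, enrichment).

    Enrichment = (on-target reads per Kb) / (off-target reads per Kb).
    Assumption: the off-target size is genome_bp - target_bp.
    """
    if target_bp <= 0 or genome_bp <= target_bp:
        raise ValueError("need 0 < target_bp < genome_bp")
    total = on_reads + off_reads
    if total == 0:
        raise ValueError("no reads given")
    percent_on = 100.0 * on_reads / total
    on_per_kb = on_reads / (target_bp / 1000.0)
    off_per_kb = off_reads / ((genome_bp - target_bp) / 1000.0)
    enrichment = float("inf") if off_per_kb == 0 else on_per_kb / off_per_kb
    return percent_on, enrichment


# Hypothetical sample: 800 on-target and 200 off-target reads,
# a 50 Kb target within a 1,050 Kb reference.
percent_on, enrichment = on_target_stats(800, 200, 50_000, 1_050_000)
warn_on_target = percent_on < 80.0
```

For per-chromosome bars the same function would be applied to the read counts and sizes of each chromosome separately.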
Percentages of duplicated on/off-target reads. Reads mapping at exactly the same starting and ending position were considered to be duplicated. X axis indicates the number of times the reads are duplicated (1 indicates unique reads). Green and red bars indicate the percentage of on- and off-target reads, respectively, relative to the total number of on- or off-target reads.
One may expect a greater proportion of duplicated reads on-target due to the enrichment process. Duplicated off-target reads should be due to some other experimental artifacts (e.g. PCR). Thus, a warning is issued if the percentage of duplicated on-target reads is lower than the percentage of duplicated off-target reads for any of the samples.
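The duplicate definition above (identical start and end positions) can be sketched like this. Reads are represented as (chromosome, start, end) tuples, and each bar is taken to be the share of reads seen a given number of times; both are assumptions made for the example.

```python
from collections import Counter


def duplication_histogram(reads):
    """Return {times_seen: % of reads} for (chrom, start, end) tuples."""
    position_counts = Counter(reads)
    total = sum(position_counts.values())
    if total == 0:
        raise ValueError("no reads given")
    reads_per_level = Counter()
    for times_seen in position_counts.values():
        # every read at this position belongs to the same duplication level
        reads_per_level[times_seen] += times_seen
    return {level: 100.0 * n / total
            for level, n in sorted(reads_per_level.items())}


def percent_duplicated(reads):
    """% of reads whose start/end pair occurs more than once."""
    hist = duplication_histogram(reads)
    return sum(p for level, p in hist.items() if level > 1)


# Toy on-target and off-target read sets.
on_target = [("chr1", 100, 150)] * 3 + [("chr1", 200, 250)]
off_target = [("chr2", 10, 60), ("chr2", 10, 60),
              ("chr2", 500, 550), ("chr2", 900, 950)]
warn_duplicates = percent_duplicated(on_target) < percent_duplicated(off_target)
```

In this toy input 75% of on-target reads and 50% of off-target reads are duplicated, so no warning would be raised.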
Distribution of coverage per target base (only bases with coverage >= 1x are shown on the left graph). The star symbol in the boxplot graph indicates the mean coverage.
Low-medium coverage experiments may present a mean coverage of ~40x. A warning is issued if mean coverage is below 40x for any of the samples.
Coverage found at each target base. One graph is provided for each chromosome (contig) in the target bed. X axis represents target positions. Only target bases are represented in the X axis: target regions appear concatenated. Y axis represents coverage. The .txt file lists all those target intervals with 0 coverage.
Wide gaps or peaks may indicate capture biases. A warning is issued if more than 100 consecutive bases have coverage below 6x for any of the samples.
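Detecting such stretches amounts to a run-length scan over the concatenated per-base coverage. The sketch below is one way to do it; the function name and the (start, end) index output are choices made for the example.

```python
def low_coverage_runs(coverages, min_cov=6, min_length=100):
    """Return (start, end) index pairs of runs with coverage < min_cov.

    Only runs longer than `min_length` bases are reported. Indices refer
    to the concatenated target positions and `end` is exclusive.
    """
    runs = []
    start = None
    for i, cov in enumerate(coverages):
        if cov < min_cov:
            if start is None:
                start = i               # a low-coverage run begins here
        else:
            if start is not None and i - start > min_length:
                runs.append((start, i))
            start = None
    # close a run that extends to the last target base
    if start is not None and len(coverages) - start > min_length:
        runs.append((start, len(coverages)))
    return runs


# Toy profile: one run of 120 low bases and one run of exactly 100.
profile = [20] * 50 + [2] * 120 + [20] * 30 + [3] * 100 + [20] * 10
runs = low_coverage_runs(profile)
warn_gaps = len(runs) > 0
```

The run of exactly 100 bases is not reported, since the condition is "more than 100".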
Distribution of the standard deviation of the coverage within target regions. In other words, for each target region the standard deviation of the coverage per base is calculated. All of these "standard deviations" are sampled to draw the histogram/boxplot above (y axis of the boxplot appears in log-scale).
Within a target region, coverage profiles such as those shown below are commonly observed:
Bases near the 5'/3' ends of target regions tend to be less well covered than bases located in the middle of target regions. Graphs in this section show how coverage varies within target regions, and are mainly useful for comparing different target-enrichment NGS experiments. The lower the mean of this distribution, the more uniform the coverage within target regions.
A warning is issued if normalized std is greater than 0.3 for any of the samples.
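The text does not say how the standard deviation is normalized or which summary value is compared with 0.3. The sketch below assumes that each region's standard deviation is divided by that region's mean coverage and that the mean of these values is tested; both are assumptions, as are the function names.

```python
from statistics import mean, pstdev


def normalized_region_stds(regions):
    """Per-region std of coverage divided by the region's mean coverage.

    `regions` is a list of per-base coverage lists, one per target region.
    Regions with zero mean coverage are skipped (the ratio is undefined).
    """
    values = []
    for coverages in regions:
        region_mean = mean(coverages)
        if region_mean > 0:
            values.append(pstdev(coverages) / region_mean)
    return values


# Toy regions: uniform, mildly variable and strongly variable coverage.
regions = [[10, 10, 10, 10], [5, 10, 15, 10], [0, 20, 20, 0]]
norm_stds = normalized_region_stds(regions)
warn_uniformity = mean(norm_stds) > 0.3
```

A perfectly uniform region gives 0, and values grow as coverage within a region becomes more uneven.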
For each target region, the mean coverage of its bases is calculated, as well as the percentage of Gs and Cs it contains. For each target region, a point (GC content, mean coverage) is plotted in the graph. This plot makes it possible to observe sequencing biases that depend on the GC content of target regions. For example, lower coverage in regions with high GC or high AT content has long been observed. GC bias in sequencing studies is in large part due to early PCR steps during library generation, where high and low GC content cause reduced amplification and therefore lower sequencing coverage.
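The points in this graph can be derived as in the sketch below, which assumes each target region is available as a (sequence, per-base coverage list) pair; that input format and the function names are illustrative.

```python
def gc_percent(sequence):
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(1 for base in seq if base in "GC") / len(seq)


def gc_coverage_points(regions):
    """Return one (GC %, mean coverage) point per target region."""
    return [(gc_percent(seq), sum(cov) / len(cov)) for seq, cov in regions]


# Toy regions with made-up sequences and coverage values.
toy_regions = [
    ("ATGC", [10, 20, 30, 40]),
    ("GGCC", [2, 4, 4, 2]),
    ("atat", [8, 8, 8, 8]),
]
points = gc_coverage_points(toy_regions)
```

Plotting `points` as a scatter or density plot gives the kind of view described above.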