Please note that the criteria for deciding whether a particular experiment was successful depend on the capture design, sequencing platform and data analysis pipeline. The default warning thresholds are based on general whole exome enrichment experiments. If these default thresholds are not appropriate for your experiment, they can be modified by editing the configuration file config.py. As a general guideline, sensitivity parameters are more relevant than specificity parameters, and the latter more relevant than uniformity parameters, according to their impact on the performance of target enrichment experiments.
Percentage of target bases covered as a function of a coverage threshold. Y axis represents the % of target bases covered.
X axis represents different coverage thresholds. Each bar represents the percentage of target bases covered at the given coverage threshold.
A typical target-enrichment NGS experiment results in ~90% of target-bases covered at coverage >=1x. This value tends to
decrease as the coverage threshold increases. How fast this percentage decreases with the coverage increment depends on
the specific experimental design/results. A warning is issued if the percentage of bases with coverage >= 1.0x is less than 90%
for any of the samples.
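The calculation behind this graph can be sketched as follows. This is a minimal illustration, not the report's actual implementation: the function names, the list-of-integers input and the example thresholds are assumptions made here; only the 90% limit at 1x comes from the text above.

```python
def percent_covered(coverage, thresholds=(1, 5, 10, 20, 30)):
    """Return {threshold: % of target bases with coverage >= threshold}.

    `coverage` holds one integer per target base (hypothetical input format).
    """
    total = len(coverage)
    if total == 0:
        raise ValueError("coverage list is empty")
    return {
        t: 100.0 * sum(1 for c in coverage if c >= t) / total
        for t in thresholds
    }


def low_sensitivity_warning(coverage, min_percent=90.0):
    """True if fewer than `min_percent` % of target bases reach 1x."""
    return percent_covered(coverage, thresholds=(1,))[1] < min_percent


if __name__ == "__main__":
    per_base = [0, 1, 5, 10, 20, 30, 40, 50, 0, 12]
    print(percent_covered(per_base))
    print(low_sensitivity_warning(per_base))
```

Because each threshold is at least as strict as the previous one, the returned percentages can only stay equal or decrease as the threshold grows, which is the behaviour described above.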
Percentage of target-bases covered at coverage >=10x as a function of the number of mapped reads. Sequencing depth simulations
were carried out by randomly selecting 0.0990336 x 10^6, 0.1980672 x 10^6, 0.2971008 x 10^6, 0.3961344 x 10^6 and 0.495168 x 10^6 reads from the bam file. For each of these sets of reads, the percentage of target-covered
positions was calculated. This graph aims to give an idea of how much one can improve the percentage of target-bases covered by sequencing more reads.
A flat curve on the right side indicates that resequencing will not improve the number of target-bases covered at 10x. A
warning is issued if the curve does not tend to saturation on the right side (slope between the last two points > 1e-05). If the maximum depth provided as input is greater
than the number of reads in the bam file, the last x-value corresponds to the number of reads in the bam file.
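The saturation check can be illustrated with a short sketch. It assumes that the slope is measured in percentage points of covered target bases per additional read; the text does not state the units, and the function names are hypothetical.

```python
def saturation_slope(read_counts, percent_covered):
    """Slope between the last two points of the saturation curve.

    Assumed units: percentage points of target bases covered at >=10x
    per additional mapped read.
    """
    if len(read_counts) != len(percent_covered) or len(read_counts) < 2:
        raise ValueError("need at least two matching (reads, percent) points")
    delta_reads = read_counts[-1] - read_counts[-2]
    if delta_reads <= 0:
        raise ValueError("read counts must be strictly increasing")
    return (percent_covered[-1] - percent_covered[-2]) / delta_reads


def not_saturated(read_counts, percent_covered, max_slope=1e-05):
    """True if the curve is still rising at its right end (warning case)."""
    return saturation_slope(read_counts, percent_covered) > max_slope


if __name__ == "__main__":
    reads = [100000, 200000, 300000, 400000, 500000]
    still_rising = [40.0, 60.0, 70.0, 75.0, 77.0]
    flat = [40.0, 60.0, 70.0, 75.0, 75.1]
    print(not_saturated(reads, still_rising))  # slope is 2e-05
    print(not_saturated(reads, flat))          # slope is about 1e-06
```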
Overall percentage of reads on target:
Bars represent the percentage of reads on-target per chromosome. Percentages for each bar were calculated relative to the total number of reads mapped to the
corresponding chromosome. Enrichment was calculated as: (on-target reads per Kb)/(off-target reads per Kb).
In a typical experiment one may expect ~80% of reads mapping on-target. A warning is issued if the % of reads on-target is lower
than 80% for any of the samples.
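As a worked illustration of the two quantities above, the sketch below computes the on-target percentage and the enrichment ratio from read counts. The function names and arguments are hypothetical, and it assumes that "per Kb" means read counts divided by the size of the target and off-target regions respectively, expressed in kilobases.

```python
def percent_on_target(on_target_reads, off_target_reads):
    """Percentage of mapped reads that fall on target."""
    total = on_target_reads + off_target_reads
    if total == 0:
        raise ValueError("no mapped reads")
    return 100.0 * on_target_reads / total


def enrichment(on_target_reads, off_target_reads, target_bp, off_target_bp):
    """(on-target reads per Kb) / (off-target reads per Kb).

    Assumption: region sizes are given in base pairs and converted to Kb.
    """
    on_per_kb = on_target_reads / (target_bp / 1000.0)
    off_per_kb = off_target_reads / (off_target_bp / 1000.0)
    if off_per_kb == 0:
        raise ValueError("no off-target reads: enrichment is undefined")
    return on_per_kb / off_per_kb


def low_specificity_warning(on_target_reads, off_target_reads, min_percent=80.0):
    """True if the on-target percentage is lower than `min_percent`."""
    return percent_on_target(on_target_reads, off_target_reads) < min_percent


if __name__ == "__main__":
    print(percent_on_target(800, 200))
    print(enrichment(800, 200, target_bp=1000, off_target_bp=100000))
```

Dividing by region size matters because off-target space is usually far larger than the target: a sample with 80% of its reads on target can still show an enrichment of several hundred.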
Percentages of duplicated on/off-target reads. Reads mapping at exactly the same starting and ending position were considered to
be duplicated. X axis indicates the number of times the reads are duplicated (1 indicates unique reads). Green and red bars indicate
the percentage of on- and off-target reads with respect to the total number of on-/off-target reads respectively.
One may expect a greater proportion of duplicated reads on-target due to the enrichment process. Duplicated off-target
reads should be due to some other experimental artifacts (e.g. PCR). Thus, a warning is issued if the percentage of duplicated on-target
reads is lower than the percentage of duplicated off-target reads for any of the samples.
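The duplicate definition above (same start and same end position) can be sketched as follows. This is an illustration under stated assumptions: reads are given as hypothetical (chromosome, start, end) tuples, and percentages are taken over reads rather than over distinct positions, which the text does not specify.

```python
from collections import Counter


def duplication_histogram(reads):
    """Return {times_seen: % of reads} for (chrom, start, end) tuples.

    A key of 1 corresponds to unique reads.
    """
    per_position = Counter(reads)
    total = sum(per_position.values())
    if total == 0:
        raise ValueError("no reads given")
    reads_per_level = Counter()
    for times_seen in per_position.values():
        reads_per_level[times_seen] += times_seen
    return {
        level: 100.0 * n / total
        for level, n in sorted(reads_per_level.items())
    }


def percent_duplicated(reads):
    """Percentage of reads seen more than once."""
    return 100.0 - duplication_histogram(reads).get(1, 0.0)


def duplication_warning(on_target_reads, off_target_reads):
    """True if on-target duplication is lower than off-target duplication."""
    return percent_duplicated(on_target_reads) < percent_duplicated(off_target_reads)
```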
Distribution of coverage per target base (only bases with coverage >= 1x are shown on the left graph). The star symbol in the boxplot graph indicates the mean coverage.
Low-medium coverage experiments may present a mean coverage of ~40x. A warning is issued if mean coverage is below 40x for any of the samples.
Coverage found at each target base. One graph is provided for each chromosome (contig) in the target bed. X axis
represents target positions. Only target bases are represented in the X axis: target regions appear
concatenated. Y axis represents coverage. The .txt file lists all those target intervals with 0 coverage.
Wide gaps or peaks may indicate capture biases. A warning is issued if more than 100 consecutive bases have coverage below 6x for any of the samples.
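The low-coverage run check can be sketched as a single pass over the concatenated per-base coverage of one chromosome. The function name and the input format are assumptions made for illustration; the 6x and 100-base limits are the ones quoted above.

```python
def low_coverage_runs(coverage, min_coverage=6, max_run=100):
    """Return (start, end) index pairs of runs longer than `max_run` bases
    whose coverage is below `min_coverage`. `end` is exclusive.

    `coverage` holds one value per target base, with target regions
    concatenated as in the graph (hypothetical input format).
    """
    runs = []
    run_start = None
    for i, depth in enumerate(coverage):
        if depth < min_coverage:
            if run_start is None:
                run_start = i  # a new low-coverage run begins here
        elif run_start is not None:
            if i - run_start > max_run:
                runs.append((run_start, i))
            run_start = None
    # Close a run that extends to the last target base.
    if run_start is not None and len(coverage) - run_start > max_run:
        runs.append((run_start, len(coverage)))
    return runs


if __name__ == "__main__":
    profile = [10] * 5 + [0] * 101 + [10] * 5 + [3] * 100
    print(low_coverage_runs(profile))
```

In this example only the first stretch is reported: it spans 101 bases, whereas the trailing stretch is exactly 100 bases long and therefore does not exceed the limit.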
Distribution of the standard deviation of the coverage within target regions. In other words, for each target
region the standard deviation of the coverage per base is calculated. All of these "standard deviations"
are sampled to draw the histogram/boxplot above (y axis of the boxplot appears in log-scale).
Given a target region, it is usual to observe coverage profiles like those shown below:
Bases near the 5'/3' ends of target regions tend to have lower coverage than bases located in the middle of
target regions. Graphs in this section are informative of the coverage variations within target regions, and
are mainly useful to compare different target-enrichment NGS experiments. The lower the mean of this
distribution is, the more uniform the coverage is within target regions.
A warning is issued if normalized std is greater than 0.3 for any of the samples.
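A minimal sketch of the per-region calculation is given below. The text does not say how the standard deviation is normalized or which summary value is compared with 0.3; this sketch assumes division by the mean coverage of the region and uses the mean across regions, and both choices, like the function names, are illustrative assumptions.

```python
import statistics


def normalized_std(region_coverage):
    """Standard deviation of per-base coverage divided by its mean.

    Assumption: "normalized" means divided by the region's mean coverage.
    Regions with zero mean coverage are reported as 0.0.
    """
    mean = statistics.fmean(region_coverage)
    if mean == 0:
        return 0.0
    return statistics.pstdev(region_coverage) / mean


def uniformity_warning(regions, max_normalized_std=0.3):
    """True if the mean normalized std across regions exceeds the limit."""
    values = [normalized_std(region) for region in regions]
    return statistics.fmean(values) > max_normalized_std


if __name__ == "__main__":
    flat_region = [10, 10, 10, 10]
    uneven_region = [5, 15]
    print(normalized_std(flat_region))
    print(normalized_std(uneven_region))
```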
For each target region, the mean coverage of its bases is calculated as well as the percentage of Gs and Cs
it contains. For each target region, a point (GC content, mean coverage) is plotted in the graph. This plot
makes it possible to observe sequencing biases which depend on the GC content of target regions. For example, lower
coverage in sequencing regions with high GC or high AT content has long been observed. GC bias in sequencing
studies is in large part due to early PCR steps during library generation where high and low GC content cause
reduced amplification and therefore lower sequencing coverage.
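The point plotted for each target region can be computed as in the sketch below. The function names and the inputs (the region sequence as a string and its per-base coverage as a list) are hypothetical and chosen only for illustration.

```python
def gc_percent(sequence):
    """Percentage of G and C bases in a region sequence (case-insensitive)."""
    if not sequence:
        raise ValueError("empty sequence")
    sequence = sequence.upper()
    gc_count = sequence.count("G") + sequence.count("C")
    return 100.0 * gc_count / len(sequence)


def gc_coverage_point(sequence, region_coverage):
    """Return the (GC content, mean coverage) pair for one target region."""
    if len(sequence) != len(region_coverage):
        raise ValueError("sequence and coverage must have the same length")
    mean_coverage = sum(region_coverage) / len(region_coverage)
    return gc_percent(sequence), mean_coverage


if __name__ == "__main__":
    print(gc_coverage_point("GCGCATAT", [10, 20, 30, 40, 10, 20, 30, 40]))
```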