BMC Research Notes. 2014 Dec 01; Vol. 7

Comparison of insertion/deletion calling algorithms on human next-generation sequencing data

PMID: 25435282
DOI: 10.1186/1756-0500-7-864

Publication types:

Comparative Study
Journal Article
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Validation Study



Abstract


BACKGROUND: Insertions/deletions (indels) are the second most common type of genomic variant and the most common type of structural variant. Identification of indels in next-generation sequencing data is a challenge, and algorithms commonly used for indel detection have not been compared on a research cohort of human subject genomic data. Guidelines for the optimal detection of biologically significant indels are limited. We analyzed three sets of human next-generation sequencing data (48 samples of 200-gene targeted exon sequencing, 45 samples of whole exome sequencing, and 2 samples of whole genome sequencing) using three algorithms for indel detection (Pindel and the Genome Analysis Toolkit's (GATK) UnifiedGenotyper and HaplotypeCaller).

RESULTS: We observed variation in indel calls across the three algorithms. The intersection of the three tools comprised only 5.70% of targeted exon, 19.52% of whole exome, and 14.25% of whole genome indel calls. The majority of the discordant indels were of lower read depth and likely to be false positives. When software parameters were kept consistent across the three targets, HaplotypeCaller produced the most reliable results. Pindel results did not validate well without adjustments to parameters to account for varied read depth and number of samples per run. Adjustments to Pindel's M (minimum support for event) parameter improved both concordance and validation rates. Pindel was able to identify large deletions that surpassed the length capabilities of the GATK algorithms.

CONCLUSIONS: Despite the observed variability in indel identification, we discerned strengths among the individual algorithms on specific data sets. This allowed us to suggest best practices for indel calling. Pindel's low validation rate of indel calls made in targeted exon sequencing suggests that HaplotypeCaller is better suited for short indels and multi-sample runs in targets with very high read depth. Pindel allows for optimization of minimum support for events and is best used for detection of larger indels at lower read depths.
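
The three-tool intersection reported in the results can be illustrated with a small set computation. The sketch below is not the authors' pipeline: it assumes each caller's output has already been reduced to (chromosome, position, reference allele, alternate allele) records, and it assumes concordance means the calls shared by all three tools divided by the union of all calls, which the abstract does not state explicitly. The `Indel` type, the function name, and the toy records are hypothetical.

```python
from typing import NamedTuple, Set


class Indel(NamedTuple):
    """One indel call, reduced to the fields used for matching (hypothetical layout)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def three_way_concordance(a: Set[Indel], b: Set[Indel], c: Set[Indel]) -> float:
    """Percentage of all distinct calls that were made by every one of the three callers.

    Assumption: the denominator is the union of the three call sets.
    """
    union = a | b | c
    if not union:
        return 0.0
    shared = a & b & c
    return 100.0 * len(shared) / len(union)


# Toy call sets, invented purely for illustration.
pindel_calls = {
    Indel("chr1", 100, "AT", "A"),
    Indel("chr1", 250, "G", "GTT"),
    Indel("chr2", 75, "CAGT", "C"),
}
unified_genotyper_calls = {
    Indel("chr1", 100, "AT", "A"),
    Indel("chr3", 40, "T", "TA"),
}
haplotype_caller_calls = {
    Indel("chr1", 100, "AT", "A"),
    Indel("chr1", 250, "G", "GTT"),
}

if __name__ == "__main__":
    pct = three_way_concordance(
        pindel_calls, unified_genotyper_calls, haplotype_caller_calls
    )
    print(f"Three-way concordance: {pct:.2f}%")
```

Exact matching on position and alleles is a simplification. The same indel can be written at different positions in repetitive sequence, so a real comparison would likely normalize (for example, left-align) the calls from each tool before intersecting them.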

