PLoS One, 2013-01-01, Vol. 8, Issue 4

Construction of customized sub-databases from the NCBI-nr database for rapid annotation of huge metagenomic datasets using a combined BLAST and MEGAN approach

PMID: 23573212
DOI: 10.1371/journal.pone.0059831

Publication type:

Journal Article
Research Support, Non-U.S. Gov't


Abstract

We developed a fast method to construct local sub-databases from the NCBI-nr database for quick similarity search and annotation of huge metagenomic datasets based on the BLAST-MEGAN approach. A three-step sub-database annotation pipeline (SAP) was further proposed to conduct the annotation far more time-efficiently, requiring much less computational capacity than the direct NCBI-nr database BLAST-MEGAN approach. In the 1st BLAST of SAP, the original metagenomic dataset was searched against the constructed sub-database to quickly screen for candidate target sequences. Then, the candidate target sequences identified in the 1st BLAST were subjected to a 2nd BLAST against the whole NCBI-nr database. Finally, the BLAST results were annotated using MEGAN, which filtered out the sequences mistakenly selected in the 1st BLAST and thereby guaranteed the accuracy of the results. In the tests conducted in this study, SAP achieved a speedup of ~150-385 times at a BLAST e-value of 1e-5 compared with direct BLAST against the NCBI-nr database. The annotation results of SAP agree exactly with those of the direct NCBI-nr database BLAST-MEGAN approach, which is very time-consuming and computationally intensive. Selecting rigorous thresholds (e.g. an e-value of 1e-10) would further accelerate the SAP process. The SAP pipeline may also be coupled with novel similarity search tools other than BLAST (e.g. RAPsearch) to achieve even faster annotation of huge metagenomic datasets. Overall, this sub-database construction method and the SAP pipeline provide a new time-efficient and convenient similarity search and annotation strategy for laboratories without access to high-performance computing facilities. SAP also allows high-performance computing facilities to process more similarity search tasks.
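The hand-off between the 1st and 2nd BLAST of SAP can be illustrated with a short Python sketch. It is not the authors' implementation: it assumes the 1st BLAST was run with BLAST+ tabular output (`-outfmt 6`, where column 1 is the query ID and column 11 is the e-value), and the helper names `candidate_ids` and `subset_fasta` are hypothetical. It keeps every read with at least one sub-database hit at or below the e-value cutoff and extracts only those reads as input for the 2nd BLAST against the full NCBI-nr database; the final MEGAN filtering step is not shown.

```python
# Hypothetical sketch of the SAP hand-off between the 1st and 2nd BLAST.
# Assumes BLAST+ tabular output (-outfmt 6): column 1 = query ID,
# column 11 = e-value. Not the authors' implementation.
from typing import Dict, Iterable, Optional, Set


def candidate_ids(blast_tab_lines: Iterable[str], max_evalue: float = 1e-5) -> Set[str]:
    """Return IDs of query reads with at least one hit at or below max_evalue."""
    ids: Set[str] = set()
    for line in blast_tab_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines (e.g. from -outfmt 7)
        fields = line.split("\t")
        if float(fields[10]) <= max_evalue:  # column 11 holds the e-value
            ids.add(fields[0])
    return ids


def subset_fasta(fasta_lines: Iterable[str], keep: Set[str]) -> Dict[str, str]:
    """Keep only FASTA records whose ID (first word of the header) is in keep."""
    records: Dict[str, str] = {}
    current: Optional[str] = None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            parts = line[1:].split()
            rid = parts[0] if parts else ""
            current = rid if rid in keep else None
            if current is not None:
                records[current] = ""
        elif current is not None:
            records[current] += line  # join wrapped sequence lines
    return records


if __name__ == "__main__":
    first_blast = [
        "read1\tsubA\t98.0\t100\t2\t0\t1\t100\t5\t104\t1e-40\t180",
        "read2\tsubB\t40.0\t30\t18\t0\t1\t30\t1\t30\t0.5\t20",
    ]
    reads = [">read1 sample", "ACGT", "ACGT", ">read2", "TTTT", ">read3", "GGGG"]
    keep = candidate_ids(first_blast)  # only read1 passes the 1e-5 cutoff
    print(subset_fasta(reads, keep))  # {'read1': 'ACGTACGT'}
```

Because only the candidate reads reach the much slower search against the whole NCBI-nr database, a stricter cutoff (e.g. `max_evalue=1e-10`) shrinks that set further, which is consistent with the observation that rigorous thresholds accelerate SAP.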
