Briefings in Bioinformatics, 2018 Sep 28, Vol. 19, Issue 5

Evaluating de novo sequencing in proteomics: already an accurate alternative to database-driven peptide identification?

Publication type:

Evaluation Study
Journal Article
Research Support, Non-U.S. Gov't


Abstract

While peptide identifications in mass spectrometry (MS)-based shotgun proteomics are mostly obtained using database search methods, high-resolution spectrum data from modern MS instruments now offer the prospect of improving the performance of computational de novo peptide sequencing. The major benefit of de novo sequencing is that it does not require a reference database to deduce full-length or partial tag-based peptide sequences directly from experimental tandem mass spectrometry spectra. Although various algorithms have been developed for automated de novo sequencing, the prediction accuracy of the proposed solutions has rarely been evaluated in independent benchmarking studies. The main objective of this work is to provide a detailed evaluation of the performance of de novo sequencing algorithms on high-resolution data. For this purpose, we processed four experimental data sets, acquired on different instrument types in collision-induced dissociation and higher energy collisional dissociation (HCD) fragmentation modes, using the software packages Novor, PEAKS and PepNovo. Moreover, the accuracy of these algorithms is also tested on ground truth data based on simulated spectra generated by peak intensity prediction software. We found that Novor shows the overall best performance compared with PEAKS and PepNovo with respect to the accuracy of correct full peptide, tag-based and single-residue predictions. In addition, Novor outpaced the commercial competitor PEAKS in running time, with speedups by factors of around 12-17. Taken as a whole, the evaluated algorithms perform only moderately on experimental data, with around 35% prediction accuracy for complete peptide sequences on HCD data sets, but show significantly better performance on simulated data (up to 84% accuracy). Further, we describe the most frequently occurring de novo sequencing errors and evaluate the influence of missing fragment ion peaks and spectral noise on the accuracy. Finally, we discuss the potential of de novo sequencing to become more widely used in the field.
