BMC Genomics, 2022 Jul 08, Vol. 23, Issue 1

Assessment of label-free quantification and missing value imputation for proteomics in non-human primates

PMID: 35804317; DOI: 10.1186/s12864-022-08723-1

Publication type: Journal Article

Abstract

BACKGROUND: Reliable and effective label-free quantification (LFQ) analyses depend not only on the method of data acquisition in the mass spectrometer, but also on downstream data processing, including software tools, query database, data normalization and imputation. In non-human primates (NHP), LFQ is challenging because the query databases for NHP are limited, since the genomes of these species are not comprehensively annotated. This invariably results in limited discovery of proteins and associated post-translational modifications (PTMs) and a higher fraction of missing data points. Identifying fewer proteins and PTMs because of database limitations can hinder the discovery of important and meaningful biological information. Missing data, in turn, limits downstream analyses (e.g., multivariate analyses), decreases statistical power, biases statistical inference, and makes biological interpretation of the data more challenging. In this study we attempted to address both issues: first, we used the MetaMorpheus proteomics search engine to counter the limits of NHP query databases and maximize the discovery of proteins and associated PTMs, and second, we evaluated different imputation methods for accurate data inference. We used a generic approach for missing data imputation analysis without distinguishing the potential source of missing data (either non-assigned m/z or missing values across runs).

RESULTS: Using the MetaMorpheus proteomics search engine, we obtained quantitative data for 1622 proteins and 10,634 peptides, including 58 different PTMs (biological, metal and artifacts), across a diverse age range of NHP brain frontal cortex. However, among the 1622 proteins identified, only 293 were quantified across all samples with no missing values, emphasizing the importance of implementing an accurate and statistically valid imputation method to fill in missing data. In our imputation analysis we demonstrate that single-imputation methods that borrow information from correlated proteins, such as Generalized Ridge Regression (GRR), Random Forest (RF), local least squares (LLS), and Bayesian Principal Component Analysis (BPCA), are able to estimate missing protein abundance values with great accuracy.

CONCLUSIONS: Overall, this study offers a detailed comparative analysis of LFQ data generated in NHP and proposes strategies for improved LFQ in NHP proteomics data.
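
The idea of imputing a missing abundance by borrowing information from correlated proteins can be illustrated with a small simulation. The sketch below is not the authors' pipeline and does not implement GRR, RF, LLS or BPCA as used in the study: it uses plain ridge regression written with NumPy, on simulated data, as a stand-in. It hides about 10% of the values in a complete samples-by-proteins matrix, imputes them with column means and with ridge regression on the other proteins, and compares the two by normalized root mean squared error (NRMSE). All function names, the simulated data, the masking fraction and the penalty `alpha` are assumptions made for illustration.

```python
import numpy as np


def simulate_abundances(n_samples=40, n_proteins=30, n_factors=3, noise=0.2, seed=0):
    """Simulated log-abundances: a few latent factors make proteins correlated."""
    rng = np.random.default_rng(seed)
    scores = rng.normal(size=(n_samples, n_factors))
    loadings = rng.normal(size=(n_factors, n_proteins))
    return 20.0 + scores @ loadings + noise * rng.normal(size=(n_samples, n_proteins))


def mask_at_random(x, fraction=0.1, seed=1):
    """Hide a random fraction of entries (missing completely at random)."""
    rng = np.random.default_rng(seed)
    mask = rng.random(x.shape) < fraction
    x_missing = x.copy()
    x_missing[mask] = np.nan
    return x_missing, mask


def impute_mean(x_missing):
    """Baseline: replace each missing value with its protein (column) mean."""
    filled = x_missing.copy()
    col_means = np.nanmean(x_missing, axis=0)
    rows, cols = np.where(np.isnan(x_missing))
    filled[rows, cols] = col_means[cols]
    return filled


def impute_ridge(x_missing, alpha=1.0, n_iter=5):
    """Iteratively predict each protein's missing values from all other proteins."""
    missing = np.isnan(x_missing)
    filled = impute_mean(x_missing)  # starting point for the iterations
    for _ in range(n_iter):
        for j in range(x_missing.shape[1]):
            miss_j = missing[:, j]
            if not miss_j.any():
                continue
            others = np.delete(filled, j, axis=1)
            x_obs, y_obs = others[~miss_j], filled[~miss_j, j]
            x_mean, y_mean = x_obs.mean(axis=0), y_obs.mean()
            xc = x_obs - x_mean
            # Ridge solution on centered data: (X'X + alpha*I)^-1 X'y
            coef = np.linalg.solve(
                xc.T @ xc + alpha * np.eye(xc.shape[1]), xc.T @ (y_obs - y_mean)
            )
            filled[miss_j, j] = y_mean + (others[miss_j] - x_mean) @ coef
    return filled


def nrmse(truth, imputed, mask):
    """RMSE over the hidden entries, scaled by the spread of their true values."""
    err = truth[mask] - imputed[mask]
    return float(np.sqrt(np.mean(err ** 2)) / np.std(truth[mask]))


truth = simulate_abundances()
x_missing, mask = mask_at_random(truth)
mean_filled = impute_mean(x_missing)
ridge_filled = impute_ridge(x_missing)
print("NRMSE, column mean:", round(nrmse(truth, mean_filled, mask), 3))
print("NRMSE, ridge regression:", round(nrmse(truth, ridge_filled, mask), 3))
```

Because the true values of the hidden entries are known, the error of each method can be measured directly. On real LFQ data, the same masking approach would have to start from the subset of proteins quantified in every sample.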
