
Pacific Symposium on Biocomputing. 2005 Jan 1.

GOtrees: predicting GO associations from protein domain composition using decision trees

PMID: 15759620

Publication type: Journal Article



Abstract


The Gene Ontology (GO) offers a comprehensive and standardized way to describe a protein's biological role. Proteins are annotated with GO terms based on direct or indirect experimental evidence. Term assignments are also inferred from homology and literature mining. Regardless of the type of evidence used, GO assignments are either manually curated or electronic. Unfortunately, manual curation cannot keep pace with the data available from publications and various large experimental datasets. Automated literature-based annotation methods have been developed to speed up annotation. However, they apply only to proteins that have been experimentally investigated or that have close homologs with sufficient and consistent annotation. One homology-based electronic method for GO annotation is provided by the InterPro database. InterPro2GO/PFAM2GO associates individual protein domains with GO terms and thus can be used to annotate less-studied proteins. However, classifying proteins via a single functional domain demands stringency to avoid a large number of false positives. This work broadens the basic approach. We model proteins via their entire functional domain content and train an individual decision tree classifier for each GO term using known protein assignments. We demonstrate that our approach is sensitive, specific and precise, as well as fairly robust to sparse data. We found that our method is more sensitive than InterPro2GO and suffers only some decrease in precision. Compared with InterPro2GO, we improved sensitivity by 22%, 27% and 50% for Molecular Function, Biological Process and Cellular Component GO terms, respectively.
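The central idea of the abstract, representing each protein by its full set of domains and training one binary decision tree per GO term, can be illustrated with a small sketch. The abstract does not state which tree-induction algorithm or settings the authors used, so this sketch assumes a textbook ID3-style tree with entropy-based splits and a hypothetical `max_depth` cap. The domain IDs (`D1`, `D2`, ...) and the GO term `GO:hypothetical` are made-up placeholders, not real InterPro or GO identifiers. Sensitivity is computed as TP/(TP+FN) and precision as TP/(TP+FP).

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of boolean labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def build_tree(rows, labels, features, depth=0, max_depth=3):
    """ID3-style tree over binary 'domain present?' features (an assumed stand-in).

    A node is either a leaf (bool) or a tuple
    (domain, subtree_if_absent, subtree_if_present).
    """
    if not labels:
        return False
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features or depth >= max_depth:
        return majority
    base = entropy(labels)
    best, best_gain = None, 1e-12  # ignore splits with no real information gain
    for f in features:
        present = [lab for row, lab in zip(rows, labels) if f in row]
        absent = [lab for row, lab in zip(rows, labels) if f not in row]
        remainder = (len(present) * entropy(present)
                     + len(absent) * entropy(absent)) / len(labels)
        if base - remainder > best_gain:
            best, best_gain = f, base - remainder
    if best is None:
        return majority
    rest = [f for f in features if f != best]
    subtrees = []
    for flag in (False, True):  # False = domain absent, True = domain present
        side = [(r, l) for r, l in zip(rows, labels) if (best in r) == flag]
        subtrees.append(build_tree([r for r, _ in side], [l for _, l in side],
                                   rest, depth + 1, max_depth))
    return (best, subtrees[0], subtrees[1])


def predict(tree, domains):
    """Walk the tree using the protein's set of domains."""
    while isinstance(tree, tuple):
        feature, absent, present = tree
        tree = present if feature in domains else absent
    return tree


def train_per_term(proteins, annotations, max_depth=3):
    """Train one independent tree per GO term on whole-domain-content features."""
    features = sorted(set().union(*proteins.values()))
    ids = sorted(proteins)
    rows = [proteins[p] for p in ids]
    return {term: build_tree(rows, [p in members for p in ids], features,
                             max_depth=max_depth)
            for term, members in annotations.items()}


def sensitivity_precision(predicted, actual):
    """Sensitivity = TP/(TP+FN); precision = TP/(TP+FP)."""
    tp = len(predicted & actual)
    sens = tp / len(actual) if actual else 0.0
    prec = tp / len(predicted) if predicted else 0.0
    return sens, prec


# Hypothetical toy data: GO:hypothetical is assigned only when D1 and D2 co-occur.
proteins = {"p1": {"D1", "D2"}, "p2": {"D1", "D2", "D3"}, "p3": {"D1"},
            "p4": {"D2"}, "p5": {"D3"}, "p6": {"D1", "D3"}}
annotations = {"GO:hypothetical": {"p1", "p2"}}
trees = train_per_term(proteins, annotations)
print(trees["GO:hypothetical"])  # ('D2', False, ('D1', False, True))

test = {"q1": {"D1", "D2", "D5"}, "q2": {"D1"}, "q3": {"D2", "D3"}}
predicted = {p for p, doms in test.items() if predict(trees["GO:hypothetical"], doms)}
print(sensitivity_precision(predicted, {"q1", "q3"}))  # (0.5, 1.0)
```

In this toy example the tree learns the rule "D2 and D1 together", which a mapping from one individual domain to one GO term cannot express on its own. This is only an illustration of whole-domain-content modeling; it does not reproduce the paper's features, training data or reported results.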
