Genes, 12 Nov 2019, Vol. 10, Issue 11

Global Vectors Representation of Protein Sequences and Its Application for Predicting Self-Interacting Proteins with Multi-Grained Cascade Forest Model

Publication type:

Journal Article
Research Support, Non-U.S. Gov't


Abstract

Self-interacting proteins (SIPs) are of paramount importance in current molecular biology. A number of traditional biological experimental methods for predicting SIPs have been developed in the past few years. However, these methods are costly, time-consuming, and inefficient, which often limits their use for predicting SIPs. The development of computational methods is therefore timely. In this paper, we propose, for the first time, a novel deep learning model that incorporates natural language processing (NLP) methods to predict potential SIPs from protein sequence information. More specifically, each protein sequence is assembled de novo from k-mers. We then obtain a global vectors representation of each protein sequence using an NLP technique. Finally, based on known self-interacting and non-interacting proteins, a multi-grained cascade forest model is trained to predict SIPs. Comprehensive experiments on yeast and human datasets yielded accuracies of 91.45% and 93.12%, respectively. Our evaluation shows that amino acid semantic information is very helpful for addressing the problem of sequences containing both self-interacting and non-interacting pairs of proteins. This work has potential applications to various biological classification problems.
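
As an illustration of the first two steps named in the abstract, the following minimal Python sketch splits a protein sequence into k-mers and counts the windowed co-occurrence statistics on which GloVe-style global vector embeddings are trained. It is not the authors' implementation. It assumes that the "global vectors representation" refers to a GloVe-style embedding, and the values k = 3 and window = 2, the overlapping stride, the function names, and the toy sequence are all hypothetical choices made for the example.

```python
from collections import Counter


def kmers(sequence, k=3):
    """Split a protein sequence into overlapping k-mers ("words").

    Assumption: overlapping k-mers with stride 1; the abstract does not
    state k or the stride.
    """
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def cooccurrence(tokens, window=2):
    """Count symmetric co-occurrences of tokens within `window` positions.

    Each pair contributes 1/distance, the weighting used by GloVe when it
    builds its co-occurrence matrix.
    """
    counts = Counter()
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), i):
            weight = 1.0 / (i - j)
            counts[(center, tokens[j])] += weight
            counts[(tokens[j], center)] += weight
    return counts


# Hypothetical toy sequence, not taken from the paper's datasets.
tokens = kmers("MKTAYIAK", k=3)
counts = cooccurrence(tokens, window=2)
```

In a full pipeline, k-mer vectors fitted to such counts would be pooled into one feature vector per protein and passed to the classifier. The abstract does not specify how that pooling is done, so it is left out here.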

