Bioinformatics Advances, 2024-01-01, Vol. 4, Issue 1

CoNECo: A Corpus for Named Entity Recognition and Normalization of Protein Complexes

Publication type: Journal Article

Abstract

MOTIVATION: Despite significant progress in biomedical information extraction, there is a lack of resources for Named Entity Recognition (NER) and Named Entity Normalization (NEN) of protein-containing complexes. Current resources inadequately address the recognition of protein-containing complex names across different organisms, underscoring the need for a dedicated corpus.

RESULTS: We introduce the Complex Named Entity Corpus (CoNECo), an annotated corpus for NER and NEN of complexes. CoNECo comprises 1621 documents with 2052 entities, 1976 of which are normalized to Gene Ontology. We divided the corpus into training, development, and test sets and trained both a transformer-based and a dictionary-based tagger on them. Evaluation on the test set demonstrated robust performance, with F-scores of 73.7% and 61.2%, respectively. Subsequently, we applied the best taggers for comprehensive tagging of the entire openly accessible biomedical literature.

AVAILABILITY AND IMPLEMENTATION: All resources, including the annotated corpus, training data, and code, are available to the community through Zenodo https://zenodo.org/records/11263147 and GitHub https://zenodo.org/records/10693653.
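
To make the reported F-scores concrete, the following is a minimal sketch of how an entity-level F-score for NER is commonly computed. It assumes exact span matching, which the abstract does not specify, and the `Span` layout, the `f_score` function, and the toy annotations are all hypothetical illustrations rather than CoNECo's actual evaluation code or data.

```python
from typing import Set, Tuple

# Hypothetical span layout: (document id, start offset, end offset, entity type).
Span = Tuple[str, int, int, str]


def f_score(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact span matching (an assumption)."""
    # A prediction counts as correct only if it matches a gold span exactly.
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy annotations, invented for illustration only.
gold = {
    ("doc1", 0, 10, "Complex"),
    ("doc1", 25, 40, "Complex"),
    ("doc2", 5, 18, "Complex"),
}
predicted = {
    ("doc1", 0, 10, "Complex"),
    ("doc2", 5, 18, "Complex"),
    ("doc2", 30, 44, "Complex"),  # spurious prediction
}

precision, recall, f1 = f_score(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Because the F-score is a harmonic mean, a tagger needs both reasonable precision and reasonable recall to score well; a relaxed, overlap-based matching criterion would generally yield higher numbers than the exact matching assumed here.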
