Bioinformatics (Oxford, England), 2020 Feb 15, Vol. 36, Issue 4

BioBERT: a pre-trained biomedical language representation model for biomedical text mining

PMID: 31501885; DOI: 10.1093/bioinformatics/btz682

Publication type:

Journal Article
Research Support, Non-U.S. Gov't

Abstract

MOTIVATION: Biomedical text mining is becoming increasingly important as the number of biomedical documents rapidly grows. With the progress in natural language processing (NLP), extracting valuable information from biomedical literature has gained popularity among researchers, and deep learning has boosted the development of effective biomedical text mining models. However, directly applying the advancements in NLP to biomedical text mining often yields unsatisfactory results due to a word distribution shift from general domain corpora to biomedical corpora. In this article, we investigate how the recently introduced pre-trained language model BERT can be adapted for biomedical corpora.

RESULTS: We introduce BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining), which is a domain-specific language representation model pre-trained on large-scale biomedical corpora. With almost the same architecture across tasks, BioBERT largely outperforms BERT and previous state-of-the-art models in a variety of biomedical text mining tasks when pre-trained on biomedical corpora. While BERT obtains performance comparable to that of previous state-of-the-art models, BioBERT significantly outperforms them on the following three representative biomedical text mining tasks: biomedical named entity recognition (0.62% F1 score improvement), biomedical relation extraction (2.80% F1 score improvement) and biomedical question answering (12.24% MRR improvement). Our analysis results show that pre-training BERT on biomedical corpora helps it to understand complex biomedical texts.

AVAILABILITY AND IMPLEMENTATION: We make the pre-trained weights of BioBERT freely available at https://github.com/naver/biobert-pretrained, and the source code for fine-tuning BioBERT available at https://github.com/dmis-lab/biobert.
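For readers unfamiliar with the two metrics quoted in the results, the following is a minimal Python sketch of how an F1 score and a mean reciprocal rank (MRR) are commonly computed. It is not taken from the BioBERT code base: the function names, the exact-match span convention, and the toy inputs are all assumptions made for illustration, and the paper's own evaluation scripts may differ in detail.

```python
from typing import List, Sequence, Set, Tuple

# Hypothetical span representation: (start_token, end_token, entity_type).
Span = Tuple[int, int, str]


def entity_f1(gold: Set[Span], predicted: Set[Span]) -> float:
    """F1 over exact span matches: harmonic mean of precision and recall."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def mean_reciprocal_rank(
    ranked_answers: Sequence[List[str]],
    gold_answers: Sequence[Set[str]],
) -> float:
    """Average of 1/rank of the first correct answer per question (0 if none)."""
    if not ranked_answers:
        return 0.0
    total = 0.0
    for candidates, gold in zip(ranked_answers, gold_answers):
        for rank, candidate in enumerate(candidates, start=1):
            if candidate in gold:
                total += 1.0 / rank
                break  # only the first correct answer counts
    return total / len(ranked_answers)


# Toy data, invented purely for illustration.
gold_spans = {(0, 2, "Disease"), (5, 6, "Gene")}
predicted_spans = {(0, 2, "Disease"), (8, 9, "Gene")}
f1 = entity_f1(gold_spans, predicted_spans)

ranked = [["a", "b"], ["x", "y", "z"], ["q"]]
gold = [{"b"}, {"x"}, {"none"}]
mrr = mean_reciprocal_rank(ranked, gold)
```

In this toy case one of two predicted spans matches one of two gold spans, so precision and recall are both 0.5; the three questions contribute reciprocal ranks of 1/2, 1 and 0.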
