PLoS ONE, 2020, Vol. 15, Issue 8

Evolution and impact of bias in human and machine learning algorithm interaction

PMID: 32790666 | DOI: 10.1371/journal.pone.0235502

Publication type:

Journal Article
Research Support, U.S. Gov't, Non-P.H.S.


Abstract

Traditionally, machine learning algorithms relied on reliable labels from experts to build predictions. More recently, however, algorithms have been receiving data from the general population in the form of labels, annotations, etc. As a result, algorithms are subject to bias born from ingesting unchecked information, such as biased samples and biased labels. Furthermore, people and algorithms are increasingly engaged in interactive processes in which neither the human nor the algorithm receives unbiased data. Algorithms can also make biased predictions, leading to what is now known as algorithmic bias. In turn, humans' reactions to the output of machine learning methods with algorithmic bias worsen the situation: people make decisions based on biased information, and those decisions will probably be consumed by algorithms later. Some recent research has focused on the ethical and moral implications of machine learning algorithmic bias for society. However, most research has so far treated algorithmic bias as a static factor, which fails to capture the dynamic and iterative properties of bias. We argue that algorithmic bias interacts with humans in an iterative manner that has a long-term effect on algorithms' performance. For this purpose, we present an iterated-learning framework, inspired by human language evolution, to study the interaction between machine learning algorithms and humans. Our goal is to study two interacting sources of bias: the process by which people select information to label (human action), and the process by which an algorithm selects the subset of information to present to people (iterated algorithmic bias mode). We investigate three forms of iterated algorithmic bias (personalization filter, active learning, and random) and how they affect the performance of machine learning algorithms by formulating research questions about the impact of each type of bias.
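The three iterated bias modes differ in how the model's current predictions decide which items are shown to people next, and therefore which items can receive new labels. The Python sketch below illustrates only that selection step, under stated assumptions: `select_items` and its mode strings are hypothetical, and using uncertainty sampling (items whose predicted probability is nearest 0.5) as the active-learning rule is an assumed, commonly used choice, not the authors' implementation.

```python
import random
from typing import Dict, List, Optional


def select_items(
    probs: Dict[str, float],
    k: int,
    mode: str,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Choose up to k item ids to present to the user in the next round.

    probs maps item id -> predicted relevance probability from the current model.
    Hypothetical helper: mode names mirror the three bias modes in the abstract,
    but the selection rules are assumed textbook choices.
    """
    if k <= 0:
        return []
    items = list(probs)
    if mode == "filter":
        # Personalization filter: show the items the model rates most relevant.
        return sorted(items, key=lambda i: probs[i], reverse=True)[:k]
    if mode == "active":
        # Active learning (assumed: uncertainty sampling): items closest to p = 0.5.
        return sorted(items, key=lambda i: abs(probs[i] - 0.5))[:k]
    if mode == "random":
        # Random: uniform sample without replacement, independent of the model.
        rng = rng or random.Random()
        return rng.sample(items, min(k, len(items)))
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, with `probs = {"a": 0.9, "b": 0.52, "c": 0.1, "d": 0.4, "e": 0.7}` and `k=2`, the `"filter"` mode returns `["a", "e"]` and the `"active"` mode returns `["b", "d"]`. In a filter loop, items the model scores low are never presented, so they never collect the labels that could correct their scores; this is one way the iterative effect described above can arise.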
Based on statistical analyses of the results of several controlled experiments, we found that the three different iterated bias modes, as well as initial training data class imbalance and human action, do affect the models learned by machine learning algorithms. We also found that iterated filter bias, which is prominent in personalized user interfaces, can lead to more inequality in estimated relevance and to a limited human ability to discover relevant data. Our findings indicate that, when using a content-based filter that predicts relevant items, the relevance blind spot (relevant items from the testing set whose predicted relevance probability is less than 0.5 and which thus risk being hidden from humans) amounted to 4% of all relevant items. A similar simulation using a real-life rating data set found that the same filter resulted in a blind spot size of 75% of the relevant testing set.
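Read literally, the blind-spot size is the share of relevant test items whose predicted relevance falls below the 0.5 threshold, i.e. $|\{i \in R : \hat{p}_i < 0.5\}| / |R|$, where $R$ is the set of relevant test items and $\hat{p}_i$ the predicted relevance probability. The following minimal Python sketch computes that ratio; the function name `blind_spot_fraction` and its inputs are hypothetical, and it assumes binary ground-truth relevance labels aligned with the predictions.

```python
from typing import Sequence


def blind_spot_fraction(
    relevant: Sequence[bool],
    probs: Sequence[float],
    threshold: float = 0.5,
) -> float:
    """Fraction of truly relevant test items whose predicted relevance < threshold.

    relevant[i] is the (assumed binary) ground-truth label of test item i,
    probs[i] is the model's predicted relevance probability for the same item.
    """
    if len(relevant) != len(probs):
        raise ValueError("relevant and probs must have the same length")
    # Keep predictions only for items that are actually relevant.
    relevant_probs = [p for r, p in zip(relevant, probs) if r]
    if not relevant_probs:
        return 0.0  # no relevant items, so nothing can be hidden
    hidden = sum(1 for p in relevant_probs if p < threshold)
    return hidden / len(relevant_probs)
```

For instance, `blind_spot_fraction([True, True, False, True, True], [0.9, 0.3, 0.2, 0.6, 0.45])` returns `0.5`: of the four relevant items, two (0.3 and 0.45) fall below 0.5. On this reading, the reported 4% and 75% correspond to values of 0.04 and 0.75.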
