Bioinformatics (Oxford, England), 2006 Jul 1, Vol. 22, Issue 13

Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences

PMID: 16731699; DOI: 10.1093/bioinformatics/btl158

Publication type:

Journal Article
Research Support, Non-U.S. Gov't

Abstract

In 2001 and 2002, we published two papers (Bioinformatics, 17, 282-283; Bioinformatics, 18, 77-82) describing an ultrafast protein sequence clustering program called cd-hit. This program can efficiently cluster a huge protein database with millions of sequences. However, the applications of the underlying algorithm are not limited to protein sequence clustering. Here we present several new programs that use the same algorithm: cd-hit-2d, cd-hit-est and cd-hit-est-2d. Cd-hit-2d compares two protein datasets and reports similar matches between them; cd-hit-est clusters a DNA/RNA sequence database; and cd-hit-est-2d compares two nucleotide datasets. All of these programs can handle huge datasets with millions of sequences and can be hundreds of times faster than methods based on popular sequence comparison and database search tools such as BLAST.
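
The abstract does not spell out the underlying algorithm. As a rough illustration only, the sketch below assumes a greedy incremental scheme: sequences are processed from longest to shortest, and each one either joins the first sufficiently similar representative or starts a new cluster. The function names (`kmer_set`, `greedy_cluster`, `compare_two_sets`), the shared-word similarity score, and the default parameters are hypothetical choices for this example and are not taken from the cd-hit source code.

```python
from typing import Dict, List, Set


def kmer_set(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k words in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(words: Set[str], ref_words: Set[str]) -> float:
    """Fraction of `words` also found in `ref_words`.

    Assumption: a crude stand-in for alignment-based sequence identity.
    """
    return len(words & ref_words) / len(words) if words else 0.0


def greedy_cluster(seqs: Dict[str, str], threshold: float = 0.9,
                   k: int = 3) -> Dict[str, List[str]]:
    """Greedy incremental clustering, longest sequences first."""
    clusters: Dict[str, List[str]] = {}   # representative id -> member ids
    rep_words: Dict[str, Set[str]] = {}   # representative id -> its k-mers
    # Sort by decreasing length; ties are broken by id for determinism.
    for sid in sorted(seqs, key=lambda s: (-len(seqs[s]), s)):
        words = kmer_set(seqs[sid], k)
        for rep, rwords in rep_words.items():
            if similarity(words, rwords) >= threshold:
                clusters[rep].append(sid)   # join the first matching cluster
                break
        else:
            clusters[sid] = [sid]           # no match: new representative
            rep_words[sid] = words
    return clusters


def compare_two_sets(db1: Dict[str, str], db2: Dict[str, str],
                     threshold: float = 0.9, k: int = 3) -> Dict[str, str]:
    """For each db2 sequence, report the first similar db1 sequence, if any."""
    db1_words = {sid: kmer_set(seq, k) for sid, seq in db1.items()}
    matches: Dict[str, str] = {}
    for sid, seq in db2.items():
        words = kmer_set(seq, k)
        for ref, rwords in db1_words.items():
            if similarity(words, rwords) >= threshold:
                matches[sid] = ref
                break
    return matches
```

In this sketch the same comparison routine serves both tasks: clustering one dataset against its own growing list of representatives, and matching a second dataset against a first. The real programs avoid most full comparisons with a much more careful filter, which this toy score does not reproduce.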
