<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Chuan-Yih, Yu &#187; OTUs</title>
	<atom:link href="http://www.paulyu.org/tag/otus/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.paulyu.org</link>
	<description>Bioinformatic, Research, Life.... and more</description>
	<lastBuildDate>Wed, 11 Jan 2012 15:51:11 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>ESPRIT: estimating species richness using large collections of 16S rRNA pyrosequences</title>
		<link>http://www.paulyu.org/bioinfo/esprit-estimating-species-richness-using-large-collections-of-16s-rrna-pyrosequences/</link>
		<comments>http://www.paulyu.org/bioinfo/esprit-estimating-species-richness-using-large-collections-of-16s-rrna-pyrosequences/#comments</comments>
		<pubDate>Tue, 16 Feb 2010 17:15:51 +0000</pubDate>
		<dc:creator>paulyu</dc:creator>
				<category><![CDATA[Bioinformatics]]></category>
		<category><![CDATA[Metagenomics]]></category>
		<category><![CDATA[OTUs]]></category>
		<category><![CDATA[Paper]]></category>
		<category><![CDATA[species richness]]></category>

		<guid isPermaLink="false">http://www.paulyu.org/?p=382</guid>
		<description><![CDATA[<p>ESPRIT: estimating species richness using large collections of 16S rRNA pyrosequences</p>
<p>Yijun Sun, et al. Nucleic Acids Research</p>
<p></p>
<p>This paper proposes a new method for classifying operational taxonomic units (OTUs) in samples containing a large number of sequences. The goal of this paper is to develop a rapid, accurate method that can handle large-scale data for metagenomics researchers to [...]]]></description>
			<content:encoded><![CDATA[<p>ESPRIT: estimating species richness using large collections of 16S rRNA pyrosequences</p>
<p>Yijun Sun, et al. Nucleic Acids Research</p>
<p><span id="more-382"></span></p>
<p>This paper proposes a new method for classifying operational taxonomic units (OTUs) in samples containing a large number of sequences. The goal is to develop a rapid, accurate method that can handle large-scale data so that metagenomics researchers can estimate species richness. The authors first compare two alignment approaches, multiple sequence alignment (MSA), which was commonly used in previous studies, and pairwise sequence alignment (PSA), and show that the two give comparable results. They claim that PSA offers better computational performance and more accurate results than MSA. An advantage of PSA is that the problem set can be divided into subsets that are then processed in parallel.</p>
<p>The full ESPRIT strategy is as follows: removing low-quality reads, computing pairwise distances, assigning sequences to OTUs, and statistical inference of species richness. First, the program removes reads that fail any of several quality criteria, such as reads containing ambiguous nucleotides, reads with more than one mismatch at the beginning of the read, and reads of atypical length. This step shrinks the problem set and reduces the computational complexity. The Needleman-Wunsch algorithm is used for the pairwise alignment. Only pairs with a pairwise distance &lt; 0.1 are kept; the rest are discarded to speed up processing and save storage space. A <em>k</em>-mer score is also calculated for each pair of sequences, and it has its own threshold (default 0.5). Hcluster is introduced to assign sequences to OTUs. This new algorithm can process the distance information on-the-fly. It labels each sequence as either active or inactive. A sequence is active when there is not yet enough distance information to cluster it; it is inactive when it has no distance information or has already been clustered.</p>
<p>The clustering algorithm, Hcluster, is a general method that can be used for any kind of clustering problem, not only this one. The authors compare ESPRIT with DOTUR and MOTHUR, the software commonly used in metagenomics projects for several years. The results show that DOTUR and MOTHUR overestimate species richness. Next-generation sequencing technology can produce huge numbers of sequences at a lower price than previous methods, and ESPRIT gives us a better way to study microorganisms.</p>
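<p>To make the <em>k</em>-mer score more concrete, here is a minimal sketch in Python. It assumes one common definition of <em>k</em>-mer distance (one minus the fraction of shared <em>k</em>-mers, relative to the shorter sequence); the paper's exact formula may differ. The function names, the default <code>k=6</code>, and the toy reads are illustrative assumptions, not ESPRIT's actual code.</p>

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a, b, k=6):
    """One minus the fraction of k-mers shared by a and b.

    Assumed definition, normalised by the number of k-mers in the
    shorter sequence. Both sequences must be at least k long.
    """
    counts_a = kmer_counts(a, k)
    counts_b = kmer_counts(b, k)
    # A Counter returns 0 for k-mers it has never seen.
    shared = sum(min(n, counts_b[kmer]) for kmer, n in counts_a.items())
    return 1.0 - shared / (min(len(a), len(b)) - k + 1)


# Toy reads, invented for illustration only.
reads = ["ACGTAC", "ACGTTT", "GGGGGG"]
d_close = kmer_distance(reads[0], reads[1], k=3)  # two of four 3-mers shared
d_far = kmer_distance(reads[0], reads[2], k=3)    # no 3-mers shared
print(d_close, d_far)
```

<p>Under this assumed definition, a pair whose <em>k</em>-mer distance is above the threshold (0.5 by default, as noted above) would be skipped, so the more expensive Needleman-Wunsch alignment only runs on pairs that are likely to be similar.</p>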
<p>I think the major problems in metagenomics are how to process huge amounts of data efficiently and how to do data mining. This method gave me a hint that we don’t need to improve every step; sometimes replacing a step entirely gives a surprisingly good result.</p>
<p><a href="http://nar.oxfordjournals.org/cgi/content/full/gkp285v1" target="_blank">Paper Link</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.paulyu.org/bioinfo/esprit-estimating-species-richness-using-large-collections-of-16s-rrna-pyrosequences/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Introducing DOTUR, a Computer Program for Defining Operational Taxonomic Units and Estimating Species Richness</title>
		<link>http://www.paulyu.org/bioinfo/introducing-dotur-a-computer-program-for-defining-operational-taxonomic-units-and-estimating-species-richness/</link>
		<comments>http://www.paulyu.org/bioinfo/introducing-dotur-a-computer-program-for-defining-operational-taxonomic-units-and-estimating-species-richness/#comments</comments>
		<pubDate>Tue, 16 Feb 2010 17:14:14 +0000</pubDate>
		<dc:creator>paulyu</dc:creator>
				<category><![CDATA[Bioinformatics]]></category>
		<category><![CDATA[Metagenomics]]></category>
		<category><![CDATA[OTUs]]></category>
		<category><![CDATA[Paper]]></category>
		<category><![CDATA[species richness]]></category>

		<guid isPermaLink="false">http://www.paulyu.org/?p=380</guid>
		<description><![CDATA[<p>Introducing DOTUR, a Computer Program for Defining Operational Taxonomic Units and Estimating Species Richness</p>
<p>Patrick D. Schloss and Jo Handelsman, APPLIED AND ENVIRONMENTAL MICROBIOLOGY</p>
<p></p>
<p>Since we started to study metagenomics, estimating species richness from complex samples has become a major issue. Normally, 16S rRNA gene sequence identity is the target when we want to estimate the number [...]]]></description>
			<content:encoded><![CDATA[<p>Introducing DOTUR, a Computer Program for Defining Operational Taxonomic Units and Estimating Species Richness</p>
<p>Patrick D. Schloss and Jo Handelsman, <em>APPLIED AND ENVIRONMENTAL MICROBIOLOGY</em></p>
<p><em><span id="more-380"></span></em></p>
<p>Since we started to study metagenomics, estimating species richness from complex samples has become a major issue. Normally, 16S rRNA gene sequence identity is the target when we want to estimate the number of species within a sample. Most of the 16S rRNA gene sequence is highly conserved between species, but it also has several variable regions that help us distinguish different species. The goal of this paper is to help us estimate the number of species within a sample.</p>
<p>The first step is multiple sequence alignment. The authors follow the sequence identity thresholds normally used in research, although these criteria are still controversial: sequences with &gt;97% identity are the same species, those between 95% and 97% are the same genus, and those between 80% and 95% are the same phylum. DOTUR (Distance-Based OTU and Richness) assigns each sequence to an operational taxonomic unit (OTU) based on sequence distance. DOTUR then calculates the Shannon-Weaver and Simpson diversity indices and estimates species richness within the sample. To validate their results, the authors compare DOTUR with another program, EstimateS, using a clone library. The results are similar to each other. DOTUR has several advantages: it runs faster and provides sequence alignment results, which EstimateS lacks. DOTUR can also write results for different distance levels to separate files.</p>
<p>They apply DOTUR to an Amazonian soil sample that had 98 bacterial 16S rRNA sequences in a previous study. DOTUR reports 94 singletons and 2 doubletons, for a total of 96 OTUs. Then they apply DOTUR to the Sargasso Sea metagenome sequences. They reduce the sample to two fragment sets: 690 partial 16S rRNA gene fragments and 507 partial rpoB fragments. At a 6% sequence difference, DOTUR reports 114 for the 16S rRNA set and 304 for the rpoB set. DOTUR is automatic, rapid and accurate in estimating the species richness within a sample. It can use different alignment algorithms and distances simultaneously to estimate the number of species more accurately.</p>
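<p>As a worked example of the two diversity indices, the sketch below uses the common textbook forms, which may differ in detail from what DOTUR implements (for instance in bias corrections). The function names are my own. The input is the Amazonian soil result quoted above: 94 singletons and 2 doubletons.</p>

```python
import math


def shannon_index(otu_sizes):
    """Shannon-Weaver index H = -sum(p_i * ln p_i), p_i = n_i / N."""
    total = sum(otu_sizes)
    return -sum((n / total) * math.log(n / total) for n in otu_sizes if n)


def simpson_index(otu_sizes):
    """Simpson index D = sum(n_i * (n_i - 1)) / (N * (N - 1)).

    Assumes the sample holds at least two sequences.
    """
    total = sum(otu_sizes)
    return sum(n * (n - 1) for n in otu_sizes) / (total * (total - 1))


# 94 singletons and 2 doubletons: 96 OTUs covering 98 sequences.
otu_sizes = [1] * 94 + [2] * 2
shannon = shannon_index(otu_sizes)
simpson = simpson_index(otu_sizes)
print(len(otu_sizes), sum(otu_sizes), shannon, simpson)
```

<p>Because almost every OTU here is a singleton, the Shannon value comes out close to its maximum, ln(96), and the Simpson value is close to zero; both say the sample is very diverse relative to how deeply it was sequenced.</p>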
<p>Although DOTUR has some advantages, it can’t handle large-scale samples. It has therefore been replaced by MOTHUR. Unfortunately, MOTHUR has the same issue. The performance</p>
<p><a href="http://aem.asm.org/cgi/content/abstract/71/3/1501" target="_blank">Paper Link</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.paulyu.org/bioinfo/introducing-dotur-a-computer-program-for-defining-operational-taxonomic-units-and-estimating-species-richness/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

