<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Chuan-Yih, Yu &#187; Metagenomics</title>
	<atom:link href="http://www.paulyu.org/tag/metagenomics/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.paulyu.org</link>
	<description>Bioinformatic, Research, Life.... and more</description>
	<lastBuildDate>Wed, 11 Jan 2012 15:51:11 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Metatranscriptomics reveals unique microbial small RNAs in the ocean’s water column</title>
		<link>http://www.paulyu.org/bioinfo/metatranscriptomics-reveals-unique-microbial-small-rnas-in-the-ocean%e2%80%99s-water-column/</link>
		<comments>http://www.paulyu.org/bioinfo/metatranscriptomics-reveals-unique-microbial-small-rnas-in-the-ocean%e2%80%99s-water-column/#comments</comments>
		<pubDate>Thu, 29 Apr 2010 15:07:20 +0000</pubDate>
		<dc:creator>paulyu</dc:creator>
				<category><![CDATA[Bioinformatics]]></category>
		<category><![CDATA[Metagenomics]]></category>
		<category><![CDATA[Metatranscriptomics]]></category>
		<category><![CDATA[sRNA]]></category>

		<guid isPermaLink="false">http://www.paulyu.org/?p=495</guid>
		<description><![CDATA[<p>Metatranscriptomics reveals unique microbial small RNAs in the ocean’s water column</p>
<p>Yanmei Shi, et al., Nature </p>
<p>In the papers from past weeks, researchers used sequencing techniques to reveal information about environmental species, such as richness and functionality. The problem is that not all DNA sequences are translated into functional proteins. Some of the proteins might not [...]]]></description>
			<content:encoded><![CDATA[<p>Metatranscriptomics reveals unique microbial small RNAs in the ocean’s water column</p>
<p>Yanmei Shi,<em> et al.</em>, <em>Nature</em> <span id="more-495"></span></p>
<p>In the papers from past weeks, researchers used sequencing techniques to reveal information about environmental species, such as richness and functionality. The problem is that not all DNA sequences are translated into functional proteins. Some proteins might not function under certain conditions, such as a lack of an enzyme or incorrect post-translational modification. We cannot obtain this information directly from the DNA sequence. Therefore, metaproteomics and metatranscriptomics have been proposed to address these problems.</p>
<p>The authors perform RNA sequencing directly on natural microbial assemblages. Previous studies reported that some complementary DNA (cDNA) sequences have no significant homology to anything in the current databases. This suggests that some of the cDNA sequences might encode uncharacterized proteins. The authors used four data sets from the same Hawaii ocean location but distinct depths and found that a large fraction of cDNA sequences could not be matched to coding regions or even to ribosomal RNAs by homology search. A substantial portion of these cDNA sequences is classified as small RNAs (sRNAs) and putative sRNAs (psRNAs). An sRNA is a short piece of RNA that is not translated into protein and is usually found in the intergenic regions (IGRs) of a genome.</p>
<p>They use a covariance-model-based algorithm to search all unassigned cDNA reads for sequence and structural similarity to known families in Rfam. A self-clustering approach is then applied to all cDNA reads to better characterize the sRNAs. Of the 66 groups found, 9 were identified as sRNA families in the Rfam database, and the remaining groups were psRNA groups that mapped to IGRs on metagenomic fragments derived from marine planktonic microorganisms.</p>
<p>Finally, the authors claim that sRNAs are a factor in controlling gene expression in response to variable environmental conditions. This is also the first time there has been information about the correlation between the metatranscriptome and the environment at the RNA level. This method could be a good sensor of environmental change.</p>
<p>I got a little lost in this paper. They report so much biological information that I can’t fully understand it. However, I think they address a very important issue: the environment can change protein expression through certain biological mechanisms.</p>
<p><a href="http://www.nature.com/nature/journal/v459/n7244/full/nature08055.html" target="_blank">Ref</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.paulyu.org/bioinfo/metatranscriptomics-reveals-unique-microbial-small-rnas-in-the-ocean%e2%80%99s-water-column/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Shotgun metaproteomics of the human distal gut microbiota</title>
		<link>http://www.paulyu.org/bioinfo/shotgun-metaproteomics-of-the-human-distal-gut-microbiota/</link>
		<comments>http://www.paulyu.org/bioinfo/shotgun-metaproteomics-of-the-human-distal-gut-microbiota/#comments</comments>
		<pubDate>Thu, 29 Apr 2010 15:05:48 +0000</pubDate>
		<dc:creator>paulyu</dc:creator>
				<category><![CDATA[Bioinformatics]]></category>
		<category><![CDATA[Metagenomics]]></category>
		<category><![CDATA[MassSpectrometry]]></category>
		<category><![CDATA[metaproteomics]]></category>

		<guid isPermaLink="false">http://www.paulyu.org/?p=493</guid>
		<description><![CDATA[<p>Shotgun metaproteomics of the human distal gut microbiota</p>
<p>Nathan C Verberkmoes, et al., International Society for Microbial Ecology</p>
<p>This paper describes a new approach to metagenomics data analysis, called metaproteomics. The authors take samples from two individuals, a pair of twins who have gastroenteritis. One twin is treated with anti-inflammatory drugs and the other receives no treatment. [...]]]></description>
			<content:encoded><![CDATA[<p>Shotgun metaproteomics of the human distal gut microbiota</p>
<p>Nathan C Verberkmoes,<em> et al.</em>, <em>International Society for Microbial Ecology</em></p>
<p><em><span id="more-493"></span><br />
</em></p>
<p>This paper describes a new approach to metagenomics data analysis, called metaproteomics. The authors take samples from two individuals, a pair of twins who have gastroenteritis. One twin is treated with anti-inflammatory drugs and the other receives no treatment. Fecal samples were collected for the study.</p>
<p>The microbial cells were extracted from the fecal samples by differential centrifugation. Each sample has a technical replicate to test the reproducibility of the approach. LC-MS/MS is then performed for high-throughput data acquisition. Around 2,000 peptides are identified for sample 7 and around 3,000 for sample 8 when searching against a metagenome database built from two human subjects. Three other databases are also used, which contain normal gut microbiota, human contamination and so on.</p>
<p>The protein identification results are mapped to the COG database. The results are highly reproducible and consistent across COG categories, and several COG categories were more highly represented in the present study than in the average microbial metagenomes reported by a previous study. The majority of detected proteins were involved in translation, carbohydrate metabolism or energy production.</p>
<p>The protein quantification results were derived from normalized spectral abundance factors (NSAF) across all samples. The most abundant proteins are elastase, chymotrypsin C and salivary amylases, which are expected in the human gut microbial community.</p>
<p>Proteins that cannot be matched to known proteins in the databases are classified as hypothetical proteins. This class may belong to novel protein families that have not been fully studied and characterized by previous research. The majority of these proteins are highly represented in the human gut only.</p>
<p>This work is the first large-scale metaproteomics study. The authors take advantage of shotgun techniques to discover new knowledge from a different point of view. Researchers can now have information from both before and after protein translation. For protein quantification, using an internal standard during the data collection stage would make the protein abundance estimates more accurate, and detecting low-abundance, hard-to-detect peptides remains a problem.</p>
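<p>As a rough illustration of the quantification step, here is a minimal sketch of how an NSAF value is commonly computed: each protein's spectral count is divided by its length, and the result is divided by the sum of those ratios over all proteins in the sample. The protein names and numbers are made up for the example, and this is not the authors' own pipeline.</p>

```python
def nsaf(spectral_counts, lengths):
    """Return the NSAF value of each protein in one sample.

    spectral_counts: dict mapping protein name to its spectral count
    lengths: dict mapping protein name to its length in residues
    """
    # Spectral abundance factor: spectral count divided by protein length.
    saf = {name: spectral_counts[name] / lengths[name] for name in spectral_counts}
    total = sum(saf.values())
    if total == 0:
        raise ValueError("no spectra were observed")
    # Normalize so that the values sum to 1 within the sample.
    return {name: value / total for name, value in saf.items()}


# Hypothetical proteins, counts and lengths, for illustration only.
example = nsaf(
    {"protein_a": 30, "protein_b": 10, "protein_c": 10},
    {"protein_a": 300, "protein_b": 100, "protein_c": 200},
)
print(example)
```

<p>Dividing by length keeps long proteins, which produce more peptides, from looking more abundant than they are.</p>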
<p><a href="http://www.nature.com/ismej/journal/v3/n2/abs/ismej2008108a.html" target="_blank">Ref</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.paulyu.org/bioinfo/shotgun-metaproteomics-of-the-human-distal-gut-microbiota/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Worlds within worlds: evolution of the vertebrate gut microbiota</title>
		<link>http://www.paulyu.org/bioinfo/worlds-within-worlds-evolution-of-the-vertebrate-gut-microbiota/</link>
		<comments>http://www.paulyu.org/bioinfo/worlds-within-worlds-evolution-of-the-vertebrate-gut-microbiota/#comments</comments>
		<pubDate>Tue, 13 Apr 2010 23:11:50 +0000</pubDate>
		<dc:creator>paulyu</dc:creator>
				<category><![CDATA[Bioinformatics]]></category>
		<category><![CDATA[Metagenomics]]></category>
		<category><![CDATA[Evolution]]></category>
		<category><![CDATA[Paper]]></category>

		<guid isPermaLink="false">http://www.paulyu.org/?p=464</guid>
		<description><![CDATA[<p>Worlds within worlds: evolution of the vertebrate gut microbiota</p>
<p>Ruth E. Ley, et al., Nature Reviews Microbiology</p>
<p>The human gut is an extreme environment for microorganisms. The authors try to find out whether different habitats affect microbial communities, using published 16S rRNA data. They compare humans with other mammals, other metazoans, and free-living [...]]]></description>
			<content:encoded><![CDATA[<p>Worlds within worlds: evolution of the vertebrate gut microbiota</p>
<p>Ruth E. Ley,<em> et al.</em>, <em>Nature Reviews Microbiology</em></p>
<p><em><span id="more-464"></span></em></p>
<p>The human gut is an extreme environment for microorganisms. The authors try to find out whether different habitats affect microbial communities, using published 16S rRNA data. They compare humans with other mammals, other metazoans, and free-living microbial communities. Before the modern age, the human diet depended heavily on the environment: people consumed only food from seeds or fruit. As time went by, tools and technologies let people have their preferred food without being constrained by the environment. Therefore, the differences among the microbial communities of humans living in distinct geographic locations are getting smaller. The authors compare the differences in fecal microbial communities among humans and between humans and other mammals. The result supports this assumption.</p>
<p>They took 99,801 16S rRNA sequences from 464 samples and 181 studies. The samples include 202 mammalian samples, 34 samples from large sequencing efforts of free-living communities, and samples from other human body habitats, the guts of non-mammalian vertebrates, and the guts or whole bodies of diverse metazoans. They apply principal component analysis to the final set.</p>
<p>The first principal component separates vertebrate gut-associated communities from free-living communities. Almost all non-vertebrate gut communities clustered with free-living communities. They conclude that mammals have a strong host phylogenetic effect on the structure of microbiota of arthropods. The third principal component separates saline and non-saline free-living environmental communities.</p>
<p>Firmicutes and Bacteroidetes are the most common and ubiquitous phyla in vertebrate gut samples, including human ones. The other sample types also contain a high abundance of Firmicutes and Bacteroidetes, but other phyla tend to be highly represented in non-gut samples. The phylum-specific analysis indicates that the gut samples of carnivores tend to cluster closer to free-living communities.</p>
<p>Finally, the authors say that globalization and frequent movement increase microbial transmission. This phenomenon rapidly reduces biodiversity. Plants and animals are becoming extinct, and microbial communities are as well. It is very important to preserve those microorganisms, because recent research suggests that changes in terrestrial microbial community composition might result in global change.</p>
<p>Analyzing such a large amount of data is really exhausting work. It is, however, important to do such large-scale studies. I believe all living species are connected. We cannot live alone without other species. Save the earth, save yourself.</p>
<p><a href="http://www.nature.com/nrmicro/journal/v6/n10/abs/nrmicro1978.html#top" target="_blank">Paper Link</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.paulyu.org/bioinfo/worlds-within-worlds-evolution-of-the-vertebrate-gut-microbiota/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Quantifying environmental adaptation of metabolic pathways in metagenomics</title>
		<link>http://www.paulyu.org/bioinfo/quantifying-environmental-adaptation-of-metabolic-pathways-in-metagenomics/</link>
		<comments>http://www.paulyu.org/bioinfo/quantifying-environmental-adaptation-of-metabolic-pathways-in-metagenomics/#comments</comments>
		<pubDate>Tue, 13 Apr 2010 23:10:17 +0000</pubDate>
		<dc:creator>paulyu</dc:creator>
				<category><![CDATA[Bioinformatics]]></category>
		<category><![CDATA[Metagenomics]]></category>
		<category><![CDATA[Environmental adaptation]]></category>
		<category><![CDATA[metabolic pathways]]></category>
		<category><![CDATA[Paper]]></category>

		<guid isPermaLink="false">http://www.paulyu.org/?p=462</guid>
		<description><![CDATA[<p>Quantifying environmental adaptation of metabolic pathways in metagenomics</p>
<p>Tara A. Gianoulis, et al., PNAS</p>
<p>In the Origin, Darwin argues that the environment affects the characteristics of individuals. These are always interesting questions: Do different species show similar effects in the same environment? Is there any common effect in similar environments? These questions can only be studied [...]]]></description>
			<content:encoded><![CDATA[<p>Quantifying environmental adaptation of metabolic pathways in metagenomics</p>
<p>Tara A. Gianoulis,<em> et al.</em>, <em>PNAS</em></p>
<p><em><span id="more-462"></span></em></p>
<p>In the Origin, Darwin argues that the environment affects the characteristics of individuals. These are always interesting questions: Do different species show similar effects in the same environment? Is there any common effect in similar environments? These questions can only be studied from a metagenomics perspective. The authors focus not on common functions but on pathways across species. They try to find the pathways that are shared and those that differ across environments in order to describe each environment. They propose two methods, regularized canonical correlation analysis (CCA) and discriminative partition matching (DPM). These two methods correlate different features with different environments to find similar pathways across the samples.</p>
<p>Three cases are analyzed. First, the metabolic pathways associated with amino acid and cofactor transport and metabolism vary significantly with environmental features. The species can take up and recycle exogenous amino acids, and the uptake is related to light availability. The authors observed variation across locations from north to south. Temperature and chlorophyll share the same principal axis as light availability.</p>
<p>Second, the amino acid synthesis pathways are not related to the energetic cost of synthesizing a particular amino acid. Instead, there is a high correlation between the structure of the amino acid pathways and their dependence on potentially limiting cofactors. This means the species may prefer to take up external amino acids rather than synthesize them.</p>
<p>Third, methionine is an essential amino acid for oceanic microorganisms. The methionine synthesis, salvage, and degradation pathways are related to the environment. When methionine degradation and amino acid transporters increase, the synthesis of methionine and cobalamin decreases. This result shows that methionine has a significant role in shaping the adaptations.</p>
<p>The authors also look at lipid and glycan metabolism. These components are important in the extracellular environment, so they might reflect environmental change. Glycans can modify lipids to enable or disable their function. The authors do find a strong correlation between the environment and these features.</p>
<p>The authors claim these findings can serve as good biosensors when there is no other indicator for a given environment. I wonder whether they could also be used as an index of environmental evolution. If so, we could describe environmental changes by looking at the lineage of the microorganisms within that environment.</p>
<p><a href="http://www.pnas.org/content/106/5/1374.long" target="_blank">Paper Link</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.paulyu.org/bioinfo/quantifying-environmental-adaptation-of-metabolic-pathways-in-metagenomics/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Statistical Methods for Detecting Differentially Abundant Features in Clinical Metagenomic Samples</title>
		<link>http://www.paulyu.org/bioinfo/statistical-methods-for-detecting-differentially-abundant-features-in-clinical-metagenomic-samples/</link>
		<comments>http://www.paulyu.org/bioinfo/statistical-methods-for-detecting-differentially-abundant-features-in-clinical-metagenomic-samples/#comments</comments>
		<pubDate>Wed, 31 Mar 2010 01:42:41 +0000</pubDate>
		<dc:creator>paulyu</dc:creator>
				<category><![CDATA[Bioinformatics]]></category>
		<category><![CDATA[Metagenomics]]></category>
		<category><![CDATA[Paper]]></category>

		<guid isPermaLink="false">http://www.paulyu.org/?p=433</guid>
		<description><![CDATA[<p>Statistical Methods for Detecting Differentially Abundant Features in Clinical Metagenomic Samples</p>
<p>James Robert White, et al. PLoS Computational Biology</p>
<p>In previous studies, all metagenomics software tools focused on comparing two samples. No software could compare two or more groups in which each group has multiple individuals. In this paper, they propose a statistical method to achieve [...]]]></description>
			<content:encoded><![CDATA[<p>Statistical Methods for Detecting Differentially Abundant Features in Clinical Metagenomic Samples</p>
<p>James Robert White, <em>et al.</em> <em>PLoS Computational Biology</em></p>
<p><span id="more-433"></span></p>
<p>In previous studies, all metagenomics software tools focused on comparing two samples. No software could compare two or more groups in which each group has multiple individuals. In this paper, the authors propose a statistical method to achieve this goal.</p>
<p>They use a so-called feature abundance matrix. Each row represents a feature and each column represents an individual sample, forming an <em>i</em> by <em>j</em> matrix when there are <em>i</em> features and <em>j</em> samples. Each cell contains the abundance of a given feature in a given individual. The matrix is normalized by calculating the proportion of taxon <em>i</em> observed in individual <em>j</em>. They claim that another normalization method can be chosen without any problem.</p>
<p>They use a two-sample <em>t</em> test on the two groups to compare the abundance of each feature. If the <em>t</em> statistic is above a specified threshold, the feature can be inferred to be differentially abundant across the two groups. The threshold is chosen to minimize the number of false positives. Traditionally, the p-value is computed from the <em>t</em> distribution, but this applies only when the underlying distribution is normal. Therefore, Storey and Tibshirani proposed a nonparametric <em>t</em> test, which can provide accurate estimates when the distribution is non-normal.</p>
<p>Their methodology can also handle sparse matrices, that is, low-frequency features. Fisher’s exact test is applied when sparse counts are detected. This approach has also been used in the Significance Analysis of Microarrays (SAM) method.</p>
<p>The authors test their method on human gut 16S rRNA sequence data. The goal is to confirm the taxa associated with human obesity in previous studies. The results not only confirmed those conclusions but also detected several strongly differentially abundant features that the previous studies failed to find.</p>
<p>This paper is the first study to compare multiple groups with multiple individuals. It makes clinical metagenomics data analysis feasible. Next-generation sequencing opens a new perspective in this research area. A good, rapid, and robust tool for the explosively growing data is still needed, and this paper points out a path for development.</p>
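<p>The normalization described above can be sketched in a few lines. This is only an illustration under my own assumptions: the matrix is a plain list of rows (features) by columns (individuals), and the counts are invented.</p>

```python
def normalize_columns(matrix):
    """Convert raw counts to per-individual proportions.

    matrix[i][j] is the count of feature i in individual j.
    """
    n_samples = len(matrix[0])
    # Total count observed in each individual (one total per column).
    totals = [sum(row[j] for row in matrix) for j in range(n_samples)]
    return [
        [row[j] / totals[j] if totals[j] else 0.0 for j in range(n_samples)]
        for row in matrix
    ]


# Invented counts: three features (rows) in three individuals (columns).
counts = [
    [10, 0, 5],
    [30, 20, 5],
    [60, 80, 40],
]
proportions = normalize_columns(counts)
for row in proportions:
    print(row)
```

<p>After this step every column sums to 1, so individuals sequenced at different depths become comparable.</p>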
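<p>One common way to get a p-value for a <em>t</em> statistic without assuming normality is a permutation test: shuffle the group labels many times and count how often the shuffled statistic is at least as extreme as the observed one. The sketch below shows that general idea for a single feature. It is my own simplified version, not the paper's exact procedure, and the function names, the number of permutations and the seed are arbitrary choices.</p>

```python
import math
import random
from statistics import mean, variance


def t_statistic(group_a, group_b):
    """Two-sample t statistic with unequal variances (Welch form)."""
    diff = mean(group_a) - mean(group_b)
    se2 = variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    if se2 == 0:
        # No spread at all: the statistic is 0 or infinite.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / math.sqrt(se2)


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Estimate a two-sided p-value by shuffling the group labels."""
    rng = random.Random(seed)
    observed = abs(t_statistic(group_a, group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        permuted = abs(t_statistic(pooled[:n_a], pooled[n_a:]))
        if permuted >= observed:
            hits += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Invented proportions of one feature in two groups of six individuals.
lean = [0.10, 0.12, 0.11, 0.13, 0.09, 0.12]
obese = [0.30, 0.28, 0.33, 0.31, 0.29, 0.32]
print(permutation_p_value(lean, obese))
```

<p>Each group needs at least two individuals, because the variance of a single value is undefined.</p>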
<p><a href="http://www.ploscompbiol.org/article/info:doi%2F10.1371%2Fjournal.pcbi.1000352" target="_blank">Paper Link</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.paulyu.org/bioinfo/statistical-methods-for-detecting-differentially-abundant-features-in-clinical-metagenomic-samples/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PHACCS, an online tool for estimating the structure and diversity of uncultured viral communities using metagenomic information</title>
		<link>http://www.paulyu.org/bioinfo/metagenomics/phaccs-an-online-tool-for-estimating-the-structure-and-diversity-of-uncultured-viral-communities-using-metagenomic-information/</link>
		<comments>http://www.paulyu.org/bioinfo/metagenomics/phaccs-an-online-tool-for-estimating-the-structure-and-diversity-of-uncultured-viral-communities-using-metagenomic-information/#comments</comments>
		<pubDate>Wed, 31 Mar 2010 01:41:30 +0000</pubDate>
		<dc:creator>paulyu</dc:creator>
				<category><![CDATA[Metagenomics]]></category>
		<category><![CDATA[Paper]]></category>

		<guid isPermaLink="false">http://www.paulyu.org/?p=431</guid>
		<description><![CDATA[<p>PHACCS, an online tool for estimating the structure and diversity of uncultured viral communities using metagenomic information</p>
<p>Florent Angly, et al. BMC Bioinformatics </p>
<p>In this paper, the authors use six different relative rank-abundance forms to find the best descriptive function of the metagenomics data: the power law, logarithmic, exponential, broken stick, niche preemption, and lognormal distributions. The [...]]]></description>
			<content:encoded><![CDATA[<p>PHACCS, an online tool for estimating the structure and diversity of uncultured viral communities using metagenomic information</p>
<p>Florent Angly, <em>et al.</em><em> BMC Bioinformatics<span style="font-style: normal;"> </span></em></p>
<p><span id="more-431"></span></p>
<p>In this paper, the authors use six different relative rank-abundance forms to find the best descriptive function for the metagenomic data: the power law, logarithmic, exponential, broken stick, niche preemption, and lognormal distributions. The best descriptive function is defined as the one with the smallest deviation between the experimental and predicted contig spectra. The contig spectrum is a vector containing the number of q-contigs (contigs assembled from q fragments) in an assembled DNA shotgun sequencing dataset.</p>
<p>PHACCS (PHAge Communities from Contig Spectrum) takes a contig spectrum and several parameters as input. Given the length of the genome, the number of DNA fragments studied, the average size of these fragments, and the minimum overlap length, a modified Lander-Waterman model can predict a contig spectrum. PHACCS then repeatedly adjusts the parameters predefined by the user until it reaches the minimum error between the actual and predicted spectra. The error is calculated as the variance-weighted sum of squared deviations between the actual and predicted contig spectra.</p>
<p>The best-fitting model is used to assess the diversity of the sample community. The number of genotypes inferred from the contig spectrum is used as the richness of the community. The Shannon-Wiener index, an indicator of sample diversity, is also calculated.</p>
<p>The authors compare four datasets: surface seawater from Scripps Pier and Mission Bay, sediment from Mission Bay, and human feces. The power law is the best descriptive model, and the exponential and niche preemption models are the worst, for all samples. The estimated community richness is 3350, 7180, 7340 and 2390 genotypes respectively. The estimated evenness is 0.932, 0.9, 1.0 and 0.873 respectively.</p>
<p>PHACCS can estimate the structure and diversity of a sample community from its contig spectrum, and a GUI is also provided. The key to a good estimate is a high-quality, high-dimensional contig spectrum.</p>
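<p>To make the error function concrete, here is a small sketch of a variance-weighted sum of squared deviations and of picking the candidate with the smallest error. The spectra, the variances and the model names are invented, and I simply pass the variances in as numbers. How PHACCS itself derives them is not shown here.</p>

```python
def weighted_squared_error(observed, predicted, variances):
    """Sum of squared deviations, each divided by its variance."""
    return sum(
        (o - p) ** 2 / v for o, p, v in zip(observed, predicted, variances)
    )


def best_model(observed, candidates):
    """Return the name of the candidate spectrum with the smallest error.

    candidates maps a model name to a (predicted, variances) pair.
    """
    return min(
        candidates,
        key=lambda name: weighted_squared_error(observed, *candidates[name]),
    )


# Invented contig spectrum: counts of 1-contigs, 2-contigs and 3-contigs.
observed = [900.0, 40.0, 5.0]
candidates = {
    "model_x": ([890.0, 45.0, 4.0], [100.0, 25.0, 4.0]),
    "model_y": ([850.0, 60.0, 10.0], [100.0, 25.0, 4.0]),
}
print(best_model(observed, candidates))
```

<p>Weighting by the variance keeps the large 1-contig count from dominating the fit.</p>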
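<p>For reference, the Shannon-Wiener index is H' = -sum(p_i * ln p_i), where p_i is the relative abundance of genotype i. The sketch below computes it from a list of abundances. The evenness function assumes the usual definition H' / ln(S), with S the number of genotypes present, which may not be exactly how PHACCS reports evenness.</p>

```python
import math


def shannon_wiener(abundances):
    """Shannon-Wiener index (natural log) from raw abundances."""
    total = sum(abundances)
    if total == 0:
        raise ValueError("all abundances are zero")
    # Genotypes with zero abundance contribute nothing to the sum.
    proportions = [a / total for a in abundances if a]
    return -sum(p * math.log(p) for p in proportions)


def evenness(abundances):
    """Shannon-Wiener index divided by its maximum, ln(richness)."""
    richness = sum(1 for a in abundances if a)
    if richness in (0, 1):
        return 0.0
    return shannon_wiener(abundances) / math.log(richness)


# Invented communities: one perfectly even, one dominated by a single genotype.
print(evenness([5, 5, 5, 5]))
print(evenness([97, 1, 1, 1]))
```

<p>An evenness close to 1 means the genotypes are almost equally abundant.</p>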
<p><a href="http://www.biomedcentral.com/1471-2105/6/41" target="_blank">Paper Link</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.paulyu.org/bioinfo/metagenomics/phaccs-an-online-tool-for-estimating-the-structure-and-diversity-of-uncultured-viral-communities-using-metagenomic-information/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Analysis and comparison of very large metagenomes with fast clustering and functional annotation</title>
		<link>http://www.paulyu.org/bioinfo/analysis-and-comparison-of-very-large-metagenomes-with-fast-clustering-and-functional-annotation/</link>
		<comments>http://www.paulyu.org/bioinfo/analysis-and-comparison-of-very-large-metagenomes-with-fast-clustering-and-functional-annotation/#comments</comments>
		<pubDate>Tue, 09 Mar 2010 17:20:50 +0000</pubDate>
		<dc:creator>paulyu</dc:creator>
				<category><![CDATA[Bioinformatics]]></category>
		<category><![CDATA[Metagenomics]]></category>
		<category><![CDATA[Paper]]></category>

		<guid isPermaLink="false">http://www.paulyu.org/?p=418</guid>
		<description><![CDATA[<p>Analysis and comparison of very large metagenomes with fast clustering and functional annotation</p>
<p>Weizhong Li, BMC [...]]]></description>
			<content:encoded><![CDATA[<p>Analysis and comparison of very large metagenomes with fast clustering and functional annotation</p>
<p><em>Weizhong Li, BMC Bioinformatics 2009</em></p>
<p><em><span id="more-418"></span><a href="http://www.paulyu.org/wp-content/uploads/2010/03/I609-Presentation-chuyuV2.pptx">I609-Presentation-chuyuV2</a></em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.paulyu.org/bioinfo/analysis-and-comparison-of-very-large-metagenomes-with-fast-clustering-and-functional-annotation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Barcodes for genomes and applications</title>
		<link>http://www.paulyu.org/bioinfo/barcodes-for-genomes-and-applications/</link>
		<comments>http://www.paulyu.org/bioinfo/barcodes-for-genomes-and-applications/#comments</comments>
		<pubDate>Tue, 02 Mar 2010 22:17:18 +0000</pubDate>
		<dc:creator>paulyu</dc:creator>
				<category><![CDATA[Bioinformatics]]></category>
		<category><![CDATA[Metagenomics]]></category>
		<category><![CDATA[Paper]]></category>

		<guid isPermaLink="false">http://www.paulyu.org/?p=396</guid>
		<description><![CDATA[<p>Barcodes for genomes and applications</p>
<p>Fengfeng Zhou, et al. BMC Bioinformatics</p>
<p>Whether every genome has a unique signature is a question whose answer remains unknown. This paper analyzes the 586 prokaryotic genomes sequenced so far by calculating the k-mer frequency distribution for 1&#60;k&#60;6. They divide each genome into sequences of equal length and calculate the k-mer frequency [...]]]></description>
			<content:encoded><![CDATA[<p>Barcodes for genomes and applications</p>
<p>Fengfeng Zhou, <em>et al.</em> <em>BMC Bioinformatics</em></p>
<p><span id="more-396"></span></p>
<p>Whether every genome has a unique signature is a question whose answer remains unknown. This paper analyzes the 586 prokaryotic genomes sequenced so far by calculating the k-mer frequency distribution for 1&lt;k&lt;6. The authors divide each genome into sequences of equal length, calculate the k-mer frequencies of each sequence, and then map the abundance of each k-mer to a gray level to form the barcode.</p>
<p>They make four major observations. First, the vertical bands show that the 4-mer frequency is quite stable across a genome. Second, a small fraction of the genome differs slightly, and these differences typically represent special classes of genes. Third, different chromosomes within a species share most of their pattern and differ in a small part. Finally, phylogenetic distance can be inferred directly from barcode similarity.</p>
<p>They use Markov chains as a model for generating random nucleotide sequences, and Markov chains of third order and above show patterns similar to the real data. Coding and non-coding regions can also be differentiated by their different barcode patterns. The authors also apply the method to eukaryotic, mitochondrial, plastid and plasmid genomes. In higher eukaryotic genomes, different regions share a similar backbone barcode, and introns and promoter regions have low complexity compared with repetitive and coding regions. The mitochondrial and plastid genomes do not have patterns similar to other genomes. From these simulations they conclude that different classes of genomes have their own unique characteristics. In the prokaryotic genome analysis, the abnormal regions tend to be horizontal gene transfers, phage invasions and highly expressed genes. About 30% of the abnormal sequences fall into these three categories, and the remaining 70% remain unexplained.</p>
<p>They use barcode similarity to sort sequences into different bins. They use 11 randomly selected genomes from different species of the same genus, and two further sets of 30 and 100 randomly selected genomes from bacterial genera. They test fragment sizes of 500, 1000, 2000, 5000 and 10000 bp to find the limit of the software's accuracy. The results show that fragments over 1000 bp give more than 50% accuracy in all three sample sets. The barcode pattern is present only in genome sequences, not in random sequences. The barcode method gives a global view of a genome, and its representation makes it easy to compare distributions between genomes.</p>
<p>This research mainly applies to prokaryotes, and the conclusions are drawn from the test sets. Therefore we cannot make strong inferences about other genomes. The accuracy limit of this software is around 1000 bp. The NST is around 400 bp for each read. The barcode scheme therefore does not seem suitable for short reads.</p>
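<p>The barcode construction can be sketched roughly as follows: cut the genome into equal-length windows, count every k-mer in each window, and scale the frequencies to gray levels. This is a simplified illustration under my own assumptions. The window size, the number of gray levels and the function names are arbitrary, and the paper's exact counting and scaling rules may differ.</p>

```python
from collections import Counter
from itertools import product


def barcode(genome, k=4, window=1000):
    """Return one row of k-mer frequencies per non-overlapping window."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    rows = []
    for start in range(0, len(genome) - window + 1, window):
        fragment = genome[start:start + window]
        counts = Counter(fragment[i:i + k] for i in range(window - k + 1))
        # k-mers containing other letters (such as N) are ignored.
        total = sum(counts[kmer] for kmer in kmers)
        rows.append([counts[kmer] / total if total else 0.0 for kmer in kmers])
    return rows


def to_gray(rows, levels=256):
    """Scale frequencies to integer gray levels between 0 and levels - 1."""
    peak = max(max(row) for row in rows)
    if peak == 0:
        return [[0 for _ in row] for row in rows]
    return [[round(v / peak * (levels - 1)) for v in row] for row in rows]


# Toy sequence, for illustration only: 2000 bases of a repeated motif.
toy_genome = "ACGT" * 500
rows = barcode(toy_genome, k=2, window=1000)
image = to_gray(rows)
print(len(rows), len(rows[0]))
```

<p>Each row of the result is one window and each column is one k-mer, so a k-mer whose frequency is stable along the genome shows up as a vertical band.</p>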
<p><a href="http://www.biomedcentral.com/1471-2105/9/546" target="_blank">Paper Link</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.paulyu.org/bioinfo/barcodes-for-genomes-and-applications/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>CompostBin: A DNA composition-based algorithm for binning environmental shotgun reads</title>
		<link>http://www.paulyu.org/bioinfo/compostbin-a-dna-composition-based-algorithm-for-binning-environmental-shotgun-reads/</link>
		<comments>http://www.paulyu.org/bioinfo/compostbin-a-dna-composition-based-algorithm-for-binning-environmental-shotgun-reads/#comments</comments>
		<pubDate>Tue, 02 Mar 2010 22:15:58 +0000</pubDate>
		<dc:creator>paulyu</dc:creator>
				<category><![CDATA[Bioinformatics]]></category>
		<category><![CDATA[Metagenomics]]></category>
		<category><![CDATA[Paper]]></category>
		<category><![CDATA[PCA]]></category>

		<guid isPermaLink="false">http://www.paulyu.org/?p=394</guid>
		<description><![CDATA[<p>CompostBin: A DNA composition-based algorithm for binning environmental shotgun reads</p>
<p>Sourav Chatterji, et al. RECOMB.</p>
<p>
</p>
<p>When we want to study microbial diversity, we face a huge challenge: most microbes cannot be cultured in the lab. Therefore we cannot use traditional procedures to analyze them. Next-generation sequencing technology and shotgun sequencing become the [...]]]></description>
			<content:encoded><![CDATA[<p>CompostBin: A DNA composition-based algorithm for binning environmental shotgun reads</p>
<p>Sourav Chatterji, <em>et al.</em><em> RECOMB.</em></p>
<p><em><span id="more-394"></span><br />
</em></p>
<p>When we want to study microbial diversity, we face a huge challenge: most microbes cannot be cultured in the lab, so we cannot use traditional procedures to analyze them. Next-generation sequencing and shotgun sequencing offer a solution, but other issues arise when we apply them. Reads from all the species in a sample are sequenced at the same time, and before any analysis we need to classify them. Read binning, that is, deciding which read belongs to which genome, is the first issue we face. Once we can classify the reads accurately, we can estimate the richness of the community. CompostBin is one program of this kind. First, it uses principal component analysis (PCA) to extract features from the reads. Each sequence is described by a feature vector holding its k-mer frequencies. The paper uses 6-mers, giving 4<sup>6</sup> columns per sequence (DNA has 4 nucleotides), and keeps the first three principal components. A 6-nearest-neighbor graph is then constructed for clustering: each vertex represents a sequence, and an edge exists if one sequence is among the 6 nearest neighbors of the other. Each edge is weighted by the exponential inverse of the normalized Euclidean distance between the two sequences. By combining phylogenetic markers with the graph, CompostBin calculates a normalized cut score and splits the reads into two bins. This step is repeated until the chosen number of bins is reached. The authors test on simulated datasets, generating simulated Sanger reads from known genomes with ReadSim. The datasets are designed to vary in number of species, relative abundance, phylogenetic diversity and GC content, in order to test the accuracy of the program.
For the simulated datasets, 10 out of 12 have a binning error below 6%. The other two have higher error rates because the species in them are phylogenetically close. They also apply the method to real metagenomic data from glassy-winged sharpshooter endosymbionts. The result shows a 5.9% error compared with the original paper.</p>
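<p>The nearest-neighbor graph step can be sketched in a few lines of Python. This is only my reading of the description above, not CompostBin's actual code: the function name <em>knn_graph</em> is hypothetical, the input points stand in for the three principal-component coordinates of each read, and I assume that "normalized" means dividing by the largest pairwise distance, so the edge weight becomes exp(-d / d_max). The PCA and normalized cut steps are left out.</p>

```python
import math


def knn_graph(points, k=6):
    """Build a symmetric k-nearest-neighbor graph as {(i, j): weight}.

    Hypothetical helper for illustration only.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Assumption: "normalized" means divided by the largest pairwise distance.
    d_max = max(max(row) for row in dist) or 1.0
    edges = {}
    for i in range(n):
        # The k closest other points to point i.
        others = [j for j in range(n) if j != i]
        nearest = sorted(others, key=lambda j: dist[i][j])[:k]
        for j in nearest:
            # An edge exists if either endpoint lists the other as a neighbor.
            edges[(min(i, j), max(i, j))] = math.exp(-dist[i][j] / d_max)
    return edges


# Two small, well separated groups of points in three dimensions.
points = [(0, 0, 0), (0.1, 0, 0), (0, 0.1, 0), (5, 5, 5), (5.1, 5, 5), (5, 5.1, 5)]
for edge, weight in sorted(knn_graph(points, k=2).items()):
    print(edge, round(weight, 3))
```

<p>With well separated groups, no edge crosses between them, which is the situation a normalized cut can split cleanly. When species are phylogenetically close, the groups overlap and cross edges appear.</p>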
<p>The results presented show pretty good performance. The paper says the program is still in progress. The authors want to test different clustering methods, since PCA cannot capture nonlinear structure. They claim the first three principal components are enough for clustering, but this is based on simulated results. Are the first three components really enough for large communities? The choice of 6-mers is another important parameter: can we increase or decrease k for different purposes, for example a larger k to capture more detailed characteristics, or a smaller k to reduce running time? Although the program doesn’t require any training set, it relies on very strong assumptions to set its parameters.</p>
<p><a href="http://arxiv.org/abs/0708.3098" target="_blank">Paper Link</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.paulyu.org/bioinfo/compostbin-a-dna-composition-based-algorithm-for-binning-environmental-shotgun-reads/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>ESPRIT: estimating species richness using large collections of 16S rRNA pyrosequences</title>
		<link>http://www.paulyu.org/bioinfo/esprit-estimating-species-richness-using-large-collections-of-16s-rrna-pyrosequences/</link>
		<comments>http://www.paulyu.org/bioinfo/esprit-estimating-species-richness-using-large-collections-of-16s-rrna-pyrosequences/#comments</comments>
		<pubDate>Tue, 16 Feb 2010 17:15:51 +0000</pubDate>
		<dc:creator>paulyu</dc:creator>
				<category><![CDATA[Bioinformatics]]></category>
		<category><![CDATA[Metagenomics]]></category>
		<category><![CDATA[OTUs]]></category>
		<category><![CDATA[Paper]]></category>
		<category><![CDATA[species richness]]></category>

		<guid isPermaLink="false">http://www.paulyu.org/?p=382</guid>
		<description><![CDATA[<p>ESPRIT: estimating species richness using large collections of 16S rRNA pyrosequences</p>
<p>Yijun Sun, et al. Nucleic Acids Research</p>
<p></p>
<p>This paper proposes a new method for assigning a large number of sequences to operational taxonomic units (OTUs). The goal of this paper is to develop a rapid, accurate method that can handle large-scale data for metagenomics researchers to [...]]]></description>
			<content:encoded><![CDATA[<p>ESPRIT: estimating species richness using large collections of 16S rRNA pyrosequences</p>
<p>Yijun Sun, et al. Nucleic Acids Research</p>
<p><span id="more-382"></span></p>
<p>This paper proposes a new method for assigning a large number of sequences to operational taxonomic units (OTUs). Its goal is a rapid, accurate method that can handle large-scale data, so that metagenomics researchers can estimate species richness. The authors first compare two alignment approaches: multiple sequence alignment (MSA), which is commonly used in previous studies, and pairwise sequence alignment (PSA). They show that the two give comparable results, and claim that PSA offers better computational performance and more accurate results than MSA. A further advantage of PSA is that the problem can be divided into subsets and computed in parallel. The full ESPRIT strategy has four steps: removing low-quality reads, computing pairwise distances, assigning sequences to OTUs, and statistical inference of species richness. First, the program removes reads that fail any of several criteria, such as containing ambiguous nucleotides, having more than one mismatch at the beginning of the read, or having an atypical length. This shrinks the problem and reduces the computational cost. The Needleman-Wunsch algorithm is used for the pairwise alignments. Only pairwise distances &lt; 0.1 are kept and the rest are discarded, which speeds up processing and saves storage space. A <em>k</em>-mer score is also calculated for each pair of sequences, with its own threshold (default 0.5). Hcluster is introduced to assign sequences to OTUs. This new algorithm can process the distance information on the fly. It labels each sequence as either active or inactive. Active means the sequence does not yet have enough distance information for clustering; inactive means the sequence has no distance information or has already been clustered.
Hcluster is a general clustering method that can be used for other clustering problems, not only this one. They compare ESPRIT with DOTUR and MOTHUR, which have been commonly used in metagenomics projects for several years. The results show that DOTUR and MOTHUR overestimate species richness. Next-generation sequencing can produce huge numbers of sequences at a lower price than previous methods, and ESPRIT gives us a better way to study microorganisms.</p>
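<p>The distance step can be sketched roughly as follows in Python. This is a simplified illustration under my own assumptions, not ESPRIT's implementation: the k-mer distance formula, the unit mismatch and gap costs, scaling the alignment cost by the longer sequence length, and running the k-mer check before the alignment are all choices I made for the example. Only the two thresholds (0.5 and 0.1) come from the summary above. Sequences are assumed to be at least k bases long.</p>

```python
from collections import Counter


def kmer_distance(a, b, k=6):
    """Assumed k-mer distance: 1 minus the fraction of shared k-mers."""
    ca = Counter(a[i:i + k] for i in range(len(a) - k + 1))
    cb = Counter(b[i:i + k] for i in range(len(b) - k + 1))
    shared = sum(min(n, cb[m]) for m, n in ca.items())
    return 1.0 - shared / (min(len(a), len(b)) - k + 1)


def nw_distance(a, b, mismatch=1, gap=1):
    """Needleman-Wunsch global alignment cost, scaled by the longer length."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        cur = [i * gap]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j - 1] + (0 if x == y else mismatch),  # match or mismatch
                           prev[j] + gap,                              # gap in b
                           cur[j - 1] + gap))                          # gap in a
        prev = cur
    return prev[-1] / max(len(a), len(b), 1)


def pairwise_distances(seqs, k=6, kmer_cutoff=0.5, keep_below=0.1):
    """Yield (i, j, distance) for the pairs that pass both thresholds."""
    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            # Cheap k-mer check first, so clearly distant pairs skip the alignment.
            if kmer_distance(seqs[i], seqs[j], k) > kmer_cutoff:
                continue
            d = nw_distance(seqs[i], seqs[j])
            if keep_below > d:
                yield i, j, d


reads = ["ACGTACGTACGTACGTACGT", "ACGTACGTACGTACGTACGA", "TTTTTTTTTTTTTTTTTTTT"]
for i, j, d in pairwise_distances(reads):
    print(i, j, d)
```

<p>Because every pair is handled independently, the two loops can be split into blocks and run in parallel, which is the advantage of PSA mentioned above.</p>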
<p>I think the major problems in metagenomics are how to process huge amounts of data efficiently and how to mine them. This method gave me a hint: we don’t need to improve every step; sometimes replacing a step gives a surprisingly good result.</p>
<p><a href="http://nar.oxfordjournals.org/cgi/content/full/gkp285v1" target="_blank">Paper Link</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.paulyu.org/bioinfo/esprit-estimating-species-richness-using-large-collections-of-16s-rrna-pyrosequences/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

