RNA-seq data assembler vs genomic shotgun data assembler - Biology board - Weiming Archive

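c***y Posts: 615	1 Can anyone on the board explain the differences between them? I'd like to use Cufflinks to assemble shotgun genome data, but I'm not sure whether that is feasible. Thanks!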
f*******o Posts: 1	2 A "genomic shotgun data assembler" assembles genomic DNA sequences, while an "RNA-Seq assembler", such as Cufflinks, assembles a transcriptome. De novo assemblers in both classes mostly use de Bruijn graph based methods for Illumina data, but there are many differences between them.
For example, RNA-Seq reads can be stranded, while genomic DNA reads are unstranded, so a DNA assembler has to treat a read and its reverse complement as the same sequence.
Another major difference is the assumption about sequence coverage. Disregarding fragmentation and sequencing biases, coverage of the genome should be more or less even, so regions with more reads imply multiple copies or repeats. In a transcriptome, on the other hand, different mRNAs are expressed at different levels, so coverage can vary greatly between transcripts.
Additionally, an RNA assembler has to be aware of different isoforms of the same gene, which correspond to different traversals of the graph.
There are many other differences, but the bottom line is that it's not a good idea to use Cufflinks to assemble a genome. SPAdes is my go-to for that purpose.
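The strand point above (a DNA assembler treats a read and its reverse complement as the same sequence) is commonly handled by counting "canonical" k-mers. The following is only a minimal sketch of that idea, not how SPAdes or any particular assembler implements it; the function names and the choice of the lexicographically smaller string as the canonical form are assumptions made for illustration.

```python
from collections import Counter

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_kmer(kmer: str) -> str:
    """Pick one representative for a k-mer and its reverse complement.

    Assumption for this sketch: the lexicographically smaller of the
    two strings is used as the representative.
    """
    return min(kmer, reverse_complement(kmer))


def count_canonical_kmers(reads, k: int) -> Counter:
    """Count k-mers so that both strands contribute to the same key."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" in kmer:  # skip windows containing ambiguous bases
                continue
            counts[canonical_kmer(kmer)] += 1
    return counts
```

With unstranded genomic reads, a read and its reverse complement then add to the same counter entry. A stranded RNA-Seq assembler could skip the `canonical_kmer` step and keep the two orientations apart.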
c***y Posts: 615	3 But SPAdes is a de novo tool. What if I would like to include the reference in the assembly process?
[Quoting f*******o's post]
: "genomic shot gun data assembler" assembles genomic DNA sequences while "RNA-Seq assembler", such as Cufflinks, assembles transcriptome. Both classes of
: assemblers mostly uses de bruijn graph based methods for Illumina data. But
: there are many differences between.
: For example, RNA-Seq reads can be stranded while DNA sequences are
: strandless. So DNA assembler has to treat a read and its reverse complement
: sequence the same.
: Another major difference is the assumption on the sequence coverage.
: Disregarding fragmentation and sequencing biases, the coverage of the genome
: should be more or less even. Thus regions with more reads implies multiple
f*******o Posts: 1	4 That's rare, and I'm not sure I understand why, but I can think of two ways:
1. Generate all possible k-mers (k = your read length) from the genomic sequence, include them in your FASTQ read files, and do a de novo assembly. The information carried by the reference sequence should help resolve ambiguities in the graph during assembly. A similar idea has been used in Cufflinks.
2. Align your reads to the genome, then separate the aligned and unaligned reads. For the aligned reads, just call variants. For the unaligned reads, do a de novo assembly and compare the assembled contigs with the reference genome to identify novel sequences or variations. A similar idea was used in TopHat to identify new splice sites.
Hope this helps.
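The two approaches above can be sketched in plain Python. This is a rough illustration only, under several assumptions: the function names, the read-naming scheme, and the constant placeholder quality character are hypothetical, and the SAM handling relies only on the FLAG column (bit 0x4 means the read is unmapped). In practice one would use an aligner plus a tool such as samtools for the second step.

```python
def reference_to_fastq(name, sequence, k, step=1, qual_char="I"):
    """Approach 1: yield FASTQ records for every length-k window of a reference.

    qual_char is a placeholder quality value (an assumption), since a
    reference sequence has no real base qualities.
    """
    sequence = sequence.upper()
    for start in range(0, len(sequence) - k + 1, step):
        window = sequence[start:start + k]
        if "N" in window:  # skip windows spanning gaps in the reference
            continue
        yield f"@{name}_{start}\n{window}\n+\n{qual_char * k}\n"


def split_sam_by_alignment(sam_lines):
    """Approach 2: split read names into (aligned, unaligned) lists.

    Reads whose FLAG has bit 0x4 set are unaligned and would go on to
    de novo assembly; the rest would go to variant calling.
    """
    aligned, unaligned = [], []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # blank or header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        (unaligned if flag & 0x4 else aligned).append(fields[0])
    return aligned, unaligned
```

Note that adding every reference window as a pseudo-read also adds coverage, so an assembler that uses coverage to detect repeats may behave differently on the mixed input.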
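C*********X Posts: 10518	9 Are you a student or a postdoc?
[Quoting c***y's post]
: Can anyone on the board explain the differences between them?
: I'd like to use Cufflinks to assemble shotgun genome data, but I'm not sure whether that is feasible.
: Thanks!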
