Biology board - RNA-seq data assembler vs genomic shotgun data assembler
c***y
Posts: 615
1
Can anyone on the board explain the difference between them?
I'd like to use Cufflinks to assemble shotgun genome data. Is that feasible?
Thanks!
f*******o
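Posts: 1
2
A "genomic shotgun data assembler" assembles genomic DNA sequences, while an
"RNA-Seq assembler" such as Cufflinks assembles a transcriptome. For Illumina
data, de novo assemblers of both classes mostly use de Bruijn graph based
methods (Cufflinks itself is reference-guided: it builds transcripts from
reads that have already been aligned to a genome). But there are many
differences between the two.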
For example, RNA-Seq reads can be stranded, while genomic shotgun reads carry
no strand information. So a DNA assembler has to treat a read and its reverse
complement as the same sequence.
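A minimal sketch of the strand point, assuming the common "canonical k-mer" convention (keep whichever of a k-mer and its reverse complement sorts first). The function names are made up for illustration and are not taken from any particular assembler:

```python
# Illustrative only: real assemblers use compact bit encodings, not strings.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_kmers(read, k):
    """List the k-mers of a read, each replaced by the lexicographically
    smaller of the k-mer and its reverse complement."""
    kmers = []
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        kmers.append(min(kmer, reverse_complement(kmer)))
    return kmers


read = "ACGGTCA"
forward = set(canonical_kmers(read, 4))
reverse = set(canonical_kmers(reverse_complement(read), 4))
# Both orientations of the read contribute the same nodes to the graph.
assert forward == reverse
```

With stranded RNA-Seq data an assembler can skip this step and keep k-mers in their observed orientation, which helps separate overlapping genes on opposite strands.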
Another major difference is the assumption about sequence coverage.
Disregarding fragmentation and sequencing biases, coverage of the genome
should be more or less even, so regions with more reads imply multiple
copies/repeats. In a transcriptome, on the other hand, different mRNAs are
expressed at different levels, so coverage can vary greatly from one
transcript to another.
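To make the coverage assumption concrete, here is a rough sketch that treats the median coverage as "one copy" and reads anything well above it as a repeat. The contig names and coverage values are invented for illustration, and real genome assemblers use more careful statistical models:

```python
from statistics import median


def estimate_copy_number(coverage_by_contig):
    """Estimate copy number as coverage / median coverage, rounded, at least 1."""
    baseline = median(coverage_by_contig.values())
    return {
        name: max(1, round(cov / baseline))
        for name, cov in coverage_by_contig.items()
    }


# Made-up genomic coverages: three single-copy contigs and one repeat.
genomic = {"contig_a": 29.0, "contig_b": 31.0, "contig_c": 30.0, "repeat_x": 92.0}
print(estimate_copy_number(genomic))
# {'contig_a': 1, 'contig_b': 1, 'contig_c': 1, 'repeat_x': 3}
```

Run on transcript coverage, the same arithmetic would label every highly expressed gene a multi-copy repeat, which is why the two kinds of assembler cannot share this assumption.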
Additionally, an RNA assembler has to be aware of different isoforms of the
same gene, which correspond to different traversals of the graph.
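The isoform point can be sketched with a toy splice graph, where each source-to-sink path is one candidate isoform. The gene, exon names and edges below are hypothetical, and real transcript assemblers also weigh read support before reporting a path:

```python
def enumerate_paths(graph, node, sink, prefix=()):
    """Yield every path from node to sink in a directed acyclic graph."""
    prefix = prefix + (node,)
    if node == sink:
        yield prefix
        return
    for nxt in graph.get(node, []):
        yield from enumerate_paths(graph, nxt, sink, prefix)


# Hypothetical gene with four exons; exon2 or exon3 can be skipped.
splice_graph = {
    "exon1": ["exon2", "exon3"],
    "exon2": ["exon3", "exon4"],
    "exon3": ["exon4"],
}

isoforms = list(enumerate_paths(splice_graph, "exon1", "exon4"))
for path in isoforms:
    print(" -> ".join(path))
```

A genome assembler looks for a single best traversal of a region, whereas here all three paths may be real transcripts.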
There are many other differences, but the bottom line is that it's not a
good idea to use Cufflinks to assemble a genome. SPAdes is my go-to for that
purpose.
c***y
Posts: 615
3
But SPAdes is a de novo tool. What if I would like to include the
reference in the assembly process?

f*******o
Posts: 1
4
That's rare, and I'm not sure I understand why you'd want it, but I can think
of two ways:
1. Generate all possible K-mers (K = your read length) from the reference
genome sequence. Include them in your FASTQ read files and do de novo
assembly. The information from the reference sequence should help resolve
ambiguity in the graph during assembly. A similar idea has been used in
Cufflinks.
2. Align your reads to the genome, then separate aligned and unaligned reads.
For the aligned reads, just call variants. For the unaligned reads, do de
novo assembly and compare the assembled contigs with the reference genome to
identify novel sequences or variants. A similar idea was used in TopHat to
identify new splice sites.
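Both ideas can be sketched in a few lines of plain Python. The function names, the read-name scheme and the placeholder quality string are assumptions made for illustration. The only format facts relied on are the four-line FASTQ record and bit 0x4 of the SAM FLAG field, which marks an unmapped read:

```python
def reference_to_pseudo_reads(name, sequence, read_length, step=1):
    """Approach 1: yield FASTQ records that tile a reference with pseudo-reads."""
    for start in range(0, len(sequence) - read_length + 1, step):
        fragment = sequence[start:start + read_length]
        quality = "I" * read_length  # placeholder: Phred 40 in Phred+33 encoding
        yield f"@{name}_pseudo_{start}\n{fragment}\n+\n{quality}\n"


def split_sam_records(sam_lines):
    """Approach 2: split SAM alignment lines into (aligned, unaligned)."""
    aligned, unaligned = [], []
    for line in sam_lines:
        if line.startswith("@"):  # header lines describe the file, not a read
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if flag & 0x4:  # bit 0x4 set: the read did not align
            unaligned.append(line)
        else:
            aligned.append(line)
    return aligned, unaligned


# Toy inputs, invented for illustration.
pseudo_reads = list(reference_to_pseudo_reads("chr_demo", "ACGTAC", 4))
sam = [
    "@SQ\tSN:chr_demo\tLN:6",
    "read1\t0\tchr_demo\t1\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTT\tIIII",
]
aligned, unaligned = split_sam_records(sam)
print(len(pseudo_reads), len(aligned), len(unaligned))
```

In practice the aligned reads would go to a variant caller and the unaligned ones to the de novo assembler. With a large genome, a step greater than 1 keeps the number of pseudo-reads manageable.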
Hope that this helps.
C*********X
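Posts: 10518
9
Are you a student or a postdoc?

[In reply to c***y's post:]
: Can anyone on the board explain the difference between them?
: I'd like to use Cufflinks to assemble shotgun genome data. Is that feasible?
: Thanks!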
