All topics - topic: mappable
n****1
Posts: 1136
1
From thread: Programming board - The Object in OOP is actually an actor
Monads were not invented to solve the immutability problem.
Monads are real, and people use them every day without noticing. For example,
the list is the most illustrative monad, and list comprehensions in Python or
LINQ in C# cut a lot of boilerplate and make code more readable.
Haskell just grouped the conceptually similar structures together and gave
the group a ridiculous name, "Monad". I would prefer to call them
"mappable and flattenable".
Mappable means you can map ordinary functions on mon... Read full post
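A minimal sketch of the "mappable and flattenable" idea for plain Python lists. The helper names fmap and flat_map are illustrative assumptions, not anything named in the post:

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def fmap(f: Callable[[A], B], xs: Iterable[A]) -> List[B]:
    """Mappable: apply an ordinary function to every element."""
    return [f(x) for x in xs]


def flat_map(f: Callable[[A], Iterable[B]], xs: Iterable[A]) -> List[B]:
    """Flattenable: apply a list-returning function, then flatten one level."""
    return [y for x in xs for y in f(x)]


doubled = fmap(lambda x: x * 2, [1, 2, 3])

# Each number expands into several (number, letter) pairs, flattened into one list.
pairs: List[Tuple[int, str]] = flat_map(lambda x: [(x, y) for y in "ab"], [1, 2])

# The same result written as a nested list comprehension.
pairs_lc = [(x, y) for x in [1, 2] for y in "ab"]
```

The nested comprehension and flat_map produce the same list, which is roughly why comprehension syntax can hide the map-then-flatten step.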
L*******a
Posts: 293
2
From thread: Biology board - NGS (second-generation sequencing) analysis, please weigh in
As a rule of thumb, 70% mappable reads is an empirical criterion.
FYI.
The ENCODE project requires 10M mappable reads for human ChIP-seq experiments,
and the modENCODE project requires 4M for worm and fly.
From a genomics perspective, a few classic genes bound by peaks are not
enough to prove your ChIP-seq data are valid in terms of saturation.
You had better have some replicates anyway. Each replicate should meet the
saturation and mappability criteria, and the reproducibility rate also
should be in an acce... Read full post
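A rough sketch of how the numbers quoted above could be turned into a quick sanity check. The thresholds come from the post; the function name, dictionary layout, and the example read counts are hypothetical:

```python
# Minimum mappable reads per organism, as quoted in the post.
MIN_MAPPABLE_READS = {"human": 10_000_000, "worm": 4_000_000, "fly": 4_000_000}
# Rule-of-thumb mappable rate from the post.
MIN_MAPPABLE_RATE = 0.70


def check_library(total_reads, mappable_reads, organism):
    """Report the mappable rate and whether rate and depth pass the cutoffs."""
    rate = mappable_reads / total_reads
    return {
        "rate": rate,
        "rate_ok": rate >= MIN_MAPPABLE_RATE,
        "depth_ok": mappable_reads >= MIN_MAPPABLE_READS[organism],
    }


# Hypothetical libraries: one comfortable, one with few mappable reads.
good = check_library(20_000_000, 15_000_000, "human")
poor = check_library(11_000_000, 2_200_000, "human")
```

This only covers the mappability and depth criteria. Saturation and replicate reproducibility need separate checks.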
r****t
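Posts: 10904
3
From thread: Programming board - A question about plotting in MATLAB
Go read the MATLAB colormap documentation; I don't know the answer.
What I use in matplotlib is
scatter(x,y,z,color=mappable)
where mappable is a mapping object, and you can do many different kinds of
mapping with it.
It should be similar in MATLAB.
Someone also asked about interpolate/extrapolate in Python here a few days
ago; I replied in that thread.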
n********k
Posts: 2818
4
From thread: Biology board - NGS (second-generation sequencing) analysis, please weigh in
Just got some of my ChIP-seq data back:
For the histone marks, the data looked great, and the core told me they are
great: above 70% mappable reads and peak calling with an FDR of <5%...
For my TF, the data make sense to me, but the core said they are
trash/useless: 9-20% mappable reads (out of 9-11M, meant to get 20M) and peak
calling with an FDR of 100%.
Luckily my TF has been ChIPed many, many times and has very conserved binding
sites. I randomly picked mapped peaks, and most of them had at least 1
high confid... Read full post
n********k
Posts: 2818
5
From thread: Biology board - NGS (second-generation sequencing) analysis, please weigh in
I did the ChIP part and the core prepared the libraries, and I believe the
same protocol was used for all samples. Apparently the TF groups had a much
lower amount of DNA to start with, and I was wondering whether that would
make background noise a much bigger issue, so that the percentage of mappable
reads is very low. I have three conditions: the percentage of mappable reads
decreases in the same way as the starting amount of ChIP DNA does. I
was wondering whether over-amplification could ... Read full post
j*p
Posts: 411
6
From thread: Biology board - NGS (second-generation sequencing) analysis, please weigh in
"for my TF, the data make sense to me but the core said it is trash/useless,
9-20% mappable reads (out of 9-11M, meant to get 20M) and peaks calling with
a FDR of 100%. "
A mouse sample with 20% x 11M = 2.2M mappable reads is useless for
publication, but it is still potentially usable for troubleshooting.
Possible reasons (most likely to least likely):
1. The antibody doesn't work and did not pull down anything, so there is no
signal enrichment at sites that are supposed to be TF-bound. The whole signal
should look no diffe... Read full post
n********k
Posts: 2818
7
From thread: Biology board - NGS (second-generation sequencing) analysis, please weigh in
This is so great, thank you very much.
1. The antibody is at least decent: this gene has been ChIPed many, many
times by many labs, and I confirmed the ChIP using qPCR...
2. I have been suspecting that we may have a library overload problem or an
over-amplification issue (if that makes sense). The core really followed the
histone protocol and was meant to get 20-30M reads, and it did for the histone
marks and input DNA. However, for my TF, the 1st is 9M with 9% mappable
reads (I expect less bind ev... Read full post
j*p
Posts: 411
8
From thread: Biology board - ChIP-seq on H3K4me3
1. The UCSC Genome Browser has many ChIP-seq tracks, including TFs, Pol2 and
histone modifications (H3K4me3, H3K27me3, H3K36me3, etc.), on different cell
lines (through the ENCODE/GENCODE projects). You can just activate these
tracks and compare them with what you have to see whether your H3K4me3 data
look normal.
2. I suggest you normalize your ChIP-seq signals. For example, if your normal
tissue ChIP-seq has a total of 10M mappable reads and your tumor sample has
20M mappable reads, then divide any signal from t... Read full post
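The post is cut off before it says exactly what to divide by. One common choice, assumed here, is to scale each sample to signal per million mappable reads. The function name and the toy signal values are hypothetical:

```python
def per_million(signal, total_mappable_reads):
    """Scale raw per-region signal to signal per million mappable reads."""
    scale = total_mappable_reads / 1_000_000
    return [s / scale for s in signal]


# Toy example: the tumor library is sequenced twice as deep as the normal one,
# so its raw counts are doubled even though the underlying signal is the same.
normal = per_million([50, 100, 30], 10_000_000)
tumor = per_million([100, 200, 60], 20_000_000)
```

After scaling, the two samples are directly comparable: both come out as 5, 10 and 3 per million mappable reads.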
j*p
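Posts: 411
9
I do pure data analysis in a wet lab. For NGS data analysis, here is a brief
introduction to some tools I have used and found quite useful. It is a bit of
a grab bag, offered only to get the discussion started; corrections welcome.
Next-gen sequencing (NGS) and the 3rd-gen sequencing now under development
will be used more and more widely in biological research. Believe it or not,
I am convinced. First, falling experimental costs ($1k whole-genome
sequencing is coming) put it within reach of more and more labs. Second, it
provides far more data and information than low-throughput experiments, so
you can see many things that could not be seen before. Third, the sequencers
themselves are steadily becoming more accurate, so the inherent experimental
error rate is dropping. Fourth, mature algorithms are now in use, so many
errors introduced by the experiment can be filtered out by analyzing the data
afterwards. By library preparation, NGS divides mainly into DNA-seq and
RNA-seq.
DNA-seq is usually used as ChIP-seq to study transcription factor (TF)-DNA
bi... Read full post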
a******g
Posts: 129
11
From thread: Biology board - NGS (second-generation sequencing) analysis, please weigh in
Your core is right. Normally, 70% mappable reads are mandatory for a good
ChIP-seq. 20% is way too low to convince people that this ChIP-seq is valid.

j*p
Posts: 411
12
From thread: Biology board - NGS (second-generation sequencing) analysis, please weigh in
Agree.
Unmapped reads could be caused by (not limited to):
1. Sequencing errors. These reads probably won't map to any genome.
2. Bacterial/viral contamination during library preparation. It isn't easy
to identify the contaminant if you don't have any candidates ahead of time;
if you do, it is pretty easy to confirm. We recently found that ~90% of our
unmapped reads could be mapped to a bacterial genome. This bacterium was used
in place of beads to pull down the protein. while
in o... Read full post
l**********1
Posts: 5204
13
If you are not limited to a bioinformatics-only approach to
> Now we want to do some quality control and remove some unreliable transcripts
your team should try the steps below, one by one:
1) Chromatin immunoprecipitation (ChIP) and DNA microarrays (chip)
2) ChIP-PCR
3) Expression RT-PCR
4) Reporter plasmid construction
5) Cell culture, transfection, and reporter assay
6) Western blots
cited from
//www.ncbi.nlm.nih.gov/pmc/articles/PMC2643481/
During the ChIP assay above you can of course also try the bioinformatics tools package below.
//info.gersteinlab.org/Tools#Ch... Read full post
P*********y
Posts: 41
14
From thread: Biology board - RNA-seq expression level question
First, you need biological replicates to get statistical significance. This
is required by most journals (especially high-profile ones). Second, 400M
reads per sample is overkill. For expression analysis, 5-10M mappable reads
are enough.
I don't think there is a standard RPKM cutoff value. It's arbitrary.
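For reference, RPKM is reads per kilobase of transcript per million mapped reads. A small sketch of the standard formula follows; the function name and the example numbers are hypothetical:

```python
def rpkm(reads_in_gene, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return reads_in_gene * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical gene: 500 reads on a 2 kb transcript in a 10M-read library.
value = rpkm(500, 2_000, 10_000_000)
```

Because the value depends on both gene length and library size, any fixed cutoff is a judgment call, which fits the point above that no standard threshold exists.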

n*********4
Posts: 99
15
From thread: Biology board - RNA-seq expression level question
'5-10M mappable reads are enough' holds for bacteria. It is too low for
eukaryotic organisms such as humans.

A**H
Posts: 4797
16
I want to normalize NGS read counts by GC content.
Here is a paper:
http://journals.plos.org/plosone/article?id=10.1371/journal.pon
There are parts of it I don't understand. I have read some online tutorials
and still don't quite get it, so I'm asking here. All I can offer in thanks
is baozi.
GC-content correction
First sentence:
To correct for sequencing biases that arise due to preferential sequencing
of certain levels of GC content, we normalize the read depth based on the GC
content of each bin.
Second sentence:
These values incorporate mapability information, such that we only consider
the GC content of mappable bases.
Third sentence:
To correct the data, we first c... Read full post
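A small sketch of what the second sentence seems to describe: computing a bin's GC content from mappable bases only. The per-base boolean mask, the function name, and the toy sequence are assumptions for illustration. The correction step itself is cut off in the post and is not reproduced here:

```python
def gc_of_mappable(seq, mappable):
    """GC fraction of a bin, counting only bases flagged as mappable."""
    kept = [base.upper() for base, ok in zip(seq, mappable) if ok]
    if not kept:
        return None  # no mappable bases in this bin
    return sum(base in "GC" for base in kept) / len(kept)


# Toy bin: the last four bases are treated as unmappable and are ignored.
bin_seq = "GGCCATATNN"
mask = [True] * 6 + [False] * 4

gc_masked = gc_of_mappable(bin_seq, mask)                # uses GGCCAT only
gc_naive = gc_of_mappable(bin_seq, [True] * len(bin_seq))  # uses all ten bases
```

In this toy bin the masked value is 4/6 while the naive value is 4/10, so ignoring mappability would understate the GC content of the bases that can actually receive reads.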

Posts: 1
17
From thread: Biology board - How do you align reads that contain indels?
It's an interesting question. To the best of my knowledge, no RNA-seq
alignment tool to date has been designed to tolerate CRISPR-mediated indels.
I'm not an RNA-seq expert, so it's possible that some are on the way. You
can also email Luca Pinello, the author of CRISPResso. He might know more,
and he's a really nice guy.
Here I'm trying to think about a solution. There are two situations, and I
don't know which one is your case.
a) The sample for RNA-seq was derived from a single mutant clone. I... Read full post
B*****U
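Posts: 38
18
From thread: Biology board - A question about ChIP-seq
Broadly correct, but you also need to consider IP efficiency.
For each individual region you also need to consider mappability, blacklists,
fragment size, and so on.
If the western shows a genome-wide change, you should also add a spike-in.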

y*******1
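Posts: 164
19
Studying repetitive regions is a bottleneck of ChIP-seq technology, or rather
of ChIP itself, and it probably cannot be overcome for now. After sonication
in ChIP, the DNA should average roughly 300 bp, but many repeats in the human
genome are far longer than that, so from the ChIP side the resolution is
simply not sufficient.
From the sequencing side, unless you use MiSeq, where read lengths of 300+
may be somewhat better and improve mappability, there is not much to gain.
So-called paired-end 100 bp is of little real help for studying repeats. For
the vast majority of alignment software, 100+100 does not equal 200.
Finally, there are some in silico methods. For example CSEM, which is similar
to RSEM, can use EM to map ChIP-seq reads, but the results are not very good.
Repeat research has a long way to go, but I suspect that after all the
studying, it will still turn out to be a pile of junk DNA :D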
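A toy illustration of why longer reads help in repeats: count how many read-length windows occur only once in a small made-up sequence containing a duplicated 8 bp element. The sequence, the function name, and the use of exact k-mer uniqueness as a stand-in for mappability are all simplifying assumptions. It does not model paired-end alignment:

```python
from collections import Counter


def unique_fraction(genome, k):
    """Fraction of start positions whose k-mer occurs exactly once."""
    kmers = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    counts = Counter(kmers)
    return sum(counts[kmer] == 1 for kmer in kmers) / len(kmers)


# Made-up sequence with the same 8 bp element inserted twice.
repeat = "ACGTTGCA"
genome = "TTAGGC" + repeat + "CATCGA" + repeat + "GGATCC"

short = unique_fraction(genome, 4)   # windows shorter than the repeat
long_ = unique_fraction(genome, 10)  # windows longer than the repeat
```

Windows shorter than the repeat land ambiguously inside it, while windows that span past it into unique flanking sequence can be placed. Real repeats that exceed the read or fragment length stay ambiguous, which is the limitation described above.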

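Posts: 1
20
Alu is right at 300 bp, so could paired-end seq see it?

: Studying repetitive regions is a bottleneck of ChIP-seq technology, or rather
: of ChIP itself, and it probably cannot be overcome for now. After sonication
: in ChIP, the DNA should average roughly 300 bp, but many repeats in the human
: genome are far longer than that, so from the ChIP side the resolution is
: simply not sufficient.
: From the sequencing side, unless you use MiSeq, where read lengths of 300+
: may be somewhat better and improve mappability, there is not much to gain.
: So-called paired-end 100 bp is of little real help for studying repeats. For
: the vast majority of alignment software, 100+100 does not equal 200.
: Finally, there are some in silico methods. For example CSEM, which is similar
: to RSEM, can use EM to map ChIP-seq reads, but the results are not very good.
: Repeat research has a long way to go, but I suspect that after all the
: studying, it will still turn out to be a pile of junk DNA :D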
