j*p Posts: 411 | 1 Agree.
Unmapped reads could be caused by (among other things):
1. Sequencing errors. These reads probably won't map to any genome.
2. Bacterial/viral contamination during library preparation. It won't be
easy to identify which contaminant it is if you don't have any candidates
ahead of time; however, if you do, it is pretty easy to confirm. We
recently found that ~90% of our unmapped reads could be mapped to a bacterial
genome. This bacterium was used in place of beads to pull down the protein.
While in o... |
|
l**********1 Posts: 5204 | 2 Samtools: the FLAG field inside a SAM file
SEQanswers - Bioinformatics: I got a read aligned as below:
HWI-1KL138:2:2105:12847:125331#GCCAATGCCAAT 161 chr1 12036 1 74M = 12645 1188
CTGTGCCAGGGTGCAAGCTGAGCACTGGAGTGGAGTTTTCCTGTGGAGAGGAGCCATGCCTAGAGTGGGATGGG
hhhhhhhhhhhghhhhhhhhhhhhhhhhhheffffdffffhhhghhehefhhhhhghhfhfffWdbfffbffff
NM:i:0 NH:i:3 CC:Z:chr15 CP:i:102519061 HI:i:0
The flag here is 161 = 128 + 32 + 1: 128 means the read is the second in the
pair; 32 means its mate is on the reverse strand; 1 means the read is paired.
I am wondering why th... |
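To decode a FLAG value like 161 bit by bit, a small C helper can test each bit defined in the SAM format specification. This is a minimal sketch; the function name explain_sam_flag and the table layout are illustrative only, not part of samtools or htslib.

```c
#include <stdio.h>
#include <stddef.h>

/* Bits of the SAM FLAG field, as listed in the SAM format specification. */
static const struct { unsigned bit; const char *meaning; } SAM_FLAG_BITS[] = {
    {0x1,   "paired (template has multiple segments)"},
    {0x2,   "properly paired (each segment properly aligned)"},
    {0x4,   "read unmapped"},
    {0x8,   "mate unmapped"},
    {0x10,  "read on reverse strand"},
    {0x20,  "mate on reverse strand"},
    {0x40,  "first in pair (read 1)"},
    {0x80,  "second in pair (read 2)"},
    {0x100, "secondary alignment"},
    {0x200, "fails quality checks"},
    {0x400, "PCR or optical duplicate"},
    {0x800, "supplementary alignment"},
};

/* Hypothetical helper: print every recognized bit set in `flag` and
   return how many of them were set. */
size_t explain_sam_flag(unsigned flag, FILE *out) {
    size_t n = 0;
    for (size_t i = 0; i < sizeof SAM_FLAG_BITS / sizeof SAM_FLAG_BITS[0]; i++) {
        if (flag & SAM_FLAG_BITS[i].bit) {
            fprintf(out, "0x%03x (%4u): %s\n",
                    SAM_FLAG_BITS[i].bit, SAM_FLAG_BITS[i].bit,
                    SAM_FLAG_BITS[i].meaning);
            n++;
        }
    }
    return n;
}
```

For 161 it reports bits 0x1, 0x20 and 0x80; bit 0x2 (properly paired) is not set for this read.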
|
u*********1 Posts: 2518 | 3 samtools view -F 4 map.bam
4 (in decimal) or 0x0004 (in hex) indicates that the query read is unmapped.
If you use the filter option -F 4, unmapped reads are removed, so the output
contains only mapped reads.
This filter keeps every mapped read regardless of whether its mate is mapped.
If you'd like to take the mate into account, please look at the link below:
http://www.biostars.org/p/14518/ |
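The -F 4 filter amounts to dropping every record whose FLAG has bit 0x4 set. As a rough sketch of that bit test only (samtools itself reads BAM through htslib and does not work like this), the C code below filters uncompressed SAM text. The names keep_mapped, sam_line_flag and SAM_FUNMAP are hypothetical, and the fixed 64 KiB line buffer is an assumption that very long records would break.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SAM_FUNMAP 0x4u  /* the "read unmapped" bit that -F 4 excludes */

/* Parse FLAG, the 2nd tab-separated column of a SAM alignment line.
   Returns -1 if the line has no parsable FLAG. */
long sam_line_flag(const char *line) {
    const char *tab = strchr(line, '\t');
    if (tab == NULL) return -1;
    char *end;
    unsigned long flag = strtoul(tab + 1, &end, 10);
    if (end == tab + 1 || (*end != '\t' && *end != '\n' && *end != '\0'))
        return -1;
    return (long)flag;
}

/* Copy SAM text from `in` to `out`, keeping header lines ('@') and the
   alignments whose FLAG lacks the unmapped bit. Returns alignments kept. */
size_t keep_mapped(FILE *in, FILE *out) {
    char line[65536];  /* assumption: every record fits in 64 KiB */
    size_t kept = 0;
    while (fgets(line, sizeof line, in) != NULL) {
        if (line[0] == '@') {
            fputs(line, out);
            continue;
        }
        long flag = sam_line_flag(line);
        if (flag >= 0 && (flag & SAM_FUNMAP) == 0) {
            fputs(line, out);
            kept++;
        }
    }
    return kept;
}
```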
|
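x*****d Posts: 704 | 4
If you did the mapping with TopHat, the output includes the unmapped reads. Run
a de novo assembly of these unmapped reads with Trinity and see which strain
they come from. Since 97% of your genome maps, what you sequenced is unlikely
to be a new species. |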
|
t**x Posts: 20965 | 5 Is this actually reliable? Has the OP returned to China?
Not surprisingly, we found a direct correlation (R² = 0.8) between the amount
of bacterial DNA in the samples and the proportion of reads that did not map
to the human reference genome (see the chart below). For the blood samples,
which were confirmed via qPCR to contain virtually no bacterial DNA, an
average of 4% of reads did not align to the human reference, and this value
was used as a background correction for all samples. For the Oragene/saliva
samples, 5.3% of the total reads, ... |
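The quoted post does not say how its R² or its background correction were computed. Assuming R² is the squared Pearson correlation of a simple linear fit, and that the correction simply subtracts the 4% blood-sample baseline (clamped at zero), the arithmetic could be sketched like this; both function names are hypothetical.

```c
#include <math.h>
#include <stddef.h>

/* Squared Pearson correlation (R^2 of a simple linear fit) between x and y.
   Returns NAN if n < 2 or either series has zero variance. */
double r_squared(const double *x, const double *y, size_t n) {
    if (n < 2) return NAN;
    double mx = 0, my = 0;
    for (size_t i = 0; i < n; i++) { mx += x[i]; my += y[i]; }
    mx /= n;
    my /= n;
    double sxy = 0, sxx = 0, syy = 0;
    for (size_t i = 0; i < n; i++) {
        double dx = x[i] - mx, dy = y[i] - my;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    if (sxx == 0 || syy == 0) return NAN;
    return (sxy * sxy) / (sxx * syy);
}

/* Assumed correction: subtract the background unmapped fraction
   (e.g. 0.04 from the blood samples), never going below zero. */
double background_corrected(double unmapped_fraction, double background) {
    double v = unmapped_fraction - background;
    return v > 0 ? v : 0;
}
```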
|
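S*A Posts: 7142 | 9 Good question.
malloc can generally get address space from two places. One is the heap:
the brk system call adjusts the highest usable address of the heap. But how
far brk can extend the heap is limited, at least on x86.
The other is mmap: mapping anonymous pages gives you new usable memory.
But all of this is a separate matter from the system's memory. What the
kernel sees is raw pages. These pages can be bound, as needed, into some
virtual address space, and this binding is not fixed. The kernel can unmap
a page from a process and use it for something else; when that process
touches the page again, a page fault occurs, and the kernel knows where to
go to bring the page's contents back: usually some file if the page is
file-backed, or the swap file if it has no file backing.
So here we have... |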
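To see the two sources side by side, the sketch below grows the heap with sbrk (the libc wrapper around brk) and, separately, requests an anonymous private mapping with mmap. It assumes Linux/glibc; grow_heap and map_anonymous are hypothetical names, and calling sbrk directly in a program that also uses malloc is generally discouraged outside small experiments like this one.

```c
#define _DEFAULT_SOURCE  /* exposes sbrk and MAP_ANONYMOUS on glibc */
#include <stdint.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Source 1: move the program break up by `incr` bytes.
   Returns the old break (start of the new heap space) or NULL on failure. */
void *grow_heap(intptr_t incr) {
    void *old_break = sbrk(incr);
    return old_break == (void *)-1 ? NULL : old_break;
}

/* Source 2: ask the kernel for `len` bytes of zero-filled anonymous memory
   that is not backed by any file (so it would go to swap if paged out). */
void *map_anonymous(size_t len) {
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

On Linux, both calls mostly reserve address space; physical pages are usually supplied only when each page is first touched, through the page-fault path described above.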
|
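S*A Posts: 7142 | 10 In this situation, writing your own low-level management
of mmap/munmap/mremap areas is probably the most direct way to control memory.
With C++ quietly doing a lot of memory allocation behind your back, this
problem is probably hard to solve simply. |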
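A rough idea of what managing a region yourself could look like: a buffer created with mmap, grown with mremap, and released with munmap, never touching malloc. mremap and MREMAP_MAYMOVE are Linux-specific, and struct mapped_buf and the mbuf_* functions are hypothetical names for this sketch.

```c
#define _GNU_SOURCE  /* mremap and MREMAP_MAYMOVE are Linux/glibc-specific */
#include <stddef.h>
#include <sys/mman.h>

/* A buffer whose memory comes straight from the kernel, bypassing malloc. */
struct mapped_buf {
    unsigned char *data;
    size_t len;  /* current mapping length in bytes */
};

/* Create a private anonymous mapping of `len` bytes. Returns 0 on success. */
int mbuf_init(struct mapped_buf *b, size_t len) {
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;
    b->data = p;
    b->len = len;
    return 0;
}

/* Resize the mapping. The kernel may move it, so b->data can change;
   contents are preserved up to the smaller of the old and new lengths. */
int mbuf_resize(struct mapped_buf *b, size_t new_len) {
    void *p = mremap(b->data, b->len, new_len, MREMAP_MAYMOVE);
    if (p == MAP_FAILED) return -1;
    b->data = p;
    b->len = new_len;
    return 0;
}

/* Return the pages to the kernel and reset the descriptor. */
int mbuf_free(struct mapped_buf *b) {
    int rc = munmap(b->data, b->len);
    b->data = NULL;
    b->len = 0;
    return rc;
}
```

Because MREMAP_MAYMOVE lets the kernel relocate the mapping, any pointers into the old data must be refreshed after a resize.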
|
c********e Posts: 598 | 11
...unmapped.
Thanks for the reply. I have already mapped the reads; I now need to further
enrich for reads in the genomic regions I am interested in, using my own rules. |
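For plain region queries, samtools view on a sorted, indexed BAM already accepts a region such as chr1:12000-13000. When the selection rules are custom, a helper like the C sketch below can decide whether an alignment overlaps a region by computing how many reference bases its CIGAR covers. The names cigar_ref_len and overlaps_region are hypothetical, and coordinates are 1-based and inclusive, matching the SAM POS field.

```c
#include <stdlib.h>
#include <string.h>

/* Reference bases covered by a CIGAR string: ops M, D, N, = and X consume
   the reference; I, S, H and P do not. Returns -1 for "*" or bad input. */
long cigar_ref_len(const char *cigar) {
    if (strcmp(cigar, "*") == 0) return -1;
    long total = 0;
    const char *p = cigar;
    while (*p) {
        char *end;
        long n = strtol(p, &end, 10);
        if (end == p || n < 0 || *end == '\0') return -1;
        if (strchr("MDN=X", *end)) total += n;
        else if (!strchr("ISHP", *end)) return -1;
        p = end + 1;
    }
    return total;
}

/* Does an alignment on `rname` starting at 1-based `pos` with CIGAR `cigar`
   overlap the 1-based, inclusive region chrom:start-end? */
int overlaps_region(const char *rname, long pos, const char *cigar,
                    const char *chrom, long start, long end) {
    if (strcmp(rname, chrom) != 0) return 0;
    long len = cigar_ref_len(cigar);
    if (len <= 0) return 0;
    return pos <= end && pos + len - 1 >= start;
}
```

For example, the read quoted earlier (chr1, POS 12036, CIGAR 74M) covers chr1:12036-12109.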
|