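Biology board - R programming interview questions left me wrecked; asking for help here. Not much money, but I will give out rewards generously (repost)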
x******0
Posts: 1490
1
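[The following text is reposted from the Statistics board]
From: xzxz0000 (凶涨凶涨), Board: Statistics
Subject: R programming interview questions left me wrecked; asking for help here. Not much money, but I will give out rewards generously
Posted at: BBS 未名空间站 (Sat Apr 14 20:21:09 2012, US Eastern)
I interviewed for a software position and thought I had a good shot, but nearly all the questions they gave me were R programming, which was not in the job description. I hope the experts here can help me work through them; I would be deeply grateful.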
1)
An analytical technique used in a molecular biology lab involves dispensing solutions of DNA in very low concentration into 384-well plates. Consider the perfectly random distribution of N=30 molecules onto this device.
a) What is the probability that two molecules fall in the same well? Derive the closed-form equation for this probability.
b) Plot the above expression for the probability as a function of the number of molecules dispensed.
c) Solve the same problem by means of a simulation. Include your R code and provide appropriate statements to ensure exact reproducibility of your simulation results.
d) Consider the case where the above-described device is affected by the so-called “edge effects”, that is, the probability of a molecule landing in the wells located on the edge of the plate is smaller than the probability of a molecule landing in any other well. Assuming that the probability ratio is 1/3, revise the simulation above to calculate the probability that half of the molecules are found in the center wells of the plate.
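For part d), a minimal sketch of one possible reading, written in Python rather than R. Several things here are assumptions, not part of the question: the plate is taken to be 16 rows by 24 columns, "probability ratio 1/3" is read as an edge well being one third as likely as a center well, "center wells" is read as every non-edge well, and "half of the molecules" is read as exactly 15 of 30. All names are made up for illustration.

```python
import random
from math import comb

# Assumptions (not stated in the question): a 16 x 24 layout for the 384-well
# plate, and "probability ratio 1/3" read as edge : center weight = 1 : 3.
ROWS, COLS = 16, 24
EDGE_WEIGHT, CENTER_WEIGHT = 1, 3


def is_edge(row, col):
    """True for wells in the outermost rows or columns (corners count once)."""
    return row in (0, ROWS - 1) or col in (0, COLS - 1)


WELLS = [(r, c) for r in range(ROWS) for c in range(COLS)]
WEIGHTS = [EDGE_WEIGHT if is_edge(r, c) else CENTER_WEIGHT for r, c in WELLS]
N_EDGE = sum(is_edge(r, c) for r, c in WELLS)
# Probability that a single molecule lands in a center (non-edge) well.
P_CENTER = CENTER_WEIGHT * (len(WELLS) - N_EDGE) / sum(WEIGHTS)


def simulate_center_counts(n_molecules=30, n_trials=10_000, seed=2012):
    """Return, for each trial, how many molecules landed in center wells."""
    rng = random.Random(seed)  # fixed seed for exact reproducibility
    counts = []
    for _ in range(n_trials):
        landed = rng.choices(WELLS, weights=WEIGHTS, k=n_molecules)
        counts.append(sum(not is_edge(r, c) for r, c in landed))
    return counts


def binomial_pmf(k, n, p):
    """Exact binomial probability, used to cross-check the simulation."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


if __name__ == "__main__":
    counts = simulate_center_counts()
    half = 30 // 2
    print("edge wells:", N_EDGE, "P(center):", P_CENTER)
    print("simulated P(exactly half in center):",
          sum(c == half for c in counts) / len(counts))
    print("exact     P(exactly half in center):",
          binomial_pmf(half, 30, P_CENTER))
```

Under these assumptions the number of molecules in center wells is binomial, so the exact value is available as a check. If the intended reading is "at least half", only the comparison on the counts changes.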
2)
For a given SNP with alleles a and A, the minor allele frequency is p. Assume that this frequency is the same for both males and females, that there is no migration in or out of the population, and that there is no selective advantage for either allele. The proportions of these alleles are stable in the population over time. Denote the possible genotype states by aa=1, Aa=2, and AA=3. The evolution of a population over time considering this SNP alone can be described, e.g., as follows: in the first step, a female is of genotype aa, so it is in state 1 (X_0 = 1). In the next step, a mate is selected at random and one or more daughters are produced, the eldest of whom has genotype X_1. In the following step, this daughter selects a mate at random and produces an eldest daughter with genotype X_2, and so on.
a) Calculate the transition matrix for the above Markov chain.
b) Show that this chain is ergodic. What is the smallest number of iterations, N, for which the power N of the transition matrix is strictly positive?
c) According to the Hardy-Weinberg law, this chain is supposed to have a steady-state distribution σ = [p^2, 2p(1-p), (1-p)^2]. Does this match the calculated steady state?
d) For p = [value missing in the source], simulate this chain for n=100,000 iterations and compare the sampling distribution of the simulated states with the one from the Hardy-Weinberg vector. Show your code and include statements to ensure exact reproducibility of your simulation.
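A rough sketch of how question 2 could be approached, in Python rather than R. It assumes p denotes the frequency of allele a, that a random mate passes on a with probability p and A with probability 1-p, and that a heterozygous mother passes on either allele with probability 1/2. The frequency for part d) did not survive in the post, so p = 0.3 below is only a placeholder, and all function names are made up.

```python
import random


def transition_matrix(p):
    """Rows/columns are the states aa, Aa, AA (mother -> eldest daughter)."""
    q = 1.0 - p
    return [[p,     q,   0.0],    # aa mother always passes on a
            [p / 2, 0.5, q / 2],  # Aa mother passes on a or A with prob. 1/2
            [0.0,   p,   q]]      # AA mother always passes on A


def matmul(a, b):
    """Plain-Python matrix product, enough for 3 x 3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def hardy_weinberg(p):
    """Hardy-Weinberg genotype proportions for aa, Aa, AA."""
    q = 1.0 - p
    return [p * p, 2 * p * q, q * q]


def simulate_chain(p, n_steps=100_000, start=0, seed=2012):
    """Empirical state frequencies; the fixed seed makes runs reproducible."""
    rng = random.Random(seed)
    P = transition_matrix(p)
    state, visits = start, [0, 0, 0]
    for _ in range(n_steps):
        state = rng.choices(range(3), weights=P[state])[0]
        visits[state] += 1
    return [v / n_steps for v in visits]


if __name__ == "__main__":
    p = 0.3  # placeholder value, not from the original question
    P = transition_matrix(p)
    print("P^2 strictly positive:",
          all(x > 0 for row in matmul(P, P) for x in row))
    print("sigma * P    :", matmul([hardy_weinberg(p)], P)[0])
    print("Hardy-Weinberg:", hardy_weinberg(p))
    print("simulated     :", simulate_chain(p))
```

Under these assumptions the matrix itself has two zero entries (aa to AA and AA to aa) while its square has none, which suggests N = 2 for part b), and multiplying the Hardy-Weinberg vector by the matrix returns the same vector.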
3)
A certain molecular analyte, comprised of long quasi-linear macromolecules with approximately constant length l, is analyzed with the help of a specialized detector. This detector is assembled in the form of many parallel long strips, each strip of width L. If the macromolecules are randomly distributed across the surface of the detector, what is the probability that such a macromolecule would cross the boundary between two strips? (Assume that l is smaller than L and ignore any “edge effect”, i.e., assume the detector has a large surface.)
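Question 3 looks like the classic Buffon's needle setup. Treating each macromolecule as a straight rigid segment of length l, with uniformly random position and orientation (an idealization of "quasi-linear"), the crossing probability for l <= L is 2l/(πL). Below is a small Python sketch, with made-up function names, that checks this value by simulation.

```python
import math
import random


def crossing_probability(l, L):
    """Buffon's needle result for a segment of length l on strips of width L."""
    if not 0 < l <= L:
        raise ValueError("this formula assumes 0 < l <= L")
    return 2 * l / (math.pi * L)


def simulate_crossing(l, L, n_trials=200_000, seed=2012):
    """Monte Carlo estimate; the fixed seed makes the run reproducible."""
    rng = random.Random(seed)
    crossings = 0
    for _ in range(n_trials):
        # Distance from the segment's midpoint to the nearest strip boundary.
        d = rng.uniform(0, L / 2)
        # Acute angle between the segment and the direction of the strips.
        theta = rng.uniform(0, math.pi / 2)
        # The segment reaches the boundary if its half-length, projected
        # perpendicular to the strips, covers the distance d.
        if d <= (l / 2) * math.sin(theta):
            crossings += 1
    return crossings / n_trials


if __name__ == "__main__":
    l, L = 1.0, 2.0  # illustrative lengths only
    print("formula   :", crossing_probability(l, L))
    print("simulation:", simulate_crossing(l, L))
```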
b******y
Posts: 627
2
I'm not entirely clear on the questions, but I'll try to answer some.
a) 1-(384!/354!)/(384^30)
b) Plot 1-(384!/(384-n)!)/(384^n) as a function of n.
c) Should be easy as long as the number of Monte Carlo runs is large enough.
d) Make sure to count the number of wells on the edge correctly, i.e., don't double-count the four corner wells.
More later if time permits.
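The closed form in a) and the Monte Carlo check in c) can be sketched as follows, in Python rather than R. The product form is used instead of the factorials only to avoid huge intermediate numbers; it is algebraically the same expression. Function names and the seed are illustrative choices.

```python
import math
import random


def p_shared_well(n_molecules, n_wells=384):
    """P(at least two molecules share a well) = 1 - M!/((M-n)! * M^n)."""
    if n_molecules > n_wells:
        return 1.0  # pigeonhole: more molecules than wells
    p_all_distinct = math.prod(
        (n_wells - k) / n_wells for k in range(n_molecules))
    return 1.0 - p_all_distinct


def simulate_shared_well(n_molecules=30, n_wells=384,
                         n_trials=50_000, seed=2012):
    """Fraction of simulated plates on which some well holds 2+ molecules."""
    rng = random.Random(seed)  # fixed seed for exact reproducibility
    hits = 0
    for _ in range(n_trials):
        wells = [rng.randrange(n_wells) for _ in range(n_molecules)]
        if len(set(wells)) < n_molecules:
            hits += 1
    return hits / n_trials


if __name__ == "__main__":
    print("closed form, n=30:", p_shared_well(30))
    print("simulation,  n=30:", simulate_shared_well())
    # Values for the curve in b); plotting is left to any plotting tool.
    for n in range(0, 101, 10):
        print(n, round(p_shared_well(n), 4))
```

With enough trials the simulated fraction should settle close to the closed-form value, which is a quick way to confirm the formula before plotting it.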
x******0
Posts: 1490
3
I really can't pick up R in a single day, so I wrote it in Perl. It seems to give something different from yours; the main part is:
my $prob=0;
for(my $i = 2; $i<=$nofN; $i++)
{
$prob += (1 - $prob) / ($nofM - $i + 2);
}
print "\nProbability that two molecules fall in the same well is " . sprintf("%.4f", $prob) . "\n";
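One possible explanation for the mismatch, assuming $nofN is the number of molecules and $nofM the number of wells (the post does not say). Each pass of the Perl loop multiplies the no-collision probability by (M-i+1)/(M-i+2), and that product telescopes to (M-n+1)/M, so the loop returns (n-1)/M. Given no collision so far, molecule i lands in an occupied well with probability (i-1)/M, so the increment would need to be (1 - prob)*(i-1)/M. The Python sketch below, with made-up names, compares the two loops against the closed form.

```python
def perl_loop(n_molecules, n_wells):
    """Literal translation of the Perl loop in the post."""
    prob = 0.0
    for i in range(2, n_molecules + 1):
        prob += (1 - prob) / (n_wells - i + 2)
    return prob


def corrected_loop(n_molecules, n_wells):
    """Same loop, using (i - 1) / M as the chance that molecule i collides."""
    prob = 0.0
    for i in range(2, n_molecules + 1):
        prob += (1 - prob) * (i - 1) / n_wells
    return prob


def closed_form(n_molecules, n_wells):
    """1 - M!/((M-n)! * M^n), evaluated as a running product."""
    no_collision = 1.0
    for k in range(n_molecules):
        no_collision *= (n_wells - k) / n_wells
    return 1.0 - no_collision


if __name__ == "__main__":
    n, m = 30, 384
    print("Perl loop     :", perl_loop(n, m))       # equals (n - 1) / m
    print("corrected loop:", corrected_loop(n, m))
    print("closed form   :", closed_form(n, m))
```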
l**********1
Posts: 5204
4
Once you understand most of the R script code below,
you will certainly be able to solve the 384-well plate problem.
DEGseq R script, main branch:
cited:
###################################################
### code chunk number 4: DEGseq.Rnw:244-252
###################################################
kidneyR1L1 <- system.file("extdata", "kidneyChr21.bed.txt", package="DEGseq")
liverR1L2 <- system.file("extdata", "liverChr21.bed.txt", package="DEGseq")
refFlat <- system.file("extdata", "refFlatChr21.txt", package="DEGseq")
mapResultBatch1 <- c(kidneyR1L1) ## only use the data from kidneyR1L1 and liverR1L2
mapResultBatch2 <- c(liverR1L2)
outputDir <- file.path(tempdir(), "DEGseqExample")
DEGseq(mapResultBatch1, mapResultBatch2, fileFormat="bed", refFlat=refFlat,
outputDir=outputDir, method="LRT")
Other script link:
//bioconductor.org/packages/release/bioc/vignettes/DEGseq/inst/doc/DEGseq.R
original paper:
//www.ncbi.nlm.nih.gov/pubmed/19855105
or
//bioconductor.org/packages/release/bioc/html/DEGseq.html
more details:
//bioinfo.au.tsinghua.edu.cn/software/degseq/
Related e-book links:
//bioconductor.org/packages/release/bioc/vignettes/DEGseq/inst/doc/DEGseq.pdf
or
//bioconductor.org/packages/release/bioc/manuals/DEGseq/man/DEGseq.pdf


m*********r
Posts: 2456
5
R is rather a hassle.


t******s
Posts: 55
6
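I've seen lazy, but never this lazy.
It's one thing to ask about a company's interview questions you couldn't solve, but you should at least paraphrase them; instead you copied them verbatim.
When you received the questions you also agreed not to share them outside the company. If people at the company now discover a bunch of Chinese people discussing them, will the company still want to hire Chinese candidates in the future? And there are quite a few Chinese employees at this company.
I suggest that you or the moderator delete this post.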
s******y
Posts: 28562
7
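These words are a bit harsh, but they are right. The original poster's approach really isn't very professional, and the questions have been posted all over the place on top of that. At the very least, some of the wording in the questions should be changed; otherwise it will be trouble if the company's HR happens to find this.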

