Page 3 - Roundup of discussions about glm - Topic Queen (话题女王)

w********t
Posts: 96

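Hi everyone!
I want to teach myself some statistics courses. Our school has just started a new statistics program, so there isn't really anyone I can ask.
The course names are below. Based on the names, I hope you can recommend some classic books; master's level is fine. If you can point me to e-book downloads, I'd be extremely grateful!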
Fundamentals of Probability
Contemporary Statistical Inference
Advanced Regression Analysis I (L&GLM)
Advanced Regression Analysis II (GLM&LDA)
Applied Survival Analysis
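Introduction to Statistical Computing (what kind of course is this? Does it teach computational theory or a particular piece of software?)
Thanks, everyone, for your valuable advice!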

l***a
Posts: 12410

From thread: Statistics board - Anyone doing small-scale parallel computing in R?

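Could you give a simple example of how to do it correctly?
I tried glm and logistic (because SAS has optimized glm for multithreading, while logistic cannot use multithreading yet), but after parallel programming the code ran no more efficiently than before.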

S********a
Posts: 359

From thread: Statistics board - 【Baozi reward】Newbie question about dummy variables

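Suppose a variable var1 takes four values (it is a categorical variable): a, b, c, d. I run GLM model 1 as below, where var2 and var3 are also categorical variables:
model y=var1 var2 var3;
I then split var1 into 4 dummy variables: var1a=1 if var1=a, otherwise var1a=0, and likewise for var1b, var1c, var1d. Then I run GLM model 2 as below:
model y=var1b var1c var1d var2 var3;
In both models I use var1=a as the reference group. Can I say the two models are equivalent? If so, why are the standard errors, the p-values, and even the predicted Y for the same combination of covariate values not equal, although the differences are small?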
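A minimal sketch of why the two fits should agree, in pure Python with made-up data (one extra categorical variable instead of two, for brevity): switching between full-rank dummy codings of var1, reference a versus reference d, only reparameterizes the model, so the coefficients change but the fitted values match to rounding error. As an aside I'm fairly but not completely sure of, PROC GLM's CLASS parameterization zeroes out the last level (d here), not a. If two SAS runs really give different predictions, my guess (not confirmed by the post) is that they are not the same model in some other respect, for example var2/var3 treated as CLASS in one run and numeric in the other, or different rows dropped for missing values.

```python
# Made-up data: var1 has levels a-d, var2 has two levels (h, l).

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_fitted(rows, y):
    """OLS via the normal equations X'X beta = X'y; returns the fitted values."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return [sum(b * v for b, v in zip(beta, r)) for r in rows]


def design(var1, var2, ref):
    """Intercept, a 0/1 dummy for each var1 level except `ref`, and a 0/1 dummy for var2 == 'h'."""
    levels = [lv for lv in sorted(set(var1)) if lv != ref]
    return [[1.0] + [1.0 if v == lv else 0.0 for lv in levels] + [1.0 if w == "h" else 0.0]
            for v, w in zip(var1, var2)]


var1 = list("abcdabcdabcdabcd")
var2 = list("hhhhllllhhhhllll")
y = [3.1, 4.0, 5.2, 6.1, 2.9, 3.8, 5.0, 6.3, 3.3, 4.2, 4.9, 6.0, 2.7, 3.9, 5.4, 5.8]

fit_ref_a = ols_fitted(design(var1, var2, "a"), y)  # reference group var1 = a
fit_ref_d = ols_fitted(design(var1, var2, "d"), y)  # reference group var1 = d
# The coefficients differ between the two codings, but the predictions should not:
print(max(abs(u - v) for u, v in zip(fit_ref_a, fit_ref_d)))  # rounding-error level
```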

c**********e
Posts: 2007

From thread: Statistics board - Isn't this stupid?

In the following procedure, how to write a test (or contrast) statement to
test a=b=0?
proc glm data=one;
class z;
model y=a b c z;
*TEST a=0, b=0;  /* This statement works in PROC REG, but not in PROC GLM. */
run;
I feel very stupid; either it's me or it's SAS. By the way, I have been using SAS for
10+ years.
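A minimal sketch of what "test a=b=0" computes, in pure Python with made-up data (the class variable z is left out for brevity): fit the full model and the model with a and b dropped, then form the F statistic from the two residual sums of squares with q = 2 numerator degrees of freedom. Within PROC GLM itself, I believe a two-row CONTRAST statement such as `contrast 'a=b=0' a 1, b 1;` requests this joint test, but please check that against the PROC GLM documentation rather than take my word for it.

```python
# Joint F test of H0: a = b = 0 via full vs. reduced OLS fits (made-up data).

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def rss(columns, y):
    """Residual sum of squares from OLS of y on an intercept plus the given columns."""
    rows = [[1.0] + [float(col[i]) for col in columns] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))


a = [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]
b = [1, 1, 0, 0, 2, 2, 1, 0, 0, 1, 2, 0]
c = [5, 3, 4, 1, 2, 6, 0, 3, 5, 2, 1, 4]
noise = [0.1, -0.1, 0.05, -0.05, 0.0, 0.08, -0.08, 0.02, -0.02, 0.1, -0.1, 0.0]
y = [1 + 2 * ai + 0.5 * ci + e for ai, ci, e in zip(a, c, noise)]

rss_full = rss([a, b, c], y)   # unrestricted model: intercept, a, b, c
rss_reduced = rss([c], y)      # model with a = b = 0 imposed
q = 2                          # number of restrictions
df_resid = len(y) - 4          # n minus the parameters of the full model
f_stat = ((rss_reduced - rss_full) / q) / (rss_full / df_resid)
print(f_stat, q, df_resid)     # refer f_stat to an F(q, df_resid) distribution
```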

c****y
Posts: 94

From thread: Statistics board - A question for everyone

proc glm data=upsit;
class agegroup;
model smell = agegroup;
means agegroup / hovtest=LEVENE;
run;
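The mean does not depend much on the underlying distribution, so when comparing you only need to account for unequal variances. Levene's test in proc glm can compare the variances, and it is also reasonably robust to departures from the normal distribution.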
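A pure-Python sketch of the statistic behind HOVTEST=LEVENE, with made-up data: Levene's test is a one-way ANOVA on each observation's deviation from its group mean. The classic form uses absolute deviations; if I read the SAS documentation correctly, PROC GLM's HOVTEST=LEVENE defaults to squared deviations (TYPE=SQUARE), with TYPE=ABS for the absolute version, so the sketch offers both. It returns only F and its degrees of freedom; the p-value would come from an F(k-1, N-k) table or a stats library.

```python
def levene_f(groups, squared=False):
    """Levene's test: one-way ANOVA on |x - group mean| (or its square).

    Returns (F, df_between, df_within)."""
    z = []
    for g in groups:
        m = sum(g) / len(g)
        z.append([(x - m) ** 2 if squared else abs(x - m) for x in g])
    k = len(z)
    n_total = sum(len(g) for g in z)
    grand = sum(sum(g) for g in z) / n_total
    means = [sum(g) / len(g) for g in z]
    ss_between = sum(len(g) * (mi - grand) ** 2 for g, mi in zip(z, means))
    ss_within = sum((v - mi) ** 2 for g, mi in zip(z, means) for v in g)
    df_b, df_w = k - 1, n_total - k
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w


# Made-up smell scores for three age groups (illustrative only).
young = [12.1, 11.8, 12.5, 12.0, 11.9]
middle = [11.0, 12.4, 10.8, 12.9, 11.6]
older = [8.0, 13.5, 9.1, 14.2, 10.4]
print(levene_f([young, middle, older]))                # absolute deviations
print(levene_f([young, middle, older], squared=True))  # squared deviations
```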

h***x
Posts: 586

From thread: Statistics board - Lots of jobs (sas programmer/biostatistician) posted

CALIFORNIA:
0000087272
SAS Programmer (12m)
Bachelor's or Masters in Computer Science or other relevant (Engineering)
degrees with 5+ years of pharmaceutical experience preferred- The work
experience should include at least two years of technical leadership in a
statistical programming environment in a pharmaceutical or biotechnology
environment including the analysis and reporting of clinical trial data-
Knowledge and application of p-values, confidence intervals, linear
regression analysis, ad...

z****a
Posts: 58

From thread: Statistics board - Anyone doing non-traditional pricing? (reposted)

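【The following text is reposted from the Actuary board】
From: zzbaba (遛狗走遍世界), Board: Actuary
Title: Anyone doing non-traditional pricing?
Site: BBS 未名空间站 (Tue Feb 12 18:29:32 2013, US Eastern)
I just got an offer from a German industrial group to do actuarial pricing for the chemical products of the group's international subsidiaries, mainly with GLM models, using SAS and the EMB suite. I interviewed with them purely out of curiosity. The manager said proudly that they were the first company in the industry to use GLM for pricing, had been using it for several years, and would not change in the foreseeable future.
Personally I like exploring new things in unfamiliar fields. Does any expert here have similar experience?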

d******e
Posts: 7844

From thread: Statistics board - Let's discuss: classification labels are very imbalanced

This shows you haven't understood where the problem lies.
> n = 100000
> X = matrix(runif(n*2),n,2)
> y0 = sign((X[,1]<0.1)-0.5)
> y = (y0*sign(runif(n)-0.1)+1)/2
> sum(y==1)
[1] 17998
> sum(y==0)
[1] 82002
> out = glm(y~X,family="binomial")
> yhat=sign(cbind(X,rep(1,n))%*%out$coefficients>0)
> sum((yhat==1)*(y==1))
[1] 2
> sum(yhat==y)
[1] 82003
> idx1 = which(y==1)
> idx0 = which(y==0)[1:length(idx1)]
> out = glm(y[c(idx0,idx1)]~X[c(idx0,idx1),],family="binomial")
> yhat=sign(cbind(X,rep(1,n))%*%out$coefficients>0)
> sum((yhat==1)*(y==1...
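A pure-Python sketch of the same experiment (an illustration, not the poster's code; n, the seed, and the Newton-Raphson fitter are my choices): fit logistic regression on all the data and on a balanced subsample (all positives plus an equal number of negatives), then count predicted positives on the full data. One detail worth checking in the R session above: R's glm returns coefficients in the order (Intercept), X1, X2, so the matching design matrix is cbind(1, X); cbind(X, rep(1, n)) puts the intercept column last, which appears to pair each coefficient with the wrong column. The sketch keeps the intercept first.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_logistic(rows, y, max_iter=30, tol=1e-10):
    """Logistic regression by Newton-Raphson; each row starts with 1.0 (intercept)."""
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for r, yi in zip(rows, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, r))))
            w = mu * (1.0 - mu)
            for i in range(p):
                grad[i] += (yi - mu) * r[i]
                for j in range(p):
                    hess[i][j] += w * r[i] * r[j]
        step = solve(hess, grad)  # Newton step (X'WX)^{-1} X'(y - mu)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def count_positive(beta, rows):
    """Predict 1 when b0*1 + b1*x1 + b2*x2 > 0 (coefficients in intercept-first order)."""
    return sum(1 for r in rows if sum(b * v for b, v in zip(beta, r)) > 0)


rng = random.Random(1)
n = 10000
rows, y = [], []
for _ in range(n):
    x1, x2 = rng.random(), rng.random()
    label = 1 if x1 < 0.1 else 0
    if rng.random() < 0.1:  # flip 10% of labels, like sign(runif(n) - 0.1) in the R code
        label = 1 - label
    rows.append([1.0, x1, x2])  # intercept column first
    y.append(label)

beta_full = fit_logistic(rows, y)
pos = [i for i in range(n) if y[i] == 1]
neg = [i for i in range(n) if y[i] == 0][:len(pos)]
beta_bal = fit_logistic([rows[i] for i in neg + pos], [y[i] for i in neg + pos])

print("share of y == 1:", sum(y) / n)
print("predicted positives, full fit:", count_positive(beta_full, rows))
print("predicted positives, balanced fit:", count_positive(beta_bal, rows))
```

Undersampling the majority class mainly shifts the intercept upward, which is why the balanced fit labels far more points as positive at the same 0.5 threshold.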

I*****a
Posts: 5425

From thread: Statistics board - Let's discuss: classification labels are very imbalanced

That doesn't really count, does it?
n = 1000 # training size
ntest = 1000 # test size; make this big only for illustration
id.train = 1:n
id.test = (n + 1):(n + ntest)
ratio = 0.99
n0 = round(n * ratio)
n1 = n - n0
nsimu = 100
res = NULL
for (i in 1:nsimu){
p = c(runif(n0, 0, 0.5), runif(n1, 0.5, 1), runif(ntest, 0.6, 1) )
y = sapply(p, function(x){rbinom(n = 1, size = 1, prob = x)})
x = log(p / (1 - p)) # beta is c(0, 1)
dat = data.frame(x = x, y = y)
f...

c***z
Posts: 6348

From thread: Statistics board - Newbie question: what exactly is a fixed/random effect model?

As usual.
If you use LM, then the usual LM way; if you use GLM, then the usual GLM way
...

A*******s
Posts: 3942

From thread: Statistics board - What are the job prospects for model validation?

Model validation should cover all of a bank's models.
Speaking as a model developer who has worked in risk and AML,
besides the logistic regression you mentioned,
the things I have worked on include:
model and simulate panel data with temporal and spatial correlation;
competing risk Cox model;
various forecasting models with exogenous variables;
GLM & Double GLM;
Copula;
retrospective case-control matching;
likelihood based missing data analysis;
text clustering and classification
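So bank models are not as monotonous as you think,
but it depends entirely on whether the boss cares about rigorous models....
If not,
banks have a whole pile of quick & dirty methods cooked up by smart people (who unfortunately don't know math or statistics)
that are guaranteed to make anyone formally trained who sees them...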

R******d
Posts: 1436

From thread: Statistics board - Between-group differences at single time points in repeated measures

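I usually use proc glm for repeated-measures analysis; it gives the group difference, the time difference, the interaction, and the between-group differences at single time points all at once.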
proc glm data=data2;
class group;
model t1-t6=group;
repeated time 6 (1 2 3 4 5 6);
run;
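Now I want to switch to proc mixed. The group difference, time difference, and interaction are all there, but the between-group differences at single time points are not output. How should I write it so that they are?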
proc mixed data=data;
class group individual time;
model col1=group time group*time;
random individual(group);
repeated time/sub=individual(group) type=ar(1);
run;
Thanks a lot.
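For what it's worth, I believe the usual PROC MIXED route to group differences at each time point is an LSMEANS statement with the SLICE= option (for example `lsmeans group*time / slice=time;`); please confirm that in the PROC MIXED documentation. As a much cruder illustration of what these "simple effects" are, the Python sketch below computes a Welch two-sample t statistic at each time point separately, from wide data laid out like the t1-t6 columns in the PROC GLM step. Unlike the mixed model it ignores the within-subject correlation and the AR(1) structure, so it is only a conceptual sketch; the data and the column names t1-t3 are made up.

```python
import math


def welch_t(x, y):
    """Welch two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    se2 = vx / nx + vy / ny
    t = (mx - my) / math.sqrt(se2)
    df = se2 ** 2 / ((vx / nx) ** 2 / (nx - 1) + (vy / ny) ** 2 / (ny - 1))
    return t, df


def group_diff_by_time(rows, times=("t1", "t2", "t3"), group_key="group"):
    """rows: one dict per subject in wide format. Returns {time: (t, df)} for two groups."""
    groups = sorted({r[group_key] for r in rows})
    if len(groups) != 2:
        raise ValueError("this sketch only handles two groups")
    g1, g2 = groups
    return {t: welch_t([r[t] for r in rows if r[group_key] == g1],
                       [r[t] for r in rows if r[group_key] == g2])
            for t in times}


data = [
    {"group": "A", "t1": 1.0, "t2": 2.0, "t3": 3.1},
    {"group": "A", "t1": 2.0, "t2": 2.5, "t3": 3.0},
    {"group": "A", "t1": 3.0, "t2": 3.1, "t3": 3.9},
    {"group": "B", "t1": 4.0, "t2": 2.2, "t3": 5.0},
    {"group": "B", "t1": 5.0, "t2": 3.0, "t3": 6.2},
    {"group": "B", "t1": 6.0, "t2": 2.8, "t3": 5.5},
]
for time, (t, df) in group_diff_by_time(data).items():
    print(time, round(t, 3), round(df, 1))
```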

v******6
Posts: 23

From thread: Statistics board - Capital One statistician interview questions?

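I interviewed a few months ago and still vaguely remember some of it...
The interview had four parts, for a senior statistician role.
1. Airplane delay case; please dig through the archives yourself. Note, though, that by the time of my interview the materials had become version 1.1, so some details differ slightly from the old posts (for example, temperature no longer affects delays). At the end they asked about multicollinearity. They also asked: if you rebuild the model, assuming the original glm has no problems, how should that glm model be used?
2. Conditional probability, set in credit card delinquency risk. From the probability you compute at the end, you have to conclude that the cutoff line needs to move down (the original assumption is that 95% of people are good risks).
3. Behavioral questions: what challenges you have faced, learning new things, and so on. The interviewer had a sheet of paper; it seems they score you with STAR (situation, task, action, results), so it's best to cover all four.
4. Americana (the Brazil Walmart) credit card case: computing revenue, fixed cost, variable cost and m...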
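The case's actual numbers aren't in the post, so the figures below are hypothetical. The sketch only shows the Bayes'-rule step such a conditional-probability question usually turns on: the probability that an account is a bad risk given that a screening rule flagged it, and how that probability moves when the assumed 95% share of good risks changes.

```python
def posterior_bad(prior_bad, p_flag_given_bad, p_flag_given_good):
    """P(bad | flagged) by Bayes' rule."""
    p_flag = prior_bad * p_flag_given_bad + (1 - prior_bad) * p_flag_given_good
    return prior_bad * p_flag_given_bad / p_flag


# Hypothetical rule: flags 80% of bad accounts and 10% of good ones.
print(posterior_bad(0.05, 0.80, 0.10))  # 95% good risks: about 0.296
print(posterior_bad(0.10, 0.80, 0.10))  # 90% good risks: about 0.471
```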

v*******e
Posts: 11604

From thread: Statistics board - Help with a problem

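Your problem works like this: your model is u=log(38.7)+eta, where eta is the parameter to be estimated and u is the mean. This is not a simple linear model with Gaussian noise, so you need the GLM approach to estimate eta, and that approach also gives the variance of eta. If you do it on a computer, you get the variance directly; by hand it goes like this (delta method, taking var(y) = yhat as for a Poisson response): var(log(y)) ≈ [d(log(y))/dy evaluated at y = yhat]^2 * var(y) = (1/yhat)^2 * yhat = 1/yhat = 1/u. My calculation may not be exactly right, but this is the idea.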
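A quick Monte Carlo check of the delta-method step, assuming (as the calculation above does implicitly) that y is Poisson, so var(y) equals its mean; the mean 38.7 is from the post, while the sample size, seed, and Knuth's Poisson sampler are my choices. The simulated variance of log(y) should land close to 1/38.7 ≈ 0.0258; the delta method is only a first-order approximation, so expect a small discrepancy.

```python
import math
import random


def poisson_knuth(lam, rng):
    """One Poisson(lam) draw via Knuth's multiplication method (fine for moderate lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def var_log_poisson(lam, n_draws=40000, seed=0):
    """Sample variance of log(y), y ~ Poisson(lam); zero draws (vanishingly rare here) are skipped."""
    rng = random.Random(seed)
    draws = (poisson_knuth(lam, rng) for _ in range(n_draws))
    logs = [math.log(d) for d in draws if d > 0]
    mean = sum(logs) / len(logs)
    return sum((v - mean) ** 2 for v in logs) / (len(logs) - 1)


mu = 38.7
print(var_log_poisson(mu), 1 / mu)  # simulated vs. delta-method approximation
```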

Q*****T
Posts: 558

From thread: Statistics board - Very desperate: asking biostatistics experts some technical interview questions....

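I guess everyone is already laughing their heads off at how confused I am about statistics... There are actually quite a lot of things I don't understand (although I basically don't need them in my daily work), but I still really want to figure them out...
1. After you all pointed out that GLM residuals are not necessarily normally distributed, I googled it and learned that the distribution of the residuals depends on the distribution of the response variable; for example, if the data are binary, the residuals are binomial. So here is the question: page 2, line 6 of this document, http://www.mun.ca/biology/dschneider/b7932/B7932Final10Dec2008.pdf, says that model fit improvement follows a chi-square distribution (I only half understand this; in class we did nested model comparison with the instructor, taking the difference in -2 log likelihood between the two models and the difference in degrees of freedom, and using a chi-square statistic to check whether the two models differ significantly). Then, also on page 2 of that document, line 13 says "The importance of normality ...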

i***y
Posts: 98

From thread: Statistics board - Very desperate: asking biostatistics experts some technical interview questions....

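1. [It] says model fit improvement follows a chi-square distribution (I only half understand this; in class we did nested model comparison with the instructor, taking the difference in -2 log likelihood between the two models and the difference in degrees of freedom, and using a chi-square statistic to check whether the two models differ significantly)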
likelihood ratio test
Then, also on page 2 of that document, line 13 says "The importance of normality of residuals in GLMs, on the other hand, is debated."
This means some people don't care about the residuals in GLMs.
Try reading this book: An Introduction to Generalized Linear Models
3. In the model above, the point estimates of b and c are obtained by OLS or by maximum likelihood (is this statement correct??),
I...
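A small sketch of the nested-model comparison described above, i.e. the likelihood ratio test: the difference in -2 log likelihood is referred to a chi-square distribution whose df is the difference in the number of parameters. The data are made up: binary outcomes in two groups, where the maximum likelihood fits have closed forms (the pooled proportion for the reduced model, one proportion per group for the full model), so no optimizer is needed. The chi-square tail probability comes from the series for the regularized incomplete gamma function.

```python
import math


def chi2_sf(x, df):
    """Upper tail P(X > x) for X ~ chi-square(df), via the series for the
    regularized lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def bernoulli_loglik(successes, trials, p):
    return successes * math.log(p) + (trials - successes) * math.log(1 - p)


# Made-up data: 30/50 successes in group A, 20/50 in group B.
sa, na, sb, nb = 30, 50, 20, 50
ll_reduced = bernoulli_loglik(sa + sb, na + nb, (sa + sb) / (na + nb))  # one common p
ll_full = bernoulli_loglik(sa, na, sa / na) + bernoulli_loglik(sb, nb, sb / nb)  # p per group
lr = 2 * (ll_full - ll_reduced)  # = (-2 log L_reduced) - (-2 log L_full)
df = 1                           # the full model has one extra parameter
print(round(lr, 3), chi2_sf(lr, df))
```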

c****t
Posts: 19049

From thread: DataSciences board - f.t. "I can't program"

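That statistics came to be seen as the foundation of predictive modeling is pure accident. The reason is that "prediction" is a common word in statistics. When statistics uses that word it requires a strong design-of-experiments setup, but folks just adopted it generically. Of course, folks were led into it: the people who first promoted it this way were in marketing and did a bit of data mining on the side. Twenty years later it has become "common knowledge". Anyway, decision trees, clustering, and association rules, which early data mining used, are also taught in statistics. When folks talk about machine learning, it's either decision trees or neural networks, and neither is very representative. The decision-tree approach is essentially no different in its way of thinking from the traditional glm/gam framework, and as a system it is far less developed than traditional glm/gam. Neural networks have risen and fallen several times without producing a systematic framework; they are too open-ended to optimize. Before kernel learning came along, machine learning had no theoretical highlights compared with traditional statistics. Traditional statistics is essentially optimization. Since general optimization cannot be global, bootstra...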

g***j
Posts: 40861

From thread: Military board - Tonight we are all gay

GLM!

a****r
Posts: 12375

From thread: Military board - Tonight we are all gay

Actually, you've been gay all along.

GLM!

v*******e
Posts: 11604

From thread: Military board - No worse than working on string theory these days

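Books are often written not along the author's thought process but in whatever presentation the author considers best, and the two can be fundamentally different. When I was learning that glm thing, I could not understand at all what the textbook author was doing. Later I found other tutorials online, plus Q&A and such, and only then understood the thought process; it's actually very simple.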

w********2
Posts: 632

From thread: Military board - Nobel laureate: AI is really just statistics under a fancy name

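Look at how Google collects data: if you don't give it your real IP, it won't let you search, and that is the most critical part. When the data are noisy and random variance outweighs the effect of the function, even the best algorithm is hopeless. With good data, traditional anova can give decent results too. How much better is AI than basic algorithms like glm and anova? Sometimes 20-30% better, sometimes the same, sometimes worse.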

d********m
Posts: 3662

From thread: Military board - Nobel laureate: AI is really just statistics under a fancy name

There is a theorem; I forget its name.
Basically, it says that when the data are very messy and you have no idea how they were generated, linear models always provide better predictive capacity.
That's why I almost always choose glm when I model data.

Posts: 1

From thread: Military board - AI is doomed! Farewell to the countless ML, DL, and NLP hyperparameter tuners sitting on the bubble

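AI has its characteristic applications, but traditional anova has its own domain too. AI is clearly overhyped now.
Frankly, without a huge face database like China's as a training set, face recognition would never be accurate, because pattern recognition has too many parameters, and the optimization needs data and computing time.
AI's biggest application is on data in non-relational databases, which used to be hard to handle; there it can do better than SQL plus SAS glm. AI is usually hooked up to NoSQL databases.
The low-hanging fruit will soon be gone, and then the bubble will burst.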

Posts: 1

From thread: Military board - AI is doomed! Farewell to the countless ML, DL, and NLP hyperparameter tuners sitting on the bubble

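For dimension reduction, pca or pls is enough; nothing special there. Feature selection has always been done this way. It may sound impressive to people who haven't done it, but it's all routine. I once compared running anova/linear regression on each index separately against reducing dimensions with pca/pls, and the results were about the same. Only a few features are correlated, and such correlations can be found with traditional simple statistics.
In short, NNs do have unique applications, but I don't think they are as universal as the current hype suggests; many applications can actually be handled with traditional proc glm plus SQL queries. At most there will be one more pro gnlm later.
The AI bubble is big, and there are some real achievements to inflate it with. That's all there is to it.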

Posts: 1

From thread: Military board - Today I suddenly realized that CDOs were actually a failed application of AI

proc glm only deals with correlation; anova can analyze causation.

d*********h
Posts: 972

From thread: Automobile board - Buying an ML350 in Southern California

Is the GLE the current ML? If so, Mercedes' naming is pretty dumb; it should have been called the GLM.

s****i
Posts: 116

From thread: Automobile board - Leased a Porsche; just read it like a novel

Typo: I meant the final price was about the same as the GLM. Sorry.

d*******o
Posts: 107

From thread: Automobile board - Leased a Porsche; just read it like a novel

What car is a GLM?
You've researched this for so long and still can't spell GLE?

m****s
Posts: 18160

From thread: Classified board - 【Statistical Programmer Position】

【The following text is reposted from the Statistics board】
From: damacount (damacount), Board: Statistics
Title: 【Statistical Programmer Position】
Site: BBS 未名空间站 (Sun Apr 21 13:18:57 2013, US Eastern)
The Statistical Programmer will assist project’s lead statistical
programmer on generating tables, figures, listings as outlined by the
project’s biostatistician(s).
Primary Duties:
1. Program statistical analyses (i.e., tables, listings, figures, and
inferential statistical output) using SAS®.
2. Act as an integral member of project te...

k*z
Posts: 4704

From thread: Classified board - Entry level Data and Optimization Analyst (reposted)

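【The following text is reposted from the Statistics board】
From: kiz (泥偶), Board: Statistics
Title: Entry level Data and Optimization Analyst
Site: BBS 未名空间站 (Fri Jun 6 10:55:58 2014, US Eastern)
Day-to-day work is simple ETL and performance reporting. Projects cover everything: segmentation, pricing optimization, operation optimization, performance optimization, heat maps.
The working languages are SQL and SAS; reporting uses Cognos+VBA+MDX/SSRS, but it's fine if you don't know these; training is provided.
You need basic programming experience in any language (Python, C++, Java, R, Matlab) and should be able to write a simple simulator and calculator. This will come up in the interview.
You need to understand how various models can solve practical operational problems, for example: credit scoring, targeted marketing, demand forecasting, work scheduling, customer segmentation, market research. The models involved include glm, logist...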

k*z
Posts: 4704

From thread: Classified board - Entry level Data and Optimization Analyst

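Day-to-day work is simple ETL and performance reporting. Projects cover everything: segmentation, pricing optimization, operation optimization, performance optimization, heat maps.
The working languages are SQL and SAS; reporting uses Cognos+VBA+MDX/SSRS, but it's fine if you don't know these; training is provided.
You need basic programming experience in any language (Python, C++, Java, R, Matlab) and should be able to write a simple simulator and calculator. This will come up in the interview.
You need to understand which models can solve practical operational problems. You don't need to know exactly how to apply them, but you do need to know which track a problem should be solved on. For example: credit scoring (logistic), targeted marketing (cluster/decision tree), demand forecasting (time series), work scheduling (linear programming), customer segmentation (cluster), market surveys (marketing research). The work will involve glm, ets, logistic, linear in...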

s*i
Posts: 388

From thread: JobHunting board - Has anyone interviewed for Amazon's statistical Engineer role?

glm, bayesian, dbn, etc

s*****n
Posts: 134

From thread: JobHunting board - Non-CS PhD looking for a Machine Learning job, advice needed

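I'm a PhD student in Psychology/Neuroscience, ABD, planning to graduate within a year. I have some experience analyzing behavioral and brain-imaging data and doing statistics. The models I mainly use are basic statistical tools, GLM, Clustering, and PCA. Recently I've picked up some machine learning experience, such as neural-network classifiers, but mainly through toolboxes in Matlab and Python, plus some Monte Carlo Simulation in the CrossValidation step.
I completed Stanford Andrew Ng's online course (implemented in Matlab) and didn't find it very hard. Now I'm going through CMU Tom Mitchell's machine learning course. I'm quite interested in this kind of work, but my current difficulty is practical C++ programming. My own data processing in matlab, python, and linux shell scripts runs fine, but it never feels quite up to standard. I tried Hadoop, but I had never touched Java before, so progress is slow.
I'd appreciate your advice: which skills do I most need to improve, and how should I polish my résumé to get past the first screening round?
One more question: are positions in this area purely SDE roles, ...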

d*******t
Posts: 1

From thread: JobHunting board - 【Statistical Programmer Position at Boston】

The Statistical Programmer will assist project’s lead statistical
programmer on generating tables, figures, listings as outlined by the
project’s biostatistician(s).
Primary Duties:
1. Program statistical analyses (i.e., tables, listings, figures, and
inferential statistical output) using SAS®.
2. Act as an integral member of project team. Attend project team meetings,
work with biostatisticians, data managers, and project managers.
3. Perform SAS® programming using such techniques a...

C*******o
Posts: 47

From thread: JobHunting board - Sharing my compiled Google statistics interview experiences

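These are all experiences shared by other users; I've compiled them here in the hope that they help fellow Chinese job seekers.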
1. Flip 10 coins, observed 6 heads. Q: fair or not?
2. Type I error, Power, relationship between them?
3. How would you model the # of years some patients would survive after a
primary surgery, given their family history, demographic covariates (e.g.,
age, race, etc), how to diagnose?
4. Given a sample size of n, how do you obtain 95% confidence interval for
the median? Two cases: a) n is large, say n > 100. b) n is small, say n < 10
5. Search engine comparison and sear...
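Sketches for two of the questions above, under my own assumptions. Q1 is treated as an exact two-sided binomial test of p = 0.5: 6 heads in 10 flips gives p ≈ 0.75, so no evidence that the coin is unfair. Q4's small-n case uses the distribution-free order-statistic interval for the median, whose coverage comes from Binomial(n, 1/2); for n < 6 no such interval reaches 95%. For large n, a normal approximation or a bootstrap would be common alternatives (not shown).

```python
from math import comb


def binom_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial test: total probability of outcomes no more likely than k."""
    probs = [comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(n + 1)]
    observed = probs[k]
    return sum(q for q in probs if q <= observed * (1 + 1e-12))


def median_ci(xs, conf=0.95):
    """Distribution-free CI [x_(k), x_(n-k+1)] for the median; returns (lower, upper, coverage)."""
    x = sorted(xs)
    n = len(x)
    alpha = 1 - conf
    k, cdf = 0, 0.0
    for j in range(n + 1):
        cdf += comb(n, j) / 2 ** n  # P(B <= j), B ~ Binomial(n, 1/2)
        if cdf <= alpha / 2:
            k = j + 1
        else:
            break
    if k == 0:
        raise ValueError("sample too small for this confidence level")
    coverage = 1 - 2 * sum(comb(n, j) for j in range(k)) / 2 ** n
    return x[k - 1], x[n - k], coverage


print(binom_two_sided_p(6, 10))  # about 0.754
print(median_ci([3.1, 4.7, 2.2, 5.9, 4.1, 3.8, 6.4, 2.9, 5.0, 4.4]))  # (2.9, 5.9, ~0.979)
```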

v********c
Posts: 41

From thread: JobHunting board - What preparation is needed to switch from electrical engineering to Big Data?

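Dear experts, I've been in the US for almost three years, doing a PhD in electrical engineering. Before coming to the US I had six years of work experience in China, in the design and planning of electronic products. I worked as an electronics engineer, head of an electronics department, product manager, and so on.
After coming to the US I changed direction completely. My PhD is in brain imaging, using mostly statistical methods such as GLM, PCA, ICA, and Clustering, plus some signal-processing methods such as wavelet transforms. Because my project isn't going well and my advisor isn't very nice, I'm being pushed to the breaking point. Since I need to earn money to support my family, I'd like to try looking for a job, but there are so many threads that I don't know where to start. I feel that after a few years of PhD, what I used to be good at has faded, and I've learned the new things poorly. For coding I mainly use Matlab and a little R; I have basically no background in anything else (I only learned C n years ago).
I hope the experts can give me some advice: if I want a DS job, which knowledge and skills do I need to strengthen, roughly how much time will it take, and could you recommend suitable books or materials? Thank you very much!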

Posts: 1

From thread: JobHunting board - Which company has a better retirement plan these days?

GLM?
A reply has to be at least ten characters, so here are ten characters.

y***o
Posts: 254

From thread: shopping board - Has any expert bought a thinkpad from lenovo canada?

Oh. But when I configured it on the US site, I couldn't seem to get that price.
intel quad core i7 820qm
windows 7 32
15.6 fhd multi led w/ww ant
nvidia glm dg 1gb, amt
ultranavfpr for color sensor
2G ddr3 (1 dimm)
500GB hard disk 7200
dvd recordable, ultrabay enhanced
9cell
bluetooth
ultimate-n 6300
smart card

p*y
Posts: 61

From thread: Bridge board - YAHOO hand examples

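DLM is the more impressive one, because its placing-point requirement is higher.
Check how many master points GLM requires; if it's easy, go get one.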

o****o
Posts: 8077

From thread: SCU board - Job Opening (reposted)

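【The following text is reposted from the Statistics board】
From: jackinbottle (小吴), Board: Statistics
Title: Job Opening
Site: BBS 未名空间站 (Fri Jan 20 10:43:38 2012, US Eastern)
Hi everyone,
Our company (a US actuarial consulting firm) is looking to hire a pricing person in Shanghai.
We hope the candidate has a strong statistics background (GLM, machine learning, and other predictive modeling), is interested in the financial and insurance industry, and is willing to settle in and do data analysis. The work will be mainly P&C, possibly with some health as well.
No specific major is required; a stats master's/PhD is a plus.
If you're interested, please contact me by private message or email: jackinbottle at gmail dot com
Happy New Year, everyone.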

f*********e
Posts: 1144

From thread: SCU board - Job Opening (reposted)

OMG, I already know how to spell out GLM..... who dares call me a humanities student!

K****n
Posts: 5970

From thread: CS board - A question about probit regression (reposted)

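【The following text is reposted from the Computation board】
From: KeeVan (Kevin), Board: Computation
Title: A question about probit regression
Site: BBS 未名空间站 (Fri Aug 22 21:53:32 2008)
Is there a textbook that works out the derivatives of the maximum likelihood? I'd like to check mine, but surprisingly Google turns up nothing... I don't quite trust the glm functions and the like in matlab; they oscillate quite a lot during training.
Also, if I put a gaussian prior on the parameters of the probit function and then compute the bayesian
P(data)=Integrate(P(data|parameter)*P(parameter),over parameter)
it seems that using the probit function as P(data|parameter) and a Gaussian as P(parameter) makes the bayesian likelihood easier to optimize? Does anyone know whether someone has already worked this out? Again, Google turns up nothing...
Thanks!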
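A sketch of the probit log-likelihood gradient asked about above, plus the extra term from an independent Gaussian prior N(0, sigma^2) on each coefficient, checked against central finite differences. For y in {0, 1} and eta = x'beta, the derivative of the log-likelihood with respect to eta is phi(eta) * (y - Phi(eta)) / (Phi(eta) * (1 - Phi(eta))), and the prior adds -beta / sigma^2. Note that this is the log-posterior (the MAP objective), which is a different quantity from the integrated P(data) in the post; the data, beta, and sigma are made up.

```python
import math


def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def log_posterior(beta, rows, y, sigma):
    """Probit log-likelihood plus a log N(0, sigma^2) prior on each coefficient (up to a constant)."""
    total = 0.0
    for r, yi in zip(rows, y):
        eta = sum(b * v for b, v in zip(beta, r))
        # P(y=1) = Phi(eta); P(y=0) = Phi(-eta), computed without 1 - Phi cancellation
        total += math.log(norm_cdf(eta)) if yi == 1 else math.log(norm_cdf(-eta))
    return total - sum(b * b for b in beta) / (2.0 * sigma ** 2)


def grad_log_posterior(beta, rows, y, sigma):
    """Analytic gradient: sum_i x_i phi(eta_i)(y_i - Phi(eta_i)) / (Phi(eta_i)(1 - Phi(eta_i))) - beta / sigma^2."""
    g = [-b / sigma ** 2 for b in beta]
    for r, yi in zip(rows, y):
        eta = sum(b * v for b, v in zip(beta, r))
        p1, p0 = norm_cdf(eta), norm_cdf(-eta)
        w = norm_pdf(eta) * (yi - p1) / (p1 * p0)
        for j, v in enumerate(r):
            g[j] += w * v
    return g


def numeric_grad(f, beta, h=1e-6):
    """Central finite-difference gradient of f at beta."""
    g = []
    for j in range(len(beta)):
        up, dn = beta[:], beta[:]
        up[j] += h
        dn[j] -= h
        g.append((f(up) - f(dn)) / (2 * h))
    return g


rows = [[1.0, -1.2], [1.0, -0.4], [1.0, 0.3], [1.0, 0.9], [1.0, 1.5], [1.0, -0.8]]  # intercept, x
y = [0, 0, 1, 0, 1, 1]
beta, sigma = [0.2, 0.7], 2.0
analytic = grad_log_posterior(beta, rows, y, sigma)
numeric = numeric_grad(lambda b: log_posterior(b, rows, y, sigma), beta)
print(analytic)
print(numeric)  # should agree with the analytic gradient to several decimal places
```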

f**y
Posts: 138

From thread: Database board - Help: SAS

SAS can definitely write the results of GLM to a separate database.
Check the help for LIBNAME.
