Page 7 - Roundup of discussions about oloolo - 话题女王

o******6
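Posts: 538

From thread: Statistics board - [Collection] Does taking the SAS certification help a statistics Ph.D. find an industry job?

☆─────────────────────────────────────☆
mattwang (aa) wrote on (Fri Feb 27 14:37:31 2009):
I am about to graduate, and finding a job right now is hard.
Many positions require two or more years of work experience.
But where would a fresh Ph.D. get that much work experience?
I only have intern experience, and it looks like that is not enough.
Would getting SAS certified help?
Brothers and sisters with experience, please weigh in.
☆─────────────────────────────────────☆
ahab (ahab) wrote on (Fri Feb 27 14:45:39 2009):
you are not a programmer
☆─────────────────────────────────────☆
eujobs (eujobs) wrote on (Fri Feb 27 15:10:10 2009):
It definitely helps, but you can teach yourself. You do not need the certificate; knowing how to use SAS is enough.

☆─────────────────────────────────────☆
oloolo (似人非兽) wrote on (Fri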

o******6
Posts: 538

From thread: Statistics board - [Collection] How to use SAS for complete permutation

☆─────────────────────────────────────☆
davfox121 (davfox) wrote on (Wed Mar 4 20:55:52 2009):
For permutations of strings like AABBCC or AAABBCCC, which contain repeated
characters, how can I write SAS code to get the complete set of permutations
without repeats? It seems Proc Plan only works for series like ABCDEF, which
have no repeated characters.
☆─────────────────────────────────────☆
oloolo (似人非兽) wrote on (Wed Mar 4 21:07:24 2009):
If you swap the first A with the second A, how does the new string differ from the old one? Does that count as
a permutation?
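
oloolo's point, that swapping two identical letters does not produce a new arrangement, suggests enumerating by how many copies of each letter remain rather than by position. Below is a minimal Python sketch of that idea, not the SAS solution from the thread; the function name `distinct_permutations` is made up for illustration.

```python
from collections import Counter


def distinct_permutations(text):
    """Yield each distinct arrangement of the characters in text exactly once."""
    counts = Counter(text)      # remaining copies of each character
    symbols = sorted(counts)    # fixed order, so output is lexicographic
    prefix = []

    def extend():
        if len(prefix) == len(text):
            yield "".join(prefix)
            return
        for s in symbols:
            if counts[s] > 0:
                # Use one copy of s, recurse, then put it back.
                counts[s] -= 1
                prefix.append(s)
                yield from extend()
                prefix.pop()
                counts[s] += 1

    yield from extend()


if __name__ == "__main__":
    perms = list(distinct_permutations("AABBCC"))
    print(len(perms))  # 6! / (2! * 2! * 2!) arrangements
```

Because each position chooses among distinct letters instead of distinct slots, identical copies are never swapped with each other, so no filtering of repeats is needed afterwards.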

☆─────────────────────────────────────☆

s*r
Posts: 2757

From thread: Statistics board - Cherish your life, stay away from hsbc.

later i remembered that i had a similar argument with statcompute
he gave a very complex datastep with many macros in it and challenged
SQL-lovers to implement it in sql
oloolo then commented that PL/sql could do the job
i said that the database structure could be improved (this is of course
nonsense for a big corporation)
but complexity and number of rows are two different things

o******6
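Posts: 538

From thread: Statistics board - [Collection] Pitching a handy tool for downloading google books (repost)

☆─────────────────────────────────────☆
oloolo (似人非兽) wrote on (Tue Mar 3 20:06:50 2009):
Sender: thrivechen (得陇望蜀), Board: HUST
Title: Pitching a handy tool for downloading google books
Posted from: BBS 未名空间站 (Tue Mar 3 01:57:07 2009)
update: if it does not work for you, note that it can use proxies; the options include a large number of proxies
google book downloader
Google the three words above
then follow the steps and you can download the whole book
Others may have introduced this before
Actually the limit review is presumably applied per IP
and it also depends on the time
Hitting it from the same IP at different times actually produces different review files
and hitting it from different IPs also produces different review pages
This tool just keeps hitting it through proxies, and whenever the needed page happens to be generated, it downloads it
In the end the downloaded files can be assembled into a complete PDF of the book
☆─────────────────────────────────────☆
sir ( 郎 ) wrote on (Tu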

S******y
Posts: 1123

From thread: Statistics board - Reporting an offer to encourage everyone

Congrats! oloolo!
Are you going to move to Hartford area?

b******1
Posts: 367

From thread: Statistics board - Reporting an offer to encourage everyone

cong! Does oloolo work in insurance?

b******1
Posts: 367

From thread: Statistics board - Notes on my two recent interviews

Does oloolo have a green card yet?

p********a
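Posts: 5352

From thread: Statistics board - Notes on my two recent interviews

I often cannot tell statcompute and OLOOLO apart. Both are experts and both work in MARKETING.
They could not be each other's sock puppets, could they?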

S******y
Posts: 1123

From thread: Statistics board - Notes on my two recent interviews

Congrats again! oloolo.
Thanks for sharing the experience, which will benefit a lot of people here.

h******e
Posts: 6

From thread: Statistics board - Notes on my two recent interviews

I admire oloolo: a deep understanding of both theory and application. A real expert....

a********s
Posts: 188

From thread: Statistics board - Notes on my two recent interviews

oloolo is a real expert; I should learn from you!

D******n
Posts: 2836

From thread: Statistics board - OK, everyone who wants the URL, come in

I just realized it is already in my bookmarks; oloolo must have posted it before...

d*******1
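Posts: 854

From thread: Statistics board - Just finished an interview

The way you described your project sounded rather entry level, with things like sort and merge coming up, which is fine. But
then you said you also manage two subordinates, which comes across as a bit odd. I think you should pitch your
project at a higher level; I suggest using oloolo's interview write-up as a template, hehe.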


s*****n
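Posts: 2174

From thread: Statistics board - R: cut

Unless the data are uniformly distributed (as in your example), in general you cannot guarantee both of the following at once:
1. The intervals have equal length.
2. Every interval contains the same count.
If you want 1, do what dashagen said above. If you want 2, do what oloolo
said above. There is no really good way around this.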
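
To make the trade-off concrete, here is a small Python sketch (not the R code from the thread) that builds both kinds of bin edges for a skewed sample. The helper names are invented for illustration, and `statistics.quantiles` requires Python 3.8 or later.

```python
from statistics import quantiles


def equal_width_edges(values, k):
    """k intervals of equal length between min and max."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / k
    edges = [lo + i * step for i in range(k + 1)]
    edges[-1] = hi  # guard against floating-point drift on the last edge
    return edges


def equal_count_edges(values, k):
    """k intervals cut at sample quantiles, so counts are (roughly) equal."""
    return [min(values)] + quantiles(values, n=k, method="inclusive") + [max(values)]


def bin_counts(values, edges):
    """Count values per bin; bins are (left, right], with the first bin closed on the left."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if v <= edges[i + 1]:
                counts[i] += 1
                break
    return counts


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7, 8, 50, 100]  # skewed, not uniform
    print(bin_counts(data, equal_width_edges(data, 2)))  # equal length, unequal counts
    print(bin_counts(data, equal_count_edges(data, 2)))  # equal counts, unequal length
```

With skewed data the equal-width bins end up with very different counts, while the quantile-based bins balance the counts at the cost of very different interval lengths.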

S******y
Posts: 1123

From thread: Statistics board - How to get rid of loop in R code?

Thanks. oloolo and songkun!
Here is the input data 'dat' sorted by score (if displayed in csv format) --
score,weight

p*****0
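Posts: 3104

From thread: Statistics board - Can a statistics master's graduate only work as a SAS programmer?

oloolo's skill level beats that of many Ph.D.s
but he is rather an exception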


a********s
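Posts: 188

From thread: Statistics board - Applications and development of statistics in the insurance industry (Casualty & Property)

Let's discuss which applications of statistical methods or models in the insurance industry (Casualty and
Property) you know of or have experienced. Which statistical models are used? What do you predict for future prospects?
The one I hear about most is GLM.
As for prospects, oloolo's post mentions that "the insurance industry is gradually shifting from traditional actuarial work to relying on modern statistical
models to refine business processes and product categories," which I find very encouraging.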

S******y
Posts: 1123

From thread: Statistics board - How to compute the area between my curve and diagonal line

Thanks oloolo!
How can I prove that my approach is equivalent to yours mathematically?

A*******s
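Posts: 3942

From thread: Statistics board - Has anyone used a markov chain for customer migration analysis?

Seems pretty good, judging from the experiences of the board's experts songkun and oloolo. But that may not be of much reference value,
because both of them are outliers on the right side....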

p*********8
Posts: 1039

From thread: Statistics board - Where is oloolo's blog?

Could someone who knows post a link? I want to see how he takes derivatives.

w*****e
Posts: 806

From thread: Statistics board - Where is oloolo's blog?

http://sas-programming.blogspot.com/
Just search the board and you will find it..

d*******1
Posts: 854

From thread: Statistics board - How to remove DUPLICATED RECORDs in R

My example was not a good one; it should be
data[!duplicated(data[,c("ID1","ID2")]),], following OLOOLO
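
The R one-liner keeps the first row for each (ID1, ID2) combination and drops later repeats. The same rule can be sketched in plain Python as below; the function name `drop_duplicates` and the sample rows are hypothetical, chosen only to illustrate the logic.

```python
def drop_duplicates(rows, keys):
    """Keep the first row seen for each combination of the key columns."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[k] for k in keys)
        if key not in seen:  # mirrors !duplicated(...) in R
            seen.add(key)
            kept.append(row)
    return kept


if __name__ == "__main__":
    data = [
        {"ID1": 1, "ID2": "a", "value": 10},
        {"ID1": 1, "ID2": "a", "value": 99},  # same key as the first row, dropped
        {"ID1": 1, "ID2": "b", "value": 20},
        {"ID1": 2, "ID2": "a", "value": 30},
    ]
    for row in drop_duplicates(data, ["ID1", "ID2"]):
        print(row)
```

Note that only the key columns decide what counts as a duplicate; other columns, such as `value` here, are ignored in the comparison.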

S******y
Posts: 1123

From thread: Statistics board - Python - scraping the Statistics board - how to page through results?

Thanks. littlebirds!
I did not know Python has this mechanize package too!
BTW, what if I would like to add a cool feature to my Python scripts --
For example,
if there is a new post spotted here from Statistics board celebrities such as SongKun,
oloolo, Dashagen, PaperTigera, qqzj, tosi, sir, fanta... , it will send out an
email notice to a designated email address... 8-)

p********a
Posts: 5352

From thread: Statistics board - [Collection] Reporting an offer to encourage everyone

☆─────────────────────────────────────☆
oloolo (似人非兽) wrote on (Thu Mar 11 14:03:36 2010, US Eastern):
An insurance company in hartford, CT that everyone here has discussed; a 36% raise
☆─────────────────────────────────────☆
libra (秤子) wrote on (Thu Mar 11 14:09:39 2010, US Eastern):
con

☆─────────────────────────────────────☆
sleephare (I+don't+know.) wrote on (Thu Mar 11 14:10:17 2010, US Eastern):
cong
☆─────────────────────────────────────☆
PharmD (夜里发呆) wrote on (Thu Mar 11 14:16:13 2010, US Eastern):
Congrats! Give us baozi!!
☆─────────────────────────────────────☆

p********a
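Posts: 5352

From thread: Statistics board - [Collection] Notes on my two recent interviews

☆─────────────────────────────────────☆
oloolo (似人非兽) wrote on (Sun Mar 14 15:05:52 2010, US Eastern):
I will mainly describe one negative and one positive experience; I hope it helps everyone.
One was the consumer credit risk department of an internationally known bank in NYC; the other was an insurance company. In both cases a
headhunter approached me first.
The HM at the insurance company first gave me a phone interview [the HM is an ivy statistics Ph.D.], mainly asking about the
technical items listed on my resume, for example market research tools such as MCA and MDS: what these methods are, how they relate to
CATEGORICAL DATA ANALYSIS, and so on. Then came some data mining and statistics questions. The typical
questions fell into these broad categories:
Variable selection methods and procedures in classification problems, comparing the pros and cons of each method;
Methods for handling nonlinear functional relationships, comparing the pros and cons of each method and how to implement them in
SAS/STAT;
In addition, the HM asked about some of my past projects mentioned on my resume, including background and technical details;
Finally, the HM asked how to establish the relationship between MAXIMIZE LIKELIHOOD FUNCTION and MINIMIZE CLASSIFICATION ERROR
RATE,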

p********a
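Posts: 5352

From thread: Statistics board - [Collection] A question about R

☆─────────────────────────────────────☆
fang0219 (miracle) wrote on h:
I have two tables. How can I combine them? I used cbind, but that simply places the
second table after the first one.
I want to combine them column by column, i.e. append the first column of the second table after the
first column of the first table,
the second column of the second table after the second column of the first table, and so on.
How can I do this in R? Thanks!
☆─────────────────────────────────────☆
fang0219 (miracle) wrote on (Thu Mar 18 21:55:21 2010, US Eastern):
By the way, the headers of the corresponding columns in the two tables are the same. thx!
☆─────────────────────────────────────☆
oloolo (似人非兽) wrote on (Thu Mar 18 21:56:14 2010, US Eastern):
rb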
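
The reply above is cut off, so what follows is only one reading of the question: appending each column of the second table to the end of the same-named column of the first, which amounts to stacking the rows. A minimal Python sketch under that assumption, with tables modelled as dicts from column header to list of values and a made-up function name `stack_tables`:

```python
def stack_tables(first, second):
    """Append each column of `second` to the same-named column of `first`."""
    if list(first) != list(second):
        raise ValueError("tables must share the same column headers in the same order")
    return {name: first[name] + second[name] for name in first}


if __name__ == "__main__":
    table1 = {"x": [1, 2], "y": ["a", "b"]}
    table2 = {"x": [3], "y": ["c"]}
    print(stack_tables(table1, table2))
```

The header check matters: joining columns by position alone would silently mix up variables if the two tables listed their columns in different orders.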

A*******s
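Posts: 3942

From thread: Statistics board - [Question] How to get the row number value with proc sql

However cheesy it sounds, I have to say it: allow me to bow down to the almighty oloolo. There is simply no sas problem you cannot solve...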

a***r
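Posts: 420

From thread: Statistics board - A question about classifying data

I agonized over this for a whole weekend
and still have not learned how to compute K-means in SAS (proc fastclus?), sigh (to Dashagen: may I ask once more,
what software and what command generated the plot in your attachment?? Thanks~)
It should indeed be solvable in a data step; thanks to oloolo for the idea
But the program will not run. I will think about it some more. Thanks a lot, everyone~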

A*******s
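Posts: 3942

From thread: Statistics board - How to compute this formula with a SAS Macro?

agree with u. SAS is like the Green Dragon Crescent Blade, not made for embroidery-needle work. But there is always
some expert like oloolo who can use the Green Dragon Crescent Blade as an embroidery needle. This is also my goal
, lol...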

A*******s
Posts: 3942

From thread: Statistics board - [SAS] Efficient way for subsetting data?

I wrote a tree macro which needs to split a dataset many times based on
different conditions, but it runs quite slowly. For a 700-row dataset and
500 conditions, it takes 3~5 minutes to complete a loop... Is there any
general way to improve the efficiency? I can only come up with two ways:
1. Create index on the variables in if conditional statement?
2. Multithread/parallel programming? I just read oloolo's blog about this
part, wish I could figure out how to do that.
Any other ideas?

d*******o
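Posts: 493

From thread: Statistics board - Asking the experts: which US journals on SAS Programming are easy to get articles accepted in???

You are amazing. SAS global is the top conference in the SAS world. I have only seen a few experts like oloolo and weishui posting there. With that poster of yours, finding a job should be no problem.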

s*******d
Posts: 132

From thread: Statistics board - A question about SAS macros

I asked the same question before. The expert oloolo answered it..
proc sql noprint;
select b into :b1-:b3
from yourvectordataset
;
quit;
then use &b3 to call this one element
example:
************************;
data _vector;
do b=1 to 3; output; end;
run;
proc sql noprint;
select b into :b1-:b3
from _vector
;
quit;
%put &b3;

A*******s
Posts: 3942

From thread: Statistics board - [SAS] multi-thread programming and parameters...

oloolo's blog, http://www.sas-programming.com/2009/03/leverage-multi-core-with-single-core.html
and the several articles he cites.
I think I have figured out how to use sysparm.

S******y
Posts: 1123

From thread: Statistics board - A fairly specific algorithm question

########### Python ############
in_file = 'C:\\_original.txt'  # oloolo's example data
seen = set()  # every id encountered so far
with open(in_file, 'r') as f:
    next(f)  # skip header
    for line in f:
        obs, group_id, id1, ID = line.split()
        if id1 in seen and ID in seen:  # both ids already seen: skip the row
            continue
        # at least one id is new: report the row and remember both ids
        print(group_id, id1, ID)
        seen.update((id1, ID))
###################### END ###################

l*********s
Posts: 5409

From thread: Statistics board - Are there statisticians who use only R and not SAS?

I think it is not a big issue for oloolo. but it is definitely a big problem
for you. LOL

s*********e
Posts: 1051

From thread: Statistics board - Are there statisticians who use only R and not SAS?

oloolo makes a good point. sas / r is just an implementation of an algorithm.
it is not worth making a big deal about it.
plus, it is totally pointless to argue which is better. although r and sas
are similar in terms of functionality, their use cases in a business
environment are very different. while r is often used as a prototyping tool
by statisticians with moderate-size datasets, sas is often preferred for
large scale deployment. they fit different business needs respectively.

A*******s
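Posts: 3942

From thread: Statistics board - Are there statisticians who use only R and not SAS?

Tell us what algorithms you have implemented in sas. I do not know how good your sas skills are, Master T, but writing a CART in sas was
excruciating for me.
You could also go read oloolo's blog and see whether you can understand his sas code.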

A*******s
Posts: 3942

From thread: Statistics board - An interesting question from mysas.net/forum

i didnt go through oloolo's paper about DFS since i dont know much about
hash tables, but i think a standard DFS routine is able to deal with
graph-structured data, including the loop a-b-c-d-a as you said.


s******r
Posts: 1524

From thread: Statistics board - An interesting question from mysas.net/forum

I do believe recursion is still necessary.
In oloolo's, it always begins with the lowest level and goes in one direction,
from children to parents. So one complete hash table could solve the problem.
It is different in this case. For instance, with a-b-c-d-e, say you meet id d
first, so cde is one group now. And then you find a, so ab is marked as one
group. One more step is necessary to merge the two groups.
Too late, need to sleep now. Have phone interview tomorrow. :)
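
The "one more step" described above, merging two partial groups once a connecting link turns up, is what a union-find (disjoint-set) structure does. Here is a minimal Python sketch of that technique; it is not the hash-table DFS from oloolo's paper, and the function name and sample edges are made up for illustration.

```python
def connected_groups(edges):
    """Group ids that are linked directly or indirectly, using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two groups

    groups = {}
    for node in parent:
        groups.setdefault(find(node), set()).add(node)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    # d is met first (giving c-d-e), then a-b forms its own group,
    # and the b-c link finally merges the two.
    print(connected_groups([("d", "e"), ("c", "d"), ("a", "b"), ("b", "c")]))
    # A loop such as a-b-c-d-a needs no special handling.
    print(connected_groups([("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]))
```

Because merging is done on group roots, the order in which links are encountered does not matter, and no recursion over the data is required.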

p********a
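Posts: 5352

From thread: Statistics board - Must-read top experience-sharing posts on the Statistics board and the reward rules

Rewards -
scimitar: 1200 virtual coins
Everyone listed below: 200 virtual coins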
itsclear
Actuaries
mitguests
wakeupgogo
flyerr
weekendsunny
papertigra
Pingping0
ninenine
oloolo
gsk
songkun

w*****e
Posts: 806

From thread: Statistics board - Statistics board - top poster ranking

Period: 2010, July-Aug
Rank ID Posts
1 actuaries 131
2 littlebirds 111
3 dashagen 96
4 papertigra 79
5 tnegietni 78
6 scimitar 69
7 zerk 64
8 pepsico 53
9 sir 46
10 fanta 43
11 oloolo 35
12 statsguy 34
13 baicaibangzi 33
14 shuibao 29
15 songkun 29
16 providential 28
17 drburnie 28
18 tape 27
19 dapangmao 26
20 woodbridge 26
21 aquar 25
22 bighappy 25
23 hehehehe

D******n
Posts: 2836

From thread: Statistics board - Statistics board - top poster ranking

January-August
1 dashagen 691
2 papertigra 441
3 actuaries 426
4 sir 294
5 littlebirds 276
6 oloolo 226
7 hehehehe 202
8 zerk 186
9 aquar 173
10 pharmd 152
11 orange06 146
12 libra 134
13 statsguy 134
14 flyerr 133
15 drburnie 129
16 tnegietni 126
17 wallice 124
18 bighappy 120
19 daydayup1 119
20 songkun 117
21 westjourney 103
22 bullren 99
23 dapangmao

D******n
Posts: 2836

From thread: Statistics board - Statistics board - top poster ranking

2009
1 songkun 360
2 qqzj 332
3 orange06 261
4 dashagen 256
5 papertigra 253
6 oloolo 213
7 sir 170
8 drburnie 163
9 geography 157
10 zhongdianshi 129
11 statsguy 120
12 pharmd 115
13 birspring 113
14 doublefish 86
15 himalaya 80
16 goldmember 72
17 gutenacht 70
18 hezhi 62
19 jackspears 61
20 hehehehe 57
21 mitguests 53
22 bighappy 53
23 alexwater 51
24 fanta 48
25 moncheri427 46
26 cloverzj 44
27 baicaibangzi 43
28 daydayup1 43
29 acervulina 42
30 zaoxie 42
31 yyll51 42
32 zhaohuiziwo 41
33 zi

w*****e
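Posts: 806

From thread: Statistics board - Ad post --- everyone is welcome to join the LINKEDIN MB_STATISTICS GROUP...

I thought oloolo had already joined..
There is one member from an insurance company in ct...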

s******y
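Posts: 352

From thread: Statistics board - Looking for a simpler way to write a piece of SAS

Borrowing oloolo's sample data. The output variable names are listed one by one. If there are too many, consider
generating them automatically and storing them in a macro variable.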
data test;
  do id='a', 'b', 'c';
    do day=1 to 100;
      sale=ranuni(9796876)*100;
      output;
    end;
  end;
run;
proc sort data=test;
  by id day;
run;
data want;
  if _n_=0 then set test;
  array salesum {10} sale10d sale20d sale30d sale40d
                     sale50d sale60d sale70d sale80d
                     sale90d sale100d;
  array _sale{100};
  do _n_=1 by 1 until(last.id);
    set test;
    by id;
    _sale(_n_)=sale;
...

d*******o
Posts: 493

From thread: Statistics board - Looking for a simpler way to write a piece of SAS

/*SET UP A RESULT DATASET*/
data result;
  do id = 'a', 'b', 'c' ;
    output;
  end;
run;
/*SPECIFIC INTERVALS ARE ASSIGNED*/
%macro summary2(time1, time2, time3, time4);
  %do i=1 %to 4;
    proc sql;
      create table result as
      select a.*, b.first&&time&i
      from result as a, (
        select id, sum(sale) as first&&time&i
        from test
        where day le &&time...

T*******I
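Posts: 5138

From thread: Statistics board - What is computational mathematics zz (repost)

Thanks, oloolo. I enjoyed it and basically agree. I probably belong among those who should have chosen statistics and computational mathematics, and
I am an amateur folk scientist at that.
One question: how should people who work in algebraic geometry and analysis approach statistics? Can their way of thinking be applied to
statistics? How would such an application show up? Are there any concrete examples?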

c****s
Posts: 395

From thread: Statistics board - Handling massive SAS data

Hey, Thank you guys for the replies.
my company obviously can't afford to buy more software unless it is free.
this data step is in the beginning stage, and its output needs to be merged with
other tables. so sampling obviously is not a good way.
oloolo :i don't know if there are many ways to do it in sas.
by using c++, do you mean designing a new function and interfacing it with sas?
will it make the data processing faster?
right now, I just drop some redundant and large-sized variable

A*******s
Posts: 3942

From thread: Statistics board - a question about adaboost

......
A year ago I already saw on oloolo's blog his improvement of the sas boost code from a certain textbook

o****o
Posts: 8077

From thread: Statistics board - Can anyone contribute some UNIX SAS experience?

diehard old school:
/user/oloolo/sas -nodms
