All topics - Topic: data
B*****g
Posts: 34098
1
[Reposted from the Database board]
From: Beijing (我是猪--听说猪是被祝福的), Board: Database
Subject: Business Intelligence and Data Warehouse Seminar on 12/7 (CINAOUG/CINASSUG)
Posted at: BBS 未名空间站 (Mon Dec 3 11:47:54 2012, US Eastern)
Business Intelligence and Data Warehouse
Time: Friday, December 7, 8-9 PM US Eastern
Speaker: Mr. George Shen (Specialist Master, Information Management, Deloitte
Consulting)
How to attend: completely free, no registration needed; on the day of the seminar, connect to
http://www.AnyMeeting.com/cinaoug1
(Link available after 7:50 PM)
Main topics include:
1. What is BI and DW? And why?
2. what ...
s******u
Posts: 757
2
This is what the official site says:
No-commitment data plans from AT&T or Verizon
In the U.S., you can choose from data plans with no long-term contract.
Easy sign-up
Does that mean I don't have to commit to a data plan?
You sign up for 3G service right on your iPad. And you can monitor your data
usage and change your plan at any time, including adding data or canceling
service if you know you won’t need it.
Does that mean that after adding a data plan, I can also cancel it at any time?
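s****e
Posts: 1180
3
[Reposted from the Statistics board]
From: sheide (shei), Board: Statistics
Subject: Sincere question: how should a large data set actually be analyzed?
Posted at: BBS 未名空间站 (Wed Jun 22 18:39:19 2011, US Eastern)
Sincere question: how should a large data set actually be analyzed? This came up in an
interview today: there is a data set to analyze with 100 million observations and
200 thousand covariates. The company does not use SAS, only R and Python, but R cannot
handle a data set this large at all, so they asked me what I would do. Use C? I know C.
I think large data sets have been discussed on this board before, but PhD programs
generally don't seem to have projects of this kind (whatever, I'm just guessing; that's
how it is at my school, I don't know about other schools), and today I finally ran into
one. How is this usually handled? Are there standard approaches, or any practical
reference books? Also, if I use C, I normally use the Dev-C++ IDE or gcc on Linux; can
these two C platforms analyze a data se...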
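One common way to approach the question above is to avoid loading the data set at all and instead stream it once, keeping only running summaries in memory. The following is a minimal sketch of that idea, not a full analysis pipeline. It assumes rows arrive as comma-separated numeric lines; the function name `streaming_column_stats` and the input layout are hypothetical and do not come from the post.

```python
import io

def streaming_column_stats(lines):
    """Single pass over CSV lines; returns (row count, column means, sample variances).

    Uses Welford's online update, so memory grows with the number of
    columns only, never with the number of rows.
    """
    n = 0
    means, m2s = [], []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        values = [float(v) for v in line.strip().split(",")]
        if n == 0:
            means = [0.0] * len(values)
            m2s = [0.0] * len(values)
        n += 1
        for j, x in enumerate(values):
            delta = x - means[j]
            means[j] += delta / n
            m2s[j] += delta * (x - means[j])
    variances = [m2 / (n - 1) if n > 1 else float("nan") for m2 in m2s]
    return n, means, variances

# Tiny in-memory stand-in for a file that would not fit in RAM (hypothetical data).
sample = io.StringIO("1,10\n2,20\n3,30\n4,40\n")
n, means, variances = streaming_column_stats(sample)
print(n, means, variances)
```

The same loop works unchanged on an open file handle, since Python reads files line by line.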
a*******t
Posts: 891
4
How is the data collected, or updated/inserted? Do you get a data feed at
a certain time of the day, or is this going to be a static set of data you are
working with?
Have you considered breaking the data into smaller groups of files, and loading
only the needed data into a temp table when requested?
cabID_0000.txt
cabID_0001.txt
....
each of those files would contain all the data associated with that cabID.
and when a request comes in, read the file(s) and write them to a table for
query
you can do
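The file-per-cabID suggestion in this reply can be sketched roughly as follows. This is only an illustration under stated assumptions: the tab-separated layout, the column names (`ts`, `lat`, `lon`), the table name `cab_data`, and the use of SQLite are all hypothetical, since the post does not say what the files contain or which database is involved.

```python
import os
import sqlite3
import tempfile

def load_cab_into_temp_table(conn, data_dir, cab_id):
    """Read one cab's file and load it into a temp table for querying."""
    path = os.path.join(data_dir, f"cabID_{cab_id:04d}.txt")
    conn.execute("DROP TABLE IF EXISTS cab_data")
    # Hypothetical schema: timestamp, latitude, longitude.
    conn.execute("CREATE TEMP TABLE cab_data (ts TEXT, lat REAL, lon REAL)")
    with open(path) as f:
        rows = [line.rstrip("\n").split("\t") for line in f if line.strip()]
    conn.executemany("INSERT INTO cab_data VALUES (?, ?, ?)", rows)
    return len(rows)

# Demo with one small made-up file.
data_dir = tempfile.mkdtemp()
with open(os.path.join(data_dir, "cabID_0001.txt"), "w") as f:
    f.write("2012-01-01T08:00\t37.77\t-122.42\n")
    f.write("2012-01-01T08:05\t37.78\t-122.41\n")

conn = sqlite3.connect(":memory:")
loaded = load_cab_into_temp_table(conn, data_dir, 1)
count = conn.execute("SELECT COUNT(*) FROM cab_data").fetchone()[0]
print(loaded, count)
```

Only the requested cab's rows ever reach the database, which is the point of splitting the data by cabID first.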
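l******t
Posts: 660
5
From thread: Database board - About big data
I agree that big data is just a marketing term right now. For 95% of traditional
companies, I currently don't see any problem that a traditional database architecture
(ER, 3NF, OLAP/DW) cannot solve. Big data is still aimed at Internet technology
companies such as Google/Yahoo/Taobao. Traditional ER databases are inherently weak at
unstructured data, data mining, and high concurrency, and that is where
Hadoop/MapReduce comes in.
Every time I see the big data buzzword, I think of bioinformatics a few years ago. It
was just as hot back then, and MS, Motorola and others jumped in one after another, but
without mature commercial products or mature market demand, it didn't take long before
they folded one after another.
Big data won't end that badly, since ever-growing data is the trend, but it will take a
major breakthrough in some area (machine learning?) for this technology to truly go
from highbrow to a product that makes money.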
w*r
Posts: 2421
7
From thread: Java board - the best way to transfer data?
I think your question is that you want to make a 'generic' plot package
which can take various forms of data.
I am working on my project by letting the chart/plot package accept the
XML form of data. Basically most of my plots are 2D plots, so I just define
the data as
.......
With this generic structure, you can write your parser to parse it
into your chart's data object, then you can draw
you are free to exten
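The XML layout from this post did not survive, so the following is only a sketch of the general idea (parse a generic XML description into a chart data object). The element and attribute names (`plot`, `series`, `point`, `x`, `y`) and the `Series` class are hypothetical stand-ins, not the poster's actual format.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

@dataclass
class Series:
    """Hypothetical chart data object: a named list of (x, y) points."""
    name: str
    points: list = field(default_factory=list)

def parse_plot_xml(text):
    """Parse a generic 2D plot description into a list of Series objects."""
    root = ET.fromstring(text)
    result = []
    for s in root.findall("series"):
        series = Series(name=s.get("name", ""))
        for p in s.findall("point"):
            series.points.append((float(p.get("x")), float(p.get("y"))))
        result.append(series)
    return result

doc = """<plot>
  <series name="demo">
    <point x="0" y="1.5"/>
    <point x="1" y="2.5"/>
  </series>
</plot>"""

parsed = parse_plot_xml(doc)
print(parsed)
```

The plotting code then depends only on `Series`, so other input formats can be supported by adding another parser that produces the same objects.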
b****t
Posts: 114
9
Hi all,
I want to append new column data to a data file. The data has been formatted
in a tabbed way (very clean data without any missing/empty cells). This is
useful when I run my C/C++ code and save data each time as one or more
columns. Ideally, it will be done within C/C++ code with fstream, but it is
OK to use a script to split single-column (or 2-column) data into multiple
columns.
e.g.
file1:
1 2 3
2 3 4
5 6 7
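Since the post allows a script, here is a minimal Python sketch of appending one column to a clean tab-separated file. It assumes the file has no missing cells (as stated in the post) and that the new column has exactly one value per row; the function name `append_column` and the new values `9, 8, 7` are made up for illustration.

```python
import os
import tempfile

def append_column(path, new_values, sep="\t"):
    """Append one value to the end of each row of a delimited text file."""
    with open(path) as f:
        rows = [line.rstrip("\n") for line in f if line.strip()]
    if len(rows) != len(new_values):
        raise ValueError("new column must have one value per existing row")
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        for row, value in zip(rows, new_values):
            f.write(f"{row}{sep}{value}\n")
    # Swap the new file in only after it has been fully written.
    os.replace(tmp_path, path)

# Demo using the rows of file1 from the post.
path = os.path.join(tempfile.mkdtemp(), "file1.txt")
with open(path, "w") as f:
    f.write("1\t2\t3\n2\t3\t4\n5\t6\t7\n")

append_column(path, [9, 8, 7])
with open(path) as f:
    content = f.read()
print(content)
```

Text files cannot grow in the middle of a line, so any approach (including one written with fstream) ends up rewriting the file like this.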
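z***e
Posts: 5393
10
From thread: Programming board - Best practice for updating user data?
This should be a common design problem, but I don't know what the best approach is.
For example, the service has several million users, so it uses multiple data servers,
and each data server stores a portion of the users and their related information (the
data for each user is large). Which users a given server holds is not fixed; it changes
with each user's specific situation. The problem is how to handle this efficiently when
that situation changes very frequently.
Users need to update their information constantly, so we can't write directly to the
database. Each time a user updates, they send a network request to the server with
their userid and what needs to be updated. The network server first finds the data
server holding that user based on the userid, then sends an update request to that
data server.
In this setup, the network server has to keep a dedicated table mapping
userid <=> data server, which feels wasteful. Is there a better way?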
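For reference, the lookup-table routing described in the post can be sketched as below. This only illustrates the design as the poster describes it, not a recommended alternative. The class name `Router`, the method names, and the in-memory dictionaries standing in for real data servers are all hypothetical.

```python
class Router:
    """Front-end server that maps each userid to the data server holding it."""

    def __init__(self):
        self.location = {}  # userid -> data server name (the mapping table)
        self.servers = {}   # data server name -> {userid: user data}

    def add_server(self, name):
        self.servers[name] = {}

    def place_user(self, userid, server):
        """Assign (or move) a user to a data server, carrying their data along."""
        old = self.location.get(userid)
        data = self.servers[old].pop(userid, {}) if old is not None else {}
        self.servers[server][userid] = data
        self.location[userid] = server

    def update(self, userid, changes):
        """Route an update to whichever data server currently holds the user."""
        server = self.location[userid]
        self.servers[server][userid].update(changes)
        return server

router = Router()
router.add_server("ds1")
router.add_server("ds2")
router.place_user(42, "ds1")
first = router.update(42, {"score": 1})
router.place_user(42, "ds2")  # the user's situation changed, so they moved
second = router.update(42, {"level": 3})
print(first, second, router.servers["ds2"][42])
```

Because placement depends on each user's changing situation rather than on the userid alone, some record of the current placement has to exist somewhere; the table is that record.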
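G***n
Posts: 877
11
From thread: Programming board - What are the prospects for working in Big data?
I've recently become interested in big data. From my own analysis, there are two main
directions: a programming-oriented application track, cloud computation (including
parallel computation and Hadoop), and a research-oriented application track, data
mining and machine learning (including pattern recognition and prediction). There are
also some data-processing steps such as data cleaning and data visualization.
It seems many companies now need data scientists who work on web services together
with Hadoop and .NET. I'd like to move in this direction. My feeling is that the
research-oriented track should become more valuable over time and is less likely to be
made obsolete by new technology. But I don't know how long this direction will last or
what the pay will be like. I'm a complete beginner and know only a little about big
data. Any advice from the experts here?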
g*****g
Posts: 34805
12
So you never got it straight: your own Microsoft is working on Hadoop with great enthusiasm too. The Scope you keep hyping is pure self-indulgence.
Microsoft believes big data should be in the hands of the people closest to
your business who are moments away from that next big idea. Our approach is
simple - wed the power of Hadoop with your core databases and bring
unstructured and structured data to life through rich, 3D data
visualizations with the tools that your business uses most. We bring Hadoop
and big data to end users through powerful and familiar tools like Excel
2013. Discover and comb...
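s*****r
Posts: 43070
13
From thread: Programming board - Do you ever get the feeling that big data is actually
Big data has unleashed productivity: data that used to be less important can now be
stored the way it would be in a DB, and the key is the low cost. These days disks and
CPUs are cheap, while Oracle licenses are extremely expensive. Without big data,
unimportant data simply had to be discarded; for example, users' activity patterns in
the UI could not possibly be stored in and retrieved from an RDBMS.
With big data, you can store whatever data you want without worrying about cost, and
data mining immediately enters a new era.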
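z****e
Posts: 54598
14
From thread: Programming board - What's a good way to self-study big data
Big data is not just big data.
All unstructured data can be considered big data.
Saying you're self-studying big data is like saying you're self-studying J2EE.
It's just an abstract umbrella term for a whole pile of products.
Find the representative product in each area and work through them one by one.
In fact, once you know one of them, you can go out and tell people you know big data,
and nobody can do anything about it, just like back in the day
when people who had written a servlet went around saying they had done J2EE.
What could anyone do about that? A servlet really is J2EE.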
j*******g
Posts: 331
15
From thread: Programming board - Data Engineer @ ADP data team
The team is trying to find a good data engineer for the NJ headquarters office. We
want someone who is really good and comfortable with Linux and data. We use
a lot of Python here, and in the future we are going to adopt Scala a lot. Of course,
we also use Hive/Pig/Spark. We don't really need someone with big data
or machine learning experience; we just want a good programmer and Unix
system guru. You will get the chance to work on a lot of the machine learning
and big data projects going on here, we have spark...
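s********g
Posts: 161
16
[Reposted from the Windows board]
From: shengguang (shengguang), Board: Windows
Subject: Very anxious: is there any way to recover data deleted from a portable drive?
Posted at: BBS 未名空间站 (Sat Jun 7 23:42:34 2014, US Eastern)
I have two portable drives. The data was cut and pasted from one onto the other, and
now the drive that held the data has been stolen.
Is there any way to recover the data from the drive it was cut from? I sincerely hope
to get your help. That data is all of my very important data since coming to the US:
work notes and technical docs.
Or is there a shop I could pay to recover it?
Many thanks!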
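B*****i
Posts: 1246
17
1, package description:
The data CDs are all on the first page of the books, so they don't show in the photos
1). 2009 FAR textbook (with Data CD) + LECTURE discs
2). 2009 BEC textbook (with Data CD) + LECTURE discs
3). 2009 REG textbook (with Data CD) + LECTURE discs (I accidentally broke the lecture
disc for this one, sorry; all the others are fine)
4). 2009 AUD textbook (with Data CD) + LECTURE discs
5). Course CD (the three installations have been used up; a single installation of the
2009 version now sells for around a few dozen dollars. I can give you a link to the
website, and once installed it can be used). I'm a fairly passive learner and fall
asleep reading books, and since I have somewhat switched majors, I prefer listening to
lectures and doing practice questions on the computer. I think it is basically about
the same as the 2011 version. My own impression from taking the exams is that 2011
tests very little IFRS, almost negligible, and REG has almost none. I'll include the
2011 PDFs so you can compare for yourself. There are also some printed 2010 FAR and
REG passmaster, maggie’s CPA notes and the like, all of which I'll mail together to...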
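c****l
Posts: 53
18
From thread: Biology board - Hiring (big data related)
My boss recently set up a center (Center for Statistical Inference in Biomedical Data
Science), has a lot of funding, and wants to hire some people. The boss is a Full
Professor of Biostatistics at UPenn,
http://statgene.med.upenn.edu/. The new center mainly works on Big data related topics, such as genomic data, image data, EMR/EHR data, and social network data. The boss is very nice, the positions are all university positions, visa status issues are relatively easy to resolve, and salaries should all follow UPenn's salary standards.
The following positions are currently open:
1. Programmer: a master's degree is enough. The lab mainly develops statistical
methods, so we need to hire someone to turn them into usable software for other people
to use; the main job is programming. We mainly use Python and R, and occasionally some
C++ and Java. This is a permanent position.
2. Scientist: requires a PhD. Our lab and the hospital (mainly CHOP...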
s*****a
Posts: 310
20
From thread: Engineering board - PhD Positions in Data Mining-2014 Spring&Fall)
Several PhD openings with full financial support in the Department of Industrial and
Systems Engineering at UT Arlington, in Data mining/Machine Learning/Optimal Decision
Making, at the Center on Stochastic Modeling, Optimization, & Statistics (COSMOS). This
research area is highly applied, with many job opportunities after graduation, and
suits applicants with strong math and programming backgrounds. Send your resume to s*****[email protected].
Admissions information:
The objective of the Center on Stochastic Modeling, Optimization, &
Statistics (COSMOS) at The University of Texas (UT) at Arlington is to
research the designing and modeling of complex real-world systems, in
particular, to develop new methods for making sound ...
m******t
Posts: 273
21
[Reposted from the Quant board]
From: myregmit (myregmit), Board: Quant
Subject: how to do data fitting to find distribution
Posted at: BBS 未名空间站 (Sat Mar 15 11:02:05 2014, US Eastern)
Hi,
I need to do data fitting to find the distribution of some given data.
I need to find the pdf function of the distribution.
I can use data fitting functions in MATLAB and Python.
It looks like a truncated gamma.
But how do I find the parameters of the distribution?
What if the data cannot fit the truncated gamma well?
The QQ-plot (quantile-qua...
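As a rough starting point for the parameter question above, a plain (untruncated) gamma distribution can be fitted by the method of moments: shape = mean² / variance and scale = variance / mean. The sketch below deliberately ignores the truncation mentioned in the post, so its output is at best a set of initial values for a proper maximum-likelihood fit of the truncated model. The function name and the sample data are made up.

```python
def gamma_method_of_moments(data):
    """Method-of-moments estimates (shape, scale) for an untruncated gamma."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n  # population variance
    if mean <= 0 or var <= 0:
        raise ValueError("gamma fit needs positive mean and variance")
    shape = mean * mean / var
    scale = var / mean
    return shape, scale

sample = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up data for illustration
shape, scale = gamma_method_of_moments(sample)
print(shape, scale)
```

If the fitted curve still disagrees with the data in a QQ-plot, that suggests trying a different family rather than tuning these two numbers further.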
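g******7
Posts: 19
22
Simply put, data mining means extracting meaningful or valuable information from
massive amounts of data (just like panning gold or other valuable components out of
underground ore). From this perspective, it is work that spans many fields. It may
involve organizing/processing data, and it may also involve modeling.
Here is one definition of "data mining":
Simply stated, data mining refers to extracting or “mining” knowledge from
large amounts of data.
Here is a good reference book on data mining: "Data Mining: Concepts and
Techniques" by Han, J. & Kamber, M. (2006).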
s*****z
Posts: 202
23
Insurance claim data is generally stored on servers such as Oracle or DB2. A
university hospital's data is probably not claim data but health outcome or billing
data, and it is probably stored on a server as well.
Repeated measures, test variability in different hierarchical data: that should be
longitudinal data analysis, or the so-called mixed effect model.
o******6
Posts: 538
24
☆─────────────────────────────────────☆
gutenacht (嗯) wrote on (Thu Feb 26 20:28:13 2009):
What is the difference between a SAS raw data file and a raw data set? Thanks!
☆─────────────────────────────────────☆
gutenacht (嗯) wrote on (Thu Feb 26 21:39:15 2009):
Does a raw data set just show up in the output, while a raw data file is a SAS file?

☆─────────────────────────────────────☆
Cymbalta (Cymbalta) wrote on (Fri Feb 27 11:56:39 2009):
Here is a simple example:
You have some apples in a basket. The basket is just like the raw data file.
The apples are like the
j*****i
Posts: 47
25
Question: how does SAS generate these three DATA SETS: test1, test2, test3?
Could someone explain the processing in detail?
Thanks.
DATA x;
INPUT x1 x2 x3 x4;
CARDS;
1 0 0 0
1 0 0 0
1 0 0 0
1 0 0 0
0 1 0 0
0 1 0 0
0 1 0 0
0 1 0 0
0 0 1 1
0 0 1 1
0 0 1 1
0 0 1 1
0 0 0 1
0 0 0 1
0 0 0 1
0 0 0 1
;
RUN;
DATA avg;
INPUT avgnum;
CARDS;
3
;
RUN;
DATA test1;
SET avg;
SET x;
RUN;
PROC PRINT;
RUN;
DATA test2;
SET avg x;
x***x
Posts: 3401
26
From thread: Statistics board - Question about 2 basic concepts: ad hoc, data validation.
1. Data validation is like auditing: you need to compare the data with other
sources to make sure the data is correct. Data cleaning is just a general
term for making changes to data (to make the data correct).
2. Ad hoc means something specially designed to fit a particular purpose.
t**i
Posts: 688
27
Set and Merge are not the same!
Try the following code:
data Dairy;
input Item $ Inventory Price ;
datalines;
Milk 15 1.99
Milk 3 1.99
Soymilk 8 2.99
Eggs 24 2.99
;
proc sort data=Dairy;
by Item;
run;
data Dairy2;
input Item $ Inventory Price;
datalines;
Soymilk 8 2.99
Eggs 24 2.99
Cheese 14 3.29
Yogurt 10 2.49
;
proc sort data=Dairy2;
by Item;
run;
data cc;
set Dairy Dairy2;
by Item;
run;
data dd;
merge Dairy Dairy2;
by It...
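P****D
Posts: 11146
28
I can't believe they actually ask this. Does this company really care about the
missing data problem, or do they just want to test your ability?
The simplest and most practical method our missing data instructor gave us: first do
imputation, then run the regression on the imputed data; at the same time, assume the
original data is missing at random and run another regression. If the two regressions
give similar results, go with the result that assumes the data is MAR.
This is the cheapest, best-value approach. Taken seriously, the missing data problem
is huge; a PhD course spends a whole semester on it...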
c*****a
Posts: 808
29
From thread: Statistics board - Sincere question: data cleaning
I've read a few related papers here; hope they are useful to you
Missing data handling in SAS: Missing Data Imputation in SAS
Multiple Imputation for Missing Data: Concepts and New Development (Version 9.0) (very good article)
An Introduction to Multiple Imputation Methods: Handling Missing Data with
SAS V8.2
Imputation Techniques Using SAS Software for Incomplete Data in Diabetes
Clinical Trials
A SAS Macro for Single Imputation
quote:
"This paper reviews methods for analyzing missing data, including
basic concepts and applications of multiple imputation
te...
s********1
Posts: 54
30
From thread: Statistics board - One question about data step in sas
Question(1)The following is my sas code: Who can tell me why it is wrong?
###############################
data temm;
length A B 3 X;
input A B X;
cards;
10 20 3
23 34 56
run;
#################################3
Error message:
##########################
78 data temm;
79 length A B 3 X;
-
22
1 1
-
352
ERROR 22-322: Expecting a numeric constant.
ERROR 352-185: The length of numeric variables is 3-8.
80 input A B X;
81 cards;
NOTE: The SAS Sy...
S******y
Posts: 1123
31
From thread: Statistics board - big data analysis in Revolution R
Interesting topic :-)
Many people think that there would be such a thing coming that user could
simply plug in R or SAS and make all existing functions/packages/procedures
to run on Hadoop-scaled data and "solve" the ultimate data size problem.
Unfortunately, there is no such thing. To achieve that, somebody has to
virtually rewrite every R package or every SAS/STAT procedure since most of
their underlying code/algorithms are simply not map-reduce compatible.
That is industry-scaled development...
s*****d
Posts: 267
32
From thread: Statistics board - Question: handling Longitudinal data in SAS
The data step is your friend. Proc SQL has performance problems, and anything that can
be solved with Proc SQL can be solved with a Data Step. Even things that need several
Proc SQL steps can be solved with a single Data Step. I haven't tested the SAS code
below, but it gives you the basic idea.
Please send baozi
First, sort this dataset
Proc Sort data=patient_ds out=patient_ds_sort;
by subject units descending time;
run;
Second, a single Data Step takes care of the problem
data patient_morethan_5(keep=subject);
set patient_ds_sort;
if _n_=1 then do;
cur_p=-1; /* assuming your patient ids are all positive */
total_unit=0;
cur_t=0;
is_find=0;
end; /* closes the DO block */
if cur_p...
l******9
Posts: 579
33
I am working on data analysis.
Given a group of data vectors, each of them has the same dimension. Each
element in a vector is a floating point number.
V1 [ , , , … ]
V2[ , , , … ]
...
Vn [ , , , … ]
Suppose that each vector has M numbers. M can be 10000.
n can be 200.
I need to find out how to partition the n vectors into sub-groups such that
each vector in one subgroup can be represented by a basic vector in the
subgroup.
For example,
W = union of V1, V2, V3 … Vn
Find subgroup i...
c*****p
Posts: 51
35
From thread: Statistics board - A question about combine SAS data steps
When I combine the first three data steps below into the single data step at the
bottom, it generates different results. Can anybody figure out why? Thank
you very much.
data A;
merge X(in=a) Y(in=b);
by key_id;
if a=1;
run;
data A;
set A;
if year<=2006 then id=new_id;
run;
data A;
set A;
drop new_id;
run;
data A(drop = new_id);
merge X(in=a) Y(in=b);
by key_id;
if a=1;
if year<=2006 then id=new_id;
run;
c***z
Posts: 6348
36
[Reposted from the DataSciences board]
From: chaoz (晨钟暮鼓), Board: DataSciences
Subject: [Data Science Project] Location data quality
Posted at: BBS 未名空间站 (Wed Sep 24 14:35:40 2014, US Eastern)
Hi all,
This is my first project in the new company, and it is about third party
data quality. There is no gold standard for quality, but we know that
repetition of location in the dataset might imply bad quality, because in
this case the location might come from a centroid (e.g. a cell tower, rather
than a cell phone).
There is also no...
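The repetition heuristic described in this post can be sketched as a simple count of how often each coordinate pair recurs. This is only an illustration of the stated idea, not the poster's method: the rounding precision, the `min_repeats` threshold, the function name, and the sample points are all assumptions.

```python
from collections import Counter

def repeated_location_share(points, precision=5, min_repeats=3):
    """Share of records whose (lat, lon) repeats at least min_repeats times.

    Heavily repeated coordinates may come from a centroid (for example a
    cell tower) rather than from an individual device.
    """
    rounded = [(round(lat, precision), round(lon, precision)) for lat, lon in points]
    counts = Counter(rounded)
    suspicious = {loc: c for loc, c in counts.items() if c >= min_repeats}
    share = sum(suspicious.values()) / len(rounded) if rounded else 0.0
    return share, suspicious

# Made-up sample: one location repeated three times plus two unique points.
points = [(40.0, -74.0)] * 3 + [(40.1, -74.1), (40.2, -74.2)]
share, suspicious = repeated_location_share(points)
print(share, suspicious)
```

Comparing this share across data providers gives a relative signal even without a gold standard, though the threshold would need tuning on real data.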
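b**********l
Posts: 116
37
The data in the last few projects I've worked on are all of a similar type, but I have
no experience with it, so I'd appreciate guidance from experts familiar with this kind
of data.
The predictive models in typical textbooks, such as machine learning algorithms, all
have rows as observations and columns as features, and each column is a random
variable.
But the problem I actually face has an extra time axis: the obs*feature data above is
only a cross-section at one fixed point in time. The problem to solve is how to make
predictions.
At first I was really lost; later I was told this is called panel data, i.e.
item*feature*time. I seem to have heard of it in econometrics, but econometrics seems
to be mostly linear models.
I'd like to know: with predictive models as advanced as they are now, such as LASSO,
RT, GBM, SVM, NN, and even Deep learning, are there any that can be applied to this
kind of Panel data? Is there any literature to consult?
Some practical experience:
- Obviously, this kind of data is not iid. Observations at adjacent times may be
related, and that should surely be exploited. In some problems the items are also
related to each other, and that relationship may need to be exploited as well.
- If you crudely melt this three-dimensional data into two dimensions,...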
j**w
Posts: 382
38
From thread: Statistics board - [job opportunity] data scientist (repost)
[Reposted from the JobHunting board]
From: jscw (two cats), Board: JobHunting
Subject: [job opportunity] data scientist
Posted at: BBS 未名空间站 (Mon Jan 25 13:31:37 2016, US Eastern)
Job Description:
We are looking for a highly motivated data scientist to join our innovative
startup (Elastica/Blue Coat Systems). You would be part of a multi-
disciplinary team chartered to build scalable distributed solutions to solve
the most challenging security problems.
Responsibilities:
● Work alongside Elastica’s software engineer team to i...
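l***y
Posts: 4671
40
The big data field can be roughly divided into three parts: big data generation, big
data management and analysis, and big data applications.
For a team, I'd suggest covering all three; only then is it a complete story.
The biggest problem for people who do pure data science when they work on big data is
that they don't understand the story behind the data: they don't know why the data is
collected/generated, why it is generated this way, why it has these fields rather than
others, what problem the data is meant to solve, how to solve it, how to evaluate it,
and so on. Whether you succeed depends to a large extent on understanding the data and
the problem.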
l**********e
Posts: 336
41
There is no data science right now, or rather there are very few data science
positions right now: at a big IT firm (>=5000 ppl, etc.), probably only 10-20
positions are close to data science.
Some firms just put "data science" in the title of data engineer or data analyst
related roles,
b*****e
Posts: 853
42
From thread: DataSciences board - suggestion on geospatial data? (repost)
In GIS (geographic information science), spatial analysis to find the best
location usually needs more than one data layer. For example, to find a gold
mine in the mining industry, the GIS specialist/scientist will consider the
geology data, hydrology data, soil data, vegetation data, and elevation data
(slope, aspect, etc.), besides the existing locations of gold mines.
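X********1
Posts: 707
43
I'd like to ask everyone whether my background is suitable for applying to data
scientist positions. Thanks in advance.
Background: after my bachelor's, 1 year of full-time Financial Advisory experience at
a top financial institution in China. I then applied to a top-50 US MBA with a
marketing analytics concentration. After graduating I found a digital marketing job in
the US but did not get an H1B. Because I want to stay in the US, I decided to return
to my school for a second master's, in Business Analytics. Our courses are roughly:
Required: Database Management (SQL), Data Warehousing, Web Analytics (Python),
Business Analytics for Manager (Cognos, Tableau), Data Mining for Business (SPSS),
Text Analytics, Statistic
Electives: SAS Marketing Decision Models, Big Data Analytics (AWS Hadoop and
Cloudera), Statistic Me...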
f**********t
Posts: 11
44
Please send resume to [email protected] and note which position
you are interested in. Thanks
1. Cloud and Big Data Lead Engineer
• Bachelor's degree in engineering in computer science or a related field; Master's
preferable.
• Minimum of 3-4 years of hands-on experience designing and
developing comprehensive big data solutions on the cloud.
• Cloud certified; preferably AWS Certified Solutions Architect -
Associate or Professional
• Fluency in English (oral and wri...
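d****n
Posts: 12461
46
This counts as a data scientist, not a big data engineer.
Let me describe what the big data engineers here do:
1. Design a system that runs the traditional ETL work on the Hadoop framework. That
involves Hadoop, MR/Pig, Spark, Kafka, plus a number of NoSQL databases;
2. Keep the system running robustly;
3. Handle all kinds of problems that come up in operation;
4. Automation tools and testing tools;
What the more senior ones do:
1. Size the cluster and design the network based on the system's characteristics
2. Design a UI for BI that automatically generates scripts and jobs to produce results.
There are also some who write APIs for internal users and some external users, and
some who solve all sorts of odd problems for data scientists. For example, some data
scientists ask for conversion between Avro and Parquet formats, and others can't
process the full data and ask for help with sampling.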
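b*********n
Posts: 2975
47
From thread: DataSciences board - data challenge ... what the hell are companies thinking these days
use tableau, a quick way to learn data, ;-)

They threw over a big blob of nearly 1 GB of internal raw data, with no documentation,
no data model, and no data log. From the names alone you can't tell what anything is;
you can't even tell categorical data from numerical data, and they still want you to
build a model. Is this how companies mess with people nowadays?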
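m*********r
Posts: 119
48
From thread: DataSciences board - Internal referral for Indeed data sci/product sci position
I am offering internal referrals for Indeed data scientist or product scientist positions.
Briefly about Indeed:
A job search engine and a second-tier IT company; the package is good, the work is
relaxed, and the benefits are decent, so overall it is good value.
The company's US locations are SF, Austin, and Seattle.
There are two data science related roles at the company: data scientist and product
scientist.
data scientist: requires machine learning, stats, and coding; leans toward
engineering, similar to a machine learning engineer at other companies;
product scientist: also requires ML and stats, with lighter coding requirements; the
focus is on product impact, similar to an analytical data scientist at other companies.
The company still has a lot of 2019 headcount; for non-senior levels, ideally a
master's with 1-2 years of work experience, or a PhD; we are also ...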