DataSciences board - Some thoughts on data science and data scientists
c***z
Posts: 6348
1
Below are just some of my personal opinions, please don't take them
personally :)
1. Data science is a very broad term. If I dare to put down a definition,
the fundamental question for data science should be:
Are we really doing what we think we are doing?
In formal terms, data science is the science of measuring inference from
data: not only the inference itself, but also the confidence in that inference.
Data scientists are most concerned about what we don't know (e.g. data
quality, panel bias, model validity), and this is exactly why we are
called scientists.
By analogy, software engineers are most concerned about what hasn't
happened yet (e.g. site reliability, scalability).
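To make "not only inference, but also the confidence of such inference" concrete, here is a minimal sketch (not from the original post) that reports a point estimate together with an interval expressing how uncertain it is. It assumes the quantity of interest is a simple mean and uses the percentile bootstrap, one common choice among many; the function name `bootstrap_mean_ci` and the sample numbers are hypothetical and purely illustrative.

```python
import random
import statistics


def bootstrap_mean_ci(sample, n_resamples=10_000, alpha=0.05, seed=0):
    """Return (point_estimate, (lower, upper)) for the mean of `sample`.

    Percentile bootstrap: resample the data with replacement, recompute
    the mean each time, and take the alpha/2 and 1 - alpha/2 quantiles
    of those resampled means as the interval bounds.
    """
    if len(sample) < 2:
        raise ValueError("need at least two observations")
    if not 0 < alpha < 1:
        raise ValueError("alpha must be in (0, 1)")

    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(sample)
    estimate = statistics.fmean(sample)  # the inference itself

    # Means of resampled datasets approximate the sampling distribution,
    # i.e. how much the estimate would move if we collected the data again.
    boot_means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    lo_idx = round(alpha / 2 * n_resamples)
    hi_idx = round((1 - alpha / 2) * n_resamples) - 1
    return estimate, (boot_means[lo_idx], boot_means[hi_idx])  # the confidence


if __name__ == "__main__":
    # Hypothetical daily conversion rates, for illustration only.
    data = [0.12, 0.15, 0.11, 0.14, 0.18, 0.09, 0.13, 0.16, 0.10, 0.17]
    est, (lo, hi) = bootstrap_mean_ci(data)
    print(f"mean = {est:.3f}, 95% CI ~ [{lo:.3f}, {hi:.3f}]")
```

The bootstrap is used here only because it needs no distributional assumptions and fits in a few lines of the standard library; note that it quantifies sampling noise only, and says nothing about the other unknowns listed above, such as a biased panel or mislabeled data.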
2. My definition is closer to that of statistics, although statisticians
seldom have to deal with large volumes of dirty, unstructured, or unlabeled data.
Under this definition, many data scientist positions are actually analyst or
engineer positions, because those roles only care about inference or
reliability, rather than confidence and validity.
Specifically, by the nature of input data:
Statisticians work on small volumes of clean data, likely with lots of
assumptions, likely from academic literature;
Data analysts work on small volumes of dirty data, often without knowing how
to clean it, and make assumptions mostly from business knowledge;
Data engineers work on large volumes of clean data, likely structured for
query and display;
Data scientists work on large volumes of dirty data, likely unstructured and
unlabeled.
3. The key questions a data scientist working in business settings should
ask:
Do we have well defined questions?
Do we have truthfully labeled data?
Do we have an unbiased panel?
Features and models are secondary to questions and data. Specifically, the
first steps of research should be to ask the right questions and decide the
level and unit of analysis.
Essentially, a data scientist needs skills from business, science and
engineering, which basically cover three functional roles:
a data architect,
a solution architect,
a software architect.
This is exactly why many data scientists face unreasonable expectations
and enormous stress.
d****n
Posts: 12461
2
Impressive, moderator.
OK, can I complain that half the time in data science is spent on data mangling?
c***z
Posts: 6348
3
Thanks for the support, senior.
Half the time isn't bad; for me it's 80% :(
The remaining 20% is curve fitting, which is pretty boring.

[Quoting d****n's post:]
: Impressive, moderator.
: OK, can I complain that half the time in data science is spent on data mangling?
