DataSciences board - A phenomenon I ran into at work: how would you explain it? (reposted)
c***z
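Posts: 6348
1
【 Reposted from the Statistics board 】
From: chaoz (面朝大海,吃碗凉皮), Board: Statistics
Subject: A phenomenon I ran into at work: how would you explain it?
Posted at: BBS 未名空间站 (Sat Mar 22 17:38:16 2014, US Eastern)
Our company has a model that predicts what will end up in an online shopping cart.
My improved model has a lower RMSE than the old model.
However, the old model normalizes the cart size to 1, although there is no real basis for doing so.
So I was asked to normalize the new model as well, and then RMSE shows the old model as better.
I explained that normalization distorted the data at the local level, so RMSE is no longer a
valid indicator, and I used the question of whether the US coastline or the California coastline
is longer as an example, but I feel this did not really hit the point.
Does anyone have any ideas? Thanks!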
M*Q
Posts: 54
2
Is the RMSE computed on the number of items in the shopping cart?
So do you first have to predict what will appear in the shopping cart?


【 In reply to c***z (post 1) 】

d****n
Posts: 12461
3
So your model outperforms the old model if the cart is large but underperforms
the old model when the cart is small.
(Or it could be entirely the other way around.)
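
One way to check this suggestion is to break RMSE down by actual cart size. The sketch below is a minimal illustration, not the poster's code: the session format, the `rmse_by_cart_size` helper, and the toy data are all assumptions made up for the example.

```python
import math
from collections import defaultdict

def rmse(errors):
    """Root mean squared error of a list of residuals."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def rmse_by_cart_size(sessions, predict):
    """Group per-item residuals by actual cart size and compute RMSE per group.

    sessions: list of (viewed_items, bought_items) pairs (hypothetical format).
    predict:  function mapping viewed_items -> {item: predicted purchase score}.
    """
    residuals = defaultdict(list)
    for viewed, bought in sessions:
        scores = predict(viewed)
        for item in viewed:
            actual = 1.0 if item in bought else 0.0
            residuals[len(bought)].append(scores[item] - actual)
    return {size: rmse(errs) for size, errs in sorted(residuals.items())}

# Toy sessions, invented for illustration only.
sessions = [
    (["a", "b", "c", "d"], {"a"}),
    (["a", "b"], {"a", "b"}),
    (["c", "d", "e"], {"c", "d", "e"}),
]

def buy_everything(viewed):
    """Old-model-style guess: every viewed item is bought."""
    return {item: 1.0 for item in viewed}

by_size = rmse_by_cart_size(sessions, buy_everything)
print(by_size)
```

Running the same breakdown for both models would show whether one of them wins only in small carts or only in large ones.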

【 In reply to c***z (post 1) 】

c***z
Posts: 6348
4
My model (a decision tree) outperforms the old model if the unit of analysis
is items bought
(it should, since the old model predicts that everything the person viewed
is bought).
The old model outperforms when we fix the cart size to be 1 (the old model
then predicts that 1/n of each item viewed is bought, where n = number of
items viewed).
I am not comfortable with fixing the cart size to 1 in the first place...
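
The flip described here can be reproduced on toy data. This is a sketch under assumptions the thread does not confirm: normalization is taken to mean rescaling both the predicted and the actual per-item values within each session so that they sum to 1, and the sessions and the `new` scores are invented for illustration.

```python
import math

def normalize(values):
    """Rescale a list of non-negative numbers so that it sums to 1.

    Assumes at least one positive entry (every session ends in a checkout).
    """
    total = sum(values)
    return [v / total for v in values]

def pooled_rmse(sessions, model, normalized):
    """RMSE over all viewed items, optionally after per-session normalization."""
    squared_errors = []
    for session in sessions:
        predicted = session[model]
        actual = session["bought"]
        if normalized:
            predicted, actual = normalize(predicted), normalize(actual)
        squared_errors += [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical sessions: "old" guesses that every viewed item is bought,
# "new" stands in for per-item purchase probabilities from a tree.
sessions = [
    {"bought": [1, 0, 0, 0], "old": [1, 1, 1, 1], "new": [0.2, 0.4, 0.2, 0.2]},
    {"bought": [1, 1], "old": [1, 1], "new": [0.6, 0.8]},
]

results = {
    (model, normalized): pooled_rmse(sessions, model, normalized)
    for model in ("old", "new")
    for normalized in (False, True)
}
for key, value in results.items():
    print(key, round(value, 4))
```

In this toy case the new scores get the overall purchase level roughly right but misallocate it within the first session. Per-session normalization discards the level, which is where the buy-everything guess is worst, so only the within-cart allocation is compared and the ranking reverses.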

【 In reply to d****n (post 3) 】

d****n
Posts: 12461
5
Look for the final metric, such as revenue.


【 In reply to c***z (post 4) 】

M*Q
Posts: 54
6
I'm confused about the old model. Why does it predict that everything viewed is
bought in one case, and that 1/n of each item viewed is bought in the other? How
could this happen?
Just a guess (I'm still confused): your model outperforms if we're
interested in the number of items bought, but underperforms if we're
interested in the probability of purchasing.


【 In reply to c***z (post 4) 】

h********3
Posts: 2075
7
How did you do the normalization?

【 In reply to c***z (post 4) 】

c****t
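Posts: 19049
8
I didn't follow, and it seems nobody else did either. If you don't want to use squared
errors, use something else. The key is to be able to express the criterion you want in one
or two numbers; otherwise business people won't buy it. Aren't you in digital marketing?
Don't use coastlines as an example.
Squared errors were already criticized 30 years ago by people doing research in statistical
theory and decision theory, but there is no way around it: they are convenient and give a
single number. Plenty of people from a pure CS background doing machine learning still
treat them as the benchmark.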
c***z
Posts: 6348
9
Thank you all so much for the input! As always, you guys are most helpful!
For some more context, we are trying to predict conversion based on page
views. The old model says everything will be bought, and then normalizes so
that the cart size is one.
The new tree model tries to predict individual conversion rate. I played
with the tree model a little so that it outperforms the old model even after
normalizing.
I think it is the weighting (normalization) that distorted the data and RMSE.
I agree that RMSE is not perfect and the best way is to compare final data
such as market share with real data. However we are not confident about the
quantities yet (just the yes/no about purchase).
The next step is definitely to include more features in the model, as well
as to use an output that is closer to the final product.
Please shoot any additional questions and I will be very glad to discuss.
c***z
Posts: 6348
10
We are not at the quantities yet, just the yes/no purchase decisions.

【 In reply to M*Q (post 2) 】

c***z
Posts: 6348
11
I think it is the other way around. But I don't know why...

【 In reply to d****n (post 3) 】

c***z
Posts: 6348
12
That would be the plan.
We don't have all the data needed for that yet...

【 In reply to d****n (post 5) 】

c***z
Posts: 6348
13
In the old model, if one viewed 5 products before checking out, we guess he
bought all of them; that becomes 0.2 of each product if we fix the cart size to 1.
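
As a worked version of this arithmetic, consider one session with n viewed items of which k were actually bought, so y_i = 1 for bought items and 0 otherwise. The thread does not say how the actual purchases are treated under normalization; the second line assumes they are rescaled the same way, so each bought item has target 1/k.

```latex
\begin{aligned}
\mathrm{SSE}_{\text{raw}}  &= \sum_{i=1}^{n} (1 - y_i)^2 = n - k, \\
\mathrm{SSE}_{\text{norm}} &= k\left(\frac{1}{k} - \frac{1}{n}\right)^2
                              + (n - k)\left(\frac{1}{n}\right)^2
                            = \frac{1}{k} - \frac{1}{n}.
\end{aligned}
```

With n = 5 and k = 1 the old model's squared error drops from 4 to 0.8 after normalization; with n = 5 and k = 4 it drops from 1 to 0.05. Under this assumption, normalization shrinks the old model's error most where the cart is large.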


【 In reply to M*Q (post 6) 】

c***z
Posts: 6348
14
Points taken.
I can use the confusion matrix if I let the tree output binary instead of a
probability :)
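
A minimal sketch of that idea, assuming a 0.5 cutoff on the tree's predicted purchase probability. The `confusion_matrix` helper, the threshold, and the sample values are hypothetical and not taken from the thread.

```python
from collections import Counter

def confusion_matrix(actual, probabilities, threshold=0.5):
    """Count TP/FP/FN/TN after thresholding predicted purchase probabilities."""
    counts = Counter()
    for y, p in zip(actual, probabilities):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            counts["tp"] += 1
        elif predicted == 1 and y == 0:
            counts["fp"] += 1
        elif predicted == 0 and y == 1:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts

# Invented labels (1 = bought) and predicted probabilities.
actual = [1, 0, 0, 1, 1, 0]
probabilities = [0.9, 0.4, 0.6, 0.3, 0.8, 0.1]

cm = confusion_matrix(actual, probabilities)
precision = cm["tp"] / (cm["tp"] + cm["fp"])
recall = cm["tp"] / (cm["tp"] + cm["fn"])
print(dict(cm), round(precision, 3), round(recall, 3))
```

Precision and recall give the one or two numbers asked for earlier in the thread, at the cost of depending on the chosen threshold.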


【 In reply to c****t (post 8) 】
