Page 5 - Digest of discussions about AUC - 话题女王

k****i
Posts: 347

From thread: Statistics board - Can someone explain why the c-statistic equals AUC?

Can anyone actually explain this?

w*******9
Posts: 1433

From thread: Statistics board - Can someone explain why the c-statistic equals AUC?

The posters above said they are the same thing under different names, so
if you could share the version of the c-statistic you saw, somebody might be
able to figure out the link.

g***l
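Posts: 22

From thread: Statistics board - Can someone explain why the c-statistic equals AUC?

Just picture the ROC plot sliced into small parallel rectangles and it becomes clear.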
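
The slicing picture can be checked numerically. A minimal sketch, assuming the c-statistic is defined as the share of (positive, negative) pairs in which the positive case gets the higher score, with ties counted as one half. The scores and function names are made up for illustration. Each positive case contributes one horizontal slice of the ROC plot whose width is the fraction of negatives scoring below it, so adding up the slices gives the same number as counting concordant pairs.

```python
def concordance(pos, neg):
    """c-statistic: fraction of (positive, negative) pairs where the positive
    case has the higher score; tied pairs count as one half."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def roc_area(pos, neg):
    """Area under the empirical ROC curve, summed with the trapezoid rule."""
    points = [(0.0, 0.0)]
    # Lower the threshold step by step; "score >= threshold" means predicted positive.
    for t in sorted(set(pos) | set(neg), reverse=True):
        tpr = sum(p >= t for p in pos) / len(pos)
        fpr = sum(n >= t for n in neg) / len(neg)
        points.append((fpr, tpr))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical scores for diseased (pos) and healthy (neg) subjects.
pos = [0.9, 0.8, 0.6, 0.55, 0.4]
neg = [0.7, 0.5, 0.4, 0.3, 0.2, 0.1]

print(concordance(pos, neg), roc_area(pos, neg))
```

For this toy data both numbers come to 0.85, up to floating-point rounding.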

c*******g
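Posts: 695

From thread: Statistics board - How do I compute the SD of a log-transformed variable?

For some PK variables such as AUC, CL, and Cmax,
the mean is usually computed after a log transform,
or reported as a geometric mean.
How should the SD of such variables be computed?
Thanks.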
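
One common convention, sketched here as an assumption and not as the answer given in the thread: compute the SD on the log scale, then report either the geometric SD, exp(s), or the geometric CV%, 100 * sqrt(exp(s^2) - 1). Note that AUC in this post is the pharmacokinetic area under the concentration-time curve, not the ROC AUC. The sample values and the function name are invented for illustration.

```python
import math
import statistics


def geometric_summary(values):
    """Summaries for a positive, roughly log-normal variable."""
    logs = [math.log(v) for v in values]
    mean_log = statistics.mean(logs)
    sd_log = statistics.stdev(logs)  # sample SD on the log scale
    return {
        "geometric_mean": math.exp(mean_log),
        "geometric_sd": math.exp(sd_log),  # multiplicative, unitless
        "geometric_cv_percent": 100 * math.sqrt(math.exp(sd_log ** 2) - 1),
    }


# Hypothetical PK exposures (e.g. AUC in ng*h/mL) for five subjects.
auc_values = [120.0, 95.0, 150.0, 110.0, 132.0]
for name, value in geometric_summary(auc_values).items():
    print(f"{name}: {value:.3f}")
```

The geometric SD is a ratio, so a spread is usually written as geometric mean divided by and multiplied by the geometric SD, not plus or minus.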

D**g
Posts: 739

From thread: Statistics board - How to express cut-off value

Did you use logistic regression? Why not use the ROC curve to check whether
the AUC is satisfactory, and then choose the cut-off value that maximizes
sensitivity + specificity?
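
The max(sensitivity + specificity) rule is often called the Youden index. A minimal sketch, assuming higher scores indicate disease and that a case is called positive when its score is at or above the cut-off. The data and the function name are hypothetical. When several cut-offs tie, this version keeps the lowest one.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, sensitivity, specificity) maximizing sens + spec.
    labels: 1 = diseased, 0 = healthy; positive call when score >= cutoff."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
        sens, spec = tp / n_pos, tn / n_neg
        # Strict ">" keeps the first (lowest) cut-off among ties.
        if best is None or sens + spec > best[1] + best[2]:
            best = (cutoff, sens, spec)
    return best


# Hypothetical predicted probabilities and true status.
scores = [0.1, 0.2, 0.3, 0.35, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

print(youden_cutoff(scores, labels))  # (0.35, 1.0, 0.75)
```

In this toy data a cut-off of 0.7 reaches the same sum with the sensitivity and specificity swapped, which is why the clinical cost of each error type still matters when picking between them.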

i*****c
Posts: 1322

From thread: Statistics board - How to express cut-off value

I used R, but I got the ROC curve from Origin. The measurement is for
screening, and the AUC is larger than 0.7.
You helped me a lot! A baozi to thank you :)

i*****c
Posts: 1322

From thread: Statistics board - Another question about ROC

From the ROC curve, the test is not good (AUC < 0.6). But one specific value
distinguishes the 2 groups with a significant OR. Fisher's exact test gives
0.06. How should I evaluate this specific value for this measurement? Is it
useful? Many thanks.

i*****c
Posts: 1322

From thread: Statistics board - R help: Direction of ROC in R

I'm checking several tests for a disease condition. When the criterion in a
test is ">" for disease, R gave me the same curve as I got from Origin. When
the criterion is "<" for disease, the curve bulges the opposite way from the
one Origin created, which means the AUC < 0.5. The Origin curve is correct
and fits my se and sp data. The pROC manual told me to use the direction
function, but it didn't work: R said that direction is not recognized. I
can't figure out what's wrong. Can anyone help me? ...
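
A small sketch of what is going on, in Python and not in R: when low values indicate disease but the software assumes high values do, the reported AUC is one minus the intended AUC, and negating the marker fixes it. The data are invented. As far as I know, in pROC `direction` is an argument passed to `roc()` (taking "<", ">" or "auto") and not a standalone function, which may be why calling it on its own was not recognized, but that reading of the error is an assumption.

```python
def auc(scores, labels):
    """Probability that a diseased case scores higher than a healthy one
    (ties count one half); assumes HIGH scores indicate disease."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical marker where LOW values indicate disease.
marker = [2.1, 2.5, 3.0, 3.2, 4.0, 4.4, 5.1, 5.8]
disease = [1, 1, 1, 0, 1, 0, 0, 0]

auc_wrong_direction = auc(marker, disease)             # below 0.5
auc_right_direction = auc([-m for m in marker], disease)

print(auc_wrong_direction, auc_right_direction)
```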

w**********y
Posts: 1691

From thread: Statistics board - sensitivity and specificity

Google AUC or ROC.
I guess ROC is popular in biology papers.

m*********n
Posts: 413

From thread: Statistics board - sensitivity and specificity

ROC "or" AUC??

D******n
Posts: 2836

From thread: Statistics board - multinomial logit model question

The ROC concept can easily be extended to more than 2 dimensions.
In 3D, AUC becomes VUS (volume under the ROC surface).

w*******9
Posts: 1433

From thread: Statistics board - A question about logistic regression

Sorry, I wish I could help you, and I would be very happy if someone could
teach both of us on this point.
My understanding is that R^2 and the numerous adjusted/generalized R^2s are
all designed to assess goodness of fit with a unitless number between 0 and
1. They are generally not associated with the proportion of variation. Only
in OLS (with an unknown intercept) is R^2 interpreted as the proportion of
variation explained by the covariates.
Since there are more popular goodne...

w*********m
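Posts: 4740

From thread: Statistics board - How to evaluate a predictive model for a binomial distribution

I tried AUC, based on the ROC curve.
Suppose success is the positive class and failure is the negative class.
The model predicts a class probability, and the probability threshold is then
raised gradually from 0 to 1 to trace out the true positive and false
positive rates.
Comparing two models, A's ROC is better than B's, but A's predicted class
probabilities are generally higher than B's. A's average predicted
positive-class probability is higher than B's, and both A's and B's are higher
than the average positive-class rate in the test data.
I also tried the sum of squared residuals and the sum of absolute residuals,
and B's are both smaller than A's.
In this situation, which model is actually better?
So frustrating.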
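
The two criteria measure different things, which is how they can disagree. A minimal sketch with invented predictions: AUC depends only on the ordering of the predicted probabilities, while the mean squared residual (the Brier score) also penalizes probabilities that are systematically too high. Model A below ranks perfectly but is inflated; model B ranks slightly worse but sits closer to the observed rate. Which one is "better" then depends on whether the probabilities themselves or only the ranking will be used.

```python
def auc(probs, labels):
    """Rank-based AUC: unchanged by any monotone rescaling of probs."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def brier(probs, labels):
    """Mean squared residual between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


# Hypothetical test set with a 30% success rate.
labels  = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
model_a = [0.60, 0.62, 0.65, 0.68, 0.70, 0.72, 0.75, 0.90, 0.92, 0.95]
model_b = [0.05, 0.10, 0.10, 0.15, 0.20, 0.20, 0.60, 0.50, 0.70, 0.80]

for name, probs in (("A", model_a), ("B", model_b)):
    print(name, "AUC:", round(auc(probs, labels), 3),
          "Brier:", round(brier(probs, labels), 4),
          "mean prob:", round(sum(probs) / len(probs), 3))
```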

w*******9
Posts: 1433

From thread: Statistics board - An interview question about logistic models

Do you use ROC, or the AUC derived from it?

A*******s
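Posts: 3942

From thread: Statistics board - Why does the model fit get worse after segmentation?

Got it. The root problem is that you shouldn't compare the performance on each
individual segment with the pre-segmentation performance on the whole sample;
that is apples vs. oranges.
Whatever statistic (R^2, adjusted R^2, AIC, BIC, AUC, whatever) you use to
guide model selection, the comparison is only meaningful when the statistics
are all computed on the same sample.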

p*******i
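Posts: 1181

From thread: Statistics board - Fraud detection model performs poorly on the testing dataset; looking for the reason

The model probably isn't tuned well. For the models I have seen, an AUC of 0.9 in the audit period is the minimum requirement = =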

d******e
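Posts: 7844

From thread: Statistics board - Fraud detection model performs poorly on the testing dataset; looking for the reason

For rare events, looking at AUC is useless.
You have to compare precision and recall in detail.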
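
A toy illustration of the point about rare events, with made-up scores: 10 fraud cases among 1,000 records can give an AUC of about 0.96 while only one in ten flagged records is actually fraud. The threshold, scores, and function names are assumptions chosen for the example.

```python
def auc(pos, neg):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_recall(pos, neg, threshold):
    """Flag a record when score >= threshold."""
    tp = sum(p >= threshold for p in pos)
    fp = sum(n >= threshold for n in neg)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / len(pos)
    return precision, recall


# Hypothetical scores: 990 legitimate records spread over 0..989,
# 10 fraud records near the top of the range.
neg = [float(k) for k in range(990)]
pos = [905.5 + 10 * j for j in range(10)]

print("AUC:", auc(pos, neg))
print("precision, recall at 900:", precision_recall(pos, neg, 900))
```

Every fraud case is caught at this threshold, yet 90 of the 100 flagged records are false alarms, which the AUC alone does not reveal.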

l******n
Posts: 9344

From thread: Statistics board - how to increase AUC?

With the data unchanged, what can be done?
Thanks.

A*****a
Posts: 1091

From thread: Statistics board - how to increase AUC?

You could check whether your x needs a transformation (such as a log transform) if it is not very normally distributed.

l******n
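Posts: 9344

From thread: Statistics board - how to increase AUC?

That didn't help much. All I can do is keep trying different algorithms, though that comes with no guarantee either.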

D******n
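Posts: 2836

From thread: Statistics board - How to look at variable correlation in logistic regression

A correlation matrix looks at the relationships among the predictors, right?
If you are asking about the relationship between a binary variable and a
continuous variable, it depends on which measure you want.
You can still look at the correlation coefficient, and beyond that you can
look at KS and AUC.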

f*******n
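Posts: 2665

From thread: Statistics board - A question about R

In SAS I can write a macro like modelscore(model, outputscore) to evaluate
different models and then call it: %modelscore(model1, outputscore1),
%modelscore(model2, outputscore2). But I don't know how to do this in R.
Here model is an object produced by earlier modeling, e.g. model1<-glm(...),
and outputscore actually contains several statistics, such as AUC and KS.
How do I create objects like outputscore1 and outputscore2 and save them to
the global environment?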

f***l
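Posts: 117

From thread: Statistics board - How to validate an ordinal logistic regression?

The SAS documentation says the usual way to validate a logistic regression
model is ROC (AUC), but that doesn't seem to apply to an ordinal logistic
regression model. How should this kind of model be validated?
Thanks!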

A*******s
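Posts: 3942

From thread: Statistics board - Can the ROC curve be used to compare variables?

Comparing the AUC of single variables is equivalent to a Wilcoxon test; it is
a nonparametric method that looks at ranking ability.
The p-value you mention is probably from a Wald test, which is a parametric
method and looks at how well the linearity assumption holds.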
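
The link to the Wilcoxon (Mann-Whitney) test can be made concrete: the AUC of a single variable equals the Mann-Whitney U statistic divided by the number of (positive, negative) pairs, and U comes straight from the rank sum of the positive group. A minimal sketch with hypothetical data and function names; tied values get the average of their ranks.

```python
def midranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def auc_from_rank_sum(scores, labels):
    """AUC = U / (n_pos * n_neg), with U taken from the Wilcoxon rank sum."""
    ranks = midranks(scores)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Hypothetical single predictor and binary outcome.
scores = [0.9, 0.8, 0.6, 0.55, 0.4, 0.7, 0.5, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

print(auc_from_rank_sum(scores, labels))
```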

F8
Posts: 348

From thread: Statistics board - Can the ROC curve be used to compare variables?

AUC is good for discrimination rather than prediction.

g******2
Posts: 234

From thread: Statistics board - A question about modeling/prediction

What metric did you use to evaluate performance? AUC or mismatch %?
Are your data highly unbalanced, i.e. did most customers renew? Did the
renewal proportion change a lot in the most recent 2 months?

Z*******n
Posts: 694

From thread: Statistics board - A question about modeling/prediction

I use AUC as a performance metric.
Unfortunately I cannot disclose the renewal rate (because of business
confidentiality), but it is greater than 50% (i.e. more than half of the
contracts renewed) and not close to 100% (below 90%).
The renewal proportion fluctuates from month to month, but not greatly, and
I cannot see any clear trend or seasonality.
The last 2 months (of the 14 months) had a slightly lower renewal rate.

h*********n
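Posts: 278

From thread: Statistics board - Has anyone been criticized for a model being too good to be true?

The ROC AUC was 85%, and they said that was too high. The predicted and actual
curves also matched fairly well, and they said the model fit too well and
something must be wrong. I'm really confused. Earlier, people in my group
insisted that I fit as perfectly as possible, and after a transformation the
fit did become quite good. But when we presented to another group, they raised
these doubts, and now people in my group wonder whether we did something wrong
and want to check. It's baffling, and I'm furious.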

s*********h
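Posts: 6288

From thread: Statistics board - Has anyone been criticized for a model being too good to be true?

If one value of a class variable implies a large number of events, the AUC
will be high, but that doesn't mean the fit is good. Run HL with more groups
and look at the fit.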

s*******2
Posts: 499

From thread: Statistics board - Has anyone been criticized for a model being too good to be true?

I do not think the KS test makes a lot of sense here, because the sample size
is very large, so it is easy to get a significant p-value.
The spline may bring in a few variables. How many predictors are in your
model?
The cross-validated AUC needs to be evaluated. The MSE and cross-validated
MSE can be evaluated too.

h*********n
Posts: 278

From thread: Statistics board - Has anyone been criticized for a model being too good to be true?

Indeed, one class variable has a large impact. What does HL stand for? Thanks.

z******n
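Posts: 397

From thread: Statistics board - Has anyone been criticized for a model being too good to be true?

I find your assertion very strange. For particular problems an AUC of 99% is
possible, so how can you call it highly suspicious from the number alone? My
guess is that for the kind of modeling problem the original poster is working
on, industry values are typically low, say around 0.6. With such a big jump
and largely the same variables as usual, that is why the reviewer commented
this way. As for KS, I never use it, and it is not indispensable anyway.
Also, when someone asks you what KS is, just give a link. There is no need to
put on airs; it's pointless.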

h*********n
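Posts: 278

From thread: Statistics board - Has anyone been criticized for a model being too good to be true?

Thanks a lot. I later realized I remembered wrong: it is 81.5%, not 85%. It's
true that I only recently joined this industry/company and don't know what
their earlier models looked like, but this is a new model/data structure. I
asked here because I wanted to know whether there is some industry standard,
and whether everyone's first reaction to a model fit like this would be "too
good". I googled around, and several websites seem to say that by the usual
AUC guidelines only 90% or above counts as excellent.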

l*****t
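Posts: 8319

From thread: Statistics board - Has anyone been criticized for a model being too good to be true?

With an AUC of .81, I'd guess the KS is around .5 to .6... not high... I wouldn't call it too good.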

h*********n
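Posts: 278

From thread: Statistics board - Has anyone been criticized for a model being too good to be true?

One of my earlier posts has the KS from my run, and it is 0.25. That seems quite far from 0.5-0.6; what is going on?
Also, if KS can be computed from AUC, why look at both?
KS Two-Sample Test (Asymptotic):
KS 0.25
KSa 227.28
D 0.5
Pr > KSa: <.0001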
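
One possible reading of the output above, offered as an assumption because the post does not say which procedure produced it: the layout looks like SAS PROC NPAR1WAY, where, as I understand its documentation, D is the maximum gap between the two empirical CDFs (the number scorecard modelers usually call "KS"), the row labeled KS is a scaled version equal to D * sqrt(n1 * n2) / n, and KSa is KS * sqrt(n). Under that reading D = 0.5 would be the figure to compare with the 0.5-0.6 estimate, and KS = 0.25 would simply reflect two groups of equal size. The sketch below reproduces those three quantities on made-up data; the function name is hypothetical.

```python
import math


def ks_two_sample(sample_1, sample_2):
    """Return (D, KS, KSa) following the definitions assumed in the lead-in."""
    n1, n2 = len(sample_1), len(sample_2)
    n = n1 + n2
    d = 0.0
    for x in sorted(set(sample_1) | set(sample_2)):
        f1 = sum(v <= x for v in sample_1) / n1  # empirical CDF of group 1
        f2 = sum(v <= x for v in sample_2) / n2  # empirical CDF of group 2
        d = max(d, abs(f1 - f2))
    ks = d * math.sqrt(n1 * n2) / n   # scaled statistic (assumed SAS "KS")
    ksa = ks * math.sqrt(n)           # asymptotic statistic (assumed SAS "KSa")
    return d, ks, ksa


# Hypothetical scores for two equally sized groups.
group_1 = [1, 2, 3, 4]
group_2 = [3, 4, 5, 6]

print(ks_two_sample(group_1, group_2))
```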

f*******3
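Posts: 206

From thread: Statistics board - Has anyone been criticized for a model being too good to be true?

At work we don't use AUC or ROC, because for rare-class model predictions
they look surprisingly good. Our group now looks only at the precision-recall
curve.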

C******t
Posts: 72

From thread: Statistics board - approval rate report

You may use AUC or Somers' D to track changes in model performance.
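
For a binary outcome the two measures carry the same information. Writing $n_c$, $n_d$, and $n_t$ for the numbers of concordant, discordant, and tied (event, non-event) pairs (notation introduced here for illustration), the usual definitions give:

```latex
c = \frac{n_c + \tfrac{1}{2}\, n_t}{n_c + n_d + n_t},
\qquad
D = \frac{n_c - n_d}{n_c + n_d + n_t}
\quad\Longrightarrow\quad
D = 2c - 1 .
```

So an AUC (c) of 0.80 corresponds to a Somers' D of 0.60, and tracking either one over time shows the same trend.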

l******t
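Posts: 96

From thread: Statistics board - Interviewed for a machine-learning-related position at an IT company

Accuracy is meaningless in this situation, isn't it?
People evaluate with precision, recall, and AUC, right?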

m****v
Posts: 780

From thread: Statistics board - Interviewed for a machine-learning-related position at an IT company

For ranking, AUC is used a lot.

n*****3
Posts: 66

From thread: Statistics board - How to calculate Area Under the Effect Curve(AUEC)?

I suspect it is the same as Area Under the Curve (AUC), but I can't prove it.
Can someone here help explain what Area Under the Effect Curve is?

o****o
Posts: 8077

From thread: Statistics board - How to calculate Area Under the Effect Curve(AUEC)?

Whoever you heard this term from, you should ask that person directly: does it simply mean AUC?

M*P
Posts: 6456

From thread: Statistics board - AUC and accuracy aren't equivalent, right?

Just asking.

b*******t
Posts: 390

From thread: Statistics board - What to do when the AUC of the ROC curve in a logistic model is too low?

Why has nobody replied? Bumping my own thread!

b*******t
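Posts: 390

From thread: Statistics board - What to do when the AUC of the ROC curve in a logistic model is too low?

Thanks for your reply.
It's indeed possible that other influencing factors were not included.
Some variables have an odds ratio of 1.5, which is fairly high.
After making diagnostic plots, I now find the main cause may be that there are
quite a lot of outliers or influential values.
But this seems hard to adjust for, because the sample size is quite large.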

t*****a
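Posts: 459

From thread: Statistics board - sensitivity and specificity

1. You can use a method similar to comparing AUCs (the plot for the binary
case has another name that I forget). It is admittedly a bit of a stretch,
though. It mainly depends on whether the diagnostic goal puts more weight on
detecting more cases or on avoiding false positives, and it also relates to
the severity of the disease, among other things.
2. You can use a regression approach, which is covered in Margaret Sullivan
Pepe's book, The Statistical Evaluation of Medical Tests for Classification
and Prediction. It is a classic book on diagnostic research.
3. That would be McNemar; you can google it, and Pepe's book covers it too.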

w*******9
Posts: 1433

From thread: Statistics board - KS is only 28%

I haven't used KS. What KS counts as good? Roughly what range of KS corresponds to an AUC in (0.7, 0.9)?

b********1
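Posts: 291

From thread: Statistics board - Building a model: at the last step the classification table turns out to be unbalanced. How to fix this?

OK, I'll go back and take another look. When you build models, what AUC is generally needed to pass?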

E**********e
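Posts: 1736

From thread: Statistics board - Friends who build credit risk scorecards, please come in: I have questions

I have been doing risk modeling at a small private company for just over half
a year. For the first half year I thought I was doing quite well, but now I
increasingly find many things puzzling, so I'm putting them out here and
asking experienced people for guidance.
The company is a small, fairly new loan lending company, with fewer than 4,000
accumulated charge-off records, unlike the big banks whose data easily run to
one or two million. The modeling data are not very good. I won't air all the
dirty laundry; the main symptom is that the training and test AUCs differ a
lot, so there is heavy overfitting.
Now here is the problem. Suppose the data are split into three parts: part one
is training, part two is test, and part three is holdout. The holdout is like
future data and is used to test the final model's performance, so it can only
be brought out after modeling is finished. Before modeling you absolutely must
not peek at it, to prevent the data from "leaking" into the modeling process.
My main question is how to preselect the initial variables. My original
understanding was to use parts one and two to preselect about 100 variables.
Many modeling books talk about bivariate analysis, computing p-values,
Spearman correlation, some kind of clustering, and so on. ...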

n**********0
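Posts: 66

From thread: Statistics board - Friends who build credit risk scorecards, please come in: I have questions

I haven't worked in the original poster's area, but I have done some marketing
analysis. When selecting variables, I screen first and throw out the ones that
would be hard to explain even if selected. On our side we ultimately need to
make sense out of the model, so anything that can't be explained gets skipped.
After that it is a matter of opinion, and there are many methods. In general,
once the AUC stops improving much, I'd say don't go beyond 20 variables,
especially since your sample size isn't large. I also think the holdout could
be smaller so that you have more samples to work with. A separate test set
isn't strictly necessary either; you can do in-sample cross validation.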

E**********e
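Posts: 1736

From thread: Statistics board - Friends who build credit risk scorecards, please come in: I have questions

Yes, I'm already using regularization. The problem is that the variables that
enter change with every cross-validation run. The AUC improved a bit, but the
original variables were selected based on parts one and two, so there may
still be bias, and performance on new data may be poor.
The question now is how to preselect important variables in an unsupervised
way. Do modelers at big banks preselect variables based on p-values, IV, and
clustering, as those modeling books describe?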
