Statistics board - Retire statistical significance (Retire SS)
T*******I
Posts: 5138
1
Retire statistical significance
(Scientists rise up against statistical significance)
(Chinese translation: Google Translate / Ligong Chen 陈立功)
Valentin Amrhein, Sander Greenland, Blake McShane and more than 800
signatories call for an end to hyped claims and the dismissal of possibly
crucial effects.
Nature, 20 March 2019: Nature 567, 305–307 (2019).
When was the last time you heard a seminar speaker claim there was
‘no difference’ between two groups because the difference was
‘statistically non-significant’?
If your experience matches ours, there’s a good chance that this
happened at the last talk you attended. We hope that at least someone in the
audience was perplexed if, as frequently happens, a plot or table showed
that there actually was a difference.
How do statistics so often lead scientists to deny differences that
those not educated in statistics can plainly see? For several generations,
researchers have been warned that a statistically non-significant result
does not ‘prove’ the null hypothesis (the hypothesis that there is no
difference between groups or no effect of a treatment on some measured
outcome)1. Nor do statistically significant results ‘prove’ some other
hypothesis. Such misconceptions have famously warped the literature with
overstated claims and, less famously, led to claims of conflicts between
studies where none exists.
We have some proposals to keep scientists from falling prey to these
misconceptions.
Pervasive problem
Let’s be clear about what must stop: we should never conclude there
is ‘no difference’ or ‘no association’ just because a P value is larger
than a threshold such as 0.05 or, equivalently, because a confidence
interval includes zero. Neither should we conclude that two studies conflict
because one had a statistically significant result and the other did not.
These errors waste research efforts and misinform policy decisions.
For example, consider a series of analyses of unintended effects of
anti-inflammatory drugs2. Because their results were statistically non-significant,
one set of researchers concluded that exposure to the drugs was
“not associated” with new-onset atrial fibrillation (the most common
disturbance to heart rhythm) and that the results stood in contrast to those
from an earlier study with a statistically significant outcome.
Now, let’s look at the actual data. The researchers describing their
statistically non-significant results found a risk ratio of 1.2 (that is, a
20% greater risk in exposed patients relative to unexposed ones). They also
found a 95% confidence interval that spanned everything from a trifling risk
decrease of 3% to a considerable risk increase of 48% (P = 0.091; our
calculation). The researchers from the earlier, statistically
significant, study found the exact same risk ratio of 1.2. That study was
simply more precise, with an interval spanning from 9% to 33% greater risk
(P = 0.0003; our calculation).
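The article does not say how its P values were obtained (“our calculation”). One common approximation, assumed here, treats the log risk ratio as normally distributed: the standard error is backed out from the width of the reported 95% interval on the log scale, and the two-sided P value for a risk ratio of 1 then follows from the normal distribution. The function name below is illustrative. Applied to the two studies, each with a risk ratio of 1.2, it lands close to the quoted values of 0.091 and 0.0003:

```python
import math
from statistics import NormalDist

def p_from_ratio_ci(estimate, lower, upper, level=0.95):
    """Approximate two-sided P value for the null ratio 1.0, recovered from a
    reported ratio and its interval. Assumes log(ratio) is roughly normal."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)            # 1.96 for 95%
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)   # SE of log ratio
    z = math.log(estimate) / se                               # distance from log(1) = 0
    return math.erfc(abs(z) / math.sqrt(2))                   # two-sided normal tail

# Statistically "non-significant" study: RR 1.2, 95% interval 0.97 to 1.48
p_later = p_from_ratio_ci(1.2, 0.97, 1.48)
# Earlier, statistically "significant" study: RR 1.2, 95% interval 1.09 to 1.33
p_earlier = p_from_ratio_ci(1.2, 1.09, 1.33)

print(f"non-significant study: P = {p_later:.2g}")
print(f"earlier study:         P = {p_earlier:.2g}")
```

The two studies differ only in precision (interval width), not in the estimated effect.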
It is ludicrous to conclude that the statistically non-significant
results showed “no association”, when the interval estimate included
serious risk increases; it is equally absurd to claim these results were in
contrast with the earlier results showing an identical observed effect. Yet
these common practices show how reliance on thresholds of statistical
significance can mislead us (see ‘Beware false conclusions’).
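Whether two studies “conflict” is a question about the difference between their estimates, which can be estimated directly. The sketch below assumes the two studies are independent and uses a normal approximation on the log scale (our assumption; the article does not describe a method). It computes the ratio of the two risk ratios, its 95% interval and a two-sided P value; the function names are illustrative:

```python
import math
from statistics import NormalDist

def log_se_from_ci(lower, upper, level=0.95):
    """Standard error of a log ratio implied by its interval (normal approximation)."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    return (math.log(upper) - math.log(lower)) / (2 * z_crit)

def compare_ratios(rr1, ci1, rr2, ci2, level=0.95):
    """Ratio of two independent risk ratios, its interval and two-sided P value."""
    se1 = log_se_from_ci(*ci1, level)
    se2 = log_se_from_ci(*ci2, level)
    diff = math.log(rr1) - math.log(rr2)        # log of the ratio of ratios
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)    # independent studies: variances add
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    interval = (math.exp(diff - z_crit * se_diff), math.exp(diff + z_crit * se_diff))
    p = math.erfc(abs(diff / se_diff) / math.sqrt(2))
    return math.exp(diff), interval, p

ratio, (lo, hi), p = compare_ratios(1.2, (0.97, 1.48), 1.2, (1.09, 1.33))
print(f"ratio of risk ratios = {ratio:.2f} (95% interval {lo:.2f} to {hi:.2f}), P = {p:.2g}")
```

With identical point estimates the ratio is exactly 1 and the P value for a difference between the studies is 1, so the data give no sign of conflict; the interval shows how large a between-study difference would still be compatible with the data.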
These and similar errors are widespread. Surveys of hundreds of
articles have found that statistically non-significant results are
interpreted as indicating ‘no difference’ or ‘no effect’ in around half
(see ‘Wrong interpretations’ and Supplementary Information).
In 2016, the American Statistical Association released a statement in
The American Statistician warning against the misuse of statistical
significance and P values. The issue also included many commentaries on the
subject. This month, a special issue in the same journal attempts to push
these reforms further. It presents more than 40 papers on ‘Statistical
inference in the 21st century: a world beyond P < 0.05’. The editors
introduce the collection with the caution “don’t say ‘statistically
significant’”3. Another article4 with dozens of signatories also calls on
authors and journal editors to disavow those terms.
We agree, and call for the entire concept of statistical significance
to be abandoned.
We are far from alone. When we invited others to read a draft of this
comment and sign their names if they concurred with our message, 250 did so
within the first 24 hours. A week later, we had more than 800 signatories —
all checked for an academic affiliation or other indication of present or
past work in a field that depends on statistical modelling (see the list and
final count of signatories in the Supplementary Information). These include
statisticians, clinical and medical researchers, biologists and
psychologists from more than 50 countries and across all continents except
Antarctica. One advocate called it a “surgical strike against thoughtless
testing of statistical significance” and “an opportunity to register your
voice in favour of better scientific practices”.
We are not calling for a ban on P values. Nor are we saying they
cannot be used as a decision criterion in certain specialized applications
(such as determining whether a manufacturing process meets some
quality-control standard). And we are also not advocating for an anything-goes
situation, in which weak evidence suddenly becomes credible. Rather, and in
line with many others over the decades, we are calling for a stop to the use
of P values in the conventional, dichotomous way — to decide whether a
result refutes or supports a scientific hypothesis5.
Quit categorizing
The trouble is human and cognitive more than it is statistical:
bucketing results into ‘statistically significant’ and ‘statistically
non-significant’ makes people think that the items assigned in that way are
categorically different6–8. The same problems are likely to arise under any
proposed statistical alternative that involves dichotomization, whether
frequentist, Bayesian or otherwise.
Unfortunately, the false belief that crossing the threshold of statistical
significance is enough to show that a result is ‘real’ has led scientists
and journal editors to privilege such results, thereby distorting the
literature. Statistically significant estimates are biased upwards in
magnitude and potentially to a large degree, whereas statistically non-significant
estimates are biased downwards in magnitude. Consequently, any
discussion that focuses on estimates chosen for their significance will be
biased. On top of this, the rigid focus on statistical significance
encourages researchers to choose data and methods that yield statistical
significance for some desired (or simply publishable) result, or that yield
statistical non-significance for an undesired result, such as potential side
effects of drugs — thereby invalidating conclusions.
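The upward bias of statistically significant estimates can be seen in a short simulation. The sketch below assumes one true effect (0.2), unbiased normally distributed study estimates with a known standard error (0.1) and a two-sided 0.05 threshold; all of these numbers are illustrative choices, not values from the article:

```python
import random
from statistics import fmean

def significance_filter(true_effect=0.2, se=0.1, n_studies=20_000, seed=1):
    """Simulate many unbiased studies of one true effect and split their
    estimates by whether they crossed the two-sided 0.05 threshold."""
    rng = random.Random(seed)
    significant, non_significant = [], []
    for _ in range(n_studies):
        estimate = rng.gauss(true_effect, se)   # unbiased estimate, known SE
        bucket = significant if abs(estimate / se) > 1.96 else non_significant
        bucket.append(abs(estimate))            # record the magnitude
    return fmean(significant), fmean(non_significant)

sig_mean, nonsig_mean = significance_filter()
print("true effect: 0.200")
print(f"mean magnitude, 'significant' estimates:     {sig_mean:.3f}")
print(f"mean magnitude, 'non-significant' estimates: {nonsig_mean:.3f}")
```

Every individual estimate is unbiased; the distortion appears only when estimates are grouped, discussed or published according to whether they crossed the threshold.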
The pre-registration of studies and a commitment to publish all
results of all analyses can do much to mitigate these issues. However, even
results from pre-registered studies can be biased by decisions invariably
left open in the analysis plan9. This occurs even with the best of
intentions.
Again, we are not advocating a ban on P values, confidence intervals
or other statistical measures — only that we should not treat them
categorically. This includes dichotomization as statistically significant or
not, as well as categorization based on other statistical measures such as
Bayes factors.
One reason to avoid such ‘dichotomania’ is that all statistics,
including P values and confidence intervals, naturally vary from study to
study, and often do so to a surprising degree. In fact, random variation
alone can easily lead to large disparities in P values, far beyond falling
just to either side of the 0.05 threshold. For example, even if researchers
could conduct two perfect replication studies of some genuine effect, each
with 80% power (chance) of achieving P < 0.05, it would not be
very surprising for one to obtain P < 0.01 and the other P > 0.30.
Whether a P value is small or large, caution is warranted.
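The replication example can be checked with a little arithmetic. Assuming, for simplicity, a two-sided z-test with known variance (the article does not specify the test), 80% power at the 0.05 level means the z statistic is centred near 1.96 + 0.84 ≈ 2.80. The Python sketch below computes how often one such study gives P < 0.01 or P > 0.30, and the chance that two independent replications split that way:

```python
from statistics import NormalDist

N = NormalDist()

def prob_p_below(alpha, shift):
    """Chance that a two-sided P value falls below alpha when z ~ N(shift, 1)."""
    c = N.inv_cdf(1 - alpha / 2)                 # critical |z| for this alpha
    return N.cdf(shift - c) + N.cdf(-shift - c)  # upper tail + lower tail

def prob_p_above(alpha, shift):
    """Chance that a two-sided P value exceeds alpha when z ~ N(shift, 1)."""
    return 1 - prob_p_below(alpha, shift)

# Mean z statistic giving (approximately) 80% power at two-sided alpha = 0.05
shift = N.inv_cdf(0.975) + N.inv_cdf(0.80)

power = prob_p_below(0.05, shift)
p_small = prob_p_below(0.01, shift)
p_large = prob_p_above(0.30, shift)
# Two independent replications, one with P < 0.01 and the other with P > 0.30,
# in either order (the two outcomes are mutually exclusive, hence the factor 2)
both = 2 * p_small * p_large

print(f"power at 0.05:  {power:.3f}")
print(f"P(p < 0.01):    {p_small:.3f}")
print(f"P(p > 0.30):    {p_large:.3f}")
print(f"split observed: {both:.3f}")
```

Under these assumptions the split happens in roughly 4 to 5% of pairs, even though both studies estimate the same genuine effect with the same design.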
We must learn to embrace uncertainty. One practical way to do so is to
rename confidence intervals as ‘compatibility intervals’ and interpret
them in a way that avoids overconfidence. Specifically, we recommend that
authors describe the practical implications of all values inside the
interval, especially the observed effect (or point estimate) and the limits.
In doing so, they should remember that all the values between the interval’s
limits are reasonably compatible with the data, given the statistical
assumptions used to compute the interval7,10. Therefore, singling out one
particular value (such as the null value) in the interval as ‘shown’ makes
no sense.
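For ratio measures, describing the point estimate and both limits in practical terms is easy to automate. The helper below is a hypothetical sketch (its name and wording are ours, not the article's) that turns a risk ratio and its limits into percent changes, shown here with the anti-inflammatory drug figures:

```python
def describe_ratio(estimate, lower, upper):
    """Describe a risk ratio and its interval limits as percent changes in risk."""
    def pct(ratio):
        change = (ratio - 1) * 100
        direction = "increase" if change >= 0 else "decrease"
        return f"a {abs(change):.0f}% {direction}"
    return (f"point estimate: {pct(estimate)} in risk; "
            f"values from {pct(lower)} to {pct(upper)} are also reasonably "
            f"compatible with the data, given the model assumptions")

text = describe_ratio(1.2, 0.97, 1.48)
print(text)
```

Writing out both limits this way keeps the reader's attention on the whole range of effects the data allow, rather than on whether the null value happens to fall inside it.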
We’re frankly sick of seeing such nonsensical ‘proofs of the null’
and claims of non-association in presentations, research articles, reviews
and instructional materials. An interval that contains the null value will
often also contain non-null values of high practical importance. That said,
if you deem all of the values inside the interval to be practically
unimportant, you might then be able to say something like ‘our results are
most compatible with no important effect’.
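Saying that results are “most compatible with no important effect” requires first deciding, on subject-matter grounds, which effects are practically unimportant, and then checking that the whole interval falls inside that range. In the Python sketch below, the margin of 0.90 to 1.10 and the second interval are hypothetical, chosen only for illustration:

```python
def all_values_unimportant(lower, upper, margin_low, margin_high):
    """True if every value in [lower, upper] lies inside a pre-specified
    range of practically unimportant effects [margin_low, margin_high]."""
    return margin_low <= lower and upper <= margin_high

# Hypothetical margin: risk ratios between 0.90 and 1.10 treated as unimportant
MARGIN = (0.90, 1.10)

examples = {
    "anti-inflammatory study": (0.97, 1.48),
    "hypothetical precise study": (0.95, 1.05),
}
for label, (lo, hi) in examples.items():
    if all_values_unimportant(lo, hi, *MARGIN):
        verdict = "most compatible with no important effect"
    else:
        verdict = "includes practically important values"
    print(f"{label}: interval {lo} to {hi} is {verdict}")
```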
When talking about compatibility intervals, bear in mind four things.
First, just because the interval gives the values most compatible with the
data, given the assumptions, it doesn’t mean values outside it are
incompatible; they are just less compatible. In fact, values just outside
the interval do not differ substantively from those just inside the interval.
It is thus wrong to claim that an interval shows all possible values.
Second, not all values inside are equally compatible with the data, given
the assumptions. The point estimate is the most compatible, and values near
it are more compatible than those near the limits. This is why we urge
authors to discuss the point estimate, even when they have a large P value
or a wide interval, as well as discussing the limits of that interval. For
example, the authors above could have written: ‘Like a previous study, our
results suggest a 20% increase in risk of new-onset atrial fibrillation in
patients given the anti-inflammatory drugs. Nonetheless, a risk difference
ranging from a 3% decrease, a small negative association, to a 48% increase,
a substantial positive association, is also reasonably compatible with our
data, given our assumptions.’ Interpreting the point estimate, while
acknowledging its uncertainty, will keep you from making false declarations
of ‘no difference’, and from making overconfident claims.
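The graded compatibility described in the first two points can be made visible by computing a P value for every hypothesized risk ratio, not just for the null value of 1 (a curve sometimes called a P-value function or compatibility curve). A minimal sketch, assuming a normal approximation on the log scale and reusing the anti-inflammatory drug figures; the function name is illustrative:

```python
import math
from statistics import NormalDist

def compatibility_p(hypothesized, estimate, se_log):
    """Two-sided P value for the hypothesis 'true ratio = hypothesized', read as
    a measure of how compatible that value is with the data under the model."""
    z = (math.log(estimate) - math.log(hypothesized)) / se_log
    return math.erfc(abs(z) / math.sqrt(2))

# Risk ratio 1.2 with 95% interval 0.97 to 1.48; SE of log(RR) backed out from it
z95 = NormalDist().inv_cdf(0.975)
se_log = (math.log(1.48) - math.log(0.97)) / (2 * z95)

for rr in (0.90, 0.96, 0.97, 1.00, 1.10, 1.20, 1.30, 1.48, 1.50, 1.60):
    print(f"RR = {rr:.2f}: P = {compatibility_p(rr, 1.2, se_log):.3f}")
```

The P value is 1 at the point estimate and falls to about 0.05 at the reported 95% limits (not exactly 0.05, because 1.2 is a rounded estimate that is not precisely the geometric midpoint of 0.97 and 1.48). Values just outside the limits, such as 0.96 or 1.50, get P values only slightly below those at the limits: less compatible, not incompatible.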
Third, like the 0.05 threshold from which it came, the default 95%
used to compute intervals is itself an arbitrary convention. It is based on
the false idea that there is a 95% chance that the computed interval itself
contains the true value, coupled with the vague feeling that this is a basis
for a confident decision. A different level can be justified, depending on
the application. And, as in the anti-inflammatory-drugs example, interval
estimates can perpetuate the problems of statistical significance when the
dichotomization they impose is treated as a scientific standard.
[Translator's notes on the paragraph above: on “an arbitrary convention”, the translator comments “not arbitrary, but chosen so that the results are sufficiently adequate”; on “the false idea”, the translator comments “more precisely, an operational idea”.]
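To see how much hinges on the 95% convention, the sketch below recomputes the interval for the anti-inflammatory drug example at three levels. It assumes a normal approximation for the log risk ratio, with the standard error backed out from the reported 95% interval; the function name is illustrative:

```python
import math
from statistics import NormalDist

def ratio_interval(estimate, se_log, level):
    """Wald-type interval for a ratio, computed on the log scale at the given level."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    centre = math.log(estimate)
    return math.exp(centre - z * se_log), math.exp(centre + z * se_log)

# SE of log(RR) implied by the reported 95% interval 0.97 to 1.48
se_log = (math.log(1.48) - math.log(0.97)) / (2 * NormalDist().inv_cdf(0.975))

intervals = {level: ratio_interval(1.2, se_log, level) for level in (0.90, 0.95, 0.99)}
for level, (lo, hi) in intervals.items():
    print(f"{level:.0%} interval: {lo:.3f} to {hi:.3f}")
```

Because the P value for a risk ratio of 1 is about 0.09, the 90% interval excludes 1 while the 95% interval includes it: the data are the same, and only the convention changes.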
Last, and most important of all, be humble: compatibility assessments
hinge on the correctness of the statistical assumptions used to compute the
interval. In practice, these assumptions are at best subject to considerable
uncertainty7,8,10. Make these assumptions as clear as possible and test the
ones you can, for example by plotting your data and by fitting alternative
models, and then reporting all results.
Whatever the statistics show, it is fine to suggest reasons for your
results, but discuss a range of potential explanations, not just favoured
ones. Inferences should be scientific, and that goes far beyond the merely
statistical. Factors such as background evidence, study design, data quality
and understanding of underlying mechanisms are often more important than
statistical measures such as P values or intervals.
The objection we hear most against retiring statistical significance
is that it is needed to make yes-or-no decisions. But for the choices often
required in regulatory, policy and business environments, decisions based on
the costs, benefits and likelihoods of all potential consequences always
beat those made based solely on statistical significance. Moreover, for
decisions about whether to pursue a research idea further, there is no
simple connection between a P value and the probable results of subsequent
studies.
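A toy version of a decision based on costs, benefits and likelihoods: each action is scored by its probability-weighted payoff across the possible outcomes, and the best-scoring action is chosen. Every number in this Python sketch is hypothetical and not from the article; in practice the probabilities would be informed by the full estimate and its uncertainty, and the payoffs by the real costs and benefits of each outcome:

```python
def expected_value(action, scenarios):
    """Probability-weighted net payoff of one action across all scenarios."""
    return sum(prob * payoffs[action] for prob, payoffs in scenarios)

# Hypothetical scenarios: (probability, net payoff of each action in that scenario)
scenarios = [
    (0.6, {"approve": 100, "reject": 0}),    # treatment works roughly as estimated
    (0.3, {"approve": 20, "reject": 0}),     # benefit turns out to be small
    (0.1, {"approve": -300, "reject": 0}),   # serious harm
]

actions = ("approve", "reject")
values = {a: expected_value(a, scenarios) for a in actions}
best = max(values, key=values.get)
print(values, "->", best)
```

No significance test appears anywhere in the rule; changing the size of the potential harm or its probability can reverse the decision even when the estimated benefit is unchanged.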
What will retiring statistical significance look like? We hope that
methods sections and data tabulation will be more detailed and nuanced.
Authors will emphasize their estimates and the uncertainty in them — for
example, by explicitly discussing the lower and upper limits of their
intervals. They will not rely on significance tests. When P values are
reported, they will be given with sensible precision (for example, P = 0.021
or P = 0.13) — without adornments such as stars or
letters to denote statistical significance and not as binary inequalities
(P < 0.05 or P > 0.05). Decisions to interpret or to publish
results will not be based on statistical thresholds. People will spend less
time with statistical software, and more time thinking.
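Reporting P values as numbers rather than as stars or inequalities can be enforced with a small formatting helper. The two-significant-figure rule below is our assumption, chosen because it reproduces the article's examples P = 0.021 and P = 0.13; under Python's `g` format, values below 0.0001 fall back to scientific notation:

```python
def format_p(p, sig_figs=2):
    """Report a P value as an exact number with a fixed number of significant
    figures: no stars, no letters, no 'P < 0.05' style inequalities."""
    if not 0 <= p <= 1:
        raise ValueError("a P value must lie in [0, 1]")
    return f"P = {p:.{sig_figs}g}"

for p in (0.0213, 0.1312, 0.0907, 0.00031):
    print(format_p(p))
```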
Our call to retire statistical significance and to use confidence
intervals as compatibility intervals is not a panacea. Although it will
eliminate many bad practices, it could well introduce new ones. Thus,
monitoring the literature for statistical abuses should be an ongoing
priority for the scientific community. But eradicating categorization will
help to halt overconfident claims, unwarranted declarations of ‘no
difference’ and absurd statements about ‘replication failure’ when the
results from the original and replication studies are highly compatible. The
misuse of statistical significance has done much harm to the scientific
community and those who rely on scientific advice. P values, intervals and
other statistical measures all have their place, but it’s time for
statistical significance to go.
References
1. Fisher, R. A. Nature 136, 474 (1935).
2. Schmidt, M. & Rothman, K. J. Int. J. Cardiol. 177, 1089–1090 (2014).
3. Wasserstein, R. L., Schirm, A. & Lazar, N. A. Am. Stat. https://doi.org/10.1080/00031305.2019.1583913 (2019).
4. Hurlbert, S. H., Levine, R. A. & Utts, J. Am. Stat. https://doi.org/10.1080/00031305.2018.1543616 (2019).
5. Lehmann, E. L. Testing Statistical Hypotheses 2nd edn 70–71 (Springer, 1986).
6. Gigerenzer, G. Adv. Meth. Pract. Psychol. Sci. 1, 198–218 (2018).
7. Greenland, S. Am. J. Epidemiol. 186, 639–645 (2017).
8. McShane, B. B., Gal, D., Gelman, A., Robert, C. & Tackett, J. L. Am. Stat. https://doi.org/10.1080/00031305.2018.1527253 (2019).
9. Gelman, A. & Loken, E. Am. Sci. 102, 460–465 (2014).
10. Amrhein, V., Trafimow, D. & Greenland, S. Am. Stat. https://doi.org/10.1080/00031305.2018.1543137 (2019).
T*******I
Posts: 5138
2
Using statistics to process data isn't really about proving anything, is it? It's just an enjoyable cognitive exercise.
T*******I
Posts: 5138
3
A letter to Blakeley McShane, co-author of “RSS”:
I have spent all day today reading your other paper, “Abandon Statistical
Significance”. Because of my linguistic background, it is a little hard for
me to follow all of your ideas, and I have had to rely on Google Translate.
I think statistical significance should not be a problem within the logical
system of statistics itself. The problem may be caused by a kind of
misunderstanding.
Let’s take the t-test as an example. We have two sample means, x_bar1 and
x_bar2, and can easily find the difference between them, x_bar1 minus x_bar2.
From a classical mathematical point of view, this difference is exactly true.
But from a statistical point of view, this difference is composed of two
parts; that is, it has two different sources: one is systematic error and the
other is random error. The t-test constructs the t statistic to measure the
probabilistic magnitude of the random error within the total difference.
Therefore, we have to dichotomize in order to make a judgement. I think this
is the so-called significance that we find with the t-test.
So we can say the difference is significant if the probability of random
error happening is below the 0.05 threshold; otherwise it is not significant.
Of course, the 0.05 threshold is arbitrary, but it seems that we have no
other way to do this.
However, the true magnitudes of the systematic error and the random error in
the total difference are unknown, and we have no way to know them. The t
statistic just provides a statistical way to estimate them on a probability
scale. In fact, the t statistic itself is also a measurement scale; once it
is converted into a probability, we have the probability scale. That is why
we can obtain a p-value from a t-value.
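The construction described in this letter, a difference between two sample means divided by its standard error and then converted to a P value through the t distribution, can be sketched with only the Python standard library. This sketch uses the pooled-variance (Student) t-test and integrates the t density numerically with Simpson's rule; the data and function names are hypothetical, and in practice `scipy.stats.ttest_ind` performs the same test:

```python
import math
from statistics import mean, variance

def student_t(sample1, sample2):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    n1, n2 = len(sample1), len(sample2)
    diff = mean(sample1) - mean(sample2)                  # observed total difference
    pooled = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))            # standard error of the difference
    return diff / se, n1 + n2 - 2

def t_density(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|) for T ~ t(df), integrating the density over [0, |t|]."""
    a = abs(t)
    h = a / steps                                         # steps must be even for Simpson
    total = t_density(0, df) + t_density(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    central = 2 * total * h / 3                           # area between -|t| and |t|
    return 1 - central

group_a = [5.1, 6.0, 6.9, 5.5, 6.4]    # hypothetical measurements
group_b = [4.2, 5.0, 4.8, 5.6, 4.4]
t, df = student_t(group_a, group_b)
print(f"t = {t:.3f} on {df} df, two-sided P = {two_sided_p(t, df):.2g}")
```

The P value here is computed under the null hypothesis that the two population means are equal.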
So, in my opinion, it is not that we suffer from “dichotomania”. We have to
use the dichotomy because the difference we are testing has only two
sources. On the contrary, if we do not use the dichotomy, we fall into a kind
of ignorance.
I can see that in your paper you often describe the null hypothesis as zero
effect or zero systematic error. I would suggest replacing “zero systematic
error” with “random error is large enough in the total difference”. This
might be better for doing significance tests and, consequently, for
explaining the results and eliminating some kinds of misunderstanding.
Best regards!
Yours sincerely,
Ligong Chen, MD/MPH
d********m
Posts: 3662
4
This is a big deal now.