w*****g Posts: 16352 | 1 Seen on GitHub. Learned something. The PornHub community is still the more harmonious one.
So given recent events there has been a lot of discussion about the Iris
data, which is a standard for demonstrating classification and other
statistical techniques. The problem with it is that it was first published by
Ronald Fisher (a Eugenicist) in the Annals of Eugenics and I don't think I
need to point out why that history is extremely problematic for a dataset
that has been used for classification problems. It's also incredibly
overused and many people will roll their eyes just seeing it.
Thankfully there is a new dataset a lot of people seem to be coalescing
around, specifically this open source dataset about penguins:
https://github.com/allisonhorst/palmerpenguins, which has several advantages:
It wasn't popularized by a eugenicist
It has more interesting dimensions to explore
It's about penguins
I therefore propose we replace the Iris data in sampledata and all the
associated examples with equivalent examples using the penguin data.
Libraries like seaborn have already taken this step and I think we should
too.
★ Sent from iPhone App: ChinaWeb 1.1.5
H********g Posts: 43926 | 2 Eugenics is a reactionary bourgeois science
【Quoted from w*****g's post above】