Page 8 - Roundup of discussions about clustering - 话题女王


z****n
Posts: 79

From thread: Computation board - What is the difference between cluster computing and grid computing?

A question:
What is the difference between cluster computing and grid computing?
Thanks.

w****a
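Posts: 155

From thread: Computation board - Our group needs to buy a cluster for scientific computing; please recommend a model.

Our group needs to buy a cluster for scientific computing. Could someone recommend a model?
Our research is mainly molecular simulation and mechanics computation. Thanks.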

s*******h
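Posts: 113

From thread: Computation board - Help: C code runs fine on my own machine but hits a segmentation fault on the cluster. What could be the cause?

As the title says. On my own machine I use Code::Blocks, and the code does not use any complicated functions. On the cluster it keeps hitting this error, and the error does not report a line number or anything else. How can I fix it?
I am a programming beginner from outside CS, so please go easy on me if the question is naive.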

s*******t
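Posts: 2896

From thread: Computation board - What to do when the university cluster is hard to use?

The few dozen jobs I submitted (each only a few hours long) are always stuck waiting.
How much does it cost to buy your own cluster? It does not need to be big; around 100 cores would do.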

b**********l
Posts: 431

From thread: Computation board - Which cluster performs better?

Cluster 1 is better.
Is your company still hiring for CFD computation? Could you give me an internal referral?

G*********a
Posts: 1080

From thread: Mathematics board - A cluster question

If I know the pairwise relations of a group of data points as a matrix, it looks like:
     a    b    c    d    e
a  0.0  0.2  0.1  0.8  0.4
b  0.2  0.0  0.9  0.6  0.4
c  0.1  0.9  0.0  0.7  0.1
d  0.8  0.6  0.7  0.0  0.5
e  0.4  0.4  0.1  0.5  0.0
How can I cluster these data according to their relations? Is that possible?
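
One possible way to approach the question above is agglomerative (hierarchical) clustering, which only needs the pairwise matrix. The sketch below is a minimal single-linkage version in plain Python. It assumes the entries are dissimilarities (the zero diagonal suggests this, but the post does not say so), and the function names `single_linkage` and `named` are made up for illustration.

```python
from itertools import combinations

labels = ["a", "b", "c", "d", "e"]
# Pairwise matrix from the post, read here as dissimilarities (assumption).
dist = [
    [0.0, 0.2, 0.1, 0.8, 0.4],
    [0.2, 0.0, 0.9, 0.6, 0.4],
    [0.1, 0.9, 0.0, 0.7, 0.1],
    [0.8, 0.6, 0.7, 0.0, 0.5],
    [0.4, 0.4, 0.1, 0.5, 0.0],
]


def single_linkage(dist, k):
    """Merge the two closest clusters repeatedly until only k remain."""
    clusters = [{i} for i in range(len(dist))]
    while len(clusters) > k:
        # Single linkage: cluster distance = smallest member-to-member distance.
        p, q = min(
            combinations(range(len(clusters)), 2),
            key=lambda pq: min(
                dist[i][j] for i in clusters[pq[0]] for j in clusters[pq[1]]
            ),
        )
        clusters[p] |= clusters[q]  # p < q, so deleting q keeps p valid
        del clusters[q]
    return clusters


def named(clusters):
    """Turn index sets into sorted lists of labels for stable printing."""
    return sorted(sorted(labels[i] for i in c) for c in clusters)


print(named(single_linkage(dist, 3)))
print(named(single_linkage(dist, 2)))
```

If the numbers are similarities instead, they would first need converting to dissimilarities (for example `1 - s`). Other linkage rules such as complete or average linkage only change the inner `min`.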

e******o
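Posts: 27

From thread: Mathematics board - Does anyone work on fuzzy clustering?

I have a few questions:
For categorical data with a large number of attributes, what is the best current method?
Thanks.
BTW, I want a large number of clusters too.
I am not sure which field this belongs to: math, CS, or EE?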

c****e
Posts: 2097

From thread: Mathematics board - cluster algebra

What does everyone think of the cluster algebras developed by the Russian school, especially Zelevinsky and others?

v***o
Posts: 1542

From thread: MedicalCareer board - Cancer Cluster

In a residential area of four square miles (total population about 50,000 to 60,000) there are 6 leukemia patients. Does that count as a cancer cluster?
Thanks!

a*e
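Posts: 431

From thread: Psychology board - Two questions about cluster analysis

I am not sure how to describe this precisely in mathematical language, so please bear with me.
I have a set of 50 objects that I want to analyze with cluster analysis.
I want to give this set to two different groups of participants, who will make proximity judgments.
Each group has 99 participants, so the result is two 50x50 proximity matrices.
The data look roughly like this; the diagonal entries are all 99, i.e. the number of participants:
Matrix A                         Matrix B
      A1  A2  A3  ...  A50             B1  B2  B3  ...  B50
A1    99  43  78                 B1    99  87  23
A2    43  99  56                 B2    87  99  43
A3    78  56  99                 B3    23  43  99
...                              ...
A50                              B50
For both matrices I plan to use increas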

L******g
Posts: 1371

From thread: Quant board - computer cluster for research

OK.
It looks like clusters have become commonplace. If even small funds have them, SAC and 2Sigma certainly do.

z*****a
Posts: 3

From thread: Science board - [Repost] Cluster with a N or C in a Fe6 cage?

[The following text is reposted from the Biology board]
[Original post by zyzzyva]
Does anyone know of a cluster with this kind of structure?
That is, 6 Fe atoms lie on the surface of a sphere, and
an N or C sits at the center of the sphere.
Thanks

m***b
Posts: 11

From thread: Statistics board - What can I use to draw this clustering plot? R?

It seems hard to set colored cluster leaves.

s**k
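Posts: 145

From thread: Statistics board - A question about cluster analysis

Suppose I estimated several clusters with the single linkage method and want to use a significance test to show that the estimate is correct. What SAS code should I use?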

g**********l
Posts: 214

From thread: Statistics board - SAS Proc Cluster question

A question:
Proc Cluster Method=EML
Can the EML method handle both numerical and categorical data?
If so, could someone show me how (or just show me a link)?
If not, does SAS have any other method that can handle it?
Thanks a lot!

r********n
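Posts: 61

From thread: Statistics board - How to do cluster analysis with an IBS matrix

Could anyone explain how to do cluster analysis with an IBS (pairwise genome-wide identity-by-state) matrix?
I know plink can do it, but plink gives no reference or code,
so I do not know how it is computed internally.
R code or SAS code would be even better.
Many thanks.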

x*******i
Posts: 10

From thread: Statistics board - k means clustering number

No, it cannot.
The problem is that for each K, the result has to be compared against randomly drawn uniform data of the same 5000x32 size in order to estimate the dispersion. The number of repetitions cannot be small for this NP procedure; I even reduced it to 30.
For a large dataset, given K, k-means itself (cluster library) is slow in R even for a single run.


o****o
Posts: 8077

From thread: Statistics board - k means clustering number

Someone posted sample, non-optimized R code:
http://www.stat.rutgers.edu/~rebecka/RCode/gappcalg.q
but you can use the kmeans function in R to get the within-cluster variation quickly.

B*********L
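Posts: 700

From thread: Statistics board - Data mining: what is the difference between clustering and association?

With my background, reading a data mining book is like reading gibberish. Could an expert briefly explain the difference between clustering and association? I have been reading the wiki for a long time without really understanding it.
Many thanks!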

b*****n
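Posts: 685

From thread: Statistics board - Data mining: what is the difference between clustering and association?

Clustering is close to classification, except that it is unsupervised, i.e. the true class information is unknown.
Association rules are basically a different problem, somewhat like regression in that they look for correlations between variables. The classic example is the bread-and-milk problem. You can look at the CS paper that first proposed it, written by someone whose name starts with Ag; I do not remember the full name.
I suggest you read a book, such as Jiawei Han's. These are basically CS books and not hard.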

B*********L
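Posts: 700

From thread: Statistics board - Data mining: what is the difference between clustering and association?

Many thanks. I will go find a book by Jiawei Han to read. With one sentence from an expert, I feel I am starting to get a sense of clustering.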

s*******e
Posts: 226

From thread: Statistics board - AR(1) and clustering by firms

I have panel data of firms over years.
How can I use Stata and SAS to fit models with AR(1) errors and
clustering by firms?

s*******e
Posts: 226

From thread: Statistics board - AR(1) and clustering by firms

My current program is
proc mixed method=reml empirical noclprint data=beta;
  class ID time;
  model Y = ID x1 x2 x3 / s;
  repeated / type=ar(1) subject=ID;
run;
but it does not account for clustering by firms.
Any suggestions?

c**d
Posts: 104

From thread: Statistics board - AR(1) and clustering by firms

I think the original poster first needs to think through exactly what is being modeled.
(1) For example, what is your panel? It should be firms, right?
(2) For example, is comparing the mean response over time by firms the question you care about?
(3) Your ID should be the individual subject, clustered by firms.
proc mixed data=beta;
  class id time firms;
  model y = time firms time*firms x1 x2 x3 / s;
  repeated / type=ar(1) subject=id(firms);
  random id(firms); /* random intercept model */
run;

c******a
Posts: 725

From thread: Statistics board - AR(1) and clustering by firms

You can remove (part of) the first-order serial correlation by first-differencing the data.
Then you apply the fixed effects model to the differenced data using
clustered standard errors.
To fully remove the serial correlation, you may apply Prais-Winsten estimation (using quasi-differenced data). See Wooldridge (2006), chapter 12 for details.
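
The two transformations mentioned above can be sketched for a single firm's series as follows. This is only an illustration of the arithmetic, not a full estimator: the value of `rho` is assumed to be given (in practice it is estimated), the series `y` is made up, and the function names are hypothetical. The same transformation would be applied to every regressor, firm by firm.

```python
import math


def first_difference(y):
    """y_t - y_{t-1}; the first observation is lost."""
    return [y[t] - y[t - 1] for t in range(1, len(y))]


def prais_winsten(y, rho):
    """Quasi-difference y_t - rho * y_{t-1}, keeping the first observation
    rescaled by sqrt(1 - rho^2), as Prais-Winsten does."""
    head = [math.sqrt(1.0 - rho ** 2) * y[0]]
    tail = [y[t] - rho * y[t - 1] for t in range(1, len(y))]
    return head + tail


# Made-up series for one firm, for illustration only.
y = [1.0, 1.5, 1.75, 1.875]
print(first_difference(y))
print(prais_winsten(y, rho=0.5))
```

First differencing corresponds to `rho = 1` without the rescaled first row, which is why it removes only part of the correlation when the true AR(1) coefficient is smaller than one.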

e***o
Posts: 180

From thread: Statistics board - Can GEE or a mixed model be used for clustered data?

That makes sense for Proc Mixed, thanks.
Can GEE be used for clustered data, say with Proc Genmod, if the
response is not continuous?

S****Y
Posts: 4634

From thread: Statistics board - How to estimate two-way cluster logistic regression? Baozi reward offered

How do I use SAS or Stata to estimate a logistic regression
with standard errors clustered in two dimensions?
Thanks!

n*****1
Posts: 172

From thread: Statistics board - How to estimate two-way cluster logistic regression? Baozi reward offered

I would like to know too. As far as I know, Stata probably has no ready-made command for two-way clustering with logit.

n*****1
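Posts: 172

From thread: Statistics board - How to estimate two-way cluster logistic regression? Baozi reward offered

Thanks! I will download it tomorrow and try it out. What I actually want more, though, is two-way clustering under multinomial logit...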

K****a
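Posts: 67

From thread: Statistics board - Help: data source for cluster randomized trial

I am rushing to finish a paper and cannot find data. The paper is about sample size estimation for cluster randomized trials. I googled for a long time and could not find anywhere to download data. Could some kind person help me find this type of data? Much appreciated.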

s*********s
Posts: 100

From thread: Statistics board - Looking for book recommendations on clustered data analysis

Could anyone recommend a few books on clustered data? Many thanks!

g*****0
Posts: 14

From thread: Statistics board - R question-run on cluster

I have an R script, my_script.R. It loads a large input file "data.txt" and
then outputs a large matrix "out".
Now I have 30 different input files, "data1.txt", "data2.txt", ... and
"data30.txt", and want to generate and save 30 output (matrix) files
separately. How can I achieve this by running the R script on a Linux
cluster?
I looked at R CMD BATCH, but I am not sure whether and how it works...
Thanks!

g*****0
Posts: 14

From thread: Statistics board - R question-run on cluster

Hi,
Thank you for your reply!
If the input file is just one file "data.txt" , the command of running it on
cluster using my.cmd is as follows:
---------------------------------------------------------------
my.cmd
-----------------------------------------------------------------
getenv = TRUE
Executable = /usr/bin/R
Arguments = --vanilla
Universe = vanilla
input = my_script.R
output = out
error = err
Log = log
Queue
--------------------------------------------------------------
Thanks!

B******5
Posts: 4676

From thread: Statistics board - R question-run on cluster

Isn't this Condor?
That said, every cluster runs differently, and your question is too general.


g*****0
Posts: 14

From thread: Statistics board - R question-run on cluster

Sorry for this question being too general...
The cluster hierarchy is as follows: there is a central server which I can
log into, and this central server assigns execution of jobs to other
subordinate servers. Jobs are submitted to them through the Condor batch queue
scheduler.
Thanks!
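
Since the jobs go through Condor, one possible approach is to generate a single submit file that queues one job per input file. The sketch below builds such a file as text in Python. Several things in it are assumptions rather than facts from the thread: that `Rscript` is installed at `/usr/bin/Rscript`, that `my_script.R` is changed to read its input and output file names from `commandArgs(trailingOnly = TRUE)`, that the nodes share a file system, and the names `out1.txt`, `stdout1.txt`, `err1.txt` and `my_batch.cmd`.

```python
def build_submit(n_jobs, script="my_script.R", rscript="/usr/bin/Rscript"):
    """Return the text of a Condor submit file with one job per input file.

    Assumes (hypothetically) that the R script takes two trailing
    arguments: the input file name and the output file name.
    """
    lines = [
        "universe   = vanilla",
        "getenv     = True",
        f"executable = {rscript}",
        "log        = log",
    ]
    for i in range(1, n_jobs + 1):
        # Each 'queue' statement submits one job with the settings above it.
        lines += [
            "",
            f"arguments = --vanilla {script} data{i}.txt out{i}.txt",
            f"output    = stdout{i}.txt",
            f"error     = err{i}.txt",
            "queue",
        ]
    return "\n".join(lines) + "\n"


# Show a two-job example; write build_submit(30) to a file such as
# my_batch.cmd (hypothetical name) and pass it to condor_submit.
print(build_submit(2))
```

Inside the R script, the matching change would be to replace the hard-coded "data.txt" with the first trailing argument and save the matrix to the second.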

A*******s
Posts: 3942

From thread: Statistics board - # of subjects/clusters in mixed model

Does anyone have experience fitting a mixed effects model with more than
500,000 subjects/clusters, two binormal random effects, and 20 fixed effects? I
am trying NLMIXED, but it has been running for over 5 hours. Would GLIMMIX be
better?
Since the model has a binary outcome, I cannot use HPMIXED. :( Any other
suggestions? Thanks in advance.

u*****8
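Posts: 180

From thread: Statistics board - Question: handling clustered data in a Cox PH model

I want to use a Cox proportional hazards model to analyze the hazard of drug poisoning caused by opioid drugs. Each patient's medication history is divided into multiple episodes (for example, one episode runs from January to March 2008 and another from September 2008 to May 2009). I treat each episode as an independent time-to-event, assuming each person has the same baseline risk in every episode.
Question: should I treat the several episodes belonging to the same person as clustered data, since they are correlated?
Thanks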

a****y
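Posts: 1035

From thread: Statistics board - Help: plotting in Excel, clustered bar chart

I want to draw a clustered bar chart in Excel, similar to the figure in the attachment.
The online tutorials I looked at all require the data in cross-table format, for example:
            blue_bar  red_bar
year=2004   n11       n12
year=2005   n21       n22
year=2006   n31       n32
year=2007   n41       n42
But my data look like this:
year  color  value
2004  blue   n11
2004  red    n12
2005  blue   n21
2005  red    ...
2006  blue   ...
2006  red    ...
2007  blue   ...
2007  red    ...
Do I have to rearrange the data? If so, how? (The dataset is large, so doing it by hand is not realistic...)
If it is possible without rearranging the data...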
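
The rearrangement asked about above is a long-to-wide pivot. In Excel itself a PivotTable (year as rows, color as columns, value as values) is the usual route; the sketch below only shows the same reshaping in plain Python, in case the data are prepared outside Excel. The numbers are made up, and `pivot` is a hypothetical helper name.

```python
from collections import defaultdict


def pivot(rows):
    """Reshape (year, color, value) rows into {year: {color: value}}."""
    wide = defaultdict(dict)
    for year, color, value in rows:
        wide[year][color] = value
    return wide


# Made-up values standing in for n11, n12, ... in the post.
rows = [
    (2004, "blue", 10), (2004, "red", 12),
    (2005, "blue", 21), (2005, "red", 22),
]

wide = pivot(rows)
colors = sorted({color for _, color, _ in rows})

# Print the cross table: one row per year, one column per color.
print("year", *[f"{c}_bar" for c in colors], sep="\t")
for year in sorted(wide):
    print(year, *[wide[year].get(c, "") for c in colors], sep="\t")
```

The tab-separated output can be pasted into a sheet and charted directly as a clustered bar chart.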

f**y
Posts: 8

From thread: Statistics board - How do I cluster categorical data?

Cluster analysis should be possible on categorical data. If the results are poor, you can try standardizing first.

a***u
Posts: 69

From thread: Statistics board - How do I cluster categorical data?

Clustering categorical data does not seem very meaningful. There are only a few categories, so is there any need to group them further?

A*******s
Posts: 3942

From thread: Statistics board - How do I cluster categorical data?

The original random forest algorithm has a clustering component.


h**t
Posts: 1678

From thread: Statistics board - Clustering algorithm for categorical data

Does anyone know a good algorithm for clustering categorical variables? Any R
packages? Which is the best?
If a dataset has both numeric and categorical variables, what is the best algorithm
and R package to use?
Thank you!

s*********h
Posts: 6288

From thread: Statistics board - Sample size for clustering analysis

Is two-step clustering available in R?


c***z
Posts: 6348

From thread: Statistics board - data clustering by vector correlation distance

http://www.statmethods.net/advstats/cluster.html

A*******s
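Posts: 3942

From thread: Statistics board - cluster effect in case control study

The advantage of a case-control design is that you can use stratified/conditional logistic regression.
The cluster/stratum effect becomes a nuisance parameter.
The sampling weights have no effect,
because they appear in both the numerator and the denominator of the conditional likelihood and cancel out.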


t********6
Posts: 43

From thread: Statistics board - cluster effect in case control study

Thanks to the two posters above. I guess I did not explain the problem clearly.
The overall population:
cluster1
      D+   D-
E+    a1   b1
E-    c1   d1
cluster2
      D+   D-
E+    a2   b2
E-    c2   d2
OR(true) = (a1+a2)(d1+d2) / [(b1+b2)(c1+c2)]
In my sample:
cluster1
      D+   D-
E+    a1   f*b1
E-    c1   f*d1
cluster2
      D+   D-
E+    a2   g*b2
E-    c2   g*d2
OR(sample) = (a1+a2)(f*d1+g*d2) / [(f*b1+g*b2)(c1+c2)]
OR(sample) is not equal to OR(true), because f is not equal to g;
this is how the bias arises.
f and g are known; they are the fractions of controls sampled within each cluster.
I used inverse probability weights of 1/f or 1/g for the control group.
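
The arithmetic in this post can be checked with a small numeric sketch. All counts and the sampling fractions below are made up for illustration; `pooled_or` is a hypothetical helper. The sampled control cells are taken at their expected values (exactly f or g times the population counts), so the example shows the bias and its correction without sampling noise.

```python
def pooled_or(tables, weights=None):
    """Pooled odds ratio (sum a)(sum d) / [(sum b)(sum c)] over clusters.

    Each table is a dict with cells a, b, c, d laid out as in the post
    (rows E+/E-, columns D+/D-). Weights multiply the control (D-) cells.
    """
    if weights is None:
        weights = [1.0] * len(tables)
    a = sum(t["a"] for t in tables)
    c = sum(t["c"] for t in tables)
    b = sum(w * t["b"] for t, w in zip(tables, weights))
    d = sum(w * t["d"] for t, w in zip(tables, weights))
    return (a * d) / (b * c)


# Made-up population counts for two clusters.
population = [
    {"a": 30, "b": 200, "c": 10, "d": 400},
    {"a": 20, "b": 100, "c": 40, "d": 300},
]
f, g = 0.5, 0.25  # assumed control sampling fractions per cluster

# Sample: all cases kept, controls kept at fraction f or g.
sample = [
    {**population[0], "b": f * 200, "d": f * 400},
    {**population[1], "b": g * 100, "d": g * 300},
]

or_true = pooled_or(population)
or_sample = pooled_or(sample)
or_weighted = pooled_or(sample, weights=[1 / f, 1 / g])
print(or_true, or_sample, or_weighted)
```

With f different from g the unweighted sample odds ratio drifts away from the population value, while weighting controls by 1/f and 1/g restores it.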

c***z
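Posts: 6348

From thread: Statistics board - A question about clustering analysis (repost)

[The following text is reposted from the DataSciences board]
From: chaoz (没钱也任性), Board: DataSciences
Subject: A question about clustering analysis
Posted at: BBS 未名空间站 (Tue Jan 6 12:49:25 2015, US Eastern)
If the features are on different scales, and each feature's own data are also highly skewed, does anyone have a good approach? Take log and normalize?
I am not an expert in this area. I did a bit of research but have no clue.
Thanks a lot!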
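
The "take log and normalize" idea from the post can be sketched per feature as below. This is one common option, not a recommendation from the thread. It assumes every value is non-negative (so `log1p` is defined) and that a feature is not constant (otherwise the standard deviation is zero); the sample column and the name `log_standardize` are made up.

```python
import math
import statistics


def log_standardize(column):
    """Apply log(1 + x) to reduce right skew, then z-score the result."""
    logged = [math.log1p(x) for x in column]
    mean = statistics.fmean(logged)
    sd = statistics.pstdev(logged)  # zero for a constant column
    return [(v - mean) / sd for v in logged]


# Made-up right-skewed feature, for illustration only.
feature = [1, 2, 2, 3, 5, 8, 40, 300]
scaled = log_standardize(feature)
print([round(v, 3) for v in scaled])
```

After this step each feature has mean 0 and unit variance on the log scale, so no single feature dominates a Euclidean distance purely because of its units.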

h******3
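Posts: 190

From thread: Statistics board - Can GEE be used with an unbalanced design/different cluster sizes?

Suppose the people within a household are correlated, and the number of people differs across households, giving different cluster sizes. I tried geeglm() and it did not work.
How should GEE be applied in this situation?
A random effects model would be easy, but I need to use GEE. Going crazy here.
Thanks for any help!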

h******3
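Posts: 190

From thread: Statistics board - Can GEE be used with an unbalanced design/different cluster sizes?

Sorry, I found that GEE can in fact be used with unbalanced clusters. The error message I got earlier from geeglm was because the covariates contained missing data.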

b**********e
Posts: 672

From thread: Statistics board - customer cluster

Besides the usual cluster analysis for grouping customers, are there other methods for grouping customers by profile characteristics?
Thanks
