Page 7 - Roundup of discussions on clustering - 话题女王


d*******r
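Posts: 3299

From thread: Programming board - Looking for a Redis-like DB with a cluster mode

I took a look: the vert.x message bus seems too simple and only offers basic features.
If we want to push lots of messages into a distributed in-memory DB and also support fast, custom
queries, reads, and writes, something like "Redis Cluster" looks like the best fit, but it is still stuck in a long, difficult "in development" stage.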

q*c
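Posts: 9453

From thread: Programming board - Looking for a Redis-like DB with a cluster mode

It's precisely because people want this to be nice and that to be nice that clustering never gets done.
Transactions are not easy.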

z****e
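Posts: 54598

From thread: Programming board - Looking for a Redis-like DB with a cluster mode

On second thought, c* (Cassandra) is not a good fit; go with Storm instead.
Or else an in-memory database; Redis looks pretty good.
Writing a bit of clustering functionality yourself isn't that hard either.
In any case, you are only querying; you don't do the large-scale rewriting of data that c* is suited for.
So you have two options: an in-memory database vs. stream processing,
i.e., Redis/Mongo vs. Storm and the like.
Storm doesn't require you to use Clojure unless you hit a bug,
so not knowing Clojure is no big deal.
If you're really worried, use JStorm, which was built by people at Taobao.
I suggest building both, seeing how each performs,
and keeping whichever is better.
I looked at Redis, and it really is nice.
My takeaway:
the brainpower needed to write an in-memory counter has dropped below zero;
it's outright negative.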

d*******r
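Posts: 3299

From thread: Programming board - Hazelcast: an in-memory DB on the JVM with a cluster mode

I looked at VoltDB. It is an in-memory SQL database, and all operations are SQL-style, such as CREATE
TABLE and SELECT, so it seems suited to being used together with MySQL and PostgreSQL.
Hazelcast is more like Redis: it provides generic data structures such as map, queue, and list.
The difference is that, unlike Redis, Hazelcast has a cluster mode built in, which suits scaling out. Hazelcast
seems suited to being used together with Cassandra and MongoDB.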

p*****2
Posts: 21240

From thread: Programming board - Redis Cluster beta -- Redis 3.0 beta

Have you tried this cluster yet?


d*******r
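Posts: 3299

From thread: Programming board - Redis Cluster beta -- Redis 3.0 beta

No. I only saw this Cluster video yesterday. It is based on the unstable branch, but the design looks
simple and solid.
I asked the video's author, and he said it will be usable in production once Redis 3.0 is out.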

y****k
Posts: 71

From thread: Programming board - Is $68,000 a good price for this cluster? (reposted)

[The following text is reposted from the Hardware board]
From: ystdpk (ystdpk), Board: Hardware
Title: Is $68,000 a good price for this cluster?
Site: BBS 未名空间站 (Fri Nov 21 03:27:19 2014, US Eastern)
6 machines,
(4+14+14+10+6+4) x 2 = 104 cores, or 208 threads
1024 GB RAM, about 100 TB of disk
server 1:
2x Haswell 4C E5-2637V3
2x HGST 3.5" 2TB SATA 6Gb/s 7.2K RPM 64M 0F14685
8x 16GB DDR4-2133 1.2V 2Rx4 ECC REG RoHS Memory (128GB RAM)
2 servers with upgrade to dual E5-2697V3
1 server with upgrade to dual E5-2687V3
1 server with upgrade to dual E5-2643V3
1 server with upgrade to 24x 1...

C*********r
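Posts: 21

From thread: Programming board - How do you test in a cluster environment?

The systems customers use usually have the largest data volumes, plus a large number of deployed machines to do the computation. But what
do you do for dev or QA? Copying the entire prod environment is far too expensive. If you use a
small cluster to simulate it, how do you usually migrate the data on prod to the QA or dev environment for testing?
Take Facebook or Google, for example: surely they don't keep three full systems for the different stages of development?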

z****e
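Posts: 54598

From thread: Programming board - How does Node.js cluster compare with vert.x?

You mean this one?
http://jxcore.com/nodejx-vs-vert-x-vs-node-js-cluster/
Tim's reply below explains it clearly.
The author never published his vert.x setup.
I don't know what there is to hide; the command to run a verticle is a one-liner.
If he won't share it, the result is highly questionable,
and it is most likely self-promotion. A scientific benchmark has to publish all the details,
so that you can reproduce the procedure yourself and get similar results, the way TechEmpower does.
Tim Fox → obastemur • 2 years ago
Sounds like a setup issue. If you publish your actual benchmark setup
somewhere (with exact instructions for replicating) then people could take a
look, maybe see where the issues are, and re-run it. Most credible
benchmarks w...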

d*******r
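Posts: 3299

From thread: Programming board - How does Node.js cluster compare with vert.x?

As I understand it,
Node's cluster is for running multiple processes on a single machine,
while Vert.x's is distributed over the message bus.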

z****e
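Posts: 54598

From thread: Programming board - How does Node.js cluster compare with vert.x?

Verticles generally don't share data with each other.
If they need to share data, that has to go through the message bus:
first convert the data to be shared into JSON, then send it to the other verticles,
since JSON is a data format that every language can accept.
So the JSON is effectively immutable; it cannot be modified while it is being sent and received.
Well, at least the message bus won't try to modify the message, so no lock is needed.
The principle is the same as multithreading in FP.
But vert.x uses this approach cleverly to sidestep the concept of immutability:
without noticing, you end up using the FP style of multithreading, namely the actor model.
Some people will say this approach is too cumbersome.
OK, then use things like shared maps to share data.
In that case the objects have to be made immutable,
because data shared between actors must be immutable;
otherwise it would break the single-threadedness.
This technique is somewhat advanced, and most people can do without it,
but if you do need it, it is easy to understand.
Also, vert.x does have locks: cluster-wide locks,
not for sharing data between threads...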
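The message-passing idea above can be sketched in Python. This is a minimal illustration under stated assumptions, not Vert.x's API: `EventBus`, `Worker`, and the address string are hypothetical names, and a `queue.Queue` drained by a single thread stands in for a verticle's single-threaded event loop. Because the payload is serialized to JSON at send time, the receiver gets a snapshot, so later changes by the sender cannot leak across and no lock around the payload is needed.

```python
import json
import queue
import threading


class Worker:
    """Hypothetical stand-in for a verticle: one thread drains one inbox."""

    def __init__(self):
        self.inbox = queue.Queue()
        self.received = []  # touched only by this worker's own thread
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            raw = self.inbox.get()
            if raw is None:  # shutdown sentinel
                break
            # Each message is decoded into a fresh object owned by this worker.
            self.received.append(json.loads(raw))

    def stop(self):
        self.inbox.put(None)
        self._thread.join()


class EventBus:
    """Hypothetical in-process message bus: address -> worker."""

    def __init__(self):
        self._workers = {}

    def register(self, address, worker):
        self._workers[address] = worker

    def send(self, address, payload):
        # Serialize at send time: the receiver gets a snapshot, not a shared reference.
        self._workers[address].inbox.put(json.dumps(payload))


bus = EventBus()
worker = Worker()
bus.register("counter.updates", worker)

state = {"count": 1}
bus.send("counter.updates", state)
state["count"] = 99  # the sender mutates its own object afterwards
worker.stop()
print(worker.received)  # [{'count': 1}]
```

The only object the two sides share is the thread-safe queue; the worker's `received` list is read here only after `stop()` has joined its thread.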

J****R
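Posts: 373

From thread: Programming board - How are third-party libraries usually deployed on a Hadoop cluster?

Thanks!
To sum up, there are basically 3 approaches:
1. Copy jar files and config files onto all nodes in the cluster.
2. Fat jar.
3. Distributed cache.
The first is too cumbersome to be realistic in production; doing it this way would drive the operations team crazy.
The second is rather inefficient: the fat jar is so large that I suspect runtime performance would suffer.
The third solves these problems, but you still have to put the jars and configs on the client node separately; otherwise
launching the job fails.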

J****R
Posts: 373

From thread: Programming board - Has anyone tried submitting a Hadoop job to a remote cluster from Java code?

I googled it, and most answers say to set a few configs:
mapreduce.framework.name
yarn.resourcemanager.address
fs.defaultFS
But in practice setting only these doesn't work; I always get
Exception in thread "main" java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
Has anyone here done this before?

w***g
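Posts: 5958

From thread: Programming board - Weidong, how do you do clustering with DL?

Doesn't k-means work? DL is for extracting features; for the clustering itself it probably has no advantage.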
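For reference, plain k-means (Lloyd's algorithm) fits in a few lines of Python. This is an illustrative sketch on made-up 2-D points, not anyone's production code; in practice the points would be whatever feature vectors you have, for example features extracted by a DL model as the post suggests. The function name `kmeans` and its parameters are hypothetical.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm. points: list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k input points chosen at random
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points
        # (an empty cluster keeps its old centroid).
        new_centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    labels = [min(range(k), key=lambda i: squared_distance(p, centroids[i])) for p in points]
    return centroids, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

The result depends on the random initialization; running it from several seeds and keeping the lowest total within-cluster distance is a common safeguard.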

L**A
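Posts: 425

From thread: Programming board - cluster pipeline broken: how to check jobs running in the background

I ssh'ed into a computer cluster and ran a for loop in the background, using &
to background it.
Later the ssh connection dropped. I can tell the for loop is still running, because the output keeps updating. The problem is that the jobs
command no longer shows anything.
In this situation, how can I check the status of these background jobs?
Thanks a lot!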
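Some background that may help: `jobs` only lists the jobs of the current shell, and a new ssh login starts a new shell with an empty job table, so a loop started before the disconnect will not show up there. Standard tools such as `ps -u $USER` or `pgrep -u $USER -f <pattern>` can find the PIDs. Once you have a PID, a minimal Python sketch (POSIX only; `is_running` and the script name are hypothetical) can check whether it is still alive by sending signal 0, which performs the existence and permission check without delivering any signal.

```python
import os


def is_running(pid: int) -> bool:
    """Return True if a process with this PID currently exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0: nothing is sent, only error checking is done
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): python check_pids.py 12345 12346 ...
    for arg in sys.argv[1:]:
        pid = int(arg)
        print(pid, "running" if is_running(pid) else "not running")
```

A zombie (finished but not yet reaped) process still counts as existing here; `ps` shows its state as `Z`.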

a*****s
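Posts: 1121

From thread: Programming board - A question about upgrading a Hadoop cluster

Companies generally use every resource to the fullest, and a big cluster means more programs running. To find slack, the only way is to use the historical
records to find the times when few applications are running, and then see whether everything can be upgraded within that window. I'm not sure whether
Hortonworks itself has done large-scale RU (rolling upgrade) testing.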

e*****n
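Posts: 15

From thread: Biology board - Can anyone recommend software to tell whether a set of genes clusters on particular chromosomes?

I have some genes that are up- or down-regulated in the hippocampus of aged animals, and I want to see whether these genes
cluster on one or a few chromosomes. Can anyone recommend software?
Thanks~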

h*******o
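Posts: 4884

From thread: Biology board - When doing array cluster analysis, what is the difference between Pearson correlation and Euclidean distance?

I recently ran some TaqMan Low Density Arrays (LDA), and my boss wants a cluster analysis.
I'm new to arrays and only know how to use the "data analysis assist" software, which has 2 options:
one is Pearson's correlation and the other is Euclidean distance.
Could someone explain the difference in simple terms?
If I want to compare how large the gene expression differences between groups are, is Euclidean distance the better choice?
Thanks!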
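The practical difference is easy to see numerically. Below is a minimal Python sketch with made-up expression profiles (not real data). Euclidean distance is driven by absolute expression levels, while Pearson correlation, commonly turned into a distance as 1 - r, compares only the shape of two profiles and ignores their overall level and scale. So Euclidean distance is the one that reflects how large the differences are, whereas Pearson groups genes that rise and fall together.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson(a, b):
    # Undefined (division by zero) if either profile is constant.
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    sd_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    sd_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (sd_a * sd_b)


def pearson_distance(a, b):
    # 1 - r: 0 for identical shapes, 2 for perfectly opposite shapes
    return 1.0 - pearson(a, b)


# Hypothetical expression profiles across four samples
low = [1.0, 2.0, 3.0, 4.0]
high = [10.0, 20.0, 30.0, 40.0]  # same shape, ten times the level
flipped = [4.0, 3.0, 2.0, 1.0]  # similar levels, opposite trend

print(euclidean(low, high), pearson_distance(low, high))
print(euclidean(low, flipped), pearson_distance(low, flipped))
```

By Euclidean distance `low` is much closer to `flipped` than to `high`; by Pearson distance it is the other way round.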

G***G
Posts: 16778

From thread: Biology board - unigene cluster vs unigene id

What is the difference between a UniGene cluster and a UniGene ID?
Which UniGenes can 227057_at be mapped to?

o**4
Posts: 35028

From thread: Biology board - Cluster can't recognize my microarray data

I made a txt file as required, but Cluster won't load it. What should I do?

o**4
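Posts: 35028

From thread: Biology board - Cluster can't recognize my microarray data

No idea what was wrong.
I switched to Cluster 3.0 and now it recognizes the file again. Go figure.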

n********k
Posts: 2818

From thread: Biology board - seeking recommendation for clustering analyses

I have two series of data (from miRNA array analysis), each comprising data
from 6 time points for either the experimental or the control group. Now I want to
do a cluster analysis on them to sort out genes that change with the same
or different dynamics between the two groups. Any recommendation on the
software? Thanks

j****x
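Posts: 1704

From thread: Biology board - seeking recommendation for clustering analyses

hehe, from now on, anyone who answers just "R" or "Bioconductor" without naming the package should have their baozi (forum points) docked by the moderator.
miRNA time-series clustering is not much different from traditional mRNA time-series clustering. I used to
first run k-means clustering in GeneSpring to get a rough picture, then decide how to proceed.
Most commercial array analysis packages have this feature; if you have one, check its manual. If you don't want to
spend money, take a look at the Short Time-series Expression Miner (STEM) below; it's quite good.
http://gene.ml.cmu.edu/stem/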

n********k
Posts: 2818

From thread: Biology board - seeking recommendation for clustering analyses

Thanks, I know that. I have been trying general cluster analysis software,
but 1. it is not that user-friendly in terms of input data format, and 2. there is no link
for the miRNA genes. I could live with the second problem, but the first one
has been annoying. Free is better at the moment.

G*****o
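Posts: 315

From thread: Biology board - How can I install a program on a bioinformatics cluster?

You could try installing BLAST under your own account; you don't necessarily need the admin's help. If there are installers for several different
Linux platforms, download the one closest to your cluster. It's best to find a friend who knows a bit of Linux
to help.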

t*d
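Posts: 1290

From thread: Biology board - How can I install a program on a bioinformatics cluster?

That is, source code tailored for SunOS 5.9.
Sun's earlier system was Solaris. NCBI should have BLAST executables for this platform; I just don't know
which version is compatible with your SunOS 5.9. Try it yourself. These executables mostly work right after download,
with no compiling and no administrator privileges needed.
Also, BLAST on Windows works just as well. Why insist on using that cluster?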


e**s
Posts: 513

From thread: Biology board - How can I install a program on a bioinformatics cluster?

Do you mean I can install it under my own folder? The IT staff told me I
need to put it in the root. Well, they are not the ones who maintain the
cluster, though.
Thanks for your reply!

e**s
Posts: 513

From thread: Biology board - How can I install a program on a bioinformatics cluster?

I went to the FTP site for the BLAST installer and source code. I downloaded
ncbi-blast-2.2.25+-sparc64-solaris.tar.gz. It contains many applications.
Are those the source code you mentioned? Do I simply move these applications
into my folder on the cluster? I can't find any file that looks like "code".

G***G
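Posts: 16778

From thread: Biology board - classification vs clustering

I have 3,400 samples from 34 provinces, 100 per province.
I want to draw a dendrogram to see which provinces are close to each other.
What is this called?
Clustering?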
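This is hierarchical (agglomerative) clustering, and the dendrogram is a picture of its merge order. Below is a minimal pure-Python sketch with average linkage, assuming each province is first summarized by one feature vector (for example the mean of its 100 samples); the labels and coordinates are made up. Each step merges the two clusters with the smallest mean cross-cluster distance, and the recorded merge heights are what a dendrogram draws on its vertical axis. SciPy's `scipy.cluster.hierarchy.linkage` and `dendrogram` provide the same computation plus plotting.

```python
import math
from itertools import combinations


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(vectors):
    """Agglomerative clustering with average linkage.

    vectors: dict mapping a label (str) to a feature vector.
    Returns the merges in order as (left, right, height) tuples.
    """
    # Each cluster is keyed by a label and holds its member vectors.
    clusters = {label: [vec] for label, vec in vectors.items()}
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            members_a, members_b = clusters[a], clusters[b]
            # Average linkage: mean distance over all cross-cluster pairs.
            d = sum(euclidean(x, y) for x in members_a for y in members_b) / (
                len(members_a) * len(members_b)
            )
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, height = best
        clusters[f"({a},{b})"] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, height))
    return merges


# Hypothetical per-province summary vectors
provinces = {"A": (0.0, 0.0), "B": (0.0, 1.0), "C": (10.0, 10.0), "D": (10.0, 11.0)}
for left, right, height in average_linkage(provinces):
    print(f"merge {left} + {right} at height {height:.2f}")
```

This naive version recomputes all pairwise distances at each step, which is fine for 34 provinces but slow for thousands of items.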

k****n
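Posts: 158

From thread: Biology board - A beginner's question about DAVID/clustering analysis

Question: what is the difference between functional annotation clustering and gene functional classification,
and what kinds of questions is each mainly used to answer?
Background:
I have 2 groups of microarray data (control/patients), from which I obtained 200 significantly differentially expressed genes
(a gene list). I entered this gene list into DAVID, hoping to find relationships among these 200 genes, for example
that a certain signaling pathway is amplified or some biological function (migration, division) is altered, to guide
follow-up wet-lab analysis. Which of the two methods above should I use?
Thanks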

N*******k
Posts: 43

From thread: Biology board - Still looking for a Linux cluster admin for us

At minimum, they should have used a cluster and administered a Linux server for their group.

N*******k
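Posts: 43

From thread: Biology board - Still looking for a Linux cluster admin for us

Sigh. As soon as a job touches biology, pay runs low because of the knock-on effect. The director in charge is usually
an instructor, with a starting salary of 55k to 65k; that was the case for two friends of mine who used to manage the database and
the sequencer, and one of them called today to say he plans to quit and become a doctor. New APs
often make only 65k-75k. Under a director like that, no wonder the salary is only 50-55k. Distorted pay in
the biology field is a big problem.
Once you're in the field, though, things get easier. One example here: a high-energy physics PhD who
couldn't find a job after graduating came on as our cluster manager, because he had experience with large-scale parallel computing. A few
years later his salary is sky-high. There's nothing anyone can do: everyone depends on him, and he can leave for industry any time.
After all, few people in industry have built a large supercomputer from scratch, handling every aspect of it along the way,
let alone built a team from nothing.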


K****n
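Posts: 5970

From thread: Biology board - Still looking for a Linux cluster admin for us

How many servers do you have? Why would it be rare for an entry-level person to have built a cluster?
I'd recommend recruiting at UPenn. It has more people switching from biology to CS than anywhere I've seen, and they have a course where everyone has built
a cluster on EC2.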

t****b
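Posts: 47

From thread: Biology board - Still looking for a Linux cluster admin for us

I'm the Linux cluster admin for a biology lab in Beijing. Unfortunately I don't know biology, let alone so-called NGS.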

d********f
Posts: 43471

From thread: Biology board - Still looking for a Linux cluster admin for us

How many CPUs does your Linux cluster have? With fewer than 1,000, a part-timer would do.

s**********t
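Posts: 680

Urgent! Please recommend free software for hierarchical clustering analysis. It's for a manuscript submission, so it needs to be accepted
in academia: if I analyze my data with it and submit, reviewers won't say the software is unreliable, e.g., something already used in
some peer-reviewed publications. Ideally something most people use. This is my first time doing this and I don't know much. Thanks for any advice!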

K******S
Posts: 10109

Cluster & TreeView


y*******1
Posts: 164

Cluster and TreeView are the right answer; you can do it with a few mouse clicks.
http://rana.lbl.gov/EisenSoftware.htm


s****l
Posts: 10462

From thread: Biology board - Recommend a cluster for bioinformatics R&D

The budget for the cluster is about $40k. I also need a Linux box, about $3k (I guess
I'll go with Ubuntu).
Thanks!

f********x
Posts: 99

From thread: Biology board - Recommend a cluster for bioinformatics R&D

CentOS + Rocks Cluster + Hadoop


j*********m
Posts: 33

From thread: Chemistry board - A question about constructing an Al cluster.

There is a metal-hydrogen bond, and hydride is very reactive in water, so I think
the idea is somewhat weird. For a metal cluster, I think water would be a
better choice; for oxides, maybe hydrogen.

r*****d
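Posts: 727

From thread: Chemistry board - A question about constructing an Al cluster.

Thanks for the reply, but I want to put this cluster into an organic solvent, such as THF. In that case, is it better to
add water, or is H enough?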

g**d
Posts: 28

From thread: Computation board - Re: Our school's Beowulf cluster is getting an upgrade

Does anyone know how to build a Linux Beowulf cluster with 16 nodes?

g**d
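Posts: 28

From thread: Computation board - Re: Our school's Beowulf cluster is getting an upgrade

I'm building a Beowulf cluster for my group myself.
There are still some problems: for example, neither PBS nor PVM is installed yet,
so to submit jobs I have to log in to the nodes one by one.
I'll ask you all for advice when I get the chance.
Also, individual nodes keep failing to mount to the master node.
I keep having to run
fsck /dev/hda3 to check the file system.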

g**d
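Posts: 28

From thread: Computation board - Does anyone know where there is a workshop on building clusters?

My boss finally agreed to pay for me to learn how to build a cluster,

but I can't find a suitable workshop.
A few years ago UTK held a workshop on this every year.
Could someone point me to one?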

g**d
Posts: 28

From thread: Computation board - [linux cluster]

Roughly how much does it cost you to set up a Linux cluster?

d*****w
Posts: 124

From thread: Computation board - We are building a new cluster now!

We are building a new cluster now!

b*****y
Posts: 163

From thread: Computation board - Clustering Methods

The following text is adapted from:
http://www.cs.sandia.gov/opt/survey/cluster.html

e****e
Posts: 179

From thread: Computation board - how to submit multi-tasks to HPC linux cluster? (reposted)

[The following text is reposted from the Linux board]
From: engine (boxing cat), Board: Linux
Title: how to submit multi-tasks to HPC linux cluster?
Site: BBS 未名空间站 (Thu Sep 20 23:04:14 2007)
Here is what I do manually:
$ ssh node1
$ username: engine
$ password: xxxxx
$ run1.exe
Repeat
$ ssh node2
$ username: engine
$ password: xxxxx
$ run2.exe
$ ssh node3
$ username: engine
$ password: xxxxx
$ run3.exe
How can I write a script to submit these jobs automatically?
Thanks.

j**u
Posts: 6059

From thread: Computation board - how to submit multi-tasks to HPC linux cluster? (reposted)

What kind of cluster are you using? It should have a job scheduler.
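A job scheduler (PBS/Torque, Slurm, SGE, and so on) is the proper tool. If none is available, the manual steps above can at least be scripted. Below is a minimal Python sketch. It assumes passwordless SSH keys are already set up (for example with `ssh-keygen` and `ssh-copy-id`) so that ssh no longer prompts for a password, and that `run1.exe` etc. can be started on each node exactly as typed in the manual session. The helpers `build_commands` and `launch` are hypothetical names. Wrapping each program in `nohup ... &` with all I/O redirected lets it keep running after the ssh command returns.

```python
import shlex
import subprocess


def build_commands(user, assignments):
    """assignments: list of (node, program) pairs. Returns ssh argument lists."""
    commands = []
    for node, program in assignments:
        prog = shlex.quote(program)
        # nohup + & with stdin/stdout/stderr redirected, so ssh returns immediately
        # and the program keeps running after the session closes.
        remote = f"nohup {prog} > {prog}.log 2>&1 < /dev/null &"
        commands.append(["ssh", f"{user}@{node}", remote])
    return commands


def launch(commands, dry_run=True):
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))  # only show what would be run
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    jobs = [("node1", "run1.exe"), ("node2", "run2.exe"), ("node3", "run3.exe")]
    launch(build_commands("engine", jobs), dry_run=True)
```

Keeping `dry_run=True` first lets you check the generated commands before anything is started on the nodes; `shlex.join` needs Python 3.8 or later.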
