s**i Posts: 381 | 1 [The following text is reposted from the Hardware board]
From: seki (瓜金), Board: Hardware
Subject: Are x86 shared-memory machines still not good enough?
Site: BBS 未名空间站 (Wed Feb 28 19:54:48 2007), local
From: seki (瓜金), Board: Linux
Subject: Are x86 shared-memory machines still not good enough?
Site: BBS 未名空间站 (Wed Feb 28 19:54:41 2007), forwarded
I recently used a Sun Fire X4600:
8 dual-core AMD Opteron 855 processors
cpu MHz : 2613.696
cache size : 1024 KB
60 GB RAM, with an advertised memory bandwidth of up to 6.4 GB/s
I ran an MPI code on it (mainly solving large sparse matrices), and the parallel scaling is far worse than on
another Opteron cluster, even though the cluster's CPUs are somewhat slower.
Today I also tried an IBM 595: 64 Pow
| b***e Posts: 38 | 2 Can you tell me how large the sparse matrix is and how fast the solver runs?
| s**i Posts: 381 | 3 For example, one matrix is 2,170,159 x 2,170,159
and has 58,400,311 nonzeros.
Using the preconditioned conjugate gradient method,
it converges in 7 iterations.
With 32 processes, it takes only 0.28 seconds.
With 2 processes, it takes 5.29 seconds.
The serial version converges in 4 iterations and takes 7.7 seconds.
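To put the timings above side by side, here is a rough sketch that turns them into speedup and parallel efficiency figures. The numbers are the ones quoted in the post; the function names (`wall_speedup`, `iter_speedup`) are made up for illustration, and the post does not say which machine each run was on. Because the serial run needs 4 iterations and the parallel runs need 7, the sketch reports both the wall-clock ratio and the per-iteration ratio.

```python
# Timings quoted in the post above: label -> (processes, iterations, seconds)
runs = {
    "serial": (1, 4, 7.7),
    "mpi_2": (2, 7, 5.29),
    "mpi_32": (32, 7, 0.28),
}


def wall_speedup(base, other):
    """Ratio of total solve times (base / other)."""
    return runs[base][2] / runs[other][2]


def iter_speedup(base, other):
    """Ratio of time per CG iteration, which factors out the iteration count."""
    _, base_iters, base_secs = runs[base]
    _, other_iters, other_secs = runs[other]
    return (base_secs / base_iters) / (other_secs / other_iters)


for label in ("mpi_2", "mpi_32"):
    procs = runs[label][0]
    ws = wall_speedup("serial", label)
    its = iter_speedup("serial", label)
    # Efficiency is speedup divided by the number of processes.
    print(f"{label}: wall {ws:.1f}x (eff {ws / procs:.2f}), "
          f"per-iteration {its:.1f}x (eff {its / procs:.2f})")
```

On these figures the 32-process run is about 27.5x faster than serial in wall-clock time, but about 48x faster per iteration, which is more than the process count. The two ratios differ only because the iteration counts differ, so which one is the fair comparison depends on whether the extra iterations are counted as a cost of the parallel solver.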
| m***t Posts: 254 | 4 Hoho, interesting post. Two things: first, for SMP you want to use OpenMP, not MPI;
second, cache performance
is the ultimate key, not the clock rate.
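One way to see why memory traffic rather than clock rate would dominate here is a back-of-the-envelope estimate of how many bytes one sparse matrix-vector product has to move for the matrix quoted earlier in the thread. This is only a sketch under assumptions the thread does not state: CSR storage, 8-byte double values, 4-byte indices, each vector moved once, and the "up to 6.4GB/sec" figure taken at face value as 6.4e9 bytes/s. The names `spmv_bytes` and `min_spmv_seconds` are made up for illustration.

```python
# Figures quoted in the thread.
N_ROWS = 2_170_159
NNZ = 58_400_311
QUOTED_BANDWIDTH = 6.4e9  # bytes/s, assumed reading of "up to 6.4GB/sec"

# Assumptions (not stated in the thread): CSR storage, 8-byte double values,
# 4-byte column indices and row pointers, vectors x and y each moved once.
VALUE_BYTES = 8
INDEX_BYTES = 4


def spmv_bytes(n_rows, nnz):
    """Bytes moved by one y = A @ x under the assumptions above."""
    matrix = nnz * (VALUE_BYTES + INDEX_BYTES) + (n_rows + 1) * INDEX_BYTES
    vectors = 2 * n_rows * VALUE_BYTES  # read x once, write y once
    return matrix + vectors


def min_spmv_seconds(n_rows, nnz, bandwidth):
    """Lower bound on the time of one product if bandwidth is the only limit."""
    return spmv_bytes(n_rows, nnz) / bandwidth


total = spmv_bytes(N_ROWS, NNZ)
seconds = min_spmv_seconds(N_ROWS, NNZ, QUOTED_BANDWIDTH)
print(f"average nonzeros per row: {NNZ / N_ROWS:.1f}")
print(f"bytes per product: {total:,}")
print(f"bandwidth-bound time: {seconds:.3f} s")
```

Under these assumptions one product moves roughly 744 MB, far more than the 1024 KB cache listed in the first post, so it cannot finish in much less than about 0.12 s at 6.4 GB/s no matter how fast the cores are clocked.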