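y**b posts: 10166 | 1 What general-purpose methods and tools are typically used for this?
Also, PAPI_MEM_SCY (Cycles Stalled Waiting for Memory Access) does not seem to be supported on
Sandy Bridge / Ivy Bridge / Haswell processors. Does that mean the PAPI-based tools TAU, PerfSuite,
and HPCToolkit cannot do this kind of measurement either? | g*********e posts: 14401 | 2 Perf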
VTune
PerfSuite
| h**********c posts: 4120 | 4 [et@localhost ~]$ vmstat -s
2915864 K total memory
1351968 K used memory
1433572 K active memory
652544 K inactive memory
516272 K free memory
1004 K buffer memory
1046620 K swap cache
3145724 K total swap
0 K used swap
3145724 K free swap
123731 non-nice user cpu ticks
88 nice user cpu ticks
19288 system cpu ticks
2600085 idle cpu ticks
1494 IO-wait cpu ticks
0 IRQ cpu ticks
3351 softirq cpu ticks
0 stolen cpu ticks
794970 pages paged in
802406 pages paged out
0 pages swapped in
0 pages swapped out
5843121 interrupts
9239204 CPU context switches
1485969013 boot time
7778 forks | y**b posts: 10166 | 5 perf is nice, though it does not seem to be able to measure MPI programs. Consider this example:
perf stat -p 48382 sleep 10
Performance counter stats for process id '48382':
    4821.604141 task-clock (msec)         #    0.963 CPUs utilized            [100.00%]
          1,218 context-switches          #    0.253 K/sec                    [100.00%]
              0 cpu-migrations            #    0.000 K/sec                    [100.00%]
              0 page-faults               #    0.000 K/sec
 17,312,623,873 cycles                    #    3.591 GHz                      [100.00%]
  5,783,328,106 stalled-cycles-frontend   #   33.41% frontend cycles idle     [100.00%]
  2,359,944,745 stalled-cycles-backend    #   13.63% backend cycles idle      [100.00%]
 27,153,618,219 instructions              #    1.57  insns per cycle
                                          #    0.21  stalled cycles per insn  [100.00%]
  4,263,391,770 branches                  #  884.227 M/sec                    [100.00%]
     27,273,889 branch-misses             #    0.64% of all branches
    5.004692107 seconds time elapsed
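(1) Here IPC = 1.57, so CPI = 1/1.57 = 0.64;
stalled cycles per insn = 0.21, so does the stalled fraction come to 0.21/0.64 = 33%?
Is that the same number as the "stalled-cycles-frontend # 33.41% frontend cycles idle" line above?
(2) Comparing the two,
front-end (fetch and decode phases) vs back-end (execute),
which one better reflects the memory stall time to execution time ratio,
or, put differently, which metric should be used to describe it? | y**b posts: 10166 | 6 I used Intel PCM to measure a program on a single workstation: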
1 core, serial: L3HIT ratio 100%,
12 cores, parallel: L3HIT ratio 97%.
Does that count as high?
However, I do not know how to measure CPU stall cycles with PCM.
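The arithmetic in question (1) of post 5 can be checked against the raw counters. Below is a minimal Python sketch, not an authoritative reading of perf internals. It assumes that perf computes "stalled cycles per insn" as the larger of the two stalled-cycle counters divided by instructions, and that each "cycles idle" percentage is the matching stalled counter divided by cycles. The function name derived_metrics and the dictionary keys are made up for illustration. Under those assumptions, 0.21/0.64 is stalled cycles per instruction divided by cycles per instruction, which reduces to stalled cycles over total cycles, so it matches the 33.41% front-end figure because the front-end counter is the larger one here.

```python
# Raw counters copied from the perf stat output in post 5.
CYCLES = 17_312_623_873
INSTRUCTIONS = 27_153_618_219
STALLED_FRONTEND = 5_783_328_106
STALLED_BACKEND = 2_359_944_745


def derived_metrics(cycles, instructions, stalled_frontend, stalled_backend):
    """Recompute the ratios that perf stat prints next to the counters.

    Assumption: "stalled cycles per insn" uses the larger of the two
    stalled-cycle counters; this helper is illustrative only.
    """
    cpi = cycles / instructions
    worst_stall = max(stalled_frontend, stalled_backend)
    stalled_per_insn = worst_stall / instructions
    return {
        "ipc": instructions / cycles,
        "cpi": cpi,
        "frontend_idle": stalled_frontend / cycles,
        "backend_idle": stalled_backend / cycles,
        "stalled_per_insn": stalled_per_insn,
        # (stalled / insn) / (cycles / insn) == stalled / cycles
        "stalled_fraction": stalled_per_insn / cpi,
    }


metrics = derived_metrics(CYCLES, INSTRUCTIONS, STALLED_FRONTEND, STALLED_BACKEND)

if __name__ == "__main__":
    for name, value in metrics.items():
        print(f"{name:>18}: {value:.4f}")
```

Working from the unrounded counters avoids the small error that comes from dividing the already rounded 0.21 by the already rounded 0.64. The sketch says nothing about how much of either stalled counter is caused by memory access, so it does not settle question (2).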