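l*********i Posts: 483 | 1 [The following text is reposted from the Linux board]
From: lamborghini (Murcielago), Board: Linux
Subject: -ffast-math in gcc
Posted at: BBS 未名空间站 (Mon Jun 18 18:49:23 2007), forwarded
gcc 4.1.2, Ubuntu 7.04 32bit(kernel 2.6.20-16-generic), Core Duo T2500.
The man page says it cannot be used together with -O. I'd like to know the
specific reason. I have a Monte Carlo code that randomly draws some initial
values. With -O3 and -ffast-math together, the run time is only about 10-20%
of what it is with -O3 alone or -ffast-math alone. With -O3 -ffast-math the
results differ numerically by about 1e-7% (negligible for my calculation),
but I don't know whether that holds in general. Under what circumstances
does combining -O and -ffast-math lead to fairly large errors? Also, what
exactly determines this? The hardware? The OS? |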
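One commonly cited case where -ffast-math can matter is compensated (Kahan) summation. -ffast-math allows the compiler to reassociate floating-point expressions, so it may simplify the correction term below to zero and quietly turn the routine back into a plain sum. The following is a minimal sketch, not code from this thread; the function names naive_sum and kahan_sum are made up for illustration, and whether a given compiler version actually removes the compensation is something to check on your own build.

```c
#include <assert.h>
#include <stddef.h>

/* Plain left-to-right summation in single precision. */
float naive_sum(const float *x, size_t n)
{
    assert(x != NULL || n == 0);
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i)
        s += x[i];
    return s;
}

/* Kahan compensated summation (hypothetical helper for illustration).
 * c carries the rounding error of the previous addition. */
float kahan_sum(const float *x, size_t n)
{
    assert(x != NULL || n == 0);
    float s = 0.0f;
    float c = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float y = x[i] - c;
        float t = s + y;
        /* Algebraically (t - s) - y is zero; in IEEE arithmetic it is the
         * rounding error of s + y. Reassociation under -ffast-math may
         * legally fold it away. */
        c = (t - s) - y;
        s = t;
    }
    return s;
}
```

Summing a large array of 0.1f with both functions, once with -O3 and once with -O3 -ffast-math, is a quick way to see whether the compensation survives on a particular compiler and target.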
|
y***d Posts: 2330 | 2 Don't forget -ffast-math...
java allsum=1.8658666E16
real 0m9.719s
g++ -O3 test.cpp
c++ allsum=1.86587e+16
real 0m9.116s
g++ -O3 -ffast-math test.cpp
c++ allsum=1.86587e+16
real 0m6.029s
g++ -O3 -march=native -mtune=native -ffast-math test.cpp
c++ allsum=1.86587e+16
real 0m4.888s
g++ -O3 -march=native -mtune=native -ffast-math test.cpp -funsafe-math-optimizations -funroll-loops -fprefetch-loop-arrays
c++ allsum=1.86587e+16
real 0m4.235s |
|
x*****u Posts: 3419 | 3 Continuing from last time..
Today I ran my program under Windows and made a rough comparison with running it under coLinux.
Same machine
Linux colinux 2.6.8.1-co-0.6.2-pre1 #21 Fri Sep 10 17:03:21 IDT 2004 i686
Intel(R) Pentium(R) 4 CPU 2.40GHz GenuineIntel GNU/Linux
Same compiler
windows: icc version 8.0
colinux: icc version 8.0
compiling options: -O2
Same program
Different run times (repeated over multiple runs)
windows: the cpu time: 18.97s
colinux: the cpu time: 22.56s
/Appendix
colinux: gcc version 3.4.2 (Gentoo Linux 3.4.2-r2, ssp-3.4.1-1, pie-8.7.6.5)
compiling options: g++ -c -O2 -fomit-frame-pointer -pipe -ffast-math |
|
k*****l Posts: 177 | 4 What I use is
g++ -g -O3 -Wall -Wno-sign-compare -Wno-deprecated -fomit-frame-pointer -ffast-math |
|
t****t Posts: 6806 | 5 No, -O3 will not do that. -Ofast will do that (it implies -ffast-math).
SSE is different from x87, but whether SSE or x87 is used does not depend on
the -O option. |
|
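t****t Posts: 6806 | 6 I don't do numerical computing professionally, but in theory -O3 should
give exactly the same results as -O2. At least that is the case for gcc. SSE
and x87 math really can give different results, but that comes from
-mfpmath=sse versus -mfpmath=387. -ffast-math uses math that does not conform
to the standard, but -O3 does not turn that switch on.
As for errors at -O3, 99% of the time the code itself is written incorrectly.
People who write numerical code often don't pay much attention to the C/C++
rules, which is quite normal. |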
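A small sketch for checking both points on your own build. GCC's manual states that -ffast-math causes the preprocessor macro __FAST_MATH__ to be defined, and C99's <float.h> provides FLT_EVAL_METHOD, which is typically 0 when float and double are evaluated in their own precision (as with SSE) and 2 when everything is evaluated in long double (as with x87). The function names here are hypothetical, and the values reported depend on the compiler and target.

```c
#include <assert.h>
#include <float.h>
#include <stdio.h>

/* Returns 1 if this translation unit was compiled with fast-math enabled
 * (detected through the __FAST_MATH__ macro), otherwise 0. */
int built_with_fast_math(void)
{
#ifdef __FAST_MATH__
    return 1;
#else
    return 0;
#endif
}

/* Returns FLT_EVAL_METHOD from <float.h>, or -1 if it is not defined. */
int float_eval_method(void)
{
#ifdef FLT_EVAL_METHOD
    return FLT_EVAL_METHOD;
#else
    return -1;
#endif
}

/* Prints both settings on one line (hypothetical helper). */
void report_fp_mode(FILE *out)
{
    assert(out != NULL);
    fprintf(out, "fast-math: %d, FLT_EVAL_METHOD: %d\n",
            built_with_fast_math(), float_eval_method());
}
```

Built with gcc -O3 alone, report_fp_mode(stdout) should report fast-math: 0. Adding -ffast-math or -Ofast should flip it to 1, and switching between -mfpmath=sse and -mfpmath=387 on a 32-bit x86 target is where a change in FLT_EVAL_METHOD would be expected.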
|
b******n Posts: 592 | 7 You are quite right. -O3 doesn't turn on -ffast-math. In my experience, -O3
doesn't always give the speed improvement you'd expect. In some cases, it
can be slower than -O2. |
|
t****t Posts: 6806 | 8 You already have the code, don't you? So I just gave it a quick run. By
specifying too much implementation detail, you actually INTERFERE with compiler optimization.
$ gcc -O3 -fprefetch-loop-arrays -march=native -funroll-loops -ffast-math 21.c
$ a.out 20000
naive add: 0.920000 second; m[10000][10000]=2069822843
better add: 0.930000 second; m[10000][10000]=2069822843
sse2 add: 0.990000 second; m[10000][10000]=2069822843 |
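The benchmark source (21.c / matrix_op.c) is not shown in this thread, so the following is only a sketch of the general point, with made-up function names: a plain element-wise loop that leaves vectorization and unrolling to the compiler, next to a hand-unrolled variant that spells out more of the implementation. Both compute the same result; which one is faster depends on the compiler, flags, and machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plain element-wise add: dst[i] += src[i]. The restrict qualifiers tell
 * the compiler the arrays do not overlap, which helps auto-vectorization. */
void add_plain(uint32_t *restrict dst, const uint32_t *restrict src, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; ++i)
        dst[i] += src[i];
}

/* Same operation, manually unrolled by 4 with a scalar tail loop.
 * Hypothetical stand-in for a hand-tuned version. */
void add_unrolled(uint32_t *restrict dst, const uint32_t *restrict src, size_t n)
{
    assert(dst != NULL && src != NULL);
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        dst[i]     += src[i];
        dst[i + 1] += src[i + 1];
        dst[i + 2] += src[i + 2];
        dst[i + 3] += src[i + 3];
    }
    for (; i < n; ++i)
        dst[i] += src[i];
}
```

Timing both with clock() under -O2 and -O3 on large arrays is one way to check whether the hand-written version buys anything on a given setup.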
|
d****n Posts: 1637 | 9 Sure, I know you are not a gcc developer.
Here is my result:
$ gcc -O3 -fprefetch-loop-arrays -funroll-loops -ffast-math matrix_op.c -lm
$ ./a.out 20000
naive add: 2.200000 second; m[10000][10000]=1836335890
better add: 2.200000 second; m[10000][10000]=1836335890
sse2 add: 2.280000 second; m[10000][10000]=1836335890
To be honest, I never play with these fancy gcc flags,
and I suspect the flags are the landmine here.
##############man page for gcc####################
-fprefetch-loop-arrays
... |
|
k***i Posts: 662 | 10
Things may not be as horrible as you think. Technologies are developing very fast.
Research and progress on hydrate production are moving ahead rapidly. You had better
update your estimate in the near future.
It's hard to predict future technology development. I remember one saying: if a scientist says "something is possible in the next 50 years", he may be correct; if someone says "something is impossible in the next 50 years", he is probably wrong. 20 years ago, a supercomputer required a large room |
|
|