c*****l posts: 297 | 1 Does anyone think it is feasible to sort each row of a matrix (1M rows x 100 columns) on a GPU? We repeat this sort every day and want to know whether it could be made 10x to 20x faster (we just bought a server with 8 K40 GPUs). |
l*******m posts: 1096 | 2 Please see https://solarianprogrammer.com/2013/02/04/sorting-data-in-parallel-cpu-gpu/
In my opinion, a CPU should be fast enough for this size if the sort algorithm and its implementation are done well. The CPU-GPU data copy is a big overhead for a task like this.
|
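A rough way to check the copy-overhead argument is to put numbers on it. The sketch below is only a back-of-the-envelope estimate: the element size and the bus bandwidth are parameters you supply (the thread states neither), and `Estimate` and `estimate_row_sort` are hypothetical names used for illustration.

```cpp
#include <cmath>
#include <cstddef>

// Back-of-the-envelope figures for sorting each row of a rows x cols matrix.
struct Estimate {
    double bytes;         // total size of the matrix in bytes
    double comparisons;   // rough cols*log2(cols) comparisons, summed over rows
    double copy_seconds;  // one host-to-device copy plus one device-to-host copy
};

// elem_bytes and bus_gb_per_s are assumptions supplied by the caller.
// bus_gb_per_s is the sustained transfer rate in units of 1e9 bytes/second.
Estimate estimate_row_sort(std::size_t rows, std::size_t cols,
                           std::size_t elem_bytes, double bus_gb_per_s) {
    Estimate e;
    const double elements = static_cast<double>(rows) * static_cast<double>(cols);
    e.bytes = elements * static_cast<double>(elem_bytes);
    e.comparisons = elements * std::log2(static_cast<double>(cols));
    e.copy_seconds = 2.0 * e.bytes / (bus_gb_per_s * 1e9);
    return e;
}
```

For example, `estimate_row_sort(1000000, 100, 4, 10.0)` gives 400 MB of data, about 6.6e8 comparisons, and 0.08 s for the round-trip copy at an assumed 10 GB/s. Compare that with a measured CPU sort time before deciding whether the GPU is worth it.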
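y*****0 posts: 1189 | 3 I haven't tried this, but here is the rough idea.
You have few columns, so log2(100) is less than 7, which puts the cost at about 7 * matrix_size comparisons.
That is too little work to be worth transferring to the GPU.
If you use the CPU cache well, so that each row is sorted while it sits in cache, and parallelize across rows on the CPU, that should be the best choice.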
|
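The CPU approach suggested above can be sketched as follows. This is a minimal sketch, not anyone's tested implementation: it assumes the matrix is stored row-major as `float` (the thread does not give the element type), and `sort_rows_parallel` is a hypothetical name. Each thread sorts a contiguous block of rows, so a 100-element row (400 bytes) stays in cache while it is being sorted.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Sort every row of a row-major rows x cols matrix in place.
// Rows are split into contiguous blocks, one block per worker thread.
void sort_rows_parallel(std::vector<float>& data, std::size_t rows,
                        std::size_t cols, unsigned num_threads) {
    if (num_threads == 0) num_threads = 1;
    // Rows per thread, rounded up so every row is covered.
    const std::size_t chunk = (rows + num_threads - 1) / num_threads;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(rows, t * chunk);
        const std::size_t end = std::min(rows, begin + chunk);
        if (begin == end) break;  // no rows left for this thread
        workers.emplace_back([&data, cols, begin, end] {
            for (std::size_t r = begin; r < end; ++r) {
                float* row = data.data() + r * cols;
                std::sort(row, row + cols);  // sort one row, ascending
            }
        });
    }
    for (auto& w : workers) w.join();
}
```

A typical call would be `sort_rows_parallel(m, 1000000, 100, std::thread::hardware_concurrency())`, built with something like `g++ -O2 -pthread`. Timing this on the real data gives the baseline that any GPU version, including its copy time, has to beat.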