l*******m 发帖数: 1096 | |
c***z 发帖数: 6348 | |
l*******m 发帖数: 1096 | 3 Finished the run: intersecting 80M with 120M records took 100 minutes. Is that slow? I set parallel to 25.
【在 c***z 的大作中提到】 : inner join
|
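For reference, a minimal Pig sketch of that kind of run (the relation names, field names, and paths below are made up for illustration); the default join is a reduce-side hash join, and PARALLEL sets the number of reducers:
-- load the two data sets (schemas assumed for illustration)
A = LOAD 'left_input' AS (id:chararray);
B = LOAD 'right_input' AS (id:chararray);
-- plain reduce-side inner join, spread across 25 reducers
C = JOIN A BY id, B BY id PARALLEL 25;
STORE C INTO 'intersect_output';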
c***z 发帖数: 6348 | |
l*******m 发帖数: 1096 | 5 100
【在 c***z 的大作中提到】 : cluster size?
|
c***z 发帖数: 6348 | 6 Then it is slow.
Can you post your code here? |
D**u 发帖数: 288 | 7 Have you tried a sort-merge join? In my experience it is at least the fastest
with Hive.
First sort the data sets, then do:
join A by $1, B by $1 using 'merge'; |
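In Pig that usually means producing sorted copies of both inputs first (in an earlier run or the same script) and then joining them with 'merge'; the sketch below assumes made-up paths, a join key named k, and inputs readable by the default loader:
-- pre-sort both relations on the join key and persist the sorted copies
A_raw = LOAD 'left_input' AS (k:chararray, v:chararray);
B_raw = LOAD 'right_input' AS (k:chararray, v:chararray);
A_sorted = ORDER A_raw BY k;
STORE A_sorted INTO 'left_sorted';
B_sorted = ORDER B_raw BY k;
STORE B_sorted INTO 'right_sorted';
-- load the sorted copies and join in a single map-side merge pass
A = LOAD 'left_sorted' AS (k:chararray, v:chararray);
B = LOAD 'right_sorted' AS (k:chararray, v:chararray);
C = JOIN A BY k, B BY k USING 'merge';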
r*****d 发帖数: 346 | 8
【在 l*******m 的大作中提到】 : What is the fastest method?
|
B*A 发帖数: 83 | 9 In a SQL database this would be a matter of a few seconds.
【在 l*******m 的大作中提到】 : Finished the run: intersecting 80M with 120M records took 100 minutes. Is that slow? I set parallel to 25.
|
l*******m 发帖数: 1096 | 10 That could be true if the server has enough RAM. In that case, I would
use a hash set directly, which is faster.
【在 B*A 的大作中提到】 : In a SQL database this would be a matter of a few seconds.
|
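Within Pig, the closest built-in analogue to an in-memory hash set is the fragment-replicate join, which loads the second relation into memory as a hash table on every map task; a minimal sketch, assuming made-up paths and that the smaller relation fits in memory:
big = LOAD 'big_input' AS (id:chararray);
small = LOAD 'small_input' AS (id:chararray);
-- 'replicated' ships the small relation to each map task and hash-joins in memory
C = JOIN big BY id, small BY id USING 'replicated';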
B*A 发帖数: 83 | 11 I just intersected a 730-million-record table (60 GB) with itself on my Oracle
database.
It took 54 seconds.
Most of the time, Big Data is not a solution for better performance; it is a
solution for less expensive software investments.
【在 l*******m 的大作中提到】 : That could be true if the server has enough RAM. In that case, I would : use a hash set directly, which is faster.
|
l*******m 发帖数: 1096 | 12 In my case the original data sets have no duplicates. I am curious about
Oracle's performance...
【在 B*A 的大作中提到】 : I just intersected a 730-million-record table (60 GB) with itself on my Oracle : database. : It took 54 seconds. : Most of the time, Big Data is not a solution for better performance; it is a : solution for less expensive software investments.
|
B*A 发帖数: 83 | 13 No duplicates here either.
【在 l*******m 的大作中提到】 : In my case the original data sets have no duplicates. I am curious about : Oracle's performance...
|