u*******r posts: 2855 | 1 The source file is a txt of roughly 5 GB. How can I split it into 100-200 MB txt files? My computer can't read a file that large, and the solutions I found online don't work. Thanks. |
S******y posts: 1123 | 2 There are several ways, depending on context and your environment:
1) use Python to read / process line by line (instead of reading everything
into memory upfront)
2) use Hadoop
3) use Revolution R |
u*******r posts: 2855 | 3 Thanks.
Right now I only know R. Is there any software that can do this fairly conveniently?
【Quoting S******y's post】 : There are several ways, depending on context and your environment : 1) use Python to read / process line by line (instead of reading everything into memory upfront) : 2) use Hadoop : 3) use Revolution R
|
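A minimal sketch of the "read line by line" idea in base R, since the original poster only knows R: the file is streamed through a connection in blocks of lines, and a new output file is started once roughly 150 MB has been written. The function name split_txt, the output prefix, and the chunk sizes below are illustrative, not anything from this thread.

# Split a large plain-text file into pieces of roughly max_bytes each,
# without ever holding the whole file in memory.
split_txt <- function(infile, out_prefix = "part_",
                      max_bytes = 150 * 1024^2, lines_per_read = 100000) {
  con <- file(infile, open = "r")
  on.exit(close(con))
  part <- 1L
  bytes_written <- 0
  out <- file(sprintf("%s%03d.txt", out_prefix, part), open = "w")
  repeat {
    chunk <- readLines(con, n = lines_per_read)
    if (length(chunk) == 0) break                      # end of the source file
    writeLines(chunk, out)
    # approximate bytes written: line bytes plus one newline per line
    bytes_written <- bytes_written + sum(nchar(chunk, type = "bytes")) + length(chunk)
    if (bytes_written >= max_bytes) {                  # start the next piece
      close(out)
      part <- part + 1L
      bytes_written <- 0
      out <- file(sprintf("%s%03d.txt", out_prefix, part), open = "w")
    }
  }
  close(out)
}

split_txt("yourfile.txt")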
l****i posts: 398 | 4 Use the fread function from the data.table package. I once read a dataset of more than 5 GB and it took only about 2 minutes 30 seconds, so I'm quite satisfied with data.table's read speed.
system.time(DT <- fread("201403-201406_with_tv_market.csv"))
Read 16221666 rows and 29 (of 29) columns from 5.380 GB file in 00:02:30
user system elapsed
137.17 3.48 149.70 |
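If fread really can load the whole 5 GB file on your machine, as the timing above suggests, another option is to skip streaming and simply write the table back out in pieces with fwrite. A rough sketch, assuming the data fits in RAM; the file names and rows_per_file are made up, so tune rows_per_file until each piece lands in the 100-200 MB range.

library(data.table)

DT <- fread("yourfile.txt")          # one-shot read, as in the timing above

rows_per_file <- 500000              # illustrative; adjust for ~100-200 MB per piece
groups <- ceiling(seq_len(nrow(DT)) / rows_per_file)
for (g in unique(groups)) {
  fwrite(DT[groups == g], sprintf("part_%03d.txt", as.integer(g)))
}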
g******2 posts: 234 | 5 Is your system Linux? If yes, use "split -b 100m yourfile.txt" |