What's the best way to convert text/csv file into PARQUET - DataSciences board

s****h
Posts: 3979

I have text/csv files that I want to upload to a Cloudera cluster and use
in Spark.
What's the best way to upload them and convert them to PARQUET format?
To load, use either the file manager in Hue or SFTP?
To convert, I can think of 3 ways:
A.
In Hive, create an external table on the original file,
then create a new external table in PARQUET format and populate it from the first?
B.
In Spark, use Scala code to convert? Conversion speed might be a concern.
https://developer.ibm.com/hadoop/blog/2015/12/03/parquet-for-sp
C.
Use Apache Drill? Has anyone installed Apache Drill on CDH before?
Conversion speed should be better. https://www.mapr.com/blog/how-convert-csv-file-apache-parquet-using-apache-drill
This requires installing Apache Drill first: https://drill.apache.org/docs/installing-drill-on-the-cluster/
With Sqoop it's much easier, since it has the "--as-parquetfile" option.
Thanks!
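
For reference, option A could look roughly like the sketch below. It only builds the HiveQL text, so the statements can be pasted into beeline or the Hue query editor. The table names, columns, and HDFS directories are hypothetical placeholders, and it assumes a plain comma-delimited file with no quoted fields and a Hive version with native Parquet support (0.13 or later).

```python
def build_hive_conversion(columns, csv_dir, parquet_dir,
                          csv_table="staging_csv", parquet_table="data_parquet"):
    """Return HiveQL statements that copy a CSV-backed table into Parquet.

    columns: list of (name, hive_type) pairs, e.g. [("id", "INT")].
    All table names and directories are hypothetical placeholders.
    """
    cols = ", ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return [
        # External table that reads the uploaded CSV files in place.
        f"CREATE EXTERNAL TABLE {csv_table} ({cols}) "
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' "
        f"STORED AS TEXTFILE LOCATION '{csv_dir}'",
        # Second external table with the same columns, stored as Parquet.
        f"CREATE EXTERNAL TABLE {parquet_table} ({cols}) "
        f"STORED AS PARQUET LOCATION '{parquet_dir}'",
        # Hive rewrites the rows as Parquet files while copying them over.
        f"INSERT OVERWRITE TABLE {parquet_table} SELECT * FROM {csv_table}",
    ]


if __name__ == "__main__":
    statements = build_hive_conversion(
        [("id", "INT"), ("name", "STRING")],
        csv_dir="/user/demo/csv_in",
        parquet_dir="/user/demo/parquet_out",
    )
    print(";\n".join(statements) + ";")
```

If the CSV has a header row or quoted fields, the first table would need extra handling (for example a CSV SerDe), which this sketch leaves out.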

c*******n
Posts: 679

Check out spark-csv:
https://github.com/databricks/spark-csv
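
A minimal PySpark sketch of that route, in case it helps. On Spark 1.x the spark-csv package is added with --packages and selected via format("com.databricks.spark.csv"); from Spark 2.0 on, the same reader is built in as spark.read.csv, which is what this sketch assumes. The function name, the header/inferSchema settings, and the overwrite mode are illustrative choices, not something from the thread.

```python
import sys


def csv_to_parquet(spark, src, dst):
    """Read the CSV file(s) at `src` and write them to `dst` as Parquet."""
    # header=True takes column names from the first line;
    # inferSchema=True makes Spark scan the data to guess column types.
    df = spark.read.csv(src, header=True, inferSchema=True)
    # Overwrite any previous output so the job can be re-run safely.
    df.write.mode("overwrite").parquet(dst)
    return df


# Only runs when called with exactly two arguments: <src> <dst>.
if __name__ == "__main__" and len(sys.argv) == 3:
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("csv-to-parquet").getOrCreate()
    csv_to_parquet(spark, sys.argv[1], sys.argv[2])
    spark.stop()
```

Schema inference costs an extra pass over the input, so for large files passing an explicit schema to the reader is usually the faster choice.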
