Pig word count - DataSciences board - Weiming (未名) archive



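c***z (posts: 6348), #1:
Got asked several times in interviews.

    -- read each line of the file as a single chararray field
    lines = LOAD 'sample.txt' AS (line:chararray);
    -- split every line into tokens and emit one row per token
    words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
    -- collect identical words into one group
    grouped = GROUP words BY word;
    -- count the rows in each group
    wordcount = FOREACH grouped GENERATE group, COUNT(words);
    DUMP wordcount;
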
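
For readers without a Pig installation handy, the dataflow in the script above can be traced in plain Python. This is only a sketch of the semantics, not of how Pig executes anything: the function name `pig_style_wordcount` and the sample lines are made up for illustration, and whitespace `str.split()` stands in for TOKENIZE (which, as I understand it, also splits on a few punctuation characters by default).

```python
from collections import defaultdict


def pig_style_wordcount(lines):
    """Mirror the Pig script step by step on an in-memory list of lines."""
    # words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
    # TOKENIZE gives a bag of tokens per line; FLATTEN turns it into one row per token.
    words = [word for line in lines for word in line.split()]

    # grouped = GROUP words BY word;
    # Each group key maps to the bag of rows that share that key.
    grouped = defaultdict(list)
    for word in words:
        grouped[word].append(word)

    # wordcount = FOREACH grouped GENERATE group, COUNT(words);
    return {group: len(bag) for group, bag in grouped.items()}


if __name__ == "__main__":
    # Hypothetical stand-in for the contents of sample.txt
    sample = ["the quick brown fox", "the lazy dog", "the end"]
    for word, count in sorted(pig_style_wordcount(sample).items()):
        print(word, count)
```

The intermediate `grouped` dictionary is kept on purpose: it corresponds to the relation that GROUP produces, where each key carries the full bag of matching rows and COUNT simply measures the bag.
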
B*****g (posts: 34098), #2:
[In reply to c***z]

    -- Hive queries for Word Count
    drop table if exists doc;

    -- 1) create table to load whole file
    create table doc(
      text string
    ) row format delimited fields terminated by '\n' stored as textfile;

    -- 2) load the plain text file
    -- if the file is .csv, replace '\n' with ',' in step 1 (creation of the doc table)
    load data local inpath '/home/trendwise/Documents/sentiment/doc_data/wikipedia' overwrite into table doc;

    -- Trick-1
    -- 3) word count in a single query
    SELECT word, COUNT(*)
    FROM doc LATERAL VIEW explode(split(text, ' ')) lTable as word
    GROUP BY word;

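
The single-query trick in the Hive post relies on `LATERAL VIEW explode(...)` turning each array element into its own row before the `GROUP BY` runs. A rough Python sketch of that row expansion follows; `explode_split` and `word_count` are hypothetical helper names, the sample rows are invented, and the separator is treated as a literal string here, whereas Hive's `split` takes a regular expression.

```python
from collections import Counter


def explode_split(rows, sep=" "):
    """Yield one output row per array element, like LATERAL VIEW explode(split(text, sep))."""
    for text in rows:
        for word in text.split(sep):
            yield word


def word_count(rows):
    """Rough equivalent of SELECT word, COUNT(*) ... GROUP BY word over the exploded rows."""
    return Counter(explode_split(rows))


if __name__ == "__main__":
    # Invented sample rows; the second one contains a double space
    doc = ["to be or not to be", "to  err"]
    for word, n in sorted(word_count(doc).items()):
        print(repr(word), n)
```

Splitting on a single space produces an empty-string token wherever two spaces are adjacent, so the empty string appears as a "word" in the result. Filtering out empty tokens, or splitting on a whitespace pattern instead, avoids that.
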
l******n (posts: 9344), #3:
[In reply to B*****g]
Fewer and fewer people use Pig these days; Hive and Impala have become the mainstream.

B*****g (posts: 34098), #4:
[In reply to l******n]
SQL wins, haha.

c***z (posts: 6348), #5:
damn, I am loving Pig

c***z (posts: 6348), #6:
OK, Scala version:

    val countTable = myText.split("\\W+").groupBy(identity).mapValues(_.length)

PS: split(" ") would work for interview purposes. Note that there are two backslashes before the W.
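
The PS in the Scala post is about tokenization: a regex split on non-word characters drops punctuation, while a plain space split leaves it attached to the words. A small Python sketch of the difference, assuming the text is an ordinary in-memory string; `count_words` and the sample sentence are illustrative only, and the empty-token filter is an addition of this sketch (Python's `re.split` returns empty strings at the edges when the text starts or ends with a separator).

```python
import re
from collections import Counter


def count_words(text, pattern=r"\W+"):
    """Split on a regex, then group identical tokens and count them."""
    tokens = [t for t in re.split(pattern, text) if t]  # drop empty edge tokens
    return Counter(tokens)


if __name__ == "__main__":
    my_text = "Pig, Hive; Pig or Hive? Pig!"  # invented sample sentence
    print(count_words(my_text))        # split on runs of non-word characters
    print(count_words(my_text, r" "))  # plain space split keeps punctuation
```

On this sample the regex split counts Pig three times, while the space split treats `Pig,`, `Pig`, and `Pig!` as three different words.
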
