DataSciences board - Pig word count
c***z
Posts: 6348
1
Got asked several times in interviews.
lines = LOAD 'sample.txt' AS (line:chararray);
words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) as word;
grouped = GROUP words BY word;
wordcount = FOREACH grouped GENERATE group, COUNT(words);
DUMP wordcount;
B*****g
Posts: 34098
2
-- Hive queries for Word Count
drop table if exists doc;
-- 1) create table to load whole file
create table doc(
text string
) row format delimited fields terminated by '\n' stored as textfile;
--2) loads plain text file
--if file is .csv then replace '\n' by ',' in step no 1 (creation of doc table)
load data local inpath '/home/trendwise/Documents/sentiment/doc_data/wikipedia' overwrite into table doc;
-- Trick-1
-- 3) wordCount in single line
SELECT word, COUNT(*) FROM doc LATERAL VIEW explode(split(text, ' ')) lTable as word GROUP BY word;

[c***z wrote:]
: Got asked several times in interviews.
: lines = LOAD 'sample.txt' AS (line:chararray);
: words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) as word;
: grouped = GROUP words BY word;
: wordcount = FOREACH grouped GENERATE group, COUNT(words);
: DUMP wordcount;

l******n
Posts: 9344
3
Fewer and fewer people use Pig these days; Hive and Impala have become mainstream.

[B*****g wrote:]
: -- Hive queries for Word Count
: drop table if exists doc;
: -- 1) create table to load whole file
: create table doc(
: text string
: ) row format delimited fields terminated by '\n' stored as textfile;
: --2) loads plain text file
: --if file is .csv then replace '\n' by ',' in step no 1 (creation of doc table)
: load data local inpath '/home/trendwise/Documents/sentiment/doc_data/

B*****g
Posts: 34098
4
SQL wins, haha

[l******n wrote:]
: Fewer and fewer people use Pig these days; Hive and Impala have become mainstream.
c***z
Posts: 6348
5
damn, I am loving Pig
c***z
Posts: 6348
6
OK, Scala version:
val countTable = myText.split("\\W+").groupBy(identity).mapValues(_.length)
PS: split(" ") would work for interview purposes; also note the two backslashes before W.
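
For comparison, here is a rough Python sketch of the same count, showing how splitting on a single space differs from splitting on non-word characters. The helper name `word_count` and the sample sentence are made up for illustration. Unlike the Scala one-liner, this version drops every empty token.

```python
import re
from collections import Counter


def word_count(text, pattern=r"\W+"):
    """Split text on a regex, drop empty tokens, and count each word.

    `word_count` is a hypothetical helper, not part of any library.
    """
    tokens = [t for t in re.split(pattern, text) if t]
    return Counter(tokens)


# Made-up sample input with punctuation attached to some words.
sample = "the cat, the dog. The end"

# Splitting on runs of non-word characters strips the punctuation.
by_non_word = word_count(sample)

# Splitting on a single space keeps punctuation glued to the word.
by_space = word_count(sample, pattern=" ")

print(by_non_word)
print(by_space)
```

With the space split, "cat," and "dog." are counted as their own keys, which is usually fine on a whiteboard but not on real text. Neither variant lowercases, so "the" and "The" stay separate.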