c***z Posts: 6348 | 1 Got asked this several times in interviews.
lines = LOAD 'sample.txt' AS (line:chararray);
words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) as word;
grouped = GROUP words BY word;
wordcount = FOREACH grouped GENERATE group, COUNT(words);
DUMP wordcount; |
B*****g Posts: 34098 | 2 -- Hive queries for Word Count
drop table if exists doc;
-- 1) create table to load whole file
create table doc(
text string
) row format delimited fields terminated by '\n' stored as textfile;
--2) loads plain text file
-- if the file is a .csv, replace '\n' with ',' in step 1 (creation of the doc table)
load data local inpath '/home/trendwise/Documents/sentiment/doc_data/wikipedia' overwrite into table doc;
-- Trick-1
-- 3) wordCount in single line
SELECT word, COUNT(*) FROM doc LATERAL VIEW explode(split(text, ' ')) lTable
as word GROUP BY word;
[Quoting c***z's post] : Got asked several times in interviews. : lines = LOAD 'sample.txt' AS (line:chararray); : words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) as word; : grouped = GROUP words BY word; : wordcount = FOREACH grouped GENERATE group, COUNT(words); : DUMP wordcount;
|
l******n Posts: 9344 | 3 Fewer and fewer people use Pig these days; Hive and Impala have become mainstream.
[Quoting B*****g's post] : -- Hive queries for Word Count : drop table if exists doc; : -- 1) create table to load whole file : create table doc( : text string : ) row format delimited fields terminated by '\n' stored as textfile; : --2) loads plain text file : --if file is .csv then in replace '\n' by ',' in step no 1 (creation of doc : table) : load data local inpath '/home/trendwise/Documents/sentiment/doc_data/
|
B*****g Posts: 34098 | 4 SQL always wins, haha.
[Quoting l******n's post] : Fewer and fewer people use Pig these days; Hive and Impala have become mainstream.
|
c***z Posts: 6348 | 6 OK, Scala version:
val countTable = myText.split("\\W+").groupBy(identity).mapValues(_.length)
PS: split(" ") would work for interview purposes; also, note that the regex needs two backslashes before W, since it sits in an ordinary Scala string literal. |
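For anyone comparing the two split patterns mentioned in the PS, here is a rough Python sketch of the same split, group, count idea. It is not from any of the posts above: `word_count`, `sample`, and the choice to drop empty tokens are assumptions made purely for illustration.

```python
import re
from collections import Counter


def word_count(text: str, pattern: str = r"\W+") -> Counter:
    """Split text on a regex and count each resulting token.

    Hypothetical helper for illustration only. Dropping empty tokens
    is a choice made here, not something the posts above specify.
    """
    tokens = [t for t in re.split(pattern, text) if t]
    return Counter(tokens)


# Made-up sample line with punctuation attached to a word.
sample = "the cat, the hat"

# Split on runs of non-word characters: punctuation is stripped.
by_nonword = word_count(sample)

# Split on single spaces only: "cat," keeps its trailing comma.
by_space = word_count(sample, " ")

print(by_nonword)
print(by_space)
```

Splitting on a single space is shorter to write on a whiteboard, but it counts "cat," and "cat" as different words, which is probably the reason for preferring the non-word pattern outside of an interview.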