Programming board - [Pig Programming] Pig Latin join problem (repost)
c***z
Posts: 6348
1
[The following text is reposted from the DataSciences board]
From: chaoz (Facing the sea, eating a bowl of liangpi), Board: DataSciences
Subject: [Pig Programming] Pig Latin join problem
Posted from: BBS 未名空间站 (Tue Jan 28 20:00:22 2014, US Eastern)
Hi all,
Just wondering if any of you had the same problem and if you know the cause.
I have a dataset of site-visitor pairs that records daily visits to
websites.
When I filter the data on the domain "espn.com", I get 306 unique
visitors; when I instead join the data with the list of domain names, I get
only 176 unique visitors to "espn.com".
This is weird, since conceptually filtering and joining both use hash tables
the same way.
PS: Pig didn't drop any bags during the join, or at least it didn't report
dropping any.
Thanks a lot!
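
One way to narrow down a gap like this is to diff the two visitor sets and look at the raw join keys of the rows that went missing. The sketch below does that in plain Python rather than Pig Latin, on made-up toy data. The names `visits`, `domains`, `normalize`, `visitors_by_filter`, `visitors_by_join` and `dropped_keys` are all hypothetical, and the toy data assumes, purely for illustration, that some site keys differ by whitespace or case. The post does not say what the actual cause was.

```python
# Toy (site, visitor) pairs; entirely hypothetical, not the poster's data.
visits = [
    ("espn.com", "v1"),
    ("espn.com", "v2"),
    ("espn.com ", "v3"),  # trailing space in the site key
    ("ESPN.com", "v4"),   # different letter case
    ("cnn.com", "v5"),
]
domains = ["espn.com", "cnn.com"]  # hypothetical domain list used in the join


def normalize(site):
    """Strip surrounding whitespace and lower-case a site key."""
    return site.strip().lower()


def visitors_by_filter(rows, domain):
    """Unique visitors whose normalized site equals `domain` (lenient match)."""
    return {visitor for site, visitor in rows if normalize(site) == domain}


def visitors_by_join(rows, domain_list, domain):
    """Unique visitors found by an exact-match inner join on the raw site key."""
    domain_set = set(domain_list)
    return {
        visitor
        for site, visitor in rows
        if site in domain_set and site == domain
    }


def dropped_keys(rows, missing_visitors):
    """repr() of the raw site keys belonging to visitors the join lost."""
    return sorted({repr(site) for site, visitor in rows if visitor in missing_visitors})


filtered = visitors_by_filter(visits, "espn.com")
joined = visitors_by_join(visits, domains, "espn.com")
missing = filtered - joined

print(len(filtered), len(joined))
print(dropped_keys(visits, missing))
```

Printing the keys with `repr()` makes hidden whitespace and case differences visible. If the dropped rows turn out to have keys identical to the ones that survived, key mismatch can be ruled out and the cause lies elsewhere.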