Page 10 - Roundup of discussions about queries - 话题女王

All topics - Topic: queries

L*****a
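Posts: 3080

From thread: SanFrancisco board - Female data analyst in the Bay Area looking to switch to software engineering, sincerely asking for advice!

I worked as an IT Consultant at a bank. My impression is that a lead data analyst does much
the same job as a Business Analyst: not the IT-facing kind of BA, but one who knows the banking
business inside out, does modeling and charting on large volumes of financial data, and is very
fluent in Excel. I think this kind of work depends on industry expertise and has little to do
with SQL queries or other coding skills, although those skills are a nice extra on a resume.
When they need bulk data and have no data access, they can call IT and ask for it. Some
companies' IT departments have built query tools that let these BAs drag and drop ad hoc
queries themselves. My point is that SQL queries are not an essential BA skill. On the other
hand, I think SQL queries are really nothing special: anyone who has taken a few classes or
taught themselves, and who has data access, can use them. At my current company, people in
sales, supply chain, accounting and other departments know SQL, although when the logic is
complex and many tables are involved, those departments still turn to IT for help.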

t******5
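Posts: 12

From thread: SanFrancisco board - San Mateo: 98% new IKEA furniture, moving sale at low prices!

Hi everyone, I bought all of these at IKEA in September and have used them for only 3 months. They really are very new!
For personal reasons I have to move back to China in a hurry, so everything must go, at least 50% off the original price!
Located in San Mateo; anyone interested is welcome to come and take a look.
Contact:
Jenny 740 591 8254
EMAIL: j****[email protected]
QUEEN SIZE bed frame + slats + mattress $200 (Original $589)
http://www.ikea.com/us/en/catalog/products/60206890/?query=6020
http://www.ikea.com/us/en/catalog/products/S99904900/?query=NYV
KARLSTAD green sofa $225 (Original $550)
Width: 80 3/4 " ＊ Depth: 36 5/8 " ＊ Height: 31 1/2 "
Seat depth: 22 " ＊Seat height: 17 3/4 "
http://www.ikea.com/us/en/catalog/products/S09840535/... (read full post)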

m********u
Posts: 3942

From thread: SanFrancisco board - [Job opening] Principal Big Data Platform Engineer -- CA (repost)

[Reposted from the JobHunting board]
From: missingyou (miss), board: JobHunting
Subject: [Job opening] Principal Big Data Platform Engineer -- CA
Posted from: BBS 未名空间站 (Wed Nov 9 13:38:11 2016, US Eastern)
The company is a leading global provider of information and communications technology (ICT)
solutions. Generous pay; the position is located in Santa Clara, CA.
If interested, send a site message or an email: [email protected]
Responsibilities:
• Design a comprehensive metadata support for the platform that
supports various data governance and security capabilities.
• Enhance the distributed query engine for various use cases,
especially in... (read full post)

d********f
Posts: 43471

From thread: Joke board - More American long johns on display

Climatesmart Men's Seamed pants - $6.20 http://www.bonton.com/shop/men/underwear/thermal-underwear/climatesmart-men-s-seamed-pants_414671.html?query=climatesmart
Climatesmart Men's Midweight Long Pants - $6.20 http://www.bonton.com/shop/men/underwear/thermal-underwear/climatesmart-men-s-midweight-long-pants_330970.html?query=climatesmart
Climatesmart Men's Thermal Long Pants - $5.40 http://www.bonton.com/shop/men/underwear/thermal-underwear/climatesmart-men-s-thermal-long-pants_330959.html?query... (read full post)

a***y
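Posts: 19743

From thread: Apple board - [Collection] Stupid people cannot be educated

☆─────────────────────────────────────☆
Corinthian (Diogenes门下一走狗) wrote on (Wed May 5 17:04:59 2010, US Eastern):
In 1998, with a 64 MB MP3 player, managing files directly by directory was fine and nobody would have objected.
By 2008 MP3 players were storing thousands or even tens of thousands of songs, and still insisting on
directory-based management is no longer performance art; it is a direct sign of an IQ below 75.
It is just like a company with hundreds of thousands of employees insisting on keeping its books with pen and paper.
You can say it is a personal choice. Fine, insisting on eating shit really is your personal choice, but you cannot
urge others to eat it with you, let alone ask toilet manufacturers to change their design so that you can more easily lie down and eat it.
☆─────────────────────────────────────☆
meeweek (Meeweek 米粥) wrote on (Wed May 5 17:14:40 2010, US Eastern):
Forcefully cutting in.
☆─────────────────────────────────────☆
shadandan (我比小强惨) wrote on (Wed May 5 17:15:26 2010,... (read full post)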

c*****t
Posts: 1879

From thread: Database board - Question 2: distributed database question?

I am writing a P2P application which involves database queries
of peers' data.
Say, P1 (peer 1) has the following rows in table "serve"
abc1
abc2
ddd1
...
...
hundreds if not thousands of entries.
Peer 2 (P2) performs a broadcast query to all peers in a group
and expects to obtain the results. I used a dumb query like:
SELECT * FROM serve
and P1 would return the entire table to P2. However, later
P2 does the query again, since P2 is interested in updates

m*****y
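Posts: 229

From thread: Database board - A DBA who scowls all day has gotten me down

The DBA, apparently of Vietnamese descent, pulls a long face every day and talks to people as if
everyone owed him money, and he keeps reminding us not to run expensive queries in production.
But our work requires it; we have tasks to finish. Yesterday, before running something on table A,
I even asked him whether that time slot was OK. He said anything except B, C and D was fine, so I
ran it. Then he sent me an instant message asking for my query, saying my statement was badly
structured and he would optimize it. I said sure and sent it over. He sent back a version with a
modified WHERE clause, and I started running his statement. Less than half a second later he
walked over and shouted 'stop', just that one word, then turned around and left. Damn it, I was running his query.
Frustration aside, starting today I am going to study query optimization properly. Could the experts
here also explain how to avoid running things in production? Use a data warehouse? (I'm not asking on
the company's behalf; the two newly arrived DBAs suggested optimizing this and that, and he acted as if he hadn't heard.)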

F****n
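Posts: 3271

From thread: Database board - Why does an RDBMS use only one index? (repost)

[Reposted from the Java board]
From: Foxman (今狐冲), board: Java
Subject: Why does an RDBMS use only one index?
Posted from: BBS 未名空间站 (Sat Mar 28 15:46:49 2015, US Eastern)
Not long ago I had a project comparing the search performance of Lucene with that of the major
RDBMSs, because my boss wanted to know how well an RDBMS would do if used like NoSQL (that is,
sticking to denormalized table design). After some study, I found that, other conditions being
equal, single-column search performs about the same in both, but queries over multiple
fields/columns are much slower in the RDBMS
(e.g., select * from users where last_name='xxx' and email='yyy')
Digging further, I found that for a given table in a given query the RDBMS consistently used only
one index, even when every column was indexed! At first I assumed this was the query planner's
choice based on selectivity, but later I found that is not the case at all:
1. For almost all quer... (read full post)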

c******n
Posts: 4965

From thread: Java board - EHCache --- hibernate question

I enabled ehcache as the second-level cache.
From TRACE level logging, I can see that hibernate/ehcache did try to place
the query result/load result into the cache, but every time I query with the same
parameter (i.e. there is only one unique query), so it says
"caching xxx ..."
"xxx is already cached",
this is fine.
But it seems that after trying to read from the cache only once, it
never tries to use the cache again.
I can see on the first read, it logs:
2011-12-20 15:29:53,344 95419 INFO [org.hiber... (read full post)

g*****g
Posts: 34805

From thread: Java board - A question for Zhaoce

There are a couple of approaches you can consider.
1. Form a cluster for your RDBMS, have a read-only node and let your queries
go through the read-only node. You offload the load from the main DB
immediately and your long queries no longer contend for its resources.
2. Cache the join result if some queries/parameters are being used with high
frequency.
3. If queries are ad-hoc, use Solr or Elasticsearch for your queries.
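
Approach 2 could look roughly like the sketch below. It assumes the join is wrapped in a function (`run_join` is a hypothetical stand-in for the real database call, not anything from the post) and that slightly stale results are acceptable while an entry stays cached.

```python
from functools import lru_cache

CALLS = {"count": 0}  # how many times the "database" was actually hit


def run_join(customer_id: int) -> tuple:
    """Hypothetical stand-in for an expensive join against the main DB."""
    CALLS["count"] += 1
    return (customer_id, f"orders-for-{customer_id}")


@lru_cache(maxsize=1024)
def cached_join(customer_id: int) -> tuple:
    # Repeated calls with the same parameter are answered from memory.
    return run_join(customer_id)


if __name__ == "__main__":
    for cid in (7, 7, 7, 8):
        cached_join(cid)
    print(CALLS["count"])            # one database hit per distinct parameter
    print(cached_join.cache_info())
```

An in-process cache like this never expires on its own, so a real setup would also need an invalidation or time-to-live policy.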

F****n
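Posts: 3271

From thread: Java board - Why does an RDBMS use only one index?

Not long ago I had a project comparing the search performance of Lucene with that of the major
RDBMSs, because my boss wanted to know how well an RDBMS would do if used like NoSQL (that is,
sticking to denormalized table design). After some study, I found that, other conditions being
equal, single-column search performs about the same in both, but queries over multiple
fields/columns are much slower in the RDBMS
(e.g., select * from users where last_name='xxx' and email='yyy')
Digging further, I found that for a given table in a given query the RDBMS consistently used only
one index, even when every column was indexed! At first I assumed this was the query planner's
choice based on selectivity, but later I found that is not the case at all:
1. For almost all queries, the query planner picks just one index, with no real optimization to
speak of: in a simple case like the one above, it will not use the second index even when doing so
would be thousands of times faster. Lucene, by contrast, always uses all the indexes and then does
a sort-and-merge. Pe... (read full post)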
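
One common workaround for the two-column case, which the excerpt itself does not propose, is a single composite index covering both predicate columns. The sketch below uses SQLite from Python's standard library only because it is easy to run. The table contents and the index name `idx_users_name_email` are made up, and other engines plan queries differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, last_name TEXT, email TEXT)"
)
conn.executemany(
    "INSERT INTO users (last_name, email) VALUES (?, ?)",
    [(f"name{i % 50}", f"user{i}@example.com") for i in range(1000)],
)
# One index over both columns used in the WHERE clause (hypothetical name).
conn.execute("CREATE INDEX idx_users_name_email ON users (last_name, email)")

QUERY = "SELECT * FROM users WHERE last_name = ? AND email = ?"


def plan(sql: str, params: tuple) -> str:
    """Return SQLite's query-plan description for the statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text


if __name__ == "__main__":
    print(plan(QUERY, ("name7", "user7@example.com")))
```

The printed plan names the index the planner chose, which is a quick way to check whether both equality conditions are served by one index lookup.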

p*****2
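Posts: 21240

From thread: Programming board - The Spring Festival travel rush (春运) can be handled easily with Storm

No need to explain how much data Storm can handle, right? If you are not convinced, look at Twitter's traffic.
How Storm works is also simple; if you are unfamiliar with it, watch Nathan's video.
Storm supports both streams and RPC, and this system can use Storm's RPC. The interface is simple too: query and book.
query and book requests enter through a spout and are then routed to different bolts according to a
hash of the train number. In other words, query and book requests for the same train number are sent
to the same bolt. The bolt keeps the train's information in memory. query needs no explanation: it
just returns the train's information. The key is book. On a book request the bolt marks the booking,
first come first served, and later arrivals get a failure. It then sends the transaction on to the
next layer of bolts. The next bolt charges the bank: on success it acks the request, on failure it
fails the request, and the upstream bolt unmarks the booking and returns a failure.
The topology has three layers:
Layer 1: a spout that accepts query and book
Middle layer: bolts that hold train information and respond to queries
Last layer: handles transactions
Logs are stored in Cassandra.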
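
The routing and mark/unmark logic described above can be imitated in plain Python. This is only a single-process sketch of the idea and does not use the Storm API; `TrainBolt`, `route`, `NUM_BOLTS` and the `charge` callback are hypothetical names introduced for illustration.

```python
import zlib

NUM_BOLTS = 4  # assumed number of middle-layer workers


class TrainBolt:
    """Middle layer: keeps seat counts in memory for the trains routed to it."""

    def __init__(self):
        self.seats = {}  # train number -> seats not yet marked

    def query(self, train: str) -> int:
        return self.seats.get(train, 0)

    def book(self, train: str, charge) -> bool:
        if self.seats.get(train, 0) <= 0:
            return False              # arrived too late
        self.seats[train] -= 1        # mark a seat
        if charge():                  # last layer: charge the bank
            return True               # payment succeeded, booking stands
        self.seats[train] += 1        # payment failed, unmark the seat
        return False


bolts = [TrainBolt() for _ in range(NUM_BOLTS)]


def route(train: str) -> TrainBolt:
    # A stable hash sends the same train number to the same bolt every time.
    return bolts[zlib.crc32(train.encode()) % NUM_BOLTS]


if __name__ == "__main__":
    route("G101").seats["G101"] = 1
    print(route("G101").book("G101", charge=lambda: True))  # first booking
    print(route("G101").book("G101", charge=lambda: True))  # no seat left
```

Because every request for one train lands on one worker, the seat count needs no cross-worker locking in this sketch. What happens when a worker fails is left out entirely.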

L***s
Posts: 1148

From thread: Programming board - A Python question: how to implement this function

Just solve it by simple, plain brute force: code as bland as plain water, guaranteed correct.
Worry about efficiency and the like when it is really needed.
In [15]: def query(df, q):
    ...:     result = []
    ...:     for tup, string in df.items():
    ...:         for i, numset in q.items():
    ...:             if tup[i] not in numset:
    ...:                 break
    ...:         else:  # i.e., no break
    ...:             result.append(string)
    ...:     return result
    ...:
In [16]: df = {(1,2,3):'a', (2,3,4):'b', (1,3,4):'c', (2,4,3):'d'}
In [20]: query(df, { 0:{1}, 1:{2}, })
O... (read full post)

s*****t
Posts: 119

From thread: Programming board - A Hive question about efficiency

The two Hive queries below do the same thing. Which one is more efficient? Assume the
MapReduce framework.
Query 1:
select id, count(distinct value) values
from table1
group by id;
Query 2:
select a.id, sum(1) values
from
(select distinct id, value
from table1
)a
group by a.id
;
Also, is there a book or video that covers Hive query efficiency?

q******n
Posts: 66

From thread: Programming board - Help: node.js memory leak

If I comment out line 88, the problem goes away. Experts, is there something wrong with handle_database?
{code}
1 var express = require('express');
2 var app = express();
3 var mysql = require('mysql');
4 var pool = mysql.createPool({
5 connectionLimit : 100, //important
6 host : 'localhost',
7 user : 'root',
8 password : '',
9 database : 'callback',
10 debug : false
11 });
12
13 function handle_database(req,res) {
1... (read full post)

c*********u
Posts: 607

From thread: Statistics board - A question about EXISTS in SQL

The code is as follows:
create table nt (x int, y int);
insert into nt values (10, 10);
insert into nt values (10, 20);
insert into nt values (20, 10);
insert into nt values (30, 40);
insert into nt values (30, 50);
insert into nt values (30, 60);
insert into nt values (40, 70);
select * FROM nt WHERE exists
(SELECT t.* FROM nt t WHERE nt.x = t.x AND nt.y > t.y) ;
select * FROM nt WHERE exists
(SELECT nt.* FROM nt t, nt nt WHERE nt.x = t.x AND nt.y > t.y) ;
The results of running the SQL online are at this link:
http://ideone.com/3CtqN9
The result of the first query is... (read full post)
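
The two statements can be reproduced with SQLite from Python's standard library. This is a sketch of how name scoping plays out in SQLite, not a record of what the poster saw on ideone. In the first query `nt` inside the subquery refers to the outer row. In the second, the inner alias `nt` hides the outer table, so the subquery no longer depends on the outer row at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nt (x INT, y INT)")
conn.executemany(
    "INSERT INTO nt VALUES (?, ?)",
    [(10, 10), (10, 20), (20, 10), (30, 40), (30, 50), (30, 60), (40, 70)],
)

# Correlated: true only for rows that have a same-x row with a smaller y.
correlated = conn.execute(
    "SELECT * FROM nt WHERE EXISTS "
    "(SELECT t.* FROM nt t WHERE nt.x = t.x AND nt.y > t.y) "
    "ORDER BY x, y"
).fetchall()

# Uncorrelated: the inner alias nt shadows the outer table, so the subquery
# is the same non-empty self-join for every outer row and EXISTS is always true.
uncorrelated = conn.execute(
    "SELECT * FROM nt WHERE EXISTS "
    "(SELECT nt.* FROM nt t, nt nt WHERE nt.x = t.x AND nt.y > t.y) "
    "ORDER BY x, y"
).fetchall()

if __name__ == "__main__":
    print(correlated)    # [(10, 20), (30, 50), (30, 60)]
    print(uncorrelated)  # all seven rows
```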

a*****c
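Posts: 2086

From thread: Military board - Singles' Day (双十一) is coming again; let's see how Taobao performs on large-scale data this time

Some people have never actually worked on a project or taken a deep part in its development, yet
they comment based on assumptions. Only when you really do the work do you learn how many problems
have to be considered and solved. Handling a huge transaction volume in a short time without going
down, without users noticing any slowdown, and without conflicts between purchase history and update
records: what technology behind the scenes can support that? Let me post a primer on Taobao's technology.
[Editor's note] For Taobao, the 2012 "双十一" was a transaction milestone and a shopping carnival.
On this "神棍节", Taobao set a record of 19.1 billion yuan in transactions. What complex technology
lies behind those transactions?
You realize the New Year is approaching and want to buy your girlfriend a sweater, so you open
www.taobao.com. Your browser first queries a DNS server to translate www.taobao.com into an IP
address. You will find, though, that the resulting IP address is likely to differ depending on your
region or network (China Telecom, China Unicom, China Mobile). This is the first step of load
balancing: when DNS resolves the domain name, your visit is assigned to one of several entry points,
while ensuring as far as possible that the entry point you reach is one of the faster ones (this is
different from the CDN discussed later).
Through this entry point you successfully reach the actual entry IP of www.taobao.com... (read full post)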

c*****e
Posts: 215

From thread: USANews board - Suggestion: contact each state's voting database administrators on LinkedIn, the last line of defense against cheating

Thanks to Cheesecat (Cheese) for the link:
https://action.trump2016.com/survey/trump-vs-hillary-approval-poll/?utm_medium=email&utm_campaign=ELC_elections-2016_trump-vs-hillary-approval-poll&utm_content=101316-tracking-poll-7-inh-jfc-a-a-hf-e&utm_source=e_a-a
I have submitted the following comment:
*****Last chance to prevent fraud voting*****
All voting data at each voting location in each state will be
collected together in a centralized database. If we can ensure the voting
database administrators, DBAs, database architects, ... (read full post)

c*****e
Posts: 215

From thread: USANews board - Trump has put out a survey; everyone please send him plenty of suggestions

I have submitted the following comment:
*****Last chance to prevent fraud voting*****
All voting data at each voting location in each state will be
collected together in a centralized database. If we can ensure the voting
database administrators, DBAs, database architects, or database operators to
validate the entire database and query out any invalid votes, then all the
cheating efforts will be wasted.
For example, if someone uses a dead person to register for vote, as
long as the voting database is linked to th... (read full post)

c******n
Posts: 5697

From thread: Automobile board - Among sedans with over 200,000 miles, are there more American cars or Japanese cars?

Let big data do the talking:
https://losangeles.craigslist.org/search/cta?query=toyota&min_auto_miles=200000&auto_bodytype=8
https://losangeles.craigslist.org/search/cta?query=ford&min_auto_miles=200000&auto_bodytype=8
https://losangeles.craigslist.org/search/cta?query=honda&min_auto_miles=200000&auto_bodytype=8
https://losangeles.craigslist.org/search/cta?query=chevrolet&min_auto_miles=200000&auto_bodytype=8
150 Toyotas, 102 Hondas, 18 Chevrolets, 6 Fords

b***e
Posts: 39

From thread: JobHunting board - Job Openings: Data Warehouse Architect/ETL/Business Objects Architect/Developer

Our company is a global Fortune 100 company located in the Midwest with several IT
positions available. These positions are not entry-level positions. Please
email me with your resume or any questions at dyadmstaffing@gmail.com
if you are interested in any of them. These are exempt full-time
positions with competitive pay. We will offer relocation, a complete benefit
package, including 401K/ESOP, pension, health, life and dental insurance.
Data Warehouse Architect
We are looking for a hard cor... (read full post)

m******p
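Posts: 5393

From thread: JobHunting board - 包子 on offer: a problem from Wall Street, advice wanted

I studied circuits and switched mid-career to looking for coding jobs. In the second round with a
Wall Street firm, they had me do a written test right away: ten pages, one problem per page.
I seemed to have seen this one before, but I flipped through CLRS without finding a solution, the
deadline arrived, and I had to submit online. I can't let it go, though; I figure this problem is
the one that sank me.
You are all experts; if you have the patience, please give me some pointers. 包子 will be awarded.
Given a communication network, n nodes are linked to each other by
wireless links (meaning two nodes that are within some distance, d0 of
each other can communicate with each other thus forming a link).
A centralized controller wishes to learn the quality of all the links in
the network. This can be done by querying any node or any set of nodes
and the corresponding nodes ... (read full post)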

n******n
Posts: 567

From thread: JobHunting board - Yelp interview experience + problem discussion

class Tree {
    Node head;
    class Node {
        urlPair[] topKlist;  // top-k list for the interval [start, end]
        Time start;
        Time end;
        Time pivot;
        Node left;
        Node right;
        // Pseudocode: Time values are compared with <, <= and >= for brevity.
        public urlPair[] query(Time qstart, Time qend) {
            if (qstart <= start && qend >= end)
                return this.topKlist;
            // Named leftResult/rightResult so they do not shadow the child nodes.
            urlPair[] leftResult = new urlPair[k];
            urlPair[] rightResult = new urlPair[k];
            if (qstart < pivot)
                leftResult = left.query(qstart, pivot);
            if (qend > pivot)
                rightResult = right.query(pivot, qend);
            return merge(leftResult, rightResult);
        }
        public void add(Time start, Time end, urlPair[] in) {
            urlPai... (read full post)

K*********n
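Posts: 2852

From thread: JobHunting board - Yelp phone interview experience + questions

I just finished a Skype call with a nice young guy from Yelp; it wrapped up in a hurry after a
little over half an hour. I have a master's in CS and applied for SDE New Grad.
He first briefly introduced himself, then asked about projects I had done. I talked about two, which took ten minutes.
Then he asked three questions:
1. When you type an address into the browser and press Enter, what happens up to the point where
you see the content you wanted? Describe it.
I'm no expert in this area, so I gave a high-level answer: the address gets translated into an IP
and directed to the site's server, then the suffix after the address is used to query the server,
and the content comes back, for example an HTML file, which the browser parses and displays. A
super, super amateur answer. He said fine, it sounds like you understand it quite well.
2. Say a section of the Yelp page shows updates on local business POIs (Points of Interest) you
have followed, or updates on friends' recommendations. If you refresh this section manually and the
response is sluggish and slow, what problems would you look for on the server side?
I said that different regions and different types of POI might be stored on different machines, so
cross-machine queries would be slow, because the way POIs are organized on the machines may not
match what my query needs. He said to ignore that and suppose all the data is in one big
datab... (read full post)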

l*****a
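Posts: 14598

From thread: JobHunting board - A question: can anyone give a more efficient solution?

Use a HashMap that stores query array value -> query array index.
Then at the start of each pass set currentIndex=-1, meaning we are about to look for element 0 of the query array.
if(!map.containsKey(input[index])) {end++;}
if(map.containsKey(input[index])) {
if(map.get(input[index])==currentIndex) {end++;}
else if (map.get(input[index])==currentIndex+1){end++; currentIndex++;}
else {break;}
}
Reaching currentIndex == query array size - 1 counts as finding one match.
### The number of occurrences of the first element also has to be counted..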

c**m
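Posts: 535

From thread: JobHunting board - Google phone interview, fresh interview experience

1. Find 1000 popular URLs in a log.
For this kind of find-the-popular-URLs-in-a-log problem, you first use a hash table to store each
URL's frequency. Then, to return the top k: option 1, sort, O(n log n); option 2, use a min_heap of
size k, O(n log k).
Follow-up: if the log is spread across multiple machines, the results have to be merged. In that
case it is clearly not enough for each machine to keep only a heap of size k. So on each machine we
can either (opt 1) sort the URL frequencies or (opt 2) use a min_heap to store all URL frequencies.
Personally I feel opt 1 is better here.
2. Return a query based on the occurrence from a big table.
First, this very large table is really just a very large file that cannot fit entirely in main
memory, right? Then this is actually like "randomly pick a line from a large file... (read full post)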
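
For the first question, the hash-table-plus-size-k-heap option can be sketched as follows. It assumes each log line is just a URL. The function name `top_k_urls` is made up for illustration.

```python
import heapq
from collections import Counter


def top_k_urls(log_lines, k):
    """Return the k most frequent URLs as (url, count), most frequent first."""
    counts = Counter(line.strip() for line in log_lines if line.strip())
    heap = []  # min-heap of (count, url), never holds more than k entries
    for url, count in counts.items():
        if len(heap) < k:
            heapq.heappush(heap, (count, url))
        elif (count, url) > heap[0]:
            # The new entry beats the smallest kept entry: swap it in.
            heapq.heapreplace(heap, (count, url))
    return [(url, count) for count, url in sorted(heap, reverse=True)]


if __name__ == "__main__":
    log = ["/a", "/b", "/a", "/c", "/a", "/b"]
    print(top_k_urls(log, 2))  # [('/a', 3), ('/b', 2)]
```

Counting takes O(n) and each heap operation costs O(log k), which matches the O(n log k) bound in the post. For the multi-machine follow-up, per-machine `Counter` objects can be added together before the heap step, since a URL that is only moderately popular on each machine may still be in the global top k.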

j******s
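Posts: 48

From thread: JobHunting board - Two problems from recent interviews, answers wanted

OK, let me answer with my own thinking and humbly invite the experts to critique and give suggestions.
For the first problem, you build a binary tree and then recursively compute, for each node, the
maximum heights of its left and right subtrees; their sum is the largest distance through that
node. This can be done online, but each node has to store its left and right subtree heights and
update them dynamically.
O(n) time + O(n) space
Second problem,
Assumption:
1 Log file is typically very small, operations are append, delete and read.
2 Cluster is built on Hadoop, which has a chunk size of 64MB, roughly, and
it might be too large and inefficient if we use this chunk size for the log
file.
3 Log information is typically not very important, less effort is needed for
redundan... (read full post)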

b*****n
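Posts: 618

From thread: JobHunting board - RF interview experience

Let's just call it RF.
I applied for a fresh grad position. From first contact with HR at the end of February to receiving
the offer this week, I went through an online code test, an onsite and one phone interview.
Quite a few people seem interested in their code test: two problems in 4 hours, and the problems
may differ from person to person.
The first is simple and mainly tests code quality; the second is a bit harder. Each problem's
requirements are very detailed and need careful reading, and the detailed hints deserve attention too.
The problems I got:
1. A matrix; a laser is fired to the right from a given cell. Each cell either lets the laser pass
straight through or changes the laser's direction (4 directions).
Count how many cells the laser passes through before leaving the matrix; output -1 if it loops forever.
2. A set of racers, each with a start time and a finish time. Compute each racer's score by this
rule: score = the number of racers who start later than this racer but finish earlier (all start
and finish times are distinct). Required time complexity
After passing the code test I was scheduled for an onsite directly. Six interviewers were planned
but only 5 actually interviewed me. The problems:
1. Two sorted arrays of different lengths; find the median.
leetcod... (read full post)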

f*********1
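Posts: 75

From thread: JobHunting board - rocket fuel interview question

What do you all think of this?
Inspired by 2^k:
Build a big table in which each row represents an ad and each column represents a word of the
frequent query string; each value indicates whether the word appears in that ad.
      wordcount  new  york  department  store  sale
ad1   5          1    1     1           1      1
ad2   1          0    1     0           0      1
ad3   4          1    1     1           1      0
...
adn   1          1    0     0           0      0
Building the table takes O(N).
A lookup first generates all subsets of the query string, which takes 2^k, and then ANDs (&) the
column vectors matching the query string, selecting the ads whose AND value is 1. Finally, use the
first column's word cou... (read full post)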

s*******r
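Posts: 2697

From thread: JobHunting board - Posting some interview experiences (5): Groupon phone + onsite

Although my first onsite invitation came from Twitter,
Groupon was the first company I actually went onsite with, and its process was also the longest of all the companies I interviewed with.
After Groupon my mindset became completely calm; no other company's interviews felt like an ordeal anymore.
I was interviewed by 10 people in total: two phone rounds before the onsite + 5 people onsite + 3 phone rounds after the onsite.
During the process, the team I was going to join had several people forced onto it by an internal re-org, so naturally the offer was gone.
Phone interview
p1 mainly covered data mining, which was fairly broad; topics included
1) measures of classification
2) boundary decision for classification
3) Feature selection
4) Entropy, TF, IDF
5) coding: given a query, print all matching combinations
// Query = dress for less
// Expansion: "dress:[es, ed, ing] for less:(cheap, deal)"
/*
dress for less
dress for cheap
d... (read full post)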

f******h
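Posts: 45

From thread: JobHunting board - Summary of Google interview experiences, asking for a bless, and encouragement to everyone job hunting together

I have been job hunting for a while and have learned a lot from this board. I finished my Google interviews last week; asking for a bless.
My earlier attempts all failed, and I am still looking elsewhere. Once things are settled I will definitely post my interview experiences to give back to the board.
Thank you all!!
1. http://www.mitbbs.com/article_t/JobHunting/32005597.html
1) Implement a simple calculator (+,-,*,/);
2) Implement "+1" for a large integer;
3) How to match Ads to users;
4) How to extract useful information from a forum webpage (list all
kinds of useful signal you can think of)
5) How to detect the duplicate HTML pages (large scale);
6) Find all the paths between two places on Google map;
7)... (read full post)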

h*********7
Posts: 169

From thread: JobHunting board - Posting FB phone-interview SQL questions to build up karma, hoping to win the H1B lottery

I just finished a phone interview for a position on FB's Data team, with 4 SQL questions in total. The first three were quite simple:
1. Given an EMPLOYEE table and a DEPARTMENT table, write a query to return
the list of Departments for which the total employee salary > $1m
2. Given an EMPLOYEE table, write a query that returns the employee(s) with
the 2nd highest salary. There may be >1 employee with the top salary, >1
employee with second highest, and so on.
3. Given a table fruit_counts that has these three columns : DATE, FRUIT,
NUM, write a query that gives me the difference of ... (read full post)
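
Questions 1 and 2 can be tried out against SQLite from Python. The post gives no schema, so the column names (`id`, `name`, `salary`, `department_id`) and the sample rows below are assumptions made purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER,
                       department_id INTEGER REFERENCES department(id));
INSERT INTO department VALUES (1, 'Eng'), (2, 'Ops');
INSERT INTO employee VALUES
  (1, 'Ann', 700000, 1), (2, 'Bob', 700000, 1),
  (3, 'Cat', 300000, 2), (4, 'Dan', 300000, 2), (5, 'Eve', 100, 2);
""")

# Question 1: departments whose total employee salary exceeds $1m.
Q1 = """
SELECT d.name
FROM department d JOIN employee e ON e.department_id = d.id
GROUP BY d.id, d.name
HAVING SUM(e.salary) > 1000000
ORDER BY d.name
"""

# Question 2: every employee at the second-highest distinct salary,
# so ties at the top or at second place are handled.
Q2 = """
SELECT name FROM employee
WHERE salary = (SELECT DISTINCT salary FROM employee
                ORDER BY salary DESC LIMIT 1 OFFSET 1)
ORDER BY name
"""

if __name__ == "__main__":
    print(conn.execute(Q1).fetchall())  # [('Eng',)]
    print(conn.execute(Q2).fetchall())  # [('Cat',), ('Dan',)]
```

`LIMIT ... OFFSET` is SQLite/MySQL/PostgreSQL syntax; on engines without it, a `DENSE_RANK()` window function is the usual alternative for question 2.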

C*********o
Posts: 7

From thread: JobHunting board - Asking about an interview problem

I'd like to ask everyone about a problem:
Given a list of one million <name, score> pairs where names are valid Java variable
names, write two programs and try to optimize their
efficiency:
1. A Construction Program that produces an index
structure D.
2. A Query Server Program that reads in serialized D
and then accepts user queries such that for each
query s, it responds with the top 10 names (ranked
by score) that start with s or contains ‘_s’ (so for
example, both “revenue” and “yearly_revenue”
match the prefix ... (read full post)

x***7
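Posts: 11

From thread: JobHunting board - A hot Google problem

T_T I typed a long reply, then forgot my password when posting... and it was gone...
The segment tree goes roughly like this: build a segment tree over the interval [1..n], mark every
leaf as 1, and let every other node's value be the number of leaves marked 1 beneath it.
[Finding the k-th]
Check how many 1-leaves the left subtree has. If that is at least k, search the left subtree. If it
is fewer than k, search the right subtree for the (k minus the left subtree's count of 1-leaves)-th.
When you reach the leaf, whose interval is [l,r] with l == r, then l (or r) is the k-th number in
[1..n] that we are looking for.
[Delete]
Mark that leaf as 0; every other interval containing it then has its count decremented (num--).
Someone above already posted code too; it is roughly the same.
-----
I wrote my own and tested it on a few simple inputs; no guarantee that it is correct.
struct TreeNode {
    TreeNode *left, *right;
    int val;
    int l, r;
    TreeNode (int _l, int _r) : l(_l), r(_r), left(nullptr), right(nullptr) {
    }
    TreeNode (int _val,... (read full post)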
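
Since the poster's own code is cut off, here is an independent sketch of the same counting segment tree in Python. It stores the tree in a flat list rather than linked nodes. The class name `KthRemaining` is invented for illustration and is not the poster's implementation.

```python
class KthRemaining:
    """Segment tree over 1..n; each node counts the numbers still present."""

    def __init__(self, n: int):
        self.n = n
        self.count = [0] * (4 * n)
        self._build(1, 1, n)

    def _build(self, node: int, lo: int, hi: int) -> None:
        if lo == hi:
            self.count[node] = 1  # every leaf starts as 1
            return
        mid = (lo + hi) // 2
        self._build(2 * node, lo, mid)
        self._build(2 * node + 1, mid + 1, hi)
        self.count[node] = self.count[2 * node] + self.count[2 * node + 1]

    def kth(self, k: int) -> int:
        """Return the k-th smallest number still present (k is 1-based)."""
        if not 1 <= k <= self.count[1]:
            raise IndexError("k out of range")
        node, lo, hi = 1, 1, self.n
        while lo != hi:
            mid = (lo + hi) // 2
            left = self.count[2 * node]
            if left >= k:                 # answer lies in the left subtree
                node, hi = 2 * node, mid
            else:                         # skip the left subtree's 1-leaves
                k -= left
                node, lo = 2 * node + 1, mid + 1
        return lo

    def delete(self, value: int) -> None:
        """Set the leaf for value to 0 and decrement every enclosing interval."""
        node, lo, hi = 1, 1, self.n
        path = [node]
        while lo != hi:
            mid = (lo + hi) // 2
            if value <= mid:
                node, hi = 2 * node, mid
            else:
                node, lo = 2 * node + 1, mid + 1
            path.append(node)
        if self.count[node] == 1:         # ignore values already deleted
            for p in path:
                self.count[p] -= 1
```

Both operations walk one root-to-leaf path, so each costs O(log n).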

s****a
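Posts: 794

From thread: JobHunting board - How do you solve this one from a Google interview report?

You need to ask the interviewer which is more frequent, update or query.
If updates are frequent, just sum by brute force: update O(1), query O(N^2).
If queries are frequent, use sums of the rectangle anchored at the top-left corner: update O(N^2), query O(1).
If they are about equal, you should probably build a tree.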
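
The post does not say which tree it has in mind. One candidate for the balanced case is a 2D Fenwick tree (binary indexed tree), sketched below under that assumption, which makes both update and query O(log^2 N). The class and method names are made up for illustration.

```python
class Fenwick2D:
    """2D binary indexed tree: point update and rectangle sum in O(log^2 N)."""

    def __init__(self, rows: int, cols: int):
        self.rows, self.cols = rows, cols
        self.tree = [[0] * (cols + 1) for _ in range(rows + 1)]

    def add(self, r: int, c: int, delta: int) -> None:
        """Add delta to cell (r, c), using 0-based coordinates."""
        i = r + 1
        while i <= self.rows:
            j = c + 1
            while j <= self.cols:
                self.tree[i][j] += delta
                j += j & -j
            i += i & -i

    def prefix(self, r: int, c: int) -> int:
        """Sum of cells (0,0)..(r,c) inclusive; r or c equal to -1 gives 0."""
        total = 0
        i = r + 1
        while i > 0:
            j = c + 1
            while j > 0:
                total += self.tree[i][j]
                j -= j & -j
            i -= i & -i
        return total

    def rect(self, r1: int, c1: int, r2: int, c2: int) -> int:
        """Sum over the rectangle with corners (r1,c1) and (r2,c2), inclusive."""
        return (self.prefix(r2, c2) - self.prefix(r1 - 1, c2)
                - self.prefix(r2, c1 - 1) + self.prefix(r1 - 1, c1 - 1))
```

`rect` uses the same inclusion-exclusion over top-left-anchored sums as the query-heavy option in the post; only the storage behind `prefix` differs.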

f********e
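Posts: 100

From thread: JobHunting board - Google interview question, advice wanted--beanbun--G--dictionary

The latest one on the board:
Given a dictionary, the query to support is: given a string, return the shortest string in the
dictionary that contains all the characters of the given string.
What I can think of is to turn the query into a sorted hash table, turn the dictionary words into
sorted hash tables as well, and put them into a trie-like structure. Then when a query arrives you
can filter which dictionary words contain the query. After that you have to compute the min window
for every candidate.
But with many candidates that gets expensive...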

y****s
Posts: 46

From thread: JobHunting board - Job opportunities at PVH in NJ

PVH has the following job openings.
1. Business Systems Analyst-Commercial IT-Store Systems
2. Testing Manager
3. Senior Developer-Middleware
for more info, please check www.pvh.com site.
Business Systems Analyst-
Commercial IT-Store Systems
More information about this job:
Overview:
PVH Corp. is a global, action-oriented company characterized by achievement
and commitment. We
want people who are hungry for both professional and personal growth; who
will help us take our
brands, our businesses a... (read full post)

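Posts: 1

From thread: JobHunting board - A new Google problem

Like the original poster I got a similar problem, except that instead of a given range it asked for
the sum of the values of all points within (0,0) -> (x,y). The interviewer did not ask me to write
code; it was just an open-ended discussion:
1. insertion > query
2. query > insertion
3. insertion = query
Only now, seeing "segment tree", does it ring a bell; at the time I could not recall it at all. My
first step was simply to sort by X, then by Y, and then query the points and sum them. Later I came
up with precomputing a running total at each point, but I did not think of splitting in two or in
four, and the interviewer did not probe further..
Then I was asked a string coding problem: given a string and an integer m, find the longest
substring that contains at most m distinct letters. For example, aabccdd, m=2 => ccdd
I still did not prepare well enough and have not done enough problems..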
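
The string problem is commonly solved with a sliding window. The sketch below is one such solution, not the poster's answer; the function name is made up, and when several substrings tie for longest it returns the earliest one.

```python
from collections import Counter


def longest_with_at_most_m_distinct(s: str, m: int) -> str:
    """Return the longest substring of s with at most m distinct characters."""
    counts = Counter()          # character counts inside the current window
    best_start, best_len = 0, 0
    left = 0
    for right, ch in enumerate(s):
        counts[ch] += 1
        # Shrink from the left until the window is valid again.
        while len(counts) > m:
            counts[s[left]] -= 1
            if counts[s[left]] == 0:
                del counts[s[left]]
            left += 1
        if right - left + 1 > best_len:
            best_start, best_len = left, right - left + 1
    return s[best_start:best_start + best_len]


if __name__ == "__main__":
    print(longest_with_at_most_m_distinct("aabccdd", 2))  # ccdd
```

Each character enters and leaves the window at most once, so the scan is O(n).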

o*q
Posts: 630

From thread: JobHunting board - Which LeetCode problems are the high-frequency ones?

#    Title                        Editorial  Acceptance  Difficulty  Frequency
1    Two Sum                                 28.3%       Easy
292  Nim Game                                54.4%       Easy
344  Reverse String                          57.3%       Easy
136  Single Number                           52.2%       Easy
2    Add Two Numbers                         25.6%       Medium
371  Sum of Two Integers                     51.6%       Easy
4    Median of Two Sorted Arrays             20.4%       Hard
6    ZigZag Conversion                       25.6%       Easy
13   Roman to Integer                        42.7%       Easy
237
... (read full post)

y*********e
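Posts: 518

From thread: JobHunting board - A very common interview question: with too much data and slow MySQL queries, what do you do?

First, EXPLAIN the query and look at the execution plan. Check whether an index is being used. If
not, find out why and rewrite the query. If an index is used and it is still slow, is the index
corrupt? Rebuild it. How much data does the query return? When the result set is large,
nonclustered index performance suffers badly; consider a clustered index. Does the table need
partitioning? Should the MySQL servers be partitioned (for example, split the data into 100 parts
stored on 100 different MySQL servers, and run the query as 100 mappers to speed it up)?
Also, check whether the server CPU is at 100%. And where exactly is it slow: in the database query
or in the business logic layer? Read the logs. Profile if needed. Is one server slow, or several?
Did it slow down suddenly? Was there a recent release, and should it be rolled back? And so on..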

m********u
Posts: 3942

From thread: JobHunting board - Applied Scientist for NLP, California

Location: Santa Clara
If you are interested or want more details about the position, contact me by site message or email [email protected]
gmail.com
Responsibilities:
Include but not limited to the following three main directions:
• Apply Deep Learning or traditional NLP algorithms such as RNN/DNN/
CRF to improve Entity Recognition and Sequence Labeling;
• Apply machine learning algorithms to improve query understanding
such as query suggestion/ query rewriting
• Work with engineering team to implement those algorithms into search
runtime system
Qua... (read full post)

T*****0
Posts: 22

From thread: JobMarket board - Research Engineer-Computational Advertising from Yahoo! US Beijing R&D Centre

Hi, All,
If you are interested in the following job, please kindly send your
resume to n************[email protected] or contact my cell phone 0086-13811923880
for more details.
---------------------------------------------------
Research Engineer – Computational Advertising
Job Number: 110621
Primary Location: China-Beijing-Haidian
Description
About Yahoo! Labs,
Do you enjoy solving challenging and complex problems? Are you passionate
about dealing with terabytes of daily data? Do you want to help d... (read full post)

R*****n
Posts: 355

From thread: JobMarket board - [Internal referral] Computing H1B positions SQL/ETL/BI

Please state the position you want to apply for and email your resume to [email protected]
-----------------------------------------------
1, Access/SQL
Financial/Banking in Minnetonka, MN
Our client is looking for someone that loves data! This candidate’s ideal
background for this role would be SQL, ETL, Data Warehouse, Access, and
business intelligence in their background. The client needs someone with a
change management and price improvement mindset. This person must support
best practices and will be joining a team that will con... (read full post)

a*****a
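Posts: 143

From thread: Living board - For a mortgage, should I contact the bank with the lowest rate on Zillow directly, or will a broker get a better rate?

First-time home buyer with a few questions about mortgages.
1. Will contacting a mortgage broker lead to many hard credit queries?
When we got our pre-approval letter from BoA, we had been told it would be a soft query, but it
turned out to be a hard query. Now our agent says he knows a lender who can get us the best rate.
Will that run yet another hard credit query?
2. Is it enough to contact the bank with the best rate on Zillow directly, or can a broker offer a better rate?
The best rate on Zillow is 3.3%, from a small bank I have never heard of. Do I still need the
broker our agent recommended?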

e*i
Posts: 10288

From thread: Money board - I have been threatened and extorted by a Taobao seller; can anyone give me some advice?

WHOIS information for yigoexpress.com:**
[Querying whois.verisign-grs.com]
[Redirected to whois.dns.com.cn]
[Querying whois.dns.com.cn]
[whois.dns.com.cn]
Domain name: yigoexpress.com
Registry Domain ID:
Registrar WHOIS Server: whois.dns.com.cn
Registrar URL: http://www.dns.com.cn
Updated Date: 2013-10-14T10:24:59Z
Creation Date: 2012-11-12T22:00:34Z
Registrar Registration Expiration Date: 2014-11-12T22:00:34Z
Registrar: Beijing Innovative Linkage Technology Ltd.
Registrar IANA ID: 633
Registrar... (read full post)

e*i
Posts: 10288

From thread: shopping board - Are the Coach bags on this website genuine?

WHOIS information for coachbags2011.com :
[Querying whois.verisign-grs.com]
[Redirected to whois.enom.com]
[Querying whois.enom.com]
[whois.enom.com]
=-=-=-=
Visit AboutUs.org for more information about coachbags2011.com
Registration Service Provided By: YNZG Domain Register Center
Contact: e*****[email protected]

Domain name: coachbags2011.com
Registrant Contact:
YNZGDomainRegisterCenter
Yong Liu ()

Fax:
No.39,YongquanLane,ShibaDistrict
Ruian, 325200
CN
Administrative Co... (read full post)
