G******i posts: 5226 | 1 ☆─────────────────────────────────────☆
guangyi (光一) wrote on (Sat Oct 29 00:10:37 2011, US Eastern):
**********************************
M:
phone interview (1 round):
why MS?
biggest challenge
why like coding and algorithm?
what is good code?
your longest code
biggest accomplishment
if you don't want some functions to be modified in java, what to do?
does java allow multiple inheritance?
what does synchronized keyword mean in java?
CEO wants a book, you find it in the system of a nearby bookshop. You ... [truncated] |
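Minimal Java sketches for the three language questions above (illustrative, not from the original post):

class Base {
    // A final method cannot be overridden in subclasses; likewise a
    // final class cannot be extended at all.
    final void frozen() { System.out.println("cannot be overridden"); }
}

interface A { default void hello() { } }
interface B { default void hi() { } }
// Java forbids multiple inheritance of classes, but a class may
// implement any number of interfaces.
class Multi implements A, B { }

class Counter {
    private int n = 0;
    // synchronized makes the method acquire the object's monitor, so only
    // one thread at a time can run it on the same instance.
    synchronized void increment() { n++; }
}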
|
R**y posts: 72 | 2 ZocDoc is a decent company: good market prospects and no real competitors.
Skype interview with a young Asian guy, very nice, and the questions weren't hard.
Reverse Linked List.
I first tried a stack-based implementation, but the returned head was null; the bug was in the initial assignment:
Node result = null;
Node head = result; // even if result is assigned a new value later, head stays null.
Also, when handling the head node, I forgot to set its next to null...
Seeing that wasn't working, I redid it directly with a for loop, and then got the return
value wrong too, returning the last node of the reversed list...
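For reference, a minimal iterative sketch in Java that avoids both bugs above (the Node class is a hypothetical helper, not from the post):

class Node {
    int val;
    Node next;
    Node(int val) { this.val = val; }
}

class ReverseList {
    // Reverse the list in place and return the new head.
    static Node reverse(Node head) {
        Node prev = null;              // the old head ends up pointing at null
        Node curr = head;
        while (curr != null) {
            Node next = curr.next;     // save the rest of the list
            curr.next = prev;          // flip the pointer
            prev = curr;
            curr = next;
        }
        return prev;                   // new head, not the last loop variable
    }
}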
No. 2: print all possible anagrams of every subset of a string.
I got this one wrong: I only printed all anagrams of the full string, and the interviewer
didn't notice either; he also assumed the task was just to print all anagrams.
If you really print the anagrams of every subset, I think it's at least O(2^n), since that's already the number of subsets...
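A brute-force sketch in Java of what the full problem seems to ask (with repeated characters this prints duplicates; de-duplication is left out for brevity):

class SubsetAnagrams {
    // Print every permutation of every non-empty subset of s.
    // There are 2^n subsets alone, matching the lower bound above.
    static void printAll(String s) {
        int n = s.length();
        for (int mask = 1; mask < (1 << n); mask++) {      // enumerate subsets
            StringBuilder subset = new StringBuilder();
            for (int i = 0; i < n; i++)
                if ((mask & (1 << i)) != 0) subset.append(s.charAt(i));
            permute(subset.toString().toCharArray(), 0);
        }
    }

    // Classic in-place permutation by swapping.
    private static void permute(char[] a, int k) {
        if (k == a.length) { System.out.println(new String(a)); return; }
        for (int i = k; i < a.length; i++) {
            swap(a, k, i);
            permute(a, k + 1);
            swap(a, k, i);
        }
    }

    private static void swap(char[] a, int i, int j) {
        char t = a[i]; a[i] = a[j]; a[j] = t;
    }
}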
Posting this for karma. This was my second phone screen. I've found that fresh problems make it easy to panic and just bomb, and even when I can solve them I keep making little bugs like these and need the interviewer to walk me through before getting it right.
----... [truncated] |
|
s**x posts: 7506 | 4 This is really the classic news feed design question.
1. Limited storage: you store one copy per activity, so other friends can query it. Otherwise, you could store multiple copies, one for
each of the poster's friends.
2. Storage layer: you have to use sharding or a distributed file system like
Google File System. Any persistent storage would work.
3. The friends list is large: again sharding is the key. With 1M friends, we can
have 100 servers, each handling only 10K friends.
4. Logged-in / logged-out friends.
For logged-in friends, use push... [truncated] |
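A minimal fan-out-on-write sketch in Java for the push side (all names are illustrative):

import java.util.*;

class FeedService {
    private final Map<Long, Deque<String>> feeds = new HashMap<>();  // userId -> feed
    private final Map<Long, List<Long>> friends = new HashMap<>();   // userId -> friend ids
    private final Set<Long> online = new HashSet<>();

    // Push a new activity to every currently logged-in friend;
    // offline friends pull on their next login instead.
    void publish(long userId, String activity) {
        for (long f : friends.getOrDefault(userId, List.of())) {
            if (online.contains(f))
                feeds.computeIfAbsent(f, k -> new ArrayDeque<>()).addFirst(activity);
        }
    }
}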
|
h*****a posts: 1718 | 5 Each server gets its own id and keeps its own counter. For example, with 64-bit
ids, if you expect at most 1000 servers, reserve 10 bits for the server id and let the remaining 54 bits be the counter.
This is a common distributed-systems problem; data sharding uses much the same trick.
Your scheme fixes the sharding up front, binding the data distribution to the number of
servers, which is very inflexible, so it's no surprise the interviewer wasn't satisfied. |
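A sketch in Java of the 10-bit server id + 54-bit counter layout described above (how serverId gets assigned is out of scope):

class IdGenerator {
    private final long serverId;   // 0..1023, fits in 10 bits
    private long counter = 0;      // purely local, no cross-server coordination

    IdGenerator(long serverId) {
        if (serverId < 0 || serverId >= (1L << 10))
            throw new IllegalArgumentException("server id must fit in 10 bits");
        this.serverId = serverId;
    }

    synchronized long nextId() {
        // High 10 bits: server id; low 54 bits: per-server counter.
        return (serverId << 54) | (counter++ & ((1L << 54) - 1));
    }
}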
|
p*****3 posts: 488 | 6 Requests land on the web servers; each web server batch-processes its logs in real
time and keeps a simple in-memory aggregation map. When the batch size is reached, it
writes the old aggregation map out to a sharded key-value store, generally one batch-write
request per shard. Ideally the key-value store supports a merge operation;
even if it doesn't, the odds of a race condition are tiny.
What's the point of explaining key-value store internals? Knowing how to use one is enough. |
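A sketch in Java of that aggregate-then-flush loop (KvClient is a hypothetical stand-in for the sharded store):

import java.util.HashMap;
import java.util.Map;

class LogAggregator {
    interface KvClient { void batchMerge(int shard, Map<String, Long> counts); }

    private final KvClient kv;
    private final int numShards;
    private final int batchSize;
    private Map<String, Long> counts = new HashMap<>();
    private int seen = 0;

    LogAggregator(KvClient kv, int numShards, int batchSize) {
        this.kv = kv; this.numShards = numShards; this.batchSize = batchSize;
    }

    // Aggregate one log event in memory; flush when the batch is full.
    void record(String key) {
        counts.merge(key, 1L, Long::sum);
        if (++seen >= batchSize) flush();
    }

    void flush() {
        Map<Integer, Map<String, Long>> byShard = new HashMap<>();
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            int shard = Math.floorMod(e.getKey().hashCode(), numShards);
            byShard.computeIfAbsent(shard, s -> new HashMap<>())
                   .put(e.getKey(), e.getValue());
        }
        byShard.forEach(kv::batchMerge);   // one batch write per shard
        counts = new HashMap<>();
        seen = 0;
    }
}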
|
p*****p posts: 379 | 7 Throwing out a first cut:
Generally you stand up a DB with an events table; the schema is simply:
event_id, source_user_id (the initiator), timestamp, is_valid, event_data (json)
Since events are essentially immutable, the database can be set to read uncommitted to reduce lock waits.
Naive solution:
When a user logs in, look up their friends, then read what's needed from the events
table by timestamp and id (pull); after login, hold a long-lived TCP connection and poll, or have the server push recent updates.
When there are many active users, users average many connections to each other, or a
user logs in often, that lookup can make first login slow or even time out; then consider
push: maintain a per-user (or per-active-user) event list. The naive version is another table of
source_user_id, target_user_id, timestamp, which you read at login to get the data.
One mysql server supporting 1000 concurrent users, an average of 300 connections per user, and 1 even... [truncated] |
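A pull-side sketch in Java/JDBC against the events table above (the friends table is an assumption, not given in the post):

import java.sql.*;
import java.util.*;

class FeedPull {
    // Fetch recent valid events from a user's friends since a timestamp.
    static List<String> pull(Connection db, long userId, long sinceTs) throws SQLException {
        String sql =
            "SELECT e.event_data FROM events e " +
            "JOIN friends f ON e.source_user_id = f.friend_id " +
            "WHERE f.user_id = ? AND e.timestamp > ? AND e.is_valid = 1 " +
            "ORDER BY e.timestamp DESC LIMIT 100";
        try (PreparedStatement ps = db.prepareStatement(sql)) {
            ps.setLong(1, userId);
            ps.setLong(2, sinceTs);
            List<String> out = new ArrayList<>();
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) out.add(rs.getString(1));
            }
            return out;
        }
    }
}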
|
b******g posts: 77 | 8 Offer:
=====
Background: non-CS PhD + 2.5 years of experience
Applied to Amazon, FB and G.
A: rejected (first phone screen with an Indian interviewer; got torpedoed)
f offer: ~240k/year + 50k sign-on
g offer: ~250k/year + 35k sign-on
Both offers were very good and the choice was hard; in the end I went to Google.
FB came through a referral from a senior member of this board who was extremely kind and
helpful; I'm sorry I didn't end up there, and I'm especially grateful to him.
G was a buddy's referral; he gathered lots of prep material for me and answered every question I had.
Most of all I thank my wife and my parents-in-law for keeping the home front covered;
honestly, looking after a baby is far more tiring than any code grinding.
Interview experience:
====
A phone screen:
-----------
Indian interviewer; he asked 5 questions and I finished all of them in 30 minutes, yet was still torpedoed. I had no
experience back then; I should have complained right after the interview. I only complained to HR after the result came out, to no avail.
1 given 2 strings, can you construct str1 using chars in str2?
2 binary tree inorder traversal, both recursiv... [truncated] |
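A common take on question 1, sketched in Java (count the characters of str2, then consume them for str1; assumes ASCII for brevity):

class CanConstruct {
    static boolean canConstruct(String str1, String str2) {
        int[] counts = new int[256];            // ASCII-sized table
        for (char c : str2.toCharArray()) counts[c]++;
        for (char c : str1.toCharArray())
            if (--counts[c] < 0) return false;  // ran out of this character
        return true;
    }
}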
|
f*******r posts: 976 | 10 Congrats! |
|
b*****n posts: 618 | 11 This is also a classic question, and every interviewer probes it differently.
This time the focus was on what the bottleneck is at each stage and how to fix it.
My answer:
1. At the start, network IO is the blocking factor. Handle it like the Dropbox question:
different pools for different work.
2. Then processing latency becomes the problem, because there are a lot of files to crawl, so go
distributed and spread the workload across machines/networks with URL-based sharding.
3. Assuming the CPUs are strong, the next problem is that each shard must remember which
URLs it has already crawled. Keeping them in memory breaks down past a certain point, so a better solution is needed.
I reached for the usual trick, just use a KV store, but was told no: for such a lightweight
check they didn't want to lean on an external system, and a KV store takes far more space anyway. You have to use a Bloom
filter instead, which has some false-positive rate, but at worst a handful of URLs get mishandled, not a big deal.
I don't think I did especially well in this round, but I believe I still passed. |
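A minimal Bloom filter sketch in Java for the have-we-crawled-this-URL check (k hash functions are simulated by seeding one hash; false positives are possible, false negatives are not):

import java.nio.charset.StandardCharsets;
import java.util.BitSet;

class UrlBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashes;

    UrlBloomFilter(int size, int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    private int index(String url, int seed) {
        int h = seed;
        for (byte b : url.getBytes(StandardCharsets.UTF_8)) h = h * 31 + b;
        return Math.floorMod(h, size);
    }

    void add(String url) {
        for (int i = 0; i < hashes; i++) bits.set(index(url, i + 1));
    }

    boolean mightContain(String url) {
        for (int i = 0; i < hashes; i++)
            if (!bits.get(index(url, i + 1))) return false;  // definitely new
        return true;                                         // probably seen
    }
}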
|
a*****u posts: 1712 | 12 redis can be sharded; once one box can't hold the data, you shard.
Second Master, you're still the best! One question: since redis is in-memory, with a huge
dataset won't memory run out? Also, when is redis a poor fit? |
|
t*********r posts: 387 | 13 Performance: using ZK implies pessimistic CC, which IMO will probably
perform slightly worse in the general case.
I concede cross-shard transactions will incur a latency cost, but scaling an
RDBMS to very large scales is a pain in the ass. For smaller data sets,
crossing a few shards isn't going to be *that* bad.
I never made an argument that there is no tradeoff between C/A and *
performance*. What I do have a problem with is claiming that you can't have both
C and A (since P is widely claimed and ... [truncated] |
|
b*****n posts: 618 | 14 It depends on the angle the interviewer takes; the answers can differ a lot.
From a systems standpoint this is still a pub-sub system,
and the design can follow a feed system exactly. To make it distributed you just shard the docs,
but the shard key is a choice: for example,
given doc A -> doc B,
you can key on A or key on B, and each has pros and cons in different situations. |
|
y******u posts: 804 | 15 Even more concise:
one seat x [its remaining free intervals (stored as a bitmap)]
Concurrency: shard the inventory first, say by train car, then operate on the individual shards. |
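A sketch in Java of the per-seat interval bitmap (bit i set means the seat is free on leg i, between stop i and stop i+1; assumes fewer than 64 legs):

class SeatSegments {
    private long free;

    SeatSegments(int legs) { free = (1L << legs) - 1; }   // all legs start free

    // Book the legs in [fromStop, toStop); fails if any leg is taken.
    synchronized boolean book(int fromStop, int toStop) {
        long mask = ((1L << toStop) - 1) & ~((1L << fromStop) - 1);
        if ((free & mask) != mask) return false;  // some leg already sold
        free &= ~mask;
        return true;
    }
}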
|
s*****r posts: 43070 | 16 The database bottleneck can be solved with sharding, so each booker's transaction locks
different rows, cutting down conflicts and rollbacks between them. Most bookers only buy one or two
tickets, so the sharding can be pushed down to a very fine granularity. |
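A row-locking sketch in Java/JDBC along those lines (the seats table and its columns are assumptions for illustration):

import java.sql.*;

class BookSeat {
    // SELECT ... FOR UPDATE locks only the requested seat's row, so
    // transactions touching different rows don't conflict.
    static boolean book(Connection db, long seatId) throws SQLException {
        db.setAutoCommit(false);
        try (PreparedStatement lock = db.prepareStatement(
                 "SELECT status FROM seats WHERE seat_id = ? FOR UPDATE");
             PreparedStatement upd = db.prepareStatement(
                 "UPDATE seats SET status = 'SOLD' WHERE seat_id = ?")) {
            lock.setLong(1, seatId);
            try (ResultSet rs = lock.executeQuery()) {
                if (!rs.next() || !"FREE".equals(rs.getString(1))) {
                    db.rollback();
                    return false;
                }
            }
            upd.setLong(1, seatId);
            upd.executeUpdate();
            db.commit();
            return true;
        }
    }
}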
|
j********r posts: 127 | 17 Sharding actually hurts performance noticeably: every query has to route by the shard
key, otherwise it's a disaster. |
|
r*****s posts: 1815 | 18 From topic: JobHunting board - fb design question #1 QPS:
Probably 10k~100k concurrent accesses (my guess)
Meaning we need to heavily cache the data. We may need some write-through
cache in place. Well sharded.
Resource-wise, I'm not worried at all. For the cache (https://redis.io/topics/benchmarks)
redis can be super fast; for persistence with nosql we can
achieve almost infinite scalability.
Our system should provide the following APIs:
- Like(user, post)
- Unlike(user, post)
- Liked(user, post)
- CountLikes(post)
Extra:
- RecentLikes(user)
- Frie... [truncated] |
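A skeleton of those APIs in Java, with an in-memory map standing in for the sharded cache/store (illustrative only):

import java.util.*;

class LikeService {
    private final Map<Long, Set<Long>> likesByPost = new HashMap<>(); // postId -> userIds

    void like(long user, long post) {
        likesByPost.computeIfAbsent(post, p -> new HashSet<>()).add(user);
    }

    void unlike(long user, long post) {
        Set<Long> s = likesByPost.get(post);
        if (s != null) s.remove(user);
    }

    boolean liked(long user, long post) {
        return likesByPost.getOrDefault(post, Set.of()).contains(user);
    }

    long countLikes(long post) {
        return likesByPost.getOrDefault(post, Set.of()).size();
    }
}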
|
r*****s posts: 1815 | 19 The problem is that this is a little island in a big ocean, a really big one...
Sorry, I mean this is a system which needs to be injected into many other
subsystems, so it's messy and too hard to describe fully here in a post.
However, we can quickly go over some basic stuff.
# we ignore the QPS and scalability bullshit here
* What are the requirements?
1. user can CRUD groups. **
2. user can assign group(s) to a post to whitelist/blacklist people in a group
for the post***
3. user will not see other's ... [truncated] |
|
m********l posts: 791 | 20 My humble attempt at an answer...
Q: For tiny URLs in a multi-machine/server environment, what if multiple users insert the same URL at the same time? Thanks.
No special handling needed: each incoming long_url simply gets a short_url of its own. One long
url mapping to several short urls doesn't hurt the system or the users.
Say two users insert www.mitbbs.com at the same time, and the generated urls are:
abcdef -> www.mitbbs.com
123456 -> www.mitbbs.com
Both short urls point to www.mitbbs.com, and that's fine.
Second question:
1. Using auto-increment IDs has nothing to do with using consistent hashing; you can
use consistent hashing with auto-increment IDs too.
2. Your approach works; it just stores the data twice, a short2long mapping and a
long2short mapping. Why two copies? Because your mapping scheme can't guarantee
that a long_url and its short_url land on the same server.
There are two other options:
1) store only the short2long mapping; when a lo... [truncated] |
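A sketch in Java of the two-mapping approach (base-62 codes from a counter; in a real system each map lives on its own shard, and concurrent inserts of one long URL may simply yield two codes, which the post argues is harmless):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

class TinyUrl {
    private static final String ALPHABET =
        "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    private final AtomicLong nextId = new AtomicLong(1);
    private final Map<String, String> short2long = new ConcurrentHashMap<>();
    private final Map<String, String> long2short = new ConcurrentHashMap<>();

    String shorten(String longUrl) {
        String existing = long2short.get(longUrl);
        if (existing != null) return existing;      // best-effort de-dup
        String code = encode(nextId.getAndIncrement());
        short2long.put(code, longUrl);
        long2short.putIfAbsent(longUrl, code);
        return code;
    }

    String expand(String code) { return short2long.get(code); }

    private static String encode(long id) {         // base-62 encode the id
        StringBuilder sb = new StringBuilder();
        do {
            sb.append(ALPHABET.charAt((int) (id % 62)));
            id /= 62;
        } while (id > 0);
        return sb.reverse().toString();
    }
}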
|
f********r posts: 304 | 21 A few offhand shots at MDB. I used to run a 400+ node cluster spanning 4 data centers, with (1 primary +
3 secondaries + 1 backup) replica sets and roughly 100k/s QPS. Indexes have a lot of pitfalls, data-fragmentation
management is a constant headache, the built-in sharding is very hard to use, and the bundled tools are nearly unusable. Data going out of sync
is common too. The nastiest problem is that if the database gets overloaded, the whole system basically stops responding, so
the application side absolutely must throttle to keep the database from being overloaded. Upgrade patches are full of traps as well, especially
a major one like 2-to-3. wiredtiger was an acquired project. Overall it's only fit for
small-to-medium businesses. Any quarter where their earnings miss growth makes a good short entry for the stock.
If you do your own data sharding it's barely usable. Then again, once the data gets big no
solution is perfect. |
|
j*******e posts: 529 | 22 You only need titanite shards and large titanite shards, but you do need those three bosses' souls. |
|
L********g posts: 4204 | 23 Since I don't want to take the Lord Vessel to talk to that big serpent (it affects a
later covenant), I just farmed the slimes near the Depths bonfire and in one evening got 40 large titanite shards and 36
green titanite shards. I spent all the boss souls I had on weapons, then used 6 demon
titanite to take the great wolf sword to +4 (I chose to make it from a broken straight sword +10; it requires Str 24 Dex 18
Int 20 Faith 20). I still feel my Black Knight Sword at ~380 ATK is more useful; maybe the wolf
sword does better against enemies without high magic defense (its magic damage is high). |
|
L********g posts: 4204 | 24 Do you have the serpent ring that increases the drop rate?
Farming for an hour I get about 10 large shards + 10 green shards. |
|
m*******n posts: 6660 | 25 The caged blacksmith sells large shards freely, but small shards are nowhere to be bought; you can only farm the fat guys with the big hammers.
Why does nobody sell the small ones?? |
|
t******g posts: 17520 | 26 Yeah, a purple by itself has at most 5 more light than a blue; the point is how much more light it gains when upgraded.
Right now the upgrade material, ascendant shards, is hard to come by, so I still can't max anything out.
My chest piece and boots each need 18 ascendant shards to upgrade fully. |
|
AR posts: 436 | 27 Need 120 globs of ectoplasm and 120 obsidian shards and
some other common materials which are different for each
class. The ectoplasm and shards are rare drops in uw and fow,
respectively. Rare material traders sell them at about 3.5K each.
Crafting each piece of armor costs 15k, not including
materials. So roughly, crafting the whole set will cost about:
120*2*3.5 + 15*5 = 915 (platinum)
Besides, you need a very experienced team to make it to the
crafter and finish his quests before he can craft |
|
d****t posts: 19 | 28 The author is R.A. Salvatore.
You probably mean the Dark Elf Trilogy, but there are many books involving Drizzt:
the Icewind Dale Trilogy, plus a four-book "legacy" series consisting of The Legacy,
Starless Night, Siege of Darkness, and Passage to Dawn.
His works are as follows:
"The Icewind Dale Trilogy"
I.The Crystal Shard
II. Streams of Silver
III. The Halfling's Gem
"The Dark Elf Trilogy"
I.Homeland
II.Exile
III.Sojourn
I.The Legacy
II.Starless Night
III.Siege of Darkness
IV.Passage To Dawn
"Paths of Darkness"
The Silent Blade
Spine of the World
Servant of the Shard
Sea of Swords
"The Cleric Quintet"
I.Canticle |
|
k*****a posts: 1463 | 29 Go back and learn why HADOOP (BASE: basically available, soft state,
eventual consistency)?
: So at that point, there are only two choices. One is a NoSQL DB, the other is Sharding.
A NoSQL DB ???
>> Hand-waving: EBay's items don't have much consistency contention; they could
be handled by some garage DBMS after proper hand-waving, let alone SQLServer. Ebay's
transactions aren't a scale problem at all. How many
bid items are live every day? Are you suggesting Ebay use a NoSQL DB for live
site transactions? And you put these as your first option to scale out EBAY?
Read... [truncated] |
|
g*****g posts: 34805 | 30 Not knowing is not your fault, but coming out to scare people is. Ebay is an auction site
at its core; you're saying it doesn't need consistency? If someone places a bid and you can't read it back,
what's the point of playing? I'm telling you the hard part of Ebay is exactly that they
can't use NoSQL, they can only Shard, and after Sharding they have to do everything possible to avoid distributed transactions.
Ebay's architecture can be found online. Not knowing is fine, Google it, instead of spouting nonsense.
From the start I told you Transactions are the key. Turns out you didn't even know a transaction means ACID,
and the C in ACID is consistency, and now you drag me back to the CAP theorem? Didn't you say it had nothing to do with transactions? |
|
k*****a posts: 1463 | 31 Have a seat, have a seat:
[goodbug] >>> : : So at that point, there are only two choices. One is a NoSQL DB, the other is
Sharding.
[goodbug] >>>>>> can't use NoSQL, can only Shard. |
|
g*****g posts: 34805 | 32 Of course Ebay does. Isn't data partitioning "Sharding"? I keep saying they
can't go NoSQL so they have to resolve it with "Sharding".
I never thought Hadoop was a big part of Ebay. Ebay existed many, many years
before Hadoop was even created. And we all know Hadoop is a batch process
that doesn't have much to do with interactive transactions. But NeverLearn
kept bringing Hadoop up in discussions of Ebay as if it's the only reason
Ebay's successful. As I said, he knows nothing, but he's good at maki... [truncated] |
|
t*******e posts: 684 | 37 Sharding is just DB horizontal partitioning. Hibernate Shard applies. |
|
c******o posts: 1277 | 38 mongodb count() is quite bad; 2.4 is a bit better: https://jira.mongodb.org/browse/SERVER-1752
mongodb performance is tied to memory and depends heavily on the index/shard keys;
sharding mongodb and index creation take real expertise.
Also, mongodb's write lock is per db, not per collection.
So mongodb only suits certain workloads. |
|
c******o posts: 1277 | 39 3 shards, each shard a replica set containing 3 machines (there may be other
utility machines in the set also),
plus 3 config servers and some number of mongos servers.
All on AWS.
It was a total waste at the beginning; I don't know why they needed so many. |
|
c******o posts: 1277 | 40 Shards are auto-balancing; data size doesn't matter much, you can throw in
shards as needed.
No general cache; there is some caching in the play app. |
|
c******o posts: 1277 | 41 Depends on your working set; worst case, your working set is 100% of your
data.
You can just add shards; it auto-balances, so you never hit the ceiling.
Sharding and horizontal scaling are still far easier with MongoDB,
simply because it's all automatic and less coupled from the beginning.
In your case (mysql with no join), it's in fact better to use mongodb (mongodb
has single-document ACID), and it will be faster and more flexible. |
|