Using C* as a message queue: I wonder which interview that would pass - Programming board

t**********1
Posts: 550

This board is in a worrying state.

p*****2
Posts: 21240

Seems like the Java folks are all very fond of this.

t**********1
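Posts: 550

The idiots have flocked together and are egging each other on.
This board has two paths ahead: idiotization or de-idiotization.

[Quoting p*****2:]

: Seems like the Java folks are all very fond of this.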

l*********s
Posts: 5409

What is C*?

p*****2
Posts: 21240

It should be Cassandra. I recall goodbug's original version used it, but I didn't follow the later discussion.

[Quoting l*********s:]

: What is C*?

l*********s
Posts: 5409

I see, thanks.

[Quoting p*****2:]

: It should be Cassandra. I recall goodbug's original version used it, but I didn't follow the later discussion.

w**z
Posts: 8232

Could you be more specific?

[Quoting t**********1:]

: This board is in a worrying state.

d*******r
Posts: 3299

If speed isn't a requirement, it should be fine, right?
Besides, isn't Redis often used as a message queue too?

w**z
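Posts: 8232

C* does 20k writes per second on a single node; that's not slow. It's a good fit here.

[Quoting d*******r:]

: If speed isn't a requirement, it should be fine, right?
: Besides, isn't Redis often used as a message queue too?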

b*******g
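Posts: 603

Cassandra is fast on writes and slow on reads, but an MQ reads in batches, so there is no performance problem at all.
A simple time-based UUID as the key plus one index CF (column family) and you're done.
The only thing to watch is tombstones: delete in batches, otherwise performance suffers.
The eunuch has simply never used Cassandra.

[Quoting w**z:]

: C* does 20k writes per second on a single node; that's not slow. It's a good fit here.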
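A minimal sketch of why a time-based (version 1) UUID works as a sortable key. It uses only Python's standard `uuid` module; the helper `time_uuid` and the fixed node value are hypothetical and exist only to make the example deterministic. The point it illustrates: the ordering has to come from the embedded timestamp (as Cassandra's `timeuuid` comparator does), not from the UUID's text form.

```python
import uuid


def time_uuid(ts_100ns: int, node: int = 0x0123456789AB) -> uuid.UUID:
    """Build a version-1 UUID from an explicit 60-bit timestamp (hypothetical helper)."""
    return uuid.UUID(
        fields=(
            ts_100ns & 0xFFFFFFFF,      # time_low: lowest 32 bits come first in the text form
            (ts_100ns >> 32) & 0xFFFF,  # time_mid
            (ts_100ns >> 48) & 0x0FFF,  # time_hi (version bits are set by version=1)
            0x80,                       # clock_seq_hi_variant
            0x00,                       # clock_seq_low
            node,                       # arbitrary fixed node id for the demo
        ),
        version=1,
    )


earlier = time_uuid(0xFFFF_FFFF)
later = time_uuid(0x1_0000_0000)

# Ordering by the embedded timestamp gives arrival order.
by_time = sorted([later, earlier], key=lambda u: u.time)

# Ordering by the text form does not, because time_low is the leading field.
by_text = sorted([earlier, later], key=str)
```

Here `by_time` puts `earlier` first, while `by_text` puts `later` first.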

p*****3
Posts: 488

A message queue has to guarantee delivery.

[Quoting d*******r:]

: If speed isn't a requirement, it should be fine, right?
: Besides, isn't Redis often used as a message queue too?

w**z
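Posts: 8232

Deleting whole rows should avoid that problem, right? Or use several CFs, one per hour, and truncate each CF once it has been processed.
Lao Wei has no clue about C*, yet he'll flame anything.

[Quoting b*******g:]

: Cassandra is fast on writes and slow on reads, but an MQ reads in batches, so there is no performance problem at all.
: A simple time-based UUID as the key plus one index CF (column family) and you're done.
: The only thing to watch is tombstones: delete in batches, otherwise performance suffers.
: The eunuch has simply never used Cassandra.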
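The "one CF per hour, truncate when done" idea above can be sketched in memory as follows. This is not Cassandra code: the class `HourlyBuckets` and its methods are hypothetical names, and a Python dict stands in for the set of hourly column families. It only shows the shape of the idea, which is that a processed hour is dropped as a whole instead of deleting messages one by one.

```python
from collections import defaultdict

HOUR_MS = 3_600_000  # milliseconds per hour


class HourlyBuckets:
    """In-memory stand-in for 'one column family per hour' (hypothetical)."""

    def __init__(self):
        self._buckets = defaultdict(list)  # hour index -> [(ts_ms, message), ...]

    def append(self, ts_ms, message):
        self._buckets[ts_ms // HOUR_MS].append((ts_ms, message))

    def drain(self, hour):
        """Return one hour's messages in time order, then drop the whole bucket."""
        entries = self._buckets.pop(hour, [])
        entries.sort(key=lambda entry: entry[0])
        return [message for _, message in entries]

    def hours(self):
        return sorted(self._buckets)


q = HourlyBuckets()
q.append(2_000, "b")
q.append(1_000, "a")
q.append(HOUR_MS + 5, "c")  # falls into the next hour
first_hour = q.drain(0)
```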

t**********1
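Posts: 550

You have a clue? Then give me a design that handles 100k msg/s.

[Quoting w**z:]

: Deleting whole rows should avoid that problem, right? Or use several CFs, one per hour, and truncate each CF once it has been processed.
: Lao Wei has no clue about C*, yet he'll flame anything.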

w**z
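Posts: 8232

Use the timestamp as the row key: at 100k msg/s, 1 ms is only 100 messages. That's one row with 100 columns. What's the problem?

[Quoting t**********1:]

: You have a clue? Then give me a design that handles 100k msg/s.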
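The arithmetic behind that reply, as a small check: 100,000 messages per second spread over 1,000 millisecond-wide rows gives 100 columns per row. The sketch assumes perfectly evenly spaced arrivals, which is an assumption of the example and not something the thread states; real bursts would make some rows wider than others. `row_key` is a hypothetical helper.

```python
from collections import Counter

MSGS_PER_SEC = 100_000


def row_key(ts_us: int) -> int:
    """Row key = the millisecond the message arrived in (timestamp given in microseconds)."""
    return ts_us // 1000


# Evenly spaced arrivals over one second: one message every 10 microseconds.
arrivals_us = [i * 10 for i in range(MSGS_PER_SEC)]

# Number of messages (columns) that land in each millisecond row.
columns_per_row = Counter(row_key(t) for t in arrivals_us)
```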

t**********1
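Posts: 550

Think again. If you still can't figure it out, you'd better stay off the internet.

[Quoting w**z:]

: Use the timestamp as the row key: at 100k msg/s, 1 ms is only 100 messages. That's one row with 100 columns. What's the problem?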

w**z
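Posts: 8232

Can you say something concrete? I don't understand what you are ranting about.

[Quoting t**********1:]

: Think again. If you still can't figure it out, you'd better stay off the internet.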

b*******g
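Posts: 603

The idiot eunuch is embarrassing himself again. This is the most common Cassandra time-series pattern.
Use a time-based UUID as the key and put the key into an index CF, which keeps it sorted. The index CF itself can be sharded across multiple rows.
A write goes to the commit log with no locking at all, so writes are fully concurrent. To read, first read the index CF: give it a start time UUID and an end time UUID and fetch those keys from a row in one read (the column reads themselves can run concurrently), then use the keys to read the records, also concurrently. Reads are not as fast as writes, but an MQ is a buffer by nature and doesn't need to be real-time.
A 100K/s write peak is no pressure at all. Essentially writes are lock-free and sorting only happens at compaction, so the data is already sorted when you read it.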
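The read path described above (look up a key range in a sorted index, then fetch the records concurrently) can be sketched in memory. Everything here is a stand-in and an assumption of the example: a sorted Python list plays the index row, a dict plays the message CF, integer timestamps replace time UUIDs, and `IndexedQueueStore` is a hypothetical name. Real Cassandra writes do not perform a sorted insert like `bisect.insort`; that is only how the toy keeps its index ordered.

```python
import bisect
from concurrent.futures import ThreadPoolExecutor


class IndexedQueueStore:
    """Toy model: a sorted key index plus a key -> record store (hypothetical)."""

    def __init__(self):
        self._index = []    # sorted list of (timestamp, key); stands in for the index row
        self._records = {}  # key -> message body; stands in for the message CF

    def write(self, ts, key, body):
        self._records[key] = body
        bisect.insort(self._index, (ts, key))

    def read_range(self, start_ts, end_ts, workers=4):
        """Return bodies with start_ts <= timestamp < end_ts, in time order."""
        # (ts,) sorts before every (ts, key), so this finds the first entry at ts.
        lo = bisect.bisect_left(self._index, (start_ts,))
        hi = bisect.bisect_left(self._index, (end_ts,))
        keys = [key for _, key in self._index[lo:hi]]
        # Fetch the records concurrently; map() keeps the input (sorted) order.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(self._records.__getitem__, keys))


store = IndexedQueueStore()
for ts in (4, 1, 3, 5, 2):  # writes may arrive in any order
    store.write(ts, f"key-{ts}", f"message {ts}")

batch = store.read_range(2, 5)
```

The batch comes back as one sorted slice, which is the "you don't read next, you read a range" behaviour argued for in the post.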

N********n
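Posts: 8363

A QUEUE requires FIRST-IN FIRST-OUT, and NOSQL stores are all HASHTABLEs at heart. How efficient is a HASHTABLE at FIRST-IN FIRST-OUT? That's not what a HASHTABLE is for.

[Quoting d*******r:]

: If speed isn't a requirement, it should be fine, right?
: Besides, isn't Redis often used as a message queue too?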

p*****2
Posts: 21240

NOSQL stores are all HASHTABLEs at heart?

[Quoting N********n:]

: A QUEUE requires FIRST-IN FIRST-OUT, and NOSQL stores are all HASHTABLEs at heart. How efficient is a HASHTABLE at FIRST-IN FIRST-OUT? That's not what a HASHTABLE is for.

b*******g
Posts: 603

And why can't a hashtable be sorted?

[Quoting p*****2:]

: NOSQL stores are all HASHTABLEs at heart?

N********n
Posts: 8363

A KEY-VALUE PAIR structure is a HASHTABLE.

[Quoting p*****2:]

: NOSQL stores are all HASHTABLEs at heart?

b*******g
Posts: 603

Every row in Cassandra is sorted; you couldn't leave it unsorted even if you wanted to. Happy now?

[Quoting N********n:]

: A KEY-VALUE PAIR structure is a HASHTABLE.

p*****2
Posts: 21240

Redis has lists and sets.

[Quoting N********n:]

: A KEY-VALUE PAIR structure is a HASHTABLE.

N********n
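Posts: 8363

Sorting has OVERHEAD. And are all the rows on the same node? If they are spread across nodes, how do you know which node to go to for the next MESSAGE after reading one? A QUEUE has to be strictly FIRST-IN FIRST-OUT, unless your application doesn't CARE.

[Quoting b*******g:]

: Every row in Cassandra is sorted; you couldn't leave it unsorted even if you wanted to. Happy now?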

P****i
Posts: 12972

That just means the value is a list or a set; it's only a helper.
The k-v store itself is still a hashtable.

[Quoting p*****2:]

: Redis has lists and sets.

N********n
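Posts: 8363

A QUEUE is usually implemented as a DOUBLY LINKED LIST, isn't it? Under high concurrency, access to the MQ becomes a HOT SPOT; I don't see how NOSQL helps.
The basic premise of NOSQL is LOW COUPLING between data items; that is what lets it SCALE OUT. A QUEUE, however, requires its data to be FIRST IN FIRST OUT, and that FIFO is itself a form of COUPLING.

[Quoting p*****2:]

: Redis has lists and sets.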

b*******g
Posts: 603

Messages are distributed, but the index keys can live on one node (3 copies). Each key is
just a UUID, and you can easily fetch 10K of them in one read in a few ms because
those rows are cached in memory too.
You don't read "next": you get a batch of sorted keys by specifying a start and
an end, read the corresponding
messages concurrently, and you get a sorted queue.

[Quoting N********n:]

: A QUEUE is usually implemented as a DOUBLY LINKED LIST, isn't it? Under high concurrency, access to the MQ becomes a HOT SPOT; I don't see how NOSQL helps.
: The basic premise of NOSQL is LOW COUPLING between data items; that is what lets it SCALE OUT. A QUEUE, however, requires its data to be FIRST IN FIRST OUT, and that FIFO is itself a form of COUPLING.

N********n
Posts: 8363

So essentially those UUIDs are still maintained in one single spot,
which is no different from running on a single machine.

[Quoting b*******g:]

: Messages are distributed but index key can be on one node (3 copy), which is
: just a UUID and you can easily fetch 10K in a read for a few ms because
: those rows are cached in memory too.
: You don't read next, you get a batch of sort keys by specifying start and
: end, read the corresponding
: messages concurrently and you get a sorted queue.

p*****2
Posts: 21240

Have you looked at Kafka?

[Quoting N********n:]

: So essentially those UUIDs are still maintained in one single spot,
: which is no different from running on a single machine.

p*****2
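Posts: 21240

It depends on how you look at it. You can think of the key as the name of the list or set.
A list can guarantee FIFO, right? Why can't it serve as a queue?

[Quoting P****i:]

: That just means the value is a list or a set; it's only a helper.
: The k-v store itself is still a hashtable.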
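Both views in this exchange can be seen in one small in-memory sketch: the key lookup is a hashtable, while the value under the key is a list that gives FIFO order. The method names mirror the Redis `LPUSH` and `RPOP` commands (push at the head, pop from the tail), but the class `ListStore` is a hypothetical stand-in and does not talk to Redis.

```python
from collections import deque


class ListStore:
    """In-memory stand-in for a key -> list store (hypothetical, not a Redis client)."""

    def __init__(self):
        self._lists = {}  # hashtable: list name -> deque

    def lpush(self, name, value):
        """Push a value onto the head of the named list."""
        self._lists.setdefault(name, deque()).appendleft(value)

    def rpop(self, name):
        """Pop a value from the tail of the named list, or None if it is empty."""
        items = self._lists.get(name)
        return items.pop() if items else None


store = ListStore()
for message in ("first", "second", "third"):
    store.lpush("orders", message)

# Pushing at the head and popping at the tail yields first-in first-out order.
drained = [store.rpop("orders") for _ in range(3)]
```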

N********n
Posts: 8363

And what's the efficiency of this "specifying start and end" thing?
How do you quickly locate the exact UUIDs for a random start and end? Sounds
like an O(log N) operation, which is more expensive than a typical
ENQUEUE/DEQUEUE or hash table access. Those are O(1).

[Quoting b*******g:]
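To put a number on the O(log N) concern, here is a small sketch that counts the steps a binary search needs to locate a range boundary in a sorted list of 100,000 keys, next to an O(1) dequeue. The data, the list size, and the helper `lower_bound` are assumptions of the example; a sorted Python list stands in for a sorted index row and is not how Cassandra stores it.

```python
from collections import deque

sorted_keys = list(range(0, 1_000_000, 10))  # 100,000 sorted integer "timestamps"


def lower_bound(items, target):
    """Return (index of first item >= target, number of comparison steps)."""
    lo, hi, steps = 0, len(items), 0
    while lo < hi:
        mid = (lo + hi) // 2
        steps += 1
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo, steps


start_pos, start_steps = lower_bound(sorted_keys, 500)
end_pos, end_steps = lower_bound(sorted_keys, 1500)
batch = sorted_keys[start_pos:end_pos]  # one range read returns 100 keys

# A plain in-memory FIFO queue for comparison: each dequeue is O(1).
fifo = deque(sorted_keys[:3])
head = fifo.popleft()
```

Each boundary lookup takes at most 17 steps for this list size, and that cost is paid once per range read rather than once per message.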

N********n
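Posts: 8363

No magic here. Coupling and SCALABILITY cannot co-exist.

[Quoting p*****2:]

: It depends on how you look at it. You can think of the key as the name of the list or set.
: A list can guarantee FIFO, right? Why can't it serve as a queue?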

b*******g
Posts: 603

There are different types of MQ. For us, events are generated in bursts; the rate
can be as high as 100K/s and we can't lose them. We are optimizing for peak
writes, and C* works well for us there. The sink doesn't have to go as fast.

[Quoting N********n:]

:
: No magic here. Coupling and SCALABILITY cannot co-exist.

b*******g
Posts: 603

Cassandra is not an MQ; Cassandra is only the storage backing the MQ.
You can read one record at a time 100K times, or you can read 100K records at
a time and put them in memory. We all know which one is faster.
While the keys are centralized (some sharding is possible too), they are
very small and the messages are big. Concurrently retrieving messages from a
cluster is a big advantage as you won't have a hot spot.

[Quoting N********n:]

:
: No magic here. Coupling and SCALABILITY cannot co-exist.

w**z
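Posts: 8232

Why not use the timestamp directly as the row key, with the messages in each ms stored as columns of the same row? That does
require every client's clock to be synced via NTP.

[Quoting b*******g:]

: The idiot eunuch is embarrassing himself again. This is the most common Cassandra time-series pattern.
: Use a time-based UUID as the key and put the key into an index CF, which keeps it sorted. The index CF itself can be sharded across multiple rows.
: A write goes to the commit log with no locking at all, so writes are fully concurrent. To read, first read the index CF: give it a start time UUID and an end time UUID and fetch those keys from a row in one read (the column reads themselves can run concurrently), then use the keys to read the records, also concurrently. Reads are not as fast as writes, but an MQ is a buffer by nature and doesn't need to be real-time.
: A 100K/s write peak is no pressure at all. Essentially writes are lock-free and sorting only happens at compaction, so the data is already sorted when you read it.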

b*******g
Posts: 603

With your schema, writing is about the same, and your read is even
faster.
We keep one message per row to have some flexibility on the read, e.g. we
can have metadata and a message body and read the metadata only. When you
have multiple messages in one row, that's harder to achieve.
I think it's all a tradeoff and it depends on your application.

[Quoting w**z:]

: Why not use the timestamp directly as the row key, with the messages in each ms stored as columns of the same row? That does
: require every client's clock to be synced via NTP.
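The "one message per row, read the metadata only" flexibility can be pictured with a toy layout in which each row has a `meta` column and a `body` column. The column names, the sample data, and the helper `read_columns` are all hypothetical; a nested dict stands in for the column family.

```python
# One row per message key; each row has separate metadata and body columns.
messages = {
    "k1": {"meta": {"type": "order", "size": 3}, "body": "AAA"},
    "k2": {"meta": {"type": "refund", "size": 5}, "body": "BBBBB"},
}


def read_columns(store, keys, columns):
    """Return only the requested columns for each key, in the given key order."""
    return [{col: store[key][col] for col in columns} for key in keys]


# Scan the metadata without pulling the (potentially large) bodies.
meta_only = read_columns(messages, ["k1", "k2"], ["meta"])

# Fetch the full body only for the message that turns out to be interesting.
refund_body = read_columns(messages, ["k2"], ["body"])
```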

w**z
Posts: 8232

Makes sense.
Thanks.

[Quoting b*******g:]

: With your schema, writing is about the same. And your read is even
: faster.
: We keep one message on one row to have some flexibility on the read. e.g. We
: can have metadata and message body and we can read metadata only. When you
: have multiple messages in one row, it's harder to achieve that.
: I think it's all kind of tradeoff and it depends on your application.

N********n
Posts: 8363

Well, that's basically what the topic says - C* is not an MQ - right?
Your app has a specific need for batch message reading that can be
solved w/ C*. That's fine. If, however, the requirement is a pure
FIFO queue, then a doubly-linked list is what people will use.
Btw, the UUIDs themselves are a hot spot as they have to be centralized
and sorted. Perhaps your app does not access them too frequently so
you don't care.

[Quoting b*******g:]

: Cassandra is not an MQ, Cassandra is only a storage backing the MQ.
: You can read one record at a time 100K times or you can read 100K records at
: a time and put them in memory. We all know which one is faster.
: While the keys are centralized (some sharding is possible too), they are
: very small and messages are big. Concurrently retrieving messages from a
: cluster is a big advantage as you won't have a hot spot.

p*****2
Posts: 21240

A doubly-linked list:
how does that scale?

[Quoting N********n:]

:
: Well that's basically what the topic says - C* is not MQ - right?
: Your app has a specific need of batch message reading that can be
: solved w/ C*. That's fine. If, however, the requirement is a pure
: FIFO queue then a doubly-linked list is what people will use.
: Btw UUIDs themselves are hot spot as they have to be centralized
: and sorted. Perhaps your app does not access it too frequently so
: you don't care.

b*******g
Posts: 603

For context, we are talking about 12306, where I saved the orders to C*.
C* is a DB; a DB is certainly not an MQ, but it can be the storage behind an MQ.
For a time series like that, you grab the keys, read the orders, and never
need to visit the same sorted keys in the DB again. It's pretty much one write
and one read, with the read being offline and batched. It will work just fine.

[Quoting N********n:]

N********n
Posts: 8363

Anything that has to be centralized for everyone else to access cannot
scale.

[Quoting p*****2:]

: A doubly-linked list:
: how does that scale?

b*******g
Posts: 603

It can be replicated.

[Quoting N********n:]

:
: Anything that has to be centralized for everyone else to access cannot
: scale.
