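t**********1 Posts: 550 | |
p*****2 Posts: 21240 | |
t**********1 Posts: 550 | 3 The idiots have all gathered in one place, and they egg each other on.
This board can go one of two ways: get dumber or get less dumb.
[Quoting p*****2] : Java people all seem to be very fond of this.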
|
l*********s Posts: 5409 | |
p*****2 Posts: 21240 | 5
That should be Cassandra. I remember 好虫's first version used it, but I didn't follow what came after.
[Quoting l*********s] : what is c*?
|
l*********s Posts: 5409 | 6 I see, thanks.
[Quoting p*****2] : : That should be Cassandra. I remember 好虫's first version used it, but I didn't follow what came after.
|
w**z Posts: 8232 | 7 Can you be more specific?
[Quoting t**********1] : The state of this board is worrying.
|
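d*******r Posts: 3299 | 8 If speed isn't a requirement, it can work.
Besides, isn't Redis often used as a message queue too? |
w**z Posts: 8232 | 9 C* does 20k writes per second on a single node, which isn't slow. It fits this use well.
[Quoting d*******r] : If speed isn't a requirement, it can work. : Besides, isn't Redis often used as a message queue too?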
|
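b*******g Posts: 603 | 10 Cassandra writes fast and reads slowly, but an MQ reads in batches, so there is no performance problem at all.
A simple time-based UUID as the key plus one index CF (column family) and you're done.
The only thing to watch is tombstones: delete in batches, otherwise performance suffers.
The eunuch has clearly never used Cassandra.
[Quoting w**z] : C* does 20k writes per second on a single node, which isn't slow. It fits this use well.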
|
p*****3 Posts: 488 | 11
A message queue has to guarantee delivery.
[Quoting d*******r] : If speed isn't a requirement, it can work. : Besides, isn't Redis often used as a message queue too?
|
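w**z Posts: 8232 | 12 Deleting whole rows should avoid that problem, right? Or create several CFs, one per hour, and truncate each CF once it has been processed.
Old Wei has no clue about C* and still trashes everything.
[Quoting b*******g] : Cassandra writes fast and reads slowly, but an MQ reads in batches, so there is no performance problem at all. : A simple time-based UUID as the key plus one index CF (column family) and you're done. : The only thing to watch is tombstones: delete in batches, otherwise performance suffers. : The eunuch has clearly never used Cassandra.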
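
The "one CF per hour, truncate when done" idea above can be sketched in a few lines. This is only an in-memory stand-in, not Cassandra client code: the `buckets` dict plays the role of the hourly CFs, and the names `bucket_name`, `enqueue`, `drain` and the `mq_` prefix are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical in-memory stand-in: one list per hourly "CF".
buckets = defaultdict(list)

def bucket_name(ts: float) -> str:
    """Map a Unix timestamp (seconds, UTC) to an hourly bucket name."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("mq_%Y%m%d%H")

def enqueue(ts: float, msg: str) -> None:
    """Append a message to the bucket for its hour."""
    buckets[bucket_name(ts)].append((ts, msg))

def drain(name: str) -> list:
    """Read a whole bucket in time order, then drop it in one step.
    Dropping the bucket is the analogue of truncating the CF instead of
    deleting messages one at a time."""
    msgs = sorted(buckets.pop(name, []))
    return [m for _, m in msgs]
```

The point of the design is that cleanup becomes one operation per hour rather than one delete per message, which is how it sidesteps the per-message tombstones mentioned above.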
|
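t**********1 Posts: 550 | 13 And you have a clue? Then give me a design that handles 100k msg/s.
[Quoting w**z] : Deleting whole rows should avoid that problem, right? Or create several CFs, one per hour, and truncate each CF once it has been processed. : Old Wei has no clue about C* and still trashes everything.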
|
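w**z Posts: 8232 | 14 Use the timestamp as the row key. At 100k msg/s, 1 ms is only 100 msgs, so each row has just 100 columns. What's the problem?
[Quoting t**********1] : And you have a clue? Then give me a design that handles 100k msg/s.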
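
The arithmetic behind "1 ms is only 100 msgs" can be checked with a quick simulation. It assumes messages arrive evenly spaced, which real traffic will not do, and `row_key` is a hypothetical helper that truncates a microsecond timestamp to the millisecond.

```python
from collections import Counter

RATE_PER_SEC = 100_000   # target rate from the thread

def row_key(ts_us: int) -> int:
    """Row key = timestamp truncated to the millisecond."""
    return ts_us // 1_000

# Simulate one second of evenly spaced messages (an idealized assumption).
interval_us = 1_000_000 // RATE_PER_SEC   # 10 microseconds apart
columns_per_row = Counter(row_key(i * interval_us) for i in range(RATE_PER_SEC))

print(len(columns_per_row))            # rows written in one second
print(set(columns_per_row.values()))   # distinct column counts per row
```

Under the even-spacing assumption this gives 1,000 rows of 100 columns each. Bursty traffic would make some rows wider than others.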
|
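t**********1 Posts: 550 | 15 Think again. If you can't figure it out, you'd better stay off the internet.
[Quoting w**z] : Use the timestamp as the row key. At 100k msg/s, 1 ms is only 100 msgs, so each row has just 100 columns. What's the problem?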
|
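w**z Posts: 8232 | 16 Can you say something concrete? I don't get what you're ranting about.
[Quoting t**********1] : Think again. If you can't figure it out, you'd better stay off the internet.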
|
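b*******g Posts: 603 | 17 The idiot eunuch is out embarrassing himself again. This is the most common Cassandra time series pattern.
Use a time-based UUID as the key, and put the key into an index CF where it is kept sorted. The index CF itself can be sharded into multiple rows.
Writes go to the commit log with no locking at all, fully concurrent. To read, first read the index CF: give it a start time UUID and an end time UUID and get all the keys in that range from one row in a single read. Reading the columns can itself be concurrent, and fetching the records for those keys is concurrent too. Reads are not as fast as writes, but an MQ is a buffer by nature and doesn't need to be real-time.
A write peak of 100K/s is no pressure at all. In essence it is lock-free writing, and sorting only happens at compaction. By the time you read, the data is already sorted. |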
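
A minimal in-memory sketch of the layout described above, assuming nothing about the real Cassandra API: `messages` stands in for the message CF, `index_row` for one sorted row of the index CF, and a `(timestamp, sequence)` tuple stands in for a time-based UUID. All names here are hypothetical.

```python
import bisect
from itertools import count

_seq = count()
messages = {}      # stand-in for the message CF: key -> payload
index_row = []     # stand-in for one index row: keys kept in sorted order

def write(ts_us: int, payload: str) -> tuple:
    """Write a message under a time-based key and record the key in the index."""
    key = (ts_us, next(_seq))        # (timestamp, tiebreaker), like a time UUID
    messages[key] = payload
    bisect.insort(index_row, key)    # the real store would do the sorting itself
    return key

def read_range(start_us: int, end_us: int) -> list:
    """Slice the index for [start_us, end_us), then fetch each message by key."""
    lo = bisect.bisect_left(index_row, (start_us,))
    hi = bisect.bisect_left(index_row, (end_us,))
    return [messages[k] for k in index_row[lo:hi]]
```

For example, after `write(30, "c")`, `write(10, "a")`, `write(20, "b")`, a call to `read_range(10, 30)` returns the messages stamped 10 and 20 in time order, regardless of the order they were written in.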
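N********n Posts: 8363 | 18
A queue has to be first-in first-out, and NoSQL stores are all hash tables at heart. How efficient is a hash table at doing first-in first-out? That is not what a hash table is for.
[Quoting d*******r] : If speed isn't a requirement, it can work. : Besides, isn't Redis often used as a message queue too?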
|
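p*****2 Posts: 21240 | 19
NoSQL stores are all hash tables at heart?
[Quoting N********n] : : A queue has to be first-in first-out, and NoSQL stores are all hash tables at heart. How efficient is a hash table at doing first-in first-out? That is not what a hash table is for.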
|
b*******g Posts: 603 | 20 And why can't a hash table be sorted?
[Quoting p*****2] : : NoSQL stores are all hash tables at heart?
|
N********n Posts: 8363 | 21
A key-value pair structure is a hash table.
[Quoting p*****2] : : NoSQL stores are all hash tables at heart?
|
b*******g Posts: 603 | 22 Every row in Cassandra is sorted; you couldn't leave it unsorted even if you wanted to. Happy now?
[Quoting N********n] : : A key-value pair structure is a hash table.
|
p*****2 Posts: 21240 | 23
Redis has lists and sets.
[Quoting N********n] : : A key-value pair structure is a hash table.
|
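N********n Posts: 8363 | 24
Sorting has overhead. And are all the rows on the same node? If they are spread across nodes, how do you know which node to read the next message from after finishing one? A queue has to be strictly first-in first-out, unless your application doesn't care.
[Quoting b*******g] : Every row in Cassandra is sorted; you couldn't leave it unsorted even if you wanted to. Happy now?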
|
P****i Posts: 12972 | 25 That just means the value is a list or a set, which is only a helper.
The k-v store itself is still a hash table.
[Quoting p*****2] : : Redis has lists and sets.
|
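N********n Posts: 8363 | 26
A queue is usually implemented as a doubly linked list. Under high concurrency, access to the MQ becomes a hot spot, and I don't see how NoSQL helps with that.
The basic premise of NoSQL is low coupling between data items; that is what lets it scale out. A queue requires its data to be first in, first out, and that FIFO requirement is itself a form of coupling.
[Quoting p*****2] : : Redis has lists and sets.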
|
b*******g Posts: 603 | 27 Messages are distributed, but the index keys can live on one node (3 copies). Each key is
just a UUID, and you can easily fetch 10K of them in one read in a few ms because
those rows are cached in memory too.
You don't read "next". You get a batch of sorted keys by specifying a start and
an end, read the corresponding
messages concurrently, and you get a sorted queue.
[Quoting N********n] : : A queue is usually implemented as a doubly linked list. Under high concurrency, access to the MQ becomes a hot spot, and I don't see how NoSQL helps with that. : The basic premise of NoSQL is low coupling between data items; that is what lets it scale out. A queue requires its data to be first in, first out, and that FIFO requirement is itself a form of coupling.
|
N********n Posts: 8363 | 28
So essentially those UUIDs are still maintained in one single spot,
which is no different from running on a single machine.
[Quoting b*******g] : Messages are distributed, but the index keys can live on one node (3 copies). Each key is : just a UUID, and you can easily fetch 10K of them in one read in a few ms because : those rows are cached in memory too. : You don't read "next". You get a batch of sorted keys by specifying a start and : an end, read the corresponding : messages concurrently, and you get a sorted queue.
|
p*****2 Posts: 21240 | 29
Have you looked at Kafka?
[Quoting N********n] : : So essentially those UUIDs are still maintained in one single spot, : which is no different from running on a single machine.
|
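p*****2 Posts: 21240 | 30
It depends on how you look at it. You can think of the key as the name of the list or set.
A list can guarantee FIFO, right? Why can't it serve as a queue?
[Quoting P****i] : That just means the value is a list or a set, which is only a helper. : The k-v store itself is still a hash table.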
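
To make the list-as-queue point concrete, here is an in-memory stand-in for a single Redis list. It does not talk to Redis; the class `ListQueue` is hypothetical and only mirrors the semantics of the `LPUSH` and `RPOP` commands (push at the head, pop from the tail).

```python
from collections import deque

class ListQueue:
    """In-memory stand-in for one Redis list used as a queue."""

    def __init__(self):
        self._items = deque()

    def lpush(self, value):
        """Like LPUSH: add a value at the head (left) of the list."""
        self._items.appendleft(value)

    def rpop(self):
        """Like RPOP: remove and return the tail (right) value, None if empty."""
        return self._items.pop() if self._items else None
```

Pushing at one end and popping at the other yields values in the order they were pushed, which is the FIFO behavior asked about. Whether that scales across nodes is the separate question the thread goes on to argue.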
|
N********n Posts: 8363 | 31
And what's the efficiency of this "specifying start and end" thing?
How do you quickly locate the exact UUIDs for a random start and end? It sounds
like an O(log N) operation, which is more expensive than a typical
enqueue/dequeue or hash table access. Those are O(1).
[Quoting b*******g] : Messages are distributed, but the index keys can live on one node (3 copies). Each key is : just a UUID, and you can easily fetch 10K of them in one read in a few ms because : those rows are cached in memory too. : You don't read "next". You get a batch of sorted keys by specifying a start and : an end, read the corresponding : messages concurrently, and you get a sorted queue.
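
To put rough numbers on the O(log N) versus O(1) comparison, here is a back-of-the-envelope cost model. It is an assumption-laden sketch, not a measurement: it counts one binary search over a sorted index to find the start key, then one step per key in the batch, and ignores network and disk entirely. The function name is made up.

```python
import math

def comparisons_per_message(index_size: int, batch_size: int) -> float:
    """Rough cost model: one binary search to find the start key
    (about log2(index_size) comparisons), then one step per key in the batch."""
    lookup = math.log2(index_size)
    return (lookup + batch_size) / batch_size

# One message per lookup: the log factor dominates.
print(round(comparisons_per_message(1_000_000, 1), 2))
# 10K messages per lookup: the log factor is amortized away.
print(round(comparisons_per_message(1_000_000, 10_000), 4))
```

In this model the logarithmic lookup is paid once per batch, so its per-message share shrinks as the batch grows. Reading one message at a time would pay it on every read.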
|
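N********n Posts: 8363 | 32
No magic here. Coupling and scalability cannot co-exist.
[Quoting p*****2] : : It depends on how you look at it. You can think of the key as the name of the list or set. : A list can guarantee FIFO, right? Why can't it serve as a queue?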
|
b*******g Posts: 603 | 33 There are different types of MQ. For us, events are generated in bursts that
can be as high as 100K/s, and we can't lose them. We are optimizing for peak
writes, and C* works well for us there. The sink doesn't have to go as fast.
[Quoting N********n] : : No magic here. Coupling and scalability cannot co-exist.
|
b*******g Posts: 603 | 34 Cassandra is not an MQ; Cassandra is only the storage backing the MQ.
You can read one record at a time 100K times, or you can read 100K records at
a time and put them in memory. We all know which one is faster.
While the keys are centralized (some sharding is possible too), they are
very small and the messages are big. Concurrently retrieving messages from a
cluster is a big advantage, as you won't have a hot spot.
[Quoting N********n] : : No magic here. Coupling and scalability cannot co-exist.
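
The "small centralized keys, big distributed messages" read path can be sketched as follows. Everything here is a toy: `NODES` is three in-process dicts standing in for storage nodes, and the modulo placement rule in `node_for` is an assumption for illustration, not how Cassandra places data.

```python
from concurrent.futures import ThreadPoolExecutor

NODES = [dict(), dict(), dict()]   # three hypothetical storage nodes

def node_for(key: int) -> dict:
    """Toy placement rule: spread keys across nodes by modulo."""
    return NODES[key % len(NODES)]

def put(key: int, body: str) -> None:
    """Store a message body on the node that owns its key."""
    node_for(key)[key] = body

def fetch_batch(sorted_keys: list) -> list:
    """Fetch message bodies for a batch of keys in parallel.
    pool.map returns results in input order, so the batch stays sorted."""
    with ThreadPoolExecutor(max_workers=len(NODES)) as pool:
        return list(pool.map(lambda k: node_for(k)[k], sorted_keys))
```

The sorted key list fixes the order up front, so the bodies can be fetched from different nodes in any timing and still come back as an ordered batch.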
|
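w**z Posts: 8232 | 35 Why not use the timestamp directly as the row key, with the messages in each ms stored as columns of the same row? It does
require all clients' clocks to be synced with NTP, though.
[Quoting b*******g] : The idiot eunuch is out embarrassing himself again. This is the most common Cassandra time series pattern. : Use a time-based UUID as the key, and put the key into an index CF where it is kept sorted. The index CF itself can be sharded into multiple rows. : Writes go to the commit log with no locking at all, fully concurrent. To read, first read the index CF: give it a start time UUID and an end time UUID and get all the keys in that range from one row in a single read. Reading the columns can itself be concurrent, and fetching the records for those keys is concurrent too. Reads are not as fast as writes, but an MQ is a buffer by nature and doesn't need to be real-time. : A write peak of 100K/s is no pressure at all. In essence it is lock-free writing, and sorting only happens at compaction. By the time you read, the data is already sorted.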
|
b*******g Posts: 603 | 36 With your schema, writing is about the same, and your read is even
faster.
We keep one message per row to have some flexibility on the read side. E.g., we
can store metadata and the message body separately and read the metadata only. When you
have multiple messages in one row, that is harder to achieve.
I think it's all a tradeoff, and it depends on your application.
[Quoting w**z] : Why not use the timestamp directly as the row key, with the messages in each ms stored as columns of the same row? It does : require all clients' clocks to be synced with NTP, though.
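
A small sketch of the one-message-per-row layout and the metadata-only read it allows. The `rows` dict stands in for the message CF, and the column names `meta` and `body` are assumptions chosen for illustration.

```python
rows = {}   # stand-in for the message CF: row key -> {column name: value}

def write_message(key: int, meta: dict, body: bytes) -> None:
    """Store metadata and body as separate columns of one row."""
    rows[key] = {"meta": meta, "body": body}

def read_columns(keys: list, column: str) -> list:
    """Read a single named column from each row, skipping the others."""
    return [rows[k][column] for k in keys]
```

With this layout, `read_columns(keys, "meta")` returns only the small metadata values and never touches the large bodies, which is the flexibility described above.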
|
w**z Posts: 8232 | 37 Makes sense.
Thanks.
[Quoting b*******g] : With your schema, writing is about the same, and your read is even : faster. : We keep one message per row to have some flexibility on the read side. E.g., we : can store metadata and the message body separately and read the metadata only. When you : have multiple messages in one row, that is harder to achieve. : I think it's all a tradeoff, and it depends on your application.
|
N********n Posts: 8363 | 38
Well, that's basically what the topic says: C* is not an MQ, right?
Your app has a specific need for batch message reading that can be
solved with C*. That's fine. If, however, the requirement is a pure
FIFO queue, then a doubly linked list is what people will use.
Btw, the UUIDs themselves are a hot spot, as they have to be centralized
and sorted. Perhaps your app does not access them too frequently, so
you don't care.
[Quoting b*******g] : Cassandra is not an MQ; Cassandra is only the storage backing the MQ. : You can read one record at a time 100K times, or you can read 100K records at : a time and put them in memory. We all know which one is faster. : While the keys are centralized (some sharding is possible too), they are : very small and the messages are big. Concurrently retrieving messages from a : cluster is a big advantage, as you won't have a hot spot.
|
p*****2 Posts: 21240 | 39
A doubly linked list?
How does that scale?
[Quoting N********n] : : Well, that's basically what the topic says: C* is not an MQ, right? : Your app has a specific need for batch message reading that can be : solved with C*. That's fine. If, however, the requirement is a pure : FIFO queue, then a doubly linked list is what people will use. : Btw, the UUIDs themselves are a hot spot, as they have to be centralized : and sorted. Perhaps your app does not access them too frequently, so : you don't care.
|
b*******g Posts: 603 | 40 For context, we are talking about 12306, where I saved the orders to C*.
C* is a DB. A DB is certainly not an MQ, but it can be the storage behind an MQ.
For a time series like that, you grab the keys, you read the orders, and you never
need to visit the same sorted keys in the DB again. It's pretty much one write
and one read, with the read being offline and batched. It will work just fine.
[Quoting N********n] : : Well, that's basically what the topic says: C* is not an MQ, right? : Your app has a specific need for batch message reading that can be : solved with C*. That's fine. If, however, the requirement is a pure : FIFO queue, then a doubly linked list is what people will use. : Btw, the UUIDs themselves are a hot spot, as they have to be centralized : and sorted. Perhaps your app does not access them too frequently, so : you don't care.
|
N********n Posts: 8363 | 41
Anything that has to be centralized for everyone else to access cannot
scale.
[Quoting p*****2] : : A doubly linked list? : How does that scale?
|
b*******g Posts: 603 | 42 It can be replicated.
[Quoting N********n] : : Anything that has to be centralized for everyone else to access cannot : scale.
|