Facebook Announces Apollo, a New NoSQL Database for On-line Low Latency Storage - MobileDevelopment版

本页内容为未名空间相应帖子的节选和存档，一周内的贴子最多显示50字，超过一周显示500字访问原贴

MobileDevelopment版 - Facebook Announces Apollo, a New NoSQL Database for On-line Low Latency Storage

相关主题
● 感觉vert.x的设计很一般呀	● 为什么板上这么多人还是抱着C++不学Java呢？
● Cassandra Rewritten In C++, Ten Times Faster	● 吐槽g家package, 顺便说说昨天L家面经
● 岳东晓知识产权案获胜	● Leetcode用Python刷通关可以去面试不
● 求内推湾区IT公司职位	● 怎样把LAMP server和C++写的后台处理程序结合起来？（新手问）
● C++问题3	● 面试 Expectation　问题
● 我也来说说我Amazon的onsite经历吧	● [转]Alibaba全球招华人技术牛人！！！ (转载)
● CS选课求问	● 求内推湾区IT公司职位
● 生平第一次用C#写程序	● [转]Alibaba全球招华人技术牛人！！！

相关话题的讨论汇总
话题: facebook话题: apollo话题: storage话题: johnson话题: shard

进入MobileDevelopment版参与讨论

(共1页)

z*******n
发帖数: 1034

Facebook Announces Apollo, a New NoSQL Database for On-line Low Latency
Storage
by Charles Humble on Jun 13, 2014 | Discuss

Speaking at QCon New York on Wednesday Jeff Johnson, from the core data
group at Facebook, announced Apollo, Facebook’s Paxos-like NoSQL database.
Written in C++11 on top of the Apache Thrift 2 RPC framework, Apollo is a
hierarchical storage system where all the data is split into shards, very
much analogous to region servers in HBase. The sweet-spot for it, Johnson
explained, is on-line low latency storage - in particular Flash and in-
memory.
As distinct from a document oriented, or key value store, Apollo is about
modifications to data structures, allowing you to represent maps, queues,
trees and so on, as well as key values. Within the system individual pieces
of data are quite small - a range of between 1 byte and 1MB, with a total
size anywhere from 1MB to 10+PB. It supports anything from a minimum of
three servers to thousands.
Each Shard has four components. The first is a quorum consensus protocol
which is based on Raft, a strong leader consensus protocol from Stanford.
Johnson explained that one of the things that his team really like about
Raft is that the leader failure recovery is really well defined, as is the
quorum view change. That said, he suggested, it isn’t really simpler to
work with than multi-paxos:
We’ve had to do tons and tons and stuff - everything from allowing you
to asynchronous write and read from disk to trying to deal with situations
when followers are getting behind because there’s other stuff going on on
the server or the disk is slow, corruption detection, and so on
The second component is storage. At the time of writing the primary storage
is based RocksDB, a Key/Value store that builds on top of Google’s LevelDB.
Whilst it is a Key/Value store Facebook are using it to emulate other data
structures. Apollo is designed to be storage agnostic and the team are also
working on adding support for MySQL as an alternative storage engine.
The third component is a Client API with read() and write() methods. Every
operation that Apollo performs at a Shard level is atomic, so you express
pre-conditions and if these are satisfied it returns the reads or writes.
For example this code:
read(conditions : {map(m1).contains(x)},
reads : {deque(d2).back()})
Says “If the map m1 contains the value x then return the value that is on
the back of the d2 deque.”
You can combine together any number of conditions and any number of reads.
Writes are very similar and again allow you to express conditions:
write(conditions : {ver(k1) == v}, reads : {},
writes : {val(k1) := x})
The final of the four shard components are Fault Tolerant State Machines (
FTSMs). These are primarily used by the system code but can also be used for
user code. Each FTSM is owned by a shard so that, for example, in a shard
of three machines all of them will be executing the same code at the same
time. They are able to access the persistent storage that is local to each
machine. Most importantly if one node dies the code continues to execute in
a proper order that all the nodes agree on.
Amongst other things the state machines are used for load balancing, data
migration, shard creation and destruction, and co-ordinating cross-shard
transactions. State machines can have external side effects, for example
they can send RPC requests to remote machines, but whenever they make a
change to persistent state they have to submit it to Raft to get all the
servers to agree.
Apollo isn't currently being used in production at Facebook, but the firm is
looking at using it to replace some memcahced use cases, and Johnson made
clear that Facebook makes very significant use of memcahced. "More
generally," Johnson told InfoQ "we’re looking into various in-memory
storage use cases at Facebook, either new ones or replacing some existing
ones, by comparing side-by-side with existing systems."
The company is also looking at using Apollo as a reliable queuing system for
outgoing Facebook messages to iOS, Android and carriers via SMS, and also
potentially for faster analytics.
Apollo is still in development and hasn’t been open-sourced though Johnson
did state that doing so was something Facebook were looking into and would
like to do. Johnson's presentation is currently available to QCon New York
attendees and will be published to everyone via InfoQ in due course.

(共1页)

进入MobileDevelopment版参与讨论

相关主题
● [转]Alibaba全球招华人技术牛人！！！	● C++问题3
● 岳东晓知识产权案获胜 (转载)	● 我也来说说我Amazon的onsite经历吧
● 要不要跳ASP.net and C# 坑？	● CS选课求问
● need your comments	● 生平第一次用C#写程序
● 感觉vert.x的设计很一般呀	● 为什么板上这么多人还是抱着C++不学Java呢？
● Cassandra Rewritten In C++, Ten Times Faster	● 吐槽g家package, 顺便说说昨天L家面经
● 岳东晓知识产权案获胜	● Leetcode用Python刷通关可以去面试不
● 求内推湾区IT公司职位	● 怎样把LAMP server和C++写的后台处理程序结合起来？（新手问）

相关话题的讨论汇总
话题: facebook话题: apollo话题: storage话题: johnson话题: shard

#	版面	帖数(主题数)
-	全站	4871 (796)
1	Military	3777 (569)
2	Stock	341 (51)
3	Joke	117 (17)
4	History	116 (3)
5	Automobile	100 (9)
6	USANews	55 (9)
7	Midlife	45 (1)
8	Headline	41 (41)
9	Dreamer	33 (13)
10	FleaMarket	32 (20)
11	Living	30 (7)

boards

未名新帖统计// 7月16日

历史上的今天