L******e Posts: 136 | 1 Many software systems now need to handle big data. What does everyone here use for it: a traditional
database, one of the newer NoSQL databases, or something else? |
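h****r Posts: 2056 | 2 The first step is to decide how you plan to store the data.
Distributed storage has its own set of approaches, and centralized storage has its own.
Traditional databases are not well suited to big data in the true sense. Even Oracle is rolling out interfaces to connect to big data systems.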
【In reply to L******e】
|
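w***g Posts: 5958 | 3 Different people mean different things by "big". One hard drive can hold 3TB. A machine today takes four drives, so even
mirrored that is 6TB. For some people, then, 6TB is just small data. But if you need to load it into memory to compute on it,
even 1TB counts as big data.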
【In reply to L******e】
|
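X****r Posts: 3557 | 4 Big data is not about absolute size but about scalability. In other words, if the resources required grow
close to linearly with data/traffic while processing/response time stays essentially constant, the architecture can be considered one that handles big data.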
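A rough way to picture that definition, as a minimal sketch only: hash keys across nodes and grow the node count in step with the data, so the load on the busiest node stays about the same. The function names and the modulo-sharding scheme are assumptions made for illustration, not something proposed in this thread.

```python
import hashlib


def shard_for(key: str, num_nodes: int) -> int:
    """Pick a node for a key by hashing it (simple modulo sharding)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


def max_keys_per_node(num_keys: int, num_nodes: int) -> int:
    """Return the number of keys held by the most loaded node."""
    counts = [0] * num_nodes
    for i in range(num_keys):
        counts[shard_for(f"key-{i}", num_nodes)] += 1
    return max(counts)


if __name__ == "__main__":
    # Data and nodes grow together; the per-node load should stay roughly flat.
    for scale in (1, 2, 4, 8):
        keys, nodes = 10_000 * scale, 4 * scale
        print(f"keys={keys} nodes={nodes} busiest_node={max_keys_per_node(keys, nodes)}")
```

Plain modulo sharding moves most keys when the node count changes, so real systems usually prefer something like consistent hashing. The sketch only shows the "resources grow linearly, per-node work stays flat" idea.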
【In reply to w***g】
|
m*******p Posts: 141 | 5 This answer makes sense!!
Thanks.
Could you also give a few tips on the popular methods?
For example,
this reminds me of Hadoop: map/reduce provides a good interface for
processing a single big file on top of HDFS, blablabla.......
I don't actually have any experience with this, but I want to have something to say
when an interviewer brings up this topic.
Thanks!
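For anyone who wants something concrete to describe in an interview, the map/reduce idea can be sketched in a few lines of plain Python. This is an in-process toy, not the Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence on an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big deal"]))
```

In a real cluster the map and reduce calls run on many machines and the shuffle moves data over the network; here everything runs in one process, so only the programming model is shown.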
【In reply to X****r】
|
g*****g Posts: 34805 | 6 Most large-scale applications have their bottleneck at the DB.
People have been using caching and, in recent years,
NoSQL DBs to tackle the problem.
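The caching part can be sketched as a cache-aside lookup: check the cache first and hit the DB only on a miss. The class and parameter names are hypothetical, and a plain dict stands in for a real cache server.

```python
from typing import Callable, Dict


class CacheAside:
    """Read-through helper: serve from the cache, fall back to the DB on a miss."""

    def __init__(self, load_from_db: Callable[[str], str]) -> None:
        self._load = load_from_db          # hypothetical DB lookup function
        self._cache: Dict[str, str] = {}   # stand-in for a real cache server
        self.db_reads = 0                  # counts how often the DB was hit

    def get(self, key: str) -> str:
        if key in self._cache:
            return self._cache[key]
        self.db_reads += 1
        value = self._load(key)
        self._cache[key] = value
        return value

    def invalidate(self, key: str) -> None:
        """Drop a cached entry, e.g. after the underlying row was updated."""
        self._cache.pop(key, None)


if __name__ == "__main__":
    fake_db = {"user:1": "alice"}
    cache = CacheAside(fake_db.__getitem__)
    cache.get("user:1")
    cache.get("user:1")
    print("DB reads:", cache.db_reads)
```

The second read never reaches the DB, which is the whole point; the hard part in practice is deciding when to invalidate.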
【In reply to m*******p】
|
d*******1 Posts: 854 | 7 Can you elaborate a little more on NoSQL DBs?
Thanks
【In reply to g*****g】
|
g*****g Posts: 34805 | 8 That's a big topic, but the place to start is the CAP theorem, if you have heard of it.
Basically it's availability vs. consistency.
A traditional DB is consistent but cannot be clustered with
linear scalability. NoSQL DBs use so-called eventual
consistency to achieve linear scalability.
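A toy picture of eventual consistency, as a sketch under loud assumptions: two replicas accept writes independently, conflicts are resolved by last-write-wins on a caller-supplied timestamp, and a `sync` step plays the role of background replication. None of these names come from a real NoSQL product.

```python
from typing import Dict, Optional, Tuple


class Replica:
    """One node holding key -> (timestamp, value); the newest timestamp wins."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: Dict[str, Tuple[int, str]] = {}

    def write(self, key: str, value: str, ts: int) -> None:
        current = self.store.get(key)
        if current is None or ts > current[0]:
            self.store[key] = (ts, value)

    def read(self, key: str) -> Optional[str]:
        entry = self.store.get(key)
        return entry[1] if entry else None


def sync(a: Replica, b: Replica) -> None:
    """Exchange entries in both directions so the replicas converge."""
    for key, (ts, value) in list(a.store.items()):
        b.write(key, value, ts)
    for key, (ts, value) in list(b.store.items()):
        a.write(key, value, ts)


if __name__ == "__main__":
    a, b = Replica("a"), Replica("b")
    a.write("x", "1", ts=1)
    b.write("x", "2", ts=2)
    print("before sync:", a.read("x"), b.read("x"))
    sync(a, b)
    print("after sync:", a.read("x"), b.read("x"))
```

Both replicas stay writable the whole time (availability), and readers may see different values until the sync runs (consistency is only eventual).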
【In reply to d*******1】
|
c****e Posts: 1453 | 9 An engineer at Twitter (Nathan Marz) has a very good blog post on how to beat CAP.
http://nathanmarz.com/blog/how-to-beat-the-cap-theorem.html
Hadoop-based Map-Reduce is considered not fast enough for many applications:
a job might take hours to run. In many cases you need two
layers: an in-memory DB for event stream processing, and Map-Reduce-based
batch processing.
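The two-layer idea can be sketched like this: a slow batch job periodically recomputes a view over all events, an in-memory layer counts whatever has arrived since, and a query adds the two. The class and method names are hypothetical, and the batch job is just a function call here rather than a real Map-Reduce run.

```python
from collections import Counter
from typing import Iterable


class TwoLayerCounter:
    """Combine a periodically rebuilt batch view with an in-memory realtime view."""

    def __init__(self) -> None:
        self.batch_view: Counter = Counter()     # result of the last batch run
        self.realtime_view: Counter = Counter()  # events seen since that run

    def on_event(self, key: str) -> None:
        """Stream layer: update the in-memory count immediately."""
        self.realtime_view[key] += 1

    def run_batch(self, all_events: Iterable[str]) -> None:
        """Batch layer: recompute from the full event log, then reset the stream layer.

        Simplification: assumes all_events already includes everything the
        realtime view has counted so far.
        """
        self.batch_view = Counter(all_events)
        self.realtime_view.clear()

    def query(self, key: str) -> int:
        return self.batch_view[key] + self.realtime_view[key]


if __name__ == "__main__":
    counter = TwoLayerCounter()
    log = ["a", "a", "b"]
    for event in log:
        counter.on_event(event)
    counter.run_batch(log)
    counter.on_event("a")
    print(counter.query("a"))
```

In a real deployment the batch job runs alongside the stream, so the realtime layer has to drop only the events the finished batch actually covered; the sketch skips that bookkeeping.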
【In reply to g*****g】
|