S******y Posts: 1123 | 1 With new Apache projects appearing all the time, let's review some of the
big data terms we hear so often -
Accumulo
a sorted, distributed key/value store that provides robust, scalable data
storage and retrieval
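To make "sorted key/value store" concrete, here is a minimal single-process sketch in plain Python. It is not the Accumulo API: the class `SortedKVStore` and its methods are hypothetical names made up for illustration. It only shows why keeping keys sorted makes range scans cheap.

```python
import bisect


class SortedKVStore:
    """Toy in-memory sorted key/value store (hypothetical, not the Accumulo API)."""

    def __init__(self):
        self._keys = []    # keys kept in sorted order
        self._values = {}  # key -> value

    def put(self, key, value):
        # Insert the key at its sorted position the first time it is seen.
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)

    def scan(self, start, end):
        """Yield (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        for key in self._keys[lo:hi]:
            yield key, self._values[key]


store = SortedKVStore()
store.put("row3", "c")
store.put("row1", "a")
store.put("row2", "b")
print(list(store.scan("row1", "row3")))
```

Because the keys stay sorted, a range scan is two binary searches plus a slice, regardless of insertion order.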
Ambari
A completely open source management platform for provisioning, managing,
monitoring and securing Apache Hadoop clusters
Atlas
a scalable and extensible set of core foundational governance services that
enable enterprises to effectively and efficiently meet their compliance
requirements within Hadoop
Falcon
a data governance engine that defines, schedules, and monitors data
management policies. Falcon allows Hadoop administrators to centrally define
their data pipelines, and then Falcon uses those definitions to
auto-generate workflows in Apache Oozie.
Flume
a distributed, reliable, and available service for efficiently collecting,
aggregating, and moving large amounts of streaming data into the Hadoop
Distributed File System (HDFS).
Hadoop
an open source software platform for distributed storage and distributed
processing of very large data sets on computer clusters built from commodity
hardware.
HBase
a distributed, scalable, big data store.
Hive
a data warehouse infrastructure built on top of Hadoop for providing data
summarization, query, and analysis
Kafka
an open-source message broker project developed by the Apache Software
Foundation, written in Scala and Java
Knox
a REST API Gateway for interacting with Apache Hadoop clusters.
Metron
an open source project dedicated to providing an extensible and scalable
advanced security analytics tool
Oozie
a Java Web application used to schedule Apache Hadoop jobs. Oozie combines
multiple jobs sequentially into one logical unit of work.
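The idea of chaining several jobs into one logical unit of work can be sketched in plain Python. This is only an illustration, not how Oozie is configured or invoked: `run_workflow` and the three job functions are hypothetical names, and each "job" is just a function that returns True on success.

```python
def run_workflow(jobs):
    """Run jobs in order and stop at the first failure.

    `jobs` is a list of (name, callable) pairs; each callable returns
    True on success. Returns (succeeded, names_of_completed_jobs).
    """
    completed = []
    for name, job in jobs:
        if not job():
            # A failed step fails the whole unit of work.
            return False, completed
        completed.append(name)
    return True, completed


# Hypothetical stand-ins for real Hadoop jobs.
def ingest():
    return True


def transform():
    return True


def export():
    return False  # simulate a failing final step


ok, done = run_workflow([("ingest", ingest),
                         ("transform", transform),
                         ("export", export)])
print(ok, done)
```

The caller sees a single success or failure for the whole chain, plus the list of steps that finished before any failure.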
Pig
a high-level platform for creating MapReduce programs used with Hadoop
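For anyone who wants a reminder of what a MapReduce program does underneath, here is a minimal word count in plain Python with explicit map, shuffle, and reduce phases. It is not Pig Latin and does not use Hadoop; the function names are hypothetical and everything runs in one process.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    return reduce_phase(shuffle(map_phase(lines)))


counts = word_count(["Hadoop Pig Hive", "pig hadoop", "Pig"])
print(counts)
```

A high-level platform like Pig exists so that this kind of boilerplate does not have to be written by hand for every query.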
Ranger
a framework to enable, monitor and manage comprehensive data security across
the Hadoop platform.
Solr
an enterprise search platform built on Apache Lucene, an information
retrieval software library
Spark
an open source engine for flexible in-memory data processing that enables
batch, real-time, and advanced analytics on the Apache Hadoop platform.
Sqoop
a tool designed for efficiently transferring bulk data between Apache Hadoop
and structured datastores such as relational databases.
Storm
an open source distributed realtime computation system. Storm makes it easy
to reliably process unbounded streams of data, doing for realtime processing
what Hadoop did for batch processing.
Tez
an extensible framework for building high performance batch and interactive
data processing applications, coordinated by YARN in Apache Hadoop
ZooKeeper
an open source coordination service whose file-system-like application
program interface (API) allows distributed processes in large systems to
synchronize with each other
YARN
The resource management layer for the Apache Hadoop ecosystem
Review the old to learn the new :-) |
|