Statistics board - Big Data Terminology
S******y
Posts: 1123
1
With new Apache projects appearing one after another, let's review some of
the big data terms we hear all the time -
Accumulo
a sorted, distributed key/value store that provides robust, scalable data
storage and retrieval
Ambari
A completely open source management platform for provisioning, managing,
monitoring and securing Apache Hadoop clusters
Atlas
a scalable and extensible set of core foundational governance services that
enables enterprises to effectively and efficiently meet their compliance
requirements
Falcon
a data governance engine that defines, schedules, and monitors data
management policies. Falcon allows Hadoop administrators to centrally define
their data pipelines, and then Falcon uses those definitions to
auto-generate workflows in Apache Oozie.
Flume
a distributed, reliable, and available service for efficiently collecting,
aggregating, and moving large amounts of streaming data into the Hadoop
Distributed File System (HDFS).
Hadoop
an open source software platform for distributed storage and distributed
processing of very large data sets on computer clusters built from commodity
hardware.
HBase
a distributed, scalable, big data store.
Hive
a data warehouse infrastructure built on top of Hadoop for providing data
summarization, query, and analysis
Kafka
an open-source message broker project developed by the Apache Software
Foundation, written in Scala
Knox
a REST API Gateway for interacting with Apache Hadoop clusters.
Metron
an open source project dedicated to providing an extensible and scalable
advanced security analytics tool
Oozie
a Java Web application used to schedule Apache Hadoop jobs. Oozie combines
multiple jobs sequentially into one logical unit of work.
Pig
a high-level platform for creating MapReduce programs used with Hadoop
Ranger
a framework to enable, monitor and manage comprehensive data security across
the Hadoop platform.
Solr
an enterprise search platform built on Apache Lucene, an information
retrieval software library
Spark
the open standard for flexible in-memory data processing that enables batch,
real-time, and advanced analytics on the Apache Hadoop platform.
Sqoop
a tool designed for efficiently transferring bulk data between Apache Hadoop
and structured datastores such as relational databases.
Storm
an open source distributed realtime computation system. Storm makes it easy
to reliably process unbounded streams of data, doing for realtime processing
what Hadoop did for batch processing.
Tez
an extensible framework for building high performance batch and interactive
data processing applications, coordinated by YARN in Apache Hadoop
ZooKeeper
an open source coordination service whose file-system-like application
program interface (API) allows distributed processes in large systems to
synchronize with each other
YARN
The resource management layer for the Apache Hadoop ecosystem
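Several of the entries above (Hadoop, Pig, Tez) revolve around the MapReduce
model, so here is a rough sketch of the idea as a word count in plain Python.
It runs in a single process and does not use any Hadoop API; the function
names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample
lines are made up for illustration only.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key, the step a real framework
    performs between the map and reduce stages."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for files stored on HDFS.
    sample = ["Hadoop stores data", "Spark and Hadoop process data"]
    print(word_count(sample))
```

On a real cluster the mappers and reducers would run in parallel on
different nodes; only the three-stage shape carries over from this toy
version.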
Review the old to learn the new :-)