Mryqu's Notes


[MapR Training Notes] The Hadoop Ecosystem

Date: 2015-12-05   |   Category: BigData   |   Reading: 378 words, ~2 min

  • SQL: Language designed for querying & transforming data held in a relational database management system.
  • NoSQL: Database model that’s not necessarily held in tabular format. Often, NoSQL refers to data that is flat or nested in format.
  • Log Data: Information captured about an organization’s internal systems, external customer interactions & how they’re used.
  • Streaming Data: Twitter, Facebook, Web click data, Web form data.
  • Flume: Reliable, scalable service used to collect streaming data in a Hadoop cluster.
  • Sqoop: Transfers data between an external data store & a Hadoop cluster. Command line tool used with both RDBMS & NoSQL data.
  • NFS: Distributed file system protocol allowing a computer to access files over a network as though they were locally mounted storage.
  • Kafka: Fast, scalable open source messaging service distributing log data over a cluster.
  • MapReduce: Multi-step process splitting a large data set into parts & performing functions on those parts to produce a final result.
  • Spark: General data analytics on a distributed computing cluster. Provides in-memory computation, which enhances the speed of data processing over MapReduce.
  • Drill: Query engine for Big Data exploration. Performs dynamic queries on self-describing data in files (JSON, Parquet, text) and in MapR-DB and HBase tables.
  • Mahout: Scalable machine learning libraries. Core algorithms support data clustering, classification, categorization & collaborative filtering (recommendation engines).
  • Oozie: Scalable & extensible scheduling system for creating complex Hadoop workflows. Manages multiple job types in a workflow (MapReduce, Pig, Hive, Sqoop, distcp, scripts & Java programs).
  • Hive: Data warehouse infrastructure. SQL-like syntax for queries using MapReduce. HiveQL provides operations similar to SQL, which are transformed into MapReduce programs to produce query results.
  • Pig: Platform for analyzing data. Consists of a data flow scripting language called ‘Pig Latin’ & infrastructure for converting scripts into a sequence of MapReduce programs.
  • HBase: Hadoop database, providing random, real-time read/write access to very large data sets.
  • Elasticsearch: Scalable search server built on top of Lucene that supports real-time or near-real-time search.
  • Solr: Open source search platform, part of the Apache Lucene project.
  • SiLK: Interface for the Lucene-based Solr search platform. Provides tools to perform searches & visualize results in a dashboard with reports.
  • Web Tier: Web site delivery of big data processing results directly to the consumer (e.g., a new song recommendation based on what the consumer has listened to previously).
  • Banana: Based on Kibana. Used to create dashboards for visualizing data stored in Solr.
  • Kibana: Browser-based dashboard for search & data analytics. Built with HTML & JavaScript. Easy to set up & use with just a web server.
  • Data Warehouse: Centralized repository for storing data from multiple sources (marketing, finance, human resources, web sites, external data stores).
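
The MapReduce entry above describes splitting a data set into parts and applying functions to reach a final result. As a rough illustration only, here is a minimal single-process sketch in Python of the classic word-count pattern. It does not use Hadoop or any MapR API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are hypothetical and chosen just for this example.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values under their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: sum the grouped values for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks):
    """Run map over every chunk, then shuffle and reduce the combined output."""
    pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
    return reduce_phase(shuffle_phase(pairs))


if __name__ == "__main__":
    # Each string stands in for one split of a larger data set (hypothetical input).
    sample_chunks = ["Hadoop stores data", "Spark processes data", "Hadoop scales"]
    print(word_count(sample_chunks))
```

On a real cluster the map and reduce steps would run in parallel on different nodes, with the framework handling the shuffle; here everything runs sequentially in one process to keep the data flow visible.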

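Title: [MapR Training Notes] The Hadoop Ecosystem
Author: mryqu
License: Unless otherwise stated, all articles on this blog are licensed under CC BY-NC-SA 3.0 CN. Please credit the source when reposting!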

#hadoop# #ecosystem# #mapr#
© 2009 - 2023 Mryqu's Notes