- SQL: Language designed for querying & transforming data held in a relational database management system.
- NoSQL: Database model that’s not necessarily held in tabular format. Often, NoSQL refers to data that is flat or nested in format.
- Log Data: Information captured about an organization’s internal systems, external customer interactions & how they’re used.
- Streaming Data: Twitter, Facebook, Web click data, Web form data.
- Flume: Reliable, scalable service used to collect streaming data into a Hadoop cluster.
- Sqoop: Transfers data between an external data store & a Hadoop cluster. Command line tool used with both RDBMS & NoSQL data.
- NFS: Distributed file system protocol allowing a computer to access files over a network as though they were locally mounted storage.
- Kafka: Fast, scalable open source messaging service distributing log data over a cluster.
- MapReduce: Multi-step process splitting a large data set into parts & performing functions on those parts to produce a final result.
- Spark: General data analytics on a distributed computing cluster. Provides in-memory computation, enhancing the speed of data processing over MapReduce.
- Drill: Query engine for Big Data exploration. Performs dynamic queries on self-describing data in files (JSON, Parquet, text) and in MapR-DB and HBase tables.
- Mahout: Scalable machine learning libraries. Core algorithms supporting data clustering, classification, categorization, collaborative filtering (recommendation engine).
- Oozie: Scalable & extensible scheduling system for creating complex Hadoop workflows. Manages multiple job types in a workflow (MapReduce, Pig, Hive, Sqoop, distcp, scripts & Java programs).
- Hive: Data warehouse infrastructure. SQL-like syntax for queries using MapReduce. HiveQL provides operations similar to SQL, transformed into MapReduce programs for query results.
- Pig: Platform for analyzing data. Consists of a data flow scripting language called ‘Pig Latin’ & infrastructure for converting scripts into a sequence of MapReduce programs.
- HBase: Hadoop database, providing random, real-time read/write access to very large data sets.
- Elasticsearch: Scalable search server built on top of Lucene that supports real-time or near-real-time search.
- Solr: Open source search platform, part of the Apache Lucene project.
- SiLK: Interface for the Lucene-based Solr search platform. Provides tools to perform searches & visualize results in a dashboard with reports.
- Web Tier: Web site delivery of big data processing results directly to the consumer (e.g., a new song recommendation based on what the consumer has listened to previously).
- Banana: Based on Kibana. Used to create dashboards for visualizing data stored in Solr.
- Kibana: Browser-based dashboard for search & data analytics. Built with HTML & JavaScript. Easy to set up & use with just a web server.
- Data Warehouse: Centralized repository for storing data from multiple sources (marketing, finance, human resources, websites, external data stores).
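
The MapReduce entry above describes splitting a data set into parts and applying functions to reach a final result. A minimal sketch of that idea as a word count in plain Python follows. It runs in a single process rather than on a Hadoop cluster, and every function name here (`split_into_parts`, `map_phase`, `shuffle`, `reduce_phase`, `word_count`) is hypothetical, chosen for illustration and not taken from any Hadoop API.

```python
from collections import defaultdict
from itertools import chain


def split_into_parts(records, num_parts):
    """Split the input into roughly equal parts, one per simulated worker."""
    return [records[i::num_parts] for i in range(num_parts)]


def map_phase(part):
    """Map step: emit a (word, 1) pair for every word in this part."""
    return [(word.lower(), 1) for line in part for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: combine the values for each key into one result."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines, num_parts=2):
    """Run split -> map -> shuffle -> reduce over a list of text lines."""
    parts = split_into_parts(lines, num_parts)
    mapped = chain.from_iterable(map_phase(part) for part in parts)
    return reduce_phase(shuffle(mapped))


# Illustrative input only; not data from any real system.
counts = word_count(["big data", "Big cluster", "data data"])
print(counts)
```

On a real cluster the parts would live on different nodes and the map and reduce steps would run in parallel; tools such as Hive and Pig generate jobs of this shape from higher-level queries and scripts.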