Hadoop: Overview
Move computation, not data.
Hadoop performance and data scale facts.
Hadoop in the context of other data stores.
The Apache Hadoop Project.
Hadoop – an inside view: MapReduce and HDFS.
The Hadoop Ecosystem.
What about NoSQL?
MapReduce: Map and Reduce.
Java MapReduce.
Running a Distributed MapReduce Job.
Hadoop Streaming: Python.
The Hadoop Distributed Filesystem
HDFS Design & Concepts
Blocks, Namenodes and Datanodes
The Command-Line Interface: hadoop fs
Basic Filesystem Operations
Reading Data from a Hadoop URL
Reading Data Using the FileSystem API
Data Flow: Anatomy of a File Read, Anatomy of a File Write
Coherency Model
Gps Infotech
How MapReduce Works
Anatomy of a MapReduce Job Run
Job Submission, Job Initialization, Task Assignment, Task Execution
Progress and Status Updates
Job Completion, Failures
Job Scheduling
Fair Scheduler
Shuffle and Sort – Map Side, Reduce Side
Configuration Tuning
Task Execution, Speculative Execution, Task JVM Reuse, Skipping Bad Records
The Task Execution Environment
Distributed Cache
Hadoop Administrator
Setting Up a Hadoop Cluster
Cluster Specification
Network Topology
Cluster Setup and Installation
SSH Configuration
Hadoop Configuration
Configuration Management
Environment Settings
Important Hadoop Daemon Properties
Hadoop Daemon Addresses and Ports
Post Install
Benchmarking a Hadoop Cluster: TeraByte Sort on Apache Hadoop
Hadoop on Amazon EC2
Monitoring, Logging, Routine Administration Procedures
Commissioning and Decommissioning Nodes
Installing and Running Pig
Execution Types
Running Pig Programs
User-Defined Functions
Basic concepts.
Concepts: Data Model, Schema Design
Test Drive
Clients: Java, REST and Thrift