HADOOP TRAINING COURSE CONTENT


  1. THE MOTIVATION FOR HADOOP
  • Problems with traditional large-scale systems
  • Requirements for a new approach
  • Introducing Hadoop
  2. HADOOP BASIC CONCEPTS
  • The Hadoop project and Hadoop components
  • The Hadoop Distributed File System (HDFS)
  • Hands-on exercise using HDFS
  • How MapReduce works
  • Hands-on exercise running a MapReduce job
  • How a Hadoop cluster operates
  • Other Hadoop ecosystem projects
  3. WRITING A MAPREDUCE PROGRAM
  • The MapReduce flow
  • Basic MapReduce API concepts
  • Writing MapReduce drivers, mappers and reducers in Java
  • Writing mappers and reducers in other languages using the Streaming API
  • Speeding up Hadoop development by using Eclipse
  • Hands-on exercise writing a MapReduce program
  • Differences between the old and new MapReduce APIs
  4. UNIT TESTING MAPREDUCE PROGRAMS
  • Unit testing
  • The JUnit and MRUnit testing frameworks
  • Writing unit tests with MRUnit
  • Hands-on exercise writing unit tests with the MRUnit framework
  5. DELVING DEEPER INTO THE HADOOP API
  • Using the ToolRunner class
  • Decreasing the amount of intermediate data with combiners
  • Hands-on exercise writing and implementing a combiner
  • Setting up and tearing down mappers and reducers by using the configure and close methods
  • Writing custom partitioners for better load balancing
  • Hands-on exercise writing a partitioner
  • Accessing HDFS programmatically
  • Using the distributed cache
  • Using the Hadoop API's library of mappers, reducers and partitioners
  6. PRACTICAL DEVELOPMENT TIPS AND TECHNIQUES
  • Strategies for debugging MapReduce code
  • Testing MapReduce code locally by using LocalJobRunner
  • Writing and viewing log files
  • Retrieving job information with counters
  • Determining the optimal number of reducers for a job
  • Creating map-only MapReduce jobs
  • Hands-on exercise using counters and a map-only job
  7. DATA INPUT AND OUTPUT
  • Creating custom Writable and WritableComparable implementations
  • Saving binary data using SequenceFile and Avro data files
  • Implementing custom InputFormats and OutputFormats
  • Issues to consider when using file compression
  • Hands-on exercise using SequenceFiles and file compression
  8. COMMON MAPREDUCE ALGORITHMS
  • Sorting and searching large data sets
  • Performing a secondary sort
  • Indexing data
  • Hands-on exercise creating an inverted index
  • Computing term frequency-inverse document frequency (TF-IDF)
  • Calculating word co-occurrence
  • Hands-on exercise calculating word co-occurrence
  • Hands-on exercise implementing word co-occurrence with a custom WritableComparable
  9. JOINING DATA SETS IN MAPREDUCE JOBS
  • Writing a map-side join
  • Writing a reduce-side join
  10. INTEGRATING HADOOP INTO THE ENTERPRISE WORKFLOW
  • Integrating Hadoop into an existing enterprise
  • Loading data from an RDBMS into HDFS by using Sqoop
  • Hands-on exercise importing data with Sqoop
  • Managing real-time data using Flume
  • Accessing HDFS from legacy systems with FuseDFS and HttpFS
  11. MACHINE LEARNING AND MAHOUT
  • Introduction to machine learning
  • Using Mahout
  • Hands-on exercise using a Mahout recommender
  12. AN INTRODUCTION TO HIVE AND PIG
  • The motivation for Hive and Pig
  • Hive basics
  • Hands-on exercise manipulating data with Hive
  • Pig basics
  • Hands-on exercise using Pig to retrieve movie names from our recommender
  • Choosing between Hive and Pig
  • Introduction to Oozie
  • Creating Oozie workflows
  • Hands-on exercise running an Oozie workflow

CONCLUSION

APPENDIX: GRAPH PROCESSING IN MAPREDUCE AND AN INTRODUCTION TO OOZIE