Apache Hadoop – Vipul Pathak

July 24, 2015

Cloudera Certified Developer for Apache Hadoop (CCDH)

Happy to share that I have earned the CCDH certification 🙂 Verification URL: http://certification.cloudera.com/verify (with License # 100-013-285). There were loads of conceptual as well as programming questions, including multiple-choice ones. Reading Hadoop: The Definitive Guide and Programming Hive a couple of times, and rigorously practicing the Map-Reduce programming model, was instrumental in clearing the exams. Setting up…

July 18, 2015

Writing a Generic UDF in Hive

There are a few types of UDFs that we can write in Hive: functions that act on each column value passed to them, e.g. Select Length(name) From Customer; specific functions written for a specific data type (simple UDFs); generic functions written to work with more than one data type; functions that act on a group…

June 25, 2015

Loading Data in Pig Using HCatalog

HCatalog is an extension of Hive and, in a nutshell, it exposes the schema information in the Hive Metastore so that applications outside of Hive can use it. The objective of HCatalog is to hold the following types of information about the data in HDFS: location of the data; metadata about the data (e.g.…

June 13, 2015

Writing Simple UDF in Hive

There are a few types of UDFs that we can write in Hive: functions that act on each column value passed to them, e.g. Select Length(name) From Customer; specific functions written for a specific data type; generic functions written to work with more than one data type (GenericUDF); functions that act on a group…

June 6, 2015

Writing EvalFunc UDF in Pig

UDFs (User Defined Functions) are a way to extend the functionality of Pig. There are two types of UDFs that we can write in Pig: Evaluate functions (extending the EvalFunc base class) and Load/Store functions (extending the LoadFunc base class). Here we will develop an Evaluate UDF step by step. Let's start by conceptualizing a UDF (named VowelCount)…
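To make the VowelCount idea concrete, here is a minimal sketch of the counting logic in plain Java. It deliberately uses no Pig classes so that it can be run on its own. In an actual Evaluate UDF, logic like this would sit inside the exec(Tuple) method of a class that extends EvalFunc. The method name countVowels, the choice of ASCII vowels only, and the handling of null input are assumptions made for illustration, not details taken from the full post.

```java
// Plain-Java sketch of the counting logic only; no Pig classes are used here.
// The names and the null handling below are illustrative assumptions.
final class VowelCount {
    // Assumption: only ASCII vowels are counted, in either case.
    private static final String VOWELS = "aeiouAEIOU";

    // Returns the number of vowels in the input; null is treated as 0 (an assumption).
    static int countVowels(String text) {
        if (text == null) {
            return 0;
        }
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (VOWELS.indexOf(text.charAt(i)) >= 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countVowels("Hadoop")); // prints 3
    }
}
```

Keeping the counting logic in a static method like this makes it easy to unit test before wrapping it in a Pig UDF class.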
