Vipul

  • Cloudera Certified Developer for Apache Hadoop (CCDH)

    Happy to share that I have earned the CCDH certification 🙂 Verification URL: http://certification.cloudera.com/verify (License # 100-013-285). The exam has loads of conceptual as well as programming questions, including multiple-choice ones. Reading Hadoop: The Definitive Guide and Programming Hive a couple of times, and rigorously practicing the MapReduce programming model, were instrumental in clearing the exams. Setting up

    Read more →

  • -- emp = LOAD 'Employees.txt' … Data in text file resembles the "EMP" table in Oracle
    -- dept = LOAD 'Dept.txt' …….. Data in text file resembles the "DEPT" table in Oracle
    -- Filter data in emp to only those whose job is Clerk.
    Filtered_Emp = FILTER emp BY (job == 'CLERK');
    -- Supports

    Read more →

  • Pig Data Types

    Simple: INT and FLOAT are 32-bit signed numeric data types backed by java.lang.Integer and java.lang.Float
    Simple: LONG and DOUBLE are 64-bit signed numeric Java data types
    Simple: CHARARRAY (Unicode, backed by java.lang.String)
    Simple: BYTEARRAY (bytes / blob, backed by Pig’s DataByteArray class that wraps byte[])
    Simple: BOOLEAN (“true” or “false”, case insensitive)
    Simple: DATETIME

    Read more →
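
    As a rough illustration of the simple types listed above, the sketch below declares one field of each type in a LOAD schema. The file name, delimiter, and field names are hypothetical, picked only for the example, and it assumes a Pig release recent enough to support BOOLEAN and DATETIME.

    ```pig
    -- Hypothetical input file and field names, one field per simple type.
    emp = LOAD 'employees.csv' USING PigStorage(',')
          AS (empno:int,          -- 32-bit signed integer
              rating:float,       -- 32-bit floating point
              phone:long,         -- 64-bit signed integer
              salary:double,      -- 64-bit floating point
              ename:chararray,    -- Unicode string
              photo:bytearray,    -- raw bytes (DataByteArray)
              active:boolean,     -- true / false
              hired:datetime);    -- date and time value

    -- Print the schema Pig has associated with the relation.
    DESCRIBE emp;
    ```

    Fields loaded without a declared type default to BYTEARRAY, so spelling the types out in the schema is what lets later operators treat them as numbers, strings, or dates.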

  • Pig is a high-level data flow language developed at Yahoo. Pig programs are translated into lower-level instructions supported by the underlying execution engine. Pig is designed to handle complex data operations quickly.

    Read more →
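
    One way to see the translation into lower-level instructions is Pig's EXPLAIN operator, which prints the plans Pig builds for a relation. The sketch below is a minimal example under assumptions: the file name and the two-field schema are hypothetical, and the exact plans printed depend on the execution engine in use.

    ```pig
    -- Hypothetical input file and schema, used only for illustration.
    emp = LOAD 'employees.txt' AS (ename:chararray, job:chararray);

    -- A simple data flow step: keep only the clerks.
    clerks = FILTER emp BY job == 'CLERK';

    -- Show the plans Pig generates for computing 'clerks'
    -- instead of running the job.
    EXPLAIN clerks;
    ```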