- Simple: INT and FLOAT are 32-bit signed numeric datatypes backed by java.lang.Integer and java.lang.Float
- Simple: LONG and DOUBLE are 64-bit signed numeric datatypes backed by java.lang.Long and java.lang.Double
- Simple: CHARARRAY (Unicode backed by java.lang.String)
- Simple: BYTEARRAY (Bytes / Blob, backed by Pig’s DataByteArray class that wraps byte[])
- Simple: BOOLEAN (“true” or “false”, case insensitive)
- Simple: DATETIME (Supported format “1970-01-01T00:00:00.000+00:00”).
- Simple: BIGDECIMAL and BIGINTEGER (same size and precision as java.math.BigDecimal and java.math.BigInteger)
- Complex: TUPLE (resembles a Row and is an ordered “Set” of fields) : Indicated by ( Parentheses ).
- Complex: BAG (an unordered “Collection” of Tuples, e.g. {(tuple4), (tuple1), …} ) : Indicated by { Curly Braces }.
- Complex: MAP (a “Collection” of Key-Value pairs, e.g. ['state'#'MD', 'name'#'Vipul'] ) : Indicated by [ Square Brackets ], with each entry written as Key#Value.
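To make the list above concrete, here is a minimal Pig Latin sketch of a schema that declares several of these types in a LOAD statement. The file name `people.tsv` and all field names are hypothetical, chosen only for illustration, and the BOOLEAN and DATETIME fields assume a Pig release recent enough to support those types.

```pig
-- Hypothetical input file and field names, for illustration only.
people = LOAD 'people.tsv' USING PigStorage('\t') AS (
    id:long,                                   -- 64-bit integer
    name:chararray,                            -- string
    age:int,                                   -- 32-bit integer
    score:double,                              -- 64-bit floating point
    active:boolean,
    joined:datetime,
    address:tuple(city:chararray, state:chararray),          -- ( ... )
    phones:bag{p:tuple(kind:chararray, number:chararray)},   -- { ... }
    attrs:map[]                                -- [key#value], values untyped
);

-- Print the schema Pig has recorded for the relation.
DESCRIBE people;
```

Running DESCRIBE is a quick way to check that Pig has picked up the types you intended before writing further transformations.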
Complex types can contain other complex types as fields; for example, a MAP may hold a TUPLE as a value, and that TUPLE may have a BAG as one of its fields. A MAP key must be a character array, while the value can be of any type. The smallest element of data is an atom (essentially a single cell), and a collection of atoms makes a tuple. A GROUP operation returns, for each group, a BAG of the grouped rows (tuples), which shows how these complex types are nested in practice. Pig is a gently typed language (neither strongly typed nor weakly typed): if schema and type information is provided, Pig sticks to it; if it is not, Pig still allows the fields to be used and tries to infer each data type from how the field is used.
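The nesting produced by GROUP and the "gentle" typing described above can be sketched as follows. This is only an illustration under assumed inputs: the file name `people.tsv` and the field names are hypothetical.

```pig
-- Hypothetical file and field names, for illustration only.

-- 1) With a schema: Pig uses the declared types.
people   = LOAD 'people.tsv' AS (name:chararray, state:chararray, age:int);
by_state = GROUP people BY state;
-- Each row of by_state has two fields: the group key (a state) and a
-- BAG holding every people tuple that shares that key.
DESCRIBE by_state;

-- 2) Without a schema: fields are referenced by position and default to
--    bytearray; Pig casts them according to how they are used.
raw   = LOAD 'people.tsv';
older = FOREACH raw GENERATE $0, $2 + 1;   -- $2 is cast to int by the addition
```

In the second half, no type was declared for `$2`, yet the arithmetic still works because Pig infers a numeric type from the expression it appears in.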