Pig Latin and its Operators
Malini Shukla
What is Pig Latin?
To analyze data in Hadoop with Apache Pig, we write programs in the Pig Latin language. An interpreter layer translates Pig Latin statements into MapReduce jobs, and Hadoop then processes these jobs.
Pig Latin is a simple language with SQL-like semantics, which makes it productive to use. It contains a rich set of built-in functions for data manipulation. It is also extensible: we can add our own user-defined functions (UDFs) written in Java.
Data Model in Pig Latin
The data model of Pig is fully nested. The outermost structure of the Pig Latin data model is a relation, and a relation is a bag, where:
- A bag is a collection of tuples.
- A tuple is an ordered set of fields.
- A field is a piece of data.
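To make the nesting concrete, here is a minimal sketch. The file name 'student_data.txt', its columns, and the aliases students and by_age are assumptions chosen for illustration, not part of the original example.

```pig
-- 'student_data.txt' and its two columns are hypothetical.
students = LOAD 'student_data.txt' USING PigStorage(',')
           AS (name:chararray, age:int);

-- students is a relation (a bag). Each row is a tuple with two fields: (name, age).
-- GROUP produces one tuple per distinct age; its second field is itself a bag
-- holding the student tuples that share that age, so bags nest inside tuples.
by_age = GROUP students BY age;

-- Prints the schema of by_age, which shows the nested bag.
DESCRIBE by_age;
```

The grouped relation is a bag of tuples whose second field is another bag, which is the sense in which the data model is fully nested.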
Statements in Pig Latin
Statements are the basic constructs for processing data using Pig Latin.
- Statements work with relations, and they include expressions and schemas.
- Every statement ends with a semicolon (;).
- Through statements, we perform various operations using the operators that Pig Latin offers.
- Except for LOAD and STORE, Pig Latin statements take a relation as input and produce another relation as output.
- When we enter a LOAD statement in the Grunt shell, only its semantic checking is carried out. To see the contents of the relation, we need to use the DUMP operator, because the MapReduce job that loads the data is carried out only after the dump operation is performed.
Pig Latin Example
Here is a Pig Latin statement that loads data into Apache Pig.
grunt> Employee_data = LOAD 'Employee_data.txt' USING PigStorage(',') AS
( id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray );
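The statement above can be extended into a small pipeline. This is a sketch under stated assumptions: the city value 'Delhi', the alias delhi_staff, and the output directory 'delhi_staff_out' are hypothetical. It repeats the LOAD so that it can be run on its own.

```pig
-- LOAD is only checked when entered; no MapReduce job runs yet.
Employee_data = LOAD 'Employee_data.txt' USING PigStorage(',')
                AS (id:int, firstname:chararray, lastname:chararray,
                    phone:chararray, city:chararray);

-- Shows the schema without reading the data.
DESCRIBE Employee_data;

-- Takes a relation as input and produces another relation as output.
-- 'Delhi' is a made-up filter value.
delhi_staff = FILTER Employee_data BY city == 'Delhi';

-- DUMP triggers execution and prints the tuples to the console.
DUMP delhi_staff;

-- STORE also triggers execution and writes the result to a directory
-- (the directory name here is an assumption).
STORE delhi_staff INTO 'delhi_staff_out' USING PigStorage(',');
```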
Pig Latin Data types
The following is the list of Pig Latin simple data types:
- int
“int” represents a signed 32-bit integer.
For Example: 10
- long
It represents a signed 64-bit integer.
For Example: 10L
- float
This data type represents a 32-bit floating-point number.
For Example: 10.5F
- double
“double” represents a 64-bit floating-point number.
For Example: 10.5
- chararray
It represents a character array (string) in Unicode UTF-8 format.
For Example: 'Data Flair'
- bytearray
This data type represents a byte array (blob).
- boolean
“boolean” represents a Boolean value.
For Example: true / false
Note: These values are case insensitive.
- datetime
It represents a date-time.
For Example: 1970-01-01T00:00:00.000+00:00
- biginteger
This data type represents a Java BigInteger.
For Example: 60708090709
- bigdecimal
“bigdecimal” represents a Java BigDecimal.
For Example: 185.98376256272893883
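As a rough illustration of how these types appear in practice, the sketch below declares several of them in a schema and converts between them with cast operators. The file 'readings.txt', its columns, and all aliases are assumptions made up for this example.

```pig
-- 'readings.txt' and its columns are hypothetical.
readings = LOAD 'readings.txt' USING PigStorage(',')
           AS (id:long, sensor:chararray, temp:double,
               valid:boolean, taken_at:datetime);

-- A cast is written as the target type in parentheses before the value.
converted = FOREACH readings GENERATE
                id,
                sensor,
                (int) temp     AS temp_int,    -- double to int
                (float) temp   AS temp_float,  -- double to float
                (chararray) id AS id_text;     -- long to chararray

DUMP converted;
```

Declaring types in the AS clause is optional; a field without a declared type is treated as a bytearray.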
i. Complex Types
- Tuple
A tuple is an ordered set of fields.
For Example: (Ankit, 32)
- Bag
A bag is a collection of tuples.
For Example: {(Ankit,32),(Neha,30)}
- Map
A map is a set of key-value pairs.
For Example: ['name'#'Ankit', 'age'#32]
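The three complex types can be declared in a schema and then taken apart in a FOREACH. The following is a minimal sketch, assuming a hypothetical tab-separated file 'people.txt' whose three columns hold a tuple, a bag, and a map. All field names and aliases are invented for illustration.

```pig
-- 'people.txt' is hypothetical. Each line is assumed to hold three
-- tab-separated columns such as:
--   column 1: (Ankit,32)
--   column 2: {(Ankit,32),(Neha,30)}
--   column 3: [name#Ankit,age#32]
people = LOAD 'people.txt'
         AS (person:tuple(name:chararray, age:int),
             friends:bag{friend:tuple(name:chararray, age:int)},
             props:map[]);

summary = FOREACH people GENERATE
              person.name    AS name,          -- tuple field, by name
              person.$1      AS age,           -- tuple field, by position
              COUNT(friends) AS friend_count,  -- number of tuples in the bag
              props#'name'   AS prop_name;     -- map lookup by key

DUMP summary;
```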
ii. Null Values
Values of all the above data types can be null. Pig treats null values in the same way as SQL does.
A null value can be an unknown value or a non-existent value, and we use it as a placeholder for optional values. Nulls can occur naturally in the data, or they can be the result of an operation.
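A short sketch of how nulls behave in statements, assuming a hypothetical file 'scores.txt' in which some lines have an empty score field. The aliases and the default value 0 are illustrative choices.

```pig
-- 'scores.txt' and its columns are hypothetical.
-- With PigStorage, an empty score field is loaded as null.
scores = LOAD 'scores.txt' USING PigStorage(',')
         AS (name:chararray, score:int);

-- Arithmetic on a null operand yields null, so new_score is null
-- wherever score is null.
raised = FOREACH scores GENERATE name, score + 10 AS new_score;

-- Nulls are tested with IS NULL / IS NOT NULL rather than ==.
missing = FILTER scores BY score IS NULL;
present = FILTER scores BY score IS NOT NULL;

-- Replaces null with a default value (0 is an arbitrary choice).
filled = FOREACH scores GENERATE name, (score IS NULL ? 0 : score) AS score;

DUMP filled;
```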
Pig Latin Arithmetic Operators
Here is the list of arithmetic operators of Pig Latin. Let’s assume A = 20 and B = 40.
- +
Addition: Adds the values on either side of the operator.
For Example: A + B gives 60.
- -
Subtraction: Subtracts the right-hand operand from the left-hand operand.
For Example: A - B gives -20.
- *
Multiplication: Multiplies the values on either side of the operator.
For Example: A * B gives 800.
- /
Division: Divides the left-hand operand by the right-hand operand.
For Example: B / A gives 2.
- %
Modulus: Divides the left-hand operand by the right-hand operand and returns the remainder.
For Example: B % A gives 0.
- ? :
Bincond: This operator evaluates a Boolean expression. It has three operands, as shown below:
variable x = (expression) ? value1 if true : value2 if false.
For Example:
- b = (a == 1) ? 20 : 40;
- If a == 1, the value of b is 20.
- If a != 1, the value of b is 40.
- CASE WHEN THEN ELSE END
Case: It is equivalent to nested bincond operators.
For Example:
CASE f2 % 2
WHEN 0 THEN 'even'
WHEN 1 THEN 'odd'
END
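The operators in this section can be tried together in one FOREACH. This is a sketch under stated assumptions: the file 'numbers.txt' is hypothetical and is assumed to contain the single line 20,40, the lowercase fields a and b stand in for A and B, and the CASE expression assumes a Pig release that supports it (it was added in 0.12).

```pig
-- 'numbers.txt' is hypothetical; assume it holds the single line: 20,40
nums = LOAD 'numbers.txt' USING PigStorage(',') AS (a:int, b:int);

calc = FOREACH nums GENERATE
           a + b AS total,                -- 60
           a - b AS difference,           -- -20
           a * b AS product,              -- 800
           b / a AS quotient,             -- 2
           b % a AS remainder,            -- 0
           (a == 1 ? 20 : 40) AS picked,  -- 40, because a is 20, not 1
           (CASE b % 2
                WHEN 0 THEN 'even'
                WHEN 1 THEN 'odd'
            END) AS parity;               -- 'even', because 40 % 2 is 0

DUMP calc;
```

Both a and b are declared as int, so the division is integer division. The bincond and CASE expressions are wrapped in parentheses inside GENERATE.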