Getting started with Pig scripting in Hadoop
In this post, we will cover some basics of Pig scripting and get you started with it. Pig is a high-level platform whose scripting language, Pig Latin, lets programmers write SQL-like queries over data stored in HDFS. Pig converts those queries into MapReduce tasks, which cuts the time and effort previously needed to write and run MapReduce functions by hand.

Setting Up the Pig Scripting Environment

I am writing this tutorial using the Google Cloud Dataproc service. Dataproc comes preconfigured with the following components, so we do not need to do anything special to run a Pig script:

Apache Spark 2.3.1
Apache Hadoop 2.9.0
Apache Pig 0.17.0
Apache Hive 2.3.2
Apache Tez 0.9.0
Cloud Storage connector 1.9.9-hadoop2
Scala 2.11.8
Python 2.7

Main Components of a Pig Script

Load Data

First of all, let's get some data to process. Let us use the ml-100k[…]
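To make the earlier point about Pig turning queries into MapReduce tasks a little more concrete, here is a minimal Python sketch of the map, shuffle, and reduce steps behind a simple "count ratings per movie" query. It is not Pig Latin and not the code Pig actually generates. The sample rows are made up, the function names are hypothetical, and the four-column tab-separated layout (user id, item id, rating, timestamp) is assumed to match the ratings file in ml-100k.

```python
from collections import defaultdict

# Made-up sample rows in an assumed tab-separated layout:
# user_id, item_id, rating, timestamp. Illustrative only, not real data.
SAMPLE_LINES = [
    "1\t10\t4\t880000001",
    "2\t10\t5\t880000002",
    "1\t20\t3\t880000003",
    "3\t10\t2\t880000004",
]


def map_phase(line):
    """Emit one (item_id, 1) pair per input line, as a mapper would."""
    user_id, item_id, rating, timestamp = line.split("\t")
    yield int(item_id), 1


def shuffle(pairs):
    """Group mapper output by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Sum the counts for one key, as a reducer would."""
    return key, sum(values)


def count_ratings_per_item(lines):
    """Run the three phases in-process over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(count_ratings_per_item(SAMPLE_LINES))
```

In Pig, the same result would typically take a load, a group, and a count written as a few short statements, with the mapper and reducer code above left to the framework.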