Archive - June 2020

Building a Web Analytics System Using Kafka and Spark Streaming in Google Cloud

The aim of the project is to create a data pipeline that receives hits from the website through a Flask REST API. The REST API publishes the data to Kafka topics. We subscribe to these topics from a Google Dataproc cluster, then use Spark Streaming to read the data from the Kafka topics and push it into Google BigQuery.

STEP 1 – Pushing Data into Kafka Topics from the REST API Endpoints

Here is the code of the JavaScript snippet that I put on the website and the Flask API code. Here is the code for the Flask API; for the Kafka producer, look into resources/webevents.py. Here is the code for the file. On Dataproc, the bootstrap servers are the worker nodes. Kafka listens on port 9092 by default, and you can connect to the Dataproc cluster using the internal IPs of the worker nodes.[…]
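
The excerpt above does not include the producer code, so here is a minimal sketch of the Step 1 idea: build the bootstrap server list from the worker nodes' internal IPs and hand each hit to a Kafka producer as JSON. The function names, the topic name, the hit fields, and the IP addresses are all hypothetical. The `producer` argument is assumed to be any object with a kafka-python style `send(topic, value=...)` method; this is not the code from resources/webevents.py.

```python
import json

KAFKA_PORT = 9092  # Kafka's default listener port, as noted in the post


def bootstrap_servers(worker_ips, port=KAFKA_PORT):
    """Build the host:port list a Kafka client expects.

    worker_ips: internal IPs of the Dataproc worker nodes (hypothetical values).
    """
    return [f"{ip}:{port}" for ip in worker_ips]


def encode_hit(hit):
    """Serialize one website hit (a dict) as UTF-8 JSON bytes."""
    return json.dumps(hit, sort_keys=True).encode("utf-8")


def publish_hit(producer, topic, hit):
    """Send one hit to a Kafka topic.

    Assumption: `producer` exposes send(topic, value=...), as
    kafka-python's KafkaProducer does.
    """
    return producer.send(topic, value=encode_hit(hit))
```

With kafka-python installed, a Flask view could create `KafkaProducer(bootstrap_servers=bootstrap_servers(["10.128.0.3", "10.128.0.4"]))` once at startup and call `publish_hit(producer, "webevents", request.get_json())` for each incoming hit. Keeping the producer as a parameter also makes the function easy to exercise without a running broker.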

Creating Your First Data Pipeline in Google Cloud with Apache Sqoop and Apache Airflow

Welcome to the first steps of creating your first data pipeline. By the end, you will have a data pipeline that takes data from MySQL, does some preprocessing on it, and then stores it in Google BigQuery. In this exercise, we will create an imaginary pipeline for sales data that lives in a transactional database, in this case Cloud SQL. We are basing this exercise on the following Kaggle challenge: https://www.kaggle.com/c/competitive-data-science-predict-future-sales/data . Here are the steps that we will take to set up this data pipeline.

a) Create a Cloud SQL MySQL instance in Google Cloud and upload the database to it.
b) Create a pipeline to get the data from Cloud SQL into Google BigQuery.

Creating a SQL Database and Uploading Data

For our first step, we will create a Cloud SQL database on Google Cloud. We will upload all the files we got from Kaggle into[…]
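
The excerpt does not say which preprocessing the pipeline performs, so the following is only an illustrative sketch of one plausible step: normalizing sales rows before they are loaded into BigQuery. It assumes the rows carry a `date` column in dd.mm.yyyy form plus `item_price` and `item_cnt_day` columns; those column names and the date format are assumptions about the exported data, and the function names are hypothetical.

```python
import csv
import io
from datetime import datetime


def normalize_row(row):
    """Return a copy of one sales row with load-friendly types.

    Assumed input columns: date (dd.mm.yyyy), item_price, item_cnt_day.
    Other columns are passed through unchanged.
    """
    out = dict(row)
    # BigQuery DATE values use the ISO form YYYY-MM-DD.
    out["date"] = datetime.strptime(row["date"], "%d.%m.%Y").date().isoformat()
    out["item_price"] = float(row["item_price"])
    # Counts may be exported as "1.0"; go through float before int.
    out["item_cnt_day"] = int(float(row["item_cnt_day"]))
    return out


def preprocess(csv_text):
    """Parse CSV text and normalize every row."""
    return [normalize_row(r) for r in csv.DictReader(io.StringIO(csv_text))]
```

In a real pipeline this kind of function would sit between the export from Cloud SQL and the load into BigQuery, for example inside a task scheduled by Airflow.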
