Archive - June 1, 2020

Creating Your First Data Pipeline in Google Cloud with Apache Sqoop and Apache Airflow


Welcome to the first steps toward building your first data pipeline. By the end, you will have a pipeline that takes data from MySQL, does some preprocessing on it, and stores it in Google BigQuery. In this exercise, we will create an imaginary pipeline for sales data held in a transactional database, in this case Cloud SQL. The exercise is based on the following Kaggle challenge: https://www.kaggle.com/c/competitive-data-science-predict-future-sales/data . Here are the steps we will take to set up this data pipeline:

a) Create a Cloud SQL MySQL instance in Google Cloud and upload the database to it.
b) Create a pipeline to get the data from Cloud SQL into Google BigQuery.

Creating a SQL Database and Uploading Data

For our first step, we will create a Cloud SQL database on Google Cloud. We will upload all the files we got from Kaggle into[…]

