Scheduling a Singer pipeline on Google Cloud – Part 3: AdWords to BigQuery
If you have followed the tutorial so far, you will have a Docker image with your Singer pipeline in Google Container Registry.
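If you want to double-check that the image is actually there, you can list the images in your registry from Cloud Shell. This is just a quick sanity check; the project ID below is the one used in the command later in this article, so replace it with your own.

# List the images pushed to this project's Container Registry
gcloud container images list --repository=gcr.io/careful-parser-269221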

To run this pipeline, we will implement the following setup in Google Cloud.

Here are the tasks we need to complete for the setup:
- Create a VM instance to get the historical data
- Create Pub/Sub topics to trigger the Cloud Functions
- Create Google Cloud Functions to start and stop the instance
- Set up Cloud Scheduler jobs
Create a VM instance to get the historical data
The strategy I am following in this case is to download all the historical data up to the current date, store the current date in the state.json file, and then run a daily cron job that loads the previous day's data into BigQuery.
We will set up a containerized Compute Engine instance using the Docker image we created in the previous article. The reason for this is that we do not need the instance to be on all the time: it simply runs the specified container on boot. Make sure to set the restart policy to "On failure" and give the container privileged access. You could also implement the same thing with a startup script.
gcloud compute instances create-with-container temp-instance \
  --container-image gcr.io/careful-parser-269221/ga-bigquery-replication:latest \
  --zone us-east1-d \
  --machine-type n1-standard-1 \
  --container-restart-policy on-failure \
  --container-privileged
Run this command in Google Cloud Shell to create the instance. Once the instance boots and the container has run, you should see a new table with the historical data in BigQuery.
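If you want to verify the load from the command line instead of the console, you can use the bq tool in Cloud Shell. The project, dataset and table names below are placeholders; use whatever your target configuration writes to.

# List the tables in the target dataset
bq ls your_dataset

# Count the rows that were loaded
bq query --use_legacy_sql=false \
  'SELECT COUNT(*) FROM `your_project.your_dataset.your_table`'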
Create Pub/Sub Topics
After this, you will need to create two Pub/Sub topics: one for starting the instance and another for stopping it.
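For example, you can create both topics from Cloud Shell; the topic names below are just the ones I will assume in the examples later in this article, so feel free to pick your own.

# Topic that will trigger the function that starts the instance
gcloud pubsub topics create start-instance-event

# Topic that will trigger the function that stops the instance
gcloud pubsub topics create stop-instance-event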

Create Google Cloud Functions
We will need two Google Cloud Functions, one subscribed to each of the Pub/Sub topics.
Starting the Instance

Here is the code for index.js
/**
 * Triggered from a message on a Cloud Pub/Sub topic.
 */
var Compute = require('@google-cloud/compute');
var compute = Compute();

exports.startInstance = function startInstance(event, context, callback) {
  var zone = compute.zone('Zone of your instance');
  var vm = zone.vm('Name of your instance');
  vm.start(function(err, operation, apiResponse) {
    if (err) {
      console.error('failed to start instance', err);
      return callback(err);
    }
    console.log('instance started successfully');
    callback();
  });
};
package.json
{ "name": "sample-pubsub", "version": "0.0.1", "dependencies": { "@google-cloud/pubsub": "^0.18.0", "@google-cloud/compute": "0.7.1" } }
Stopping the Instance

index.js
var Compute = require('@google-cloud/compute');
var compute = Compute();

exports.stopInstance = function stopInstance(event, context, callback) {
  var zone = compute.zone('us-east1-d');
  var vm = zone.vm('temp-instance');
  vm.stop(function(err, operation, apiResponse) {
    if (err) {
      console.error('failed to stop instance', err);
      return callback(err);
    }
    console.log('instance stopped successfully');
    callback();
  });
};
package.json
{ "name": "sample-pubsub", "version": "0.0.1", "dependencies": { "@google-cloud/pubsub": "^0.18.0", "@google-cloud/compute": "0.7.1" } }
Creating the Cloud Scheduler Jobs
This is the last step of the setup. I have set up two jobs. The first one starts the instance by sending a message to the start Pub/Sub topic; it runs every morning at 1 AM.

At 1:30 AM every morning, the second job stops the instance by sending a message to the stop topic.
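If you prefer the command line over the console, the two jobs can also be created with gcloud. The job and topic names below are the ones assumed earlier, the cron expressions correspond to 1 AM and 1:30 AM, and the time zone is just an example.

# Start the instance every morning at 1 AM
gcloud scheduler jobs create pubsub start-instance-job \
  --schedule "0 1 * * *" \
  --topic start-instance-event \
  --message-body "start" \
  --time-zone "UTC"

# Stop the instance every morning at 1:30 AM
gcloud scheduler jobs create pubsub stop-instance-job \
  --schedule "30 1 * * *" \
  --topic stop-instance-event \
  --message-body "stop" \
  --time-zone "UTC"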

That’s All Folks
If you have set up all of the above, you will have an automated pipeline running in Google Cloud.
Here are the links to all the articles in this series:
a) Part 1: Creating a Singer pipeline to get data from Google AdWords to BigQuery
b) Part 2: Creating the Dockerfile for the Singer pipeline
c) Part 3: Automating the Singer pipeline