Getting Data From Google Adwords into Bigquery Using Singer- Part 1

When I was working on this problem, I found the documentation to be lacking in a lot of aspects. So, I decided to cover this in detail.

Here are the steps we will follow:-

a) Getting the catalog.json file ready for tap-adwords

b) Getting data into target-bigquery

c) Working with state.json file for tap-adwords

Creating a Config.json file for tap-adwords

Singer recommends different virtual environments for both the tap and the target. So, first step is to create a new virtual environment where we will install tap-adwords.

virtualenv -p python3 env-tap
source env-tap/bin/activate
pip3 install tap-adwords

The first this you need to run tap-adwords is a config file.

{"developer_token": "xxxxxxxxxxxxxxxxxx",
 "oauth_client_id": "xxxxxxxxxxxxxxxxxx",
 "oauth_client_secret": "xxxxxxxxxxxxxx",
 "refresh_token": "xxxxxxxxxxxxxxxxx",
 "start_date": "2020-03-19T00:00:00Z",
 "end_date": "2020-03-20T00:00:00Z",
 "conversion_window_days":0,
 "user_agent": "adwords",
 "customer_ids": "xxxxxxxxx"}

So here is from where you get all these values from, start date and end date are pretty self-explanatory:-

a) Developer token- You can get a developer token by logging in to your Google AdWords manager account and navigating to the AdWords API Center.

b) OAuth Client Id and Oauth Client Secret – These values you can create a new OAuth client id in your API and services section google cloud console.

Create the OAuth id for a web application. In the redirect URL’s for your app add, https://developers.google.com/oauthplayground

c) Refresh Token- I used the Google OAuth playground to get the refresh token.

Make sure to use your own client tokens to generate the refresh tokens.

d) Conversion window days – This is a tricky one, the official documentation doesn’t say much about it but if you do not put this field the tap will get data one-month past data, as the default conversion window is 30 days.

e) Customer Id – This is the account id for which we are getting the information.

Creating a Catalog File for tap-adwords

If you want to explore the available field for tap-adwords you can run it in discover mode and get a sample catalog file.

tap-adwords --config adwordsconfig.json --discover > catalog.json

But by default, all fields in singer are null and not selected.

The best way that I found to create a catalog file is this Github repository, https://github.com/chrisgoddard/singer-discover

pip install https://github.com/chrisgoddard/singer-discover/archive/master.zip
singer-discover --input catalog.json --output catalog.json

Simply take the catalog that you got from the discover mode and use the GitHub repository to select the required fields.

In your virtual environment for your tap install the singer-discover repository and use it to create required catalog file.

Creating the State.json File for tap-adwords

This step is optional but if you want to run stitch as part of a pipeline, you will need this file.

The state.json file bookmarks till where the data has been downloaded, so in the next run we can start from the bookmark.

Here is the format for the state.json file.

{"bookmarks": {"KEYWORDS_PERFORMANCE_REPORT_{{Client_ID}}": 
{"date": "2020-03-20T00:00:00.000000Z"}}}

Replace {{Client_ID}} with client Id for which you are running the report.

For solving the issue with the conversion window you can also use the state.json file, https://github.com/singer-io/tap-adwords/pull/64/files

Getting Started with Bigquery

Okay, now the next step is to get the data coming from the tap into Bigquery, if you want to test you can always send the data coming from tap-adwords into CSV to check it out.

Creating the config file for Bigquery

Here is the config file that I used

 {
    "project_id": "careful-parser-xxxxxx",
    "dataset_id": "newDataset1",
    "validate_records": true,
    "disable_collection": true
}

I skipped declaring the table here, as when I did that I was getting an error, this way singer just creates a table for you with the name of the report that you have selected.

Another suggestion that I have gotten but not used is to go with this repository, as it has more recent commits. https://github.com/dreamdata-io/target-bigquery . I didn’t use it, but in case you run into issues, give it a shot.

Getting the Client.json file for Bigquery

Go into IAM & Admin, and then go to Service Accounts. Either create a new service account or get the keys for an existing service account.

By Clicking on the three dots to the right > Key > Json.

Upload the file into your compute instance and set the GOOGLE_APPLICATION_CREDENTIALS environment variable on the machine.

export GOOGLE_APPLICATION_CREDENTIALS=$PWD/client_secrets.json

After this, you can authenticate your google bigquery credentials and you are set to use target-bigquery.,

gcloud auth activate-service-account --key-file=$GOOGLE_APPLICATION_CREDENTIALS

Run your Pipeline

env-tap/bin/tap-adwords -c adwordsconfig.json -p output.json  -s state.json | env-target/bin/target-bigquery -c bigqueryconfig1.json | tee state.json

Your date for the date range should appear in google Bigquery.

Next Steps

Here are the links to all articles in this series:-

a) Part 1:- Creating a singer pipeline getting data from google adwords to bigquery
b) Part 2:- Creating the docker file for the singer pipeline
c) Part 3:- Automating the singer pipeline

About the author

admin

Mastering Data Engineering/ Data science one project at a time. I have worked and developed multiple startups before, and this blog is for my journey as a startup in itself where I iterate and learn.

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>

Copyright © 2020. Created by Meks. Powered by WordPress.