Getting Data From Google Adwords into Bigquery Using Singer- Part 1
When I was working on this problem, I found the documentation to be lacking in a lot of aspects. So, I decided to cover this in detail.
Here are the steps we will follow:-
a) Getting the config.json and catalog.json files ready for tap-adwords
b) Working with the state.json file for tap-adwords
c) Getting the data into target-bigquery
Creating a Config.json file for tap-adwords
Singer recommends separate virtual environments for the tap and the target. So, the first step is to create a new virtual environment where we will install tap-adwords.
virtualenv -p python3 env-tap
source env-tap/bin/activate
pip3 install tap-adwords
The first thing you need to run tap-adwords is a config file.
{
  "developer_token": "xxxxxxxxxxxxxxxxxx",
  "oauth_client_id": "xxxxxxxxxxxxxxxxxx",
  "oauth_client_secret": "xxxxxxxxxxxxxx",
  "refresh_token": "xxxxxxxxxxxxxxxxx",
  "start_date": "2020-03-19T00:00:00Z",
  "end_date": "2020-03-20T00:00:00Z",
  "conversion_window_days": 0,
  "user_agent": "adwords",
  "customer_ids": "xxxxxxxxx"
}
Here is where you get all these values from (start_date and end_date are pretty self-explanatory):-
a) Developer token- You can get a developer token by logging in to your Google AdWords manager account and navigating to the AdWords API Center.
b) OAuth Client ID and OAuth Client Secret – You can create a new OAuth client ID in the APIs & Services section of the Google Cloud console.

Create the OAuth client ID for a web application. In the authorized redirect URIs for your app, add https://developers.google.com/oauthplayground
c) Refresh Token – I used the Google OAuth Playground to get the refresh token.
Make sure to use your own OAuth client ID and secret in the playground when you generate the refresh token.

d) Conversion window days – This is a tricky one: the official documentation doesn’t say much about it, but if you leave this field out the tap will pull data from one month in the past, because the default conversion window is 30 days.
e) Customer Id – This is the AdWords account ID for which we are getting the data.
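One note on this field: if you need to pull more than one account, my understanding from the tap’s README is that customer_ids takes a comma-separated list of account IDs, something like this (the IDs here are made up):
"customer_ids": "1234567890,9876543210"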
Creating a Catalog File for tap-adwords
If you want to explore the available fields for tap-adwords you can run it in discover mode and get a sample catalog file.
tap-adwords --config adwordsconfig.json --discover > catalog.json
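If you just want a quick look at which report streams the tap discovered, a one-liner like this works; it only assumes the standard Singer catalog layout (a top-level streams array where each stream has a tap_stream_id):
python3 -c "import json; [print(s['tap_stream_id']) for s in json.load(open('catalog.json'))['streams']]"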
But by default, none of the fields in the catalog are marked as selected.
The best way that I found to create a catalog file is this GitHub repository: https://github.com/chrisgoddard/singer-discover
pip install https://github.com/chrisgoddard/singer-discover/archive/master.zip
singer-discover --input catalog.json --output catalog.json
Simply take the catalog that you got from discover mode and use singer-discover to select the required fields.
Install singer-discover in your tap’s virtual environment, run it against that catalog, and it writes out the final catalog file with your selections.
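For reference, selection in a Singer catalog lives in each stream’s metadata entries: the empty-breadcrumb entry selects the stream itself and the per-field breadcrumbs select individual fields. The snippet below is only illustrative; singer-discover writes this for you, and the real field names come from your own discover output, not from this example.
"metadata": [
  { "breadcrumb": [], "metadata": { "selected": true } },
  { "breadcrumb": ["properties", "clicks"], "metadata": { "selected": true } },
  { "breadcrumb": ["properties", "impressions"], "metadata": { "selected": true } }
]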

Creating the State.json File for tap-adwords
This step is optional, but if you want to run the tap as part of an automated pipeline, you will need this file.
The state.json file bookmarks how far the data has been downloaded, so the next run can start from that bookmark.
Here is the format for the state.json file.
{"bookmarks": {"KEYWORDS_PERFORMANCE_REPORT_{{Client_ID}}": {"date": "2020-03-20T00:00:00.000000Z"}}}
Replace {{Client_ID}} with the customer ID for which you are running the report.
You can also use the state.json file to work around the conversion window issue; see https://github.com/singer-io/tap-adwords/pull/64/files
Getting Started with BigQuery
Okay, the next step is to get the data coming from the tap into BigQuery. If you want to test first, you can always send the data coming from tap-adwords to a CSV target to check it out.
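Before the config, target-bigquery needs its own virtual environment, mirroring what we did for the tap (the pipeline command later in this post assumes it lives in env-target). I’m assuming here that the target installs from PyPI under the name target-bigquery; if you prefer a specific fork, install it straight from its GitHub repository instead.
# separate virtual environment for the target, as Singer recommends
virtualenv -p python3 env-target
source env-target/bin/activate
pip3 install target-bigquery   # assumption: the target is published on PyPI under this name
deactivate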
Creating the config file for BigQuery
Here is the config file that I used
{ "project_id": "careful-parser-xxxxxx", "dataset_id": "newDataset1", "validate_records": true, "disable_collection": true }
I skipped declaring the table here, because when I did declare it I was getting an error. This way the target simply creates a table for you, named after the report you selected.
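One thing the config does assume is that the dataset (newDataset1 above) exists. I’m not certain whether target-bigquery creates it automatically, so creating it up front with the bq CLI is the safe route (project and dataset names below are the ones from my config; swap in your own):
bq mk --dataset careful-parser-xxxxxx:newDataset1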
Another suggestion I have gotten, but have not used myself, is to go with this repository, as it has more recent commits: https://github.com/dreamdata-io/target-bigquery. If you run into issues, give it a shot.
Getting the client_secrets.json file for BigQuery
Go into IAM & Admin, then Service Accounts. Either create a new service account or get the key for an existing one by clicking the three dots on the right > Key > JSON.

Upload the file to your compute instance and set the GOOGLE_APPLICATION_CREDENTIALS environment variable on the machine.
export GOOGLE_APPLICATION_CREDENTIALS=$PWD/client_secrets.json
After this, you can authenticate with your Google BigQuery credentials and you are set to use target-bigquery.
gcloud auth activate-service-account --key-file=$GOOGLE_APPLICATION_CREDENTIALS
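To double-check that the service account is now the active credential on the machine, you can list the authenticated accounts; this is just a sanity check I find handy, not something target-bigquery requires:
gcloud auth list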
Run your Pipeline
env-tap/bin/tap-adwords -c adwordsconfig.json -p catalog.json -s state.json | env-target/bin/target-bigquery -c bigqueryconfig1.json | tee state.json
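A word of caution about the tee at the end: the shell opens state.json for writing as soon as the pipeline starts, so tee-ing into the very file the tap is reading can truncate it before tap-adwords has picked up the bookmark. A pattern along the lines of the Singer getting-started guide is to keep only the last state line the target emits and swap the file in afterwards; a sketch of that, using the same file names as above, looks like this:
env-tap/bin/tap-adwords -c adwordsconfig.json -p catalog.json -s state.json \
  | env-target/bin/target-bigquery -c bigqueryconfig1.json \
  | tail -1 > state.json.tmp && mv state.json.tmp state.json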
Your data for the selected date range should appear in Google BigQuery.
Next Steps
Here are the links to all the articles in this series:-
a) Part 1:- Creating a Singer pipeline to get data from Google AdWords to BigQuery
b) Part 2:- Creating the Dockerfile for the Singer pipeline
c) Part 3:- Automating the Singer pipeline