Building a Web Analytics System Using Kafka and Spark Streaming in Google Cloud
The aim of the project is to create a data pipeline that receives hits from the website through a Flask REST API. The REST API publishes the data to Kafka topics. We subscribe to these topics from a Google Dataproc cluster, then use Spark Streaming to read the data from the Kafka topics and push it into Google BigQuery.
STEP 1 – Pushing data into Kafka topics from the REST API endpoints
Here is the JavaScript snippet that I put on the website, along with the Flask API code. For the Kafka producer, look into resources/webevents.py; here is the code for that file. On Dataproc, the bootstrap servers are the worker nodes. Kafka listens on port 9092 by default, and you can connect to the Dataproc cluster using the internal IPs of the worker nodes.[…]
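As a rough illustration of the producer side described above, here is a minimal sketch, not the code from resources/webevents.py. It assumes the kafka-python client; the helper names, the topic name "web-hits", and the IP addresses in the usage comment are all hypothetical placeholders.

```python
import json
from typing import Any, Dict, List

KAFKA_PORT = 9092  # Kafka's default listener port


def bootstrap_servers(worker_ips: List[str], port: int = KAFKA_PORT) -> List[str]:
    """Build the bootstrap server list from the workers' internal IPs."""
    return [f"{ip}:{port}" for ip in worker_ips]


def serialize_event(event: Dict[str, Any]) -> bytes:
    """Encode a hit as UTF-8 JSON bytes, the form sent to Kafka."""
    return json.dumps(event, sort_keys=True).encode("utf-8")


def publish_hit(producer: Any, topic: str, event: Dict[str, Any]) -> None:
    """Send one website hit to the given topic.

    `producer` is anything with a kafka-python style send(topic, value=...).
    """
    producer.send(topic, value=serialize_event(event))


def make_producer(worker_ips: List[str]) -> Any:
    """Create a real producer (assumes the kafka-python package is installed)."""
    from kafka import KafkaProducer  # imported lazily so the helpers work without it

    return KafkaProducer(bootstrap_servers=bootstrap_servers(worker_ips))


# Hypothetical usage inside a Flask view (IPs and topic name are placeholders):
#   producer = make_producer(["10.128.0.2", "10.128.0.3"])
#   publish_hit(producer, "web-hits", {"page": "/home", "event": "pageview"})
```

Passing the producer in as an argument keeps the publishing logic separate from the connection setup, so the Flask handler can be exercised without a running Kafka broker.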