In this tutorial series, we will show how to build an end to end real time analytics system. We will stream the traffic (click) events from our web application to Upstash Kafka then we will analyse it on real time. We will implement one simply query with different stream processing tools:

SELECT city, count() FROM kafka_topic_page_views where  timestamp > now() - INTERVAL 15 MINUTE group by city

Namely, we will query the number of page views from different cities in last 15 minutes. We keep the query and scenario intentionally simple to make the series easy to understand. But you can easily extend the model for your more complex realtime analytics scenarios.

If you do not have already set up Kafka pipeline, see the first part of series where we did the set up our pipeline including Upstash Kafka and Cloudflare Workers (or Vercel).

In this part of the series, we will showcase how to use Tinybird to run a query on a Kafka topic.

Tinybird Setup

Create a Tinybird account and select a region. Create an empty workspace. On the wizard screen click Add Data button.

tiny1.png

In the next screen click on Kafka tab and fill the fields with the credentials copied from Upstash Kafka console. Key is username , secret is password . Select SCRAM-SHA-256. Then click on Connect button.

tiny2.png

If the connection is successful, then you should see the topic you have just created. Select it and click Continue .

tiny3.png

In the next screen, you should see data is populated from your Kafka topic. It should look like the below. Select Latest and click Create Data Source

tiny4.png

Click Create Pipe on the next screen.

tiny5.png

In the next page, you will see the query editor where you can execute queries on your data pipe. You can rename the views. Update the query as (replace the datasource):

SELECT city, count() FROM  kafka_ds_534681 where  timestamp > now() - INTERVAL 15 MINUTE group by city

tiny6.png

You should see the number of page view from cities in last 15 minutes. The good thing with TinyBird is you can chain queries (new transformation node) also you can add multiple data sources (e.g. Kafka topics) and join them in a single query.

If you are happy with your query, click on Create API Endpoint at top right. It will create an API endpoint which returns the result for your query.

tiny7.png

Copy the curl command and try, you should see the result like below:

curl --compressed -H 'Authorization: Bearer p.eyJ1IjogIjMyMDM1YzdjLTRkOGYtNDA3CJpZCI6ICJlNTY4ZjVjYS1hNjNiLTRlZTItYTFhMi03MjRiNzhiNmE5MWEifQ.3KgyhWoohcr_0HCt6J7y-kt71ZmWOlrjhYyFa7TaUlA'  https://api.us-east.tinybird.co/v0/pipes/kafka_ds_534681_pipe_2444.json
{
	"meta":
	[
		{
			"name": "city",
			"type": "String"
		},
		{
			"name": "count()",
			"type": "UInt64"
		}
	],

	"data":
	[
		{
			"city": "San Jose",
			"count()": 8
		}
	],

	"rows": 1,

	"statistics":
	{
		"elapsed": 0.000562736,
		"rows_read": 8,
		"bytes_read": 168
	}
}

Conclusion

We have built a simple data pipeline which collect data from edge to Kafka then create real time reports using SQL. Thanks to serverless technologies (Vercel, Upstash, Tinybird), we achieved this without dealing any server or infrastructural configurations. You can easily extend and adapt this example for much more complex architectures and queries.