Rockset is a real-time search and analytics database designed to serve millisecond-latency analytical queries on event streams, CDC streams, and vectors.

Upstash Kafka Setup

Create a Kafka cluster using Upstash Console or Upstash CLI by following Getting Started.

Create one topic by following the creating topic steps. This topic will be the source for the Rockset. Let’s name it “transcript” for this example tutorial.

Rockset Setup

To be able to use the Rockset, you first need to create an account.

There are a couple of steps to create your organisation. After completing them, you can see your Rockset dashboard is created.

Connect Rockset to Upstash Kafka

To ingest data from Upstash Kafka to Rockset, open Integrations and click to Create your first Integration.

Select Kafka as the external service and click Start. In the next step, name your integration and select Apache Kafka. In the data format section, select the data format and give the name of the topic you created.

When you continue, you will see 5 new steps to create and configure the Kafka Connector for Rockset. Kafka – Rockset connection can be established using the plugin Rockset provided only. Therefore, we have to create self-managed Kafka connector.

Since this tutorial explains the first connection, select New — no pre-existing Kafka Connect cluster.

To create the required Kafka connector, you must first download Apache Kafka Connect.

In the next step, you can give the endpoint of the Kafka Cluster as the Address of Apache Kafka Broker and then download the provided Kafka Connect properties.

The connect-standalone.properties file should be located in the same folder as Kafka.

Open connect-standalone.properties and add the following properties.

consumer.sasl.mechanism=SCRAM-SHA-256
consumer.security.protocol=SASL_SSL
consumer.sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required \
  username="<UPSTASH-KAFKA-USERNAME>" \
  password="<UPSTASH-KAFKA-PASSWORD>";
sasl.mechanism=SCRAM-SHA-256
security.protocol=SASL_SSL
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required \
  username="<UPSTASH-KAFKA-USERNAME>" \
  password="<UPSTASH-KAFKA-PASSWORD>";

These additional properties will allow your Kafka connector access to your Kafka cluster and consume the topics.

In the next step, download the Rockset Sink Connector and Rockset Sink Connector Properties. Locate these files in the same folder with Kafka as well.

Now, navigate to the folder that contains all these files and execute the following command to run a standalone Apache Kafka Connect with Rockset Sink Connector.

./kafka_2.13-3.6.1/bin/connect-standalone.sh ./connect-standalone.properties ./connect-rockset-sink.properties

Before completing the integration, we can check if the data is coming to the Rockset. Let’s return to the Upstash console, click on your Kafka cluster and go to the “Topics” section. Open the source topic, which is transcript in this case. Select the Messages tab, then click Produce a new message. Send a message in JSON format like the one below:

{
	"studentID": 205,
	"firstName": "Natalie",
	"lastName": "Jones",
	"gender": "Female",
	"subject": "Maths",
	"score": 3.8
}

You should see the integration Active.

Query Data

When you complete the integration setup, click Create Collection from Integration. This will allow you to collect the data from the Uptash Kafka topic and query that data.

Type your Kafka topic in the data source selection step.

We can leave the Ingest Transformation Query as it is in the Transform Data step.

Lastly, name the collection and create.

Now, select Query This Collection. Go back to the Upstash console and produce a new message in the source topic.

{
	"studentID": 201,
	"firstName": "John",
	"lastName": "Doe",
	"gender": "Male",
	"subject": "Physics",
	"score": 4.4
}

Let’s go to the Rockset query editor and run the following query.

SELECT * FROM commons.transcipts LIMIT 10

You can see the last message you sent to the source topic returned.