September 3, 2024·8 min read

Adding Semantic Search into a Strapi Application

Noah Fischer, Software Engineer, Guest Author

Semantic search is a technique that lets users find content relevant to a query based on its meaning rather than on exact keyword matches.

This technique uses vector representations of the content to find the most relevant results in the search domain.

To perform a semantic search, we first extract the vector embedding of the query using an embedding model. We then compare it with the stored vector embeddings of the content items. The comparison is based on the distance between vectors in an N-dimensional vector space, where N is the size of the vectors.
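To make the comparison step concrete, here is a minimal sketch of ranking items by cosine similarity, one common way to measure closeness between vectors. The three-dimensional vectors and the book ids are made up for illustration; a real embedding model produces hundreds of dimensions, and a vector database does this ranking for us.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // convention used here for zero vectors
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank stored items by similarity to the query vector, most similar first.
function topK(queryVector, items, k) {
  return items
    .map((item) => ({ id: item.id, score: cosineSimilarity(queryVector, item.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy "embeddings" (hypothetical values, for illustration only).
const books = [
  { id: "cookbook", vector: [1, 0, 0] },
  { id: "space-opera", vector: [0, 1, 0] },
  { id: "sci-fi-thriller", vector: [0, 0.9, 0.1] },
];

console.log(topK([0, 1, 0], books, 2));
```

The query vector points in almost the same direction as the two science fiction entries, so they rank above the cookbook.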

In this blog post, we will add semantic search to a Strapi application. To do that, we will first create an Upstash Vector database to store and retrieve embeddings. After that, we will create an example Strapi application whose content is a set of books and their descriptions. Semantic search will run over these books and descriptions.

Strapi Application Setup

Let’s get to know Strapi first.

Strapi is an open-source headless Content Management System (CMS) that allows developers to manage content through an API while providing a customizable and user-friendly interface.

Unlike traditional CMS platforms, which typically bundle the front-end and back-end together, Strapi focuses on back-end content delivery, enabling developers to choose any front-end technology they prefer. This “headless” architecture decouples the front-end from the back-end, so the content can be consumed from any framework or language.

Strapi provides this flexibility by exposing the content through REST APIs. This allows front-end applications to retrieve the data in a common format without any tightly coupled integration.

Lastly, Strapi offers fully managed cloud hosting for Strapi applications and their content, which makes deployments easier to maintain. However, in this blog post we will develop on our local machine, since the goal is to integrate semantic search into a Strapi app.

Now, let’s build our Strapi application. We will first install Strapi on our local host. Before doing that, the following requirements must be installed:

Node.js: Only Active LTS or Maintenance LTS versions are supported (currently v18 and v20). Odd-number releases of Node, known as "current" versions of Node.js, are not supported (e.g. v19, v21).
Your preferred Node.js package manager:
npm (v6 and above)
yarn
Python (if using a SQLite database)

After installing the prerequisites, we can create our first Strapi application by running the following command in the terminal.

npx create-strapi-app@latest bookstore --quickstart

This command will open a browser page automatically. Once we complete the installation and fill in the form in the browser, we will be able to access the admin panel of the application.

The admin panel runs at http://localhost:1337/admin and can also be opened by running the npm run develop command.

Our barebones Strapi application is ready. Now, we will create our first content, which is basically a database that is going to store books in our bookstore.

To create a bookstore database, let’s go to the Content-Type Builder tab. On this page, we can create a collection type, and name it book, since the collection will consist of books.

When we click Continue, the next step is configuring the attributes of the collection items. To keep the demo simple, we will only have name, description, price and author. Strapi provides a range of data types for configuring attributes.

In our case, name and author will be short text, description will be long text and price will be a number. Once we complete the attribute creation steps, we should click Save to create the collection. When we save our changes, the Strapi application restarts itself to load them.

Now, we can go back to the Content Manager tab and see the new book collection that we have just created.

Our Strapi application is ready now!

Upstash Vector Setup

Upstash Vector is a serverless vector database designed to work with vector embeddings. In this blog post, we will store the vector representations of the data that we insert into the Strapi book collection.

We will follow the Upstash Documentation for creating the Upstash Vector database.

First, log in to the Upstash Console and create an index. In the pop-up that appears, we will set up our vector index. We should choose one of the built-in embedding models of Upstash Vector; in this demo, we can use the bge-base-en-v1.5 model. The vector size of this model is 768, and we can use the Cosine metric for distance calculation. You can learn more about the metrics in the Upstash Vector docs.

Now the vector index is ready. We will need the endpoint and token later when we integrate Upstash Vector with the Strapi application.
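Since the endpoint and token are secrets, one option is to keep them out of the source code and read them from environment variables. The sketch below assumes the variable names UPSTASH_VECTOR_REST_URL and UPSTASH_VECTOR_REST_TOKEN, as in the .env snippet shown in the Upstash console; loadUpstashConfig is a hypothetical helper name, not part of any SDK.

```javascript
// Read the Upstash Vector credentials from environment variables.
// Assumption: the variable names match the .env snippet in the Upstash console.
function loadUpstashConfig(env = process.env) {
  const url = env.UPSTASH_VECTOR_REST_URL;
  const token = env.UPSTASH_VECTOR_REST_TOKEN;
  if (!url || !token) {
    throw new Error("Set UPSTASH_VECTOR_REST_URL and UPSTASH_VECTOR_REST_TOKEN");
  }
  return { url, token };
}

// Possible usage once the @upstash/vector package is installed:
//   const { Index } = require("@upstash/vector");
//   const index = new Index(loadUpstashConfig());
```

Failing early with a clear message makes a missing variable easier to spot than a rejected request at query time.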

Integrating Upstash Vector with Strapi Application

We have all the resources ready. Now we can connect the Upstash Vector index to the Strapi bookstore application.

For semantic search, we need a vector embedding for every new book in our bookstore. Therefore, whenever a new entry is created in the Strapi application, we should send its data to the vector index. Strapi provides lifecycle hooks that let developers run additional logic when events such as data creation or queries occur.

In this demo, we will use the afterCreate lifecycle hook. Strapi provides lifecycle hooks for every action taken in the application. You can learn more about lifecycle events in the Strapi docs.

The afterCreate hook is executed after an entry has been created in the collection. This hook is where we will send the data to the Upstash Vector index, which extracts the embeddings of the data and stores them.

To connect to Upstash Vector, we will use its TypeScript SDK, which we can install with the following command.

npm install @upstash/vector

To hook into the lifecycle of the book collection, we need to create a lifecycles.js file in the ./src/api/[api-name]/content-types/[content-type-name]/ folder. We will write the following code into this file; it is a very basic implementation for demonstration.

'use strict';

const { Index } = require("@upstash/vector");

// Client for the Upstash Vector index; fill in the values from the Upstash console.
const index = new Index({
	url: "<UPSTASH-VECTOR-ENDPOINT>",
	token: "<UPSTASH-VECTOR-TOKEN>",
});

module.exports = {
	beforeCreate(event) {
		// Apply a 20% discount to every new book.
		event.params.data.price = event.params.data.price * 0.8;
	},

	async afterCreate(event) {
		const { params } = event;
		try {
			// Serialize the entry to JSON; calling toString() on an object
			// would only yield "[object Object]".
			await index.upsert({
				id: params.data.name.toString(),
				data: JSON.stringify(params.data),
				metadata: { metadata_field: "metadata_value" },
			});
			console.log("Indexed into Upstash Vector");
		} catch (error) {
			console.error("Indexing into Upstash Vector failed", error);
		}
	},
};

Do not forget to fill in the URL and token that you copied from the Upstash console.

This code sends the whole data of a new entry to Upstash Vector, where it is vectorized and stored in the index. The id of each vector is the name of the corresponding book.
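Sending the whole entry also embeds fields such as the price, which carry little meaning for search. As a possible alternative, the sketch below builds the text to embed from the descriptive fields only and moves the rest into metadata. The helper name toVectorRecord and the choice of fields are assumptions for illustration; the field names follow the book collection defined earlier.

```javascript
// Build the record passed to index.upsert() from a book entry.
// Hypothetical helper: only descriptive fields are embedded, the rest is metadata.
function toVectorRecord(book) {
  const text = [
    `Title: ${book.name}`,
    `Author: ${book.author}`,
    `Description: ${book.description}`,
  ].join("\n");

  return {
    id: String(book.name), // same id convention as the article: the book name
    data: text,
    metadata: { name: book.name, author: book.author, price: book.price },
  };
}

// Example entry (made-up values).
const record = toVectorRecord({
  name: "Dune",
  author: "Frank Herbert",
  description: "A desert planet epic.",
  price: 12,
});
console.log(record.data);
```

With real metadata stored next to each vector, search results can be shown to the user without a second lookup in Strapi.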

Lastly, we can test the integration with an example entry. If the admin panel is closed, we can reopen it by running the following command in the terminal.

npm run develop

Let’s create a new entry from the Content Manager tab.

After saving and publishing this entry, we can test it by opening http://localhost:1337/api/books in the browser. We should also see the vector of this new book entry in our Upstash Vector index. We can check it by going back to the Upstash console and opening the Data Browser tab.
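The same endpoint can be read from code. The sketch below is a minimal, hedged example: depending on the Strapi version, the fields of an entry may be nested under attributes or sit directly on the entry, so the hypothetical flattenEntry helper accepts both shapes. It assumes Node 18 or later for the global fetch.

```javascript
// Normalize an entry from the Strapi REST response into a flat object.
// Assumption: fields are either nested under `attributes` or directly on the entry.
function flattenEntry(entry) {
  const fields = entry.attributes ?? entry;
  return { id: entry.id, ...fields };
}

// Fetch all books from the local Strapi instance.
// `fetchImpl` can be swapped out, for example in tests.
async function fetchBooks(baseUrl = "http://localhost:1337", fetchImpl = fetch) {
  const response = await fetchImpl(`${baseUrl}/api/books`);
  if (!response.ok) {
    throw new Error(`Strapi responded with status ${response.status}`);
  }
  const body = await response.json();
  return body.data.map(flattenEntry);
}

// Possible usage while Strapi is running locally:
//   fetchBooks().then((books) => console.log(books));
```

If the request is rejected with a 403 status, the find permission of the book collection probably still needs to be enabled for the Public role in the admin panel settings.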

With the lifecycle hook in place, there will be a vector for every book that we create in the Strapi collection. For semantic searching among the books, we can put the following code wherever users make search calls.

const { Index } = require("@upstash/vector");

const index = new Index({
	url: "<UPSTASH-VECTOR-ENDPOINT>",
	token: "<UPSTASH-VECTOR-TOKEN>",
});

// query() returns a promise that resolves to the closest matches.
index.query({
	data: "Enter the query",
	topK: 10,
	includeVectors: true,
	includeMetadata: true,
}).then((results) => console.log(results));

The code above will list the top 10 most similar records. The parameters can be tuned according to the needs of the application.
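As one way to tune the search, the sketch below wraps the query in a small function that drops weak matches and returns only the fields a UI would need. It assumes an index object exposing query({ data, topK, includeMetadata }) in the style of the Upstash Vector SDK; searchBooks and the minScore cutoff are hypothetical names introduced for this example.

```javascript
// Run a semantic search and keep only the fields the caller needs.
// `index` is assumed to behave like an @upstash/vector Index object.
async function searchBooks(index, queryText, { topK = 10, minScore = 0 } = {}) {
  const matches = await index.query({
    data: queryText,
    topK,
    includeMetadata: true,
  });

  return matches
    .filter((match) => match.score >= minScore) // drop weak matches
    .map((match) => ({ id: match.id, score: match.score, metadata: match.metadata }));
}

// Possible usage with a real index:
//   searchBooks(index, "a story about desert planets", { topK: 5, minScore: 0.7 })
//     .then((books) => console.log(books));
```

A suitable cutoff depends on the embedding model and the data, so it is best chosen by trying a few real queries.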

Conclusion

In this blog, we created the semantic search feature for an example Strapi application by integrating it with Upstash Vector. The semantic search can be used by the users to find the most relevant books with their search. The similarity function that we used in the Upstash Vector can bring the most relevant books by comparing the meaning of the queried text with the data of the books.

Of course, this project was just a demo, so it was pretty simple. However, Strapi is almost fully customizable and user-friendly. More complex Strapi applications connected with Upstash Vector can be built by configuring the lifecycle of the Strapi collection, as we did above.
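As a sketch of what such a configuration could look like, the example below adds afterUpdate and afterDelete hooks so that edits and deletions in Strapi are reflected in the vector index. It rests on several assumptions: that event.result holds the affected entry in these hooks, that the index object exposes upsert() and delete() like the Upstash Vector SDK, and that the book name is used as the vector id, as in this post. makeBookLifecycles is a hypothetical helper name.

```javascript
// Build lifecycle hooks that keep the vector index in sync with the collection.
// `index` is any object exposing upsert() and delete().
function makeBookLifecycles(index) {
  const toRecord = (book) => ({
    id: String(book.name),
    data: JSON.stringify(book),
    metadata: { name: book.name },
  });

  return {
    async afterCreate(event) {
      await index.upsert(toRecord(event.result));
    },
    async afterUpdate(event) {
      // Upserting with the same id overwrites the previous vector.
      await index.upsert(toRecord(event.result));
    },
    async afterDelete(event) {
      await index.delete(String(event.result.name));
    },
  };
}

// Possible usage in lifecycles.js:
//   const { Index } = require("@upstash/vector");
//   module.exports = makeBookLifecycles(new Index({ url: "...", token: "..." }));
```

Because the name doubles as the id, renaming a book would leave its old vector behind; using a stable identifier of the entry as the vector id would avoid that.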

I hope this blog helps you all!

