# Upstash Ratelimit in LangChain

> **Source:** https://upstash.com/blog/ratelimit-langchain
> **Date:** 2024-06-10
> **Author(s):** Cahid Arda Oz
> **Reading time:** 4 min read
> **Tags:** upstash, ratelimit, langchain
> **Format:** text/markdown — machine-readable content for agents and LLMs

---

In this post, we will show how [Upstash Ratelimit](https://upstash.com/docs/oss/sdks/ts/ratelimit/overview) can be used with LangChain in both JavaScript and Python.

### Motivation

Large Language Models (LLMs) are powerful tools but can be costly. To manage their cost, it's essential to limit the number of requests processed, ensuring the use of LLMs remains affordable. This is where Upstash Ratelimit comes into play.

Using Upstash Ratelimit in LangChain allows you to:

1. Allow a limited number of chain invocations over a time period.
2. Allow a limited number of tokens (either prompt or prompt and completion) over a time period.

### Usage

Start by installing `@upstash/ratelimit`, `@upstash/redis`, and `@langchain/community` (you can find more information about installing LangChain community packages [here](https://js.langchain.com/v0.2/docs/how_to/installation/#installing-integration-packages)):

```bash
npm install @upstash/ratelimit @upstash/redis @langchain/community
```

If you are using Python, run:

```bash
pip install upstash-ratelimit langchain-community
```

Then, set the environment variables `UPSTASH_REDIS_REST_URL` and `UPSTASH_REDIS_REST_TOKEN`. You can get both values by going to [the Upstash console](https://console.upstash.com/redis) and [creating a Redis database](https://upstash.com/docs/redis/overall/getstarted).

Now we can demonstrate how you can add rate limiting to your chain in LangChain.

First, create a rate limit instance as shown below:

```tsx
import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";

const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  // 10 requests per window, where window size is 60 seconds:
  limiter: Ratelimit.fixedWindow(10, "60 s"),
});
```

This `ratelimit` object uses the Redis database to store how many requests each user has made. [Learn more about Ratelimit from its documentation](https://upstash.com/docs/oss/sdks/ts/ratelimit/overview).
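To make the fixed-window idea concrete, here is a minimal in-memory sketch in Python. It is an illustration only, not the Upstash implementation: the class `InMemoryFixedWindow`, its `clock` parameter, and the dictionary-based storage are all hypothetical stand-ins for what the real library keeps in Redis.

```python
import time
from collections import defaultdict


class InMemoryFixedWindow:
    """Illustrative fixed-window limiter (hypothetical, not the Upstash code)."""

    def __init__(self, max_requests, window, clock=time.time):
        self.max_requests = max_requests
        self.window = window  # window size in seconds
        self.clock = clock  # injectable so the example is deterministic
        self.counts = defaultdict(int)  # (identifier, window index) -> usage

    def limit(self, identifier, rate=1):
        """Consume `rate` units and return True if the call is allowed."""
        window_index = int(self.clock() // self.window)
        key = (identifier, window_index)
        self.counts[key] += rate
        return self.counts[key] <= self.max_requests


# A fake clock keeps the demonstration independent of wall-clock time.
fake_now = [0.0]
limiter = InMemoryFixedWindow(max_requests=10, window=60, clock=lambda: fake_now[0])

# Eleven calls in the same window: the first ten pass, the eleventh is rejected.
first_window = [limiter.limit("user-1") for _ in range(11)]

# Counters are kept per identifier, so another user is unaffected.
other_user = limiter.limit("user-2")

# Once the next window starts, the first user's counter starts from zero again.
fake_now[0] = 60.0
next_window = limiter.limit("user-1")
```

The real limiter does the same bookkeeping in Redis, which is what lets the counts be shared across server instances.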

Next, create a mock chain in LangChain to showcase the callback:

```tsx
import { RunnableLambda } from "@langchain/core/runnables";

const chain = new RunnableLambda({ func: (str: string): string => str });
```

Finally, create a callback and invoke the chain:

```tsx
import {
  UpstashRatelimitHandler,
  UpstashRatelimitError,
} from "@langchain/community/callbacks/handlers/upstash_ratelimit";

const user_id = getUserId(); // your own method to get user ids

try {
  const response = await chain.invoke("Hello World!", {
    callbacks: [
      new UpstashRatelimitHandler(user_id, {
        requestRatelimit: ratelimit,
      }),
    ],
  });
  console.log(response);
} catch (err) {
  if (err instanceof UpstashRatelimitError) {
    console.log("Handling ratelimit!");
  }
}
```

Here is the same code written in Python:

```py
from upstash_redis import Redis
from upstash_ratelimit import Ratelimit, FixedWindow

from langchain_core.runnables import RunnableLambda
from langchain_community.callbacks import UpstashRatelimitError, UpstashRatelimitHandler

# your own method to get user ids
user_id = getUserId()

# mock chain
chain = RunnableLambda(str)

# init ratelimiter
ratelimit = Ratelimit(
    redis=Redis.from_env(),
    # 10 requests per window, where window size is 60 seconds:
    limiter=FixedWindow(max_requests=10, window=60),
)

try:
    response = chain.invoke("Hello World!", {
        "callbacks": [
            UpstashRatelimitHandler(
                identifier=user_id,
                request_ratelimit=ratelimit,
            )
        ]
    })
    print(response)
except UpstashRatelimitError:
    print("Handling ratelimit!")
```

Note that we create a new handler each time we invoke the chain. The handler holds state that must be reset for each invocation, so a single instance should not be reused across calls.

In the configuration above, the handler will allow 10 requests per minute for each user. However, this is not the only way to configure the handler. To learn more about Ratelimit callbacks in LangChain, see the Upstash Ratelimit Callback documentation for [TypeScript](https://js.langchain.com/v0.2/docs/integrations/callbacks/upstash_ratelimit_callback) and [Python](https://python.langchain.com/v0.2/docs/integrations/callbacks/upstash_ratelimit/).

You can configure the handler to rate limit based on requests, number of tokens, or both. To add token-based rate limiting, pass a rate limiter as the `tokenRatelimit` parameter when initializing the handler:

```tsx
const handler = new UpstashRatelimitHandler(user_id, {
  tokenRatelimit: ratelimit,
  includeOutputTokens: true, // count completion tokens in addition to prompt tokens
});
```
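One consequence of limiting by tokens is that the cost of a call is only known after the model has responded. The sketch below shows one plausible flow under that constraint: check the remaining budget before the call, then deduct the tokens afterwards. This is an assumption for illustration, not a description of the handler's internals, and `TokenBudget`, `TokenLimitReached`, `invoke_with_budget`, and `fake_llm` are all hypothetical names.

```python
class TokenLimitReached(Exception):
    """Hypothetical error, standing in for a rate limit error."""


class TokenBudget:
    """Hypothetical per-user token budget for a single window."""

    def __init__(self, max_tokens):
        self.max_tokens = max_tokens
        self.used = {}  # identifier -> tokens consumed so far

    def remaining(self, identifier):
        return self.max_tokens - self.used.get(identifier, 0)

    def consume(self, identifier, tokens):
        self.used[identifier] = self.used.get(identifier, 0) + tokens


def invoke_with_budget(budget, identifier, llm, prompt, include_output_tokens=True):
    # Reject up front if the budget is already exhausted.
    if budget.remaining(identifier) <= 0:
        raise TokenLimitReached(identifier)
    prompt_tokens, completion_tokens = llm(prompt)
    # Deduct the cost only after the call, when the token counts are known.
    spent = prompt_tokens + (completion_tokens if include_output_tokens else 0)
    budget.consume(identifier, spent)
    return spent


def fake_llm(prompt):
    # Pretend each word is one prompt token and every completion is 60 tokens.
    return len(prompt.split()), 60


budget = TokenBudget(max_tokens=100)
first = invoke_with_budget(budget, "user-1", fake_llm, "Hello World!")
second = invoke_with_budget(budget, "user-1", fake_llm, "Hello World!")

try:
    invoke_with_budget(budget, "user-1", fake_llm, "Hello World!")
    blocked = False
except TokenLimitReached:
    blocked = True
```

In this sketch the second call is allowed because some budget was left when it started, even though it pushes usage past the limit. Only the third call is rejected.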

For token-based rate limiting, the handler expects the LLM in your chain to return an `LLMResult` in this format:

```json
{
  "tokenUsage": {
    "totalTokens": 456,
    "promptTokens": 123,
    "otherFields": "..."
  },
  "otherFields": "..."
}
```

Not all LLMs in LangChain use this format, however. If your model reports token usage under different keys, set the `llmOutputTokenUsageField`, `llmOutputTotalTokenField` and `llmOutputPromptTokenField` options of `UpstashRatelimitHandler` to match.
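As a rough illustration of what these three options control, the following Python sketch reads a token count out of an LLM output using configurable key names. The function `count_tokens` and its parameter names are hypothetical and only mirror the idea; they are not the handler's actual code.

```python
def count_tokens(
    llm_output,
    usage_field="tokenUsage",
    total_field="totalTokens",
    prompt_field="promptTokens",
    include_output_tokens=True,
):
    """Return the number of tokens to charge for one LLM call (illustrative)."""
    usage = llm_output[usage_field]
    # The total covers prompt and completion; the prompt field covers the prompt only.
    return usage[total_field] if include_output_tokens else usage[prompt_field]


# Output in the default shape described above.
default_output = {"tokenUsage": {"totalTokens": 30, "promptTokens": 12}}
default_total = count_tokens(default_output)
default_prompt = count_tokens(default_output, include_output_tokens=False)

# A made-up provider that uses different keys, handled by overriding the field names.
custom_output = {"usage": {"total": 30, "prompt": 12}}
custom_total = count_tokens(
    custom_output,
    usage_field="usage",
    total_field="total",
    prompt_field="prompt",
)
```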

### Conclusion

The `UpstashRatelimitHandler` provides a simple way to add rate limiting to your LangChain applications. With just a few steps, you can control the number of requests and tokens used. For more detailed information and advanced configurations, explore [the LangChain documentation](https://js.langchain.com/v0.2/docs/integrations/callbacks/upstash_ratelimit_callback).