Metadata Filtering
You can further narrow a vector similarity search by providing a filter based on specific metadata criteria.
Queries with a metadata filter return only vectors whose metadata matches the filter.
Upstash Vector lets you filter on keys whose values have the following types:
- string
- number
- boolean
- object
- array
Filtering is implemented as a combination of in-filtering and post-filtering. Every query is assigned a filtering budget,
which determines the number of candidate vectors that can be compared against the filter during query execution. If this
budget is exceeded, the system falls back to post-filtering. Therefore, with highly selective filters, fewer
than topK vectors may be returned.
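To see why a selective filter can yield fewer than topK results, the toy sketch below applies a filter only after the nearest candidates have been picked. It is an illustration of post-filtering in general, not Upstash's implementation; the post_filter_query helper, the candidate list, and the population values are all made up for this example.

```python
def post_filter_query(ranked, predicate, top_k):
    """Toy post-filtering: take the top_k nearest candidates first,
    then drop the ones whose metadata fails the predicate."""
    nearest = ranked[:top_k]
    return [item for item in nearest if predicate(item["metadata"])]


# Hypothetical candidates, already sorted by similarity (best first).
ranked = [
    {"id": "a", "metadata": {"population": 15460000}},
    {"id": "b", "metadata": {"population": 80000}},
    {"id": "c", "metadata": {"population": 120000}},
    {"id": "d", "metadata": {"population": 9000000}},
]

results = post_filter_query(
    ranked, lambda meta: meta["population"] >= 1000000, top_k=3
)
# Only one of the three nearest candidates passes the filter, so the
# caller receives fewer results than the requested top_k.
print([item["id"] for item in results])
```

Candidate "d" would have matched, but it was never among the three nearest, which is the effect the paragraph above describes.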
Filter Syntax
A filter uses an SQL-like syntax that consists of operators on object keys and boolean operators that combine them.
Assume you have metadata like the following:
{
"city": "Istanbul",
"country": "Turkey",
"is_capital": false,
"population": 15460000,
"geography": {
"continent": "Asia",
"coordinates": {
"latitude": 41.0082,
"longitude": 28.9784
}
},
"economy": {
"currency": "TRY",
"major_industries": [
"Tourism",
"Textiles",
"Finance"
]
}
}
Then, you can query similar vectors with a filter like below:
from upstash_vector import Index
index = Index(
url="UPSTASH_VECTOR_REST_URL",
token="UPSTASH_VECTOR_REST_TOKEN",
)
index.query(
vector=[0.9215, 0.3897],
filter="population >= 1000000 AND geography.continent = 'Asia'",
top_k=5,
include_metadata=True
)
Operators
Equals (=)
The equals operator filters keys whose value is equal to the given literal.
It is applicable to string, number, and boolean values.
country = 'Turkey' AND population = 15460000 AND is_capital = false
Not Equals (!=)
The not equals operator filters keys whose value is not equal to the given literal.
It is applicable to string, number, and boolean values.
country != 'Germany' AND population != 12500000 AND is_capital != true
Less Than (<)
The less than operator filters keys whose value is less than the given literal.
It is applicable to number values.
population < 20000000 OR geography.coordinates.longitude < 30.0
Less Than or Equals (<=)
The less than or equals operator filters keys whose value is less than or equal to the given literal.
It is applicable to number values.
population <= 20000000 OR geography.coordinates.longitude <= 30.0
Greater Than (>)
The greater than operator filters keys whose value is greater than the given literal.
It is applicable to number values.
population > 10000000 OR geography.coordinates.latitude > 39.5
Greater Than or Equals (>=)
The greater than or equals operator filters keys whose value is greater than or equal to the given literal.
It is applicable to number values.
population >= 10000000 OR geography.coordinates.latitude >= 39.5
Glob
The glob operator filters keys whose value matches the given UNIX glob pattern.
It is applicable to string values.
It is a case-sensitive operator.
The glob operator supports the following wildcards:
- * matches zero or more characters.
- ? matches exactly one character.
- [] matches one character from the list:
  - [abc] matches either a, b, or c.
  - [a-z] matches one character in the range from a to z.
  - [^abc] matches any one character other than a, b, or c.
  - [^a-z] matches any one character other than a to z.
For example, the filter below matches only city names whose second character is s or z
and whose last character is anything other than m to z.
city GLOB '?[sz]*[^m-z]'
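The pattern above can be tried locally with a rough approximation. The sketch below uses Python's standard fnmatch module, which follows similar UNIX glob rules but negates a character class with ! instead of ^, so the glob_match helper (a hypothetical name) translates the pattern first. Edge cases may differ from the behavior of the GLOB operator itself.

```python
from fnmatch import fnmatchcase


def glob_match(value: str, pattern: str) -> bool:
    # fnmatch negates a character class with "!", while the patterns in
    # this document use "^", so translate before matching.
    # fnmatchcase is always case sensitive, like the GLOB operator.
    return fnmatchcase(value, pattern.replace("[^", "[!"))


pattern = "?[sz]*[^m-z]"
for city in ["Istanbul", "Izmir", "Osaka", "Ankara"]:
    print(city, glob_match(city, pattern))
```

Istanbul and Osaka match. Izmir fails because it ends with r, which falls in the m to z range, and Ankara fails because its second character is neither s nor z.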
Not Glob
The not glob operator filters keys whose value does not match the given UNIX glob pattern.
It is applicable to string values.
It has the same properties as the glob operator.
For example, the filter below matches only city names whose first character is anything other than A.
city NOT GLOB 'A*'
In
The in operator filters keys whose value is equal to any of the given literals.
It is applicable to string, number, and boolean values.
country IN ('Germany', 'Turkey', 'France')
Semantically, it is equivalent to the equals operator applied to each of the given literals,
combined with the OR boolean operator:
country = 'Germany' OR country = 'Turkey' OR country = 'France'
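When the list of literals comes from application code, the filter string can be assembled programmatically. The sketch below builds both equivalent forms for string literals; in_filter and or_equals_filter are hypothetical helper names, and the code assumes the values contain no quote characters, since escaping rules are not covered here.

```python
def in_filter(key, values):
    """Build `key IN (...)` from a list of string literals."""
    quoted = ", ".join(f"'{value}'" for value in values)
    return f"{key} IN ({quoted})"


def or_equals_filter(key, values):
    """Build the equivalent chain of equality checks joined by OR."""
    return " OR ".join(f"{key} = '{value}'" for value in values)


countries = ["Germany", "Turkey", "France"]
print(in_filter("country", countries))
print(or_equals_filter("country", countries))
```

Either string can then be passed as the filter argument of index.query.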
Not In
The not in operator filters keys whose value is not equal to any of the given literals.
It is applicable to string, number, and boolean values.
economy.currency NOT IN ('USD', 'EUR')
Semantically, it is equivalent to the not equals operator applied to each of the given literals,
combined with the AND boolean operator:
economy.currency != 'USD' AND economy.currency != 'EUR'
Contains
The contains operator filters keys whose value contains the given literal.
It is applicable to array values.
economy.major_industries CONTAINS 'Tourism'
Not Contains
The not contains operator filters keys whose value does not contain the given literal.
It is applicable to array values.
economy.major_industries NOT CONTAINS 'Steel Production'
Has Field
The has field operator filters keys that have the given JSON field.
HAS FIELD geography.coordinates
Has Not Field
The has not field operator filters keys that do not have the given JSON field.
HAS NOT FIELD geography.coordinates.longitude
Boolean Operators
The operators above can be combined with the AND and OR boolean operators to form compound filters.
country = 'Turkey' AND population > 10000000
Boolean operators can be grouped with parentheses to give them higher precedence.
country = 'Turkey' AND (population > 10000000 OR is_capital = false)
When an ambiguous filter has no parentheses, AND has higher precedence than OR. So, the filter
country = 'Turkey' AND population > 10000000 OR is_capital = false
would be equivalent to
(country = 'Turkey' AND population > 10000000) OR is_capital = false
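Python's and and or follow the same convention, so the grouping can be checked locally. The sketch below mirrors the rule with plain Python booleans on a made-up metadata record; it does not run the filter engine itself.

```python
# Hypothetical record chosen so that the two groupings disagree.
meta = {"country": "Germany", "population": 3600000, "is_capital": False}

is_turkey = meta["country"] == "Turkey"
is_large = meta["population"] > 10000000
not_capital = meta["is_capital"] is False

no_parens = is_turkey and is_large or not_capital
and_first = (is_turkey and is_large) or not_capital
or_first = is_turkey and (is_large or not_capital)

print(no_parens == and_first)  # the unparenthesized form groups AND first
print(no_parens == or_first)   # grouping OR first gives a different answer
```

For this record the unparenthesized expression agrees with the AND-first grouping and disagrees with the OR-first one, which is why explicit parentheses are worth adding whenever the intent is the latter.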
Filtering Nested Objects
It is possible to filter nested object keys by referencing them with the . accessor.
Nested objects can be at arbitrary depths, so more than one . accessor can be used
in the same identifier.
economy.currency != 'USD' AND geography.coordinates.latitude >= 35.0
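As a mental model, a dotted identifier is a path walked one key at a time through the metadata object. The sketch below reproduces that lookup in plain Python on a trimmed copy of the sample metadata; resolve is a hypothetical helper, not part of the SDK.

```python
from functools import reduce

metadata = {
    "geography": {"coordinates": {"latitude": 41.0082, "longitude": 28.9784}},
    "economy": {"currency": "TRY"},
}


def resolve(meta, path):
    """Follow a dotted path such as 'geography.coordinates.latitude'."""
    return reduce(lambda obj, key: obj[key], path.split("."), meta)


# Local equivalent of:
# economy.currency != 'USD' AND geography.coordinates.latitude >= 35.0
matches = (
    resolve(metadata, "economy.currency") != "USD"
    and resolve(metadata, "geography.coordinates.latitude") >= 35.0
)
print(matches)
```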
Filtering Array Elements
Apart from the CONTAINS and NOT CONTAINS operators, individual array elements can also
be filtered by referencing them with the [] accessor by their indexes.
Indexing is zero based.
economy.major_industries[0] = 'Tourism'
Also, it is possible to index from the back using the # character with negative values.
# can be thought of as the number of elements in the array, so [#-1] references the
last element.
economy.major_industries[#-1] = 'Finance'
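The arithmetic behind the # accessor can be sketched in a few lines. The array_index helper below is hypothetical and only handles the two forms shown in this section: a plain zero-based index and # followed by a negative offset.

```python
def array_index(accessor: str, length: int) -> int:
    """Translate the text inside [] into a zero-based index.

    '#' stands for the array length, so '#-1' is length - 1.
    """
    if accessor.startswith("#"):
        return length + int(accessor[1:])  # int("-1") == -1
    return int(accessor)


industries = ["Tourism", "Textiles", "Finance"]
print(industries[array_index("0", len(industries))])    # first element
print(industries[array_index("#-1", len(industries))])  # last element
```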
Miscellaneous
- Identifiers (the left side of the operators) should be of the form [a-zA-Z_][a-zA-Z_0-9.[\]#-]*. In simpler terms, they should start with a character from the English alphabet or _, and can continue with the same characters plus digits and accessors like ., [0], or [#-1].
- String literals (strings on the right side of the operators) can be either single or double quoted.
- Boolean literals are represented as 1 or 0.
- The operators, boolean operators, and boolean literals are case insensitive.
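The identifier rule can be checked on the client side before a filter is sent. The sketch below compiles the pattern quoted above with Python's re module; the only change is that the opening bracket inside the character class is escaped, which matches the same set of characters. is_valid_identifier is a hypothetical helper name.

```python
import re

# Same pattern as in the first bullet above, with "[" escaped inside the class.
IDENTIFIER = re.compile(r"[a-zA-Z_][a-zA-Z_0-9.\[\]#-]*")


def is_valid_identifier(name: str) -> bool:
    # fullmatch requires the whole string to follow the pattern.
    return IDENTIFIER.fullmatch(name) is not None


for name in ["geography.coordinates.latitude",
             "economy.major_industries[#-1]",
             "1city",
             "city name"]:
    print(name, is_valid_identifier(name))
```

The first two names pass. The last two fail because one starts with a digit and the other contains a space.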