# Algorithm

## Approximate Nearest Neighbor Search

The primary functionality of the vector store is straightforward: identifying the most similar vectors to a given vector. While the concept is simple, translating it into a practical product poses significant challenges.

A simple and basic approach to searching in a vector database is to perform an exhaustive search by comparing a query vector to every other vector stored in the database one by one. However, this consumes too many resources and results in very high latencies, making it not very practical. To address this problem, Approximate Nearest Neighbor (`ANN`

) algorithms are used. `ANN`

search approximates the true nearest neighbor, which means it might not find the absolute closest point, but it will find one that’s close enough, with a **low-latency** and by consuming **fewer resources**.
In the literature, the comparison of the results of `ANNS`

with exhaustive search is called the recall rate.
The higher the recall rate the better the results.

Several `ANNS`

algorithms, such as `HNSW`

[1], `NSG`

[2], and `DiskANN`

[3], are available for use,
each with its distinct characteristics. One of the difficult problems in ANN algorithms is that indexing and querying vectors may require storing the whole data in memory. When the dataset is huge, then memory requirements for indexing may exceed available memory. `DiskANN`

algorithm tries to solve this problem by using disk as the main storage for indexes and for performing queries directly on disk.`DiskANN`

paper acknowledges that, if you try to store your vectors
in disk and use `HNSW`

or `NSG`

, you may end up with again very high latencies. `DiskANN`

is focused
on serving queries from disk with **low-latency** and **good recall rate**.
And this helps Upstash Vector to be **cost-effective**, therefore cheaper compared to alternatives.

Even though `DiskANN`

has its advantages, it also requires more work to be practical.
Main problem is that, you can’t insert/update existing index without reindexing all the vectors.
For this problem, there is another improved paper `FreshDiskANN`

[4]. `FreshDiskANN`

improves `DiskANN`

via introducing
a temporary index for up-to-date data in memory. Queries are served from both the temporary(up-to-date) index
and also from the disk. And these temporary indexes are merged to the disk from time-to-time behind the scene.

Upstash Vector is based on `DiskANN`

and `FreshDiskANN`

with more improvements based on our
tests and observations.

### References

- Malkov, Y. A., Yashunin, D. A. (2016).
*Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs*. CoRR, abs/1603.09320 (2016). [https://arxiv.org/abs/1603.09320] - Fu, C., Xiang, C., Wang, C., Cai, D. (2019).
*Fast Approximate Nearest Neighbor Search with Navigating Spreading-Out Graphs*. Proceedings of the VLDB, 12(5), 461–474. doi: 10.14778/3303753.3303754. [https://www.vldb.org/pvldb/vol12/p461-fu.pdf] - Subramanya, S. J., Devvrit, Kadekodi, R., Krishaswamy, R., Simhadri, H. V. (2019).
*DiskANN: Fast Accurate Billion-Point Nearest Neighbor Search on a Single Node*. In Proceedings of the 33rd International Conference on Neural Information Processing Systems (NeurIPS ‘19), Article No.: 1233, Pages 13766–13776. [https://dl.acm.org/doi/abs/10.5555/3454287.3455520] - Singh, A., Subramanya, S. J., Krishnaswamy, R., Simhadri, H. V. (2021).
*FreshDiskANN: A Fast and Accurate Graph-Based ANN Index for Streaming Similarity Search*. CoRR abs/2105.09613 (2021). [https://arxiv.org/abs/2105.09613]

Was this page helpful?