Why the RedisVectorStore adds a prefix

At a glance

Why the RedisVectorStore adds a prefix to the vectors?

30 comments

https://docs.llamaindex.ai/en/stable/examples/vector_stores/RedisIndexDemo.html#initialize-the-redis-vector-store

Not sure maybe to identify particular index from pool of indexes 🧐

LLORKA

See, my problem is that if the index is a DocumentSummaryIndex.from_documents() that gets saved in the RedisVectorStore and RedisIndexStore the load_index_from_storage() fails due to the added prefixes

LLORKA

Not sure how to solve it

WWhiteFang_Jr

https://docs.llamaindex.ai/en/stable/examples/vector_stores/RedisIndexDemo.html#saving-and-loading
https://docs.llamaindex.ai/en/stable/examples/vector_stores/RedisIndexDemo.html#restore-index-from-redis

This is not working?

LLORKA

In this cases the index is made from the vector store, and is a VectorStoreIndex

LLORKA

Plain Text

import redis

from llama_index import (
    StorageContext,
    DocumentSummaryIndex,
    load_index_from_storage,
    ServiceContext,
)

from llama_index.storage.docstore import RedisDocumentStore
from llama_index.storage.index_store import RedisIndexStore
from llama_index.vector_stores import RedisVectorStore
from llama_index.embeddings import HuggingFaceEmbedding
from llama_index.schema import Document


REDIS_DB_HOST = "localhost"
REDIS_DB_PASSWORD = ""
REDIS_DB_PORT = 6380

redis_client = redis.Redis(
    host=REDIS_DB_HOST, password=REDIS_DB_PASSWORD, port=REDIS_DB_PORT, db=0
)


docstore = RedisDocumentStore.from_redis_client(
    redis_client=redis_client,
    namespace="Fail_doc_store",
)
index_store = RedisIndexStore.from_redis_client(
    redis_client=redis_client,
    namespace="Fail_index_store",
)
vector_store = RedisVectorStore(
    redis_url=f"redis://:{REDIS_DB_PASSWORD}@{REDIS_DB_HOST}:{REDIS_DB_PORT}/0",
    index_name="Fail_vector_store",
    index_prefix="llama",
    overwrite=True,
)
storage_context = StorageContext.from_defaults(
    docstore=docstore, index_store=index_store, vector_store=vector_store
)

service_context = ServiceContext.from_defaults(
    llm=<SOME_LLM_MODEL>,
    embed_model=HuggingFaceEmbedding(
        model_name="intfloat/multilingual-e5-large",
    ),
)

document = Document(text="This is a test document", id="test_id")
index = DocumentSummaryIndex.from_documents(
    documents=[document],
    storage_context=storage_context,
    service_context=service_context,
)
index_id = index.index_id
index2 = load_index_from_storage(
    index_id=index_id, storage_context=storage_context, service_context=service_context
)
query_engine = index2.as_query_engine().query("This is a test query")

LLORKA

In that example the llm of the service context must be change to a custom llm

LLORKA

@WhiteFang_Jr @Logan M I hope this example helps. Should i open a github issue?

LLORKA

Plain Text

Traceback (most recent call last):
  File "/Users/HASHKELL/SHERPAS/Funds/fundssociety-rag/recreable_fail.py", line 56, in <module>
    query_engine = index2.as_query_engine().query("This is a test query")
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/HASHKELL/Library/Caches/pypoetry/virtualenvs/rag-IF6y9IJD-py3.11/lib/python3.11/site-packages/llama_index/core/base_query_engine.py", line 40, in query
    return self._query(str_or_query_bundle)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/HASHKELL/Library/Caches/pypoetry/virtualenvs/rag-IF6y9IJD-py3.11/lib/python3.11/site-packages/llama_index/query_engine/retriever_query_engine.py", line 171, in _query
    nodes = self.retrieve(query_bundle)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/HASHKELL/Library/Caches/pypoetry/virtualenvs/rag-IF6y9IJD-py3.11/lib/python3.11/site-packages/llama_index/query_engine/retriever_query_engine.py", line 127, in retrieve
    nodes = self._retriever.retrieve(query_bundle)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/HASHKELL/Library/Caches/pypoetry/virtualenvs/rag-IF6y9IJD-py3.11/lib/python3.11/site-packages/llama_index/core/base_retriever.py", line 224, in retrieve
    nodes = self._retrieve(query_bundle)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/HASHKELL/Library/Caches/pypoetry/virtualenvs/rag-IF6y9IJD-py3.11/lib/python3.11/site-packages/llama_index/indices/document_summary/retrievers.py", line 176, in _retrieve
    node_ids = self._index_struct.summary_id_to_node_ids[summary_id]

LLORKA

The error

LLogan M

Maybe im confused, it's adding a prefix to the IDs? Or to what?

LLORKA

Its adding a prefix to the IDs

LLORKA

But the same prefix is not added to the node_ids in the docstore

LLogan M

Hmm seems like a bug. It should be removing that prefix when retrieving the nodes then

LLORKA

Well, maybe. The problem is that the prefix is not added to the nodes because they are made form the DocummentSummaryIndex

LLogan M

You might be the first person to use redis with the doc summary index lol

LLORKA

But if the Index is build like an VectorStoreIndex.from_vector_store i think they do get added

LLORKA

Tbh I'm open of changing it, but i think that i can't achieve the same results only using a vector_store manly because there is no documment_summary transformation when doing the nodes.
If something is bad practices let me know

LLogan M

I think in the query() method of the RedisVectorStore, it should remove the prefix from the ids. Otherwise, the ids won't line up with anything else.

It will work in a standalone vector index, since it doesn't have to use the resulting ids anywhere

LLORKA

"It will work in a standalone vector index, since it doesn't have to use the resulting ids anywhere", why?

LLORKA

I can change it and make a pull request but I wasnt sure if it would breake another index

LLogan M

Because in a standalone vector index, once the nodes are retrieved, they are just sent to the LLM, there's no lookup anywhere else like the doc summary index, since we already have the node to give to the llm.

I'm thinking right here, we need to modify node.id_ to remove the prefix?
https://github.com/run-llama/llama_index/blob/dcef41ee67925cccf1ee7bb2dd386bcf0564ba29/llama_index/vector_stores/redis.py#L291

LLogan M

Actually no

LLogan M

It might be here?
https://github.com/run-llama/llama_index/blob/dcef41ee67925cccf1ee7bb2dd386bcf0564ba29/llama_index/vector_stores/redis.py#L302

LLogan M

Unsure without testing lol

LLORKA

Give me a sec to debugg and make sure

LLORKA

Attachment