Updated 3 months ago

redis index selection

hmm... llamaindex appears to be looking at ALL indices in redis and not restricting itself to only the specified index
59 comments
first, I index stuff:
docs_store = RedisVectorStore(
    index_name="openshift-docs",
    redis_url=f"redis://{redis_hostname}:6379",
    overwrite=True,
)

webconsole_documents = SimpleDirectoryReader('openshift-docs', file_metadata=filename_fn).load_data()
webconsole_storage_context = StorageContext.from_defaults(vector_store=docs_store)
webconsole_index = VectorStoreIndex.from_documents(webconsole_documents, storage_context=webconsole_storage_context, service_context=service_context)

usecase_store = RedisVectorStore(
    index_name="summary-docs",
    redis_url=f"redis://{redis_hostname}:6379",
    overwrite=True,
)
summary_documents = SimpleDirectoryReader('summary-docs', file_metadata=filename_fn).load_data()
summary_storage_context = StorageContext.from_defaults(vector_store=usecase_store)
summary_index = VectorStoreIndex.from_documents(summary_documents, storage_context=summary_storage_context, service_context=service_context)
then I specifically query an index:

vector_store = RedisVectorStore(
    index_name="summary-docs",
    redis_url=f"redis://{redis_hostname}:6379",
    overwrite=False,
)

storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_vector_store(vector_store, service_context=service_context)

query = "what are the steps for configuring cluster autoscaling?"
response = index.as_query_engine(verbose=True, streaming=False).query(query)
referenced_documents = "\n\nReferenced documents:\n"
for source_node in response.source_nodes:
    referenced_documents += source_node.node.metadata['file_name'] + '\n'

print()
print(referenced_documents)
But the context passed to the LLM includes data from the other index (BOTH openshift-docs and summary-docs)
the index query engine I'm creating is clearly for a vectorstore with a specified index of summary-docs
but it definitely pulls an answer from openshift-docs
now maybe my expectation for how this works is incorrect
so, happy to be told "that's not how it works" 🙂
Are you familiar with the redis python package?

I see this line when adding nodes to the vector store: self._redis_client.hset(key, mapping=mapping) -- but shouldn't this include some mention of the index name?

We use the index name when querying

results = self._redis_client.ft(self._index_name).search(
    redis_query, query_params=query_params  # type: ignore
)
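For context on the two snippets above: RediSearch does not attach a hash to an index at write time. An index is created with a list of key prefixes, and any hash whose key starts with one of those prefixes is picked up automatically. The following is a toy, pure-Python model of that behaviour, not the actual Redis or LlamaIndex code. `ToyRedis`, the key names, the file names, and the idea that both stores share a `llama_index` prefix are assumptions made for illustration.

```python
class ToyRedis:
    """Toy stand-in for Redis + RediSearch: indexes select hashes by key prefix."""

    def __init__(self):
        self.hashes = {}   # key -> field mapping, like HSET
        self.indexes = {}  # index name -> tuple of key prefixes, like FT.CREATE ... PREFIX

    def create_index(self, name, prefixes):
        self.indexes[name] = tuple(prefixes)

    def hset(self, key, mapping):
        # No index name here, only a key (same shape as the hset call quoted above).
        self.hashes[key] = dict(mapping)

    def search(self, index_name):
        # An index "contains" every hash whose key starts with one of its prefixes.
        prefixes = self.indexes[index_name]
        return sorted(key for key in self.hashes if key.startswith(prefixes))


# Case 1: both indexes share one prefix (assumed here to be "llama_index").
shared = ToyRedis()
shared.create_index("openshift-docs", ["llama_index"])
shared.create_index("summary-docs", ["llama_index"])
shared.hset("llama_index/vector_a", {"file_name": "openshift-example.adoc"})
shared.hset("llama_index/vector_b", {"file_name": "summary-example.md"})
shared_hits = shared.search("summary-docs")  # both keys match the shared prefix

# Case 2: one prefix per index keeps the two sets of hashes apart.
split = ToyRedis()
split.create_index("openshift-docs", ["openshift"])
split.create_index("summary-docs", ["summary"])
split.hset("openshift/vector_a", {"file_name": "openshift-example.adoc"})
split.hset("summary/vector_b", {"file_name": "summary-example.md"})
split_hits = split.search("summary-docs")  # only the key under "summary"

print(shared_hits)
print(split_hits)
```

In this model the index name only decides which prefix list is consulted at query time, so two indexes with the same prefix return the same hashes no matter which store wrote them.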
you can assume that I am unfamiliar with python, let alone specific packages 😆
hahaha fair enough
I'll read some docs
thanks for pointing this out
glad to have broken something YET AGAIN
at least I'm consistent.
this is related to the other thread because i'm trying to keep separate indexes of documentation in varying degrees of completeness
ha yea. I guess the redis vector store must not be used often for this to be missed 😅
funny because redis is so lightweight and easy
it clearly knows about my indices:
redis-cli FT._LIST
summary-docs
openshift-docs
so the indices exist
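Knowing that the indices exist does not say which keys each one covers. `FT.INFO <index>` reports an index definition that includes its key prefixes, so comparing prefixes across indices is one way to spot overlap. Below is a minimal sketch of that comparison. The helper name `index_prefixes` is hypothetical, and the sample lists are hand-written to resemble a flat name/value reply; they were not captured from a real server, and the exact reply shape is an assumption.

```python
def index_prefixes(index_definition):
    """Return the key prefixes from a flat [name, value, name, value, ...] list."""

    def text(value):
        # Replies may arrive as bytes or str depending on client settings.
        return value.decode() if isinstance(value, bytes) else str(value)

    names = index_definition[0::2]
    values = index_definition[1::2]
    for name, value in zip(names, values):
        if text(name) == "prefixes":
            return [text(prefix) for prefix in value]
    return []


# Hand-written sample definitions (assumed shape, illustrative values).
sample_summary = [b"key_type", b"HASH", b"prefixes", [b"llama_index"], b"default_score", b"1"]
sample_openshift = [b"key_type", b"HASH", b"prefixes", [b"llama_index"], b"default_score", b"1"]

# Any prefix present in both definitions means both indices see the same hashes.
overlap = sorted(set(index_prefixes(sample_summary)) & set(index_prefixes(sample_openshift)))
print(overlap)
```

A non-empty overlap would be consistent with both indices returning every document, whichever index name was used when loading.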
ok, interesting
the summary-docs index has everything
so it's not that retrieval is getting things from multiple indexes
Is it because you have overwrite=True for both?
it's that indexing is putting things in the "wrong" place
but why would overwrite=True matter here?
you should be specifying which index you put things into
I will try it without
Yea just looking at your code trying to figure out a reason haha
i mean you should be able to create a reproducer pretty easily
just index 2 documents separately
If summary-docs has everything, does openshift-docs have nothing?
Yea just need to spin up redis
about to check
it looks like both indexes have everything
Yea will try to reproduce
Off the top of my head, I know a unique prefix will probably help separate the documents -- the index name kwarg might be misleading here
ok, let me try inverting
i will try index_prefix instead of index_name
that doesn't appear to have worked either
it may have somehow gotten worse
i specified index_name=llama, index_prefix=x
and then when i searched specifically in index_prefix=y i only got content from index_prefix=x
so something is VERY weird
so, not sure you want to go on this wild goose chase or not. it's entirely possible I simply don't understand how indices should work and am thinking about this wrongly.
Well, this should be fixed somehow -- what good is redis if you can't separate things into namespaces
Ok here's my solution
When I query blue index, it only returns the blue dog
🔵 🐶
what does your query code look like?
blue_index.as_query_engine().query("What color is the dog?")
it correctly responds with blue, and there is only a single source node (the blue dog node)
Same story with the other index
Initially when I replicated the issue, it said "The dog is both blue and red" and fetched both nodes lol
ok, interesting
probably worth some kind of docs update