first, I index stuff:
docs_store = RedisVectorStore(
    index_name="openshift-docs",
    redis_url=f"redis://{redis_hostname}:6379",
    overwrite=True,
)
webconsole_documents = SimpleDirectoryReader('openshift-docs', file_metadata=filename_fn).load_data()
webconsole_storage_context = StorageContext.from_defaults(vector_store=docs_store)
webconsole_index = VectorStoreIndex.from_documents(webconsole_documents, storage_context=webconsole_storage_context, service_context=service_context)
usecase_store = RedisVectorStore(
    index_name="summary-docs",
    redis_url=f"redis://{redis_hostname}:6379",
    overwrite=True,
)
summary_documents = SimpleDirectoryReader('summary-docs', file_metadata=filename_fn).load_data()
summary_storage_context = StorageContext.from_defaults(vector_store=usecase_store)
summary_index = VectorStoreIndex.from_documents(summary_documents, storage_context=summary_storage_context, service_context=service_context)
then I specifically query an index:
vector_store = RedisVectorStore(
    index_name="summary-docs",
    redis_url=f"redis://{redis_hostname}:6379",
    overwrite=False,
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_vector_store(vector_store, service_context=service_context)
query = "what are the steps for configuring cluster autoscaling?"
response = index.as_query_engine(verbose=True, streaming=False).query(query)
referenced_documents = "\n\nReferenced documents:\n"
for source_node in response.source_nodes:
    referenced_documents += source_node.node.metadata['file_name'] + '\n'
print()
print(referenced_documents)
But the context passed to the LLM includes data from the other index (BOTH openshift-docs and summary-docs)
the index query engine I'm creating is clearly for a vectorstore with a specified index of summary-docs
but it definitely pulls an answer from openshift-docs
now maybe my expectation for how this works is incorrect
so, happy to be told "that's not how it works"
Are you familiar with the redis python package?
I see this line when adding nodes to the vector store:
self._redis_client.hset(key, mapping=mapping)
-- but shouldn't this include some mention of the index name?
We use the index name when querying
results = self._redis_client.ft(self._index_name).search(
    redis_query, query_params=query_params  # type: ignore
)
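A toy sketch of why the index name never shows up in the hset call, assuming RediSearch behaves as documented: an index is created with one or more key prefixes, and any hash whose key starts with one of those prefixes is indexed automatically. The ToyRedis class and demo function below are hypothetical and only mimic that rule in plain Python, and the "llama_index/vector" prefix is an assumption about the store's default, not something confirmed in this thread.

```python
# Toy model, NOT the real redis client: it only imitates prefix-based indexing.
from dataclasses import dataclass, field


@dataclass
class ToyRedis:
    hashes: dict = field(default_factory=dict)   # key -> stored mapping
    indexes: dict = field(default_factory=dict)  # index name -> key prefix

    def create_index(self, name: str, prefix: str) -> None:
        # An index remembers which key prefix it covers.
        self.indexes[name] = prefix

    def hset(self, key: str, mapping: dict) -> None:
        # No index name here, just like the HSET call quoted above.
        self.hashes[key] = mapping

    def search(self, index_name: str) -> list:
        # Membership is decided purely by the key prefix.
        prefix = self.indexes[index_name]
        return sorted(k for k in self.hashes if k.startswith(prefix))


def demo(prefix_a: str, prefix_b: str) -> list:
    r = ToyRedis()
    r.create_index("openshift-docs", prefix_a)
    r.create_index("summary-docs", prefix_b)
    r.hset(f"{prefix_a}_1", {"text": "openshift page"})
    r.hset(f"{prefix_b}_2", {"text": "summary page"})
    return r.search("summary-docs")


# Same prefix for both indexes: summary-docs "sees" both documents.
print(demo("llama_index/vector", "llama_index/vector"))
# Distinct prefixes: summary-docs only sees its own document.
print(demo("openshift/vector", "summary/vector"))
```

If both stores write keys under the same prefix, each index would pick up every document, no matter which index name was passed when the store was created.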
you can assume that I am unfamiliar with python, let alone specific packages
thanks for pointing this out
glad to have broken something YET AGAIN
this is related to the other thread because i'm trying to keep separate indexes of documentation in varying degrees of completeness
ha yea. I guess the redis vector store must not be used often for this to be missed
funny because redis is so lightweight and easy
it clearly knows about my indices:
redis-cli FT._LIST
summary-docs
openshift-docs
the summary-docs index has everything
so it's not that the retrieve is getting things from multiple indexes
Is it because you have overwrite=True for both?
it's that indexing is putting things in the "wrong" place
but why would overwrite=True matter here?
you should be specifying which index you put things into
Yea just looking at your code trying to figure out a reason haha
i mean you should be able to create a reproducer pretty easily
just index 2 documents separately
If summary-docs has everything, does openshift-docs have nothing?
Yea just need to spin up redis
it looks like both indexes have everything
Yea will try to reproduce
Off the top of my head, I know a unique prefix will probably help separate the documents -- the index name kwarg might be misleading here
i will try index_prefix instead of index_name
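A minimal sketch of the unique-prefix idea, assuming the RedisVectorStore constructor accepts index_prefix alongside index_name (so the prefix is passed in addition to the name, not instead of it). The store_kwargs helper and the "name plus colon" prefix scheme are hypothetical, made up here for illustration.

```python
def store_kwargs(index_name: str, redis_hostname: str, overwrite: bool = False) -> dict:
    """Build constructor arguments that give each index its own key prefix."""
    return {
        "index_name": index_name,
        # Hypothetical naming scheme: derive the key prefix from the index name
        # so two indexes never share keys (as long as names contain no ":").
        "index_prefix": f"{index_name}:",
        "redis_url": f"redis://{redis_hostname}:6379",
        "overwrite": overwrite,
    }


# Intended usage (needs llama_index and a running Redis, so left commented out):
# docs_store = RedisVectorStore(**store_kwargs("openshift-docs", redis_hostname, overwrite=True))
# usecase_store = RedisVectorStore(**store_kwargs("summary-docs", redis_hostname, overwrite=True))
print(store_kwargs("summary-docs", "localhost"))
```

Building the arguments in one place means the indexing code and the query code cannot drift apart on which prefix belongs to which index.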
that doesn't appear to have worked either
it may have somehow gotten worse
i specified index_name=llama, index_prefix=x
and then when i searched specifically in index_prefix=y
i only got content from index_prefix=x
so something is VERY weird
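One possible explanation, assuming index_name stayed "llama" for both the indexing run and the query: the search call quoted earlier only passes the index name, so a prefix given at query time would not change what an already existing index covers. The ToySearch class below is a hypothetical plain-Python model of that behaviour, not the real store.

```python
class ToySearch:
    """Toy model: an index keeps the prefix it was first created with."""

    def __init__(self):
        self.prefix_by_index = {}
        self.keys = set()

    def ensure_index(self, name, prefix, overwrite=False):
        # Mimics "create the index only if it does not exist yet".
        if overwrite or name not in self.prefix_by_index:
            self.prefix_by_index[name] = prefix

    def add(self, key):
        self.keys.add(key)

    def search(self, name):
        # Only the index name is used; the caller's prefix plays no role.
        prefix = self.prefix_by_index[name]
        return sorted(k for k in self.keys if k.startswith(prefix))


def reuse_index_name():
    toy = ToySearch()
    toy.ensure_index("llama", prefix="x")  # indexing run
    toy.add("x/doc_1")
    toy.ensure_index("llama", prefix="y")  # query run: index exists, no change
    return toy.search("llama")


print(reuse_index_name())
```

Under that assumption, getting only index_prefix=x content back while asking for index_prefix=y is expected rather than weird: the index named "llama" still covers the x keys.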
so, not sure you want to go on this wild goose chase or not. it's entirely possible I simply don't understand how indices should work and am thinking about this wrongly.
Well, this should be fixed somehow -- what good is redis if you can't separate things into namespaces
When I query blue index, it only returns the blue dog
what does your query code look like?
blue_index.as_query_engine().query("What color is the dog?")
it correctly responds with blue, and there is only a single source node (the blue dog node)
Same story with the other index
Initially when I replicated the issue, it said "The dog is both blue and red"
and fetched both nodes lol
probably worth some kind of docs update