redis index selection

hmm... llamaindex appears to be looking at ALL indices in redis and not restricting itself to only the specified index

first, I index stuff:

docs_store = RedisVectorStore(
    index_name="openshift-docs",
    redis_url=f"redis://{redis_hostname}:6379",
    overwrite=True,
)

webconsole_documents = SimpleDirectoryReader('openshift-docs', file_metadata=filename_fn).load_data()
webconsole_storage_context = StorageContext.from_defaults(vector_store=docs_store)
webconsole_index = VectorStoreIndex.from_documents(webconsole_documents, storage_context=webconsole_storage_context, service_context=service_context)

usecase_store = RedisVectorStore(
    index_name="summary-docs",
    redis_url=f"redis://{redis_hostname}:6379",
    overwrite=True,
)
summary_documents = SimpleDirectoryReader('summary-docs', file_metadata=filename_fn).load_data()
summary_storage_context = StorageContext.from_defaults(vector_store=usecase_store)
summary_index = VectorStoreIndex.from_documents(summary_documents, storage_context=summary_storage_context, service_context=service_context)

thoraxe

then I specifically query an index:

vector_store = RedisVectorStore(
    index_name="summary-docs",
    redis_url=f"redis://{redis_hostname}:6379",
    overwrite=False,
)

storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_vector_store(vector_store, service_context=service_context)

query = "what are the steps for configuring cluster autoscaling?"
response = index.as_query_engine(verbose=True, streaming=False).query(query)
referenced_documents = "\n\nReferenced documents:\n"
for source_node in response.source_nodes:
    referenced_documents += source_node.node.metadata['file_name'] + '\n'

print()
print(referenced_documents)

thoraxe

But the context passed to the LLM includes data from the other index (BOTH openshift-docs and summary-docs)

thoraxe

the index query engine I'm creating is clearly for a vectorstore with a specified index of summary-docs

thoraxe

but it definitely pulls an answer from openshift-docs

thoraxe

now maybe my expectation for how this works is incorrect

thoraxe

so, happy to be told "that's not how it works" 🙂

Logan M

Are you familiar with the redis python package?

I see this line when adding nodes to the vector store: self._redis_client.hset(key, mapping=mapping) -- but shouldn't this include some mention of the index name?

We use the index name when querying

results = self._redis_client.ft(self._index_name).search(
    redis_query, query_params=query_params  # type: ignore
)
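
On the hset question: as far as RediSearch itself goes, a hash write never names an index. An index is declared over key prefixes when it is created, and every hash whose key starts with a declared prefix shows up in every index that declares it. The sketch below is a toy in-memory model of that rule, not Redis or LlamaIndex code; ToySearch, its methods, and the doc_ prefix are made up for illustration.

```python
class ToySearch:
    """Toy in-memory model of prefix-based index membership (not real Redis)."""

    def __init__(self):
        self.hashes = {}   # key -> field mapping, standing in for HSET storage
        self.indexes = {}  # index name -> tuple of key prefixes

    def create_index(self, name, prefixes):
        # Stands in for the idea behind FT.CREATE <name> ON HASH PREFIX ...
        self.indexes[name] = tuple(prefixes)

    def hset(self, key, mapping):
        # No index name here: the write only knows its own key.
        self.hashes[key] = dict(mapping)

    def search(self, name):
        # An index "contains" every hash whose key starts with one of its prefixes.
        prefixes = self.indexes[name]
        return sorted(k for k in self.hashes if k.startswith(prefixes))


db = ToySearch()
# Two indexes declared over the same hypothetical prefix.
db.create_index("openshift-docs", ["doc_"])
db.create_index("summary-docs", ["doc_"])
db.hset("doc_1", {"text": "written while building openshift-docs"})
db.hset("doc_2", {"text": "written while building summary-docs"})

shared = db.search("summary-docs")
print(shared)
```

Under that rule, two indexes sharing a prefix both return every document, whichever index name is queried. Whether that is what happens in this thread is not confirmed at this point.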

Logan M

seems sus

thoraxe

"Are you familiar with the redis python package?"

you can assume that I am unfamiliar with python, let alone specific packages 😆

Logan M

hahaha fair enough

Logan M

I'll read some docs

Logan M

thanks for pointing this out

thoraxe

glad to have broken something YET AGAIN

thoraxe

at least I'm consistent.

thoraxe

this is related to the other thread because i'm trying to keep separate indexes of documentation in varying degrees of completeness

Logan M

ha yea. I guess the redis vector store must not be used often for this to be missed 😅

thoraxe

funny because redis is so lightweight and easy

thoraxe

https://stackoverflow.com/a/76868207
this looks like what you're doing

thoraxe

it clearly knows about my indices:

redis-cli FT._LIST
summary-docs
openshift-docs

thoraxe

so the indices exist

thoraxe

ok, interesting

thoraxe

the summary-docs index has everything

thoraxe

so it's not that the retrieve is getting things from multiple indexes

Logan M

Is it because you have overwrite=True for both?

thoraxe

it's that indexing is putting things in the "wrong" place

thoraxe

but why would overwrite=True matter here?

thoraxe

you should be specifying which index you put things into

thoraxe

I will try it without

Logan M

Yea just looking at your code trying to figure out a reason haha

thoraxe

i mean you should be able to create a reproducer pretty easily

thoraxe

just index 2 documents separately

Logan M

If summary-docs has everything, does openshift-docs have nothing?

Logan M

Yea just need to spin up redis

thoraxe

about to check

thoraxe

it looks like both indexes have everything
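
A minimal sketch of running that check from Python instead of redis-cli. It assumes a redis-py client with the search commands available, and that FT.INFO reports a num_docs field; doc_counts is a hypothetical helper, not part of LlamaIndex.

```python
def doc_counts(client, index_names):
    """Return {index_name: number of indexed documents} as reported by FT.INFO."""
    counts = {}
    for name in index_names:
        # client.ft(name) selects the search index; info() issues FT.INFO for it.
        info = client.ft(name).info()
        # Assumption: the reply exposes a "num_docs" entry (str or int).
        counts[name] = int(info["num_docs"])
    return counts


# Intended use (needs the redis package and a running server, so left as a comment):
# import redis
# client = redis.Redis(host=redis_hostname, port=6379)
# print(doc_counts(client, ["summary-docs", "openshift-docs"]))
```

If both names report the same count, equal to the total number of stored nodes, that is consistent with both indexes covering the same keys.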

Logan M

Yea will try to reproduce

Logan M

Off the top of my head, I know a unique prefix will probably help separate the documents -- the index name kwarg might be misleading here
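
A sketch of the unique-prefix idea, assuming RedisVectorStore accepts the index_prefix keyword that comes up later in this thread. store_kwargs and the name-plus-colon prefix scheme are hypothetical, and whether this alone fixes the behavior reported here is not confirmed.

```python
def store_kwargs(index_name, redis_hostname, overwrite=False):
    """Build keyword arguments for one store, with a key prefix unique to its index."""
    return {
        "index_name": index_name,
        # Hypothetical scheme: derive the key prefix from the index name so that
        # no two indexes cover the same keys.
        "index_prefix": f"{index_name}:",
        "redis_url": f"redis://{redis_hostname}:6379",
        "overwrite": overwrite,
    }


docs_kwargs = store_kwargs("openshift-docs", "localhost", overwrite=True)
summary_kwargs = store_kwargs("summary-docs", "localhost", overwrite=True)

# Intended use (needs llama_index and a running Redis, so left as a comment):
# docs_store = RedisVectorStore(**docs_kwargs)
# usecase_store = RedisVectorStore(**summary_kwargs)
```

Keeping the name and prefix paired in one place makes it harder to query an index name that was created with a different prefix.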

thoraxe

ok, let me try inverting

thoraxe

i will try index_prefix instead of index_name

thoraxe

that doesn't appear to have worked either

thoraxe

it may have somehow gotten worse

thoraxe

i specified index_name=llama, index_prefix=x

thoraxe

and then when i searched specifically in index_prefix=y i only got content from index_prefix=x

thoraxe

so something is VERY weird

thoraxe

so, not sure you want to go on this wild goose chase or not. it's entirely possible I simply don't understand how indices should work and am thinking about this wrongly.

Logan M

:PepeHands:

Logan M

Well, this should be fixed somehow -- what good is redis if you can't separate things into namespaces

Logan M

Ok here's my solution

Logan M

Attachment