Weaviate 0.6.0

Maybe I should rephrase. I'm not even sure if this is a bug right now πŸ˜… I'm using the WeaviateVectorStore, from which I create a StorageContext. Then I create the GPTVectorStoreIndex from some default txt documents and attempt to query it. I assumed that using this VectorStore, the created nodes with their embeddings would be stored in Weaviate. But when I check my objects in Weaviate, the nodes are stored, yet vectorWeights remains null. Do these need to be re-embedded every time at query time? And if so, why?
Hmm they definitely should have been stored in weaviate!
You said you were using a custom embedding model though?
yeah, this is the code
Plain Text
if __name__ == "__main__":
    client = weaviate.Client("http://localhost:8080")
    print("loading model...")
    model_name = "sentence-transformers/all-MiniLM-L6-v2"
    embed_model = LangchainEmbedding(HuggingFaceEmbeddings(model_name=model_name))
    service_context = ServiceContext.from_defaults(embed_model=embed_model)
    print("loading model done...")
    print("loading docs...")
    documents = SimpleDirectoryReader('./data').load_data()
    print("loading docs done...")
    print("Loading Vector store and context")
    vector_store = WeaviateVectorStore(weaviate_client=client)
    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    index = GPTVectorStoreIndex.from_documents(
        documents,
        service_context=service_context,
        storage_context=storage_context
    )
    print("Loading Vector store and context done")
    # Query index
    query_engine = index.as_query_engine(verbose=True,)
    response = query_engine.query("Tell me a fact about Nieuwdorp")
    print(response)
Here is the object stored in the DB
Plain Text
{
  "class": "Gpt_Index_6525409999093927214_Node",
  "creationTimeUnix": 1682951147197,
  "id": "31400dfe-589b-4485-94cf-79a8c90a9dfe",
  "lastUpdateTimeUnix": 1682951147197,
  "properties": {
    "doc_hash": "90b2328a8faa9559657c99708c8ae79e3d466a62bdf087453853f42d137ba951",
    "doc_id": "edf995e0-8e0a-4e0f-9b02-1823f14b7e57",
    "extra_info": "",
    "node_info": "{\"start\": 0, \"end\": 1179}",
    "ref_doc_id": "af1ae55f-8329-4a90-854b-ae1d0fd3dc1a",
    "relationships": "{\"1\": \"af1ae55f-8329-4a90-854b-ae1d0fd3dc1a\"}",
    "text": "Nieuwdorp is a village in the Dutch province of Zeeland. It is a part of the municipality of Borsele, and lies about 9 km east of Middelburg.\n\n\n== History ==\nThe village was first mentioned around 1750 as Het Nieuwe Dorp, and means \"new village\". Nieuw (new) has been added to distinguish from 's-Heer Arendskerke which was colloquially called Oudedorp. Nieuwdorp is a dike village which appeared after the West-Kraayertpolder was poldered in 1642 and Nieuwe Kraayertpolder was added in 1675. The village is centred around a large square.The Reformed Church was built first in 1841 and is an aisleless church in neoclassic style. It was converted into an apartment building in 2003. The Dutch Reformed church was built between 1917 and 1918 and has expressionist elements. The tower contains a bell from 1710.Nieuwdorp was home to 388 people in 1840. Nieuwdorp used to be part of the municipality of 's-Heer Arendskerke. In 1970, it became part of the municipality of Borsele. In 2009, the Liberation Museum Zeeland opened and provides an overview of World War II in the province of Zeeland with an emphasis on the Battle of the Scheldt.\n\n\n== Gallery ==\n\n\t\t\n\t\t\n\n\n== References =="
  },
  "vectorWeights": null
}
vectorWeights remain null
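(Side note: the object dump above doesn't show the embedding because Weaviate doesn't return vectors by default. A quick way to ask for it explicitly, sketched below with the weaviate-client v3 Python API and the UUID from the object above, is the with_vector flag.)
Plain Text
# Sketch (weaviate-client v3): fetch the same object, but ask Weaviate to
# include its stored vector in the response.
import weaviate

client = weaviate.Client("http://localhost:8080")
obj = client.data_object.get_by_id(
    "31400dfe-589b-4485-94cf-79a8c90a9dfe",  # UUID from the object above
    with_vector=True,
)
# If the embedding was stored, the dict now contains a "vector" key.
print(obj.get("vector", [])[:5])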
Hmmm. If it's not a huge number of documents, I wonder if it works with the default embed model from OpenAI?
Your code looks right to me though
So the embeddings should be stored in Weaviate, right?
99% sure yes haha
Since that's what it's good at πŸ‘
During query-time the embeddings are present though, so I reckon it just forgets to pass along the custom_vector param when inserting them into weaviate
yeah I was surprised already hahaha
vectordb for storing chunks of text seemed a bit unconventional
Haha yea, the text storage is just a bonus πŸ˜… I just wonder if it has something to do with using a custom embed model, or where the issue is exactly?
attempting with no service_context provided rn
vectorWeights are still null
hmmm that might be a bug then
Did you import the weaviate client from llama index?
This is where mainly all the weaviate stuff happens in the codebase btw

https://github.com/jerryjliu/llama_index/tree/main/gpt_index/readers/weaviate
uhhhhhhhhhhhhhhhhhh we're importing it from weaviate rn
I didn't know there was a gptindex weaviate Client class, let me try that out real quick
I'm not quite sure what you mean actually? I cannot find a custom weaviate client from llama_index that we can actually use
Jk I lied haha
I thought the file that said client was a custom client lol
@jerryjliu0 possible weaviate bug here? No embeddings appear to be inserted, unless there's somewhere else to look
The client.py does seem to have the functionality to save the vector, as line 221 shows:
Plain Text
    client.batch.add_data_object(node_dict, class_name, node_id, vector)
But the vector never seems to get set before that
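(For context, here is a minimal sketch of what an explicit-vector insert looks like at the Weaviate level with the v3 batch API; the class name, text, and vector below are made up for illustration. If vector is left as None, Weaviate only stores whatever its configured vectorizer produces, or no vector at all when the class uses vectorizer "none".)
Plain Text
# Minimal sketch (weaviate-client v3): insert an object together with an
# embedding computed by your own model.
import weaviate

client = weaviate.Client("http://localhost:8080")
with client.batch as batch:
    batch.add_data_object(
        data_object={"text": "Nieuwdorp is a village in Zeeland."},
        class_name="ExampleNode",        # hypothetical class name
        vector=[0.01, -0.02, 0.03],      # embedding supplied by the caller
    )
# The batch is flushed to Weaviate when the context manager exits.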
Thanks for raising this, let me take a look at what's going on
hey, AFAIK the embedding should be there, especially if you can query it without problems
vectorWeights is about the vector weights (a separate field used to weight keywords during vectorization), not about the vectors themselves
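(In other words, the embedding lives in the object's vector, which has to be requested explicitly. A quick GraphQL check, sketched with the v3 client against the auto-generated class from the dump above:)
Plain Text
# Sketch: confirm the stored embeddings via GraphQL by requesting the
# _additional.vector field for one node of the auto-generated class.
import weaviate

client = weaviate.Client("http://localhost:8080")
result = (
    client.query
    .get("Gpt_Index_6525409999093927214_Node", ["doc_id"])
    .with_additional(["vector"])
    .with_limit(1)
    .do()
)
node = result["data"]["Get"]["Gpt_Index_6525409999093927214_Node"][0]
print(node["_additional"]["vector"][:5])  # first few dimensions of the embedding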
@Logan M I added this test while trying to debug, maybe you could help review
Thanks Simon, that makes more sense haha

I'll take a look!
Thanks for the explanation, makes sense now that you actually mention it! I must've overlooked the embeddings somewhere in my weaviate object store then. :)