Weaviate 0.6.0

Maybe I should rephrase. I'm not even sure if this is a bug right now πŸ˜… I'm using the WeaviateVectorStore, from which I create a StorageContext. Then I create the GPTVectorStoreIndex from some default txt documents and attempt to query it. I assumed that using this VectorStore, the created nodes with their embeddings would be stored in Weaviate. But when I check my objects in Weaviate, the nodes are stored, yet vectorWeights remains null. Do these need to be re-embedded every time at query time? And if so, why?
Hmm they definitely should have been stored in weaviate!
You said you were using a custom embedding model though?
yeah, this is the code
Plain Text
if __name__ == "__main__":
    client = weaviate.Client("http://localhost:8080")
    print("loading model...")
    model_name = "sentence-transformers/all-MiniLM-L6-v2"
    embed_model = LangchainEmbedding(HuggingFaceEmbeddings(model_name=model_name))
    service_context = ServiceContext.from_defaults(embed_model=embed_model)
    print("loading model done...")
    print("loading docs...")
    documents = SimpleDirectoryReader('./data').load_data()
    print("loading docs done...")
    print("Loading Vector store and context")
    vector_store = WeaviateVectorStore(weaviate_client=client)
    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    index = GPTVectorStoreIndex.from_documents(
        documents,
        service_context=service_context,
        storage_context=storage_context
    )
    print("Loading Vector store and context done")
    # Query index
    query_engine = index.as_query_engine(verbose=True,)
    response = query_engine.query("Tell me a fact about Nieuwdorp")
    print(response)
Here is the object stored in the DB
Plain Text
{
  "class": "Gpt_Index_6525409999093927214_Node",
  "creationTimeUnix": 1682951147197,
  "id": "31400dfe-589b-4485-94cf-79a8c90a9dfe",
  "lastUpdateTimeUnix": 1682951147197,
  "properties": {
    "doc_hash": "90b2328a8faa9559657c99708c8ae79e3d466a62bdf087453853f42d137ba951",
    "doc_id": "edf995e0-8e0a-4e0f-9b02-1823f14b7e57",
    "extra_info": "",
    "node_info": "{\"start\": 0, \"end\": 1179}",
    "ref_doc_id": "af1ae55f-8329-4a90-854b-ae1d0fd3dc1a",
    "relationships": "{\"1\": \"af1ae55f-8329-4a90-854b-ae1d0fd3dc1a\"}",
    "text": "Nieuwdorp is a village in the Dutch province of Zeeland. It is a part of the municipality of Borsele, and lies about 9 km east of Middelburg.\n\n\n== History ==\nThe village was first mentioned around 1750 as Het Nieuwe Dorp, and means \"new village\". Nieuw (new) has been added to distinguish from 's-Heer Arendskerke which was colloquially called Oudedorp. Nieuwdorp is a dike village which appeared after the West-Kraayertpolder was poldered in 1642 and Nieuwe Kraayertpolder was added in 1675. The village is centred around a large square.The Reformed Church was built first in 1841 and is an aisleless church in neoclassic style. It was converted into an apartment building in 2003. The Dutch Reformed church was built between 1917 and 1918 and has expressionist elements. The tower contains a bell from 1710.Nieuwdorp was home to 388 people in 1840. Nieuwdorp used to be part of the municipality of 's-Heer Arendskerke. In 1970, it became part of the municipality of Borsele. In 2009, the Liberation Museum Zeeland opened and provides an overview of World War II in the province of Zeeland with an emphasis on the Battle of the Scheldt.\n\n\n== Gallery ==\n\n\t\t\n\t\t\n\n\n== References =="
  },
  "vectorWeights": null
}
vectorWeights remain null
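(Side note: the object dump above doesn't show the embedding because Weaviate doesn't return vectors by default. A quick way to ask for it explicitly, sketched below with the weaviate-client v3 Python API and the UUID from the object above, is the with_vector flag.)
Plain Text
# Sketch (weaviate-client v3): fetch the same object, but ask Weaviate to
# include its stored vector in the response.
import weaviate

client = weaviate.Client("http://localhost:8080")
obj = client.data_object.get_by_id(
    "31400dfe-589b-4485-94cf-79a8c90a9dfe",  # UUID from the object above
    with_vector=True,
)
# If the embedding was stored, the dict now contains a "vector" key.
print(obj.get("vector", [])[:5])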
Hmmm. If it's not a huge number of documents, I wonder if it works with the default embed model from OpenAI?
Your code looks right to me though
So the embeddings should be stored in Weaviate, right?
99% sure yes haha
Since that's what it's good at πŸ‘
During query-time the embeddings are present though, so I reckon it just forgets to pass along the custom_vector param when inserting them into weaviate
yeah I was surprised already hahaha
vectordb for storing chunks of text seemed a bit unconventional
Haha yea, the text storage is just a bonus πŸ˜… I just wonder if it has something to do with using a custom embed model, or where the issue is exactly?
attempting with no service_context provided rn
vectorWeights are still null
hmmm that might be a bug then
Did you import the weaviate client from llama index?
This is where mainly all the weaviate stuff happens in the codebase btw

https://github.com/jerryjliu/llama_index/tree/main/gpt_index/readers/weaviate
uhhhhhhhhhhhhhhhhhh we're importing it from weaviate rn
I didn't know there was a gptindex weaviate Client class, let me try that out real quick
I'm not quite sure what you mean actually? I cannot find a custom weaviate client from llama_index that we can actually use
Jk I lied haha
I thought the file that said client was a custom client lol
@jerryjliu0 possible weaviate bug here? No embeddings appear to be inserted, unless there's somewhere else to look
The client.py does seem to have the functionality to save the vector, as line 221 shows:
Plain Text
    client.batch.add_data_object(node_dict, class_name, node_id, vector)
But the vector never seems to get set before that
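(For context, here is a minimal sketch of what an explicit-vector insert looks like at the Weaviate level with the v3 batch API; the class name, text, and vector below are made up for illustration. If vector is left as None, Weaviate only stores whatever its configured vectorizer produces, or no vector at all when the class uses vectorizer "none".)
Plain Text
# Minimal sketch (weaviate-client v3): insert an object together with an
# embedding computed by your own model.
import weaviate

client = weaviate.Client("http://localhost:8080")
with client.batch as batch:
    batch.add_data_object(
        data_object={"text": "Nieuwdorp is a village in Zeeland."},
        class_name="ExampleNode",        # hypothetical class name
        vector=[0.01, -0.02, 0.03],      # embedding supplied by the caller
    )
# The batch is flushed to Weaviate when the context manager exits.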
Thanks for raising this, let me take a look at what's going on
hey, AFAIK the embedding should be there, especially if you can query it without problems
vectorWeights is about the vector weights (a separate field used to weight keywords during vectorization), not about the vectors themselves
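(In other words, the embedding lives in the object's vector, which has to be requested explicitly. A quick GraphQL check, sketched with the v3 client against the auto-generated class from the dump above:)
Plain Text
# Sketch: confirm the stored embeddings via GraphQL by requesting the
# _additional.vector field for one node of the auto-generated class.
import weaviate

client = weaviate.Client("http://localhost:8080")
result = (
    client.query
    .get("Gpt_Index_6525409999093927214_Node", ["doc_id"])
    .with_additional(["vector"])
    .with_limit(1)
    .do()
)
node = result["data"]["Get"]["Gpt_Index_6525409999093927214_Node"][0]
print(node["_additional"]["vector"][:5])  # first few dimensions of the embedding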
@Logan M I added this test while trying to debug, maybe you could help review
Thanks Simon, that makes more sense haha

I'll take a look!
Thanks for the explanation, makes sense now that you actually mention it! I must've overlooked the embeddings somewhere in my weaviate object store then. :)