Updated 2 years ago

A question about chunks

At a glance

The community members discuss the purpose of passing the chunk_size parameter when querying a vector database. They explain that the chunk_size is primarily used when creating the index, as it determines the size of the text chunks stored in the database. However, when querying the database, passing the chunk_size is not strictly necessary.

The community members further discuss what happens when the chunk_size values differ between index creation and querying. They clarify that if the chunk_size is changed and insert() is called, the newly inserted documents are split with the new chunk_size, while the older documents remain unchanged. They note that there is no harm in always passing the chunk_size value, even though it is only used by insert() and from_documents().

Hello! A question about chunks. When querying the vector database, why pass chunk_size? I understand why it matters when the index is created, since the chunk size determines the size of the pieces of text that get stored, but why pass it here?
Another question: what happens when the chunk sizes differ, say 1024 when the index was created and 256 when querying?
Thanks!

from llama_index import ServiceContext, VectorStoreIndex  # legacy (pre-0.10) import path
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, chunk_size=chunk_size,
                                               callback_manager=callback_manager)
index = VectorStoreIndex.from_vector_store(vector_store, service_context=service_context)
chat_engine = index.as_chat_engine()
response = chat_engine.chat(q)
9 comments
You don't need to pass the chunk size really, it only gets used when calling from_documents() or insert()

If you change the chunk size and call insert(), then the document(s) you insert will have that chunk size, and older documents remain unchanged
When you say that I don't need to pass the chunk size, do you mean ServiceContext.from_defaults in my example?
yea exactly. Like, it's only needed if a) you changed the default chunk size from 1024 and b) plan on using insert() or from_documents() with that service context
Well, okay, my chunk size is not equal to 1024. Should I pass the exact value?
Sure, why not 🙂

Again, it's only used if you call insert() or from_documents()
No harm in always passing it
Then what if the value at insert time was one thing and the value passed to ServiceContext.from_defaults is another? I changed the default chunk size at some point, so my older clients already have chunks of size 1024. What happens now if I pass 256 to ServiceContext.from_defaults?
If you call insert with a new value, then the newly inserted chunks will have the chunk size you specified
Not a big deal really, but something to be aware of
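
To make the behavior discussed above concrete, here is a minimal toy sketch in plain Python. It is not LlamaIndex code: `ToyIndex` and `split_text` are hypothetical names invented for illustration, and words stand in for tokens. It only models the point made in this thread, that the chunk size is applied when text is inserted and plays no part when the stored chunks are queried.

```python
from dataclasses import dataclass, field
from typing import List


def split_text(text: str, chunk_size: int) -> List[str]:
    """Split text into pieces of at most chunk_size words (words stand in for tokens)."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


@dataclass
class ToyIndex:
    """Hypothetical stand-in for a vector index; not LlamaIndex's implementation."""
    chunks: List[str] = field(default_factory=list)

    def insert(self, text: str, chunk_size: int) -> None:
        # chunk_size matters only here, at ingestion time.
        self.chunks.extend(split_text(text, chunk_size))

    def query(self, term: str) -> List[str]:
        # Retrieval reads the stored chunks as they are; no chunk_size is involved.
        return [c for c in self.chunks if term in c.split()]


index = ToyIndex()
index.insert(" ".join(["old"] * 2048), chunk_size=1024)  # data inserted before the change
index.insert(" ".join(["new"] * 512), chunk_size=256)    # data inserted after the change

# Older chunks keep their original size; only the new insert uses 256.
sizes = [len(c.split()) for c in index.chunks]
print(sizes)  # [1024, 1024, 256, 256]
```

In this toy model the index simply ends up holding chunks of mixed sizes, which matches the "not a big deal, but something to be aware of" remark above.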