How to pass the chunk size and embed model if ServiceContext is no longer in use?

Hi, how do I pass the chunk size and embed model now that ServiceContext is no longer in use? My code:

Plain Text
from llama_index.core import ServiceContext, VectorStoreIndex
from llama_index.core.node_parser import SimpleNodeParser
from llama_index.embeddings.openai import OpenAIEmbedding

vector_store = storage_service.get_vector_store(collection_name, db_name)
embed_model = OpenAIEmbedding(mode='similarity', embed_batch_size=2000, api_key=api_Key)
service_context = ServiceContext.from_defaults(chunk_size=project_chunk_size,
                                               embed_model=embed_model,
                                               llm=None,
                                               callback_manager=token_counter_callback_manager)
node_parser = SimpleNodeParser.from_defaults(chunk_size=project_chunk_size, chunk_overlap=20)
index = VectorStoreIndex.from_vector_store(vector_store=vector_store, service_context=service_context)

The StorageContext doesn't have those parameters. Thanks!
11 comments
You can still use ServiceContext, but it will emit a deprecation warning!

The new way is to define it in the global Settings:

Plain Text
from llama_index.core import Settings
from llama_index.core.node_parser import SimpleNodeParser

Settings.node_parser = SimpleNodeParser.from_defaults(chunk_size=project_chunk_size, chunk_overlap=20)
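The embed model and chunk size can be set the same way. A minimal sketch, assuming llama-index >= 0.10 with the llama-index-embeddings-openai package installed, and reusing api_Key and project_chunk_size from the question code:

Plain Text
from llama_index.core import Settings
from llama_index.embeddings.openai import OpenAIEmbedding

# Global defaults, used anywhere no local override is passed in
Settings.embed_model = OpenAIEmbedding(embed_batch_size=2000, api_key=api_Key)
Settings.chunk_size = project_chunk_size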
Thanks! I can't use Settings because it's a global object, and my clients all have their own settings.
And what do I do with the node_parser then?
No, I don't see how.
You can chunk the new docs based on these settings!
Plain Text
VectorStoreIndex(..., transformations=[SentenceSplitter(chunk_size=512)])
You can pass the node parser in under transformations, similar to the ingestion pipeline.
I still don't see how... in your example you create a SentenceSplitter
And the splitter has the chunk size.
That's how it works.
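Putting the thread's advice together for the per-client case: a minimal sketch, not the one definitive API, assuming llama-index >= 0.10 (where index constructors accept local embed_model, transformations, and callback_manager overrides) and reusing the names from the question code:

Plain Text
from llama_index.core import VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding

# Per-client objects; nothing here touches the global Settings
embed_model = OpenAIEmbedding(embed_batch_size=2000, api_key=api_Key)

index = VectorStoreIndex.from_vector_store(
    vector_store=vector_store,
    embed_model=embed_model,  # local override of Settings.embed_model
    transformations=[SentenceSplitter(chunk_size=project_chunk_size, chunk_overlap=20)],
    callback_manager=token_counter_callback_manager,  # local override of Settings.callback_manager
)

New documents inserted into this index are then chunked by that splitter and embedded with that model, so each client can carry its own configuration without sharing global state.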