Hi, how do I pass the chunk size and the embed model if ServiceContext isn't in use anymore? My code:

Plain Text
from llama_index.core import ServiceContext, VectorStoreIndex
from llama_index.core.node_parser import SimpleNodeParser
from llama_index.embeddings.openai import OpenAIEmbedding

vector_store = storage_service.get_vector_store(collection_name, db_name)
embed_model = OpenAIEmbedding(mode='similarity', embed_batch_size=2000, api_key=api_Key)
service_context = ServiceContext.from_defaults(chunk_size=project_chunk_size, embed_model=embed_model,
                                               llm=None,
                                               callback_manager=token_counter_callback_manager)
node_parser = SimpleNodeParser.from_defaults(chunk_size=project_chunk_size, chunk_overlap=20)
index = VectorStoreIndex.from_vector_store(vector_store=vector_store, service_context=service_context)

The StorageContext doesn't have those parameters. Thanks!
11 comments
You can still use ServiceContext, but with a deprecation warning!

The new way to do it is to define it in Settings.

Plain Text
from llama_index.core import Settings
from llama_index.core.node_parser import SimpleNodeParser

Settings.node_parser = SimpleNodeParser.from_defaults(chunk_size=project_chunk_size, chunk_overlap=20)
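The embed model can be set the same way; a minimal sketch, assuming the embed_model built in the original snippet:

Plain Text
from llama_index.core import Settings

# Assumes embed_model is the OpenAIEmbedding instance from the question
Settings.embed_model = embed_model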
Thanks! I can't use Settings because it's a global object, and my clients all have their own settings.
And what should I do with the node_parser then?
No, I don't see how.
You can chunk the new docs based on these settings!
VectorStoreIndex(..., transformations=[SentenceSplitter(chunk_size=512)])
You can pass the node parser in under transformations, similar to the ingestion pipeline.
I still don't see how... in your example you create a SentenceSplitter
And the splitter has the chunk size
That's how it works
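Putting it together, here is a minimal sketch of the per-index approach, with nothing set globally. It assumes the vector_store, embed_model, and project_chunk_size from the original snippet, and that your llama_index version accepts embed_model and transformations on from_vector_store:

Plain Text
from llama_index.core import VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

# Per-index configuration: embed_model and transformations take over
# what ServiceContext used to carry, so each client can have its own
index = VectorStoreIndex.from_vector_store(
    vector_store=vector_store,
    embed_model=embed_model,
    transformations=[SentenceSplitter(chunk_size=project_chunk_size, chunk_overlap=20)],
)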