
At a glance

The community member is experiencing an issue with the llama-index library: setting chunk_size and chunk_overlap directly on the service_context and passing it to VectorStoreIndex.from_documents() does not seem to have any effect. They are trying to pass a text splitter to force different chunking, but are encountering a ValueError saying they cannot specify both a text_splitter and a node_parser.

Other community members have provided suggestions, such as checking the llama-index version (0.9.8.post1) and giving examples of how to set chunk_size and chunk_overlap using the ServiceContext.from_defaults() method. However, another community member is experiencing a similar issue on the latest version of the library, where they are unable to set the global service context.

The community members are continuing to troubleshoot the issue and suggest creating a fresh virtual environment to ensure everything is installed correctly.

Getting a weird error. I'm finding that, for some reason, setting chunk_size and chunk_overlap directly on the service_context and passing it to VectorStoreIndex.from_documents() makes no difference; it always chunks the documents the same way. So now I'm trying to pass a text splitter to force different chunking, using the code below, and I'm getting a weird error about providing both a text_splitter and a node_parser (ValueError: Cannot specify both text_splitter and node_parser), when I'm only specifying one:

Plain Text
import tiktoken
from llama_index import ServiceContext
from llama_index.callbacks import CallbackManager, TokenCountingHandler
from llama_index.embeddings import OpenAIEmbedding
from llama_index.llms import OpenAI
from llama_index.text_splitter import SentenceSplitter

# Embeddings, LLM, and a token counter wired into a callback manager
embed_model = OpenAIEmbedding(embed_batch_size=10)
llm = OpenAI(model="gpt-3.5-turbo-16k", temperature=0)
token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo-16k").encode
)
callback_manager = CallbackManager([token_counter])

# Custom splitter: small chunks to force different chunking
text_splitter = SentenceSplitter(chunk_size=128, chunk_overlap=15)
service_context = ServiceContext.from_defaults(
    llm=llm,
    text_splitter=text_splitter,
    callback_manager=callback_manager,
    embed_model=embed_model,
)
10 comments
I can take a look at this in a bit!
I am also setting the global service context and then doing index = VectorStoreIndex.from_documents(document_list_new, service_context=service_context, show_progress=True) if that matters.
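For reference, a minimal sketch of setting the global service context on llama-index 0.9.x, assuming the service_context and document_list_new from the snippets above:

Plain Text
from llama_index import VectorStoreIndex, set_global_service_context

# Register the service context as the global default (0.9.x API)
set_global_service_context(service_context)

# Passing service_context explicitly should now be redundant but harmless
index = VectorStoreIndex.from_documents(
    document_list_new, service_context=service_context, show_progress=True
)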
What llama-index version do you have?

Setting the chunk_size/chunk_overlap definitely seems to work:
Plain Text
>>> ctx = ServiceContext.from_defaults(chunk_size=20, chunk_overlap=2)
>>> ctx.node_parser
SentenceSplitter(include_metadata=True, include_prev_next_rel=True, callback_manager=<llama_index.callbacks.base.CallbackManager object at 0x7efd72b234d0>, chunk_size=20, chunk_overlap=2, separator=' ', paragraph_separator='\n\n\n', secondary_chunking_regex='[^,.;。?!]+[,.;。?!]?')


And passing in a text_splitter also works:

Plain Text
>>> from llama_index.text_splitter import SentenceSplitter
>>> ctx = ServiceContext.from_defaults(llm=None, embed_model=None, text_splitter=SentenceSplitter(chunk_size=20, chunk_overlap=2))
>>> ctx.node_parser
SentenceSplitter(include_metadata=True, include_prev_next_rel=True, callback_manager=<llama_index.callbacks.base.CallbackManager object at 0x7efd40669710>, chunk_size=20, chunk_overlap=2, separator=' ', paragraph_separator='\n\n\n', secondary_chunking_regex='[^,.;。?!]+[,.;。?!]?')
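To confirm the splitter is actually applied, here is a minimal sketch (assuming the ctx from the snippet above) that runs the node parser over a throwaway document and checks the resulting chunk sizes:

Plain Text
from llama_index import Document

# Throwaway document; 200 words should be split into many small chunks
doc = Document(text="word " * 200)
nodes = ctx.node_parser.get_nodes_from_documents([doc])

# Each chunk should be roughly chunk_size tokens or fewer
print([len(n.text.split()) for n in nodes])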
0.9.8.post1, but I'll try those snippets of code, compare with mine, and let you know if it's still not working.
Sorry to butt in, but I got the same error even though I'm using the latest version. I wasn't able to set the global service context:

Plain Text
AttributeError: module 'llama_index' has no attribute 'global_service_context'
do you have a file or folder named "llama_index"? Maybe try renaming that if so
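A quick way to check for that kind of shadowing is to see where Python is actually importing the package from (a generic Python diagnostic, not specific to llama-index):

Plain Text
import llama_index

# If this prints a path inside your project instead of site-packages,
# a local llama_index file or folder is shadowing the real library.
print(llama_index.__file__)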
No @Logan M, I don't have any folder or file named llama_index
Not sure then -- I would create a fresh venv and make sure everything is installed correctly
Let me try that