
At a glance

The community member is experiencing an issue with the llama-index library: setting chunk_size and chunk_overlap directly on the service_context and passing it to VectorStoreIndex.from_documents() does not seem to have any effect. They are trying to pass a text splitter to force different chunking, but are encountering a ValueError saying they cannot specify both a text_splitter and a node_parser.

Other community members have provided suggestions, such as checking the llama-index version (0.9.8.post1) and giving examples of how to set chunk_size and chunk_overlap using the ServiceContext.from_defaults() method. However, another community member is experiencing a similar issue on the latest version of the library, where they are unable to set the global service context.

The community members are continuing to troubleshoot the issue and suggest creating a fresh virtual environment to ensure everything is installed correctly.

Getting a weird error. I'm finding that, for some reason, setting chunk_size and chunk_overlap directly on the service_context and passing it to VectorStoreIndex.from_documents() makes no difference; it always chunks the documents the same way. So now I'm trying to pass a text splitter to force different chunking, using the code below, and I'm getting a weird error about providing both a text_splitter and a node_parser (ValueError: Cannot specify both text_splitter and node_parser), when I'm only specifying one:

Plain Text
import tiktoken
from llama_index import ServiceContext
from llama_index.callbacks import CallbackManager, TokenCountingHandler
from llama_index.embeddings import OpenAIEmbedding
from llama_index.llms import OpenAI
from llama_index.text_splitter import SentenceSplitter

# Embeddings, LLM, and a token counter wired into a callback manager
embed_model = OpenAIEmbedding(embed_batch_size=10)
llm = OpenAI(model="gpt-3.5-turbo-16k", temperature=0)
token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo-16k").encode
)
callback_manager = CallbackManager([token_counter])

# Custom splitter: small chunks to force different chunking
text_splitter = SentenceSplitter(chunk_size=128, chunk_overlap=15)
service_context = ServiceContext.from_defaults(
    llm=llm,
    text_splitter=text_splitter,
    callback_manager=callback_manager,
    embed_model=embed_model,
)
10 comments
I can take a look at this in a bit!
I am also setting the global service context and then doing index = VectorStoreIndex.from_documents(document_list_new, service_context=service_context, show_progress=True) if that matters.
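For reference, a minimal sketch of setting the global service context on llama-index 0.9.x, assuming the service_context and document_list_new from the snippets above:

Plain Text
from llama_index import VectorStoreIndex, set_global_service_context

# Register the service context as the global default (0.9.x API)
set_global_service_context(service_context)

# Passing service_context explicitly should now be redundant but harmless
index = VectorStoreIndex.from_documents(
    document_list_new, service_context=service_context, show_progress=True
)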
What llama-index version do you have?

Setting the chunk_size/chunk_overlap definitely seems to work:
Plain Text
>>> ctx = ServiceContext.from_defaults(chunk_size=20, chunk_overlap=2)
>>> ctx.node_parser
SentenceSplitter(include_metadata=True, include_prev_next_rel=True, callback_manager=<llama_index.callbacks.base.CallbackManager object at 0x7efd72b234d0>, chunk_size=20, chunk_overlap=2, separator=' ', paragraph_separator='\n\n\n', secondary_chunking_regex='[^,.;。?!]+[,.;。?!]?')


And passing in a text_splitter also works:

Plain Text
>>> from llama_index.text_splitter import SentenceSplitter
>>> ctx = ServiceContext.from_defaults(llm=None, embed_model=None, text_splitter=SentenceSplitter(chunk_size=20, chunk_overlap=2))
>>> ctx.node_parser
SentenceSplitter(include_metadata=True, include_prev_next_rel=True, callback_manager=<llama_index.callbacks.base.CallbackManager object at 0x7efd40669710>, chunk_size=20, chunk_overlap=2, separator=' ', paragraph_separator='\n\n\n', secondary_chunking_regex='[^,.;。?!]+[,.;。?!]?')
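To confirm the splitter is actually applied, here is a minimal sketch (assuming the ctx from the snippet above) that runs the node parser over a throwaway document and checks the resulting chunk sizes:

Plain Text
from llama_index import Document

# Throwaway document; 200 words should be split into many small chunks
doc = Document(text="word " * 200)
nodes = ctx.node_parser.get_nodes_from_documents([doc])

# Each chunk should be roughly chunk_size tokens or fewer
print([len(n.text.split()) for n in nodes])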
0.9.8.post1, but I'll try those snippets of code, compare with mine, and let you know if it's still not working.
Sorry to butt in, but I got the same error even though I'm using the latest version. I wasn't able to set the global service context:

Plain Text
AttributeError: module 'llama_index' has no attribute 'global_service_context'
do you have a file or folder named "llama_index"? Maybe try renaming that if so
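A quick way to check for that kind of shadowing is to see where Python is actually importing the package from (a generic Python diagnostic, not specific to llama-index):

Plain Text
import llama_index

# If this prints a path inside your project instead of site-packages,
# a local llama_index file or folder is shadowing the real library.
print(llama_index.__file__)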
No @Logan M, I don't have any folder or file named llama_index
Not sure then -- I would create a fresh venv and make sure everything is installed correctly
Let me try that