Getting a weird error. I'm finding that, for some reason, setting chunk_size and chunk_overlap directly on the service_context and passing it to VectorStoreIndex.from_documents() makes no difference; it always chunks the documents the same way. So now I'm trying to pass a text splitter to force different chunking, using the code below, and I'm getting a weird error about providing both a text_splitter and a node_parser (ValueError: Cannot specify both text_splitter and node_parser) when I'm only specifying one:

Plain Text
import tiktoken

from llama_index import ServiceContext
from llama_index.callbacks import CallbackManager, TokenCountingHandler
from llama_index.embeddings import OpenAIEmbedding
from llama_index.llms import OpenAI
from llama_index.text_splitter import SentenceSplitter

embed_model = OpenAIEmbedding(embed_batch_size=10)
llm = OpenAI(model="gpt-3.5-turbo-16k", temperature=0)
token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo-16k").encode
)
callback_manager = CallbackManager([token_counter])
text_splitter = SentenceSplitter(chunk_size=128, chunk_overlap=15)
service_context = ServiceContext.from_defaults(
    llm=llm,
    text_splitter=text_splitter,
    callback_manager=callback_manager,
    embed_model=embed_model,
)
I can take a look at this in a bit!
I am also setting the global service context and then doing index = VectorStoreIndex.from_documents(document_list_new, service_context=service_context, show_progress=True) if that matters.
What llama-index version do you have?

Setting the chunk_size/chunk_overlap definitely seems to work
Plain Text
>>> ctx = ServiceContext.from_defaults(chunk_size=20, chunk_overlap=2)
>>> ctx.node_parser
SentenceSplitter(include_metadata=True, include_prev_next_rel=True, callback_manager=<llama_index.callbacks.base.CallbackManager object at 0x7efd72b234d0>, chunk_size=20, chunk_overlap=2, separator=' ', paragraph_separator='\n\n\n', secondary_chunking_regex='[^,.;。?!]+[,.;。?!]?')


And passing in text_splitter also works

Plain Text
>>> from llama_index.text_splitter import SentenceSplitter
>>> ctx = ServiceContext.from_defaults(llm=None, embed_model=None, text_splitter=SentenceSplitter(chunk_size=20, chunk_overlap=2))
>>> ctx.node_parser
SentenceSplitter(include_metadata=True, include_prev_next_rel=True, callback_manager=<llama_index.callbacks.base.CallbackManager object at 0x7efd40669710>, chunk_size=20, chunk_overlap=2, separator=' ', paragraph_separator='\n\n\n', secondary_chunking_regex='[^,.;。?!]+[,.;。?!]?')
0.9.8.post1, but I'll try those snippets of code, compare with mine, and let you know if it's still not working.
Sorry to butt in; I'm getting the same error even though I'm on the latest version, and I couldn't set the global service context:

Plain Text
AttributeError: module 'llama_index' has no attribute 'global_service_context'
Do you have a file or folder named "llama_index"? Maybe try renaming that if so.
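A quick way to check for that kind of shadowing is to ask Python where it would load the module from: if the path points into your project directory instead of site-packages, a local file or folder is shadowing the installed package. (This is a generic diagnostic sketch; module_origin is just a helper name used here.)

```python
import importlib.util

def module_origin(name: str):
    """Return the file path Python would load for `name`, or None if unresolvable."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec else None

# For a properly installed package this should print a site-packages path;
# a path inside your own project means something local is shadowing it.
print(module_origin("llama_index"))
```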
No @Logan M, I don't have any folder or file named llama_index.
Not sure then; I would create a fresh venv and make sure everything is installed correctly.
Let me try that