Updated last year

text splitter

At a glance

The post asks how to customize the text splitter using SpacyTextSplitter. Community members point to the node parser documentation and note that any LangChain splitter can be used. The original poster shares a snippet that triggers the type error "Expected type 'TextSplitter | None', got 'SpacyTextSplitter' instead". One community member suggests passing the node parser into the service context and then the service context into the index, and another notes that the issue was fixed in a recent version of llama-index. The poster confirms the problem is gone after updating.

ChuanYue

How do I customize the text splitter to use SpacyTextSplitter?

11 comments

bmax

Hey @ChuanYue check this out https://gpt-index.readthedocs.io/en/stable/core_modules/data_modules/node_parsers/usage_pattern.html#text-splitter-customization

bmax

you can use any Langchain splitter

ChuanYue

@bmax I got this: Expected type 'TextSplitter | None', got 'SpacyTextSplitter' instead

bmax

can you send that portion of code @ChuanYue and imports

ChuanYue

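# Imports assumed (not shown in the original post; legacy llama-index layout)
from langchain.text_splitter import SpacyTextSplitter
from llama_index import SimpleDirectoryReader
from llama_index.node_parser import SimpleNodeParser
text_splitter = SpacyTextSplitter(chunk_size=512)
parser = SimpleNodeParser.from_defaults(text_splitter=text_splitter)
documents = SimpleDirectoryReader(file_path, filename_as_id=True).load_data()
nodes = parser.get_nodes_from_documents(documents)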

ChuanYue

@bmax Is that right?

bmax

that looks mostly right @ChuanYue -- you'll have to pass the node parser into the service_context and then the service context into the index
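
A minimal sketch of that wiring, assuming the legacy (roughly 0.8-era) llama-index API this thread is about, with `langchain` and a spaCy model installed. The `build_index` helper name is made up for illustration, and newer llama-index releases reorganized these imports and deprecated `ServiceContext`, so treat this as a sketch and check it against your installed version:

```python
def build_index(file_path: str):
    """Build a vector index whose chunks come from a spaCy-based splitter."""
    # Imported inside the function so this snippet loads even when the
    # optional dependencies are missing. Legacy import paths (assumption).
    from langchain.text_splitter import SpacyTextSplitter
    from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex
    from llama_index.node_parser import SimpleNodeParser

    # 1. Wrap the LangChain splitter in a node parser.
    text_splitter = SpacyTextSplitter(chunk_size=512)
    node_parser = SimpleNodeParser.from_defaults(text_splitter=text_splitter)

    # 2. Pass the node parser into the service context.
    #    The default LLM and embedding settings may require an OpenAI API key.
    service_context = ServiceContext.from_defaults(node_parser=node_parser)

    # 3. Pass the service context into the index, which then uses the
    #    custom node parser when it chunks the documents.
    documents = SimpleDirectoryReader(file_path, filename_as_id=True).load_data()
    return VectorStoreIndex.from_documents(documents, service_context=service_context)
```

With this wiring there should be no need to call `parser.get_nodes_from_documents(documents)` by hand, since the index runs the node parser from the service context itself.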

bmax

what is your error stack trace exactly?

ChuanYue

here

Logan M

@ChuanYue what version of llama-index do you have? I fixed this in a recent version

ChuanYue

@Logan M It's already good after my update, thanks
