Updated 3 months ago

I want to be able to delimit my document

I want to be able to delimit my document with a string and then index it so that each node contains the tokens between the delimiter strings. How do I do this?
6 comments
You can customize the token text splitter, if I understand correctly what you're trying to do:
Plain Text
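# Import paths assume the legacy llama_index 0.8-style API used in this thread.
import faiss
from llama_index import ServiceContext, SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.node_parser import SimpleNodeParser
from llama_index.text_splitter import TokenTextSplitter
from llama_index.vector_stores import FaissVectorStore

# Split on the custom delimiter, with fallbacks if a piece is still too long.
text_splitter = TokenTextSplitter(
    separator="*$$$*",
    chunk_size=1024,
    chunk_overlap=20,
    backup_separators=["", "\n"],
)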


node_parser = SimpleNodeParser.from_defaults(text_splitter=text_splitter)

service_context = ServiceContext.from_defaults(node_parser=node_parser)

documents = SimpleDirectoryReader("./data").load_data()

d = 1536  # embedding dimension; must match the embedding model's output size
faiss_index = faiss.IndexFlatL2(d)
vector_store = FaissVectorStore(faiss_index=faiss_index)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

index = VectorStoreIndex.from_documents(
    documents=documents,
    storage_context=storage_context,
    service_context=service_context,
    show_progress=True,
)

index.storage_context.persist(persist_dir="./indexes")
This is what I am doing, but it is not working: I get far too few nodes, and each one contains multiple delimiters.
That should be working.
The separator is not having any effect; the nodes are split just as before.
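One possible explanation, which is an assumption and not confirmed anywhere in this thread, is that a size-based splitter treats the separator only as a place where a split is allowed and then packs the pieces back together until chunk_size is reached. With chunk_size=1024, short sections would be merged again, which would match the symptom described above. The sketch below uses plain Python only; `split_on_delimiter` and `pack_greedy` are hypothetical helper names, and `pack_greedy` is a toy model of that packing behaviour, not the library's actual code.

```python
from typing import List

DELIMITER = "*$$$*"  # same separator string as in the thread


def split_on_delimiter(text: str, delimiter: str = DELIMITER) -> List[str]:
    """Return one stripped, non-empty segment per delimited section."""
    return [part.strip() for part in text.split(delimiter) if part.strip()]


def pack_greedy(segments: List[str], chunk_size: int,
                delimiter: str = DELIMITER) -> List[str]:
    """Toy model of a size-based splitter: rejoin segments while they fit.

    Token counts are approximated by whitespace-separated words, which is
    an assumption made for illustration only.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for seg in segments:
        seg_len = len(seg.split())
        # Start a new chunk only when adding this segment would overflow.
        if current and current_len + seg_len > chunk_size:
            chunks.append(delimiter.join(current))
            current, current_len = [], 0
        current.append(seg)
        current_len += seg_len
    if current:
        chunks.append(delimiter.join(current))
    return chunks


text = "first section*$$$*second section*$$$*third section"
segments = split_on_delimiter(text)
packed = pack_greedy(segments, chunk_size=1024)

print(len(segments))  # 3: one entry per delimited section
print(len(packed))    # 1: everything fits in one chunk, delimiters included
```

If that is what is happening, one workaround might be to split the raw text yourself, as `split_on_delimiter` does, and create one document or node per segment before building the index, so that each delimited section ends up in its own node as long as it fits within chunk_size.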