
Default chunking and tokenization in VectorStoreIndex.from_documents()

It isn't clear to me what default chunking and tokenization are being performed under VectorStoreIndex.from_documents(). Usually I can figure this sort of thing out on my own, but I'm having difficulty here. Is this documented somewhere?
2 comments
SentenceSplitter() using chunk_size=1024 and the gpt-3.5 tokenizer
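To make the default concrete, here is a minimal plain-Python sketch of token-window chunking in the style of SentenceSplitter. It is illustrative only: it uses whitespace "tokens" as a stand-in for the real gpt-3.5 BPE tokenizer (tiktoken), it skips sentence-boundary handling, and the chunk_overlap default shown here is an assumption, not something stated in this thread.

```python
def chunk_by_tokens(text, chunk_size=1024, chunk_overlap=200):
    """Split text into windows of at most chunk_size tokens,
    with consecutive windows sharing chunk_overlap tokens.

    Whitespace split is a stand-in for the real tokenizer;
    the real SentenceSplitter also respects sentence boundaries.
    """
    tokens = text.split()
    chunks = []
    start = 0
    while start < len(tokens):
        end = min(start + chunk_size, len(tokens))
        chunks.append(" ".join(tokens[start:end]))
        if end == len(tokens):
            break
        # Back up by the overlap so adjacent chunks share context.
        start = end - chunk_overlap
    return chunks
```

With the defaults, a 2,000-token document would yield three overlapping chunks (0-1024, 824-1848, 1648-2000), which is why retrieved nodes often repeat a little text at their edges.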

I agree it's a bit opaque -- the IngestionPipeline is generally preferred, since it's much more transparent about what's happening:

https://docs.llamaindex.ai/en/stable/module_guides/loading/ingestion_pipeline/root.html
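For reference, a short sketch of the explicit-pipeline approach the docs describe, where you name each transformation yourself instead of relying on hidden defaults. Import paths here assume a recent llama-index-core layout and may differ in older releases; check the linked docs for your version.

```python
from llama_index.core import Document, VectorStoreIndex
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter

# The chunking is now stated explicitly rather than implied by
# from_documents(); chunk_overlap here is illustrative.
pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=1024, chunk_overlap=200),
    ]
)

nodes = pipeline.run(documents=[Document(text="Some long document text...")])

# Build the index from the pre-chunked nodes.
index = VectorStoreIndex(nodes)
```

The upside is that swapping the splitter, changing chunk_size, or adding extra transformations (metadata extractors, embeddings) is a visible, one-line change instead of a buried default.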
Awesome, thank you @Logan M