It will do rather large chunks at index time (4000 tokens I think, with some overlap), and then break them up again at query time to make sure they fit into the prompt
You can also pass chunk_size_limit during index construction to manually control the size of each chunk
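Something like this, as a minimal sketch, assuming an older LlamaIndex / GPT Index release where chunk_size_limit is accepted directly at index construction (newer versions moved chunk settings elsewhere, so check your installed version):

```python
# Minimal sketch: smaller chunks for dense data.
# Assumes an older LlamaIndex / GPT Index API where chunk_size_limit
# can be passed straight to the index constructor.
from llama_index import GPTSimpleVectorIndex, SimpleDirectoryReader

# Load documents from a local folder (path is just an example)
documents = SimpleDirectoryReader("data").load_data()

# Override the default (large) chunk size with something smaller
index = GPTSimpleVectorIndex(documents, chunk_size_limit=512)

# Query as usual; chunks are already sized to fit the prompt
response = index.query("What does the document say about X?")
print(response)
```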
Ah, chunk_size_limit sounds familiar! I think that's what I did last time - the data I'm working with is kind of dense, so smaller chunks end up getting me better results. Thank you very much for your help!