When you add documents to an index, any document larger than the chunk size (currently 1024 tokens) gets broken into chunks with some overlap
so a larger chunk size means you spend more tokens per query but give GPT more context, whereas a smaller chunk size keeps token spend to a minimum but gives it less context to answer from?
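Roughly where that gets set, as a minimal sketch (assuming a recent llama_index where these live under llama_index.core; the "data" directory and the overlap value are just placeholders):

```python
# Sketch only -- import paths differ across llama_index versions.
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex

# Global chunking defaults: bigger chunk_size = more context per chunk
# (and more tokens per LLM call), smaller = cheaper but less context.
Settings.chunk_size = 1024   # the default mentioned above
Settings.chunk_overlap = 20  # illustrative overlap between adjacent chunks

documents = SimpleDirectoryReader("data").load_data()  # "data" is a placeholder folder
index = VectorStoreIndex.from_documents(documents)     # docs get split into overlapping chunks here
```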
But then to offset this, you can also raise similarity_top_k, which fetches more chunks at query time. Setting response_mode="compact" on the query will also stuff as much retrieved text as possible into each LLM call, rather than making one call per top-k chunk
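Something like this is where those two knobs go at query time (again a sketch; the k value and the question string are made up):

```python
# Sketch: retrieve more chunks and pack them tightly into each LLM call.
query_engine = index.as_query_engine(
    similarity_top_k=5,       # fetch 5 chunks instead of the default
    response_mode="compact",  # stuff as much retrieved text per call as fits
)
response = query_engine.query("What does the document say about X?")  # placeholder question
print(response)
```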
If all the retrieved chunks still don't fit into one LLM call, the answer gets refined across a few calls: the first call drafts an answer and later calls revise it with the remaining chunks
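You can also ask for that refinement behaviour explicitly with the "refine" response mode (same caveats as the sketches above):

```python
# Sketch: explicit refine mode -- the first call answers from the first chunk,
# each later call revises that answer using the next chunk's text.
refine_engine = index.as_query_engine(
    similarity_top_k=5,
    response_mode="refine",
)
refined = refine_engine.query("What does the document say about X?")  # placeholder question
```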