The community members discuss ways to improve query response time in llama-index. The suggestions include:
- Setting a smaller chunk_size in the service_context
- Avoiding the use of complex index structures if possible
- Enabling streaming
Regarding the ideal chunk_size value, the community members suggest that the default of 1024 strikes a good balance between speed and the quality of the generated embeddings, and that going much below 512 is not advisable.
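As a sketch of the first suggestion (assuming the pre-1.0 llama-index API, where chunking is configured through `ServiceContext`; newer releases use a global `Settings` object instead), a smaller chunk_size can be set like this:

```python
from llama_index import ServiceContext, VectorStoreIndex, SimpleDirectoryReader

# Smaller chunks mean fewer tokens per retrieved node, which shortens the
# prompt sent to the LLM. 512 is roughly the lower bound suggested above.
service_context = ServiceContext.from_defaults(chunk_size=512)

# "data" is a placeholder directory for illustration.
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
```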
In short, any of these changes will improve speed: setting a smaller chunk_size in the service_context, avoiding complex index structures where possible, or enabling streaming, which at minimum makes responses feel faster because tokens arrive as they are generated.
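A streaming query might look like the following sketch (again assuming the pre-1.0 `ServiceContext`-based API; the query string and data directory are placeholders):

```python
from llama_index import ServiceContext, VectorStoreIndex, SimpleDirectoryReader

service_context = ServiceContext.from_defaults(chunk_size=512)
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)

# streaming=True makes the engine yield tokens as the LLM produces them,
# so the first words of the answer appear almost immediately instead of
# after the full completion finishes.
query_engine = index.as_query_engine(streaming=True)
response = query_engine.query("How can I speed up my queries?")
response.print_response_stream()
```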