index.query(..., similarity_top_k=3, response_mode="compact")
with a higher top k, a smaller chunk size will help speed up responses (along with setting the response size)
service_context = ServiceContext.from_defaults(chunk_size_limit=512, embed_model=embeddings)
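If you also want to cap the response size explicitly, one way is to configure the LLM predictor that the ServiceContext uses. This is a minimal sketch assuming the legacy llama_index ServiceContext/LLMPredictor API and a LangChain OpenAI LLM; the model name, max_tokens=256, and the OpenAIEmbeddings stand-in for `embeddings` are illustrative, not the original settings:

from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import OpenAI
from llama_index import LangchainEmbedding, LLMPredictor, ServiceContext

# Assumed stand-in for the `embeddings` object used elsewhere in this thread.
embeddings = LangchainEmbedding(OpenAIEmbeddings())

# Capping max_tokens bounds the completion length; the value and model name are placeholders.
llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="text-davinci-003", max_tokens=256))

service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor,
    embed_model=embeddings,
    chunk_size_limit=512,
)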
github_client = GithubClient(os.getenv("GITHUB_TOKEN"))
loader = GithubRepositoryReader(github_client, **kwargs)
index = GPTChromaIndex.from_documents(docs_content, service_context=service_context, chroma_collection=chroma_collection)
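Putting the pieces together, here is a rough end-to-end sketch. It assumes the llama_hub GithubRepositoryReader (pulled in via download_loader) with owner/repo/branch arguments, an in-memory chromadb client, and the service_context configured above; the repo, collection name, and query string are placeholders:

import os

import chromadb
from llama_index import GPTChromaIndex, download_loader

download_loader("GithubRepositoryReader")  # fetches the loader from llama_hub
from llama_hub.github_repo import GithubClient, GithubRepositoryReader

github_client = GithubClient(os.getenv("GITHUB_TOKEN"))
loader = GithubRepositoryReader(github_client, owner="jerryjliu", repo="llama_index")  # placeholder repo
docs_content = loader.load_data(branch="main")

# In-memory Chroma collection backing the vector index.
chroma_client = chromadb.Client()
chroma_collection = chroma_client.create_collection("github-docs")

# `service_context` is the one configured above (chunk_size_limit=512, etc.).
index = GPTChromaIndex.from_documents(
    docs_content,
    service_context=service_context,
    chroma_collection=chroma_collection,
)

response = index.query("How is the loader configured?", similarity_top_k=3, response_mode="compact")
print(response)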
while total_tokens < token_max: keep appending nodes from the index, which has been sorted by relevance (vector similarity)
response_mode="compact"
in the query should do that. It fills each request with the maximum number of tokens (from the pool of text available after fetching the top k)
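Roughly what that packing looks like, as an illustrative sketch (this is not the library's internal code; `nodes`, `token_max`, and `count_tokens` are stand-in names):

def build_compact_context(nodes, token_max, count_tokens):
    # `nodes` = top-k chunks already sorted by vector similarity (most relevant first);
    # `count_tokens` = any tokenizer-based length function.
    packed, total_tokens = [], 0
    for node in nodes:
        node_tokens = count_tokens(node.text)
        if total_tokens + node_tokens > token_max:
            break  # this request is full; remaining text would go into a follow-up request
        packed.append(node.text)
        total_tokens += node_tokens
    return "\n\n".join(packed)

If the retrieved text doesn't all fit in one request, compact mode falls back to multiple LLM calls and refines the answer across them.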