index.query(..., similarity_top_k=3, response_mode="compact")
With a higher top k, a smaller chunk size will help speed up responses (along with setting that response mode):
service_context = ServiceContext.from_defaults(chunk_size_limit=512, embed_model=embeddings)
github_client = GithubClient(os.getenv("GITHUB_TOKEN"))
loader = GithubRepositoryReader(github_client, **kwargs)
index = GPTChromaIndex.from_documents(docs_content, service_context=service_context, chroma_collection=chroma_collection)
while total_tokens < token_max: keep appending nodes from the index, sorted from most to least relevant by vector similarity
in the query should do that. It fills each request with the maximum number of tokens (from the pool of text available after fetching the top k)
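For illustration, the packing behavior described above can be sketched in plain Python. This is only a rough sketch of the idea, not the library's actual implementation: `pack_nodes`, `scored_nodes`, and the whitespace-based `count_tokens` default are hypothetical names chosen for this example, and a real setup would count tokens with the model's tokenizer.

```python
from typing import Callable, List, Tuple


def pack_nodes(
    scored_nodes: List[Tuple[str, float]],
    token_max: int,
    # Assumption: whitespace splitting stands in for a real tokenizer.
    count_tokens: Callable[[str], int] = lambda text: len(text.split()),
) -> List[str]:
    """Greedily keep the most similar chunks until the token budget is used up."""
    # Sort by similarity score, most relevant first.
    ranked = sorted(scored_nodes, key=lambda pair: pair[1], reverse=True)

    selected: List[str] = []
    total_tokens = 0
    for text, _score in ranked:
        cost = count_tokens(text)
        # Stop at the first chunk that would push us past the budget.
        if total_tokens + cost > token_max:
            break
        selected.append(text)
        total_tokens += cost
    return selected


if __name__ == "__main__":
    nodes = [("a b c", 0.9), ("d e", 0.5), ("f g h i", 0.7)]
    print(pack_nodes(nodes, token_max=7))
```

The sketch stops at the first chunk that does not fit rather than skipping it and trying smaller ones, which keeps the selection strictly in relevance order; whether that trade-off is right depends on how uneven the chunk sizes are.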