Graphs

Folks, I'm getting confused reading the Composability docs. I have a number of Chroma collections, each loaded into its own GPTChromaIndex. I'd like to be able to query across all the documents in all indexes, though sometimes the user may want to constrain the search. There will be many documents, perhaps thousands. Where I'm getting confused: do I need to compose a graph? If I do compose a graph, should there be an index per document, as in the sample? Chroma is using OpenAI embeddings and is very good at finding related content... how can I best rely on embedding similarity to find content?
I would only create a graph if you've exhausted your options with a single index and it's still not performing as well as you want.

From what I've seen, embeddings should work fine in most cases, unless you have an easy way to group your documents into specific topics.

With just a single vector index, you can try modifying the chunk size when building the index (default is 3900 tokens).

You can also adjust the top k in your query: index.query(..., similarity_top_k=3, response_mode="compact"). If you use a higher top k, a smaller chunk size will help speed up responses (along with setting the response size).

However, decreasing chunk size too much can make answers harder to find
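For example, something like this (the question string here is just a placeholder):
Plain Text
# Assumes `index` is a vector index like the one built later in this thread.
# A higher similarity_top_k retrieves more chunks; response_mode="compact"
# packs the retrieved text into as few LLM calls as possible.
response = index.query(
    "How does the loader authenticate with GitHub?",
    similarity_top_k=3,
    response_mode="compact",
)
print(response)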
Okay cool! Will the index query return more than one chunk if it makes sense to?
Also where does one feed in the chunk size when building the index?
I have
Plain Text
from llama_index import ServiceContext  # assuming a recent llama_index version

service_context = ServiceContext.from_defaults(
    chunk_size_limit=512, embed_model=embeddings)
Plain Text
import os  # token is read from the environment

github_client = GithubClient(os.getenv("GITHUB_TOKEN"))

loader = GithubRepositoryReader(
    github_client,
    **kwargs  # repo owner/name and other reader options passed in from elsewhere
)
Plain Text
index = GPTChromaIndex.from_documents(
    docs_content, service_context=service_context, chroma_collection=chroma_collection)
Putting it in the service context is the right place πŸ’ͺ🫑
Currently, it only returns a fixed number of nodes (similarity_top_k). The default is 1
Any thoughts on how to return node contents up to a token-length limit (a parameter) and have the LLM answer the question across that text as the context?
Plain Text
while total_tokens < token_max: keep appending nodes from the index, sorted by vector similarity (most relevant first)
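(rough standalone sketch of that loop — pack_context, ranked_node_texts, and count_tokens are just hypothetical names, not any particular index API:)
Plain Text
# Hypothetical sketch of the loop above. `ranked_node_texts` is assumed to be
# sorted by vector similarity already; `count_tokens` is any token counter.
def pack_context(ranked_node_texts, token_max, count_tokens):
    context_parts, total_tokens = [], 0
    for text in ranked_node_texts:
        n = count_tokens(text)
        if total_tokens + n > token_max:
            break
        context_parts.append(text)
        total_tokens += n
    return "\n\n".join(context_parts)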
Hmm setting response_mode="compact" in the query should do that. It fills each request with the maximum number of tokens (from the pool of text available after fetching the top k)
okay sweet, will give it a shot