erlebach123
Joined September 25, 2024
I have constructed a SummaryIndex from a collection of nodes. I throw away the nodes. From the SummaryIndex stored in memory, what is the best way to retrieve the collection of nodes? Should I do so via the docstore? I would have thought that a get_nodes() function on the SummaryIndex instance would be useful (it should be defined on a parent class in the class hierarchy), maybe ComponentIndex. Why do I say this? Because when querying, I assume you must iterate over the list of nodes, so there should be an efficient way to access them. Thanks.

I answered my own question by searching the LlamaIndex source code:
nodes = summary_index._index_struct.nodes, which accesses a private variable. This is useful for learning the code, but I obviously should not use this approach in any code I intend to deploy. Do the maintainers of LlamaIndex simply assume that such a function is not necessary or useful?
1 comment
Can you point me to the author of the following demo on LlamaIndex: Query Pipeline Chat Engine (https://docs.llamaindex.ai/en/stable/examples/pipeline/query_pipeline_memory/)? The demo does not work for me: the pipeline states that it only accepts a single output. Apparently, there is an error in a lower-level library:
Plain Text
File ~/src/2024/llama_index_gordon/basics/.venv/lib/python3.12/site-packages/llama_index/core/query_pipeline/query.py:410, in QueryPipeline.run(self, return_values_direct, callback_manager, batch, *args, **kwargs)
    406     query_payload = json.dumps(str(kwargs))
    407 with self.callback_manager.event(
    408     CBEventType.QUERY, payload={EventPayload.QUERY_STR: query_payload}
    409 ) as query_event:
--> 410     outputs, _ = self._run(
    411         *args,
    412         return_values_direct=return_values_direct,
    413         show_intermediates=False,
    414         batch=batch,
    415         **kwargs,
    416     )
    418     return outputs

Thanks.
13 comments
Here is another question. Ollama has recently introduced "OLLAMA_NUM_PARALLEL" to allow a model to serve multiple requests concurrently. However, I have not seen explicit support for this in LlamaIndex. Do you know if there are any experimental features or branches attempting to incorporate the use of OLLAMA_NUM_PARALLEL? Thanks.
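For context (my understanding, not something I have confirmed in the LlamaIndex codebase): OLLAMA_NUM_PARALLEL is an environment variable read by the Ollama server process, not by client libraries, so no explicit LlamaIndex support should be required; the client simply issues concurrent requests and the server interleaves them. Something like:

```shell
# set on the Ollama server, not in LlamaIndex
OLLAMA_NUM_PARALLEL=4 ollama serve
# or: export OLLAMA_NUM_PARALLEL=4 before starting the server
```

On the LlamaIndex side, concurrent async calls (e.g. via asyncio.gather over aquery/acomplete) would then be served in parallel up to that limit.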
3 comments
Is it possible to make, say, 10 queries to an LLM and put them in a single batch to get faster results? I know I can do asynchronous querying, but this is different. I would like to do both. Thanks.
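As far as I know, LlamaIndex does not merge independent queries into a single LLM call for you (true batching depends on the backend exposing a batch endpoint); what you can do client-side is fire the queries concurrently and gather the results. A minimal sketch of the pattern, with a stand-in coroutine where a real query_engine.aquery call would go:

```python
import asyncio

async def aquery(prompt: str) -> str:
    # stand-in for an async LLM call, e.g. query_engine.aquery(prompt)
    await asyncio.sleep(0.01)
    return f"answer to: {prompt}"

async def run_batch(prompts):
    # launch all queries concurrently; results come back in input order
    return await asyncio.gather(*(aquery(p) for p in prompts))

results = asyncio.run(run_batch([f"question {i}" for i in range(10)]))
```

This gives the wall-clock benefit of batching (all 10 requests in flight at once) even though each query is still a separate request.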
4 comments
I have a question about the use of Settings. Consider the constructor of SentenceSplitter, taken from the LlamaIndex repository:
Plain Text
class SentenceSplitter(MetadataAwareTextSplitter):
    """Parse text with a preference for complete sentences.

    In general, this class tries to keep sentences and paragraphs together. Therefore
    compared to the original TokenTextSplitter, there are less likely to be
    hanging sentences or parts of sentences at the end of the node chunk.
    """

    chunk_size: int = Field(
        default=DEFAULT_CHUNK_SIZE,
        description="The token chunk size for each chunk.",
        gt=0,
    )
...

Clearly, the default chunk size is used if the argument is not specified. For other classes, argument values are taken from Settings. Are there rules that tell us when to rely on Settings and when not to? Thanks.
1 comment
I have read quite a bit about DocumentSummaryIndex. However, I have not found a single example that stores this index in persistent storage such as a Chroma database. Of course, I could write custom tools, but I would rather not. Has anybody stored this type of index on a file system? I am interested in working examples. Since a summary index can be expensive to compute when there are many long documents, and it is meant to be reused many times, I surmise that such examples must exist. I am working on my laptop (i.e., not in the cloud). Thanks.

I am trying to use a DocumentSummaryIndex together with Chroma, and I fear I have a serious misunderstanding. All the examples I have seen that discuss this particular index do so using VectorStoreIndex. The DocumentSummaryIndex is composed of nodes and summaries. Given a set of documents, I chunk them into nodes. I then save these nodes into a Chroma database with the idea of reloading them at a later time to construct my DocumentSummaryIndex. Since I know that indexes can be persisted, I figured that the DocumentSummaryIndex could be stored in the Chroma database. Is this correct, or am I mistaken? If the former, I would really appreciate a minimal working example that demonstrates saving nodes and index to the database and reloading the data. I am working 100% with open-source models.
4 comments