I build a SummaryIndex from a collection of nodes and then throw the nodes away. Given only the SummaryIndex held in memory, what is the best way to retrieve the collection of nodes? Should I do so via the docstore? I would have thought that a get_nodes() function on the SummaryIndex instance would be useful (it should be defined on a parent class in the class hierarchy, maybe ComponentIndex). Why do I say this? Because when querying, I assume the index must iterate over its list of nodes, so there should be an efficient way to access them. Thanks.

I did find that I can do nodes = summary_index._index_struct.nodes
, which involves accessing a private variable. This is useful for learning the code, but I should obviously not rely on it in any code I wish to deploy. Do the maintainers of LlamaIndex simply assume that such a function is not necessary or useful?

On a separate note, here is part of a traceback I hit inside QueryPipeline.run:

File ~/src/2024/llama_index_gordon/basics/.venv/lib/python3.12/site-packages/llama_index/core/query_pipeline/query.py:410, in QueryPipeline.run(self, return_values_direct, callback_manager, batch, *args, **kwargs)
    406 query_payload = json.dumps(str(kwargs))
    407 with self.callback_manager.event(
    408     CBEventType.QUERY, payload={EventPayload.QUERY_STR: query_payload}
    409 ) as query_event:
--> 410     outputs, _ = self._run(
    411         *args,
    412         return_values_direct=return_values_direct,
    413         show_intermediates=False,
    414         batch=batch,
    415         **kwargs,
    416     )
    418 return outputs
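Coming back to the node-retrieval question above: to be explicit about what I mean by the docstore route, here is a sketch (a sketch only, assuming recent llama_index.core; the document text and splitter settings are made-up placeholders):

from llama_index.core import Document, SummaryIndex
from llama_index.core.node_parser import SentenceSplitter

# Build the index from nodes, then drop the local reference to the nodes.
docs = [Document(text="Paris is the capital of France. Berlin is the capital of Germany.")]
nodes = SentenceSplitter(chunk_size=256, chunk_overlap=20).get_nodes_from_documents(docs)
summary_index = SummaryIndex(nodes)
del nodes

# The index keeps its nodes in the docstore, reachable through public properties.
all_nodes = list(summary_index.docstore.docs.values())

# index_struct (the public property, not the private attribute) holds the node
# ids in order; the docstore resolves each id back to the node object.
ordered_nodes = [
    summary_index.docstore.get_node(node_id)
    for node_id in summary_index.index_struct.nodes
]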
Here is part of the definition of SentenceSplitter, taken from the LlamaIndex repository:

class SentenceSplitter(MetadataAwareTextSplitter):
    """Parse text with a preference for complete sentences.

    In general, this class tries to keep sentences and paragraphs together.
    Therefore compared to the original TokenTextSplitter, there are less
    likely to be hanging sentences or parts of sentences at the end of the
    node chunk.
    """

    chunk_size: int = Field(
        default=DEFAULT_CHUNK_SIZE,
        description="The token chunk size for each chunk.",
        gt=0,
    )
    ...
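A small usage sketch of the class quoted above (assuming recent llama_index.core; the text and parameter values are placeholders):

from llama_index.core import Document
from llama_index.core.node_parser import SentenceSplitter

# chunk_size is the token budget per chunk (the Field shown above);
# chunk_overlap controls how many tokens adjacent chunks share.
splitter = SentenceSplitter(chunk_size=512, chunk_overlap=50)
nodes = splitter.get_nodes_from_documents(
    [Document(text="First sentence. Second sentence. Third sentence. " * 100)]
)
print(len(nodes), nodes[0].text[:80])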
My next question concerns the DocumentSummaryIndex. I have not found a single example that stores this index in persistent storage such as a Chroma database. Of course I could write custom tools, but I would rather not. Has anybody stored this type of index on a file system? I am interested in working examples. Since a summary index can be expensive to compute when there are many long documents, and it is meant to be reused many times, I surmise that such examples must exist. I am working on my laptop (i.e., not in the cloud). Thanks.
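This is roughly the build-and-persist flow I have in mind (a sketch only, not something I have confirmed works: it assumes the llama-index-vector-stores-chroma integration, made-up paths and collection name, and that Settings.llm / Settings.embed_model already point at local models):

import chromadb
from llama_index.core import DocumentSummaryIndex, SimpleDirectoryReader, StorageContext
from llama_index.vector_stores.chroma import ChromaVectorStore

# A local on-disk Chroma collection to hold the summary embeddings.
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("doc_summaries")
vector_store = ChromaVectorStore(chroma_collection=collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Summarizing every document calls the LLM, so this is the expensive step.
documents = SimpleDirectoryReader("./data").load_data()
index = DocumentSummaryIndex.from_documents(documents, storage_context=storage_context)

# Chroma only receives the embeddings; the nodes, per-document summaries, and
# index metadata live in the docstore/index store, so persist those to disk too.
storage_context.persist(persist_dir="./storage")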
I am trying to use the DocumentSummaryIndex together with Chroma, and I fear I have a serious misunderstanding. All the examples I have seen that discuss this particular index do so using VectorStoreIndex
. The DocumentSummaryIndex
is composed of nodes and summaries. Given a set of documents, I chunk them into nodes. I then save these nodes into a Chroma database, with the idea of reloading them at a later time to construct my DocumentSummaryIndex.
Since I know that indexes can be persisted, I figured that the DocumentSummaryIndex
could be stored in the Chroma database as well. Is this correct, or am I mistaken? If the former, I would really appreciate a minimal working example that demonstrates saving the nodes and the index to the database and reloading the data. I am working 100% with open-source models.
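For completeness, this is how I imagine reloading everything in a fresh process, under the same assumptions (and the same made-up paths and collection name) as the sketch in my previous question:

import chromadb
from llama_index.core import StorageContext, load_index_from_storage
from llama_index.vector_stores.chroma import ChromaVectorStore

# Re-attach to the same on-disk Chroma collection...
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("doc_summaries")
vector_store = ChromaVectorStore(chroma_collection=collection)

# ...and rebuild the StorageContext from the persisted docstore/index store,
# so the nodes and summaries do not have to be recomputed.
storage_context = StorageContext.from_defaults(
    persist_dir="./storage", vector_store=vector_store
)
index = load_index_from_storage(storage_context)

query_engine = index.as_query_engine()

If this is not the intended pattern for a DocumentSummaryIndex, that is exactly the misunderstanding I would like cleared up.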