Updated last year

When I load an index from storage, I want to see some simple statistics about it--number of bytes, number/name of docs, created/modified dates, etc. How can I do that?
6 comments
You can implement custom metadata. Some of it, such as the document name, is already populated by default.
print(response.source_nodes)
to view the metadata attached to each source node.
https://gpt-index.readthedocs.io/en/latest/core_modules/data_modules/documents_and_nodes/usage_documents.html
This page is about document metadata, but I am more worried about index metadata. Specifically, I want a sanity check to reassure myself that I am reloading the index that I think I am!
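For a quick file-level sanity check, here is a minimal sketch that uses only the standard library and looks at the persist folder itself rather than going through any index API. The function name `summarize_persist_dir` is made up for this example. It also assumes the folder contains a `docstore.json` with a `"docstore/ref_doc_info"` mapping keyed by source document ID; that layout is an assumption that may differ between versions, so the doc-count part is skipped when the file is missing.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def summarize_persist_dir(persist_dir):
    """Collect simple file-level statistics for a persisted index folder."""
    root = Path(persist_dir)
    files = sorted(p for p in root.rglob("*") if p.is_file())
    if not files:
        raise FileNotFoundError(f"No files found under {root}")

    stats = [p.stat() for p in files]
    summary = {
        "persist_dir": str(root.resolve()),
        "files": [p.relative_to(root).as_posix() for p in files],
        "total_bytes": sum(s.st_size for s in stats),
        # Newest modification time across all files, reported in UTC.
        "last_modified": datetime.fromtimestamp(
            max(s.st_mtime for s in stats), tz=timezone.utc
        ).isoformat(),
    }

    # Assumption: the docstore is a JSON file named docstore.json that holds
    # a "docstore/ref_doc_info" mapping keyed by source document ID.
    docstore_path = root / "docstore.json"
    if docstore_path.exists():
        data = json.loads(docstore_path.read_text(encoding="utf-8"))
        ref_docs = data.get("docstore/ref_doc_info") or {}
        summary["num_source_docs"] = len(ref_docs)
        summary["source_doc_ids"] = sorted(ref_docs)
    return summary
```

Calling `summarize_persist_dir("./storage")` right before loading and printing the result gives the byte count, file list, and last-modified time. File creation time is left out because it is not reported consistently across operating systems.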
Ah I see, I haven't tried that myself. Have you looked at these? https://gpt-index.readthedocs.io/en/latest/examples/retrievers/simple_fusion.html
https://gpt-index.readthedocs.io/en/latest/core_modules/data_modules/storage/customization.html

I assume you're already creating multiple indices with different names? I think this would require a custom implementation.
Yes, that is correct. For a series of standalone documents, I am creating several indexes for each one, and I want to be sure that when I reload the indexes for document A, they are actually the indexes for document A and not some other indexes from another folder.
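One way to get that guarantee without relying on any library feature is to write a small sidecar file next to the persisted index and check it before reloading. This is only a sketch under stated assumptions: `manifest.json`, `write_manifest`, and `verify_manifest` are hypothetical names invented here, and the approach assumes each index folder is built from a single source file that you still have on disk.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

MANIFEST_NAME = "manifest.json"  # hypothetical sidecar file name


def file_sha256(path):
    """Hash the source document so the index is tied to its exact contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(persist_dir, source_path):
    """Call right after persisting an index into persist_dir."""
    manifest = {
        "source_name": Path(source_path).name,
        "source_sha256": file_sha256(source_path),
        "created_utc": datetime.now(timezone.utc).isoformat(),
    }
    out = Path(persist_dir) / MANIFEST_NAME
    out.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return manifest


def verify_manifest(persist_dir, source_path):
    """Call right before loading; raises if the folder was built from another file."""
    manifest_path = Path(persist_dir) / MANIFEST_NAME
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    if manifest["source_sha256"] != file_sha256(source_path):
        raise ValueError(
            f"{persist_dir} was built from {manifest['source_name']!r}, "
            f"not from the current contents of {source_path}"
        )
    return manifest
```

Comparing content hashes rather than file names means a renamed or edited source file is caught too. The trade-off is that the check fails after any edit to the document, which is usually the right signal to rebuild the index.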
Super hacky, but when testing, I just save the entire index map to a local file and open it in VS Code:

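import json
from datetime import datetime

# logger and convo_id_to_index come from the surrounding application
logger.info("Conversation to Index Map... FOUND")
# Save to a timestamped local .txt file; custom_serializer (below) handles non-JSON types
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
with open(f"convo_id_to_index_{timestamp}.txt", "w") as f:
    f.write(json.dumps(convo_id_to_index, indent=4, sort_keys=True, default=custom_serializer))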

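I also built a custom_serializer to unnest objects:
import uuid
from datetime import datetime

def custom_serializer(obj, depth=0, max_depth=4):
    """Recursively convert obj into JSON-friendly types; stringify anything nested deeper than max_depth."""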
    if depth > max_depth:
        return str(obj)  # or other generic handling
    
    if isinstance(obj, (str, int, float, bool, type(None))):
        return obj
    elif isinstance(obj, uuid.UUID):
        return str(obj)
    elif isinstance(obj, dict):
        return {key: custom_serializer(value, depth+1, max_depth) for key, value in obj.items()}
    elif isinstance(obj, list):
        return [custom_serializer(element, depth+1, max_depth) for element in obj]
    # elif isinstance(obj, datetime.datetime): # if `import datetime`
        # return str(obj)
    elif isinstance(obj, datetime): # if `from datetime import datetime`
        return str(obj)
    elif hasattr(obj, '__dict__'):
        return custom_serializer(obj.__dict__, depth+1, max_depth)
    else:
        return str(obj)