Something is off with my `node_text_template`. In image_1, the node is properly formatted and metadata keys are excluded as expected. In image_2 they're not, even though the node's `_node_content.text_template` is explicitly `"text_template": "[Excerpt from document]\n{metadata_str}\nExcerpt:\n-----\n{content}\n-----"`. This means I'm sending the LLM junk that could mislead it. Comparing `sub_question_answer_pair.sources` between image_1 and image_2, the only difference is the former seems to be missing `_node_content`.
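For reference, here's a plain-Python sketch of how that `text_template` is meant to be filled in, with `excluded_llm_metadata_keys` controlling what lands in `{metadata_str}`. This mimics the behavior, it is not llama_index's actual code, and the key names and values are made up for illustration:

```python
# Sketch of template filling (mimics llama_index behavior; not the library's code).
text_template = "[Excerpt from document]\n{metadata_str}\nExcerpt:\n-----\n{content}\n-----"

metadata = {"file_name": "report.pdf", "doc_id": "abc123", "page": 7}
excluded_llm_metadata_keys = ["doc_id"]  # keys the LLM should never see

# metadata_str is built only from the non-excluded keys
metadata_str = "\n".join(
    f"{k}: {v}" for k, v in metadata.items() if k not in excluded_llm_metadata_keys
)
prompt_text = text_template.format(metadata_str=metadata_str, content="Some excerpt text.")
print(prompt_text)
```

When the node is well-formed (image_1), the excluded keys never reach the prompt; when the template/exclusion settings aren't where the library expects them (image_2), everything leaks through verbatim.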
`load_index_from_storage` fails to load indexes that have already been built, so I wrote `check_and_rebuild_indicies_from_vector_store` to rebuild the index if the nodes already exist in the database. The `rebuilt_index` results in the issues in the main thread.

```python
index.storage_context.persist(persist_dir="s3_dir", fs=aws_fsspec)
...
storage_context = StorageContext.from_defaults(persist_dir="s3_dir", fs=aws_fsspec)
index = load_index_from_storage(storage_context, service_context=service_context)
```
The `check_and_rebuild_indicies_from_vector_store` function feels a little confusing. Why not just `load_index_from_storage()`?

I think I have a cache issue that drops my index_structs, so `load_index_from_storage` fails to load indexes that have already been built.

```python
if index_id in existing_index_ids:
    logger.info(f"Found existing index {index_id}")
    loaded_index = load_index_from_storage(
        storage_context,
        index_id=index_id,
        service_context=service_context,
    )
    indices.append(loaded_index)
else:
    logger.info(
        f"Could not find existing index {index_id}. "
        "Checking if nodes exist in vector store..."
    )
    rebuilt_index = await check_and_rebuild_indicies_from_vector_store(
        index_id=index_id, service_context=service_context, fs=fs
    )
    indices.append(rebuilt_index)
```
Since you have a `CustomPGVectorStore` -- by default, as you probably know, the docstore and index store won't be populated if `stores_text=True` on the vector store. You can override that with `store_nodes_override=True` in the vector index constructor. Try setting `store_nodes_override=True` when creating your index:

```python
index = VectorStoreIndex.from_documents(
    documents,
    service_context=service_context,
    storage_context=StorageContext.from_defaults(vector_store=vector_store),
    store_nodes_override=True,
)
index.storage_context.persist(...)
index = load_index_from_storage(
    StorageContext.from_defaults(persist_dir="./storage", vector_store=vector_store)
)
```
If I use `VectorStoreIndex.from_documents`, then it tries to rebuild the index and creates new embeddings. `rebuilt_index = VectorStoreIndex.from_vector_store(...)` works, but idk if it has `store_nodes_override`.
works, but idk if it has store_nodes_override
store_nodes_override
only matters when indexing new datafrom_vector_store
was really only meant for remote vector dbs. Although it kind of works here, the preffered way to do it is load_index_from_storage()
or alternatively VectoreStoreIndex([], storage_context=storage_context, service_context=service_context)
from_vector_store
```python
index = VectorStoreIndex.from_documents(
    documents,
    service_context=service_context,
    storage_context=StorageContext.from_defaults(vector_store=vector_store),
    store_nodes_override=True,
)
index.storage_context.persist(...)
index = load_index_from_storage(
    StorageContext.from_defaults(persist_dir="./storage", vector_store=vector_store)
)
```
Figured out the `load_index_from_storage` situation: I was querying the vector store for all of its vectors as a `[TextNode]` list, saving it as `all_vectors`, and passing that directly when rebuilding the index. The issue is, when casting the query results as TextNodes, the old metadata gets re-wrapped as new metadata, so every N times I was query-saving-casting nodes, the metadata would get nested N times. That's why on >=2 chat calls, the `text_template` and `excluded_llm_metadata_keys` weren't being honored -- they weren't in the right place! The fix was to rebuild the nodes with `metadata_dict_to_node`:

```python
rebuilt_nodes = []
for vector in all_vectors:
    node = metadata_dict_to_node(vector.metadata, vector.text)
    rebuilt_nodes.append(node)
logger.info(f"Rebuilt {len(rebuilt_nodes)} nodes from the vector_store")
```
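To make that failure mode concrete, here's a minimal plain-dict illustration of the re-wrapping. The dict shape here is hypothetical (real nodes are llama_index `TextNode` objects and the serialized key is `_node_content`), but the nesting mechanics are the same:

```python
def naive_recast(node: dict) -> dict:
    # Rebuilding a node by stuffing the *entire* old node (metadata
    # included) into the new node's metadata re-wraps it one level.
    return {"text": node["text"], "metadata": {"_node_content": node}}

node = {"text": "excerpt", "metadata": {"file_name": "report.pdf"}}
for _ in range(3):  # three query-save-cast round trips
    node = naive_recast(node)

# The original keys are now buried several levels deep, so any template
# or exclusion settings that look at node["metadata"] can't find them.
depth = 0
probe = node
while "_node_content" in probe.get("metadata", {}):
    probe = probe["metadata"]["_node_content"]
    depth += 1
print(depth)  # nesting depth grows by one per round trip
```

`metadata_dict_to_node` avoids this by unwrapping the serialized node out of the stored metadata dict instead of wrapping it again.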
Then I just add the `rebuilt_nodes` to the `docstore`, use that to create a new `storage_context`, and use that to rebuild the index from `from_vector_store` 😮‍💨. And the `from_documents` / `store_nodes_override` / `load_index_from_storage` flow works too 🥳