I'm following the "Retriever Query Engine with Custom Retrievers - Simple Hybrid Search" tutorial with my data.
It isn't working because I don't have any nodes; all of my data is already in the vector store.


I assume you need to have nodes because the example passes nodes to SimpleKeywordTableIndex. When I use the index instead of nodes, I get an error.

Here is the doc code:

from llama_index.core import SimpleKeywordTableIndex, VectorStoreIndex

vector_index = VectorStoreIndex(nodes, storage_context=storage_context)
keyword_index = SimpleKeywordTableIndex(nodes, storage_context=storage_context)

So, how do I get my nodes back out of the index after it is already built?
I am using Weaviate.
If I call index.docstore.docs.values(), I get an empty dictionary.

I don't know what to pass to SimpleKeywordTableIndex, because I am only getting the index via:

vector_store = WeaviateVectorStore(
    weaviate_client=client, index_name="LatestMetadataChunk512Docstore"
)

loaded_index = VectorStoreIndex.from_vector_store(vector_store)
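
One idea I'm toying with is pulling the nodes back out through a retriever with a very large top-k and rebuilding the keyword index from those. Just a sketch, not verified: the top-k value is a guess, and I believe Weaviate caps results at its QUERY_MAXIMUM_RESULTS setting (10,000 by default).
-----
# Sketch: recover node objects via retrieval with a huge top-k,
# then build the keyword index from whatever comes back.
retriever = loaded_index.as_retriever(similarity_top_k=10_000)  # top-k is a guess
recovered = retriever.retrieve("placeholder query")  # relevance doesn't matter here
loaded_nodes = [scored.node for scored in recovered]

keyword_index = SimpleKeywordTableIndex(loaded_nodes)
-----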

How am I supposed to get the nodes? My code is here:
-----
from llama_index.core import StorageContext
from llama_index.core import SimpleKeywordTableIndex, VectorStoreIndex

# initialize storage context (by default it's in-memory)
storage_context = StorageContext.from_defaults()
storage_context.docstore.add_documents(loaded_nodes)

vector_index = VectorStoreIndex(loaded_nodes, storage_context=storage_context)
keyword_index = SimpleKeywordTableIndex(loaded_nodes, storage_context=storage_context)
-----

I re-ran my ingestion pipeline and included the insert-into-docstore code, thinking I could get the nodes out if I could put them into a docstore. But after I re-ran the script, I don't know how to access that docstore. I assumed index.docstore.docs.values() would no longer return an empty dictionary after that, but it still does.
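
Do I need to persist the docstore explicitly and then reload it in the query script? This is my guess at the pattern (the persist_dir path is a placeholder):
-----
from llama_index.core import StorageContext

# at ingestion time, after adding nodes to the docstore:
storage_context.persist(persist_dir="./storage")  # placeholder path

# later, in the query script:
storage_context = StorageContext.from_defaults(persist_dir="./storage")
loaded_nodes = list(storage_context.docstore.docs.values())
-----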
Here is how I added the docstore when constructing the index and inserting it into Weaviate:
# Process documents in larger batches
for j in range(0, len(documents), EMBEDDING_BATCH_SIZE):
    doc_batch = documents[j:j + EMBEDDING_BATCH_SIZE]
    chunk_nodes, batch_time = await process_documents_batch(
        pipelines[gpu_index], doc_batch, gpu_index, chunk_timeout=chunk_timeout
    )
    if chunk_nodes:
        nodes.extend(chunk_nodes)
        processed_files.update([doc.metadata.get('file_name') for doc in doc_batch])

        # Add nodes to docstore
        docstore.add_documents(chunk_nodes)

        logger.info(f"Finished processing batch {j // EMBEDDING_BATCH_SIZE + 1} on GPU {gpu_index} in {batch_time:.2f} seconds. Total nodes: {len(nodes)}")
        gpu_times[gpu_index] += batch_time
        gpu_batches[gpu_index] += 1
    else:
        logger.warning(f"Failed to process batch {j // EMBEDDING_BATCH_SIZE + 1} on GPU {gpu_index}")
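
For what it's worth, the docstore here is just an in-memory store, so I suspect it disappears when the script exits unless I persist it after the loop. Something like this is what I have in mind (the file path is a placeholder):
-----
# after the ingestion loop finishes (sketch; path is a placeholder)
docstore.persist(persist_path="./docstore.json")

# then in the query script:
from llama_index.core.storage.docstore import SimpleDocumentStore

docstore = SimpleDocumentStore.from_persist_path("./docstore.json")
loaded_nodes = list(docstore.docs.values())
-----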