I'm following the "Retriever Query Engine with Custom Retrievers - Simple Hybrid Search" tutorial with my own data, but it isn't working because I don't have any nodes; all of my data is already in the vector store.
I assume you need to have nodes because in the example they pass nodes to SimpleKeywordTableIndex. When I pass the index instead of nodes, I get an error.
Here is the doc code:
from llama_index.core import SimpleKeywordTableIndex, VectorStoreIndex
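
Further down, the example builds both indexes from the same nodes, roughly like this (I'm paraphrasing the tutorial from memory, so the exact paths, variable names, and chunk size may differ):

from llama_index.core import (
    SimpleDirectoryReader,
    SimpleKeywordTableIndex,
    StorageContext,
    VectorStoreIndex,
)
from llama_index.core.node_parser import SentenceSplitter

# the tutorial starts from raw documents and splits them into nodes
documents = SimpleDirectoryReader("./data").load_data()
nodes = SentenceSplitter(chunk_size=1024).get_nodes_from_documents(documents)

# the same nodes go into a shared docstore and into BOTH indexes
storage_context = StorageContext.from_defaults()
storage_context.docstore.add_documents(nodes)

vector_index = VectorStoreIndex(nodes, storage_context=storage_context)
keyword_index = SimpleKeywordTableIndex(nodes, storage_context=storage_context)

That nodes list is exactly what I don't have, because my data was already embedded and loaded into Weaviate by a separate ingestion pipeline.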
So, how do I get my nodes back out of the index after it has already been built? I am using Weaviate. If I call index.docstore.docs.values(), I get an empty dictionary.
I don't know what to give SimpleKeywordTableIndex, because I am only getting the index via:

vector_store = WeaviateVectorStore(
    weaviate_client=client,
    index_name="LatestMetadataChunk512Docstore",
)
How am I supposed to get the nodes? Code is here:
-----
from llama_index.core import StorageContext
from llama_index.core import SimpleKeywordTableIndex, VectorStoreIndex
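
Condensed, what that script does is basically this (a sketch, not my literal code; the Weaviate client setup and embed model configuration are simplified here):

import weaviate

from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.weaviate import WeaviateVectorStore

# placeholder connection for illustration; my actual client setup differs
client = weaviate.connect_to_local()

vector_store = WeaviateVectorStore(
    weaviate_client=client,
    index_name="LatestMetadataChunk512Docstore",
)

# wrap the existing Weaviate data as an index without passing any nodes
# (embed_model / Settings setup omitted)
index = VectorStoreIndex.from_vector_store(vector_store)

# this comes back empty, even though the Weaviate collection has all my chunks
print(index.docstore.docs.values())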
I re-ran my ingestion pipeline and included the insert-into-docstore code, thinking I could get the nodes out if I put them in a docstore. But after I re-ran the script, I don't know how to access that docstore; I assumed index.docstore.docs.values() would no longer return an empty dictionary after that, but it still does.
Here is how I added the docstore when constructing the index and inserting it into Weaviate:
-----
# Process documents in larger batches
for j in range(0, len(documents), EMBEDDING_BATCH_SIZE):
    doc_batch = documents[j:j + EMBEDDING_BATCH_SIZE]
    chunk_nodes, batch_time = await process_documents_batch(
        pipelines[gpu_index], doc_batch, gpu_index, chunk_timeout=chunk_timeout
    )
    if chunk_nodes:
        nodes.extend(chunk_nodes)
        processed_files.update([doc.metadata.get('file_name') for doc in doc_batch])

        # Add nodes to docstore
        docstore.add_documents(chunk_nodes)

        logger.info(
            f"Finished processing batch {j // EMBEDDING_BATCH_SIZE + 1} on GPU {gpu_index} "
            f"in {batch_time:.2f} seconds. Total nodes: {len(nodes)}"
        )
        gpu_times[gpu_index] += batch_time
        gpu_batches[gpu_index] += 1
    else:
        logger.warning(f"Failed to process batch {j // EMBEDDING_BATCH_SIZE + 1} on GPU {gpu_index}")