Document Management

I have quite a large database. Is there a way I can iterate on changing parts of my service_context (e.g., mixing and changing metadata extractors) without rebuilding the index?

Also, what would be the easiest way to push NULL as the embedding? I would like to do the embedding step in batch with a script on a rented GPU rather than as part of a pipeline.
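One way to defer embedding along these lines: serialize the parsed nodes with the `embedding` field set to null, then fill in the field with a separate batch script on the GPU machine. A minimal sketch using plain JSONL (the record shape here is a hypothetical stand-in, not the library's node schema, and `embed_batch` is a placeholder for a real embedding model call):

```python
import json

# Stage 1 (CPU box): write nodes with a null embedding placeholder.
nodes = [
    {"id_": "node-0", "text": "First chunk of the document.", "embedding": None},
    {"id_": "node-1", "text": "Second chunk of the document.", "embedding": None},
]
with open("nodes.jsonl", "w") as f:
    for node in nodes:
        f.write(json.dumps(node) + "\n")

# Stage 2 (GPU box): batch-embed only the rows still missing a vector.
def embed_batch(texts):
    # Stand-in for a real embedding model call on the GPU.
    return [[float(len(t))] for t in texts]

with open("nodes.jsonl") as f:
    nodes = [json.loads(line) for line in f]

pending = [n for n in nodes if n["embedding"] is None]
vectors = embed_batch([n["text"] for n in pending])
for node, vec in zip(pending, vectors):
    node["embedding"] = vec

# Write the now-embedded nodes back out for loading into the vector store.
with open("nodes.jsonl", "w") as f:
    for node in nodes:
        f.write(json.dumps(node) + "\n")
```

The point of the two-stage split is that stage 1 needs no GPU and stage 2 needs no access to the original documents, so the embedding pass can run anywhere the JSONL file can be copied.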
Not super clear if I can do this:

Python
# build original index

service_context = create_service_context(include_metadata_extractors=False)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, service_context=service_context, show_progress=True
)

# change the service context to now include metadata extractors and refresh the index to act on any updates.
service_context = create_service_context(include_metadata_extractors=True)
index.refresh(documents, service_context=service_context)


It looks like refresh only re-processes a document if its text has changed...
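That matches refresh's change detection being keyed on document content: it compares a hash of each document against what was previously stored, so swapping metadata extractors in the service_context leaves the hash unchanged and nothing is re-processed. A hypothetical stand-in sketch of that logic (not the library's actual code):

```python
import hashlib

def doc_hash(text):
    # Content hash used to decide whether a document needs re-processing.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Hashes recorded when the index was first built.
stored_hashes = {"doc-1": doc_hash("original text")}

def needs_refresh(doc_id, text):
    # Re-process only when the content hash differs from the stored one.
    # Pipeline changes (e.g. new metadata extractors) leave the hash as-is,
    # so refresh skips the document even though its nodes would differ.
    return stored_hashes.get(doc_id) != doc_hash(text)
```

Under this model, changing only the extractors means rebuilding the affected nodes yourself rather than relying on refresh.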
hey @Wizboar, in this case you might want to just use the lower-level components to build the pipeline yourself. For example, first load documents with loaders, then use the node parser (https://docs.llamaindex.ai/en/stable/core_modules/data_modules/node_parsers/root.html#) to parse documents into nodes, then call the embedding model to compute embeddings on your own GPU.
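Schematically, that lower-level pipeline has three stages: load, parse into nodes, embed. A pure-Python sketch of the control flow, where all three components are simplified stand-ins (the real loader, node parser, and embedding model have richer APIs than shown here):

```python
# Three-stage sketch of the lower-level pipeline: load -> parse -> embed.
# Every function below is a simplified stand-in, not a library API.

def load_documents():
    # Stand-in for a reader/loader returning raw document texts.
    return ["Paragraph one. Paragraph two.", "Another document. More text."]

def parse_into_nodes(documents, chunk_on="."):
    # Stand-in for a node parser: split each document into chunk "nodes",
    # each carrying a back-reference to its source document.
    nodes = []
    for doc_id, text in enumerate(documents):
        for chunk in filter(None, (c.strip() for c in text.split(chunk_on))):
            nodes.append({"doc_id": doc_id, "text": chunk, "embedding": None})
    return nodes

def embed_nodes(nodes, batch_size=2):
    # Stand-in for batched GPU embedding: fill each node's vector in batches.
    for start in range(0, len(nodes), batch_size):
        for node in nodes[start:start + batch_size]:
            node["embedding"] = [float(len(node["text"]))]
    return nodes

nodes = embed_nodes(parse_into_nodes(load_documents()))
```

Because each stage just consumes and produces plain node records, any stage can be swapped out or re-run independently, which is exactly the flexibility the original question is after.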
@disiok when adding nodes to the VectorStoreIndex, I want to pass the nodes saved to JSONL from the node parser step.

Should I load them in as the BaseNode schema or the IndexNode schema?