
Updated 2 months ago

I am trying to implement a sentence window retriever.
In the example, the indexing of the document and the retriever are created in the same instance, so the output of the indexer is sentence_index and it is passed to the retriever.
However, in my use case I would like to create the sentence_index in one instance and the retriever in another one.
How can I import the sentence_index for the vector_store?
4 comments
Please stand by @tatanfort as I search the knowledge base. I normally take about 20 seconds or so. :wave:
----------------------
To create the sentence index in one instance and the retriever in another, you would need to save the sentence index after creation and then load it in the instance where you want to create the retriever.

However, based on the provided knowledge sources, there isn't a direct example of saving and loading a sentence index in LlamaIndex. But, there are examples of creating a VectorStoreIndex from various sources, which might be similar to what you're trying to achieve.

Here's how you can create a VectorStoreIndex from documents with sentence splitting transformation:

Plain Text
from llama_index.core import VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

splitter = SentenceSplitter(chunk_size=256)
index = VectorStoreIndex.from_documents(documents, transformations=[splitter])

Source: Reciprocal Rerank Fusion Retriever Setup

And here's how you can create a VectorStoreIndex from an existing vector store:

Plain Text
vector_store = PineconeVectorStore(pinecone.Index("quickstart"))
index = VectorStoreIndex.from_vector_store(vector_store=vector_store)

Source: Vector Store Index usage examples

You might be able to adapt these examples to your use case. However, for the exact method of saving and loading a sentence index, you might need to refer to the LlamaIndex documentation or source code, as it's not covered in the provided knowledge sources.
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
Here is an example of the implementation I am trying to replicate:

# imports (assuming a llama_index 0.10-style package layout; adjust to your version)
from llama_index.core import ServiceContext, StorageContext, VectorStoreIndex
from llama_index.core.node_parser import SentenceWindowNodeParser
from llama_index.core.postprocessor import (
    MetadataReplacementPostProcessor,
    SentenceTransformerRerank,
)


def build_sentence_window_index(
    document, llm, vector_store, embed_model="local:BAAI/bge-small-en-v1.5"
):
    node_parser = SentenceWindowNodeParser.from_defaults(
        window_size=3,
        window_metadata_key="window",
        original_text_metadata_key="original_text",
    )
    sentence_context = ServiceContext.from_defaults(
        llm=llm,
        embed_model=embed_model,
        node_parser=node_parser,
    )
    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    sentence_index = VectorStoreIndex.from_documents(
        [document], service_context=sentence_context, storage_context=storage_context
    )
    return sentence_index


def get_sentence_window_query_engine(
    sentence_index,
    similarity_top_k=6,
    rerank_top_n=2,
):
    postproc = MetadataReplacementPostProcessor(target_metadata_key="window")
    rerank = SentenceTransformerRerank(
        top_n=rerank_top_n, model="BAAI/bge-reranker-base"
    )
    sentence_window_engine = sentence_index.as_query_engine(
        similarity_top_k=similarity_top_k, node_postprocessors=[postproc, rerank]
    )
    return sentence_window_engine


sentence_index = build_sentence_window_index(
    document,
    llm,
    embed_model="local:BAAI/bge-small-en-v1.5",
    vector_store=vector_store,
)

query_engine = get_sentence_window_query_engine(sentence_index=sentence_index)


But I want to separate those two steps: indexing and retrieval.
@Bob @Logan M any idea?
I don't really see the issue, your code seems fine to me?