```
# Imports assumed for this snippet (llama-index >= 0.10 style paths)
from qdrant_client import QdrantClient
from llama_index.core import Settings, VectorStoreIndex
from llama_index.core.extractors import (
    KeywordExtractor,
    QuestionsAnsweredExtractor,
    SummaryExtractor,
    TitleExtractor,
)
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.schema import MetadataMode
from llama_index.core.storage.docstore import SimpleDocumentStore
from llama_index.vector_stores.qdrant import QdrantVectorStore

client = QdrantClient()
vector_store = QdrantVectorStore(client=client, collection_name="my_collection", batch_size=1)

self.pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=Settings.chunk_size, chunk_overlap=Settings.chunk_overlap),
        TitleExtractor(),
        SummaryExtractor(metadata_mode=MetadataMode.ALL, summaries=["prev", "self", "next"]),
        QuestionsAnsweredExtractor(),
        KeywordExtractor(),
        Settings.embed_model,
    ],
    vector_store=vector_store,
    docstore=SimpleDocumentStore(),  # an in-memory store for Document and Node objects
    # By default a local cache is used, but RedisCache, MongoDBCache, or FirestoreCache also work.
    # Caching saves time on subsequent runs that use the same data.
    # cache=IngestionCache(),
)

# Ingest directly into the vector db
self.nodes = self.pipeline.run(documents=self.load_my_pdf())
# Create the index
self.index = VectorStoreIndex(nodes=self.nodes, show_progress=True)
```

I use Qdrant, and the pipeline extracts most of the metadata that can be extracted, but the query engine doesn't consider the metadata information at all. I mean, when the user asks a question, the answer doesn't draw on anything from the metadata. For example:
User: some question
Agent: some answer
User: what is the file name or document name?
Agent: A file_name is a placeholder for the actual name of a file

What am I doing wrong?
7 comments
```
# Imports assumed for this snippet
from llama_index.core import Settings
from llama_index.core.chat_engine.types import ChatMode
from llama_index.core.memory import ChatMemoryBuffer
from llama_index.core.storage.chat_store import SimpleChatStore
from llama_index.core.tools import QueryEngineTool
from llama_index.core.vector_stores.types import VectorStoreQueryMode
from llama_index.agent.openai import OpenAIAgent

# Create a memory to store the chat history
memory = ChatMemoryBuffer.from_defaults(llm=Settings.llm, token_limit=4095, chat_store=SimpleChatStore())

# The simplest case
query_engine = self.index.as_query_engine(
    # Select the best chat engine based on the current LLM:
    # corresponds to `OpenAIAgent` if using an OpenAI model that supports
    # the function calling API, otherwise corresponds to `ReActAgent`
    chat_mode=ChatMode.BEST,
    streaming=True,
    vector_store_query_mode=VectorStoreQueryMode.DEFAULT,
    similarity_top_k=3,
    text_qa_template=text_qa_template,  # prompt templates defined elsewhere
    refine_template=refine_template,
    verbose=True,
)

tool = QueryEngineTool.from_defaults(query_engine, name="search", description="My search tool")
engine = OpenAIAgent.from_tools([tool], memory=memory, verbose=True)
```
File names are pretty darn hard to properly query if you are relying purely on semantic search
Do you have any advice on how to use the metadata info for better answers?
Hmm, nothing off the top of my head. But keep in mind this metadata is also available on the response object, e.g. under `response.source_nodes[0].node.metadata` (`source_nodes` is a list of `NodeWithScore` objects that were used to generate the response).
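For example, a rough sketch of reading that metadata off a response (the metadata keys shown are assumed defaults from `SimpleDirectoryReader` and the extractors, and `query_engine` is the engine built above):
```
# `query_engine` is the query engine from the earlier snippet (assumption)
response = query_engine.query("some question")

for node_with_score in response.source_nodes:
    # Each entry is a NodeWithScore; extractor output lives in node.metadata
    print(node_with_score.score)
    print(node_with_score.node.metadata.get("file_name"))
    print(node_with_score.node.metadata.get("questions_this_excerpt_can_answer"))
```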
Then why do we have all those extractors if the metadata won't be used? What is the use case?
```
TitleExtractor(),
SummaryExtractor(),
QuestionsAnsweredExtractor(),
KeywordExtractor(),
```
Those are mostly meant to be used for retrieval -- they influence the embeddings. It's not perfect, but they do make some difference.
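A minimal sketch of what "influence the embeddings" means (the key names `document_title` and `excerpt_keywords` are assumed extractor defaults): the text the embedding model sees is the node text with the extracted metadata prepended, controlled by `MetadataMode`.
```
from llama_index.core.schema import MetadataMode, TextNode

node = TextNode(
    text="Qdrant stores vectors.",
    metadata={"document_title": "Vector DB notes", "excerpt_keywords": "qdrant, vectors"},
)

# MetadataMode.EMBED shows the text that actually gets embedded (metadata prepended);
# MetadataMode.LLM shows what the LLM sees at synthesis time.
print(node.get_content(metadata_mode=MetadataMode.EMBED))
print(node.get_content(metadata_mode=MetadataMode.LLM))
```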