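from llama_index.core import Settings, VectorStoreIndex
from llama_index.core.extractors import (
    KeywordExtractor,
    QuestionsAnsweredExtractor,
    SummaryExtractor,
    TitleExtractor,
)
from llama_index.core.ingestion import IngestionCache, IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.schema import MetadataMode
from llama_index.core.storage.docstore import SimpleDocumentStore
from llama_index.vector_stores.qdrant import QdrantVectorStore
from qdrant_client import QdrantClient

# (inside a method of my class)
client = QdrantClient()
vector_store = QdrantVectorStore(client=client, collection_name="my_collection", batch_size=1)
self.pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=Settings.chunk_size, chunk_overlap=Settings.chunk_overlap),
        TitleExtractor(),
        SummaryExtractor(metadata_mode=MetadataMode.ALL, summaries=["prev", "self", "next"]),
        QuestionsAnsweredExtractor(),
        KeywordExtractor(),
        Settings.embed_model,  # embeds each node before it is written to Qdrant
    ],
    vector_store=vector_store,
    docstore=SimpleDocumentStore(),  # an in-memory store for Document and Node objects
    # By default a local cache is used, but it can also be RedisCache, MongoDBCache or FirestoreCache.
    # This saves time on subsequent runs that use the same data.
    # cache=IngestionCache(),
)
# ingest directly into the vector db
self.nodes = self.pipeline.run(documents=self.load_my_pdf())
# build the index on top of the Qdrant collection the pipeline just populated,
# instead of copying the nodes into a separate in-memory vector store
self.index = VectorStoreIndex.from_vector_store(vector_store)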
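To make clear what I expect to happen: as far as I understand, the LLM only sees metadata that is rendered into each retrieved node's text (in LlamaIndex, what `node.get_content(metadata_mode=MetadataMode.LLM)` returns), minus any keys listed in the node's `excluded_llm_metadata_keys`. Below is a dependency-free toy model of that rendering, assuming LlamaIndex's default templates (`"{key}: {value}"` per line, then metadata and text separated by a blank line). `ToyNode`, `llm_content` and `missing_keys` are hypothetical names for illustration, not LlamaIndex API.

```python
# Toy model (NOT LlamaIndex source) of how a node's metadata is rendered
# into the text sent to the LLM, assuming LlamaIndex's default templates:
#   each metadata line: "{key}: {value}", lines joined with "\n"
#   node text:          "{metadata_str}\n\n{content}"
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    text: str
    metadata: dict = field(default_factory=dict)
    # keys hidden from the LLM (mirrors the idea of excluded_llm_metadata_keys)
    excluded_llm_metadata_keys: list = field(default_factory=list)

    def llm_content(self) -> str:
        """Return the text an LLM would see for this node."""
        visible = {
            k: v
            for k, v in self.metadata.items()
            if k not in self.excluded_llm_metadata_keys
        }
        metadata_str = "\n".join(f"{k}: {v}" for k, v in visible.items())
        if not metadata_str:
            return self.text  # nothing visible: only the raw chunk text
        return f"{metadata_str}\n\n{self.text}"


def missing_keys(nodes: list, key: str) -> list:
    """Indices of nodes whose LLM-visible text cannot contain `key`."""
    return [
        i
        for i, n in enumerate(nodes)
        if key not in n.metadata or key in n.excluded_llm_metadata_keys
    ]


# Hypothetical example nodes
nodes = [
    ToyNode("First chunk.", {"file_name": "report.pdf", "document_title": "Report"}),
    ToyNode("Second chunk.", {"file_name": "report.pdf"}, excluded_llm_metadata_keys=["file_name"]),
    ToyNode("Third chunk.", {"document_title": "Report"}),
]
print(nodes[0].llm_content())
# file_name: report.pdf
# document_title: Report
#
# First chunk.
print(missing_keys(nodes, "file_name"))  # [1, 2]
```

In this model, a question like "what is the file name?" can only be answered from a retrieved node whose metadata actually contains `file_name` and does not exclude it from the LLM.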
I use Qdrant, and the pipeline extracts about as much metadata as it can. However, the query engine doesn't take any of that metadata into account.
That is, when the user asks a question, the answer never draws on anything from the metadata. For example:
User: some question
Agent: some answer
User: what is a file name or document name?
Agent: A file_name is a placeholder for the actual name of a file
What am I doing wrong?