```python
from llama_index.extractors import (
    TitleExtractor,
    QuestionsAnsweredExtractor,
    MetadataExtractor,
)
```
Supported metadata:

Node-level:

- `SummaryExtractor`: summary of each node, as well as its preceding and following nodes
- `QuestionsAnsweredExtractor`: questions that the node can answer
- `KeywordExtractor`: keywords that uniquely identify the node

Document-level:

- `TitleExtractor`: document title, possibly inferred across multiple nodes
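As a rough sketch of what the output looks like, each extractor writes one or more entries into a node's `metadata` dict. The key names below (`document_title`, `questions_this_excerpt_can_answer`, `excerpt_keywords`) are the defaults these extractors use, but treat them as assumptions and check your installed version; the values are made up:

```python
# Toy illustration with plain dicts; no LLM calls are made.
node_metadata = {}

# TitleExtractor (document-level): one title shared across the document's nodes
node_metadata["document_title"] = "A Guide to Ingestion Pipelines"

# QuestionsAnsweredExtractor (node-level): questions this excerpt can answer
node_metadata["questions_this_excerpt_can_answer"] = (
    "1. What is an IngestionPipeline?\n"
    "2. In what order are transformations applied?"
)

# KeywordExtractor (node-level): keywords identifying the excerpt
node_metadata["excerpt_keywords"] = "ingestion, pipeline, metadata"

print(sorted(node_metadata))
# ['document_title', 'excerpt_keywords', 'questions_this_excerpt_can_answer']
```

Downstream, these entries are prepended to the node text at query time, which is what makes the extracted metadata useful for retrieval.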
Use `IngestionPipeline` instead of `metadata_extractor`, passing the individual extractors from above. Previously you would write:

```python
metadata_extractor = MetadataExtractor(
    extractors=[
        TitleExtractor(nodes=5),
        QuestionsAnsweredExtractor(questions=3),
    ],
)
node_parser = SimpleNodeParser.from_defaults(
    text_splitter=text_splitter,
    metadata_extractor=metadata_extractor,
)
nodes = node_parser.get_nodes_from_documents(documents)
```
Now, with `IngestionPipeline`:

```python
title_extractor = TitleExtractor(nodes=5)
qa_extractor = QuestionsAnsweredExtractor(questions=3)
pipeline = IngestionPipeline(
    transformations=[text_splitter, title_extractor, qa_extractor]
)
nodes = pipeline.run(
    documents=documents,
    in_place=True,
    show_progress=True,
)
```
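Conceptually, the pipeline applies each transformation in order: every step takes a list of nodes and returns a new list, and `pipeline.run(...)` chains them. A toy sketch of that chaining with plain Python callables (hypothetical stand-ins, not the real llama_index classes):

```python
# Minimal sketch of transformation chaining: each step takes a list
# of "nodes" and returns a new list. Here nodes are plain strings.
def text_splitter(nodes):
    # Split each text on sentence boundaries (stand-in for a real splitter).
    return [s for n in nodes for s in n.split(". ") if s]

def title_extractor(nodes):
    # Prepend a fake "title" annotation to each node (stand-in for metadata).
    return [f"[title] {n}" for n in nodes]

def run_pipeline(transformations, documents):
    nodes = documents
    for transform in transformations:
        nodes = transform(nodes)
    return nodes

nodes = run_pipeline([text_splitter, title_extractor], ["First part. Second part"])
print(nodes)  # ['[title] First part', '[title] Second part']
```

The order of `transformations` matters for the real pipeline too: the splitter must come first so the extractors operate on chunked nodes rather than whole documents.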
If you previously attached the extractors to the `node_parser` as follows:

```python
node_parser = SimpleNodeParser.from_defaults(
    text_splitter=text_splitter, metadata_extractor=metadata_extractor
)
```
you can now pass the transformations through the `ServiceContext`:

```python
service_context = ServiceContext.from_defaults(
    ..., transformations=[text_splitter, TitleExtractor()]
)
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
```
or run them through an `IngestionPipeline` and build the index from the resulting nodes:

```python
title_extractor = TitleExtractor(nodes=5)
qa_extractor = QuestionsAnsweredExtractor(questions=3)
pipeline = IngestionPipeline(
    transformations=[text_splitter, title_extractor, qa_extractor]
)
nodes = pipeline.run(
    documents=documents,
    in_place=True,
    show_progress=True,
)
index = VectorStoreIndex(nodes)
```
Pass the `text_splitter` and all the extractors you used previously with `MetadataExtractor` into the pipeline, then get the nodes via `pipeline.run(...)`.
```python
index = VectorStoreIndex.from_vector_store(vector_store=vector_store)
...
index.insert()
```
Otherwise I get `None` at places I need later, or values that are not properly initialized for a component.