
Updated last year

are there any changes to node_parsers? I get this:
ModuleNotFoundError: No module named 'llama_index.node_parser.extractors'
yeah, now extractors are in llama_index.extractors
Plain Text
from llama_index.extractors import (
    TitleExtractor,
    MetadataExtractor,
)
got it ... thx
but I still get this ... ImportError: cannot import name 'MetadataExtractor' from 'llama_index.extractors'
is MetadataExtractor present in there?
oh sorry, MetadataExtractor is not there anymore. These are the extractors that are supported now:
Plain Text
Supported metadata:
Node-level:
    - `SummaryExtractor`: Summary of each node, and pre and post nodes
    - `QuestionsAnsweredExtractor`: Questions that the node can answer
    - `KeywordsExtractor`: Keywords that uniquely identify the node
Document-level:
    - `TitleExtractor`: Document title, possibly inferred across multiple nodes
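To make the node-level vs document-level split concrete, here is a toy model in plain Python. It is not LlamaIndex code: the real extractors call an LLM, while the hypothetical `toy_keywords` and `toy_title` functions below use simple heuristics so the sketch runs offline. It only shows the shape of the two kinds: a node-level extractor writes metadata onto each node independently, while a document-level one (like TitleExtractor, whose `nodes=5` argument in this thread suggests it reads the first few nodes) looks across several nodes of a document and writes one shared value back to all of them.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    text: str
    doc_id: str
    metadata: dict = field(default_factory=dict)


def toy_keywords(nodes, top_k=2):
    """Hypothetical node-level extractor: each node gets its own keywords."""
    for node in nodes:
        words = [w.lower().strip(".,") for w in node.text.split() if len(w) > 3]
        node.metadata["keywords"] = [w for w, _ in Counter(words).most_common(top_k)]
    return nodes


def toy_title(nodes, first_n=2):
    """Hypothetical document-level extractor: one title per document, shared by its nodes."""
    by_doc = {}
    for node in nodes:
        by_doc.setdefault(node.doc_id, []).append(node)
    for doc_nodes in by_doc.values():
        # Stand-in for the LLM call: join the first sentence of the first N nodes.
        title = " / ".join(n.text.split(".")[0] for n in doc_nodes[:first_n])
        for n in doc_nodes:
            n.metadata["document_title"] = title
    return nodes


nodes = [
    Node("Solar panels convert sunlight. Panels degrade slowly.", "d1"),
    Node("Inverters convert panel output.", "d1"),
]
toy_title(toy_keywords(nodes))
```

After this runs, each node carries its own `keywords`, but both nodes share the same `document_title`.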
hmm ... any alternative for that?
I was already using it in a few places
I could modify them accordingly
you can use the IngestionPipeline instead of metadata_extractor and pass it the individual extractors from above

Before:
Plain Text
metadata_extractor = MetadataExtractor(
    extractors=[
        TitleExtractor(nodes=5),
        QuestionsAnsweredExtractor(questions=3),
    ],
)
node_parser = SimpleNodeParser.from_defaults(
    text_splitter=text_splitter,
    metadata_extractor=metadata_extractor,
)
nodes = node_parser.get_nodes_from_documents(documents)


Now with IngestionPipeline
Plain Text
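from llama_index.extractors import TitleExtractor, QuestionsAnsweredExtractor
from llama_index.ingestion import IngestionPipeline
from llama_index.node_parser import SentenceSplitter

text_splitter = SentenceSplitter(chunk_size=1024)  # or whatever splitter you already use
title_extractor = TitleExtractor(nodes=5)  # use the first 5 nodes to infer the title
qa_extractor = QuestionsAnsweredExtractor(questions=3)  # 3 questions per node
pipeline = IngestionPipeline(
    transformations=[text_splitter, title_extractor, qa_extractor]  # applied in this order
)
nodes = pipeline.run(
    documents=documents,  # your loaded Document objects
    in_place=True,
    show_progress=True,
)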
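Conceptually, the pipeline feeds the output of each transformation into the next one: the splitter turns documents into nodes, then each extractor enriches those nodes. The sketch below is a hypothetical, dependency-free model of that idea, assuming each transformation is a callable that takes a list of nodes and returns a list of nodes. `ToyPipeline`, `split_sentences`, and `add_char_count` are made-up names, not LlamaIndex classes.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    text: str
    metadata: dict = field(default_factory=dict)


Transform = Callable[[List[Node]], List[Node]]


def split_sentences(nodes: List[Node]) -> List[Node]:
    """Hypothetical splitter: one node per sentence, parent metadata copied down."""
    out = []
    for node in nodes:
        for sentence in node.text.split("."):
            if sentence.strip():
                out.append(Node(sentence.strip() + ".", dict(node.metadata)))
    return out


def add_char_count(nodes: List[Node]) -> List[Node]:
    """Hypothetical extractor: adds per-node metadata and returns the same nodes."""
    for node in nodes:
        node.metadata["char_count"] = len(node.text)
    return nodes


@dataclass
class ToyPipeline:
    transformations: List[Transform]

    def run(self, documents: List[Node]) -> List[Node]:
        nodes = documents
        # Order matters: split first, then run extractors on the resulting nodes.
        for transform in self.transformations:
            nodes = transform(nodes)
        return nodes


pipeline = ToyPipeline(transformations=[split_sentences, add_char_count])
nodes = pipeline.run([Node("First point. Second point.", {"source": "notes.txt"})])
```

This is why the splitter has to come first in the list: the extractors expect already-split nodes.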


Hope this helps
qq ... Mendable doesn't seem to have these updates yet. Any suggestions on how to get the latest updates?
You can check this blog for all the latest changes that have been added since v0.9
thx for that ... shall go through it 👍🏼
while developing I do query the bots sometimes ... just wondering what's the best way to get up-to-date answers
You can check whether the kapa bot has the latest updates.

Otherwise, I've found that dosubot, which appears when you raise an issue on GitHub, has a better grasp of the LlamaIndex codebase
can you help, please? I couldn't find the bot on the issues page (or am I missing something?)
Did you raise an issue?
no, I didn't raise an issue ... do we have to raise one for this purpose? These are only minor things
the rest I can figure out from there
Not required though
Feel free to ask your queries in the channel
can't keep bugging 🙂
It's fine, you can ask as many questions as you want
Meanwhile, Mendable and kapa will get updated too
I think they take some time to catch up with the latest docs
let me check kapa ... is it on the ask-kapa-gpt-index channel?
thx @WhiteFang_Jr 👍🏼
I previously defined my node_parser as follows:
Plain Text
node_parser = SimpleNodeParser.from_defaults(
    text_splitter=text_splitter,
    metadata_extractor=metadata_extractor,
)

how do you suggest I change it? It seems like the params have changed
Yea it's a bigger change. As @Rohan suggested, you can use an ingestion pipeline, or you can put transformations into a service context

SimpleNodeParser is also technically removed (it's just aliased to the closest equivalent)

Here's one setup

Plain Text
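from llama_index import ServiceContext, VectorStoreIndex
from llama_index.extractors import TitleExtractor

service_context = ServiceContext.from_defaults(..., transformations=[text_splitter, TitleExtractor()])

index = VectorStoreIndex.from_documents(documents, service_context=service_context)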


Or with the new pipeline stuff

Plain Text
title_extractor = TitleExtractor(nodes=5)
qa_extractor = QuestionsAnsweredExtractor(questions=3)
pipeline = IngestionPipeline(
    transformations=[text_splitter, title_extractor, qa_extractor]
)
nodes = pipeline.run(
    documents=documents,
    in_place=True,
    show_progress=True,
)

index = VectorStoreIndex(nodes)
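Both setups run the same list of transformations in order; the difference is who runs them. With the service context, `from_documents` applies the transformations to the documents for you; with the pipeline, you call `pipeline.run(...)` yourself and pass the resulting nodes to `VectorStoreIndex(nodes)`. That is my reading of the v0.9 design rather than something stated in this thread. The toy model below uses made-up classes (`ToyServiceContext`, `ToyIndex`, `run_pipeline`) and plain strings as "nodes" so it runs without LlamaIndex or an LLM; it only shows that the two routes give the same nodes.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Transform = Callable[[List[str]], List[str]]


def split_words(chunks: List[str]) -> List[str]:
    """Stand-in for a text splitter."""
    return [word for chunk in chunks for word in chunk.split()]


def upper(chunks: List[str]) -> List[str]:
    """Stand-in for an extractor/enricher."""
    return [chunk.upper() for chunk in chunks]


def run_pipeline(documents: List[str], transformations: List[Transform]) -> List[str]:
    """Route 2: you run the transformations yourself."""
    nodes = documents
    for transform in transformations:
        nodes = transform(nodes)
    return nodes


@dataclass
class ToyServiceContext:
    transformations: List[Transform] = field(default_factory=list)


@dataclass
class ToyIndex:
    nodes: List[str]

    @classmethod
    def from_documents(cls, documents: List[str], service_context: ToyServiceContext) -> "ToyIndex":
        """Route 1: the index runs the service context's transformations for you."""
        return cls(run_pipeline(documents, service_context.transformations))


transformations = [split_words, upper]
docs = ["hello world"]
via_service_context = ToyIndex.from_documents(docs, ToyServiceContext(transformations))
via_pipeline = ToyIndex(run_pipeline(docs, transformations))
```

The pipeline route is handy when you want to inspect or reuse the nodes before building the index.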
gotta catch up on IngestionPipeline ... thx 👍🏼
@Maverick yeah, with the pipeline you won't need the parser. Put your text_splitter and all the extractors you previously passed to MetadataExtractor into the pipeline, and get the nodes via pipeline.run(...)

or put the transformations in the service context as @Logan M suggested
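The migration described above is mechanical: whatever used to sit inside `MetadataExtractor(extractors=[...])` moves, in the same order, into the flat `transformations` list right after the splitter. The hypothetical comparison below uses made-up `old_style_parse` and `new_style_run` helpers, with plain functions standing in for the splitter and extractors, and assumes the old parser roughly split first and then ran the nested extractors; it checks that both shapes produce identical nodes.

```python
from typing import Callable, List

Node = dict  # {"text": str, "meta": dict}
Transform = Callable[[List[Node]], List[Node]]


def splitter(nodes: List[Node]) -> List[Node]:
    """Stand-in for text_splitter: split on ';' into fresh nodes."""
    return [
        {"text": part.strip(), "meta": {}}
        for node in nodes
        for part in node["text"].split(";")
        if part.strip()
    ]


def length_extractor(nodes: List[Node]) -> List[Node]:
    """Stand-in for a node-level extractor."""
    for node in nodes:
        node["meta"]["length"] = len(node["text"])
    return nodes


def first_word_extractor(nodes: List[Node]) -> List[Node]:
    """Another stand-in extractor."""
    for node in nodes:
        node["meta"]["first_word"] = node["text"].split()[0]
    return nodes


def old_style_parse(docs, text_splitter, extractors):
    """Shape of the old setup: a parser that splits, then runs a nested extractor list."""
    nodes = text_splitter(docs)
    for extractor in extractors:
        nodes = extractor(nodes)
    return nodes


def new_style_run(docs, transformations):
    """Shape of the new setup: one flat list, applied in order."""
    nodes = docs
    for transform in transformations:
        nodes = transform(nodes)
    return nodes


extractors = [length_extractor, first_word_extractor]
docs = [{"text": "alpha beta; gamma", "meta": {}}]
old = old_style_parse(docs, splitter, extractors)
new = new_style_run(docs, [splitter, *extractors])
```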
quite a few changes 👷🏼
It's not too bad 🙂 And hopefully it makes it clearer a) what is happening under the hood and b) how to customize it further
what could be the reason for this error:
AttributeError: 'NoneType' object has no attribute 'start_trace'
... on this line:
index = VectorStoreIndex.from_vector_store(vector_store=vector_store)
btw I set up the index initially, and then call index.insert() wherever I need it later
That's related to the callback manager 🤔 Are you using any custom classes?
there were some callbacks for the app, but none related to this @Logan M
Seems like somewhere the callback manager is either set to None or not properly initialized for a component
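One way this error arises is when a component ends up holding `callback_manager=None` and later calls a method on it. Below is a hypothetical reproduction with made-up classes (`ToyCallbackManager`, `FragileComponent`, `SafeComponent`, not LlamaIndex internals), plus the defensive pattern of falling back to a default manager instead of keeping None. The thread does not identify which real component was at fault; the analogous check in an app would be to make sure every custom component gets a real callback manager rather than None.

```python
from typing import List, Optional


class ToyCallbackManager:
    """Hypothetical stand-in for a callback manager that records trace events."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def start_trace(self, trace_id: str) -> None:
        self.events.append(f"start:{trace_id}")

    def end_trace(self, trace_id: str) -> None:
        self.events.append(f"end:{trace_id}")


class FragileComponent:
    """Stores whatever it is given, including None."""

    def __init__(self, callback_manager: Optional[ToyCallbackManager] = None) -> None:
        self.callback_manager = callback_manager

    def query(self, q: str) -> str:
        self.callback_manager.start_trace("query")  # AttributeError if this is None
        self.callback_manager.end_trace("query")
        return q.upper()


class SafeComponent(FragileComponent):
    """Falls back to a fresh manager when none is passed."""

    def __init__(self, callback_manager: Optional[ToyCallbackManager] = None) -> None:
        super().__init__(callback_manager or ToyCallbackManager())


try:
    FragileComponent().query("hi")
except AttributeError as err:
    print(err)  # 'NoneType' object has no attribute 'start_trace'

print(SafeComponent().query("hi"))  # HI
```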