```python
from llama_index.extractors import (
    TitleExtractor,
    QuestionsAnsweredExtractor,
    MetadataExtractor,
)
```
Supported metadata:

Node-level:

- `SummaryExtractor`: summary of each node, as well as its preceding and following nodes
- `QuestionsAnsweredExtractor`: questions that the node can answer
- `KeywordExtractor`: keywords that uniquely identify the node

Document-level:

- `TitleExtractor`: document title, possibly inferred across multiple nodes
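As a rough sketch of what the output looks like, each extractor writes one or more entries into a node's `metadata` dict. The key names below (`document_title`, `questions_this_excerpt_can_answer`, `excerpt_keywords`) are the defaults these extractors use, but treat them as assumptions and check your installed version; the values are made up:

```python
# Toy illustration with plain dicts; no LLM calls are made.
node_metadata = {}

# TitleExtractor (document-level): one title shared across the document's nodes
node_metadata["document_title"] = "A Guide to Ingestion Pipelines"

# QuestionsAnsweredExtractor (node-level): questions this excerpt can answer
node_metadata["questions_this_excerpt_can_answer"] = (
    "1. What is an IngestionPipeline?\n"
    "2. In what order are transformations applied?"
)

# KeywordExtractor (node-level): keywords identifying the excerpt
node_metadata["excerpt_keywords"] = "ingestion, pipeline, metadata"

print(sorted(node_metadata))
# ['document_title', 'excerpt_keywords', 'questions_this_excerpt_can_answer']
```

Downstream, these entries are prepended to the node text at query time, which is what makes the extracted metadata useful for retrieval.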
Use `IngestionPipeline` instead of `metadata_extractor`, passing the individual extractors from above. Previously you would write:

```python
metadata_extractor = MetadataExtractor(
    extractors=[
        TitleExtractor(nodes=5),
        QuestionsAnsweredExtractor(questions=3),
    ],
)
node_parser = SimpleNodeParser.from_defaults(
    text_splitter=text_splitter,
    metadata_extractor=metadata_extractor,
)
nodes = node_parser.get_nodes_from_documents(documents)
```
Now, with `IngestionPipeline`:

```python
title_extractor = TitleExtractor(nodes=5)
qa_extractor = QuestionsAnsweredExtractor(questions=3)
pipeline = IngestionPipeline(
    transformations=[text_splitter, title_extractor, qa_extractor]
)
nodes = pipeline.run(
    documents=documents,
    in_place=True,
    show_progress=True,
)
```
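Conceptually, the pipeline applies each transformation in order: every step takes a list of nodes and returns a new list, and `pipeline.run(...)` chains them. A toy sketch of that chaining with plain Python callables (hypothetical stand-ins, not the real llama_index classes):

```python
# Minimal sketch of transformation chaining: each step takes a list
# of "nodes" and returns a new list. Here nodes are plain strings.
def text_splitter(nodes):
    # Split each text on sentence boundaries (stand-in for a real splitter).
    return [s for n in nodes for s in n.split(". ") if s]

def title_extractor(nodes):
    # Prepend a fake "title" annotation to each node (stand-in for metadata).
    return [f"[title] {n}" for n in nodes]

def run_pipeline(transformations, documents):
    nodes = documents
    for transform in transformations:
        nodes = transform(nodes)
    return nodes

nodes = run_pipeline([text_splitter, title_extractor], ["First part. Second part"])
print(nodes)  # ['[title] First part', '[title] Second part']
```

The order of `transformations` matters for the real pipeline too: the splitter must come first so the extractors operate on chunked nodes rather than whole documents.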
If you previously attached the extractors to the `node_parser` as follows:

```python
node_parser = SimpleNodeParser.from_defaults(
    text_splitter=text_splitter, metadata_extractor=metadata_extractor
)
```
you can now pass the transformations through the `ServiceContext`:

```python
service_context = ServiceContext.from_defaults(
    ..., transformations=[text_splitter, TitleExtractor()]
)
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
```
or run them through an `IngestionPipeline` and build the index from the resulting nodes:

```python
title_extractor = TitleExtractor(nodes=5)
qa_extractor = QuestionsAnsweredExtractor(questions=3)
pipeline = IngestionPipeline(
    transformations=[text_splitter, title_extractor, qa_extractor]
)
nodes = pipeline.run(
    documents=documents,
    in_place=True,
    show_progress=True,
)
index = VectorStoreIndex(nodes)
```
Pass the `text_splitter` and all the extractors you used previously with `MetadataExtractor` into the pipeline, then get the nodes via `pipeline.run(...)`.
```python
index = VectorStoreIndex.from_vector_store(vector_store=vector_store)
...
index.insert()
```
Otherwise I get `None` at places I need later, or values that are not properly initialized for a component.