@Logan M @WhiteFang_Jr is this code up to date?

Plain Text

from llama_index.node_parser.extractors import (
    MetadataExtractor,
    QuestionsAnsweredExtractor,
    TitleExtractor
)
from llama_index.text_splitter import TokenTextSplitter
from llama_index.node_parser import SimpleNodeParser

text_splitter = TokenTextSplitter(separator=" ", chunk_size=512, chunk_overlap=20)

metadata_extractor = MetadataExtractor(
    extractors=[
        TitleExtractor(nodes=5),
        QuestionsAnsweredExtractor(questions=3),
    ],
)

node_parser = SimpleNodeParser(
    text_splitter = text_splitter,
    metadata_extractor=metadata_extractor
)

20 comments

WhiteFang_Jr

Yeah, the import for the extractors has moved.
This might help:
https://discord.com/channels/1059199217496772688/1192635267832614995/1192651009567227914

andysingal

Plain Text

from llama_index.extractors  import (
    MetadataExtractor,
    QuestionsAnsweredExtractor,
    TitleExtractor
)
from llama_index.text_splitter import TokenTextSplitter
from llama_index.node_parser import SimpleNodeParser

text_splitter = TokenTextSplitter(separator=" ", chunk_size=512, chunk_overlap=20)

metadata_extractor = MetadataExtractor(
    extractors=[
        TitleExtractor(nodes=5),
        QuestionsAnsweredExtractor(questions=3),
    ],
)

node_parser = SimpleNodeParser(
    text_splitter = text_splitter,
    metadata_extractor=metadata_extractor
)

This gives an error:

Plain Text

ImportError                               Traceback (most recent call last)
<ipython-input-9-29ea0ec25abc> in <cell line: 1>()
----> 1 from llama_index.extractors  import (
      2     MetadataExtractor,
      3     QuestionsAnsweredExtractor,
      4     TitleExtractor
      5 )

ImportError: cannot import name 'MetadataExtractor' from 'llama_index.extractors' (/usr/local/lib/python3.10/dist-packages/llama_index/extractors/__init__.py)

andysingal

@Logan M

andysingal

@WhiteFang_Jr

WhiteFang_Jr

I'm also not able to find MetadataExtractor, but I found a tutorial for metadata extraction here: https://docs.llamaindex.ai/en/stable/examples/metadata_extraction/MetadataExtractionSEC.html
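
For anyone landing here with the same ImportError, this is a rough sketch of how the snippet above might look without MetadataExtractor. It assumes a llama_index 0.9.x release, where the individual extractors are passed as transformations to an IngestionPipeline alongside the text splitter. The function name build_nodes_with_metadata is made up for illustration, and import paths may differ in other versions, so check it against the linked tutorial.

```python
def build_nodes_with_metadata(documents):
    """Split documents into nodes and attach extracted metadata to each node.

    Assumption: llama_index 0.9.x import paths; other releases may differ.
    The extractors call an LLM, so one must be configured at run time.
    """
    # Imported lazily so this sketch only needs llama_index when it is called.
    from llama_index.extractors import QuestionsAnsweredExtractor, TitleExtractor
    from llama_index.ingestion import IngestionPipeline
    from llama_index.text_splitter import TokenTextSplitter

    # Same splitter settings as in the original question.
    text_splitter = TokenTextSplitter(separator=" ", chunk_size=512, chunk_overlap=20)

    # The extractors are listed directly as pipeline steps instead of being
    # wrapped in a MetadataExtractor.
    pipeline = IngestionPipeline(
        transformations=[
            text_splitter,
            TitleExtractor(nodes=5),
            QuestionsAnsweredExtractor(questions=3),
        ]
    )
    return pipeline.run(documents=documents)
```

Calling build_nodes_with_metadata(documents) with documents from a loader such as SimpleDirectoryReader should return nodes whose metadata holds the extracted title and questions.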

andysingal

Is this correct? @WhiteFang_Jr @Logan M

Attachment

andysingal

What are the advantages of this? How do I use agent tools with it? @WhiteFang_Jr @Logan M

andysingal

These are very bad docs 😦

andysingal

@WhiteFang_Jr @Logan M is there an alternative solution for MetadataExtractor?

theoxd

Well, did it work? x)

andysingal

It does, but one question: why use all these parsers and stuff when I can get the same result without them?

Logan M

The metadata that the extractors generate is also attached to the node.

By default, this means that the metadata is included in embeddings. This can bias the embeddings and help with retrieval.

Mostly, though, I would only expect this to make a difference if you had a large number of docs.
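
To make the point about embeddings concrete, here is a toy sketch in plain Python. It is not the library's implementation: ToyNode, text_for_embedding, and the sample metadata values are all made up for illustration. The idea it shows is only that, when metadata is attached to a node, the string sent to the embedding model can contain the metadata as well as the chunk text.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Hypothetical stand-in for an index node: chunk text plus metadata."""
    text: str
    metadata: dict = field(default_factory=dict)


def text_for_embedding(node: ToyNode) -> str:
    """Build the string that would be handed to an embedding model."""
    # Render each metadata entry as "key: value" on its own line.
    header = "\n".join(f"{key}: {value}" for key, value in node.metadata.items())
    # With no metadata, only the chunk text is embedded.
    return f"{header}\n\n{node.text}" if header else node.text


plain = ToyNode(text="Revenue grew 12% year over year.")
enriched = ToyNode(
    text="Revenue grew 12% year over year.",
    metadata={
        # Illustrative values, standing in for what extractors might produce.
        "document_title": "Acme Corp 2022 Annual Report",
        "questions_this_excerpt_can_answer": "How fast did revenue grow?",
    },
)

print(text_for_embedding(plain))
print("---")
print(text_for_embedding(enriched))
```

The second string carries the title and a question the chunk answers, so its embedding can sit closer to queries phrased that way. With only a handful of documents, the plain chunk text is usually enough to tell them apart.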

Logan M

What do you mean alternative? What you had there is indeed how it works

andysingal

So what I shared in the snapshot is enough? How can I integrate it with agent tools?

andysingal

@Logan M

Logan M

I mean, it depends on what you want to use it for. It's really intended for processing nodes before inserting them into an index.

What do you want an agent to do with it?

andysingal

I was trying to integrate docs and images, including metadata, to create a chatbot that makes recommendations based on user input.

Logan M

Isn't that just normal multi-modal RAG? Maybe with a multi-modal agent on top?
https://docs.llamaindex.ai/en/stable/examples/multi_modal/mm_agent.html

or
https://docs.llamaindex.ai/en/stable/examples/multi_modal/gemini.html#rd-part-build-multi-modal-rag-for-restaurant-recommendation

andysingal

To some extent. I'd love to see if there is an example with Gradio or Streamlit too, to see how they interact with documents and stored images and videos.

andysingal

Additionally, I saw a LlamaIndex example that combines this with data from the web.
