Updated 3 months ago

@Logan M @WhiteFang_Jr is this code up to date?
Plain Text
from llama_index.node_parser.extractors import (
    MetadataExtractor,
    QuestionsAnsweredExtractor,
    TitleExtractor
)
from llama_index.text_splitter import TokenTextSplitter
from llama_index.node_parser import SimpleNodeParser

text_splitter = TokenTextSplitter(separator=" ", chunk_size=512, chunk_overlap=20)

metadata_extractor = MetadataExtractor(
    extractors=[
        TitleExtractor(nodes=5),
        QuestionsAnsweredExtractor(questions=3),
    ],
)

node_parser = SimpleNodeParser(
    text_splitter = text_splitter,
    metadata_extractor=metadata_extractor
)
20 comments
Plain Text
from llama_index.extractors  import (
    MetadataExtractor,
    QuestionsAnsweredExtractor,
    TitleExtractor
)
from llama_index.text_splitter import TokenTextSplitter
from llama_index.node_parser import SimpleNodeParser

text_splitter = TokenTextSplitter(separator=" ", chunk_size=512, chunk_overlap=20)

metadata_extractor = MetadataExtractor(
    extractors=[
        TitleExtractor(nodes=5),
        QuestionsAnsweredExtractor(questions=3),
    ],
)

node_parser = SimpleNodeParser(
    text_splitter = text_splitter,
    metadata_extractor=metadata_extractor
)
This gives an error:
Plain Text
ImportError                               Traceback (most recent call last)
<ipython-input-9-29ea0ec25abc> in <cell line: 1>()
----> 1 from llama_index.extractors  import (
      2     MetadataExtractor,
      3     QuestionsAnsweredExtractor,
      4     TitleExtractor
      5 )

ImportError: cannot import name 'MetadataExtractor' from 'llama_index.extractors' (/usr/local/lib/python3.10/dist-packages/llama_index/extractors/__init__.py)
I'm also not able to find MetadataExtractor, but I found a tutorial for metadata extraction here: https://docs.llamaindex.ai/en/stable/examples/metadata_extraction/MetadataExtractionSEC.html
Is this correct? @WhiteFang_Jr @Logan M
Attachment
Screenshot_2024-01-05_at_4.10.13_PM.png
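For readers who cannot open the attachment, here is a minimal sketch of the approach the linked tutorial takes, as far as I can tell: there is no `MetadataExtractor` wrapper, and each extractor is passed directly as a transformation to an `IngestionPipeline`. It assumes the llama_index 0.9.x import paths (the version implied by the traceback above); later releases moved these modules, so treat the imports as an assumption. `build_nodes` is a hypothetical helper name, and the extractors call an LLM, so a configured LLM is needed to actually run it.

```python
def build_nodes(documents):
    """Split documents and attach LLM-generated metadata to each node.

    Assumes llama_index 0.9.x import paths. The extractors call an LLM,
    so a configured LLM (for example an OpenAI API key) is required.
    """
    # Imported inside the function so that defining it does not require
    # llama_index to be installed.
    from llama_index.extractors import QuestionsAnsweredExtractor, TitleExtractor
    from llama_index.ingestion import IngestionPipeline
    from llama_index.text_splitter import TokenTextSplitter

    text_splitter = TokenTextSplitter(separator=" ", chunk_size=512, chunk_overlap=20)

    # No MetadataExtractor wrapper: the splitter runs first, then each
    # extractor adds its own keys to the metadata of the resulting nodes.
    pipeline = IngestionPipeline(
        transformations=[
            text_splitter,
            TitleExtractor(nodes=5),
            QuestionsAnsweredExtractor(questions=3),
        ]
    )
    return pipeline.run(documents=documents)
```

The returned nodes can then be handed to an index, for example `VectorStoreIndex(nodes)`.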
What are the advantages of this? How do I use agent tools with it? @WhiteFang_Jr @Logan M
These are very bad docs 😦
@WhiteFang_Jr @Logan M is there an alternative solution for MetadataExtractor?
Well, did it work? x)
It does, but one question: why use all these parsers and extractors when I can get the same result without them?
The metadata that the extractors generated is also attached to the node.

By default, this means that the metadata is included in embeddings. This can bias the embeddings and help with retrieval.

Mostly, though, I would only expect this to make a difference if you had a large number of docs.
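To make the point about embeddings concrete, here is a toy sketch in plain Python, not the library's actual implementation. `ToyNode` and `embed_text` are hypothetical names, and the sample text and title are made up. It only illustrates the idea that extracted metadata gets prepended to the chunk text before embedding, so two chunks with similar wording but different titles end up with different embedding inputs.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Hypothetical stand-in for an index node: chunk text plus metadata."""

    text: str
    metadata: dict = field(default_factory=dict)

    def embed_text(self, include_metadata: bool = True) -> str:
        """Return the string that would be sent to the embedding model."""
        if not include_metadata or not self.metadata:
            return self.text
        # Render metadata as "key: value" lines ahead of the chunk text.
        header = "\n".join(f"{key}: {value}" for key, value in self.metadata.items())
        return f"{header}\n\n{self.text}"


# Made-up example data, for illustration only.
node = ToyNode(
    text="Revenue grew 12% year over year.",
    metadata={"document_title": "Annual Report 2023"},
)

with_meta = node.embed_text()
without_meta = node.embed_text(include_metadata=False)
print(with_meta)
```

In llama_index itself, I believe the equivalent switch is the node's metadata mode (`MetadataMode.EMBED`), along with per-node lists of metadata keys to exclude from the embedding or LLM input.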
What do you mean alternative? What you had there is indeed how it works
So what I shared in the screenshot is enough? How can I integrate it with agent tools?
I mean, it depends on what you want to use it for. It's really intended for processing nodes before inserting them into an index.

What do you want an agent to do with it?
I was trying to integrate docs and images, including their metadata, to create a chatbot that makes recommendations based on user input.
I'd also love to see an example with Gradio or Streamlit, to see how they interact with documents and stored images and videos.
Additionally, I saw an example on LlamaIndex of combining this with data from the web.