
Updated 3 months ago

MetadataExtraction

Hi everyone! I'm using LlamaIndex (great library and community, by the way, congrats!) to build a RAG application for question answering over legal text (Chilean laws in particular). Laws have a specific structure: titles, paragraphs, and articles. I would like to add that structure as metadata to the chunk nodes. Has anyone faced a similar problem and can give a little guidance? Thanks in advance.

10 comments

Emanuel Ferreira

https://gpt-index.readthedocs.io/en/latest/core_modules/data_modules/index/metadata_extraction.html

Emanuel Ferreira

Maybe it helps you.

asendra

Thanks @Emanuel Ferreira! I will give it a try.

Logan M

You can also manually set any metadata on nodes/documents when constructing them, and control which metadata is seen by the embeddings model and LLM

https://docs.llamaindex.ai/en/stable/core_modules/data_modules/documents_and_nodes/usage_documents.html#advanced-metadata-customization

asendra

Hi @Logan M, I was able to extract the metadata (law titles from Chilean laws) manually, but I don't know how to add it to each node.

asendra

Also, I'm not sure whether to use a Document or a Node to represent each law title.

Logan M

I was thinking of adding the title to each node that contains text from that title:

document.metadata = {"law title": "..."}

Or

Document(text="...", metadata={...})
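A minimal sketch of that idea in plain Python, with no LlamaIndex dependency: split the law text at title headings, then tag every chunk with the title it came from. The sample text, the heading pattern (`TITULO` plus a Roman numeral), and the helper names `split_by_title` and `chunk_with_title` are all assumptions for illustration; real laws may format their headings differently.

```python
import re

# Hypothetical sample text; real laws may format headings differently.
LAW_TEXT = """TITULO I
Artículo 1. Primera disposición.
Artículo 2. Segunda disposición.
TITULO II
Artículo 3. Tercera disposición.
"""

# Assumed heading pattern: a line made of "TITULO" plus a Roman numeral.
TITLE_RE = re.compile(r"^TITULO[ \t]+[IVXLC]+[ \t]*$", re.MULTILINE)


def split_by_title(law_text):
    """Return a list of (title, body) pairs, one per title heading."""
    matches = list(TITLE_RE.finditer(law_text))
    sections = []
    for i, m in enumerate(matches):
        # A title's body runs until the next heading or the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(law_text)
        sections.append((m.group().strip(), law_text[m.end():end].strip()))
    return sections


def chunk_with_title(law_text, max_chars=40):
    """Split each title's body into chunks and tag each chunk with its title."""
    records = []
    for title, body in split_by_title(law_text):
        chunk = ""
        for line in body.splitlines():
            # Start a new chunk when adding this line would exceed max_chars.
            if chunk and len(chunk) + 1 + len(line) > max_chars:
                records.append({"text": chunk, "metadata": {"law title": title}})
                chunk = line
            else:
                chunk = f"{chunk}\n{line}" if chunk else line
        if chunk:
            records.append({"text": chunk, "metadata": {"law title": title}})
    return records


records = chunk_with_title(LAW_TEXT)
for record in records:
    print(record["metadata"]["law title"], "|", record["text"])
```

Each record's `text` and `metadata` could then be passed to `Document(text=..., metadata=...)` as shown above, so every chunk carries the title it belongs to.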

asendra

that's exactly what I did!

asendra

Now I'm facing a new challenge. Each law is made up of different articles, and one or more of these belong to a title. For the articles, I also need some metadata. So my question is: should I also treat articles as Documents, or should I treat them as Nodes?

asendra

thanks!
