


I'm working on a legal document. It is very structured, with chapters, paragraphs, and articles.

I'm thinking of creating one document per article.
I think that extra_info is not used by the index.

I'm wondering which index is best for keeping article atomicity and each article's attachment to its chapter, section, etc.
extra_info is definitely used by the index

I'm not sure what you mean by atomicity. Do you mean you want to maintain the order?
If I ask a question, I want to find all the legal articles that are related to this question.

An article is in a section, which is in a chapter. I want to keep this information as well.

Thanks for your help.
Hmm. If you create your documents so that each document is one article, you can store the section and chapter in the extra_info:

```python
from llama_index import Document
document = Document('article text', extra_info={'section': section, 'chapter': chapter})
```
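The snippet above assumes the section and chapter of each article are already known. As a minimal sketch, assuming the source text marks its headings with lines starting with "Chapitre", "Section" and "Article" (a hypothetical layout; real legal texts often differ), the hierarchy can be tracked while walking the text line by line. The helper `split_into_articles` and the record shape are illustrative only, not part of llama_index.

```python
import re

# Hypothetical heading patterns for a French legal text; adapt them to the real layout.
CHAPTER_RE = re.compile(r"^Chapitre\b", re.IGNORECASE)
SECTION_RE = re.compile(r"^Section\b", re.IGNORECASE)
ARTICLE_RE = re.compile(r"^Article\s+(\S+)", re.IGNORECASE)


def split_into_articles(text):
    """Return one record per article, tagged with its current chapter and section."""
    articles = []
    chapter = section = None
    current = None  # article whose body lines are being collected

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if CHAPTER_RE.match(line):
            chapter, section, current = line, None, None  # a new chapter resets the section
        elif SECTION_RE.match(line):
            section, current = line, None
        elif (m := ARTICLE_RE.match(line)) is not None:
            current = {
                "text": line,
                "extra_info": {"chapter": chapter, "section": section, "article": m.group(1)},
            }
            articles.append(current)
        elif current is not None:
            current["text"] += "\n" + line  # body line of the current article
    return articles


sample = """Chapitre I : Dispositions générales
Section 1 : Champ d'application
Article 1
Le présent code s'applique...
Article 2
Les dispositions...
Section 2 : Définitions
Article 3
Au sens du présent code..."""

for record in split_into_articles(sample):
    print(record["extra_info"])
```

Each record's text and extra_info could then be passed to Document as in the snippet above. The heading detection is deliberately naive: a body line that happens to start with "Article" would be misread as a new article.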
That's what I was thinking.
But I'm not sure every index type will use this information. Is extra_info attached to each node?
I was also thinking of using a composable graph: one index per chapter, with the structure (chapter 1 / section 1 / paragraph 1 / articles 1, 2, 3...) in the summary.
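For the composable-graph idea, each chapter index needs a summary describing its structure. A minimal sketch, assuming article records shaped as {"text": ..., "extra_info": {"chapter", "section", "article"}} (a hypothetical shape matching the extra_info keys suggested in this thread), that generates those summary strings. `chapter_summaries` is an illustrative helper; how the strings are attached to the graph depends on the llama_index version.

```python
def chapter_summaries(articles):
    """Build one structure summary per chapter from article metadata."""
    structure = {}  # chapter -> section -> [article numbers]; dicts keep insertion order
    for art in articles:
        info = art["extra_info"]
        sections = structure.setdefault(info["chapter"], {})
        sections.setdefault(info["section"] or "(no section)", []).append(info["article"])

    summaries = {}
    for chapter, sections in structure.items():
        lines = [
            f"{chapter} / {section}: articles {', '.join(numbers)}"
            for section, numbers in sections.items()
        ]
        summaries[chapter] = "\n".join(lines)
    return summaries


articles = [
    {"text": "...", "extra_info": {"chapter": "Chapitre 1", "section": "Section 1", "article": "1"}},
    {"text": "...", "extra_info": {"chapter": "Chapitre 1", "section": "Section 1", "article": "2"}},
    {"text": "...", "extra_info": {"chapter": "Chapitre 1", "section": "Section 2", "article": "3"}},
    {"text": "...", "extra_info": {"chapter": "Chapitre 2", "section": None, "article": "4"}},
]

for chapter, summary in chapter_summaries(articles).items():
    print(summary)
```

Generating the summaries from the same metadata keeps them consistent with the extra_info stored on each article.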
Yeah, the extra_info is attached to each node. So if a document is parsed into 3 nodes, each node will have the same extra info.

Then response.source_nodes will also show this extra info.
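To make the node-level behaviour described above concrete, here is a toy splitter, assuming simple word-count chunks. It is not llama_index's actual text splitter (which works on tokens and supports overlap); the hypothetical `split_document` helper only shows that every chunk carries a copy of its document's extra_info.

```python
def split_document(text, extra_info, chunk_size):
    """Split text into chunks of chunk_size words; each chunk copies the document's extra_info."""
    words = text.split()
    nodes = []
    for start in range(0, len(words), chunk_size):
        nodes.append({
            "text": " ".join(words[start:start + chunk_size]),
            "extra_info": dict(extra_info),  # same metadata on every node
        })
    return nodes


nodes = split_document(
    "un deux trois quatre cinq six sept huit neuf",
    {"chapter": "Chapitre 1", "section": "Section 2"},
    chunk_size=3,
)
for node in nodes:
    print(node["text"], node["extra_info"])
```

Because every node keeps the chapter and section, any retrieved chunk can be traced back to its place in the hierarchy.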
We are planning to use a local model like Vicuna. Also, the text is in French.

Do you have any advice? In your view, what would be the best approach for the index?
A local model should be fine! We have a HuggingFaceLLMPredictor that might help a lot (it can be weird to set up, but I'm happy to help if you get stuck):
https://gpt-index.readthedocs.io/en/latest/how_to/customization/custom_llms.html#example-using-a-huggingface-llm

A simple vector index with the text split into articles is probably the best approach, actually 🙂 If the results aren't great, there are some things to tweak, like top k or chunk size. If you move to a graph, the response time will be slower.
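To show what the top k knob controls, here is a toy retrieval loop, assuming a bag-of-words vector in place of a real embedding model. The helpers `embed`, `cosine` and `top_k` are hypothetical; a vector index performs the same kind of ranking with learned embeddings, and in llama_index the number of returned chunks is set with `similarity_top_k`.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a real index would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query, nodes, k=2):
    """Return the k (score, node) pairs most similar to the query."""
    q = embed(query)
    scored = [(cosine(q, embed(node["text"])), node) for node in nodes]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


nodes = [
    {"text": "le contrat de travail est conclu par écrit", "extra_info": {"article": "1"}},
    {"text": "la durée du travail est fixée à 35 heures", "extra_info": {"article": "2"}},
    {"text": "les congés payés sont de cinq semaines", "extra_info": {"article": "3"}},
]

for score, node in top_k("durée du travail", nodes, k=2):
    print(round(score, 3), node["extra_info"]["article"])
```

Raising k hands more candidate articles to the LLM at the cost of a longer prompt, while the chunk size decides how much text each candidate contains. Since each node carries its extra_info, the article behind every hit stays identifiable.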