
Updated 8 months ago

Hello! I have been writing my own RAG
Hello! I have been writing my own RAG system and thought I would give LlamaIndex a try! I just heard about it a couple days ago.

I see that I can easily ingest data, but I am not sure how to attach metadata. Let's say I have the pages of a book, broken out into individual files. I want to ingest the book so that I can ask questions over a page range of the book. Right now I am doing this manually: I store the pages in Postgres, then query and prompt accordingly.

Is LlamaIndex a good fit for this use case? How would I structure the book / page data?
6 comments
If your document is a PDF, a page label is added to the metadata of each Document object by default.

If you are using LlamaParse for ingestion, try JSON mode. It returns the data page by page, and from there you can build the Document objects and add page_label to the metadata.

Adding metadata to a document is very easy.
from llama_index.core import Document

# Add metadata while creating the Document object
doc = Document(text="This is text", metadata={"key": "value"})

# Add metadata after the object has been created
doc.metadata["new_key"] = "new value"
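For the one-file-per-page case in the question, a rough sketch of the preparation step might look like the following. It uses only the Python standard library and assumes hypothetical file names such as page_0001.txt, where the first run of digits in the name is the page number. The helper names load_pages and pages_in_range and the metadata keys page_number and file_name are made up for illustration and are not part of LlamaIndex.

```python
import re
from pathlib import Path


def load_pages(directory):
    """Read one-page-per-file text files into records with page metadata.

    Assumption: file names look like page_0001.txt, and the first run of
    digits in the file stem is the page number.
    """
    records = []
    for path in Path(directory).glob("*.txt"):
        match = re.search(r"\d+", path.stem)
        if match is None:
            continue  # skip files that carry no page number in their name
        records.append(
            {
                "text": path.read_text(encoding="utf-8"),
                "metadata": {
                    "page_number": int(match.group()),
                    "file_name": path.name,
                },
            }
        )
    # Order by the numeric page number rather than by file name
    return sorted(records, key=lambda r: r["metadata"]["page_number"])


def pages_in_range(records, first, last):
    """Keep only the records whose page_number lies in [first, last]."""
    return [
        r for r in records
        if first <= r["metadata"]["page_number"] <= last
    ]
```

Each record can then be turned into a document with Document(text=record["text"], metadata=record["metadata"]), as in the snippet above. Storing the page number as an integer keeps range comparisons simple. In a real index the page-range restriction would more likely be applied through metadata filters at query time than in plain Python, and which filter operators are available depends on the vector store you use.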
Thanks for the response! The pages would be individual files, one page per file. For my use case it's important that the page boundaries are precise. I am already parsing the pages myself, so creating those Document objects should be pretty easy. So maybe I do it as you suggest rather than just ingesting from a directory.
And I’m guessing I can map that Document to a Postgres table? Or tables?
Will the embeddings and contexts crossing documents be weird or hard to handle?
Not sure what you mean here
If you load a folder or individual files from a directory and the document is a PDF, page_label is also added to the metadata automatically.