Hello! I have been writing my own RAG

Hello! I have been writing my own RAG system and thought I would give LlamaIndex a try! I just heard about it a couple days ago.

I see that I can easily ingest data, but I am not sure how to attach metadata. Let's say I have the pages of a book, broken out into individual files. I want to ingest the book so that I can ask questions over a page range of the book. Right now I am doing this manually: I store the pages in Postgres, then query and prompt accordingly.

Is LlamaIndex a good fit for this use case? How would I structure the book / page data?

6 comments

WhiteFang_Jr

If your document is a PDF, a page label is added by default to each Document object.

If you are using LlamaParse for ingestion, try JSON mode: it returns the data page by page, and from there you can build the Document objects and add page_label to the metadata.

Adding metadata to a document is very easy.

from llama_index.core import Document

# Adding metadata while creating the Document object
doc = Document(text="This is text", metadata={"key": "value"})

# Adding metadata after the object is created
doc.metadata["new_key"] = "new value"
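
For the one-file-per-page layout described in the question, the same idea could be applied to every file in a folder. This is only a rough sketch: the `page_0001.txt` style file naming, the `page` metadata key, and the helper names `load_page_records` and `to_documents` are all assumptions made for illustration, not anything LlamaIndex requires.

```python
import re
from pathlib import Path


def load_page_records(directory):
    """Return one {"text", "metadata"} record per page file, sorted by page number.

    Assumes (hypothetically) that each file holds exactly one page and that the
    page number is the first run of digits in the file name, e.g. page_0012.txt.
    """
    records = []
    for path in Path(directory).glob("*.txt"):
        match = re.search(r"(\d+)", path.stem)
        if match is None:
            continue  # skip files whose name carries no page number
        records.append(
            {
                "text": path.read_text(encoding="utf-8"),
                "metadata": {"page": int(match.group(1)), "file_name": path.name},
            }
        )
    return sorted(records, key=lambda record: record["metadata"]["page"])


def to_documents(records):
    """Wrap each record in a LlamaIndex Document, as in the snippet above."""
    # Imported here so the file-parsing part works without llama-index installed.
    from llama_index.core import Document

    return [Document(text=r["text"], metadata=r["metadata"]) for r in records]
```

Storing the page number as an integer rather than a string should make numeric comparisons (for example a page range) easier later on.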

Earlkonig

Thanks for the response! The pages would be individual files, one page per file. For my use case it's important that the pages are precise. I am already parsing them myself, so creating those Document objects should be pretty easy. So maybe I'll do it as you suggest rather than just ingesting from a directory.

Earlkonig

And I’m guessing I can map that Document to a Postgres table? Or tables?

Earlkonig

Will the embeddings and contexts crossing documents be weird or hard to handle?

WhiteFang_Jr

Not sure what you mean here

WhiteFang_Jr

If you load a folder or files from a directory, page_label is also added to the metadata automatically, provided the document is a PDF.
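
For the page-range part of the original question, metadata filters are one possible route. The sketch below assumes each Document carries an integer `page` key that you set yourself (not the automatic page_label), and that your vector store supports the `>=` and `<=` filter operators, which not every store may do. The helper name `page_range_filters` is made up for illustration.

```python
def page_range_filters(first_page, last_page, key="page"):
    """Build filters that match documents with first_page <= page <= last_page.

    `key` is the (assumed) integer metadata key set on each Document.
    """
    if first_page > last_page:
        raise ValueError("first_page must not be greater than last_page")

    # Imported here so the range check above works without llama-index installed.
    from llama_index.core.vector_stores import (
        FilterOperator,
        MetadataFilter,
        MetadataFilters,
    )

    # Multiple filters are combined with AND by default.
    return MetadataFilters(
        filters=[
            MetadataFilter(key=key, value=first_page, operator=FilterOperator.GTE),
            MetadataFilter(key=key, value=last_page, operator=FilterOperator.LTE),
        ]
    )


# Usage sketch, assuming `index` is a VectorStoreIndex built from per-page Documents:
# query_engine = index.as_query_engine(filters=page_range_filters(10, 20))
# response = query_engine.query("What happens in this part of the book?")
```

If the store you end up with does not support range operators, filtering the pages in Postgres first and building the index from only those pages would be a fallback.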
