Updated 3 months ago

Hey there, I'm new here, and I have a question to learn a little about training and model consumption.

My idea is to create a model (not sure if a model is the best way) based on specific documents, pages, etc.

That "model" would be updated every 2 weeks or so; meanwhile, we can query the "model" to get answers based on the data submitted without creating an index every time. Does this make sense?

What would be the best way to accomplish this?
4 comments
Yea that makes sense!

With llama index, it uses an existing trained model from openai, and just uses your documents as context that the model "reads" (So no training needed! 💪 ).

Something like GPTSimpleVectorIndex is probably a good starting point. It will take your documents, create embeddings for them, and then you can save them to disk.

Then at query time, you can load the index and llama index will fetch the closest top_k documents that match your query.
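
To make the "closest top_k" idea concrete, here is a minimal pure-Python sketch of similarity search over stored embeddings. It is not the library's own code: the helper names (cosine_similarity, top_k_chunks), the chunk ids, and the toy 3-dimensional vectors are all made up for illustration, and it assumes cosine similarity as the ranking measure.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_embedding, stored, k=3):
    """Return the k (chunk_id, score) pairs most similar to the query.

    `stored` maps a (hypothetical) chunk id to its embedding vector.
    """
    scored = [
        (chunk_id, cosine_similarity(query_embedding, embedding))
        for chunk_id, embedding in stored.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; real ones have far more dimensions.
stored = {
    "pricing_page": [0.9, 0.1, 0.0],
    "install_guide": [0.1, 0.9, 0.1],
    "changelog": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.0]
results = top_k_chunks(query, stored, k=2)
```

The text of the top-ranked chunks is what gets handed to the model as context, which is why no training step is needed.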

There are probably a few things to tune: I would set chunk_size_limit in the service context to around 1024, and you can tune top_k in the query call, e.g. index.query("my query", similarity_top_k=3, response_mode="compact")
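
As a rough picture of what a chunk size limit does, here is a small sketch that splits text into fixed-size pieces before embedding. It is only an approximation under stated assumptions: it counts whitespace-separated words rather than tokenizer tokens, it does no overlap between chunks, and split_into_chunks is a hypothetical helper, not a library function.

```python
def split_into_chunks(text, chunk_size=1024):
    """Split text into chunks of at most `chunk_size` whitespace-separated words.

    Assumption: words stand in for tokens here; a real tokenizer
    would usually produce more tokens than words.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    words = text.split()
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


# Hypothetical 2,500-word document.
text = " ".join(f"word{i}" for i in range(2500))
chunks = split_into_chunks(text, chunk_size=1024)
```

Smaller chunks give more precise matches but less context per match, so the chunk size and similarity_top_k are usually tuned together.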
@Tmeister When you say "updated", are you updating existing documents, or just adding more?
Thank you. When I said "updated," I meant both: sometimes a document can be edited within that two-week window, and new documents can be created as well. I will read more about GPTSimpleVectorIndex.
There are both insert() and refresh() functions for all the indexes. Or, if you have a small number of documents, you could just rebuild the index (embeddings are only $0.0004/1k tokens, pretty cheap).
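
To put the rebuild cost in perspective, here is a quick back-of-the-envelope calculation using the $0.0004/1k tokens figure quoted above. The corpus size (500 documents of about 2,000 tokens each) is a made-up example, and the price should be checked against current pricing before relying on it.

```python
# USD per 1,000 tokens, as quoted in this thread; verify against current pricing.
PRICE_PER_1K_TOKENS = 0.0004


def embedding_cost(total_tokens, price_per_1k=PRICE_PER_1K_TOKENS):
    """Estimated cost in USD of embedding `total_tokens` tokens."""
    return total_tokens / 1000 * price_per_1k


# Hypothetical corpus: 500 documents of about 2,000 tokens each.
total_tokens = 500 * 2000
cost_per_rebuild = embedding_cost(total_tokens)

# Rebuilding every two weeks means roughly 26 rebuilds per year.
cost_per_year = 26 * cost_per_rebuild
```

For this example a full rebuild comes to about $0.40, so at that scale rebuilding every two weeks is a reasonable alternative to tracking individual changes.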

If you go the insert()/refresh() route, just make sure you keep track of the doc_id of each document you plan on updating, since an update only works properly when the new version has the same doc_id as the old one.

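from llama_index import Document  # import path for the llama_index version current at the time of this thread

# `index` is your existing index (e.g. a GPTSimpleVectorIndex)
document = Document("my_text")
document.doc_id = "my doc id"
index.insert(document)
...
document = Document("my_new_text")
document.doc_id = "my doc id"  # same id, new text
index.refresh([document])  # updates the stored document because its text changed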
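
For the bookkeeping side of this (knowing which doc_ids are new and which have changed since the last run), one option is to store a content hash per doc_id between runs. The sketch below is a hypothetical helper for your own code and makes no claim about how the library tracks changes internally; plan_update, content_hash, and the sample doc ids are all illustrative names.

```python
import hashlib


def content_hash(text):
    """Stable fingerprint of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_update(previous_hashes, current_docs):
    """Classify documents as new, changed, or unchanged by doc_id.

    previous_hashes: dict of doc_id -> hash saved from the last run
    current_docs:    dict of doc_id -> current text
    """
    plan = {"new": [], "changed": [], "unchanged": []}
    for doc_id, text in current_docs.items():
        digest = content_hash(text)
        if doc_id not in previous_hashes:
            plan["new"].append(doc_id)
        elif previous_hashes[doc_id] != digest:
            plan["changed"].append(doc_id)
        else:
            plan["unchanged"].append(doc_id)
    return plan


# State saved two weeks ago (hypothetical doc ids and texts).
previous = {
    "faq": content_hash("old faq text"),
    "pricing": content_hash("pricing text"),
}
# Documents as they look today.
current = {
    "faq": "new faq text",
    "pricing": "pricing text",
    "roadmap": "roadmap text",
}
plan = plan_update(previous, current)
```

Documents under "new" would go through insert(), those under "changed" through refresh() with the same doc_id, and "unchanged" ones can be skipped.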