Updated last year

Hello

Hello!

I'm using RssReader().load_data([url]) to load an XML file. Currently it creates one document per line item, and get_content() on each document returns only about 169 characters. Is this the optimal way? I ask because I'm going to be loading a lot more XML files and expect them to be searchable. (Also, I want to change which metadata gets attached automatically.)

Should I basically create my own loader and use my own node parser?

14 comments

Logan M

Sounds like you might need to create your own reader 😅 It's a little hard to generalize a loader across all types of XML

Happy to help write one with you, but it shouldn't be too bad. Could use the existing RSS reader as a reference
https://github.com/emptycrown/llama-hub/blob/main/llama_hub/web/rss/base.py
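
A minimal sketch of what such a custom reader could look like, using only the standard library. The XML layout (`agenda`, `item`, `title`, `description`, and the `date`/`body` attributes) and the `load_agenda` helper are assumptions for illustration, not the structure of the actual files or the API of the linked reader. The `group_items` flag shows one way to avoid tiny per-item documents: emit one record per agenda instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical agenda XML; element and attribute names are assumptions.
SAMPLE_XML = """
<agenda date="2023-09-12" body="City Council">
  <item number="1">
    <title>Road resurfacing contract</title>
    <description>Approve construction bid for Main St.</description>
  </item>
  <item number="2">
    <title>Library hours</title>
    <description>Extend weekend opening hours.</description>
  </item>
</agenda>
"""


def load_agenda(xml_text, group_items=True):
    """Parse one agenda into a list of {"text", "metadata"} records.

    group_items=True  -> one record holding every line item of the agenda
    group_items=False -> one record per line item
    """
    root = ET.fromstring(xml_text.strip())
    # Metadata is chosen explicitly here rather than inherited from a reader.
    base_meta = {"date": root.get("date"), "body": root.get("body")}

    lines = []
    records = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        desc = (item.findtext("description") or "").strip()
        text = f"{item.get('number')}. {title}: {desc}"
        if group_items:
            lines.append(text)
        else:
            meta = {**base_meta, "item_number": item.get("number")}
            records.append({"text": text, "metadata": meta})

    if group_items:
        records.append({"text": "\n".join(lines), "metadata": base_meta})
    return records


records = load_agenda(SAMPLE_XML)
```

Each record could then be wrapped in a LlamaIndex `Document`, passing the text and metadata through; the exact constructor arguments depend on the library version, so check that against your install.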

bbmax

ok that's what i thought.

bbmax

so @Logan M if each document is a line item in agenda

bbmax

and I have 1000 agendas...

bbmax

and I want to ask a question like "What items are most likely to be in the next meeting?" or "Give me all items that involve construction"

bbmax

A vector db is the right tool, but similarity_top_k = 5 is not going to be helpful

bbmax

because there are thousands of line items?

Logan M

Hmm the first question seems like a time based thing? I.e. Given the last X agendas, what's next?

The second kind of points to something I was mentioning earlier -- extracting some kind of schema across your data beforehand, to enable something like text2sql

Alternatively, you could implement a keyword search across your documents, that an agent could decide to use? 😅 or something like that
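
As a rough illustration of the keyword-search idea, here is a minimal sketch in plain Python. The `keyword_search` function and the record shape (`text` plus `metadata`) are assumptions made for this example, not part of any library. Unlike a top-k vector query, it returns every matching item, which is what a question like "give me all items that involve construction" needs.

```python
import re


def keyword_search(records, query):
    """Return every record whose text contains all query words.

    Matching is case-insensitive and on whole words only.
    """
    words = [w.lower() for w in re.findall(r"\w+", query)]
    hits = []
    for rec in records:
        tokens = set(re.findall(r"\w+", rec["text"].lower()))
        if all(w in tokens for w in words):
            hits.append(rec)
    return hits


# Hypothetical line items, for illustration only.
items = [
    {"text": "Approve construction bid for Main St resurfacing",
     "metadata": {"date": "2023-09-12"}},
    {"text": "Extend weekend library hours",
     "metadata": {"date": "2023-09-12"}},
    {"text": "Construction permit appeal, 5th Ave",
     "metadata": {"date": "2023-10-03"}},
]

matches = keyword_search(items, "construction")
```

A function like this could be exposed to an agent as one tool alongside the vector index, so the agent picks exhaustive keyword lookup or semantic retrieval depending on the question.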

bbmax

by text2sql, do you mean converting the documents and inserting them into sql

bbmax

so we can do actual queries on them?

Logan M

Yea, so like for example defining some kind of structured schema, using a pydantic program or similar to extract that schema, and then inserting into a db

Tbh it's a feature I've been wanting to add to the library at some point, it feels powerful lol

bbmax

yeah..... interesting but also in my case if it's xml I could just insert the agenda items right into sql
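
A minimal sketch of that direct route, using the standard-library `sqlite3` and `xml.etree.ElementTree` modules. The XML layout, the `agenda_items` table, and the `insert_agenda` helper are all assumptions for illustration; the real files will need their own column mapping.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical agenda XML; element and attribute names are assumptions.
SAMPLE_XML = """<agenda date="2023-09-12">
  <item number="1">
    <title>Road resurfacing</title>
    <description>Approve construction bid for Main St.</description>
  </item>
  <item number="2">
    <title>Library hours</title>
    <description>Extend weekend opening hours.</description>
  </item>
</agenda>"""


def insert_agenda(conn, xml_text):
    """Insert every line item of one agenda; return the number of rows added."""
    root = ET.fromstring(xml_text)
    rows = [
        (
            root.get("date"),
            int(item.get("number")),
            item.findtext("title", default=""),
            item.findtext("description", default=""),
        )
        for item in root.iter("item")
    ]
    conn.executemany(
        "INSERT INTO agenda_items (meeting_date, item_number, title, description) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE agenda_items ("
    "meeting_date TEXT, item_number INTEGER, title TEXT, description TEXT)"
)
inserted = insert_agenda(conn, SAMPLE_XML)

# "Give me all items that involve construction" as an exhaustive query.
construction = conn.execute(
    "SELECT meeting_date, item_number, title FROM agenda_items "
    "WHERE title LIKE ? OR description LIKE ? "
    "ORDER BY meeting_date, item_number",
    ("%construction%", "%construction%"),
).fetchall()
```

With the items in a table, a query returns all matches across every agenda instead of the top few, and date columns make "last X agendas" questions straightforward to filter.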

bbmax

and then maybe use openai to add keywords about the industry the line items pertain to

Logan M

That too!
