I'm using RssReader().load_data([url]) to load an XML file. Currently it's creating a document per line item, and each document's get_content() is only like 169 characters. Is this the optimal way? Especially since I'm going to be loading a lot more XMLs and expect them to be searchable (also, I want to change what metadata they're automatically using).
Should I basically create my own loader and use my own node parser?
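Something like this is what I'm imagining -- a rough sketch, assuming feedparser plus a recent llama_index where Document takes text=/metadata= (older versions used extra_info= instead):

```python
# Rough sketch of a custom RSS loader. Assumes feedparser is installed and
# a recent llama_index where Document takes text=/metadata=.
import feedparser
from llama_index.core import Document
from llama_index.core.node_parser import SentenceSplitter


def load_rss(urls: list[str]) -> list[Document]:
    """One Document per feed entry, with exactly the metadata I want."""
    docs = []
    for url in urls:
        feed = feedparser.parse(url)
        for entry in feed.entries:
            # Join title + summary so get_content() isn't a ~169-char fragment.
            text = f"{entry.get('title', '')}\n\n{entry.get('summary', '')}"
            docs.append(
                Document(
                    text=text,
                    metadata={
                        "source": url,
                        "link": entry.get("link", ""),
                        "published": entry.get("published", ""),
                    },
                )
            )
    return docs


# Chunk into larger searchable nodes instead of one node per line item.
nodes = SentenceSplitter(chunk_size=512).get_nodes_from_documents(
    load_rss(["https://example.com/feed.xml"])
)
```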
Hmm, the first question seems like a time-based thing? I.e. given the last X agendas, what's next?
The second kind of points to something I was mentioning earlier -- extracting some kind of schema across your data beforehand, to enable something like text2sql.
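Rough sketch of what the query side could look like once the extracted data is in a db -- assuming recent llama_index imports; the "agendas" table and its columns are made up:

```python
# Sketch of text2sql over the extracted schema. Assumes recent llama_index;
# the "agendas" table and its columns are hypothetical.
from sqlalchemy import create_engine
from llama_index.core import SQLDatabase
from llama_index.core.query_engine import NLSQLTableQueryEngine

engine = create_engine("sqlite:///agendas.db")
sql_database = SQLDatabase(engine, include_tables=["agendas"])

query_engine = NLSQLTableQueryEngine(sql_database=sql_database)
response = query_engine.query("Given the last 5 agendas, what's likely next?")
print(response)
```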
Alternatively, you could implement a keyword search across your documents that an agent could decide to use? Or something like that.
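e.g. something like this -- FunctionTool is real llama_index API, but the search logic and the sample docs are just placeholders:

```python
# Sketch: expose a naive keyword search as a tool an agent can choose.
from llama_index.core import Document
from llama_index.core.tools import FunctionTool

docs = [
    Document(text="Agenda: budget review, Q2 forecasts, headcount"),
    Document(text="Agenda: hiring committee sync, interview loop changes"),
]  # stand-in data


def keyword_search(keyword: str) -> str:
    """Return documents containing the keyword."""
    hits = [d.text for d in docs if keyword.lower() in d.text.lower()]
    return "\n---\n".join(hits) or "no matches"


keyword_tool = FunctionTool.from_defaults(fn=keyword_search)
# A ReActAgent (or similar) could then decide when to call keyword_tool.
```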
Yea, so like for example defining some kind of structured schema, using a pydantic program or similar to extract that schema, and then inserting into a db
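Quick sketch of that flow -- assuming the OpenAIPydanticProgram import path from recent llama_index (it ships as a separate llama-index-program-openai package); the Agenda model and sample text are made up:

```python
# Sketch: extract a structured schema with a pydantic program, then insert
# the result into sqlite. The Agenda model and sample text are hypothetical.
import sqlite3
from pydantic import BaseModel
from llama_index.program.openai import OpenAIPydanticProgram


class Agenda(BaseModel):
    title: str
    date: str
    topics: list[str]


program = OpenAIPydanticProgram.from_defaults(
    output_cls=Agenda,
    prompt_template_str="Extract the agenda as structured data:\n{text}",
)
agenda = program(text="Budget review, 2024-03-12: Q2 forecasts, headcount.")

conn = sqlite3.connect("agendas.db")
conn.execute("CREATE TABLE IF NOT EXISTS agendas (title TEXT, date TEXT, topics TEXT)")
conn.execute(
    "INSERT INTO agendas VALUES (?, ?, ?)",
    (agenda.title, agenda.date, ", ".join(agenda.topics)),
)
conn.commit()
```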
Tbh it's a feature I've been wanting to add to the library at some point, it feels powerful lol