If you want to use RAG:
- Use a library such as BeautifulSoup to scrape your websites.
- Filter your data appropriately (for example, drop navigation menus, scripts, and other boilerplate).
- Convert it to Markdown, or use the `HTMLNodeParser` directly.
- Create your `Document` objects and build your `VectorStore` on top of them.
- Enjoy!
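Here is a minimal, dependency-free sketch of the scrape, filter, and convert-to-Markdown steps. It uses Python's standard-library `html.parser` as a stand-in for BeautifulSoup. The `SKIP_TAGS` set and the tag-to-Markdown mapping are assumptions that you would tune for your own sites, and `MarkdownExtractor` and `html_to_markdown` are hypothetical names, not part of any library.

```python
from html.parser import HTMLParser

# Tags whose contents are treated as noise (an assumption; tune per site).
SKIP_TAGS = {"script", "style", "nav", "footer", "header", "aside"}
# Block-level tags we keep, mapped to the Markdown prefix emitted for them.
BLOCK_PREFIX = {"h1": "# ", "h2": "## ", "h3": "### ", "p": "", "li": "- "}


class MarkdownExtractor(HTMLParser):
    """Hypothetical helper: collects text from block tags as rough Markdown lines."""

    def __init__(self):
        super().__init__()
        self.lines = []        # finished Markdown lines
        self._skip_depth = 0   # > 0 while inside a SKIP_TAGS element
        self._current = None   # text buffer for the currently open block tag
        self._prefix = ""

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_PREFIX and self._skip_depth == 0:
            self._current = []
            self._prefix = BLOCK_PREFIX[tag]

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK_PREFIX and self._current is not None:
            # Collapse runs of whitespace left over from the HTML layout.
            text = " ".join("".join(self._current).split())
            if text:
                self.lines.append(self._prefix + text)
            self._current = None

    def handle_data(self, data):
        # Only keep text that sits inside a kept block and outside skipped tags.
        if self._current is not None and self._skip_depth == 0:
            self._current.append(data)


def html_to_markdown(html):
    """Filter a page down to headings, paragraphs, and list items as Markdown."""
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.lines)


page = (
    "<html><head><style>p{}</style></head><body>"
    "<nav><p>Menu</p></nav><h1>Title</h1><p>Hello <b>world</b>.</p>"
    "<script>var x = 1;</script><ul><li>One</li><li>Two</li></ul>"
    "<footer><p>Copyright</p></footer></body></html>"
)
markdown = html_to_markdown(page)
print(markdown)
```

With BeautifulSoup itself, the filtering step is commonly done by calling `decompose()` on the unwanted tags and then `get_text()` on what remains.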
(There is a lot more you can do to improve performance, but this should give you a baseline to work from.)
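The `Document`/`VectorStore` step can be illustrated with a toy, standard-library-only sketch. `Doc`, `embed`, and `InMemoryVectorStore` are hypothetical stand-ins, not LlamaIndex's classes. The "embedding" here is just a word-count vector compared by cosine similarity, and a real setup would replace it with an embedding model and a proper vector store. The URLs are placeholder metadata.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a RAG Document: text plus metadata (e.g. source URL)."""
    text: str
    metadata: dict = field(default_factory=dict)


def embed(text):
    """Toy 'embedding': lowercase word counts. Swap in a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Hypothetical minimal vector store: keeps (vector, doc) pairs in a list."""

    def __init__(self):
        self._items = []

    def add(self, docs):
        for doc in docs:
            self._items.append((embed(doc.text), doc))

    def query(self, question, top_k=2):
        """Return the top_k documents most similar to the question."""
        q = embed(question)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [doc for _, doc in ranked[:top_k]]


docs = [
    Doc("Our store opens at 9am and closes at 6pm.", {"url": "https://example.com/hours"}),
    Doc("Returns are accepted within 30 days with a receipt.", {"url": "https://example.com/returns"}),
    Doc("We ship to all EU countries.", {"url": "https://example.com/shipping"}),
]
store = InMemoryVectorStore()
store.add(docs)
best = store.query("When does the store close?", top_k=1)[0]
print(best.metadata["url"])
```

In LlamaIndex the corresponding step is typically `VectorStoreIndex.from_documents(documents)`, which calls an embedding model for you. Check the documentation for the version you have installed.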