- Weaviate (and other vector store integrations) is mostly needed when either a) your index is very large and the default in-memory solution starts to slow down, or b) you want hosted storage (so yes on that one)
- It really is just text, but with some extra sugar on top (i.e. metadata) that some loaders set up for you. Really, you can skip loaders and create your own `Document` objects if you want. One thing to keep in mind is that documents are broken into chunks when put inside an index, and those chunks/nodes inherit the metadata of the source document
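
To make the chunk/metadata point concrete, here is a rough sketch in plain Python. It deliberately does not use the library's own classes: `Document`, `Node`, `split_into_nodes`, and the fixed-size character split are all hypothetical stand-ins, just to show the idea of each chunk carrying a copy of its source document's metadata.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Toy stand-in for a document: just text plus optional metadata."""
    text: str
    metadata: dict = field(default_factory=dict)


@dataclass
class Node:
    """Toy stand-in for a chunk/node created when a document is indexed."""
    text: str
    metadata: dict


def split_into_nodes(doc: Document, chunk_size: int = 40) -> list[Node]:
    """Split a document into fixed-size character chunks.

    Each node receives its own copy of the source document's metadata.
    (Real splitters are smarter about sentence/token boundaries.)
    """
    return [
        Node(text=doc.text[i:i + chunk_size], metadata=dict(doc.metadata))
        for i in range(0, len(doc.text), chunk_size)
    ]


# Build a document by hand, no loader involved.
doc = Document(
    text="Vector stores hold embeddings. Loaders are optional. " * 3,
    metadata={"source": "notes.txt", "author": "me"},
)
nodes = split_into_nodes(doc)

for node in nodes:
    print(repr(node.text), node.metadata)
```

Every printed chunk shows the same `source` and `author` values as the original document, which is what lets you trace or filter results by where they came from.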
Not sure what you mean by most effective. A lot depends on your data and use case. But in general, tossing a bunch of documents into an index and querying it will get you pretty far. If you have a lot of documents, you can adjust the top-k (how many chunks are retrieved per query), or consider more complex query engines that work across multiple indexes (router query engines, sub-question query engines, SQL query engines, agents, etc.)
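
If it helps to see what the top-k knob actually does, here is a toy sketch in plain Python. It is not the library's retriever: `bag_of_words`, `cosine`, and `retrieve` are hypothetical helpers, and word-count cosine similarity stands in for real embeddings. The only point is that retrieval ranks chunks by similarity to the query and keeps the best `top_k`.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Crude stand-in for an embedding: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks most similar to the query, best first."""
    q = bag_of_words(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, bag_of_words(c)), reverse=True)
    return ranked[:top_k]


chunks = [
    "weaviate is a hosted vector store",
    "loaders attach metadata to documents",
    "a router query engine picks between indexes",
    "the default vector store keeps everything in memory",
]

query = "which vector store is hosted"
top1 = retrieve(query, chunks, top_k=1)
top3 = retrieve(query, chunks, top_k=3)

print(top1)
print(top3)
```

A larger top-k gives the model more context to answer from, at the cost of pulling in less relevant chunks, so it is worth tuning against your own data.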