I'd also like some consistency, and since the main models (e.g. GPT-4) are constantly being updated, I'd rather avoid those.
Or which vector store method should I use if I want answers in under 10 seconds, while still letting the LLM generate the answer from multiple sources?
Probably a better question is: what types of data are you indexing, what types of questions are you asking, and how much data do you have?
All of those questions influence the approach to take
Thanks! I'm scraping manuals for a financial application. My chatbot should be able to answer questions for which the answer can be found in these manuals. I have 150 text files with an average size of 3 kB.
Also, these manuals contain an FAQ section. How can I make sure my chatbot answers those questions more reliably? The question-answer pairs are usually quite short and might get lost in the noise due to the chunk size.
@WhiteFang_Jr Could you maybe help me out further here?
I think for starters you can try different chunk sizes like 512, 256, or 1024 and see which one gives you better results without the important bits getting missed.
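A minimal sketch of how you could compare chunk sizes with LlamaIndex (assuming the current `llama_index.core` package layout, a local `./manuals` folder, and a sample question; adjust those to your setup):

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, Settings
from llama_index.core.node_parser import SentenceSplitter

# Assumed location of the scraped manual text files
documents = SimpleDirectoryReader("./manuals").load_data()

# Rebuild the index with a few candidate chunk sizes and compare the answers
for chunk_size in (256, 512, 1024):
    Settings.node_parser = SentenceSplitter(chunk_size=chunk_size)
    index = VectorStoreIndex.from_documents(documents)
    response = index.as_query_engine().query("How do I reset my password?")  # sample question
    print(chunk_size, str(response)[:200])
```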
Since you're open to using OpenAI, I'd suggest parsing your files with LlamaParse using GPT-4o. That can also help extract the details more accurately.
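A rough sketch of what that could look like (this assumes a LlamaCloud API key in `LLAMA_CLOUD_API_KEY` and an OpenAI key in `OPENAI_API_KEY`; the `gpt4o_mode` option is how some llama-parse versions route parsing through GPT-4o, so check your installed version):

```python
from llama_parse import LlamaParse

# Assumes LLAMA_CLOUD_API_KEY and OPENAI_API_KEY are set in the environment.
# gpt4o_mode is available in some llama-parse releases; verify against your version.
parser = LlamaParse(
    result_type="markdown",  # or "text"
    gpt4o_mode=True,
)

# Hypothetical file path; point this at one of the scraped manuals
documents = parser.load_data("./manuals/user_guide.txt")
```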
Thanks! And then which embedding model would you suggest?
Oh nvm, that way you use GPT-4o as your "embedding" model as well. Can I also use LlamaParse without a LlamaCloud API key, i.e. with only the OpenAI API key? I'm implementing this for a company, and there's a lot of regulation to get through before I'd be allowed to use the LlamaCloud API.
No, it will not use GPT-4o as the embedding model.
LlamaParse only handles the extraction part for your text files.
For the embedding model, I would suggest you use text-embedding-3-large;
it is much better.
After parsing, the embedding-generation step takes place, and that's where you'll need the embedding model.
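A minimal sketch of plugging that embedding model in (assuming the llama-index OpenAI embeddings integration is installed and `OPENAI_API_KEY` is set):

```python
from llama_index.core import Settings
from llama_index.embeddings.openai import OpenAIEmbedding

# Use OpenAI's text-embedding-3-large for the embedding-generation step
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-large")
```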
Ah okay thanks, and then I just use the standard VectorStoreIndex class for that?
Is the API key required for the parsing, though?
Yes, it is required, since parsing only happens once the request is authenticated on the cloud.
Ah, that's unfortunately not an option yet. In that case, would you just suggest playing around with the chunk size used by VectorStoreIndex?
And then use GPT-4o and text-embedding-3-large?
Yeah, that should give better answers. Also, if you find the responses don't meet your requirements, try reducing the chunk size; the default is 1024.
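Putting the pieces from this thread together, a sketch of the whole setup might look like this (the `./manuals` folder, the `similarity_top_k` value, and the sample question are assumptions; only the OpenAI key is needed since LlamaParse is skipped):

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

# GPT-4o for answering, text-embedding-3-large for retrieval
Settings.llm = OpenAI(model="gpt-4o")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-large")
Settings.chunk_size = 512  # reduced from the default of 1024

# Load the ~150 scraped manual files and build the index
documents = SimpleDirectoryReader("./manuals").load_data()
index = VectorStoreIndex.from_documents(documents)

# Let the answer draw on several chunks (helps when FAQ entries are short)
query_engine = index.as_query_engine(similarity_top_k=4)
print(query_engine.query("How do I reset my password?"))
```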
Alright, thank you very much for the help! Have a good one!
If I have more than 40,000 files, of which about 80% are 1 kB, 10% are 2-3 kB, and the rest are < 10 kB, what's the best chunk size?