I got my API working to connect to llama.cpp via api_like_OAI.py with some help. Now I'm trying to add a vector store and I'm a little at a loss. Any pointers? Roughly what I have in mind is sketched below.
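For context, this is the shape of what I'm trying to get working, untested: a llama-index vector store that talks to the local api_like_OAI.py server instead of OpenAI. I'm assuming a llama-index version that ships the OpenAILike wrapper; the api_base URL, model name, and document path are all placeholders for my setup.

```python
# Sketch only -- llama-index pointed at the local OpenAI-compatible server,
# with a local HuggingFace model for embeddings so nothing goes to OpenAI.
from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext
from llama_index.embeddings import HuggingFaceEmbedding
from llama_index.llms import OpenAILike

llm = OpenAILike(
    api_base="http://localhost:8081/v1",  # wherever api_like_OAI.py is listening
    api_key="sk-no-key-needed",           # dummy key; the local server ignores it
    model="llama-2",                      # placeholder model name
)
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en")

service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)
documents = SimpleDirectoryReader("./docs").load_data()  # placeholder path
index = VectorStoreIndex.from_documents(documents, service_context=service_context)

query_engine = index.as_query_engine()
print(query_engine.query("What does the document say about X?"))
```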
Anyone know how to flush the buffer after an upload? After upload and document ingestion, is there a way to dump the buffer in Python? If I upload and ingest a document and then send a query, it goes through document ingestion again every time unless I reload.
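What I'm hoping exists is a way to persist the built index to disk and reload it on later runs, so ingestion only happens once. Something along these lines, assuming llama-index's default storage API; "./storage" and "./docs" are placeholder paths:

```python
# Sketch -- build the index once, persist it, and reload on subsequent runs
# instead of re-ingesting the documents every time.
import os
from llama_index import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
    load_index_from_storage,
)

PERSIST_DIR = "./storage"  # placeholder persistence directory

if os.path.exists(PERSIST_DIR):
    # Reload the already-built index; skips ingestion entirely.
    storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
    index = load_index_from_storage(storage_context)
else:
    # First run: ingest the documents and save the result to disk.
    documents = SimpleDirectoryReader("./docs").load_data()
    index = VectorStoreIndex.from_documents(documents)
    index.storage_context.persist(persist_dir=PERSIST_DIR)
```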
It's downloading another model for some reason... could it be that the Llama 2 model I converted with llama.cpp wasn't compatible? Also, after it spits out a lot of LLM stat info, I noticed it saying "Could not load OpenAIEmbedding. Using HuggingFaceBgeEmbeddings with model_name=BAAI/bge-small-en. If you intended to use OpenAI, please check your OPENAI_API_KEY." Then, after downloading the tokenizer and other stuff from Hugging Face, it said "Could not load OpenAI model. Using default LlamaCPP=llama2-13b-chat. If you intended to use OpenAI, please check your OPENAI_API_KEY." and started downloading the new model from Hugging Face. ... 🤦‍♂️ Does that sound about right? lol
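If I'm reading those messages right, it can't find an OPENAI_API_KEY, so it falls back to its defaults: BGE for embeddings and a llama2-13b-chat GGUF it pulls from Hugging Face for the LLM, which would explain the surprise download. Maybe passing my already-converted model in explicitly would skip that? A sketch, assuming llama-index's LlamaCPP wrapper; the model path is a placeholder for wherever my converted file lives:

```python
# Sketch -- point llama-index at the local converted model explicitly so it
# never tries OpenAI and never downloads the default llama2-13b-chat GGUF.
from llama_index import ServiceContext
from llama_index.embeddings import HuggingFaceEmbedding
from llama_index.llms import LlamaCPP

llm = LlamaCPP(model_path="./models/llama-2-13b-chat.Q4_K_M.gguf")  # placeholder path
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en")
service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)
```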