Do you work for LlamaIndex, or are you only a contributor?
As someone who is trying to learn LlamaIndex and start building LLMs with custom input, it's a good source of info.
@LoLiPoPMaN Hey! I'm on the same path of exploration (Ollama + local LLMs), so if you wish, we can be friends and share the knowledge 🙂
@LoLiPoPMaN Heads up, I just sent you a quick DM
Hey all, I’m also in the same boat and would love to collaborate on the topic of running a non-OpenAI setup locally, with an open-source LLM querying local data sources. Would you mind including me in the conversation @LoLiPoPMaN, @pikachu8887867, @Seldo?
Thanks in advance
For now we are only having the conversation here, so feel free to join in on our discussion. Has anyone tried to run Mistral 7B? It's currently the best open-source model.
@LoLiPoPMaN Yes, I've tried it. What can I say... For general conversation it is pretty good, but my goal is RAG and I was not able to fully evaluate its quality.
However, regarding Ollama in general: I don't know why, but the responses are super slow... I'm using 128 GB of RAM, and I don't know what to do to improve the performance.
Are you using an Apple M-series processor or Nvidia?
@LoLiPoPMaN I'm using Nvidia (I'm on Windows WSL)
but I think I will use CPU only, because we don't have a GPU on our production server
Same here (I have an RTX 3080).
@LoLiPoPMaN Have you tried chatting with it?
Yes, it works really well and fast.
But I used the Nvidia toolkit.
@LoLiPoPMaN Now I'm reading the Ollama trace and trying to understand whether it is using my GPU or not, because I'm not sure it does. Btw, are you using Ollama from WSL or in Windows itself?
I used the Ubuntu distro. I then installed Docker and the Nvidia toolkit... then I ran Ollama, pulled the model, and it worked with no problem.
I followed the Ollama documentation.
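If you want to point LlamaIndex at that container, something like this should be enough (untested sketch: it assumes the default Ollama port 11434, that you've already pulled the mistral model, and a recent llama-index-llms-ollama package):
```python
# Minimal sanity check that LlamaIndex can reach the Ollama server running in Docker.
# Assumptions: Ollama listens on the default port 11434 and "mistral" has been pulled.
from llama_index.llms.ollama import Ollama

llm = Ollama(model="mistral", base_url="http://localhost:11434", request_timeout=120.0)
print(llm.complete("Say hello in one sentence."))
```
If that prints a reply, the wiring is fine and any slowness comes from the model/hardware side.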
It is not utilising my GPU for some reason...
I think it is running in Docker anyway, isn't it?
I used Ollama on my Mac (Intel), so it's probably not using the GPU. My goal is also a simple POC: use LlamaIndex to implement a RAG scenario where I can have a conversation about local structured data (in a CSV for now). I want to be able to ask a question that returns a list of products that meet certain criteria (e.g. "what products are available in red", or "what products are available in blue that are between $2-3"). I've tried a sandwich model (get the input, pass it through an LLM to extract intent and parameters, then query my DB, then ask another LLM to generate a response with the query result as context). It works, but I feel it's not very flexible (I have to maintain a growing list of intents and examples of query->intent mapping). I'm now trying to run Ollama and LlamaIndex with a model that lets me have the narrow conversation without the middle part. It feels like I'm close, but I keep going down rabbit holes 🙂
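For that "without the middle part" direction, what I'm trying looks roughly like this (untested sketch: the import paths are for newer LlamaIndex releases, products.csv and the embedding model are just placeholders, and the Settings lines are there so it doesn't fall back to OpenAI):
```python
# Plain RAG over a local CSV with a fully local stack (no OpenAI).
# Placeholders/assumptions: "products.csv", the "mistral" model, the BGE embedding model.
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

Settings.llm = Ollama(model="mistral", request_timeout=120.0)
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# The CSV rows become text chunks; fine for lookups, weak for numeric filters.
documents = SimpleDirectoryReader(input_files=["products.csv"]).load_data()
index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine()
print(query_engine.query("What products are available in red?"))
```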
@Ramin Hi! Sorry, what do you mean by the middle part?
My main concern about local models is that they are super slow 😢
At least in my case, and at least for RAG. Maybe I’m doing something wrong.
the middle part being the query of the database... the top part transforms the user input into some intent and parameters, for example "intent=product_query, params={"color": "red", "price": "2-3"}", then the middle part runs that query against SQL or a local file (CSV) or whatever to return some values, which are fed into the lower LLM, which generates the output that I send back to the user
@Ramin Okay, I did not get this part: « the narrow conversation without the middle part ».
So you transform a user query in the top part and send it directly to the LLM and receive an answer, without extracting context from your sources?
In this case it won’t be a RAG app, but a normal user --> LLM flow. But why do you need to transform your query then?
Let me clarify...
user input --> LLM1 (with a prompt such as "Given this text {input}, what is the intent and what are the parameters?")
LLM1 output (the intent and query params, e.g. "intent=product_inquiry, params={'color': 'red'}") goes to my DB query
db query result --> LLM2 (with a prompt such as "Given this text {db query result}, return the result in a friendly format as part of the conversation")
LLM2 output --> user
I believe that's a typical sandwich RAG model
In other words, I can't fit all my product listings into the context window, and if I normalize my structured info (CSV columns) into one long text per product, it won't necessarily return good results (e.g. it won't be able to do math on it, such as "products that cost between $2-4"), so I have to extract the intent and parameters so I can run my query and return exact product listings.
I haven't been able to find a better solution other than this, but if you have ideas, please share!
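To make it concrete, the whole sandwich in code is roughly this (untested sketch: the prompts, the JSON shape and query_products() are made-up placeholders, not LlamaIndex APIs, and in practice LLM1's output needs much more robust parsing than a bare json.loads):
```python
# Sketch of the LLM1 -> DB query -> LLM2 "sandwich" described above.
import json
from llama_index.llms.ollama import Ollama  # assumes a local Ollama server

llm = Ollama(model="mistral", request_timeout=120.0)

EXTRACT_PROMPT = (
    "Given this text: {input}\n"
    'Return only JSON like {{"intent": "product_query", "params": {{"color": "red", "price": "2-3"}}}}'
)
ANSWER_PROMPT = (
    "Given these matching products: {rows}\n"
    "Answer the user in a friendly, conversational way."
)

def query_products(params: dict) -> list[dict]:
    # Placeholder for the real "middle part" (SQL/CSV lookup using the params).
    return [{"name": "Red mug", "color": params.get("color"), "price": 2.5}]

user_input = "What products are available in red between $2 and $3?"
parsed = json.loads(llm.complete(EXTRACT_PROMPT.format(input=user_input)).text)  # LLM1
rows = query_products(parsed.get("params", {}))                                  # middle part
print(llm.complete(ANSWER_PROMPT.format(rows=rows)))                             # LLM2
```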
LlamaIndex also offers a SQL agent, if I am not mistaken... Would it be more practical to insert the CSV into a SQL DB and then query that?
Just an observation. Other than that, for data interpretation this sure seems like an extremely useful use case.
Yeah, good observation. That was my next rabbit-hole journey 🙂
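Roughly what I have in mind for that (untested sketch: the import paths are for recent LlamaIndex versions, products.csv and the "products" table are placeholders, and it assumes Settings.llm already points at a local model, otherwise it defaults to OpenAI):
```python
# Load the CSV into SQLite and let LlamaIndex translate questions into SQL.
import pandas as pd
from sqlalchemy import create_engine
from llama_index.core import SQLDatabase
from llama_index.core.query_engine import NLSQLTableQueryEngine

engine = create_engine("sqlite:///products.db")
pd.read_csv("products.csv").to_sql("products", engine, if_exists="replace", index=False)

sql_database = SQLDatabase(engine, include_tables=["products"])
query_engine = NLSQLTableQueryEngine(sql_database=sql_database, tables=["products"])

# Numeric filters become WHERE clauses instead of "math on text".
print(query_engine.query("Which products are available in blue between $2 and $3?"))
```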
Do you guys use any vector databases? Do they improve the speed and quality of retrieval?
I use Qdrant. It's fast. Regarding quality, it depends on your chunks and other factors.
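For reference, wiring Qdrant into LlamaIndex looks roughly like this (untested sketch: the local on-disk Qdrant path, the collection name and the ./data folder are placeholders, and it assumes Settings already points at local LLM/embedding models):
```python
# Plug Qdrant in as the vector store behind a LlamaIndex index.
import qdrant_client
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, StorageContext
from llama_index.vector_stores.qdrant import QdrantVectorStore

client = qdrant_client.QdrantClient(path="./qdrant_data")  # or url="http://localhost:6333"
vector_store = QdrantVectorStore(client=client, collection_name="products")
storage_context = StorageContext.from_defaults(vector_store=vector_store)

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
print(index.as_query_engine().query("What products are available in red?"))
```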