Do you work for LlamaIndex, or are you only a contributor?
As someone who is trying to learn LlamaIndex and start building LLMs with custom input, it's a good source of info.
@LoLiPoPMaN Hey! I'm on the same path of exploration (Ollama + local LLMs), so if you wish, we can be friends and share the knowledge 🙂
@LoLiPoPMaN Heads up, I just sent you a quick DM
Hey all, I’m also in the same boat and would love to collaborate on the topic of running a non-OpenAI setup locally, with an open-source LLM querying local data sources. Would you mind including me in the conversation @LoLiPoPMaN, @pikachu8887867, @Seldo?
Thanks in advance
For now we are only having the conversation here, so feel free to join in on our discussion. Has anyone tried to run Mistral 7B? It's currently the best open-source model.
@LoLiPoPMaN Yes, I've tried it. What can I say... For general conversation it is pretty good, but my goal is RAG and I was not able to fully evaluate its quality.
However, regarding Ollama in general: I don't know why, but the responses are super slow... I'm using 128 GB of RAM, and I don't know what to do to improve the performance.
Are you using an Apple M-series processor or Nvidia?
@LoLiPoPMaN I'm using Nvidia (I'm on Windows WSL)
but I think I will use CPU only, because we don't have a GPU on our production server
Same here (I have an RTX 3080).
@LoLiPoPMaN Have you tried chatting with it?
Yes, it works really well and fast.
But I used the Nvidia toolkit.
@LoLiPoPMaN Now I'm reading the Ollama trace and trying to understand whether it is using my GPU or not, because I'm not sure it does. Btw, are you using Ollama from WSL or in Windows itself?
I used the Ubuntu distro. I then installed Docker and the Nvidia toolkit... then I ran Ollama, pulled the model, and it worked with no problem.
I followed the Ollama documentation.
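If you want to point LlamaIndex at that container, something like this should be enough (untested sketch: it assumes the default Ollama port 11434, that you've already pulled the mistral model, and a recent llama-index-llms-ollama package):
```python
# Minimal sanity check that LlamaIndex can reach the Ollama server running in Docker.
# Assumptions: Ollama listens on the default port 11434 and "mistral" has been pulled.
from llama_index.llms.ollama import Ollama

llm = Ollama(model="mistral", base_url="http://localhost:11434", request_timeout=120.0)
print(llm.complete("Say hello in one sentence."))
```
If that prints a reply, the wiring is fine and any slowness comes from the model/hardware side.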
It is not utilising my GPU for some reason...
I think it is running in Docker anyway, isn't it?
I used Ollama on my Mac (Intel), so it's probably not using the GPU. My goal is also a simple POC: use LlamaIndex to implement a RAG scenario where I can have a conversation about local structured data (in a CSV for now). I want to be able to ask a question that returns a list of products that meet certain criteria (e.g. "what products are available in red", or "what products are available in blue that are between $2-3"). I've tried a sandwich model (get the input, pass it through an LLM to extract intent and parameters, then query my DB, then ask another LLM to generate a response with the query result as context). It works, but I feel it's not very flexible (I have to maintain a growing list of intents and examples of query->intent mapping). I'm now trying to run Ollama and LlamaIndex with a model that lets me have the narrow conversation without the middle part. It feels like I'm close, but I keep going down rabbit holes 🙂
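For that "without the middle part" direction, what I'm trying looks roughly like this (untested sketch: the import paths are for newer LlamaIndex releases, products.csv and the embedding model are just placeholders, and the Settings lines are there so it doesn't fall back to OpenAI):
```python
# Plain RAG over a local CSV with a fully local stack (no OpenAI).
# Placeholders/assumptions: "products.csv", the "mistral" model, the BGE embedding model.
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

Settings.llm = Ollama(model="mistral", request_timeout=120.0)
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# The CSV rows become text chunks; fine for lookups, weak for numeric filters.
documents = SimpleDirectoryReader(input_files=["products.csv"]).load_data()
index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine()
print(query_engine.query("What products are available in red?"))
```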
@Ramin Hi! Sorry, what do you mean by the middle part?
My main concern about local models is that they are super slow 😢
At least in my case, and at least for RAG. Maybe I’m doing something wrong.
the middle part being the query of the database... the top part transforms the user input into some intent and parameters, for example "intent=product_query, params={"color": "red", "price": "2-3"}", then the middle part runs that query against SQL or a local file (CSV) or whatever to return some values, which are fed into the lower LLM, which generates the output that I send back to the user
@Ramin Okay, I did not get this part: « the narrow conversation without the middle part ».
So you transform a user query in the top part and send it directly to the LLM and receive an answer, without extracting context from your sources?
In this case it won’t be a RAG app, but a normal user --> LLM flow. But why do you need to transform your query then?
Let me clarify...
user input --> LLM1 (with a prompt such as "Given this text {input}, what is the intent and what are the parameters?")
LLM1 output (the intent and query params, e.g. "intent=product_inquiry, params={'color': 'red'}") goes to my DB query
db query result --> LLM2 (with a prompt such as "Given this text {db query result}, return the result in a friendly format as part of the conversation")
LLM2 output --> user
I believe that's a typical sandwich RAG model
In other words, I can't fit all my product listings into the context window, and if I normalize my structured info (CSV columns) into one long text per product, it won't necessarily return good results (e.g. it won't be able to do math on it, such as "products that cost between $2-4"), so I have to extract the intent and parameters so I can run my query and return exact product listings.
I haven't been able to find a better solution other than this, but if you have ideas, please share!
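To make it concrete, the whole sandwich in code is roughly this (untested sketch: the prompts, the JSON shape and query_products() are made-up placeholders, not LlamaIndex APIs, and in practice LLM1's output needs much more robust parsing than a bare json.loads):
```python
# Sketch of the LLM1 -> DB query -> LLM2 "sandwich" described above.
import json
from llama_index.llms.ollama import Ollama  # assumes a local Ollama server

llm = Ollama(model="mistral", request_timeout=120.0)

EXTRACT_PROMPT = (
    "Given this text: {input}\n"
    'Return only JSON like {{"intent": "product_query", "params": {{"color": "red", "price": "2-3"}}}}'
)
ANSWER_PROMPT = (
    "Given these matching products: {rows}\n"
    "Answer the user in a friendly, conversational way."
)

def query_products(params: dict) -> list[dict]:
    # Placeholder for the real "middle part" (SQL/CSV lookup using the params).
    return [{"name": "Red mug", "color": params.get("color"), "price": 2.5}]

user_input = "What products are available in red between $2 and $3?"
parsed = json.loads(llm.complete(EXTRACT_PROMPT.format(input=user_input)).text)  # LLM1
rows = query_products(parsed.get("params", {}))                                  # middle part
print(llm.complete(ANSWER_PROMPT.format(rows=rows)))                             # LLM2
```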
LlamaIndex also offers a SQL agent, if I am not mistaken... Would it be more practical to insert the CSV into a SQL DB and then query that?
Just an observation. Other than that, for data interpretation this sure seems like an extremely useful use case.
Yeah, good observation. That was my next rabbit-hole journey 🙂
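Roughly what I have in mind for that (untested sketch: the import paths are for recent LlamaIndex versions, products.csv and the "products" table are placeholders, and it assumes Settings.llm already points at a local model, otherwise it defaults to OpenAI):
```python
# Load the CSV into SQLite and let LlamaIndex translate questions into SQL.
import pandas as pd
from sqlalchemy import create_engine
from llama_index.core import SQLDatabase
from llama_index.core.query_engine import NLSQLTableQueryEngine

engine = create_engine("sqlite:///products.db")
pd.read_csv("products.csv").to_sql("products", engine, if_exists="replace", index=False)

sql_database = SQLDatabase(engine, include_tables=["products"])
query_engine = NLSQLTableQueryEngine(sql_database=sql_database, tables=["products"])

# Numeric filters become WHERE clauses instead of "math on text".
print(query_engine.query("Which products are available in blue between $2 and $3?"))
```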
Do you guys use any vector databases? Do they improve the speed and quality of retrieval?
I use Qdrant. It's fast. Regarding quality, it depends on your chunks and other factors.
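For reference, wiring Qdrant into LlamaIndex looks roughly like this (untested sketch: the local on-disk Qdrant path, the collection name and the ./data folder are placeholders, and it assumes Settings already points at local LLM/embedding models):
```python
# Plug Qdrant in as the vector store behind a LlamaIndex index.
import qdrant_client
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, StorageContext
from llama_index.vector_stores.qdrant import QdrantVectorStore

client = qdrant_client.QdrantClient(path="./qdrant_data")  # or url="http://localhost:6333"
vector_store = QdrantVectorStore(client=client, collection_name="products")
storage_context = StorageContext.from_defaults(vector_store=vector_store)

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
print(index.as_query_engine().query("What products are available in red?"))
```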