This is probably a dumb question, or at the very least a very beginner question. I've been using Python to play and test with LangChain, RAG, and running an LLM locally (via HuggingFacePipeline/HuggingFaceHub). I'm curious...
  1. How might LlamaIndex relate or fit into this? I mean, is it a framework that competes with LangChain? I see in the docs it can work with LangChain, so I'm not sure what the purpose of one vs. the other is.
  2. If you use LlamaIndex, can it replace LangChain or is it supposed to work with another framework for the rest of the pipeline? Like, LlamaIndex is focused on connecting to our data sources, but then you pass that info to a LangChain pipeline?
  3. Since I'm looking to run an LLM locally, Ollama has popped onto my radar. In my mental model, I'm not sure where it fits in. I don't think it's a framework like LlamaIndex/LangChain, but so far I haven't needed it (Ollama), so I'm not sure what "problem" it solves beyond using HuggingFacePipeline/HuggingFaceHub to download and use models locally.
Thanks!
  1. It's kind of competing? LlamaIndex has a much higher focus on supporting apps that use RAG/context augmentation. We try to make it as easy as possible to get started, and easy to customize. LangChain (and llama-index) also support the concept of tools. This is usually what people mean when they say using them together (i.e. creating a LangChain tool for a llama-index query engine, and using that in an agent; see the sketch after this list).
  2. It can definitely replace LangChain imo. But as described above, some people prefer using them together.
  3. Ollama is just another LLM. Both LlamaIndex and LangChain support using Ollama (and many others) as your LLM in your pipeline.
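
To make point 1 concrete, here's a minimal sketch of what "using them together" can look like: a llama-index query engine wrapped as a LangChain tool and handed to a LangChain agent. This is illustrative only - the ./data folder, tool name, and question are placeholders, and it assumes OpenAI credentials for both the default embeddings and the agent LLM.

Python
from langchain.agents import AgentType, Tool, initialize_agent
from langchain.chat_models import ChatOpenAI
from llama_index import SimpleDirectoryReader, VectorStoreIndex

# Build a llama-index query engine over some local documents ("./data" is a placeholder)
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

# Expose the query engine to LangChain as a plain tool
docs_tool = Tool(
    name="docs_qa",
    func=lambda q: str(query_engine.query(q)),
    description="Answers questions about the documents in ./data",
)

# Use that tool inside a LangChain agent
agent = initialize_agent(
    tools=[docs_tool],
    llm=ChatOpenAI(temperature=0),
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
)
agent.run("What do the docs say about onboarding?")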
Ollama FEELS more like a Docker-style runtime that only supports specific LLMs. You actually called it "another LLM" - so am I thinking about Ollama incorrectly? Their list of models https://ollama.ai/library makes me think of it less like a model and more like a platform for specific LLMs.
That said, so far, a coworker has run Ollama, and the responses are MUCH faster than anything I've downloaded via LangChain (regardless of model size). 🀷
Yeah, Ollama is basically an easy way to run several different models locally

When I said Ollama is an LLM, I meant more in the abstraction sense

For example

Python
from llama_index.llms import Ollama

# Point llama-index at a locally running Ollama server (request_timeout is in seconds)
llm = Ollama(model="llama2", request_timeout=300)
llm.complete("Hello!")
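(For the snippet above to work, the Ollama server needs to be running locally and the model pulled first, e.g. with ollama pull llama2; the Ollama class just talks to that local server.)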
Is there a difference between using a model through Ollama vs. using it directly from HF? I'm asking because the Ollama response was so much faster, but you're also more limited in model selection. Plus, outside of the speed, it seems you're adding another service into the chain and relying on them to keep their models up to date, supported, etc.
Ollama is heavily optimized for running without a GPU; HuggingFace is not.

If you had a GPU, HuggingFace would likely be faster.
Ollama is a wrapper around llama.cpp
It is true, it's adding another dependency πŸ™‚ Usually I use Ollama for local testing, and something more prod-ready for deployment (HuggingFace, vLLM, TGI)
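
For contrast, here's roughly what the direct-HuggingFace path looks like in llama-index. A sketch only: the model name and generation settings are illustrative, and it assumes transformers and torch are installed. This route downloads the weights and runs them in-process, which is where a GPU matters.

Python
from llama_index.llms import HuggingFaceLLM

# Download a model from the HuggingFace Hub and run it locally (model name is just an example)
llm = HuggingFaceLLM(
    model_name="HuggingFaceH4/zephyr-7b-beta",
    tokenizer_name="HuggingFaceH4/zephyr-7b-beta",
    max_new_tokens=256,
    device_map="auto",  # places the model on a GPU if one is available
)
print(llm.complete("Hello!"))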
Gotcha. Thanks for all your help @Logan M!