
Updated 9 months ago

I hate the over-dependence on Ollama

At a glance

The community member expressed frustration with the LlamaIndex documentation's increasing reliance on third-party software such as Ollama and OpenAI, saying that everything could previously be set up locally. Other community members replied that there is no over-dependence on third-party software and that the integrations have simply been split into separate PyPI packages. They showed how to use custom language models, including GPU-based ones, with LlamaIndex, how to use a Hugging Face sentence-transformer model as the embedding model, and how to set up a local language model to run on the GPU.

I hate the over-reliance on Ollama and OpenAI stuff in the docs these days. Half a year ago everything could be set up locally; now the LlamaIndex docs are bloated with this third-party software and always download embeddings and every little thing online.
11 comments
There is no over-dependence on third parties 😅
Actually, all the integrations have been separated into PyPI packages.

To use your own hosted LLM, you can use Ollama or OpenAILike.

If you have your own model, you can implement a custom LLM and interact with it that way.
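For example, a minimal sketch of the OpenAILike route (the model name, URL and key below are placeholders, assuming the llama-index-llms-openai-like package is installed):

from llama_index.core import Settings
from llama_index.llms.openai_like import OpenAILike

# Point LlamaIndex at any OpenAI-compatible server you host yourself,
# e.g. a vLLM or llama.cpp server; the values below are placeholders.
Settings.llm = OpenAILike(
    model="my-local-model",                 # placeholder model name
    api_base="http://localhost:8000/v1",    # placeholder endpoint of your server
    api_key="not-needed",                   # many local servers ignore the key
    is_chat_model=True,
)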
Can you show me the syntax for using my own GPU-based model with LlamaIndex?
import torch
from llama_index.core import Settings
from llama_index.llms.llama_cpp import LlamaCPP
# default prompt formatters for the chosen model; swap in your own if needed
from llama_index.llms.llama_cpp.llama_utils import messages_to_prompt, completion_to_prompt

Settings.llm = LlamaCPP(
    model_path="wikibot_models/zephyr-7b-gguf/zephyr-7b-beta.Q2_K.gguf",
    # model_path="wikibot_models/gemma-2b/gemma-2b.gguf",
    # model_path="wikibot_models/gemma-7b/gemma-7b.gguf",
    temperature=0.1,
    max_new_tokens=256,
    context_window=3900,
    generate_kwargs={},
    # n_gpu_layers controls how many layers llama.cpp offloads to the GPU
    model_kwargs={"n_gpu_layers": 20, "torch_dtype": torch.bfloat16},
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=True,
)


import os
import re
import time

from llama_index.core import (
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
    load_index_from_storage,
)


def readDocs(prompt: str):

    start = time.time()

    if os.path.isfile("index/docstore.json"):
        print("found index at index/docstore.json\n")
        storage_context = StorageContext.from_defaults(persist_dir="index")
        index = load_index_from_storage(storage_context)

        query_engine = index.as_query_engine()
        response = query_engine.query(prompt)
        print('\nResponse: ', response)

    else:
        print("create index\n")
        documents = SimpleDirectoryReader(
            input_dir="docs",
            recursive=True,
            exclude=(".docx", ".png", ".jpeg", ".tmp", ".lnk", ".unk",
                     ".vsdx", ".heic", ".avif", ".so.1"),
        ).load_data()
        index = VectorStoreIndex.from_documents(documents)

        index.storage_context.persist(persist_dir="index")

        query_engine = index.as_query_engine()
        response = query_engine.query(prompt)
        print('\nResponse: ', response)

    # list which pages/files the answer was drawn from
    if hasattr(response, 'metadata'):
        document_info = str(response.metadata)
        find = re.findall(r"'page_label': '[^']*', 'file_name': '[^']*'", document_info)

        print('\n' + '=' * 60 + '\n')
        print('Context Information')
        print(str(find))
        print('\n' + '=' * 60 + '\n')

    end = time.time()
    print("Elapsed Time: ", end - start, " seconds")


I have this one working right now, but I want to use GPU-heavy models, not llama.cpp.
This is an embedding model.
Yes, you can use this in LlamaIndex: https://docs.llamaindex.ai/en/stable/examples/embeddings/huggingface/
Follow the doc and it will download the model locally to your system.
Just replace the embedding model path with yours.
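Roughly like this, assuming the llama-index-embeddings-huggingface package and a CUDA-capable machine (the model name is only an example):

from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Downloads the sentence-transformer once, then embeds locally on the GPU.
# "BAAI/bge-small-en-v1.5" is just an example; replace it with your model name or path.
Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5",
    device="cuda",
)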
But I will need an embedding model and a normal LLM too, right?
How can I define a normal LLM so it only uses the GPU?
Yes, if you are going local and not depending on OpenAI, then you need both.

For the LLM, you can use the following doc to set it up locally: https://docs.llamaindex.ai/en/stable/examples/llm/huggingface/
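Something along these lines, assuming the llama-index-llms-huggingface package plus torch and accelerate with CUDA (the Zephyr model name is only an example):

import torch
from llama_index.core import Settings
from llama_index.llms.huggingface import HuggingFaceLLM

# device_map="auto" places the weights on the available GPU(s);
# bfloat16 halves the memory footprint on recent cards.
Settings.llm = HuggingFaceLLM(
    model_name="HuggingFaceH4/zephyr-7b-beta",       # example model, use your own
    tokenizer_name="HuggingFaceH4/zephyr-7b-beta",
    context_window=3900,
    max_new_tokens=256,
    device_map="auto",
    model_kwargs={"torch_dtype": torch.bfloat16},
    generate_kwargs={"temperature": 0.1, "do_sample": True},
)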