
Updated 9 months ago

I hate the over-dependence on Ollama

At a glance

The community member expressed frustration with the LlamaIndex documentation's increasing reliance on third-party software such as Ollama and OpenAI, saying that everything could previously be set up locally. Other community members replied that there is no over-dependence on third-party software and that the integrations have simply been split into separate PyPI packages. They showed how to use custom language models, including GPU-based ones, with LlamaIndex, how to use a Hugging Face sentence-transformer model as the embedding model, and how to set up a local language model to run on the GPU.

I hate the over-reliance on Ollama and OpenAI stuff in the docs these days. Half a year ago everything could be set up locally; now the LlamaIndex docs are bloated with this third-party software and always download embeddings and every little thing online.
11 comments
There is no over-dependence on third parties 😅
Actually, all the integrations have been separated into PyPI packages.

To use your own hosted LLM, you can use Ollama or OpenAILike.

If you have your own model, you can implement a custom LLM and interact with it that way.
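For example, a minimal sketch of the OpenAILike route (the model name, URL and key below are placeholders, assuming the llama-index-llms-openai-like package is installed):

from llama_index.core import Settings
from llama_index.llms.openai_like import OpenAILike

# Point LlamaIndex at any OpenAI-compatible server you host yourself,
# e.g. a vLLM or llama.cpp server; the values below are placeholders.
Settings.llm = OpenAILike(
    model="my-local-model",                 # placeholder model name
    api_base="http://localhost:8000/v1",    # placeholder endpoint of your server
    api_key="not-needed",                   # many local servers ignore the key
    is_chat_model=True,
)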
Can you show me the syntax for using my own GPU-based model with LlamaIndex?
import torch
from llama_index.core import Settings
from llama_index.llms.llama_cpp import LlamaCPP
# default prompt formatters for the chosen model; swap in your own if needed
from llama_index.llms.llama_cpp.llama_utils import messages_to_prompt, completion_to_prompt

Settings.llm = LlamaCPP(
    model_path="wikibot_models/zephyr-7b-gguf/zephyr-7b-beta.Q2_K.gguf",
    # model_path="wikibot_models/gemma-2b/gemma-2b.gguf",
    # model_path="wikibot_models/gemma-7b/gemma-7b.gguf",
    temperature=0.1,
    max_new_tokens=256,
    context_window=3900,
    generate_kwargs={},
    # n_gpu_layers controls how many layers llama.cpp offloads to the GPU
    model_kwargs={"n_gpu_layers": 20, "torch_dtype": torch.bfloat16},
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=True,
)


import os
import re
import time

from llama_index.core import (
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
    load_index_from_storage,
)


def readDocs(prompt: str):

    start = time.time()

    if os.path.isfile("index/docstore.json"):
        print("found index at index/docstore.json\n")
        storage_context = StorageContext.from_defaults(persist_dir="index")
        index = load_index_from_storage(storage_context)

        query_engine = index.as_query_engine()
        response = query_engine.query(prompt)
        print('\nResponse: ', response)

    else:
        print("create index\n")
        documents = SimpleDirectoryReader(
            input_dir="docs",
            recursive=True,
            exclude=(".docx", ".png", ".jpeg", ".tmp", ".lnk", ".unk",
                     ".vsdx", ".heic", ".avif", ".so.1"),
        ).load_data()
        index = VectorStoreIndex.from_documents(documents)

        index.storage_context.persist(persist_dir="index")

        query_engine = index.as_query_engine()
        response = query_engine.query(prompt)
        print('\nResponse: ', response)

    # list which pages/files the answer was drawn from
    if hasattr(response, 'metadata'):
        document_info = str(response.metadata)
        find = re.findall(r"'page_label': '[^']*', 'file_name': '[^']*'", document_info)

        print('\n' + '=' * 60 + '\n')
        print('Context Information')
        print(str(find))
        print('\n' + '=' * 60 + '\n')

    end = time.time()
    print("Elapsed Time: ", end - start, " seconds")


I have this one working right now, but I want to use GPU-heavy models, not llama.cpp.
This is an embedding model.
Yes, you can use this in LlamaIndex: https://docs.llamaindex.ai/en/stable/examples/embeddings/huggingface/
Follow the doc and it will download the model locally to your system.
Just replace the embedding model path with yours.
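Roughly like this, assuming the llama-index-embeddings-huggingface package and a CUDA-capable machine (the model name is only an example):

from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Downloads the sentence-transformer once, then embeds locally on the GPU.
# "BAAI/bge-small-en-v1.5" is just an example; replace it with your model name or path.
Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5",
    device="cuda",
)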
But I will need an embedding model and a normal LLM too, right?
How can I define a normal LLM so it only uses the GPU?
Yes, if you are going local and not depending on OpenAI, then you need both.

For the LLM, you can use the following doc to set it up locally: https://docs.llamaindex.ai/en/stable/examples/llm/huggingface/
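Something along these lines, assuming the llama-index-llms-huggingface package plus torch and accelerate with CUDA (the Zephyr model name is only an example):

import torch
from llama_index.core import Settings
from llama_index.llms.huggingface import HuggingFaceLLM

# device_map="auto" places the weights on the available GPU(s);
# bfloat16 halves the memory footprint on recent cards.
Settings.llm = HuggingFaceLLM(
    model_name="HuggingFaceH4/zephyr-7b-beta",       # example model, use your own
    tokenizer_name="HuggingFaceH4/zephyr-7b-beta",
    context_window=3900,
    max_new_tokens=256,
    device_map="auto",
    model_kwargs={"torch_dtype": torch.bfloat16},
    generate_kwargs={"temperature": 0.1, "do_sample": True},
)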