I hate the over-dependence on Ollama

I hate the over-dependence on Ollama and OpenAI stuff in the docs these days. Half a year ago everything could be set up locally; now the LlamaIndex docs are bloated with this third-party software, and every little thing, like embeddings, is always downloaded online.
There is no over-dependence on third parties 😅
Actually, all the integrations have been separated into PyPI packages.

To use your own hosted LLM, you can use Ollama or OpenAILike.

If you have your own model, you can subclass CustomLLM and interact with it through that.
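For illustration, a minimal CustomLLM skeleton might look like the sketch below (the class name and the placeholder completion text are stand-ins; you would call your own model inside complete and stream_complete):

from typing import Any

from llama_index.core import Settings
from llama_index.core.llms import (
    CustomLLM,
    CompletionResponse,
    CompletionResponseGen,
    LLMMetadata,
)
from llama_index.core.llms.callbacks import llm_completion_callback


class MyLocalLLM(CustomLLM):
    context_window: int = 3900
    num_output: int = 256
    model_name: str = "my-local-model"

    @property
    def metadata(self) -> LLMMetadata:
        # advertise the model's limits to LlamaIndex
        return LLMMetadata(
            context_window=self.context_window,
            num_output=self.num_output,
            model_name=self.model_name,
        )

    @llm_completion_callback()
    def complete(self, prompt: str, **kwargs: Any) -> CompletionResponse:
        # replace this with a call into your own model
        return CompletionResponse(text="generated text")

    @llm_completion_callback()
    def stream_complete(self, prompt: str, **kwargs: Any) -> CompletionResponseGen:
        # naive streaming: emit the completion one character at a time
        text = ""
        for ch in "generated text":
            text += ch
            yield CompletionResponse(text=text, delta=ch)


Settings.llm = MyLocalLLM()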
Can you show me the syntax for using my own GPU-based model with LlamaIndex?
import os
import re
import time

import torch
from llama_index.core import (
    Settings,
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
    load_index_from_storage,
)
from llama_index.llms.llama_cpp import LlamaCPP
from llama_index.llms.llama_cpp.llama_utils import (
    completion_to_prompt,
    messages_to_prompt,
)

Settings.llm = LlamaCPP(
    model_path="wikibot_models/zephyr-7b-gguf/zephyr-7b-beta.Q2_K.gguf",
    # model_path="wikibot_models/gemma-2b/gemma-2b.gguf",
    # model_path="wikibot_models/gemma-7b/gemma-7b.gguf",
    temperature=0.1,
    max_new_tokens=256,
    context_window=3900,
    generate_kwargs={},
    # n_gpu_layers controls how many layers llama.cpp offloads to the GPU
    model_kwargs={"n_gpu_layers": 20, "torch_dtype": torch.bfloat16},
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=True,
)


def readDocs(prompt: str):
    start = time.time()

    if os.path.isfile("index/docstore.json"):
        # reuse the persisted index
        print("found index at index/docstore.json\n")
        storage_context = StorageContext.from_defaults(persist_dir="index")
        index = load_index_from_storage(storage_context)

        query_engine = index.as_query_engine()
        response = query_engine.query(prompt)
        print("\nResponse: ", response)

    else:
        # build the index from the documents and persist it for the next run
        print("create index\n")
        documents = SimpleDirectoryReader(
            input_dir="docs",
            recursive=True,
            # exclude takes glob patterns
            exclude=[
                "*.docx", "*.png", "*.jpeg", "*.tmp", "*.lnk", "*.unk",
                "*.vsdx", "*.heic", "*.avif", "*.so.1",
            ],
        ).load_data()
        index = VectorStoreIndex.from_documents(documents)

        index.storage_context.persist(persist_dir="index")

        query_engine = index.as_query_engine()
        response = query_engine.query(prompt)
        print("\nResponse: ", response)

    if hasattr(response, "metadata"):
        # list the source pages and file names used for the answer
        document_info = str(response.metadata)
        find = re.findall(
            r"'page_label': '[^']*', 'file_name': '[^']*'", document_info
        )

        print("\n" + "=" * 60 + "\n")
        print("Context Information")
        print(str(find))
        print("\n" + "=" * 60 + "\n")

    end = time.time()
    print("Elapsed Time: ", end - start, " seconds")


I have this one working right now, but I want to use GPU-heavy models, not llama.cpp.
This is an embedding model.
Yes, you can use this in LlamaIndex: https://docs.llamaindex.ai/en/stable/examples/embeddings/huggingface/
Follow the doc and it will download the model locally to your system.
Just replace the embedding model path with yours.
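As a rough sketch of that doc (the BAAI/bge-small-en-v1.5 name is only an example; swap in your own model name or local path), a GPU-backed local embedding model can be set globally like this:

from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# model_name can be a Hugging Face hub id or a local directory;
# device="cuda" runs the embedding model on the GPU
Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5",
    device="cuda",
)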
But I will need an embedding model and a normal LLM too, right?
How can I define a normal LLM that only uses the GPU?
Yes, if you are going local and not depending on OpenAI, then you need both.

For the LLM you can also use the following doc to set it up locally: https://docs.llamaindex.ai/en/stable/examples/llm/huggingface/
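Roughly following that doc, a local GPU-backed LLM could be configured like the sketch below (zephyr-7b-beta is an assumption based on the GGUF model above; any Hugging Face model name or local path works):

import torch
from llama_index.core import Settings
from llama_index.llms.huggingface import HuggingFaceLLM

# device_map="auto" places the weights on the available GPU(s);
# pin it to e.g. "cuda:0" if you want a specific device
Settings.llm = HuggingFaceLLM(
    model_name="HuggingFaceH4/zephyr-7b-beta",
    tokenizer_name="HuggingFaceH4/zephyr-7b-beta",
    context_window=3900,
    max_new_tokens=256,
    generate_kwargs={"temperature": 0.1, "do_sample": True},
    device_map="auto",
    model_kwargs={"torch_dtype": torch.bfloat16},
)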