Updated 2 years ago

So I'm currently using llama.cpp, but my GPU is a lot stronger than my CPU, and I understand that llama.cpp predominantly uses the CPU (even with CUDA acceleration, it's still using both, right?)

Does LlamaIndex/LangChain support other quantized LLaMA backends that run on GPU, like exllama or GPTQ? Also, if my intended use case is more complex semantic search over documents, would LLaMA or Alpaca be better? LLaMA seems like a plain text-generation model, so Alpaca might be better in this sense?
1 comment
You can use any model supported by LangChain, just by passing the LLM into our LLMPredictor class:

llm_predictor = LLMPredictor(llm=langchain_llm)

You can also use any model from huggingface using our huggingface wrapper:
https://gpt-index.readthedocs.io/en/latest/how_to/customization/custom_llms.html#example-using-a-huggingface-llm

Aaaand you can also implement a custom class to hook up basically anything:
https://gpt-index.readthedocs.io/en/latest/how_to/customization/custom_llms.html#example-using-a-custom-llm-model-advanced
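The core idea behind the custom route is just a class exposing a LangChain-style `_call(prompt)` interface that llama_index can drive. A minimal dependency-free sketch of that shape (the class name and stub body are illustrative; the real recipe subclasses LangChain's LLM base class — see the linked docs for the exact signatures):

```python
class MyGPTQLLM:
    """Illustrative stand-in for a GPU-backed model (e.g. exllama / GPTQ).

    Mirrors the LangChain custom-LLM method shape so the wiring is visible
    without any heavy dependencies; a real implementation would subclass
    langchain's LLM base class and run actual inference in _call.
    """

    _llm_type = "custom-gptq"  # identifies the backend, per LangChain convention

    def _call(self, prompt: str, stop=None) -> str:
        # Swap this stub for real GPU inference (exllama, a GPTQ loader, ...).
        return f"[stub completion for a {len(prompt)}-char prompt]"

    def __call__(self, prompt: str, stop=None) -> str:
        return self._call(prompt, stop)


llm = MyGPTQLLM()
print(llm("What does semantic search mean?"))
```

Once a class like this satisfies the interface, it plugs into LLMPredictor the same way any stock LangChain model does.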

Hard to say which would be better. Most open-source models are... gpt-3.5 level at best (e.g. Falcon 40B). I always look at the leaderboard lol
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard