so i'm currently using llama.cpp, but my GPU is a lot stronger than my CPU, and i understand that llama.cpp predominantly uses the CPU. even with CUDA acceleration, it's still splitting the work between both, right?
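for context, this is roughly how i'm running it now. the model path and layer count are just placeholders for whatever you have locally; `-ngl` (`--n-gpu-layers`) controls how many layers get offloaded to the GPU, and anything not offloaded still runs on the CPU:

```shell
# offload 35 layers to the GPU; layers that don't fit stay on the CPU
# (model path is a placeholder, adjust -ngl to your VRAM)
./main -m models/llama-7b.Q4_K_M.gguf -ngl 35 -p "hello"
```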
does llamaindex/langchain support other quantized llamas on GPU, like ExLlama or GPTQ? also, if my intended use case is more complex semantic search over documents, would llama or alpaca be the better fit? llama seems like a plain text-generation model, while alpaca is instruction-tuned, so maybe alpaca is better in this sense?
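to make the use case concrete, here's a toy sketch of the retrieval step i have in mind. the 3-d vectors are made up and just stand in for real embedding-model output (in practice the embeddings would come from a dedicated embedding model, not from the generative llama/alpaca model itself), and ranking is plain cosine similarity:

```python
import math

# toy "embeddings": placeholders for vectors a real embedding model would produce
docs = {
    "invoice from march": [0.9, 0.1, 0.0],
    "gpu benchmark notes": [0.1, 0.8, 0.3],
    "cuda install guide":  [0.0, 0.7, 0.6],
}

def cosine(a, b):
    # cosine similarity: dot product divided by the product of the norms
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search(query_vec, k=2):
    # rank all documents by similarity to the query vector, return the top k names
    ranked = sorted(docs.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [name for name, _ in ranked[:k]]

# a query vector "about gpus" should rank the gpu/cuda docs first
print(search([0.05, 0.9, 0.4]))  # → ['gpu benchmark notes', 'cuda install guide']
```

the point being that semantic search hinges on the embedding + similarity step, so the choice of generative model (llama vs alpaca) mostly matters for what you do with the retrieved chunks afterwards.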