so i'm currently using llama.cpp, but my GPU is a lot stronger than my CPU, and i understand that llama.cpp predominantly uses the CPU. even with CUDA acceleration, it's still splitting the work between both, right?
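for context, this is roughly how i'm running it now. the model path and layer count are just placeholders for whatever you have locally; `-ngl` (`--n-gpu-layers`) controls how many layers get offloaded to the GPU, and anything not offloaded still runs on the CPU:

```shell
# offload 35 layers to the GPU; layers that don't fit stay on the CPU
# (model path is a placeholder, adjust -ngl to your VRAM)
./main -m models/llama-7b.Q4_K_M.gguf -ngl 35 -p "hello"
```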
does llamaindex/langchain support other quantized llamas on GPU, like ExLlama or GPTQ? also, if my intended use case is more complex semantic search over documents, would llama or alpaca be the better fit? llama seems like a plain text-generation model, while alpaca is instruction-tuned, so maybe alpaca is better in this sense?
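to make the use case concrete, here's a toy sketch of the retrieval step i have in mind. the 3-d vectors are made up and just stand in for real embedding-model output (in practice the embeddings would come from a dedicated embedding model, not from the generative llama/alpaca model itself), and ranking is plain cosine similarity:

```python
import math

# toy "embeddings": placeholders for vectors a real embedding model would produce
docs = {
    "invoice from march": [0.9, 0.1, 0.0],
    "gpu benchmark notes": [0.1, 0.8, 0.3],
    "cuda install guide":  [0.0, 0.7, 0.6],
}

def cosine(a, b):
    # cosine similarity: dot product divided by the product of the norms
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search(query_vec, k=2):
    # rank all documents by similarity to the query vector, return the top k names
    ranked = sorted(docs.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [name for name, _ in ranked[:k]]

# a query vector "about gpus" should rank the gpu/cuda docs first
print(search([0.05, 0.9, 0.4]))  # → ['gpu benchmark notes', 'cuda install guide']
```

the point being that semantic search hinges on the embedding + similarity step, so the choice of generative model (llama vs alpaca) mostly matters for what you do with the retrieved chunks afterwards.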