I'm hitting a CUBLAS_STATUS_NOT_SUPPORTED error when trying to make a query. It's really weird because I'm able to use transformers to run this same GPTQ model without llama-index, but running it within llama-index gives me this error:

RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling cublasGemmStridedBatchedExFix( handle, opa, opb, m, n, k, (void*)(&falpha), a, CUDA_R_16F, lda, stridea, b, CUDA_R_16F, ldb, strideb, (void*)(&fbeta), c, CUDA_R_16F, ldc, stridec, num_batches, CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP)
Instead of passing model_name and letting llama-index load the model itself, load it yourself with transformers using device_map="auto" (the same way that already works for you) and hand the loaded object in with HuggingFaceLLM(..., model=model).
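A minimal sketch of that fix, assuming a GPTQ checkpoint loadable via transformers. The model id here is a placeholder, not from the original post, and the HuggingFaceLLM import path varies by llama-index version (newer releases use llama_index.llms.huggingface):

```python
# Sketch: load the GPTQ model with transformers first, the same way it
# already works outside llama-index, then pass the loaded objects to
# HuggingFaceLLM instead of a model_name string.
from transformers import AutoModelForCausalLM, AutoTokenizer
from llama_index.llms import HuggingFaceLLM  # or llama_index.llms.huggingface

# Placeholder model id -- substitute your own GPTQ checkpoint.
model_name = "your-org/your-gptq-model"

# device_map="auto" lets accelerate place the quantized weights on the GPU,
# matching the setup that works with plain transformers.
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Passing the already-loaded model/tokenizer prevents llama-index from
# re-loading the checkpoint with different dtype/device settings, which is
# what can trigger the cuBLAS half-precision GEMM error.
llm = HuggingFaceLLM(model=model, tokenizer=tokenizer)
```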