12GB vram right? Google tells me you probably need a 3090 minimum with 8bit quantization 💸
I suspect the community will be working on other methods to load this into smaller GPUs (similar to the recent progress with llama), so keep an eye on github and whatnot. Still very fresh
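For a rough sense of why 8-bit matters here, a back-of-envelope sketch of the weights-only VRAM math (assuming a hypothetical 13B-parameter model; actual usage adds activations, KV cache, and framework overhead on top):

```python
# Rough weights-only VRAM estimate: bits per parameter / 8 = bytes per parameter,
# so a model with N billion params needs roughly N * (bits / 8) GB for the weights.
def weights_vram_gb(params_billions: float, bits: int) -> float:
    return params_billions * bits / 8

# Assuming a 13B-parameter model (hypothetical size, not stated above):
print(weights_vram_gb(13, 16))  # fp16: 26.0 GB -> too big even for a 24GB 3090
print(weights_vram_gb(13, 8))   # int8: 13.0 GB -> fits a 3090, not a 12GB card
print(weights_vram_gb(13, 4))   # int4: 6.5 GB  -> could squeeze into 12GB
```

So 4-bit style tricks (like the ones the llama community has been shipping) are the plausible path to 12GB cards.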