Are you running the model on a GPU? After loading the model, how much memory is left? LLMs consume additional memory while generating tokens (the KV cache grows with every token), so if there is little headroom left, the system starts swapping and generation takes much longer.
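You can check both things with a few lines of Python. This is a minimal sketch assuming a PyTorch-based stack (`torch` plus `psutil`); the `model` variable in the trailing comment is a placeholder for however you load yours:

```python
import torch
import psutil

# Where would the model run? If this prints False, everything is on the CPU.
print("CUDA available:", torch.cuda.is_available())

# Free memory on whichever device matters. Generation needs headroom on top
# of the weights, since the KV cache grows with every generated token.
if torch.cuda.is_available():
    free, total = torch.cuda.mem_get_info()  # bytes on the current GPU
    print(f"GPU free: {free / 1e9:.1f} / {total / 1e9:.1f} GB")
vm = psutil.virtual_memory()
print(f"RAM free: {vm.available / 1e9:.1f} / {vm.total / 1e9:.1f} GB")

# To confirm where a loaded model's weights actually live
# (hypothetical `model`, loaded however you normally load it):
# print(next(model.parameters()).device)
```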
Nope, I'm running it on a regular laptop with 16 GB of RAM, and it's hitting 100% utilization on both the RAM and the GPU. So I'm assuming the laptop is the bottleneck here. Would running it on a proper GPU bring the response time back to normal?