I am currently using the Groq API with LlamaIndex

I am currently using the Groq API with LlamaIndex to run the Llama 3 70B model. I want to run this model on a Linux-based server. I downloaded the model but don't know how to load it in LlamaIndex/Groq. I don't want to change the rest of the code besides the part that loads the LLM. Is it possible to do this?
10 comments
Groq isn't an open-source API, is it? I don't think you can actually run Groq locally, especially since it needs specific hardware to achieve the speeds it has.
For now Groq is free, but I think they will start charging for it pretty soon. I think we have the hardware to run the 70B model; we just need a way to load it.
For testing the capabilities of Llama 3 70B I used the Groq API. Now that I am satisfied with its performance, I want to deploy it myself. Currently I load the model with llm = Groq(model="llama3-70b-8192", api_key=os.getenv('GROQ_API_KEY')). Is there any way to load my downloaded model using llama_index?
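For context, the Groq-hosted setup described above looks roughly like this in LlamaIndex (a minimal sketch, assuming the llama-index-llms-groq package is installed; the prompt is just a placeholder):

```python
import os

from llama_index.llms.groq import Groq

# Llama 3 70B served by Groq's hosted API -- nothing runs locally here.
llm = Groq(model="llama3-70b-8192", api_key=os.getenv("GROQ_API_KEY"))

response = llm.complete("Summarize what vLLM does in one sentence.")
print(response)
```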
You'd have to use something like vLLM or TGI to host the model instead of Groq (note that you'd need something like 80 GB of VRAM to run the 70B model).
Our server has 600+ GB of VRAM, so I don't think that would be a problem.
Look into vLLM or TGI then -- llama-index has integrations for both.
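One way to swap only the LLM line and keep the rest of the code unchanged is the in-process vLLM integration. A sketch, assuming the llama-index-llms-vllm package and vLLM itself are installed; the model path and tensor_parallel_size are placeholders for your checkpoint and GPU count:

```python
from llama_index.llms.vllm import Vllm

# Load the downloaded Llama 3 70B weights in-process with vLLM.
# "/path/to/llama3-70b" and tensor_parallel_size=4 are placeholders;
# point them at your local checkpoint and the number of GPUs available.
llm = Vllm(
    model="/path/to/llama3-70b",
    tensor_parallel_size=4,
    max_new_tokens=512,
)

print(llm.complete("Hello from a locally hosted Llama 3 70B."))
```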
OK, thank you. I will look into that. If you can point me to some resources, that would be great.
You think I should host my own API using vLLM?
Yeah, that would make sense to me. vLLM or TGI are the fastest local servers available right now (to my knowledge).
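Hosting your own API could look roughly like the sketch below: serve the downloaded weights with vLLM's OpenAI-compatible server, then point LlamaIndex at that endpoint. This assumes the llama-index-llms-openai-like package; the model path, port, and tensor-parallel size are placeholders:

```python
# First start the server on the GPU box, for example:
#   vllm serve /path/to/llama3-70b --tensor-parallel-size 4 --port 8000
from llama_index.llms.openai_like import OpenAILike

# Only the LLM construction changes; the rest of the LlamaIndex code stays as-is.
llm = OpenAILike(
    model="/path/to/llama3-70b",          # must match the model name the server reports
    api_base="http://localhost:8000/v1",  # placeholder address of the vLLM server
    api_key="unused",                     # vLLM does not require a real key by default
    is_chat_model=True,
)

print(llm.complete("Confirm you are running locally."))
```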