I am currently using the Groq API with LlamaIndex

I am currently using the Groq API with LlamaIndex to run the Llama 3 70B model. I want to run this model on a Linux-based server. I downloaded the model but don't know how to load it in LlamaIndex/Groq. I don't want to change the rest of the code besides the part that loads the LLM. Is it possible to do this?
10 comments
Groq isn't an open-source API, is it? I don't think you can actually run Groq locally, especially since it needs specific hardware to achieve the speeds it has.
For now Groq is free, but I think they will start charging for it pretty soon. I think we have the hardware to run the 70B model; we just need a way to load it.
For testing the capabilities of Llama 3 70B I used the Groq API. Now that I am satisfied with its performance, I want to deploy it myself. Currently I load the model with llm = Groq(model="llama3-70b-8192", api_key=os.getenv('GROQ_API_KEY')). Is there any way to load my downloaded model using llama_index?
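For context, the Groq-hosted setup described above looks roughly like this in LlamaIndex (a minimal sketch, assuming the llama-index-llms-groq package is installed; the prompt is just a placeholder):

```python
import os

from llama_index.llms.groq import Groq

# Llama 3 70B served by Groq's hosted API -- nothing runs locally here.
llm = Groq(model="llama3-70b-8192", api_key=os.getenv("GROQ_API_KEY"))

response = llm.complete("Summarize what vLLM does in one sentence.")
print(response)
```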
You'd have to use something like vLLM or TGI to host the model instead of Groq (note that you'd need something like 80 GB of VRAM to run the 70B model).
Our server has 600+ GB of VRAM, so I don't think that would be a problem.
Look into vLLM or TGI then -- llama-index has integrations for both.
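One way to swap only the LLM line and keep the rest of the code unchanged is the in-process vLLM integration. A sketch, assuming the llama-index-llms-vllm package and vLLM itself are installed; the model path and tensor_parallel_size are placeholders for your checkpoint and GPU count:

```python
from llama_index.llms.vllm import Vllm

# Load the downloaded Llama 3 70B weights in-process with vLLM.
# "/path/to/llama3-70b" and tensor_parallel_size=4 are placeholders;
# point them at your local checkpoint and the number of GPUs available.
llm = Vllm(
    model="/path/to/llama3-70b",
    tensor_parallel_size=4,
    max_new_tokens=512,
)

print(llm.complete("Hello from a locally hosted Llama 3 70B."))
```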
OK, thank you. I will look into that. If you can point me to some resources, that would be great.
You think I should host my own API using vLLM?
Yeah, that would make sense to me. vLLM or TGI are the fastest local servers available right now (to my knowledge).
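Hosting your own API could look roughly like the sketch below: serve the downloaded weights with vLLM's OpenAI-compatible server, then point LlamaIndex at that endpoint. This assumes the llama-index-llms-openai-like package; the model path, port, and tensor-parallel size are placeholders:

```python
# First start the server on the GPU box, for example:
#   vllm serve /path/to/llama3-70b --tensor-parallel-size 4 --port 8000
from llama_index.llms.openai_like import OpenAILike

# Only the LLM construction changes; the rest of the LlamaIndex code stays as-is.
llm = OpenAILike(
    model="/path/to/llama3-70b",          # must match the model name the server reports
    api_base="http://localhost:8000/v1",  # placeholder address of the vLLM server
    api_key="unused",                     # vLLM does not require a real key by default
    is_chat_model=True,
)

print(llm.complete("Confirm you are running locally."))
```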