@everyone I wish to demonstrate to the community that Zephyr-7b-beta has a running and inference cost that is 13 times lower than GPT-3.5 and 6 times lower than Llama 70b. Can I make this cost comparison in a Colab notebook?
How would I do that, so that it shows a cost per query for each model?
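A minimal sketch of what such a Colab cell could look like. All of the per-1K-token figures below are hypothetical placeholders, not measured results — you would substitute the published API price for GPT-3.5 and your own hardware-derived figures for the open models before drawing any conclusions:

```python
# Sketch: cost-per-query comparison across models.
# Every price below is a PLACEHOLDER (hypothetical), not a real benchmark.
cost_per_1k_tokens = {          # USD per 1K tokens -- replace with your own numbers
    "gpt-3.5-turbo": 0.0015,
    "llama-70b": 0.0007,
    "zephyr-7b-beta": 0.0001,
}

def cost_per_query(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Cost of one query, given the token counts for that query."""
    total_tokens = prompt_tokens + completion_tokens
    return total_tokens / 1000 * cost_per_1k_tokens[model]

# Example: a query with a 500-token prompt and a 250-token completion.
for model in cost_per_1k_tokens:
    print(f"{model:>16}: ${cost_per_query(model, 500, 250):.6f} per query")
```

The token counts fed into `cost_per_query` would come from whatever counting mechanism you wire up for each model.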
I think you can use the TokenCountingHandler for all these models.

https://docs.llamaindex.ai/en/stable/examples/callbacks/TokenCountingHandler.html#token-counting-handler

But I don't think you will be able to run a 70B model on Colab.
@WhiteFang_Jr How do I count the tokens for Zephyr-7B?
I think this should work for Zephyr as well, since the current example just shows the token count for an Anthropic model.
@WhiteFang_Jr This needs an API key, right? There's no API key for Zephyr-7B.
Right -- you just need to change the LLM to Zephyr; don't worry about an API key.
I'm still learning and not familiar with it.
I don't really know how to estimate the cost for Zephyr. Since the model is free, the only cost is the hardware to run it.
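Since the model itself is free, one hedged way to get a comparable per-token figure is to divide a GPU's hourly rental price by its measured generation throughput. The rate and throughput below are assumed example values, not measurements:

```python
# Sketch: derive a per-token cost for a self-hosted model from hardware time.
# Both inputs are ASSUMED example values -- measure/replace them yourself.
gpu_cost_per_hour = 0.60        # USD/hour for a rented GPU (hypothetical)
tokens_per_second = 30.0        # measured generation throughput (hypothetical)

tokens_per_hour = tokens_per_second * 3600
cost_per_token = gpu_cost_per_hour / tokens_per_hour   # USD per generated token

def query_cost(total_tokens: int) -> float:
    """Hardware cost attributed to one query of `total_tokens` tokens."""
    return total_tokens * cost_per_token

print(f"cost per 1K tokens: ${cost_per_token * 1000:.6f}")
print(f"cost for a 750-token query: ${query_cost(750):.6f}")
```

Multiplying `cost_per_token` by the token counts reported for each query gives a number you can set next to per-token API pricing.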
Thanks for the help πŸ™‚