Borg1903
Offline, last seen 3 months ago
Joined September 25, 2024
For a QA retrieval query_engine.query() call, how do I get logs or a trace of all the prompts and inputs that LlamaIndex sends to the LLM when generating the answer? I am using the default response mode, by the way.
I swear I saw this somewhere before but I am unable to find it now.
3 comments
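One way to see every prompt llama-index constructs is to turn on DEBUG logging, or to attach a LlamaDebugHandler through the callback manager and inspect the recorded LLM events after the query. A minimal sketch, assuming the legacy (pre-0.10) import paths, a hypothetical ./data folder, and an OpenAI key in the environment:

```python
import logging
import sys

from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex
from llama_index.callbacks import CallbackManager, LlamaDebugHandler

# Option 1: DEBUG-level logging prints the formatted prompts and LLM calls.
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)

# Option 2: a debug callback handler records every LLM event so the exact
# prompts and responses can be inspected once the query finishes.
llama_debug = LlamaDebugHandler(print_trace_on_end=True)
service_context = ServiceContext.from_defaults(
    callback_manager=CallbackManager([llama_debug])
)

documents = SimpleDirectoryReader("./data").load_data()  # hypothetical data folder
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
query_engine = index.as_query_engine()  # default response mode

response = query_engine.query("What does the document say about pricing?")

# Each entry is a (start_event, end_event) pair for one LLM call; the start
# payload holds the prompt/messages, the end payload holds the raw completion.
for start_event, end_event in llama_debug.get_llm_inputs_outputs():
    print(start_event.payload)
    print(end_event.payload)
```

Newer releases also expose a global handler (e.g. set_global_handler("simple")) that prints each prompt/completion pair to stdout without wiring up a callback manager.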
@Logan M What should be the ideal value for num_output in ServiceContext if I want the LLM responses to be fairly long? Should I let it use the default value, or should I set it manually? I use the Ollama and OpenAI classes for the LLM depending on a user input parameter. So should num_output be set dynamically based on the model (since Llama 2 models have a smaller context window than something like GPT-4)?
2 comments
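num_output is primarily used by the prompt helper to reserve room in the context window for the answer, so it is reasonable to vary it with the model's window size; depending on the version you may also need to set the LLM's own max-tokens setting to actually get longer generations. A rough sketch, assuming the legacy ServiceContext API and illustrative numbers rather than recommended values:

```python
from llama_index import ServiceContext
from llama_index.llms import Ollama, OpenAI

def build_service_context(model_choice: str) -> ServiceContext:
    """Pick the LLM and a num_output budget from a user-supplied choice.

    The numbers below are illustrative: num_output reserves part of the
    context window for generation, so smaller-context models get a smaller
    reservation. Adjust to the models you actually deploy.
    """
    if model_choice == "llama2":
        llm = Ollama(model="llama2", request_timeout=120.0)
        context_window, num_output = 4096, 512   # Llama 2 has a ~4k window
    else:
        llm = OpenAI(model="gpt-4")
        context_window, num_output = 8192, 1024  # roomier window, longer answers

    return ServiceContext.from_defaults(
        llm=llm,
        context_window=context_window,
        num_output=num_output,
    )

service_context = build_service_context("llama2")
```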
@Logan M I was trying out a Llama 2 model with llama-index, and I know there is the LlamaCPP class in this library. However, say I am hosting a llama-cpp model on my own server: is there a way to get llama-index to use the model from that server? With the current implementation we need to store the model file locally, but I want to store the model file somewhere else entirely, host it on a server, and make it accessible to other applications, similar to the OpenAI API. Is it possible to do something like this?
57 comments
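One common pattern (not specific to llama-index) is to serve the GGUF file with llama-cpp-python's OpenAI-compatible server and then point an OpenAI-style client at it. The sketch below assumes a legacy (pre-0.10) install with the OpenAILike integration available; the server URL, model file, and model name are placeholders:

```python
# On the model host (assumes llama-cpp-python is installed with server extras):
#   python -m llama_cpp.server --model /models/llama-2-13b-chat.Q4_K_M.gguf --port 8000
# This exposes an OpenAI-compatible /v1 API that llama-index can talk to.

from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms import OpenAILike

llm = OpenAILike(
    api_base="http://my-llm-server:8000/v1",  # hypothetical server URL
    api_key="not-needed",                     # the local server ignores the key
    model="llama-2-13b-chat",                 # whatever name the server reports
    is_chat_model=True,
)

service_context = ServiceContext.from_defaults(llm=llm, embed_model="local")
documents = SimpleDirectoryReader("./data").load_data()  # hypothetical data folder
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
print(index.as_query_engine().query("Summarize the documents."))
```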
Hey @Logan M @ravitheja, do async index creation and aquery not work when I use LangchainEmbeddings as the embed_model? I have been getting a NotImplementedError.
3 comments
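The async entry points only work end to end if the configured embed_model implements the async embedding methods; wrappers that only provide the sync ones surface a NotImplementedError during async index construction. A minimal sketch, assuming the legacy ServiceContext API and swapping in OpenAIEmbedding (which does implement the async methods) purely for illustration:

```python
import asyncio

from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings import OpenAIEmbedding  # implements the async methods

async def main() -> None:
    service_context = ServiceContext.from_defaults(embed_model=OpenAIEmbedding())
    documents = SimpleDirectoryReader("./data").load_data()  # hypothetical data folder

    # use_async=True embeds nodes through the embed model's async API, which is
    # where a sync-only embedding wrapper would raise NotImplementedError.
    index = VectorStoreIndex.from_documents(
        documents, service_context=service_context, use_async=True
    )

    query_engine = index.as_query_engine()
    response = await query_engine.aquery("What is covered in these documents?")
    print(response)

asyncio.run(main())
```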
Hey, has anyone tried Qdrant with llama-index while storing around 5 million records? Also, I wanted to know: during querying, if I set top_k to 3 or 4 so it goes through 3 or 4 nodes, does that impact the costs by a lot?
2 comments
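At that scale Qdrant is usually run as a dedicated server rather than in memory, and top_k matters mostly for LLM token cost: each extra retrieved node adds its text to the prompt, so going from 2 to 4 nodes roughly doubles the retrieved context per query rather than multiplying the bill dramatically. A rough sketch, assuming the legacy import paths and a hypothetical Qdrant host and collection name:

```python
import qdrant_client
from llama_index import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores import QdrantVectorStore

# Hypothetical Qdrant deployment; for millions of vectors you would normally
# run a dedicated server (or Qdrant Cloud) rather than the in-memory client.
client = qdrant_client.QdrantClient(url="http://my-qdrant-host:6333")

vector_store = QdrantVectorStore(client=client, collection_name="documents")
storage_context = StorageContext.from_defaults(vector_store=vector_store)

documents = SimpleDirectoryReader("./data").load_data()  # hypothetical data folder
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

# similarity_top_k controls how many retrieved nodes are packed into the LLM
# prompt; the vector search itself is cheap, the prompt tokens are the cost.
query_engine = index.as_query_engine(similarity_top_k=3)
print(query_engine.query("What do the documents say about pricing?"))
```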