is there a way to control how many times refined_response is generated? in many cases the initial_response is already good enough, but the query still runs 2 or 3 refine passes. how do I tell the query "good enough, please stop"?
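(refine calls happen once per retrieved chunk that doesn't fit into a single prompt, so retrieving fewer chunks or packing them tighter cuts the refine rounds. a minimal sketch, assuming the 0.4.x-era GPTSimpleVectorIndex API:)
```python
from llama_index import GPTSimpleVectorIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("data").load_data()
index = GPTSimpleVectorIndex(documents)

# similarity_top_k=1 retrieves a single chunk, so there is nothing to refine;
# response_mode="compact" packs as much context as possible into each LLM call,
# which also reduces the number of refine passes.
response = index.query(
    "your question",
    similarity_top_k=1,
    response_mode="compact",
)
print(response)
```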
hi, jerry. I saw the feature on the new sentence text splitter. will it be called automatically when creating a new index? another question: can it split words in languages that don't use whitespace between words, like Chinese? I'm mainly using 0.4.32, and I saw an error message about an over-length term (longer than max_chunk_limit), so I have to run documents through a Chinese word segmenter before creating the index. so I think the built-in splitter doesn't fit languages without whitespace...
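(for the pre-segmenting workaround mentioned above, a minimal sketch using jieba, a third-party Chinese segmenter that is not part of llama_index, and assuming the 0.4.x Document API:)
```python
import jieba
from llama_index import Document, GPTSimpleVectorIndex

# Read the raw Chinese text, which has no spaces between words.
with open("doc_zh.txt", encoding="utf-8") as f:
    raw_text = f.read()

# Insert spaces at word boundaries so the whitespace-based splitter
# can find cut points shorter than max_chunk_limit.
segmented_text = " ".join(jieba.cut(raw_text))

index = GPTSimpleVectorIndex([Document(segmented_text)])
```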
when creating a new index, I got terminal output like this:
INFO:llama_index.token_counter.token_counter:> [build_index_from_documents] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_documents] Total embedding token usage: 212901 tokens
my question is: how do I store the embedding token usage in a variable?
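(those lines come from a standard Python logger, so one option is to attach a handler that keeps the messages in a variable. a stdlib-only sketch; the logger name is taken from the output above, and the parsing assumes the message format shown there:)
```python
import logging

# Collect every record emitted by the token counter logger.
captured = []

class ListHandler(logging.Handler):
    def emit(self, record):
        captured.append(record.getMessage())

logger = logging.getLogger("llama_index.token_counter.token_counter")
logger.setLevel(logging.INFO)
logger.addHandler(ListHandler())

# ... build the index here ...

# Pull the embedding token count out of the captured messages,
# e.g. "... Total embedding token usage: 212901 tokens" -> 212901.
embedding_tokens = next(
    int(msg.split()[-2])
    for msg in captured
    if "embedding token usage" in msg
)
```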
hi, i tried the logger method and found it doesn't capture the context, like the "Got node text" part of a verbose=True query, which I want to display on a web page for doc-quality analysis. what should I do to get that terminal output stored in a variable and then sent to the web page? many thanks!
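(if that verbose output is written with plain print() rather than the logging module, which would explain why the logger handler misses it, redirecting stdout around the query should capture it. a stdlib-only sketch, assuming `index` was built earlier:)
```python
import io
from contextlib import redirect_stdout

buffer = io.StringIO()

# Anything the query prints (e.g. the "Got node text" lines) lands in
# the buffer instead of the terminal while this block is active.
with redirect_stdout(buffer):
    response = index.query("your question", verbose=True)

verbose_output = buffer.getvalue()
# verbose_output can now be sent to the web page alongside the response.
```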
I got this answer: "The given context information does not provide an answer to the question as it is unrelated to the topic of the passage." can we set it up so chatgpt answers questions that aren't covered by the given docs?
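(one option is a custom QA prompt that explicitly allows falling back to the model's own knowledge. a sketch assuming the 0.4.x-era prompt customization API; the template wording is mine, but {context_str} and {query_str} are the placeholders that prompt class expects:)
```python
from llama_index import QuestionAnswerPrompt

QA_TEMPLATE = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Answer the question using the context if it is relevant; "
    "otherwise answer from your own knowledge.\n"
    "Question: {query_str}\n"
)

qa_prompt = QuestionAnswerPrompt(QA_TEMPLATE)
response = index.query("your question", text_qa_template=qa_prompt)
```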
the new version seems to prefer a definition like this: llm_predictor = LLMPredictor(llm=ChatOpenAI(temperature=temperature, model_name="gpt-3.5-turbo", max_tokens=num_outputs))
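(for reference, a self-contained version of that snippet, assuming the langchain ChatOpenAI import path of that era; the values for the two variables are examples:)
```python
from langchain.chat_models import ChatOpenAI
from llama_index import LLMPredictor

temperature = 0.0  # example value
num_outputs = 512  # example value: max tokens in the completion

llm_predictor = LLMPredictor(
    llm=ChatOpenAI(
        temperature=temperature,
        model_name="gpt-3.5-turbo",
        max_tokens=num_outputs,
    )
)
```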