Hey team, how can I pass these values into a response object together with the answer?
INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 1221 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens
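One pattern is to wrap the query call and bundle the answer with the totals read from whatever token counter your setup exposes, so callers get both in one object. A minimal sketch; `query_engine`, `token_counter`, and the attribute names are stand-ins for your own engine and counter, not a specific llama_index API:

```python
from dataclasses import dataclass


@dataclass
class CountedResponse:
    """Answer text plus the token usage that produced it."""
    answer: str
    llm_tokens: int
    embedding_tokens: int


def query_with_counts(query_engine, token_counter, question):
    """Run a query and attach the counter's totals to the response.

    `query_engine` and `token_counter` are hypothetical stand-ins for
    whatever engine/counter your setup uses (e.g. a callback-based
    counter that tracks total LLM and embedding tokens).
    """
    answer = str(query_engine.query(question))
    return CountedResponse(
        answer=answer,
        llm_tokens=token_counter.total_llm_token_count,
        embedding_tokens=token_counter.total_embedding_token_count,
    )
```

The caller then reads `resp.answer`, `resp.llm_tokens`, and `resp.embedding_tokens` from one object instead of scraping the INFO logs.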
Another question (already asked): from a cost/performance point of view, what is the best way to ask many questions and collect the answers? I usually hit the API rate limit and end up with a huge bill.
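To avoid hammering the rate limit, a common approach is to pace the calls and retry with exponential backoff. A stdlib-only sketch; `ask` stands in for your query engine, and real code would catch your client's specific rate-limit exception rather than bare `Exception`:

```python
import time


def ask_with_backoff(ask, question, max_retries=5, base_delay=1.0):
    """Call `ask(question)`, retrying on failure with exponential backoff.

    `ask` is any callable that raises on a rate-limit error; the
    delays double each attempt (1s, 2s, 4s, ...).
    """
    for attempt in range(max_retries):
        try:
            return ask(question)
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries, surface the error
            time.sleep(base_delay * (2 ** attempt))
```

Pacing requests this way also helps the bill indirectly, since you stop paying for calls that fail partway through a batch.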
Hi! Simple question: how can I chain query results, or what is the best approach? Example:
answer_1 = query_engine.query("Question 1?")
answer_2 = query_engine.query("Question 2?")
answer_3 = query_engine.query("Question 3?")
Make a report based on [answer_1, answer_2, answer_3].
Should I simply loop through the questions/answers and feed them to an LLMChain for the end result? I tried giving all the questions to the agent, but it finishes the chain after the first question is answered. The agent also sometimes fails when the query router fails to select an engine for the query.
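Looping is usually fine: answer each question separately, then make one final synthesis call over the collected Q/A pairs instead of handing the whole list to an agent. A sketch of that shape; `query_engine` and the final `summarize` callable (e.g. an LLMChain invocation) are stand-ins for your own components:

```python
def build_report(query_engine, summarize, questions):
    """Answer each question separately, then synthesize one report.

    `summarize` is a hypothetical stand-in for a single final LLM
    call that turns the collected Q/A pairs into a report.
    """
    # One query per question, exactly like the manual example above.
    answers = [str(query_engine.query(q)) for q in questions]
    # Pack the pairs into one context block for a single final call.
    context = "\n".join(
        f"Q: {q}\nA: {a}" for q, a in zip(questions, answers)
    )
    return summarize("Write a report based on these answers:\n" + context)
```

Keeping the per-question queries outside the agent also sidesteps the router problem: the router only ever sees one question at a time.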
Good morning! I have a question about sharing a knowledge base among users. I'd like to create a solution that stores all data in a single vector store or database. However, I want to restrict user access to only certain portions of the data. Is there a way to namespace the knowledge base so that each user can only access their designated areas?
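A common pattern for this is to tag every document with a tenant/user id at ingest time and filter on it at query time; many vector stores support metadata filters for exactly this (exact filter APIs vary by store and library version). A toy in-memory sketch of the idea:

```python
class NamespacedStore:
    """Toy store that tags each document with a namespace and only
    returns matches from the caller's own namespace."""

    def __init__(self):
        self._docs = []  # list of (namespace, text) pairs

    def add(self, namespace, text):
        self._docs.append((namespace, text))

    def search(self, namespace, keyword):
        # A real store would do vector similarity plus a metadata
        # filter on `namespace`; here we just substring-match.
        return [
            text for ns, text in self._docs
            if ns == namespace and keyword in text
        ]
```

In vector-store terms, `namespace` maps to a user id stored in each node's metadata, and `search` maps to a similarity query combined with a metadata filter on that id, so every user shares one store but can only retrieve their own slice.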