chantlong
Offline, last seen 2 months ago
Joined September 25, 2024
what's the best way to see what packages are compatible with what version of llama-index-core?

Like if I installed llama-index-core 0.10.44, I want to make sure a package like llama-index-llms-gemini is on a compatible version. A lot of times I'll install the latest version of the packages and then have to resolve conflicts manually.
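One thing I've been doing locally (just a rough sketch, not an official workflow): after installing an integration package, read the dependency constraints it declares and compare them against the installed llama-index-core.

Plain Text
# a sketch: inspect what llama-index-core range an installed integration declares
from importlib.metadata import requires, version

print(version("llama-index-core"))               # e.g. 0.10.44
for req in requires("llama-index-llms-gemini"):  # declared dependency constraints
    if req.startswith("llama-index-core"):
        print(req)                               # e.g. llama-index-core>=0.10.x,<0.11.0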
1 comment
chantlong
·

Callback

Question on the callback manager. Currently in my app, when I start it, I initialize it with Settings.callback_manager = callback_manager, where callback_manager just has llama_debug and a TokenCountingHandler, in Django (rough sketch of my setup after the questions below).

  1. How can I import it correctly? When I import Settings and try to access Settings.callback_manager in a different file, it ignores what I set up globally and starts its own callback_manager.
  2. Am I supposed to be starting an instance of token_counter per API request? If 2 API requests happen at the same time, I assume they'll conflict, so... maybe I can't really use Settings.callback_manager globally? 🤔
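For reference, roughly what I mean (a sketch, assuming llama-index 0.10.x and a setup module that runs once at startup, e.g. from a Django AppConfig.ready()):

Plain Text
# startup.py - hypothetical module run once at app startup
from llama_index.core import Settings
from llama_index.core.callbacks import (
    CallbackManager,
    LlamaDebugHandler,
    TokenCountingHandler,
)

llama_debug = LlamaDebugHandler()
token_counter = TokenCountingHandler()
Settings.callback_manager = CallbackManager([llama_debug, token_counter])

# views.py - importing Settings again in another file and reading the same global
from llama_index.core import Settings

print(Settings.callback_manager.handlers)  # expecting the two handlers set above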
4 comments
Hello, I was wondering: is there an API version of SentenceTransformerRerank?
When I use SentenceTransformerRerank directly in my code it takes up a lot of CPU/RAM.
I'm thinking of moving the rerank model into a separate API, so that in the postprocessor, instead of running the rerank model on my current code/server, it calls an external API to do so.
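Something along these lines is what I'm picturing (just a sketch of a custom postprocessor; the endpoint URL and the response shape are made up):

Plain Text
# a sketch: a custom node postprocessor that calls an external rerank service
from typing import List, Optional

import requests
from llama_index.core.postprocessor.types import BaseNodePostprocessor
from llama_index.core.schema import NodeWithScore, QueryBundle


class APIRerank(BaseNodePostprocessor):
    url: str = "http://my-rerank-service/rerank"  # hypothetical endpoint
    top_n: int = 3

    @classmethod
    def class_name(cls) -> str:
        return "APIRerank"

    def _postprocess_nodes(
        self,
        nodes: List[NodeWithScore],
        query_bundle: Optional[QueryBundle] = None,
    ) -> List[NodeWithScore]:
        payload = {
            "query": query_bundle.query_str if query_bundle else "",
            "texts": [n.node.get_content() for n in nodes],
        }
        # assume the service returns {"results": [{"index": i, "score": s}, ...]} sorted by score
        results = requests.post(self.url, json=payload).json()["results"]
        reranked = []
        for item in results[: self.top_n]:
            node = nodes[item["index"]]
            node.score = item["score"]
            reranked.append(node)
        return reranked

Then I'd pass it in with node_postprocessors=[APIRerank()] when building the query engine.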
2 comments
Reading the source code of synthesizing, and please correct me if I'm wrong.
It seems like the query engine calls predict when synthesizing.
Is it that the query engine calls response = self._llm.complete(formatted_prompt),
whereas the chat engine calls chat_response = self._llm.chat(messages) instead?
Plain Text
def predict(
    self,
    prompt: BasePromptTemplate,
    output_cls: Optional[BaseModel] = None,
    **prompt_args: Any,
) -> str:
    """Predict."""
    self._log_template_data(prompt, **prompt_args)

    if output_cls is not None:
        output = self._run_program(output_cls, prompt, **prompt_args)
    elif self._llm.metadata.is_chat_model:
        messages = prompt.format_messages(llm=self._llm, **prompt_args)
        messages = self._extend_messages(messages)
        chat_response = self._llm.chat(messages)
        output = chat_response.message.content or ""
    else:
        formatted_prompt = prompt.format(llm=self._llm, **prompt_args)
        formatted_prompt = self._extend_prompt(formatted_prompt)
        response = self._llm.complete(formatted_prompt)
        output = response.text
6 comments
When printing the trace while using the query engine, I always see:
SYNTHESIZE
CHUNKING
CHUNKING
LLM

The CHUNKING event has this info:
Plain Text
{
  "__computed__": {
    "latency_ms": 1.436,
    "error_count": 0,
    "cumulative_token_count": {
      "total": 0,
      "prompt": 0,
      "completion": 0
    },
    "cumulative_error_count": 0
  }
}


What is this chunking actually doing? Does it use prompt tokens?
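In case it helps, here's roughly how I'm pulling these events out (a sketch, assuming the LlamaDebugHandler from my callback manager is in scope as llama_debug):

Plain Text
# a sketch: inspect the CHUNKING start/end events captured by LlamaDebugHandler
from llama_index.core.callbacks import CBEventType

for start_event, end_event in llama_debug.get_event_pairs(CBEventType.CHUNKING):
    print(start_event.payload)  # input to the chunking step
    print(end_event.payload)    # resulting chunks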
2 comments
Really noob question, but if I am using an embedding model that saves vectors with a dimension of 768, will a chunk size greater than 768, like 1024, fit into it?
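A quick check I ran to convince myself (a sketch, assuming HuggingFaceEmbedding and a 768-dimension model like BAAI/bge-base-en-v1.5):

Plain Text
# the output dimension is fixed by the model, regardless of how long the chunk text is
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

emb_model = HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5")  # 768-dim model
short_vec = emb_model.get_text_embedding("short chunk")
long_vec = emb_model.get_text_embedding("a much longer chunk " * 200)
print(len(short_vec), len(long_vec))  # both 768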
2 comments
When I call get_prompts on the query engine, it gives back 2 prompts:
response_synthesizer:text_qa_template
and
response_synthesizer:refine_template

How do I make it use the refine_template instead of the qa_template programmatically?
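For context, these are the two knobs I've found so far (a sketch; my_custom_qa_template is just a placeholder for a template I'd define), but I'm not sure this is the right way to force the refine path:

Plain Text
# a sketch: the refine_template is used by the "refine"/"compact" response modes
# once there is more than one chunk to synthesize over
query_engine = index.as_query_engine(response_mode="refine", similarity_top_k=3)

# and the template text itself can be overridden using the same keys get_prompts returns
query_engine.update_prompts(
    {"response_synthesizer:text_qa_template": my_custom_qa_template}
)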
3 comments
chantlong
·

Search

I'm just following the RAPTOR notebook, but using Qdrant instead of Chroma, and just dumping in my own docs.
1 comment
One other issue I'm noticing after migrating to 0.10.x is that I get this error in my API call: asyncio.run() cannot be called from a running event loop. This wasn't the case with 0.9.x. I wasn't using nest_asyncio previously either.
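For what it's worth, these are the two workarounds I'm aware of (a sketch; not sure which is preferred):

Plain Text
# option 1: allow nested event loops (what many of the notebooks do)
import nest_asyncio
nest_asyncio.apply()

# option 2: stay fully async in the API handler and await the engine instead
async def handle(query_engine, question):
    return await query_engine.aquery(question)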
8 comments
One other thing is that I am already using ObjectIndex to fetch the correct SQL table mappings.
But the problem is that I still have all my few-shot prompts for all tables in one prompt, and it's getting maxed out.
Any suggestions on how I can separate those few-shot prompts out so that, depending on the table, I use a smaller prompt?
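The direction I'm considering (just a sketch, not a library feature; FEW_SHOTS and the retriever setup here are my own): keep examples per table and only include the ones for tables the ObjectIndex retriever actually returns.

Plain Text
# a sketch: per-table few-shot examples, assembled only for the retrieved tables
FEW_SHOTS = {  # hypothetical examples
    "orders": "Q: How many orders were placed last month?\nSQL: SELECT COUNT(*) FROM orders WHERE ...",
    "users": "Q: How many active users are there?\nSQL: SELECT COUNT(*) FROM users WHERE ...",
}

table_schema_objs = obj_index.as_retriever(similarity_top_k=2).retrieve(user_query)
examples = "\n\n".join(
    FEW_SHOTS[t.table_name] for t in table_schema_objs if t.table_name in FEW_SHOTS
)
# then splice `examples` into a custom text-to-SQL prompt covering just those tables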
2 comments
For the SubQuestionQueryEngine, it makes an LLM call per subquestion and then a final synthesizer LLM call. How do I prevent the LLM call per subquestion and just take all the retrieved nodes and questions and dump them into the final synthesizer?

I'm reading the SubQuestionQueryEngine source code, but I'm having trouble seeing how the retrieved nodes get passed into an LLM right after, for the subquestions. Help appreciated.
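The workaround I'm imagining looks roughly like this (a sketch; retriever, sub_questions, and original_query are assumed to already exist):

Plain Text
# a sketch: retrieve per subquestion, then do a single synthesis call at the end
from llama_index.core import get_response_synthesizer

all_nodes = []
for sub_q in sub_questions:
    all_nodes.extend(retriever.retrieve(sub_q))

synth = get_response_synthesizer(response_mode="compact")
response = synth.synthesize(original_query, nodes=all_nodes)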
11 comments
Does the query engine come with a retry mechanism? I am getting Request Timeout errors when calling Azure OpenAI.
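For context, this is roughly where I'd expect to configure it if it lives on the LLM client rather than the engine (a sketch; I haven't confirmed the max_retries/timeout parameters on my installed version):

Plain Text
# a sketch: retry/timeout settings on the Azure OpenAI LLM itself
from llama_index.llms.azure_openai import AzureOpenAI

llm = AzureOpenAI(
    engine="my-deployment",
    model="gpt-35-turbo-16k",
    azure_endpoint="https://<your-resource-name>.openai.azure.com/",
    api_key="<your-api-key>",
    api_version="2023-07-01-preview",
    max_retries=5,   # assumed parameter for transient failures
    timeout=60.0,    # assumed per-request timeout in seconds
)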
8 comments
Does the ObjectIndex retriever make any LLM calls? It looks like it just fetches back the corresponding nodes.
7 comments
For SubQuestionQueryEngine, it's great when you need to generate questions, but what if I already have my questions beforehand?
In that case, is it better to loop over the questions and just use index.as_query_engine and ask each question? I want it to run in parallel if possible, though, like SubQuestionQueryEngine does. If there are any best practices for that I'd love to know! Thanks.
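What I have in mind (a sketch using the async query API; my_questions is assumed):

Plain Text
# a sketch: fire off known questions concurrently instead of generating subquestions
import asyncio

query_engine = index.as_query_engine()

async def run_all(questions):
    return await asyncio.gather(*(query_engine.aquery(q) for q in questions))

answers = asyncio.run(run_all(my_questions))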
16 comments
chantlong
·

Qdrant

I'm not sure if this is the right place to ask, but when running this code, assuming my NOTES (documents) have a length of 2500, after adding them to Qdrant and looking at the vectors_count, it is around 1300. I would assume that if I add 2500 docs, then based on the chunking I would have at least 2500+ in vectors_count?

Plain Text
import qdrant_client
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.vector_stores.qdrant import QdrantVectorStore

client = qdrant_client.QdrantClient(url="http://localhost:6333")
vector_store = QdrantVectorStore(client=client, collection_name="NOTES")

pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=512, chunk_overlap=0),
        HuggingFaceEmbedding(model_name='XXXXX'),
    ],
    vector_store=vector_store,
)

pipeline.run(documents=NOTES)
3 comments
chantlong
·

Score

More of an LLM question, but when SubQuestionQueryEngine does retrieval, it then passes the 3 nodes it finds, along with their scores, to the LLM to get an answer. Do the scores have an effect on what the LLM decides to do? I'm wondering why the score is passed into the context as well.
5 comments
chantlong
·

Api

I've been stuck on llama-index 0.8.64 since the package for openai 1.0 got released, because it didn't support Azure OpenAI with SubQuestionQueryEngine -> API calls would fail. Was wondering if that issue is gone now.
6 comments
Does SubQuestionQueryEngine have memory in that it remembers past answers (i.e. the query sent before this query) ?
1 comment
I'm reading this doc on metadata extraction,
and when using QuestionsAnsweredExtractor(questions=3, llm=llm), I see that it generates:
Plain Text
 'questions_this_excerpt_can_answer': '1. How many countries does Uber operate in?\n2. What is the total gross bookings of Uber in 2019?\n3. How many trips did Uber facilitate in 2019?'}


My understanding is that when doing a vector search, a Document/Node's text is searched. So how does it know to search questions_this_excerpt_can_answer without specifying it as a metadata filter (using Qdrant, for example)?

https://github.com/run-llama/llama_index/blob/main/docs/examples/metadata_extraction/MetadataExtractionSEC.ipynb
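From poking around, this is how I'm checking what actually gets embedded (a sketch; nodes is the output of the extraction pipeline):

Plain Text
# a sketch: by default node metadata is included in the text that gets embedded
from llama_index.core.schema import MetadataMode

node = nodes[0]
print(node.get_content(metadata_mode=MetadataMode.EMBED))  # text + metadata seen by the embedding model

# to keep a key out of the embedding text:
node.excluded_embed_metadata_keys.append("questions_this_excerpt_can_answer")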
5 comments
chantlong
·

Nodes

How can I view the nodes, i.e. the resulting chunks, after creating a vector store index?
I'm running into an issue where, if I pass chunk_size=512 in the service context, I get proper results from similarity search. But when I don't pass anything into the service context (so the chunk_size defaults to 1024 in LlamaIndex), I get no results back, so I'd like to see what the chunks end up looking like to figure out what's wrong.

Plain Text
vector_query_engine_index = VectorStoreIndex.from_documents(
    documents,
    use_async=True,
    service_context=service_context,
)
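What I've been doing to peek at the chunks (a sketch; the splitter settings are just an example, and this assumes a 0.10-style import path):

Plain Text
# a sketch: inspect chunks directly with the same splitter settings
from llama_index.core.node_parser import SentenceSplitter

nodes = SentenceSplitter(chunk_size=1024).get_nodes_from_documents(documents)
for n in nodes[:5]:
    print(len(n.get_content()), n.get_content()[:200])

# or look at what the index stored (for the default in-memory docstore)
for node in vector_query_engine_index.docstore.docs.values():
    print(node.get_content()[:200])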
3 comments
Does SubQuestionQueryEngine ask questions in parallel or is it sequential? I'm using the async version.
1 comment
Looking at the docs posted by
Plain Text
llm = AzureOpenAI(
    engine="my-custom-llm",
    model="gpt-35-turbo-16k",
    temperature=0.0,
    azure_endpoint="https://<your-resource-name>.openai.azure.com/",
    api_key="<your-api-key>",
    api_version="2023-07-01-preview",
)


If you use the latest llama-index & openai, did api_base get deprecated and become azure_endpoint?
https://github.com/run-llama/llama_index/blob/main/docs/examples/llm/azure_openai.ipynb
2 comments
I currently need to perform a search that might use a vector DB search or an SQL search depending on the query. It seems like LlamaIndex can handle that with the SQL Router Query Engine.
I am using Qdrant as my vector DB, and I have metadata such as id, and I want to be able to dynamically filter on that metadata.
So if the query is "give me info about something" based on docs belonging to the user id, I want to be able to pass that id as a filter to Qdrant. Is that possible?
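On the Qdrant side, this is the kind of per-request filter I mean (a sketch; user_id and vector_index are mine, and I'm not sure how it composes with the router):

Plain Text
# a sketch: pass a metadata filter through to the vector index's retriever
from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

filters = MetadataFilters(filters=[ExactMatchFilter(key="id", value=user_id)])
vector_query_engine = vector_index.as_query_engine(filters=filters)
response = vector_query_engine.query("give me info about something")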
53 comments
For LlamaIndex's default VectorStoreIndex, how is its default search functionality different from other options such as Pinecone, Qdrant, Lantern, etc.?
I know they support metadata-filter search and maybe other stuff, but if I were to use their barebones search, how is it different from VectorStoreIndex's default search?
3 comments