Find answers from the community

Home
Members
SeaBerg
Offline, last seen 2 months ago
Joined September 25, 2024
Hi everyone. I'm working on taking my RAG app to production and want to convert it to async. My pipeline consists of two query engine tools called by a RouterQueryEngine.
One of the tools wraps summary_index.as_query_engine, which I'm calling with use_async=True. The other is a QueryEngineTool with query_engine=vector_query_engine, where vector_query_engine is a RetrieverQueryEngine built on a VectorIndexRetriever.
Both query engines include a node_postprocessor that uses CohereRerank.

I'm unsure whether the QueryEngineTool part supports async, and I'm also not sure whether the RouterQueryEngine itself needs to be configured for async, or whether only the tools inside it need to be async. It's really difficult to find clear async instructions for all of the different functions and query engines.

Any tips or info would be appreciated!
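For context, this is the concurrency pattern I'm after. The two engines are stubbed out below as plain coroutines (summary_tool and vector_tool are hypothetical stand-ins, not LlamaIndex APIs); in the real pipeline each would be an `await engine.aquery(question)` call:

```python
import asyncio

# Hypothetical stand-ins for the two tools' async query paths; in the real
# pipeline these would be `await engine.aquery(question)` calls.
async def summary_tool(question: str) -> str:
    await asyncio.sleep(0.01)  # simulate an I/O-bound LLM call
    return f"summary answer to: {question}"

async def vector_tool(question: str) -> str:
    await asyncio.sleep(0.01)  # simulate retrieval + synthesis
    return f"vector answer to: {question}"

async def run_pipeline(question: str) -> list:
    # asyncio.gather runs both engines concurrently instead of sequentially,
    # which is where the async conversion actually pays off.
    return await asyncio.gather(summary_tool(question), vector_tool(question))

results = asyncio.run(run_pipeline("What is this doc about?"))
print(results)
```

The point of the sketch is just that the awaitable calls compose: if each tool exposes an async query path, the router layer can await them without extra machinery.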
6 comments
I wonder if anyone has gotten Chainlit to work after the LlamaIndex v0.10 upgrade? I see there was a Chainlit repo update that was supposed to transition from service_context to Settings, but I haven't been able to find any examples that have been updated. I have spent hours on it with no success so far. It would be amazing if there were a functioning example somewhere. I'm about to give up and move on to something else that has examples updated for the newer LlamaIndex codebase.
8 comments
Has anyone tried a LlamaIndex query engine that includes a function call with o1 yet? I just saw that the API doesn't support tool usage. I am planning to try my sub-question query engine pipeline if my enterprise API token has access to o1. Hopefully I'll know tomorrow.
1 comment
I have been given access to the Cohere reranker through my company's Azure AI Studio. I see that I can do inference in LlamaIndex with LLMs in Azure, but can I use a reranking model in Azure as a node postprocessor, as I would using 'from llama_index.postprocessor.cohere_rerank import CohereRerank'? I saw a GitHub support inquiry that mentioned using 'from llama_index.core.postprocessor import AzureRerank', but it doesn't work. Maybe it was an LLM hallucination...
7 comments
I'm using SubQuestionQueryEngine.from_defaults. Is it possible to stream the final response from the LLM? I was hoping to reduce the apparent latency by streaming, but haven't figured out how to do it yet.
1 comment
@kapa.ai please tell me about query response modes that are available:

  • refine
  • compact
  • tree_summarize
  • accumulate
3 comments
Is anyone aware of a convenient way to manually edit Document objects? I'm still getting some errors in PDF extraction, and it would be nice to do a few edits prior to running them through the node parser.
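For illustration, this is the kind of pre-parse cleanup I have in mind (pure Python; the hyphenation and whitespace rules are just examples of typical PDF-extraction artifacts):

```python
import re

def clean_pdf_text(text: str) -> str:
    """Illustrative cleanup for common PDF-extraction artifacts."""
    # Re-join words hyphenated across line breaks: "extrac-\ntion" -> "extraction"
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse single newlines inside paragraphs to spaces; keep blank lines
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces/tabs
    text = re.sub(r"[ \t]{2,}", " ", text)
    return text.strip()

cleaned = clean_pdf_text("PDF extrac-\ntion adds  noise\nacross lines.\n\nNew para.")
print(cleaned)
```

I'd then rebuild each document from the cleaned text before the node parser, something like `Document(text=clean_pdf_text(d.text), metadata=d.metadata)` for each `d` — though that constructor usage is an assumption about the Document API in my version.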
3 comments
Anyone know how to set llm to gpt-4o?
I updated the llm package: Successfully installed llama-index-llms-openai-0.1.21

llm = OpenAI(model="gpt-4o")
node_parser = MarkdownElementNodeParser(llm=llm)

ValueError: Unknown model 'gpt-4o'. Please provide a valid OpenAI model name in: gpt-4, gpt-4-32k, gpt-4-1106-preview, gpt-4-0125-preview, gpt-4-turbo-preview, gpt-4-vision-preview, gpt-4-1106-vision-preview, gpt-4-turbo-2024-04-09, gpt-4-turbo, gpt-4-0613, gpt-4-32k-0613, gpt-4-0314, gpt-4-32k-0314, gpt-3.5-turbo, gpt-3.5-turbo-16k, gpt-3.5-turbo-0125, gpt-3.5-turbo-1106, gpt-3.5-turbo-0613, gpt-3.5-turbo-16k-0613, gpt-3.5-turbo-0301, text-davinci-003, text-davinci-002, gpt-3.5-turbo-instruct, text-ada-001, text-babbage-001, text-curie-001, ada, babbage, curie, davinci, gpt-35-turbo-16k, gpt-35-turbo, gpt-35-turbo-0125, gpt-35-turbo-1106, gpt-35-turbo-0613, gpt-35-turbo-16k-0613
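Update: the ValueError appears to come from a hard-coded model table in the installed OpenAI utils module, so the usual fix seems to be upgrading to releases that know about gpt-4o (exact minimum versions may vary with your install):

```shell
# Upgrade both the core package and the OpenAI LLM integration;
# older releases predate gpt-4o and reject the model name.
pip install -U llama-index-core llama-index-llms-openai

# Then restart the kernel/process so the new version is actually imported.
```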
2 comments
I'm trying to reload a finetuned embedding model using one of the llama-index examples: https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/finetuning/embeddings/finetune_embedding_adapter.ipynb
I'm running into an issue with an import, so I'm assuming the source or the library name has changed.

Anyone know the current way to accomplish this:
from llama_index.core.embeddings import LinearAdapterEmbeddingModel
2 comments
Any tips for getting 'from llama_index.core.llms.generic_utils import messages_to_prompt' to work?
I'm getting this error: ModuleNotFoundError: No module named 'llama_index.core.llms.generic_utils'.

I have done pip install -U llama-index-llms-openai llama-index-embeddings-openai llama-index-core, which I saw recommended for someone else, but it didn't help. I can't find which package I need to install to make the import work. Thanks!
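In the meantime I'm using a defensive import that tries the newer path first (llama_index.core.base.llms.generic_utils, which I believe is where the helper moved in the 0.10.x refactor — worth verifying against your installed version) and falls back to the old one:

```python
# Try the newer module path first, fall back to the old one; if neither
# exists, llama-index-core is missing or the layout changed again.
try:
    from llama_index.core.base.llms.generic_utils import messages_to_prompt
except ImportError:
    try:
        from llama_index.core.llms.generic_utils import messages_to_prompt
    except ImportError:
        messages_to_prompt = None  # not installed / different layout

print(callable(messages_to_prompt) or messages_to_prompt is None)
```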
1 comment