Find answers from the community

Thomas1234
Offline, last seen 3 months ago
Joined September 25, 2024
Hello,
I'm having issues with the OpenAILike class following the latest vLLM updates (0.6.x).
I'm getting these types of errors when I call my chat_engine, no matter the model.
Plain Text
BadRequestError: Error code: 400 - {'object': 'error', 'message': 'This model only supports single tool-calls at once!', 'type': 'BadRequestError', 'param': None, 'code': 400}

I don't have these issues with the 0.5.x versions. Any idea why?
33 comments
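A minimal sketch, assuming vLLM is serving its OpenAI-compatible endpoint at the default address; the model name and URL are placeholders, and turning off is_function_calling_model is one workaround people use so LlamaIndex does not request parallel tool calls.
Python
from llama_index.llms.openai_like import OpenAILike

# Placeholder model name and endpoint; adjust to the actual vLLM deployment.
llm = OpenAILike(
    model="my-served-model",
    api_base="http://localhost:8000/v1",
    api_key="fake",  # vLLM ignores the key unless one was configured
    is_chat_model=True,
    # Advertise no function calling so parallel tool-call requests are not sent.
    is_function_calling_model=False,
)
print(llm.complete("Hello").text)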
Thomas1234
Hello,
There seems to be a dependency conflict between the latest llama-index-core (0.10.37) and fastapi: I can't use both, because spacy and weasel require typer 0.9.4 while fastapi needs typer 0.12.3.
12 comments
Thomas1234
Hello,
Is anyone getting this TypeError when using the Mistral API for async streaming?

TypeError: object async_generator can't be used in 'await' expression
7 comments
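For reference, a sketch of the usual async streaming pattern with the Mistral integration: astream_chat is awaited once to obtain the generator, which is then iterated with async for rather than awaited again. The model name and API key are placeholders.
Python
import asyncio

from llama_index.core.llms import ChatMessage
from llama_index.llms.mistralai import MistralAI


async def main() -> None:
    llm = MistralAI(model="mistral-small-latest", api_key="...")  # placeholders
    # Await once to get the async generator, then iterate it with `async for`.
    gen = await llm.astream_chat([ChatMessage(role="user", content="Hello")])
    async for chunk in gen:
        print(chunk.delta, end="", flush=True)


asyncio.run(main())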
Thomas1234
Hello,
I want my users to be able to share documents with one another. I'm using Qdrant, and I'm wondering whether the best practice would be an index per document, with each user having a collection of indexes, or whether there is a way to move a file from one index (collection) to another collection without having to redo the embedding part.
14 comments
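Since Qdrant stores the embedding inside each point, one way to share a document without re-embedding is to copy its points from one collection to another with scroll and upsert. A sketch with placeholder collection names (filtering down to a single document's points is left out):
Python
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct

client = QdrantClient(url="http://localhost:6333")

offset = None
while True:
    # Read points (including vectors) from the source collection in batches.
    points, offset = client.scroll(
        collection_name="user_a_docs",
        with_vectors=True,
        with_payload=True,
        limit=256,
        offset=offset,
    )
    if not points:
        break
    # Write the same points, vectors included, into the destination collection.
    client.upsert(
        collection_name="user_b_docs",
        points=[PointStruct(id=p.id, vector=p.vector, payload=p.payload) for p in points],
    )
    if offset is None:
        break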
Thomas1234
Hello,
How can I load my Qdrant index once it has been created? Is there a load_index_from_storage kind of function?
5 comments
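A minimal sketch, assuming the index was built with QdrantVectorStore: the embeddings already live in Qdrant, so the index can be reconstructed from the vector store rather than through load_index_from_storage. The URL and collection name are placeholders.
Python
from qdrant_client import QdrantClient
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.qdrant import QdrantVectorStore

client = QdrantClient(url="http://localhost:6333")
vector_store = QdrantVectorStore(client=client, collection_name="my_collection")

# Rebuild the index object on top of the existing collection (no re-embedding).
index = VectorStoreIndex.from_vector_store(vector_store=vector_store)
query_engine = index.as_query_engine()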
I'm using the OpenAILike class with vLLM.
10 comments
Hello,
Is there a way to just get the context from the query? For example, I want to extract the retrieved context without having to make a call to the LLM.
I'm trying to use a vector store index alongside reranking.
7 comments
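A sketch of one way to do this, assuming `index` is an existing vector store index: retrieve nodes directly with a retriever (no LLM call) and pass them through a reranking postprocessor. The reranker model and the query are placeholders.
Python
from llama_index.core.postprocessor import SentenceTransformerRerank

query = "example question"  # placeholder query

# Retrieval only: no LLM is invoked here.
retriever = index.as_retriever(similarity_top_k=10)
nodes = retriever.retrieve(query)

# Rerank the retrieved nodes, again without calling the LLM.
reranker = SentenceTransformerRerank(model="BAAI/bge-reranker-base", top_n=3)
reranked = reranker.postprocess_nodes(nodes, query_str=query)

for node in reranked:
    print(node.score, node.get_content()[:200])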
Hello,
Is there a way to determine when to stop a chunk? I want chunks that stop at the end of a paragraph and do not overlap. I've tried the sentence splitter with \n\n\n but it does not seem to be doing it. The other option would be to have hundreds of separate little txt files, but I'd rather not.
18 comments
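A sketch of the strictest interpretation, assuming paragraphs are separated by blank lines: split the raw text yourself and build one node per paragraph, which guarantees no overlap. (SentenceSplitter with chunk_overlap=0 and a matching paragraph_separator is the gentler alternative, but it may still merge or split paragraphs to respect chunk_size.) The file path is a placeholder.
Python
from llama_index.core.schema import TextNode

with open("my_document.txt") as f:  # placeholder path
    text = f.read()

# One node per paragraph, no overlap; adjust the separator to the real delimiter.
nodes = [TextNode(text=para.strip()) for para in text.split("\n\n") if para.strip()]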
Thomas1234
Async

Hello,
Is there a way to use an async function for querying the index with a local embedding model?
6 comments
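A sketch, assuming a local HuggingFace embedding model: the retriever exposes aretrieve, and a query engine exposes aquery in the same way once an LLM is also configured. The model name and text are placeholders.
Python
import asyncio

from llama_index.core import Document, Settings, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Local embedding model (placeholder name).
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

index = VectorStoreIndex.from_documents([Document(text="Some local document text.")])


async def main() -> None:
    # Async retrieval with the local embedding model; query engines offer
    # `await query_engine.aquery(...)` in the same way.
    nodes = await index.as_retriever().aretrieve("What is in the document?")
    for node in nodes:
        print(node.get_content())


asyncio.run(main())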
Hello,
I'm trying to follow this tutorial about the ensemble query engine, using a local LLM rather than the OpenAI API.
https://github.com/jerryjliu/llama_index/blob/main/docs/examples/query_engine/ensemble_query_engine.ipynb
For this I'm using the HuggingFaceLLM class and a fine-tuned version of Llama 2.
However, when I get to the end of the tutorial, the LLMMultiSelector says it's using the default Llama CPP, and I get this error message when I try the query:
KeyError: 'choice'
Can someone help me figure this out?
Thanks 😄
4 comments
Hello all,
I'm trying to build a RAG system and I was wondering if there are any drawbacks to it besides computational time.
In my case I'd like to use either a vector store index + keyword index, or a vector store index + keyword index + reranker.
Also, is there a 'standard' reranker that should be used for structured data?
Thanks a lot !
1 comment
Hello
When using the query engine with streaming set to True on your index, how can you avoid having the </s> token at the end of your answer?
Thanks for the help
5 comments
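A hedged sketch of two common workarounds, assuming HuggingFaceLLM and an already-built index: declare the EOS token as a stopping id, and strip any literal </s> that still leaks into the streamed text. The model name and query are placeholders.
Python
from transformers import AutoTokenizer
from llama_index.llms.huggingface import HuggingFaceLLM

model_name = "meta-llama/Llama-2-13b-chat-hf"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_name)

llm = HuggingFaceLLM(
    model_name=model_name,
    tokenizer_name=model_name,
    stopping_ids=[tokenizer.eos_token_id],  # stop generation at the EOS token
)

# `index` is assumed to exist already.
query_engine = index.as_query_engine(llm=llm, streaming=True)
streaming_response = query_engine.query("example question")
for token in streaming_response.response_gen:
    print(token.replace("</s>", ""), end="", flush=True)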
Hello all,
I'm trying to use a tree index with the HuggingFaceLLM class, but I'm struggling to make it work since the number of tokens is higher than 2048. Did anyone manage to implement it? I'm trying this because I'm working with about 30 documents that are quite big (think 3000+ tokens each), and the vector store index approach is not giving the results I'd wish for. Thanks a lot.
11 comments
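A sketch of the settings that usually matter here, assuming a model that genuinely supports a longer context (Llama 2 supports 4096): raise context_window on HuggingFaceLLM and keep chunks comfortably smaller. The model name and numbers are illustrative, not prescriptive.
Python
from llama_index.core import Settings
from llama_index.llms.huggingface import HuggingFaceLLM

Settings.llm = HuggingFaceLLM(
    model_name="meta-llama/Llama-2-13b-chat-hf",  # placeholder model
    tokenizer_name="meta-llama/Llama-2-13b-chat-hf",
    context_window=4096,   # match the model's real context length
    max_new_tokens=512,
    device_map="auto",
)
Settings.chunk_size = 1024  # keep chunks well under the context window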
Hello, I'm using the Mistral API, and when I run it async while streaming I get this weird bug. It only happens on every other request, for example request 1 works, request 2 fails, request 3 works, and so on:
RuntimeError: Event loop is closed
8 comments
Thomas1234
Hello,
When I pass node_postprocessors into my chat engine directly, it just ignores them and doesn't do anything. On the other hand, when I use the CondensePlusContextChatEngine with a query engine that uses the postprocessor, it gives me the following error:

AttributeError: 'str' object has no attribute 'query_str'
5 comments
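A sketch of attaching the postprocessor to the chat engine itself rather than to a wrapped query engine, assuming `index` already exists; the similarity-cutoff postprocessor and the message are just example placeholders.
Python
from llama_index.core.chat_engine import CondensePlusContextChatEngine
from llama_index.core.postprocessor import SimilarityPostprocessor

chat_engine = CondensePlusContextChatEngine.from_defaults(
    retriever=index.as_retriever(similarity_top_k=10),
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.7)],
)
response = chat_engine.chat("example question")
print(response.response)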
Thomas1234
Hey,
I'm getting this error when I try to build an index asynchronously with Qdrant:

AttributeError: 'NoneType' object has no attribute 'get_collection'

I'm following this: https://docs.llamaindex.ai/en/stable/examples/vector_stores/QdrantIndexDemo.html
3 comments
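A sketch of the usual cause: QdrantVectorStore only has an async client if one is passed in, so async index construction needs both client and aclient. The URLs, collection name, and document are placeholders.
Python
from qdrant_client import AsyncQdrantClient, QdrantClient
from llama_index.core import Document, StorageContext, VectorStoreIndex
from llama_index.vector_stores.qdrant import QdrantVectorStore

client = QdrantClient(url="http://localhost:6333")
aclient = AsyncQdrantClient(url="http://localhost:6333")

vector_store = QdrantVectorStore(
    collection_name="my_collection",
    client=client,
    aclient=aclient,  # without this, the async code path has no client to call
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    [Document(text="placeholder document")],
    storage_context=storage_context,
    use_async=True,
)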
Thomas1234
Hello,
I'm currently using vLLM, and I'm trying to use async and streaming at the same time with my vector store index. Unfortunately, with a query engine it's not supported, and with a chat engine it either streams or runs async; with both together it doesn't stream, it just uses the async path. Any ideas on how to get around this?
43 comments
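For the chat-engine side, a sketch of the combination that does exist in the API, assuming `index` is already built: astream_chat returns a streaming response whose tokens can be iterated asynchronously. The chat mode and message are placeholders.
Python
import asyncio


async def main() -> None:
    chat_engine = index.as_chat_engine(chat_mode="condense_plus_context")
    response = await chat_engine.astream_chat("Summarise the documents.")
    async for token in response.async_response_gen():
        print(token, end="", flush=True)


asyncio.run(main())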
Hi, I'm trying to save documents that have been embedded to MongoDB, but I don't know how to persist them using an ingestion pipeline.
8 comments
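A sketch assuming the goal is to persist the ingested documents through the pipeline's document store; the MongoDB URI, database name, and document are placeholders (a vector store can be attached to the same pipeline for the embeddings themselves).
Python
from llama_index.core import Document
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.storage.docstore.mongodb import MongoDocumentStore

pipeline = IngestionPipeline(
    transformations=[SentenceSplitter(chunk_size=512)],
    # Persist ingested documents (and the dedup cache) to MongoDB.
    docstore=MongoDocumentStore.from_uri(
        uri="mongodb://localhost:27017", db_name="my_rag_db"
    ),
)
nodes = pipeline.run(documents=[Document(text="placeholder document")])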
Hello everyone,
How can you give a system prompt to the instruct version of Mixtral in a chat setting?
7 comments
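A sketch of the generic chat pattern, with the caveat that whether a dedicated system role is honoured depends on the backend's prompt template for Mixtral; `llm` here stands for whatever Mixtral wrapper is in use, and the messages are placeholders.
Python
from llama_index.core.llms import ChatMessage, MessageRole

messages = [
    ChatMessage(role=MessageRole.SYSTEM, content="You are a terse assistant."),
    ChatMessage(role=MessageRole.USER, content="Explain RAG in one sentence."),
]

# `llm` is assumed to be the Mixtral instruct LLM object already configured.
response = llm.chat(messages)
print(response.message.content)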
Hello,
Will there be a way to use RAGs with a local LLM (Hugging Face or llama.cpp)?
5 comments
Hello all,
I'm wondering if there is a way to run LLM calls in parallel when using a local model. I'm using Llama 13B on a single V100 GPU, and I've tried multithreading, but it gives the same runtime as when I was using asynchronous calls. Did someone successfully manage to run several processes at the same time on a single GPU? Thanks in advance.
12 comments
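One approach, as a sketch rather than a fit for every setup: instead of threading the per-call Python API, hand a batch of prompts to an engine that does batched generation on the GPU, for example vLLM. The model name and prompts are placeholders.
Python
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-2-13b-chat-hf")  # placeholder model
params = SamplingParams(temperature=0.7, max_tokens=256)

prompts = ["Question 1", "Question 2", "Question 3"]  # placeholder prompts
# vLLM schedules these requests together on the single GPU.
outputs = llm.generate(prompts, params)
for out in outputs:
    print(out.outputs[0].text)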
Thomas1234
Hello,
I'm trying to deploy my LLM and I have a couple of questions about it.
1) First, I've seen the LlamaIndex starter pack and I was wondering if it is compatible with Kubernetes for scaling?
2) I'm using the 7B and 13B versions of Llama 2, and if I want to have approximately 10 simultaneous users (at most), do you know what infrastructure size should be used (for example 2 A100 40 GB), or at least how much GPU I should dedicate per user?
Thanks a lot for the help 🙂
3 comments
Thomas1234
Hello,
While trying to use the new HuggingFaceLLM class with the vector store index, I get the following error:
'HuggingFaceLLM' object has no attribute 'predict'
Does anyone have an answer to this?
I'm following this tutorial: https://gpt-index.readthedocs.io/en/v0.7.0/examples/customization/llms/SimpleIndexDemo-Huggingface_camel.html
Also, I've added an embedding model to the service context using LangchainEmbedding, since I was getting an OpenAI API key error.
Thanks
11 comments
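A sketch matching the v0.7-era API the tutorial uses (ServiceContext has since been deprecated in favour of Settings): passing the HuggingFaceLLM via llm= lets the service context wrap it itself, and a local embedding model avoids the OpenAI key requirement. Model names and the data path are placeholders.
Python
from langchain.embeddings import HuggingFaceEmbeddings
from llama_index import LangchainEmbedding, ServiceContext, SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms import HuggingFaceLLM

llm = HuggingFaceLLM(
    model_name="meta-llama/Llama-2-7b-chat-hf",      # placeholder model
    tokenizer_name="meta-llama/Llama-2-7b-chat-hf",
    context_window=2048,
    max_new_tokens=256,
)
embed_model = LangchainEmbedding(
    HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
)

service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)
documents = SimpleDirectoryReader("./data").load_data()  # placeholder path
index = VectorStoreIndex.from_documents(documents, service_context=service_context)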