Find answers from the community

Thomas1234
Offline, last seen 3 months ago
Joined September 25, 2024
Hello,
I'm having issues with the OpenAILike class following the latest vLLM updates (0.6.x).
I'm getting these types of errors when I call my chat_engine, no matter the model.
Plain Text
BadRequestError: Error code: 400 - {'object': 'error', 'message': 'This model only supports single tool-calls at once!', 'type': 'BadRequestError', 'param': None, 'code': 400}

I don't have these issues with the 0.5.x versions. Any idea why?
33 comments
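A minimal sketch, assuming vLLM is serving its OpenAI-compatible endpoint at the default address; the model name and URL are placeholders, and turning off is_function_calling_model is one workaround people use so LlamaIndex does not request parallel tool calls.
Python
from llama_index.llms.openai_like import OpenAILike

# Placeholder model name and endpoint; adjust to the actual vLLM deployment.
llm = OpenAILike(
    model="my-served-model",
    api_base="http://localhost:8000/v1",
    api_key="fake",  # vLLM ignores the key unless one was configured
    is_chat_model=True,
    # Advertise no function calling so parallel tool-call requests are not sent.
    is_function_calling_model=False,
)
print(llm.complete("Hello").text)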
Thomas1234
Hello,
There seems to be a dependency conflict between the latest llama-index-core (0.10.37) and fastapi: I can't use both, because spacy and weasel require typer 0.9.4 while fastapi needs typer 0.12.3.
12 comments
Thomas1234
Hello,
Is anyone getting this TypeError when using the Mistral API for async streaming?

TypeError: object async_generator can't be used in 'await' expression
7 comments
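For reference, a sketch of the usual async streaming pattern with the Mistral integration: astream_chat is awaited once to obtain the generator, which is then iterated with async for rather than awaited again. The model name and API key are placeholders.
Python
import asyncio

from llama_index.core.llms import ChatMessage
from llama_index.llms.mistralai import MistralAI


async def main() -> None:
    llm = MistralAI(model="mistral-small-latest", api_key="...")  # placeholders
    # Await once to get the async generator, then iterate it with `async for`.
    gen = await llm.astream_chat([ChatMessage(role="user", content="Hello")])
    async for chunk in gen:
        print(chunk.delta, end="", flush=True)


asyncio.run(main())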
Thomas1234
Hello,
I want my users to be able to share documents with one another. I'm using Qdrant, and I'm wondering whether the best practice would be an index per document, with each user having a collection of indexes, or whether there is a way to move a file from one index (collection) to another collection without having to redo the embedding part.
14 comments
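Since Qdrant stores the embedding inside each point, one way to share a document without re-embedding is to copy its points from one collection to another with scroll and upsert. A sketch with placeholder collection names (filtering down to a single document's points is left out):
Python
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct

client = QdrantClient(url="http://localhost:6333")

offset = None
while True:
    # Read points (including vectors) from the source collection in batches.
    points, offset = client.scroll(
        collection_name="user_a_docs",
        with_vectors=True,
        with_payload=True,
        limit=256,
        offset=offset,
    )
    if not points:
        break
    # Write the same points, vectors included, into the destination collection.
    client.upsert(
        collection_name="user_b_docs",
        points=[PointStruct(id=p.id, vector=p.vector, payload=p.payload) for p in points],
    )
    if offset is None:
        break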
Thomas1234
Hello,
How can I load my Qdrant index once it has been created? Is there a load_index_from_storage kind of function?
5 comments
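A minimal sketch, assuming the index was built with QdrantVectorStore: the embeddings already live in Qdrant, so the index can be reconstructed from the vector store rather than through load_index_from_storage. The URL and collection name are placeholders.
Python
from qdrant_client import QdrantClient
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.qdrant import QdrantVectorStore

client = QdrantClient(url="http://localhost:6333")
vector_store = QdrantVectorStore(client=client, collection_name="my_collection")

# Rebuild the index object on top of the existing collection (no re-embedding).
index = VectorStoreIndex.from_vector_store(vector_store=vector_store)
query_engine = index.as_query_engine()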
I'm using the OpenAILike class with vLLM.
10 comments
Hello,
Is there a way to just get the context from the query? For example, I want to extract the retrieved context without having to make a call to the LLM.
I'm trying to use a vector store index alongside reranking.
7 comments
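A sketch of one way to do this, assuming `index` is an existing vector store index: retrieve nodes directly with a retriever (no LLM call) and pass them through a reranking postprocessor. The reranker model and the query are placeholders.
Python
from llama_index.core.postprocessor import SentenceTransformerRerank

query = "example question"  # placeholder query

# Retrieval only: no LLM is invoked here.
retriever = index.as_retriever(similarity_top_k=10)
nodes = retriever.retrieve(query)

# Rerank the retrieved nodes, again without calling the LLM.
reranker = SentenceTransformerRerank(model="BAAI/bge-reranker-base", top_n=3)
reranked = reranker.postprocess_nodes(nodes, query_str=query)

for node in reranked:
    print(node.score, node.get_content()[:200])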
Hello,
Is there a way to determine when to stop a chunk? I want chunks that stop at the end of a paragraph and do not overlap. I've tried the sentence splitter with \n\n\n but it does not seem to be doing it. The other option would be to have hundreds of separate little txt files, but I'd rather not.
18 comments
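A sketch of the strictest interpretation, assuming paragraphs are separated by blank lines: split the raw text yourself and build one node per paragraph, which guarantees no overlap. (SentenceSplitter with chunk_overlap=0 and a matching paragraph_separator is the gentler alternative, but it may still merge or split paragraphs to respect chunk_size.) The file path is a placeholder.
Python
from llama_index.core.schema import TextNode

with open("my_document.txt") as f:  # placeholder path
    text = f.read()

# One node per paragraph, no overlap; adjust the separator to the real delimiter.
nodes = [TextNode(text=para.strip()) for para in text.split("\n\n") if para.strip()]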
Thomas1234
Async

Hello,
Is there a way to use an async function for querying the index with a local embedding model?
6 comments
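A sketch, assuming a local HuggingFace embedding model: the retriever exposes aretrieve, and a query engine exposes aquery in the same way once an LLM is also configured. The model name and text are placeholders.
Python
import asyncio

from llama_index.core import Document, Settings, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Local embedding model (placeholder name).
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

index = VectorStoreIndex.from_documents([Document(text="Some local document text.")])


async def main() -> None:
    # Async retrieval with the local embedding model; query engines offer
    # `await query_engine.aquery(...)` in the same way.
    nodes = await index.as_retriever().aretrieve("What is in the document?")
    for node in nodes:
        print(node.get_content())


asyncio.run(main())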
Hello,
I'm trying to follow this tutorial about the ensemble query engine, using a local LLM rather than the OpenAI API.
https://github.com/jerryjliu/llama_index/blob/main/docs/examples/query_engine/ensemble_query_engine.ipynb
For this I'm using the HuggingFaceLLM class and a fine-tuned version of Llama 2.
However, when I get to the end of the tutorial, the LLMMultiSelector says it's using the default Llama CPP, and I get this error message when I try the query:
KeyError: 'choice'
Can someone help me figure this out?
Thanks 😄
4 comments
Hello all,
I'm trying to build a RAG system and I was wondering if there are any drawbacks to it besides computational time.
In my case I'd like to use either a vector store index + keyword index, or a vector store index + keyword index + reranker.
Also, is there a 'standard' reranker that should be used for structured data?
Thanks a lot !
1 comment
Hello
When using the query engine with streaming set to True on your index, how can you avoid having the </s> token at the end of your answer?
Thanks for the help
5 comments
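A hedged sketch of two common workarounds, assuming HuggingFaceLLM and an already-built index: declare the EOS token as a stopping id, and strip any literal </s> that still leaks into the streamed text. The model name and query are placeholders.
Python
from transformers import AutoTokenizer
from llama_index.llms.huggingface import HuggingFaceLLM

model_name = "meta-llama/Llama-2-13b-chat-hf"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_name)

llm = HuggingFaceLLM(
    model_name=model_name,
    tokenizer_name=model_name,
    stopping_ids=[tokenizer.eos_token_id],  # stop generation at the EOS token
)

# `index` is assumed to exist already.
query_engine = index.as_query_engine(llm=llm, streaming=True)
streaming_response = query_engine.query("example question")
for token in streaming_response.response_gen:
    print(token.replace("</s>", ""), end="", flush=True)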
Hello all,
I'm trying to use a tree index with the HuggingFaceLLM class, but I'm struggling to make it work since the number of tokens is higher than 2048. Did anyone manage to implement it? I'm trying this because I'm working with about 30 documents that are quite big (think 3000+ tokens each), and the vector store index approach is not giving the results I'd wish for. Thanks a lot.
11 comments
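A sketch of the settings that usually matter here, assuming a model that genuinely supports a longer context (Llama 2 supports 4096): raise context_window on HuggingFaceLLM and keep chunks comfortably smaller. The model name and numbers are illustrative, not prescriptive.
Python
from llama_index.core import Settings
from llama_index.llms.huggingface import HuggingFaceLLM

Settings.llm = HuggingFaceLLM(
    model_name="meta-llama/Llama-2-13b-chat-hf",  # placeholder model
    tokenizer_name="meta-llama/Llama-2-13b-chat-hf",
    context_window=4096,   # match the model's real context length
    max_new_tokens=512,
    device_map="auto",
)
Settings.chunk_size = 1024  # keep chunks well under the context window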
Hello, I'm using the Mistral API, and when I run it async while streaming I get this weird bug. It only happens on every other request, for example request 1 works, request 2 fails, request 3 works, and so on:
RuntimeError: Event loop is closed
8 comments
Thomas1234
Hello,
When I pass node_postprocessors into my chat engine directly, it just ignores them and doesn't do anything. On the other hand, when I use the CondensePlusContextChatEngine with a query engine that uses the postprocessor, it gives me the following error:

AttributeError: 'str' object has no attribute 'query_str'
5 comments
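A sketch of attaching the postprocessor to the chat engine itself rather than to a wrapped query engine, assuming `index` already exists; the similarity-cutoff postprocessor and the message are just example placeholders.
Python
from llama_index.core.chat_engine import CondensePlusContextChatEngine
from llama_index.core.postprocessor import SimilarityPostprocessor

chat_engine = CondensePlusContextChatEngine.from_defaults(
    retriever=index.as_retriever(similarity_top_k=10),
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.7)],
)
response = chat_engine.chat("example question")
print(response.response)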
Thomas1234
Hey,
I'm getting this error when I try to build an index asynchronously with Qdrant:

AttributeError: 'NoneType' object has no attribute 'get_collection'

I'm following this: https://docs.llamaindex.ai/en/stable/examples/vector_stores/QdrantIndexDemo.html
3 comments
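A sketch of the usual cause: QdrantVectorStore only has an async client if one is passed in, so async index construction needs both client and aclient. The URLs, collection name, and document are placeholders.
Python
from qdrant_client import AsyncQdrantClient, QdrantClient
from llama_index.core import Document, StorageContext, VectorStoreIndex
from llama_index.vector_stores.qdrant import QdrantVectorStore

client = QdrantClient(url="http://localhost:6333")
aclient = AsyncQdrantClient(url="http://localhost:6333")

vector_store = QdrantVectorStore(
    collection_name="my_collection",
    client=client,
    aclient=aclient,  # without this, the async code path has no client to call
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    [Document(text="placeholder document")],
    storage_context=storage_context,
    use_async=True,
)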
Thomas1234
Hello,
I'm currently using vLLM, and I'm trying to use async and streaming at the same time with my vector store index. Unfortunately, with a query engine it's not supported, and with a chat engine it either streams or runs async; with both together it doesn't stream, it just uses the async path. Any ideas on how to get around this?
43 comments
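For the chat-engine side, a sketch of the combination that does exist in the API, assuming `index` is already built: astream_chat returns a streaming response whose tokens can be iterated asynchronously. The chat mode and message are placeholders.
Python
import asyncio


async def main() -> None:
    chat_engine = index.as_chat_engine(chat_mode="condense_plus_context")
    response = await chat_engine.astream_chat("Summarise the documents.")
    async for token in response.async_response_gen():
        print(token, end="", flush=True)


asyncio.run(main())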
Hi, I'm trying to save documents that have been embedded to MongoDB, but I don't know how to persist them using an ingestion pipeline.
8 comments
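A sketch assuming the goal is to persist the ingested documents through the pipeline's document store; the MongoDB URI, database name, and document are placeholders (a vector store can be attached to the same pipeline for the embeddings themselves).
Python
from llama_index.core import Document
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.storage.docstore.mongodb import MongoDocumentStore

pipeline = IngestionPipeline(
    transformations=[SentenceSplitter(chunk_size=512)],
    # Persist ingested documents (and the dedup cache) to MongoDB.
    docstore=MongoDocumentStore.from_uri(
        uri="mongodb://localhost:27017", db_name="my_rag_db"
    ),
)
nodes = pipeline.run(documents=[Document(text="placeholder document")])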
Hello everyone,
How can you give a system prompt to the instruct version of Mixtral in a chat setting?
7 comments
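A sketch of the generic chat pattern, with the caveat that whether a dedicated system role is honoured depends on the backend's prompt template for Mixtral; `llm` here stands for whatever Mixtral wrapper is in use, and the messages are placeholders.
Python
from llama_index.core.llms import ChatMessage, MessageRole

messages = [
    ChatMessage(role=MessageRole.SYSTEM, content="You are a terse assistant."),
    ChatMessage(role=MessageRole.USER, content="Explain RAG in one sentence."),
]

# `llm` is assumed to be the Mixtral instruct LLM object already configured.
response = llm.chat(messages)
print(response.message.content)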
Hello,
Will there be a way to use RAGs with a local LLM (Hugging Face or llama.cpp)?
5 comments
Hello all,
I'm wondering if there is a way to run LLM calls in parallel when using a local model. I'm using Llama 13B on a single V100 GPU, and I've tried multithreading, but it gives the same runtime as when I was using asynchronous calls. Did someone successfully manage to run several processes at the same time on a single GPU? Thanks in advance.
12 comments
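One approach, as a sketch rather than a fit for every setup: instead of threading the per-call Python API, hand a batch of prompts to an engine that does batched generation on the GPU, for example vLLM. The model name and prompts are placeholders.
Python
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-2-13b-chat-hf")  # placeholder model
params = SamplingParams(temperature=0.7, max_tokens=256)

prompts = ["Question 1", "Question 2", "Question 3"]  # placeholder prompts
# vLLM schedules these requests together on the single GPU.
outputs = llm.generate(prompts, params)
for out in outputs:
    print(out.outputs[0].text)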
Thomas1234
Hello,
I'm trying to deploy my LLM and I have a couple of questions about it.
1) First, I've seen the LlamaIndex starter pack and I was wondering if it is compatible with Kubernetes for scaling?
2) I'm using the 7B and 13B versions of Llama 2, and if I want to have approximately 10 simultaneous users (at most), do you know what infrastructure size should be used (for example 2 A100 40 GB), or at least how much GPU I should dedicate per user?
Thanks a lot for the help 🙂
3 comments
Thomas1234
Hello,
While trying to use the new HuggingFaceLLM class with the vector store index, I get the following error:
'HuggingFaceLLM' object has no attribute 'predict'
Does anyone have an answer to this?
I'm following this tutorial: https://gpt-index.readthedocs.io/en/v0.7.0/examples/customization/llms/SimpleIndexDemo-Huggingface_camel.html
Also, I've added an embedding model to the service context using LangchainEmbedding, since I was getting an OpenAI API key error.
Thanks
11 comments
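A sketch matching the v0.7-era API the tutorial uses (ServiceContext has since been deprecated in favour of Settings): passing the HuggingFaceLLM via llm= lets the service context wrap it itself, and a local embedding model avoids the OpenAI key requirement. Model names and the data path are placeholders.
Python
from langchain.embeddings import HuggingFaceEmbeddings
from llama_index import LangchainEmbedding, ServiceContext, SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms import HuggingFaceLLM

llm = HuggingFaceLLM(
    model_name="meta-llama/Llama-2-7b-chat-hf",      # placeholder model
    tokenizer_name="meta-llama/Llama-2-7b-chat-hf",
    context_window=2048,
    max_new_tokens=256,
)
embed_model = LangchainEmbedding(
    HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
)

service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)
documents = SimpleDirectoryReader("./data").load_data()  # placeholder path
index = VectorStoreIndex.from_documents(documents, service_context=service_context)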