Find answers from the community

ghxsted.
Offline, last seen 3 months ago
Joined September 25, 2024
Hello, is there a way to use hybrid search with reranking in a chat_engine? I looked through nearly the whole documentation and found nothing about how to integrate these three components together.
4 comments
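A minimal sketch of one way these three pieces might fit together, assuming the index sits on a hybrid-capable vector store (e.g. Qdrant created with enable_hybrid=True) and that sentence-transformers is installed for the reranker: the hybrid and reranking settings go on the query engine, which is then wrapped in a condense-question chat engine.
Plain Text
from llama_index.core.chat_engine import CondenseQuestionChatEngine
from llama_index.core.postprocessor import SentenceTransformerRerank

# Cross-encoder reranker applied after retrieval.
reranker = SentenceTransformerRerank(
    model="cross-encoder/ms-marco-MiniLM-L-2-v2", top_n=3
)

# Hybrid (dense + sparse) retrieval with reranking on top.
query_engine = index.as_query_engine(
    vector_store_query_mode="hybrid",
    similarity_top_k=10,
    sparse_top_k=12,
    node_postprocessors=[reranker],
)

# The chat engine condenses the conversation into a standalone question
# and sends it to the hybrid + rerank query engine.
chat_engine = CondenseQuestionChatEngine.from_defaults(
    query_engine=query_engine, verbose=True
)
response = chat_engine.chat("What does the latest report say about revenue?")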
I have a question: when loading an index from a persistent ChromaDB vector store, is it loaded into the GPU's VRAM?
5 comments
Can someone please tell me how to add and remove documents from a collection inside an already-persistent ChromaDB? I'm pretty confused trying to do it based on the documentation.
13 comments
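A sketch of one way to do this through LlamaIndex itself, assuming the collection already exists on disk (the path, collection name, and doc id below are made up): re-attach to the collection, then use insert and delete_ref_doc on the index.
Plain Text
import chromadb
from llama_index.core import Document, VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore

# Re-attach to the existing persistent collection
# (embed_model is picked up from Settings here).
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("quickstart")
vector_store = ChromaVectorStore(chroma_collection=collection)
index = VectorStoreIndex.from_vector_store(vector_store)

# Add a document: it is chunked, embedded, and written into the collection.
index.insert(Document(text="New content to index.", doc_id="doc-42"))

# Remove every chunk that came from that source document.
index.delete_ref_doc("doc-42", delete_from_docstore=True)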
Hey, I'm using the RagDatasetGenerator to generate a set of questions, but I'm using a local LLM (llama3:70b-instruct) served with Ollama and I get these as my questions. Is there a way around it?
3 comments
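The screenshot of the generated questions didn't survive the export, but if the local model is adding preamble or numbering, one hedged option is to override the generation instruction via question_gen_query (class and parameter names as in llama_index.core; the prompt wording is just an example).
Plain Text
from llama_index.core.llama_dataset.generator import RagDatasetGenerator
from llama_index.llms.ollama import Ollama

llm = Ollama(model="llama3:70b-instruct", request_timeout=600.0)

dataset_generator = RagDatasetGenerator.from_documents(
    documents,
    llm=llm,
    num_questions_per_chunk=2,
    # Spell out the output format explicitly for a chatty local model.
    question_gen_query=(
        "Using only the provided context, write 2 questions the context can "
        "answer. Output only the questions, one per line, with no numbering "
        "and no extra commentary."
    ),
)
rag_dataset = dataset_generator.generate_questions_from_nodes()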
Is there a way to make the process of indexing documents parallel?
For example, here the chroma_db creation process is only using one GPU for me; I was wondering if there is an option to make it use both of my GPUs.

The process just loads documents that I have already parsed, indexes them with HuggingFace embeddings, and saves everything to a ChromaDB.
1 comment
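There is no built-in multi-GPU switch that I know of, but a rough sketch of one workaround is to shard the nodes, embed each shard with its own HuggingFaceEmbedding on a different device, and let the index reuse the precomputed embeddings (documents and the Chroma-backed storage_context are assumed from the existing script; the model name is only an example).
Plain Text
from concurrent.futures import ThreadPoolExecutor

from llama_index.core import VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.schema import MetadataMode
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

nodes = SentenceSplitter().get_nodes_from_documents(documents)

# One embedding model per GPU.
embed_models = [
    HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5", device="cuda:0"),
    HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5", device="cuda:1"),
]

def embed_shard(shard, model):
    texts = [n.get_content(metadata_mode=MetadataMode.EMBED) for n in shard]
    for node, emb in zip(shard, model.get_text_embedding_batch(texts)):
        node.embedding = emb
    return shard

half = len(nodes) // 2
with ThreadPoolExecutor(max_workers=2) as pool:
    shards = list(pool.map(embed_shard, [nodes[:half], nodes[half:]], embed_models))

# The nodes already carry embeddings, so building the index just writes them to Chroma.
index = VectorStoreIndex(
    nodes=[n for shard in shards for n in shard],
    storage_context=storage_context,
    embed_model=embed_models[0],  # used later at query time
)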
ghxsted. · Memory
Hello, I have a very specific question:
I have a RAG system (Streamlit for the frontend, a Flask API, and LlamaIndex, with the LLM served through the LlamaIndex Ollama integration). When I use the tool for some time (say 7 to 8 questions), my CUDA memory gets saturated pretty quickly. I was wondering what could be causing this: is it the context of my previous questions not being purged from memory after generating a response, or is there some other reason?
I'm using chat_engine = index.as_chat_engine(chat_mode="condense_question", verbose=True) and chat_engine.reset() to reset it, and I also tried torch.cuda.empty_cache() to empty the cache, but it doesn't work.

LLM : llama3 70b-instruct (4bit quant)
Embeddings : BAAI/bge-base-en-v1.5 (using Huggingface embeddings integration with llama index)
12 comments
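Hard to say without profiling, but since condense_question keeps feeding the accumulated chat history back into the prompt, one thing worth trying is capping the chat memory so the context (and the KV cache on the Ollama side) stops growing with every turn; a sketch with an illustrative token limit:
Plain Text
from llama_index.core.memory import ChatMemoryBuffer

# Bound how much history gets condensed into each new question.
memory = ChatMemoryBuffer.from_defaults(token_limit=1500)

chat_engine = index.as_chat_engine(
    chat_mode="condense_question",
    memory=memory,
    verbose=True,
)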
ghxsted. · V0.10
Is the Pandas Excel Loader no longer supported? And is there an ongoing update to LlamaHub?
19 comments
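Assuming this is about the post-v0.10 package split, the Excel reader now lives in the llama-index-readers-file package rather than being pulled from LlamaHub with download_loader; a sketch with a made-up file path:
Plain Text
# pip install llama-index-readers-file
from pathlib import Path
from llama_index.readers.file import PandasExcelReader

reader = PandasExcelReader()
documents = reader.load_data(Path("report.xlsx"))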
Hello, I'm trying to load my index after saving it to disk, but I can't and I get this error.

I'm using an ingestion pipeline and ChromaDB to create the vector store; I can provide the rest of the code if needed.
5 comments
Heyo, I have a question: is it possible to create a ChromaDB vector store when using LlamaParse?
I'm using LlamaParse to read PDFs, and from what I've seen in the documentation notebooks, they usually create "nodes" from the documents parsed by LlamaParse.
What I did before was read files with SimpleDirectoryReader and build a VectorStoreIndex from documents, having created the vector store with a Chroma collection, just basic stuff.
6 comments
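It should be possible; a sketch of one way to do it, assuming LLAMA_CLOUD_API_KEY is set and with a made-up file path. LlamaParse returns a list of Documents, so the usual Chroma StorageContext plus from_documents flow still applies (building nodes first, as the notebooks do, is optional).
Plain Text
import chromadb
from llama_parse import LlamaParse
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore

# Parse the PDFs into LlamaIndex Documents.
documents = LlamaParse(result_type="markdown").load_data("report.pdf")

# Same Chroma setup as with SimpleDirectoryReader.
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("llamaparse_docs")
storage_context = StorageContext.from_defaults(
    vector_store=ChromaVectorStore(chroma_collection=collection)
)

index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)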
ghxsted. · @contributor
5 comments
Hello, just a quick question: does anyone know how to use an already-downloaded model with the HuggingFaceLLM class? For example, is there an option to pass the directory of the already-downloaded model instead of having to download it into the cache?
3 comments
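A sketch with a hypothetical local path: model_name and tokenizer_name are handed to transformers' from_pretrained, which accepts a local directory as well as a hub id, so pointing both at the downloaded folder avoids the cache download.
Plain Text
from llama_index.llms.huggingface import HuggingFaceLLM

llm = HuggingFaceLLM(
    model_name="/models/mistral-7b-instruct-v0.1",
    tokenizer_name="/models/mistral-7b-instruct-v0.1",
    context_window=4096,
    max_new_tokens=256,
    device_map="auto",
)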
Hello, I have a general kind of question: when using local models, what would you say is the best free inference engine to use? LlamaIndex supports vLLM, Ollama, llama.cpp, Hugging Face transformers, and a lot of other integrations.
This is my setup:
  • 2 Nvidia Quadro P4000 GPUs with 8 GB of VRAM each (16 GB of VRAM in total)
  • Intel Xeon 3.70 GHz
  • 32 GB of RAM
The model I'm trying to use is mistral 7b-instruct-v0.1.
10 comments
Hello, I have a question concerning the fine-tuning of embedding models using this notebook from the docs:
https://docs.llamaindex.ai/en/stable/examples/finetuning/embeddings/finetune_embedding/
Is it possible to fine-tune any embedding model with it? In the notebook 'BAAI/bge-small-en' is fine-tuned, but I'd like to fine-tune the model "Alibaba-NLP/gte-large-en-v1.5" from Hugging Face. Will any adaptation or changes be necessary, or is the process the same for both models?
1 comment
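The flow in the notebook is not tied to the bge model, so swapping the model_id is most of it; a sketch reusing the train/val datasets built in the notebook. One caveat worth checking: Alibaba-NLP/gte-large-en-v1.5 ships custom modelling code (trust_remote_code), and whether the fine-tune engine passes that through may depend on the installed versions.
Plain Text
from llama_index.finetuning import SentenceTransformersFinetuneEngine

finetune_engine = SentenceTransformersFinetuneEngine(
    train_dataset,
    model_id="Alibaba-NLP/gte-large-en-v1.5",
    model_output_path="gte_large_finetuned",
    val_dataset=val_dataset,
)
finetune_engine.finetune()
embed_model = finetune_engine.get_finetuned_model()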
Am I doing something wrong when it comes to indexing?
I'm reading my markdown files with SimpleDirectoryReader and trying to create a vector DB with Qdrant that supports hybrid search, but it's taking much longer than I expected to index all the documents. With ChromaDB it took nearly 2 hours for all the documents (understandable, I have a lot of documents), but with Qdrant it's been nearly 7 hours now :)))))

I launched the Qdrant client using Docker as described in the documentation, and this is my code:
Plain Text
import qdrant_client

from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.vector_stores.qdrant import QdrantVectorStore

# Sync and async clients pointing at the dockerised Qdrant instance.
client = qdrant_client.QdrantClient(
    host="localhost",
    port=6333,
    timeout=3000.0,
)
aclient = qdrant_client.AsyncQdrantClient(
    host="localhost",
    port=6333,
    timeout=3000.0,
)

# Hybrid mode also computes sparse embeddings for every chunk at index time.
vector_store = QdrantVectorStore(
    "mydocuments",
    client=client,
    aclient=aclient,
    enable_hybrid=True,
    batch_size=20,
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

index = VectorStoreIndex.from_documents(
    Vendors_docs,
    storage_context=storage_context,
)
4 comments
Hello, I have this issue when I try using the MarkdownElementNodeParser.
I'm using FastEmbed embeddings and Ollama,
and loading the files with SimpleDirectoryReader.
4 comments
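The traceback didn't make it into this archive, but one common stumbling block is that MarkdownElementNodeParser calls an LLM to summarise table elements, so the Ollama LLM has to be passed to it explicitly (or set globally) or it will try to fall back to a default it can't reach; a sketch under that assumption:
Plain Text
from llama_index.core import Settings, SimpleDirectoryReader
from llama_index.core.node_parser import MarkdownElementNodeParser
from llama_index.embeddings.fastembed import FastEmbedEmbedding
from llama_index.llms.ollama import Ollama

Settings.embed_model = FastEmbedEmbedding(model_name="BAAI/bge-small-en-v1.5")
Settings.llm = Ollama(model="llama3", request_timeout=120.0)

documents = SimpleDirectoryReader("./data").load_data()

# The element parser uses the LLM for table summaries.
node_parser = MarkdownElementNodeParser(llm=Settings.llm, num_workers=4)
nodes = node_parser.get_nodes_from_documents(documents)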
What would be the best node parser to use when I have markdown files with tables inside: MarkdownElementNodeParser or MarkdownNodeParser?
Can anyone please explain the difference between these two parsers?

Difference in terms of:
Indexing strategies
Influence on the retrieval part of the RAG pipeline in terms of performance

An example of a document I have would be this:
2 comments
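For contrast, a sketch of the two paths (llm and documents assumed to exist): MarkdownNodeParser only splits on headers and leaves tables as plain text inside the chunks, while MarkdownElementNodeParser pulls tables out as separate elements with LLM-generated summaries, which are then indexed alongside the base nodes.
Plain Text
from llama_index.core import VectorStoreIndex
from llama_index.core.node_parser import MarkdownNodeParser, MarkdownElementNodeParser

# Header-based splitting only; tables stay embedded in the text chunks.
flat_nodes = MarkdownNodeParser().get_nodes_from_documents(documents)

# Tables become separate elements with LLM-written summaries.
element_parser = MarkdownElementNodeParser(llm=llm)
nodes = element_parser.get_nodes_from_documents(documents)
base_nodes, objects = element_parser.get_nodes_and_objects(nodes)

index = VectorStoreIndex(nodes=base_nodes + objects)
query_engine = index.as_query_engine(similarity_top_k=5)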
Hey guys, I have a bit of a general question: I'm wondering what API framework to use, and does it actually matter?
I have my simple RAG application with a Streamlit UI, and I want to make the backend an API so it can handle multiple user requests (I want the tool to be used by several users after I deploy it locally). I was wondering whether to use Flask, FastAPI, or something else; I heard async is not supported in Flask, so I was looking at FastAPI since it does support it. What do you guys think?
2 comments
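If the FastAPI route is taken, a minimal sketch of what an async endpoint around an existing chat engine could look like (chat_engine is assumed to be built elsewhere; achat lets the server keep handling other requests while one response is being generated):
Plain Text
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ChatRequest(BaseModel):
    question: str

@app.post("/chat")
async def chat(req: ChatRequest):
    # Await the async chat call so the event loop stays free for other users.
    response = await chat_engine.achat(req.question)
    return {"answer": str(response)}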
Hey, I have a question concerning ChromaDB.

Plain Text
import chromadb
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore

# load from disk
db2 = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = db2.get_or_create_collection("quickstart")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
index = VectorStoreIndex.from_vector_store(
    vector_store,
    embed_model=embed_model,
)

When using this code, does it mean that a query engine I make based on this index will only use the documents in the collection called "quickstart" and will not use any document from another collection?

The way I created my vector database was to create a collection for each category of files (financial files, general, etc.). I chose this structure because I wanted a global database that contains the files from all the categories, plus separate databases for each category, but I want the index of the global database to use them all, hence my question.
8 comments
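It does only see "quickstart": a ChromaVectorStore wraps exactly one collection, so an index built from it never touches the others. A sketch of one way to query several category collections together without duplicating everything into a global one, using hypothetical per-collection index names (each built with from_vector_store as above):
Plain Text
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import QueryFusionRetriever

retrievers = [
    financial_index.as_retriever(similarity_top_k=4),
    general_index.as_retriever(similarity_top_k=4),
]
# num_queries=1 keeps the original question (no extra query generation).
fused = QueryFusionRetriever(retrievers, similarity_top_k=6, num_queries=1)
query_engine = RetrieverQueryEngine.from_args(fused)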
Does anybody know how to set the LLM parameters (temperature, top_p, and max output tokens) when using Ollama to load a model?
This is the code:
Plain Text
from llama_index.llms.ollama import Ollama

llm = Ollama(model="llama2", request_timeout=60.0)

response = llm.complete("")
print(response)
1 comment
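A sketch: temperature is a named field on the Ollama class, and other sampling options can go through additional_kwargs, which is forwarded to Ollama's request options (num_predict being Ollama's name for the maximum output tokens).
Plain Text
from llama_index.llms.ollama import Ollama

llm = Ollama(
    model="llama2",
    request_timeout=60.0,
    temperature=0.2,
    additional_kwargs={"top_p": 0.9, "num_predict": 256},
)
print(llm.complete("Say hi in one sentence."))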
Hello, I have a simple question! Is metadata excluded by default from the embeddings generation and from the LLM?
This is my code:
Plain Text
from llama_index.core import SimpleDirectoryReader

filename_fn = lambda filename: {"file_name": filename}

# automatically sets the metadata of each document according to filename_fn
documents = SimpleDirectoryReader(
    "./data", 
    file_metadata=filename_fn, 
    recursive=True
).load_data()

documents has this :
2 comments
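Metadata is included by default in both the embedding text and the LLM text; to leave it out, the keys to exclude are listed on each document. A quick sketch of how to set that and check what each consumer actually sees:
Plain Text
from llama_index.core.schema import MetadataMode

# Exclude the file_name key from both the embedding text and the LLM text.
for doc in documents:
    doc.excluded_embed_metadata_keys = ["file_name"]
    doc.excluded_llm_metadata_keys = ["file_name"]

# What the embedding model and the LLM actually see.
print(documents[0].get_content(metadata_mode=MetadataMode.EMBED))
print(documents[0].get_content(metadata_mode=MetadataMode.LLM))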
Does the SubQuestion query engine support citing sources? I want my engine to be able to cite the source of its information if possible, but I don't know if that's actually supported.
2 comments
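There is no built-in citation formatting on SubQuestionQueryEngine as far as I know, but the final response does keep the retrieved source nodes, whose metadata can serve as citations; a sketch (query_engine_tools assumed to be defined as in the docs):
Plain Text
from llama_index.core.query_engine import SubQuestionQueryEngine

engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=query_engine_tools)
response = engine.query("Compare revenue growth across the two reports")

# Each retrieved chunk keeps its origin metadata, usable as a citation.
for node_with_score in response.source_nodes:
    print(node_with_score.node.metadata.get("file_name"), node_with_score.score)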
ghxsted. · Sub question
Hello, I have a question about the SubQuestionQueryEngine: is it still supported by only some open-source LLMs? I remember seeing a table about this in the documentation a while back, so I'm wondering if it's still the case.
1 comment
Hello, I have a question about data privacy when using LlamaParse: since it works through API access, is it safe to use on data that is classified as "do not redistribute"?
1 comment
If anyone has worked on something similar, I could seriously use some insights!
7 comments