Find answers from the community

ghxsted.
Offline, last seen 3 months ago
Joined September 25, 2024
Hello, is there a way to use hybrid search with reranking in a chat_engine? I looked through nearly the whole documentation and found nothing about how to integrate these three components together.
4 comments
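A minimal sketch of one way these three pieces might fit together, assuming the index sits on a hybrid-capable vector store (e.g. Qdrant created with enable_hybrid=True) and that sentence-transformers is installed for the reranker: the hybrid and reranking settings go on the query engine, which is then wrapped in a condense-question chat engine.
Plain Text
from llama_index.core.chat_engine import CondenseQuestionChatEngine
from llama_index.core.postprocessor import SentenceTransformerRerank

# Cross-encoder reranker applied after retrieval.
reranker = SentenceTransformerRerank(
    model="cross-encoder/ms-marco-MiniLM-L-2-v2", top_n=3
)

# Hybrid (dense + sparse) retrieval with reranking on top.
query_engine = index.as_query_engine(
    vector_store_query_mode="hybrid",
    similarity_top_k=10,
    sparse_top_k=12,
    node_postprocessors=[reranker],
)

# The chat engine condenses the conversation into a standalone question
# and sends it to the hybrid + rerank query engine.
chat_engine = CondenseQuestionChatEngine.from_defaults(
    query_engine=query_engine, verbose=True
)
response = chat_engine.chat("What does the latest report say about revenue?")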
I have a question: when loading an index from a persistent ChromaDB vector store, is it loaded into the GPU's VRAM?
5 comments
Can someone please tell me how to add and remove documents from a collection inside an already-persistent ChromaDB? I'm pretty confused trying to do it based on the documentation.
13 comments
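A sketch of one way to do this through LlamaIndex itself, assuming the collection already exists on disk (the path, collection name, and doc id below are made up): re-attach to the collection, then use insert and delete_ref_doc on the index.
Plain Text
import chromadb
from llama_index.core import Document, VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore

# Re-attach to the existing persistent collection
# (embed_model is picked up from Settings here).
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("quickstart")
vector_store = ChromaVectorStore(chroma_collection=collection)
index = VectorStoreIndex.from_vector_store(vector_store)

# Add a document: it is chunked, embedded, and written into the collection.
index.insert(Document(text="New content to index.", doc_id="doc-42"))

# Remove every chunk that came from that source document.
index.delete_ref_doc("doc-42", delete_from_docstore=True)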
Hey, I'm using the RagDatasetGenerator to generate a set of questions, but I'm using a local LLM (llama3:70b-instruct) served with Ollama and I get these as my questions. Is there a way around it?
3 comments
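The screenshot of the generated questions didn't survive the export, but if the local model is adding preamble or numbering, one hedged option is to override the generation instruction via question_gen_query (class and parameter names as in llama_index.core; the prompt wording is just an example).
Plain Text
from llama_index.core.llama_dataset.generator import RagDatasetGenerator
from llama_index.llms.ollama import Ollama

llm = Ollama(model="llama3:70b-instruct", request_timeout=600.0)

dataset_generator = RagDatasetGenerator.from_documents(
    documents,
    llm=llm,
    num_questions_per_chunk=2,
    # Spell out the output format explicitly for a chatty local model.
    question_gen_query=(
        "Using only the provided context, write 2 questions the context can "
        "answer. Output only the questions, one per line, with no numbering "
        "and no extra commentary."
    ),
)
rag_dataset = dataset_generator.generate_questions_from_nodes()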
Is there a way to make the process of indexing documents parallel?
For example, here the chroma_db creation process is only using one GPU for me; I was wondering if there is an option to make it use both of my GPUs.

The process just loads documents that I have already parsed, indexes them with HuggingFace embeddings, and saves everything to a ChromaDB.
1 comment
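There is no built-in multi-GPU switch that I know of, but a rough sketch of one workaround is to shard the nodes, embed each shard with its own HuggingFaceEmbedding on a different device, and let the index reuse the precomputed embeddings (documents and the Chroma-backed storage_context are assumed from the existing script; the model name is only an example).
Plain Text
from concurrent.futures import ThreadPoolExecutor

from llama_index.core import VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.schema import MetadataMode
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

nodes = SentenceSplitter().get_nodes_from_documents(documents)

# One embedding model per GPU.
embed_models = [
    HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5", device="cuda:0"),
    HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5", device="cuda:1"),
]

def embed_shard(shard, model):
    texts = [n.get_content(metadata_mode=MetadataMode.EMBED) for n in shard]
    for node, emb in zip(shard, model.get_text_embedding_batch(texts)):
        node.embedding = emb
    return shard

half = len(nodes) // 2
with ThreadPoolExecutor(max_workers=2) as pool:
    shards = list(pool.map(embed_shard, [nodes[:half], nodes[half:]], embed_models))

# The nodes already carry embeddings, so building the index just writes them to Chroma.
index = VectorStoreIndex(
    nodes=[n for shard in shards for n in shard],
    storage_context=storage_context,
    embed_model=embed_models[0],  # used later at query time
)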
ghxsted. · Memory
Hello, I have a very specific question:
I have a RAG system (Streamlit for the frontend, a Flask API, and LlamaIndex, with the LLM served through the LlamaIndex Ollama integration). When I use the tool for some time (say 7 to 8 questions), my CUDA memory gets saturated pretty quickly. I was wondering what could be causing this: is it the context of my previous questions not being purged from memory after generating a response, or is there some other reason?
I'm using chat_engine = index.as_chat_engine(chat_mode="condense_question", verbose=True) and chat_engine.reset() to reset it, and I also tried torch.cuda.empty_cache() to empty the cache, but it doesn't work.

LLM : llama3 70b-instruct (4bit quant)
Embeddings : BAAI/bge-base-en-v1.5 (using Huggingface embeddings integration with llama index)
12 comments
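Hard to say without profiling, but since condense_question keeps feeding the accumulated chat history back into the prompt, one thing worth trying is capping the chat memory so the context (and the KV cache on the Ollama side) stops growing with every turn; a sketch with an illustrative token limit:
Plain Text
from llama_index.core.memory import ChatMemoryBuffer

# Bound how much history gets condensed into each new question.
memory = ChatMemoryBuffer.from_defaults(token_limit=1500)

chat_engine = index.as_chat_engine(
    chat_mode="condense_question",
    memory=memory,
    verbose=True,
)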
ghxsted. · V0.10
Is the Pandas Excel Loader no longer supported? And is there an ongoing update to LlamaHub?
19 comments
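Assuming this is about the post-v0.10 package split, the Excel reader now lives in the llama-index-readers-file package rather than being pulled from LlamaHub with download_loader; a sketch with a made-up file path:
Plain Text
# pip install llama-index-readers-file
from pathlib import Path
from llama_index.readers.file import PandasExcelReader

reader = PandasExcelReader()
documents = reader.load_data(Path("report.xlsx"))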
Hello, I'm trying to load my index after saving it to disk, but I can't and I get this error.

I'm using an ingestion pipeline and ChromaDB to create the vector store; I can provide the rest of the code if needed.
5 comments
Heyo, I have a question: is it possible to create a ChromaDB vector store when using LlamaParse?
I'm using LlamaParse to read PDFs, and from what I've seen in the documentation notebooks, they usually create "nodes" from the documents parsed by LlamaParse.
What I did before was read files with SimpleDirectoryReader and build a VectorStoreIndex from documents, having created the vector store with a Chroma collection, just basic stuff.
6 comments
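It should be possible; a sketch of one way to do it, assuming LLAMA_CLOUD_API_KEY is set and with a made-up file path. LlamaParse returns a list of Documents, so the usual Chroma StorageContext plus from_documents flow still applies (building nodes first, as the notebooks do, is optional).
Plain Text
import chromadb
from llama_parse import LlamaParse
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore

# Parse the PDFs into LlamaIndex Documents.
documents = LlamaParse(result_type="markdown").load_data("report.pdf")

# Same Chroma setup as with SimpleDirectoryReader.
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("llamaparse_docs")
storage_context = StorageContext.from_defaults(
    vector_store=ChromaVectorStore(chroma_collection=collection)
)

index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)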
ghxsted. · @contributor
5 comments
Hello, just a quick question: does anyone know how to use an already-downloaded model with the HuggingFaceLLM class? For example, is there an option to pass the directory of the already-downloaded model instead of having to download it into the cache?
3 comments
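A sketch with a hypothetical local path: model_name and tokenizer_name are handed to transformers' from_pretrained, which accepts a local directory as well as a hub id, so pointing both at the downloaded folder avoids the cache download.
Plain Text
from llama_index.llms.huggingface import HuggingFaceLLM

llm = HuggingFaceLLM(
    model_name="/models/mistral-7b-instruct-v0.1",
    tokenizer_name="/models/mistral-7b-instruct-v0.1",
    context_window=4096,
    max_new_tokens=256,
    device_map="auto",
)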
Hello, I have a general kind of question: when using local models, what would you say is the best free inference engine to use? LlamaIndex supports vLLM, Ollama, llama.cpp, Hugging Face transformers, and a lot of other integrations.
This is my setup:
  • 2 Nvidia Quadro P4000 GPUs with 8 GB of VRAM each (16 GB of VRAM in total)
  • Intel Xeon 3.70 GHz
  • 32 GB of RAM
The model I'm trying to use is mistral 7b-instruct-v0.1.
10 comments
Hello, I have a question concerning the fine-tuning of embedding models using this notebook from the docs:
https://docs.llamaindex.ai/en/stable/examples/finetuning/embeddings/finetune_embedding/
Is it possible to fine-tune any embedding model with it? In the notebook 'BAAI/bge-small-en' is fine-tuned, but I'd like to fine-tune the model "Alibaba-NLP/gte-large-en-v1.5" from Hugging Face. Will any adaptation or changes be necessary, or is the process the same for both models?
1 comment
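The flow in the notebook is not tied to the bge model, so swapping the model_id is most of it; a sketch reusing the train/val datasets built in the notebook. One caveat worth checking: Alibaba-NLP/gte-large-en-v1.5 ships custom modelling code (trust_remote_code), and whether the fine-tune engine passes that through may depend on the installed versions.
Plain Text
from llama_index.finetuning import SentenceTransformersFinetuneEngine

finetune_engine = SentenceTransformersFinetuneEngine(
    train_dataset,
    model_id="Alibaba-NLP/gte-large-en-v1.5",
    model_output_path="gte_large_finetuned",
    val_dataset=val_dataset,
)
finetune_engine.finetune()
embed_model = finetune_engine.get_finetuned_model()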
Am I doing something wrong when it comes to indexing?
I'm reading my markdown files with SimpleDirectoryReader and trying to create a vector DB with Qdrant that supports hybrid search, but it's taking much longer than I expected to index all the documents. With ChromaDB it took nearly 2 hours for all the documents (understandable, I have a lot of documents), but with Qdrant it's been nearly 7 hours now :)))))

I launched the Qdrant client using Docker as described in the documentation, and this is my code:
Plain Text
import qdrant_client

from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.vector_stores.qdrant import QdrantVectorStore

# Sync and async clients pointing at the dockerised Qdrant instance.
client = qdrant_client.QdrantClient(
    host="localhost",
    port=6333,
    timeout=3000.0,
)
aclient = qdrant_client.AsyncQdrantClient(
    host="localhost",
    port=6333,
    timeout=3000.0,
)

# Hybrid mode also computes sparse embeddings for every chunk at index time.
vector_store = QdrantVectorStore(
    "mydocuments",
    client=client,
    aclient=aclient,
    enable_hybrid=True,
    batch_size=20,
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

index = VectorStoreIndex.from_documents(
    Vendors_docs,
    storage_context=storage_context,
)
4 comments
Hello, I have this issue when I try using the MarkdownElementNodeParser.
I'm using FastEmbed embeddings and Ollama,
and loading the files with SimpleDirectoryReader.
4 comments
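The traceback didn't make it into this archive, but one common stumbling block is that MarkdownElementNodeParser calls an LLM to summarise table elements, so the Ollama LLM has to be passed to it explicitly (or set globally) or it will try to fall back to a default it can't reach; a sketch under that assumption:
Plain Text
from llama_index.core import Settings, SimpleDirectoryReader
from llama_index.core.node_parser import MarkdownElementNodeParser
from llama_index.embeddings.fastembed import FastEmbedEmbedding
from llama_index.llms.ollama import Ollama

Settings.embed_model = FastEmbedEmbedding(model_name="BAAI/bge-small-en-v1.5")
Settings.llm = Ollama(model="llama3", request_timeout=120.0)

documents = SimpleDirectoryReader("./data").load_data()

# The element parser uses the LLM for table summaries.
node_parser = MarkdownElementNodeParser(llm=Settings.llm, num_workers=4)
nodes = node_parser.get_nodes_from_documents(documents)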
What would be the best node parser to use when I have markdown files with tables inside: MarkdownElementNodeParser or MarkdownNodeParser?
Can anyone please explain the difference between these two parsers?

Difference in terms of:
Indexing strategies
Influence on the retrieval part of the RAG pipeline in terms of performance

An example of a document I have would be this:
2 comments
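For contrast, a sketch of the two paths (llm and documents assumed to exist): MarkdownNodeParser only splits on headers and leaves tables as plain text inside the chunks, while MarkdownElementNodeParser pulls tables out as separate elements with LLM-generated summaries, which are then indexed alongside the base nodes.
Plain Text
from llama_index.core import VectorStoreIndex
from llama_index.core.node_parser import MarkdownNodeParser, MarkdownElementNodeParser

# Header-based splitting only; tables stay embedded in the text chunks.
flat_nodes = MarkdownNodeParser().get_nodes_from_documents(documents)

# Tables become separate elements with LLM-written summaries.
element_parser = MarkdownElementNodeParser(llm=llm)
nodes = element_parser.get_nodes_from_documents(documents)
base_nodes, objects = element_parser.get_nodes_and_objects(nodes)

index = VectorStoreIndex(nodes=base_nodes + objects)
query_engine = index.as_query_engine(similarity_top_k=5)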
Hey guys, I have a bit of a general question: I'm wondering what API framework to use, and does it actually matter?
I have my simple RAG application with a Streamlit UI, and I want to make the backend an API so it can handle multiple user requests (I want the tool to be used by several users after I deploy it locally). I was wondering whether to use Flask, FastAPI, or something else; I heard async is not supported in Flask, so I was looking at FastAPI since it does support it. What do you guys think?
2 comments
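If the FastAPI route is taken, a minimal sketch of what an async endpoint around an existing chat engine could look like (chat_engine is assumed to be built elsewhere; achat lets the server keep handling other requests while one response is being generated):
Plain Text
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ChatRequest(BaseModel):
    question: str

@app.post("/chat")
async def chat(req: ChatRequest):
    # Await the async chat call so the event loop stays free for other users.
    response = await chat_engine.achat(req.question)
    return {"answer": str(response)}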
Hey, I have a question concerning ChromaDB.

Plain Text
import chromadb
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore

# load from disk
db2 = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = db2.get_or_create_collection("quickstart")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
index = VectorStoreIndex.from_vector_store(
    vector_store,
    embed_model=embed_model,
)

When using this code, does it mean that a query engine I make based on this index will only use the documents in the collection called "quickstart" and will not use any document from another collection?

The way I created my vector database was to create a collection for each category of files (financial files, general, etc.). I chose this structure because I wanted a global database that contains the files from all the categories, plus separate databases for each category, but I want the index of the global database to use them all, hence my question.
8 comments
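It does only see "quickstart": a ChromaVectorStore wraps exactly one collection, so an index built from it never touches the others. A sketch of one way to query several category collections together without duplicating everything into a global one, using hypothetical per-collection index names (each built with from_vector_store as above):
Plain Text
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import QueryFusionRetriever

retrievers = [
    financial_index.as_retriever(similarity_top_k=4),
    general_index.as_retriever(similarity_top_k=4),
]
# num_queries=1 keeps the original question (no extra query generation).
fused = QueryFusionRetriever(retrievers, similarity_top_k=6, num_queries=1)
query_engine = RetrieverQueryEngine.from_args(fused)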
Does anybody know how to set the LLM parameters (temperature, top_p, and max output tokens) when using Ollama to load a model?
This is the code:
Plain Text
from llama_index.llms.ollama import Ollama

llm = Ollama(model="llama2", request_timeout=60.0)

response = llm.complete("")
print(response)
1 comment
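A sketch: temperature is a named field on the Ollama class, and other sampling options can go through additional_kwargs, which is forwarded to Ollama's request options (num_predict being Ollama's name for the maximum output tokens).
Plain Text
from llama_index.llms.ollama import Ollama

llm = Ollama(
    model="llama2",
    request_timeout=60.0,
    temperature=0.2,
    additional_kwargs={"top_p": 0.9, "num_predict": 256},
)
print(llm.complete("Say hi in one sentence."))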
Hello, I have a simple question! Is metadata excluded by default from the embeddings generation and from the LLM?
This is my code:
Plain Text
from llama_index.core import SimpleDirectoryReader

filename_fn = lambda filename: {"file_name": filename}

# automatically sets the metadata of each document according to filename_fn
documents = SimpleDirectoryReader(
    "./data", 
    file_metadata=filename_fn, 
    recursive=True
).load_data()

documents has this :
2 comments
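Metadata is included by default in both the embedding text and the LLM text; to leave it out, the keys to exclude are listed on each document. A quick sketch of how to set that and check what each consumer actually sees:
Plain Text
from llama_index.core.schema import MetadataMode

# Exclude the file_name key from both the embedding text and the LLM text.
for doc in documents:
    doc.excluded_embed_metadata_keys = ["file_name"]
    doc.excluded_llm_metadata_keys = ["file_name"]

# What the embedding model and the LLM actually see.
print(documents[0].get_content(metadata_mode=MetadataMode.EMBED))
print(documents[0].get_content(metadata_mode=MetadataMode.LLM))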
Does the SubQuestion query engine support citing sources? I want my engine to be able to cite the source of its information if possible, but I don't know if that's actually supported.
2 comments
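There is no built-in citation formatting on SubQuestionQueryEngine as far as I know, but the final response does keep the retrieved source nodes, whose metadata can serve as citations; a sketch (query_engine_tools assumed to be defined as in the docs):
Plain Text
from llama_index.core.query_engine import SubQuestionQueryEngine

engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=query_engine_tools)
response = engine.query("Compare revenue growth across the two reports")

# Each retrieved chunk keeps its origin metadata, usable as a citation.
for node_with_score in response.source_nodes:
    print(node_with_score.node.metadata.get("file_name"), node_with_score.score)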
ghxsted. · Sub question
Hello, I have a question about the SubQuestionQueryEngine: is it still supported by only some open-source LLMs? I remember seeing a table about this in the documentation a while back, so I'm wondering if it's still the case.
1 comment
Hello, I have a question about data privacy when using LlamaParse: since it works through API access, is it safe to use on data that is classified as "do not redistribute"?
1 comment
If anyone has worked on something similar, I could seriously use some insights!
7 comments