Hey!
Can you share your error and the operation you were trying when you got the issue, please?
@WhiteFang_Jr I'm trying to create a RAG application for PDF question answering for my graduation project. I store the chunks and the document info in MongoDB, I'm using FAISS as a vector store, and I'm using Llama-2-7b. I use this function to get my response:

@app.route('/query', methods=['GET'])
def query():
    query_text = request.args.get('query')
    if not query_text:
        return jsonify({'error': 'Query parameter is missing'}), 400
    try:
        logging.info(f"Processing query: {query_text}")
        response = query_engine.query(query_text)
        logging.info(f"Query response: {response}")
        return jsonify({'response': response})
    except Exception as e:
        logging.error(f"Error processing query: {e}", exc_info=True)
        return jsonify({'error': str(e)}), 500

and I have this error
This is because Flask does not allow returning a Pydantic response object.
What data do you want to return?
You can create a dict containing all the required items in it.
For example:

{
    'response': response.response,
    'node_info': [...],  # add all the node info here
}

and then return this dict as the final response.
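A sketch of one way to fill that in, assuming a standard LlamaIndex Response object (its source_nodes attribute carries the retrieved nodes):

response = query_engine.query(query_text)
response_data = {
    'response': response.response,
    # each source node carries the retrieval score and the chunk metadata
    'node_info': [
        {'score': n.score, 'metadata': n.node.metadata}
        for n in response.source_nodes
    ],
}
return jsonify(response_data)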
@WhiteFang_Jr I have another question: why, when I run my code, do I wait a long time with this message?

transformers\models\llama\modeling_llama.py:670: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
I think this is a warning related to PyTorch.
I guess this is not stopping your code, right?
I'm trying this code to resolve my problem:
@app.route('/query', methods=['GET'])
def query():
    query_text = request.args.get('query')
    if not query_text:
        return jsonify({'error': 'Query parameter is missing'}), 400
    try:
        logging.info(f"Processing query: {query_text}")
        response = query_engine.query(query_text)
        logging.info(f"Query response: {response}")
        response_data = {
            'response': response.response,
        }
        return jsonify(response_data)
    except Exception as e:
        logging.error(f"Error processing query: {e}", exc_info=True)
        return jsonify({'error': str(e)}), 500
The code is running, but it takes too much time.
Are you running on GPU or CPU?
CPU, I think; I put device_map="auto".
@WhiteFang_Jr 1 hour has passed and the code hasn't finished executing yet.
CPU will take more time, as LLMs are not efficient on CPU and are not built for it.
If you want to test a local LLM, try it with Ollama. I think it's optimized, so it might be faster than the current one you have.
Can I learn more about how to replace my current LLM with Ollama, please?
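For reference, a minimal sketch of that swap, assuming the llama-index-llms-ollama integration is installed and an Ollama server is running locally with the llama2 model pulled:

from llama_index.core import Settings
from llama_index.llms.ollama import Ollama

# point LlamaIndex at the local Ollama server instead of the
# in-process HuggingFace Llama-2 model
Settings.llm = Ollama(model="llama2", request_timeout=120.0)

# any query engine built from the index now uses Ollama for generation
query_engine = index.as_query_engine()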
Okay, thanks a lot for your time 😄
@WhiteFang_Jr it finally gave me the answer after 2 hours 😅 but the answer was in English, different from the document language.
Finally!!
One more thing: open-source LLMs may not be good at providing responses in languages other than English.
Also, do use an embedding model that works for your language if it is different from English, as in the sketch below.
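A minimal sketch, using sentence-transformers/paraphrase-multilingual-mpnet-base-v2 as one multilingual alternative (an assumption, not a requirement) to the all-mpnet-base-v2 model set earlier:

from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# a multilingual model, so non-English queries and chunks embed into
# a shared space; all-mpnet-base-v2 is trained on English text
Settings.embed_model = HuggingFaceEmbedding(
    model_name="sentence-transformers/paraphrase-multilingual-mpnet-base-v2"
)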
I will try Ollama first to see if I get a faster response.
@WhiteFang_Jr when I tried to verify the response, the info wasn't from the documents in MongoDB; it looks like it was just generated by the LLM. How can I identify the problem?
You'll have to check what your embedding model is returning as nodes for your query.
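One way to check (a sketch, assuming the index built earlier; retrieve returns the top-scoring nodes without calling the LLM at all):

# inspect what the retriever pulls back for the query,
# independently of the LLM's answer
retriever = index.as_retriever(similarity_top_k=3)
for node_with_score in retriever.retrieve(query_text):
    print(node_with_score.score, node_with_score.node.get_content()[:200])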
@WhiteFang_Jr can you verify this code logic please?
You are creating the embedding model on your own. Is there any specific reason not to use LlamaIndex directly to create the embedding model?
No, I thought this was the way to use the LlamaIndex embedding model. I have already declared this: Settings.embed_model = HuggingFaceEmbedding(model_name="sentence-transformers/all-mpnet-base-v2")
Okay, I got confused with this piece of code:
def embed_text(text):
    inputs = tokenizer(text, return_tensors='pt', padding=True, truncation=True, max_length=512)
    outputs = model(**inputs)
    embeddings = outputs.last_hidden_state.mean(1).detach().numpy()
    faiss.normalize_L2(embeddings)
    return embeddings
Do I remove it, @WhiteFang_Jr?
Yeah, you can. Also, can you tell me what you want to achieve?
Then maybe I'll be able to suggest some changes based on that. For instance, here in the following code:
try:
    with pdfplumber.open(file_path) as pdf:
        for page_number, page in enumerate(pdf.pages, start=1):
            text = page.extract_text()
            if text:
                embeddings = embed_text(text)
                if embeddings is not None:
                    embeddings = embeddings.reshape(1, -1)  # reshape for FAISS
                    vector_id = str(faiss_index.ntotal)
                    faiss_index.add(embeddings)
                    pdf_collection.insert_one({
                        'filename': filename,
                        'text': text,
                        'page_number': page_number,
                        'vector_id': vector_id
                    })
                else:
                    logging.error(f"No embeddings generated for page {page_number} of {filename}")
            else:
                logging.error(f"No text found on page {page_number} of {filename}")
return jsonify({'message': 'PDF uploaded and processed', 'filename': filename})
You can simply pass this text to the index, e.g. as sketched below.
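A minimal sketch of that, assuming llama_index.core imports: the page text goes into Document objects with the filename and page number as metadata, and LlamaIndex handles chunking and embedding itself:

from llama_index.core import Document, VectorStoreIndex

documents = []
with pdfplumber.open(file_path) as pdf:
    for page_number, page in enumerate(pdf.pages, start=1):
        text = page.extract_text()
        if text:
            # keep the provenance info as metadata on each document
            documents.append(Document(
                text=text,
                metadata={'filename': filename, 'page_number': page_number},
            ))

index = VectorStoreIndex.from_documents(documents)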
@WhiteFang_Jr I want to save the chunks and the document information (like the document name and the number of the page where the chunk is) and do the similarity search with FAISS, so that when I query, it answers from the documents.
The documents are in MongoDB.
@WhiteFang_Jr it still just gives an answer that doesn't exist in the documents. This is my query code:

@app.route('/query', methods=['GET'])
def query():
    query_text = request.args.get('query')
    if not query_text:
        return jsonify({'error': 'Query parameter is missing'}), 400
    try:
        logging.info(f"Processing query: {query_text}")
        response = query_engine.query(query_text)
        logging.info(f"Query response: {response}")
        response_data = {
            'response': response.response,
        }
        return jsonify(response_data)
    except Exception as e:
        logging.error(f"Error processing query: {e}", exc_info=True)
        return jsonify({'error': str(e)}), 500
@WhiteFang_Jr What do you think?
You'll have to verify whether it is able to find the correct data points for your query. Multiple items can be at fault here:
- Your embedding model may not be suited to your document language (I think it's not in English)
- The LLM is not able to answer correctly (could be because it is not familiar with your choice of document language, or not capable enough)
If this is your college project or something, I would recommend using Qdrant + FastAPI.
They are much better, and lots of examples exist for both; see the sketch below.
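A minimal sketch of wiring LlamaIndex to Qdrant, assuming the llama-index-vector-stores-qdrant integration and a local, file-backed Qdrant instance (no server needed):

import qdrant_client
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.vector_stores.qdrant import QdrantVectorStore

# embedded, on-disk Qdrant; this replaces FAISS as the vector store
client = qdrant_client.QdrantClient(path="./qdrant_data")
vector_store = QdrantVectorStore(client=client, collection_name="pdf_chunks")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)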
@WhiteFang_Jr If I understood correctly, I should replace FAISS with Qdrant and keep MongoDB for storing chunks and document information.
Yes. Also, one more clarification if you could provide it:
- How did you store documents in Mongo?
- How are you accessing them?
The objective is to store the document information, the chunks, and their embeddings in MongoDB and do the similarity search with FAISS.

# MongoDB setup
mongo_conn_url = os.getenv("MONGO_CONN_URL", "mongodb://localhost:27017/")
client = MongoClient(mongo_conn_url)
db = client['pdf_query_db']
pdf_collection = db['pdfs']

# Setting up the document store and index store
docstore = MongoDocumentStore.from_uri(mongo_conn_url)
index_store = MongoIndexStore.from_uri(mongo_conn_url)
storage_context = StorageContext.from_defaults(docstore=docstore, index_store=index_store, vector_store=vector_store)

# Initialize MongoDB reader
reader = SimpleMongoReader(uri="mongodb://localhost:27017")
documents = reader.load_data(db.name, pdf_collection.name, field_names=["text"])
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context, embed_model=Settings.embed_model)
print(pdf_collection.name)
index.storage_context.persist(persist_dir="./storage")

# Load or create indices
index.set_index_id("my_index")
index = load_index_from_storage(storage_context, index_id="my_index")

@WhiteFang_Jr this is the logic.
I don't think you need to persist; you can always read from Mongo and create the index.
Or, if you want to persist, add an if/else condition: if the local persist directory exists, there's no need to load anything from Mongo; fetch directly from the local persist, as in the sketch below.
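A minimal sketch of that if/else, reusing the reader and index id from your code above:

import os
from llama_index.core import StorageContext, VectorStoreIndex, load_index_from_storage

PERSIST_DIR = "./storage"
if os.path.exists(PERSIST_DIR):
    # local persist exists: skip Mongo and re-embedding entirely
    storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
    index = load_index_from_storage(storage_context, index_id="my_index")
else:
    # first run: fetch from Mongo, embed, and persist for next time
    documents = reader.load_data(db.name, pdf_collection.name, field_names=["text"])
    index = VectorStoreIndex.from_documents(documents)
    index.set_index_id("my_index")
    index.storage_context.persist(persist_dir=PERSIST_DIR)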
I'm confused about the similarity search, because I was convinced that persisting is for saving the vectors and using them afterwards to do the similarity search. I want to know if my code is correct and how I can save the embeddings along with the PDF information in MongoDB @WhiteFang_Jr
Yep, persisting will save the embeddings of the documents locally. But as per your code, every time you run it, it will:
- Fetch documents with the Mongo reader
- Create embeddings
- Create the index
- Persist it to local storage
This is redoing the same work again and again!
If you want to store your embeddings in Mongo, use this:
https://docs.llamaindex.ai/en/stable/examples/vector_stores/MongoDBAtlasVectorSearch/?h=mongo
Once you store them, just create the vector store instance, pass it to VectorStoreIndex, and you are good to query your docs.
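Roughly, following that example (a sketch only: the database, collection, and index names here are placeholders, parameter names may differ across llama-index versions, and it requires an Atlas cluster with a vector search index configured):

from pymongo import MongoClient
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.vector_stores.mongodb import MongoDBAtlasVectorSearch

client = MongoClient(mongo_conn_url)
vector_store = MongoDBAtlasVectorSearch(
    client,
    db_name="pdf_query_db",        # placeholder names, match your setup
    collection_name="pdf_vectors",
    index_name="vector_index",
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)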
@WhiteFang_Jr yes, I understood you, but I don't want to use the cloud, and I heard that MongoDB Atlas uses the cloud. That's why I use FAISS for the vector stuff and MongoDB to save the information.