Updated last month

Preprocessing

Hi, I am using a vector store index to build indexes and Azure OpenAI for Q&A. When I ask "What is auditor reports?", different source nodes come back, but when I change just one word, as in "What is auditors reports?", I get the correct answer. Changing a single word gives me different nodes. How can I solve this issue?
14 comments
Use preprocessing techniques on the input query, such as stemming and lemmatization.
But if I do that on the query, how will llama_index match the processed query against the nodes?
Will I also have to process the nodes when building the index?
I feel like this is pretty common for embeddings. With the query being so short, any change at all can drastically change the embeddings

Maybe try increasing the top k by one or two? You could also look into writing a custom retriever that uses both a keyword index and a vector index.

https://gpt-index.readthedocs.io/en/latest/examples/query_engine/CustomRetrievers.html
A mix of keyword and vector index (hybrid) is definitely a good idea.
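To make the hybrid idea concrete, here is a minimal pure-Python sketch. It does not use the LlamaIndex API: `embed`, `vector_retrieve`, `keyword_retrieve`, `hybrid_retrieve`, and the `STOPWORDS` set are hypothetical names, and the bag-of-words "embedding" is only a stand-in for a real embedding model. It shows how combining the two result sets with "OR" (union) or "AND" (intersection) changes which nodes come back.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list, only for this illustration.
STOPWORDS = {"what", "is", "the", "a", "an", "of"}

def tokens(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def embed(text):
    """Toy embedding: bag-of-words counts (a real setup would call a model)."""
    return Counter(tokens(text))

def cosine(a, b):
    """Cosine similarity between two Counter-based vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def vector_retrieve(query, nodes, top_k=2):
    """Return the ids of the top_k nodes most similar to the query."""
    q = embed(query)
    ranked = sorted(nodes, key=lambda nid: cosine(q, embed(nodes[nid])), reverse=True)
    return ranked[:top_k]

def keyword_retrieve(query, nodes):
    """Return the ids of nodes that share at least one non-stopword with the query."""
    terms = set(tokens(query)) - STOPWORDS
    return [nid for nid, text in nodes.items() if terms & set(tokens(text))]

def hybrid_retrieve(query, nodes, top_k=2, mode="OR"):
    """Combine both result sets: "AND" intersects them, anything else unions them."""
    vec = set(vector_retrieve(query, nodes, top_k))
    kw = set(keyword_retrieve(query, nodes))
    combined = vec & kw if mode == "AND" else vec | kw
    return sorted(combined)
```

With "OR", a node that the vector search ranks too low for the chosen top k can still be returned if it matches a keyword; with "AND", only nodes that both retrievers agree on survive.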
@Logan M Thank you!
@theOldPhilosopher the preprocessing can be done before sending the query to the index. It will work well for a keyword index and may not work as well for a vector index. Hence, a hybrid approach works much better here.

https://exchange.scale.com/public/blogs/preprocessing-techniques-in-nlp-a-guide
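As a rough illustration of preprocessing for keyword matching, the sketch below applies the same normalization to the query and to the node text, so "auditor" and "auditors" end up as the same term. `naive_stem`, `normalize`, and `keyword_overlap` are hypothetical helpers, and the plural stripping is a deliberately crude stand-in for a real stemmer or lemmatizer.

```python
import re

def naive_stem(token: str) -> str:
    """Very rough plural stripping; a stand-in for a real stemmer or lemmatizer."""
    if len(token) > 4 and token.endswith("ies"):
        return token[:-3] + "y"
    if len(token) > 4 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token

def normalize(text: str) -> list[str]:
    """Lowercase, split on non-alphanumerics, and stem each token."""
    return [naive_stem(t) for t in re.findall(r"[a-z0-9]+", text.lower())]

def keyword_overlap(query: str, node_text: str) -> int:
    """Count distinct normalized query terms that also occur in the node text."""
    return len(set(normalize(query)) & set(normalize(node_text)))
```

Because both sides go through `normalize`, the two phrasings of the question produce the same keyword set. This only helps the keyword side; the query sent to the vector index can stay as the user typed it.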
I want to confirm: when llama_index uses the query to fetch nodes, can it use semantic search?
@Logan M @ravi-decover Hi guys, can I use qdrant for semantic search?
Yes, when using a vector index it is (semantic search is basically just embeddings)
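A minimal sketch of what "semantic search is basically just embeddings" means in practice: each node and the query are vectors, and retrieval returns the `k` nodes with the highest cosine similarity. The function names here are hypothetical, and real vectors would come from an embedding model rather than being written by hand.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, node_vecs, k):
    """Return the ids of the k nodes whose vectors are closest to the query."""
    scored = sorted(
        ((cosine(query_vec, vec), nid) for nid, vec in node_vecs.items()),
        reverse=True,
    )
    return [nid for _, nid in scored[:k]]
```

Since only the first `k` results are kept, a small shift in the query vector can push the relevant node just outside the cut, which is why raising the top k by one or two can bring it back.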
For sure! LlamaIndex works well with qdrant
Hi @Logan M, I want to understand properly how llama_index works. What's the difference between langchain and llama_index, and why should I use llama_index instead of langchain? I am not that clear on this. Semantic search is available in qdrant too. So, if possible, can you help me clear up this doubt? It would be a great help.
Thanks
Qdrant (and also langchain) don't allow for more complex query structures. With llama index, you can use query engines on top of your index like sub question query engine and router query engine.

Furthermore, these query engines can all be used in agents (we have some more news on that later today).

Compared to langchain, I'd say llamaindex is more customizable. Retrievers, node postprocessors, response synthesizers all come together to form a query engine, and you can customize each piece.
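To picture how those pieces fit together, here is a conceptual sketch with plain Python callables. It is not the LlamaIndex API: `make_query_engine` and the three type aliases are hypothetical names chosen only to show that each stage can be swapped out independently.

```python
from typing import Callable, List

# Hypothetical stage signatures, for illustration only.
Retriever = Callable[[str], List[str]]
Postprocessor = Callable[[List[str]], List[str]]
Synthesizer = Callable[[str, List[str]], str]

def make_query_engine(
    retriever: Retriever,
    postprocessors: List[Postprocessor],
    synthesizer: Synthesizer,
) -> Callable[[str], str]:
    """Chain the stages: retrieve nodes, filter or rerank them, then build a response."""
    def query(question: str) -> str:
        nodes = retriever(question)
        for postprocess in postprocessors:
            nodes = postprocess(nodes)
        return synthesizer(question, nodes)
    return query
```

Swapping in a hybrid retriever, or adding a postprocessor that drops low-scoring nodes, changes one argument and leaves the rest of the pipeline untouched.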
Okay, so in essence llama_index is more customizable than langchain. And especially when querying, we can do a lot more with llama_index compared to the others.