Rendy Febry
Joined September 25, 2024
Has anyone seen the ingestion order change retrieval results? If so, what is the reason?

For example, say I have five years of corpus data. If I ingest it in the order 2015, 2014, 2013, 2012, 2011 and then run a vector similarity search, the results differ from those I get when I ingest it in the order 2011, 2012, 2013, 2014, 2015.

Is this related to how ANN (approximate nearest neighbor) algorithms work?
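
One plausible factor: graph-based ANN indexes such as HNSW are built incrementally, so the graph (and its entry point) depends on the order in which vectors are inserted, and an approximate search can return different neighbors for the same query. Exact (brute-force) search does not have this problem, apart from how ties are broken. The sketch below is a heavily simplified toy, not HNSW itself and not any vector store's implementation: each new point links to its `m` nearest earlier points, and search walks the graph greedily from the first inserted point. All function names are hypothetical.

```python
import math
import random


def exact_top_k(points, query, k):
    """Brute-force k-NN; sorting on (distance, point) makes ties order-independent."""
    return sorted(points, key=lambda p: (math.dist(p, query), p))[:k]


def build_toy_graph(points, m=2):
    """Toy incremental proximity graph: each new point links to its m nearest earlier points."""
    graph = {}
    for i, p in enumerate(points):
        nearest = sorted(points[:i], key=lambda q: math.dist(p, q))[:m]
        graph[p] = set(nearest)
        for q in nearest:
            graph[q].add(p)  # keep edges undirected
    return graph


def greedy_search(graph, entry, query):
    """Move to the neighbor closest to the query until no neighbor improves (a local minimum)."""
    current = entry
    while True:
        best = min(graph[current], key=lambda q: math.dist(q, query), default=current)
        if math.dist(best, query) >= math.dist(current, query):
            return current
        current = best


if __name__ == "__main__":
    rng = random.Random(0)
    corpus = [(rng.random(), rng.random()) for _ in range(200)]
    query = (0.5, 0.5)
    for name, order in [("oldest first", corpus), ("newest first", corpus[::-1])]:
        graph = build_toy_graph(order)
        approx = greedy_search(graph, entry=order[0], query=query)
        print(name, "exact:", exact_top_k(order, query, 1)[0], "approx:", approx)
```

With a graph this sparse, the greedy walk can stop at a local minimum, and which one it reaches depends on the insertion order. Real HNSW mitigates this with multiple layers and a wider candidate list (`ef_search`), which reduces the effect but does not guarantee identical results across insertion orders.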

Hi all,

Is there a QueryEngine composition that can run a "prerequisite" data query first?

For example, I have an SQL query engine over a table whose data is grouped by category. Before querying that table, I want to get the list of categories that are potentially relevant to the user's query, so I ended up creating a separate VectorStoreIndex that stores the list of categories and their descriptions.
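
Independent of any particular QueryEngine composition, the idea can be written as a plain two-step pipeline: a prerequisite retrieval picks the relevant categories, then the SQL query is restricted to them. The sketch below assumes a hypothetical `sales(category, amount)` table and uses simple word overlap in place of the category VectorStoreIndex; `relevant_categories` and `query_sales` are illustrative names, not LlamaIndex APIs.

```python
import re
import sqlite3

# Hypothetical category catalog (name -> description); in the post this lives in a VectorStoreIndex.
CATEGORIES = {
    "electronics": "phones laptops chargers and other gadgets",
    "groceries": "fresh food snacks drinks and household staples",
    "books": "novels textbooks and magazines",
}


def _words(text):
    return set(re.findall(r"[a-z]+", text.lower()))


def relevant_categories(question, top_k=1):
    """Step 1 (prerequisite query): rank categories by word overlap with the question.

    Word overlap stands in for a real vector-similarity retriever.
    """
    q = _words(question)

    def overlap(cat):
        return len(q & _words(cat + " " + CATEGORIES[cat]))

    return sorted(CATEGORIES, key=overlap, reverse=True)[:top_k]


def query_sales(conn, question, top_k=1):
    """Step 2: run the SQL query restricted to the categories found in step 1."""
    cats = relevant_categories(question, top_k)
    placeholders = ", ".join("?" for _ in cats)
    sql = (
        "SELECT category, SUM(amount) FROM sales "
        f"WHERE category IN ({placeholders}) GROUP BY category"
    )
    return conn.execute(sql, cats).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (category TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("electronics", 100.0), ("electronics", 50.0), ("books", 20.0)],
    )
    print(query_sales(conn, "How much did we sell in laptops and phones?"))
    # prints [('electronics', 150.0)]
```

In a real setup, step 1 would call the retriever of the category index, and step 2 would pass the selected categories to the SQL query engine (for example, by adding them to the natural-language question).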
Hi Team

What is the fundamental difference between SQLAutoVectorQueryEngine and SQLJoinQueryEngine? Their descriptions are quite vague and similar:

"""SQL + Vector Index Auto Retriever Query Engine.

This query engine can query both a SQL database
as well as a vector database. It will first decide
whether it needs to query the SQL database or vector store.
If it decides to query the SQL database, it will also decide
whether to augment information with retrieved results from the vector store.
We use the VectorIndexAutoRetriever to retrieve results.


"""SQL Join Query Engine.

This query engine can "Join" a SQL database results
with another query engine.
It can decide it needs to query the SQL database or the other query engine.
If it decides to query the SQL database, it will first query the SQL database,
whether to augment information with retrieved results from the other query engine.
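
Read literally, both docstrings describe the same control flow: a selector picks either SQL or the other engine, and after an SQL query the engine may ask the other engine a follow-up question to augment the answer. My reading of the llama_index source (worth verifying for your version) is that SQLAutoVectorQueryEngine subclasses SQLJoinQueryEngine and pins the "other" engine to a vector-index query engine that uses VectorIndexAutoRetriever, whereas SQLJoinQueryEngine accepts any query engine. The toy below models only that shared flow; `ToyJoinEngine` and its callables are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToyJoinEngine:
    """Toy model of the routing the two docstrings describe (hypothetical, not llama_index code)."""

    use_sql: Callable[[str], bool]                  # selector: SQL or the other engine?
    sql_engine: Callable[[str], str]
    other_engine: Callable[[str], str]              # SQLAutoVector variant: vector auto-retriever engine
    follow_up: Callable[[str, str], Optional[str]]  # returns None when no augmentation is needed

    def query(self, question: str) -> str:
        if not self.use_sql(question):
            return self.other_engine(question)
        sql_answer = self.sql_engine(question)
        extra = self.follow_up(question, sql_answer)
        if extra is None:
            return sql_answer
        # Augment the SQL result with an answer from the other engine.
        return f"{sql_answer} | {self.other_engine(extra)}"
```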
Hi Guys

With Hybrid Search enabled, PgVectorStore tries to create an SQL index on the tsv column, which is why we hit this bug:
https://github.com/jerryjliu/llama_index/issues/7740

Any idea why we don't build an SQL index on the main vector embedding column? FYI, pgvector itself supports indexing with IVFFlat and HNSW:
https://github.com/pgvector/pgvector#indexing

I tried it myself: the performance improvement is great, but hardware utilization also increases significantly, especially when inserting new documents. But I may have done it wrong.

WDYT, guys?
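
For reference, the pgvector README creates these indexes with statements of the form `CREATE INDEX ON items USING hnsw (embedding vector_l2_ops);`, where the operator class must match the distance operator used at query time (`vector_l2_ops` for `<->`, `vector_ip_ops` for `<#>`, `vector_cosine_ops` for `<=>`). The hypothetical helper below only builds such statements. The table name `data_my_table` assumes PGVectorStore's `data_` table prefix and an `embedding` column; check both against your actual schema.

```python
# Operator class per distance metric, as listed in the pgvector README.
OPCLASSES = {
    "l2": "vector_l2_ops",          # <->  Euclidean distance
    "ip": "vector_ip_ops",          # <#>  negative inner product
    "cosine": "vector_cosine_ops",  # <=>  cosine distance
}


def pgvector_index_ddl(table, column="embedding", method="hnsw", metric="cosine", **options):
    """Build a CREATE INDEX statement for a pgvector column (hypothetical helper)."""
    if method not in ("hnsw", "ivfflat"):
        raise ValueError(f"unsupported index method: {method}")
    sql = f"CREATE INDEX ON {table} USING {method} ({column} {OPCLASSES[metric]})"
    if options:
        # e.g. m / ef_construction for HNSW, lists for IVFFlat
        sql += " WITH (" + ", ".join(f"{k} = {v}" for k, v in options.items()) + ")"
    return sql + ";"


if __name__ == "__main__":
    print(pgvector_index_ddl("data_my_table", m=16, ef_construction=64))
    print(pgvector_index_ddl("data_my_table", method="ivfflat", metric="l2", lists=100))
```

The pgvector README also notes that HNSW has slower build times and uses more memory than IVFFlat in exchange for a better speed/recall trade-off, which is consistent with the extra load on inserts described above.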
Hi Everyone

I still don't understand the Hybrid Search function in PgVectorStore. Can anyone help me understand how it differs from regular vector similarity search?

Also, if I already have a huge VectorIndex and want to enable Hybrid Search, do I need to re-ingest the whole index?

Thank you
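
In general terms, hybrid search runs two retrievals (a dense vector similarity search and a sparse keyword, or full-text, search; in PgVectorStore's case presumably backed by the tsv column) and merges the two result lists. The sketch below uses reciprocal rank fusion (RRF), one common merging method, with the commonly used constant k = 60. It is a generic illustration, not necessarily how PgVectorStore combines its results.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of doc ids: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    vector_hits = ["a", "b", "c"]   # ranked by embedding similarity
    keyword_hits = ["c", "a", "d"]  # ranked by full-text match
    print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
    # prints ['a', 'c', 'b', 'd']
```

The practical difference from plain vector search shows in `c`: the vector search ranked it last, but because the keyword search ranked it first, it moves up to second place in the fused list.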
Hi Everyone

Does anyone know what the purpose of the Index Store is?

If I want two separate indexes, each with its own set of documents, what's the best way to achieve that?
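
As I understand LlamaIndex's storage layout, the docstore holds the nodes themselves, while the index store holds each index's structure (essentially which nodes belong to it), keyed by an index ID. That is what lets several indexes share one storage context and be loaded back individually, e.g. with `load_index_from_storage(storage_context, index_id=...)`. The toy below mimics only that separation; `ToyStorage` and its methods are hypothetical and much simpler than the real stores.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStorage:
    """Hypothetical stand-in for a shared storage context."""

    docstore: dict = field(default_factory=dict)     # node_id -> node text, shared by all indexes
    index_store: dict = field(default_factory=dict)  # index_id -> node_ids belonging to that index

    def add_index(self, index_id, texts):
        node_ids = []
        for i, text in enumerate(texts):
            node_id = f"{index_id}-{i}"
            self.docstore[node_id] = text
            node_ids.append(node_id)
        self.index_store[index_id] = node_ids

    def load_index(self, index_id):
        """Return only the nodes recorded for this index."""
        return [self.docstore[n] for n in self.index_store[index_id]]


if __name__ == "__main__":
    storage = ToyStorage()
    storage.add_index("finance", ["Q1 report", "Q2 report"])
    storage.add_index("hr", ["employee handbook"])
    print(storage.load_index("hr"))  # prints ['employee handbook']
```

The simpler alternative is to give each index its own storage context (and persist directory); sharing one store mainly helps when you want a single storage backend for everything.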
Hi,

Quick question: in LlamaIndex we get a similarity score when querying nodes, right? Does a score of 0 mean not similar and 1 mean very similar? In other words, is its direction the opposite of Euclidean distance?

Euclidean Distance: Euclidean distance calculates the straight-line distance between two points in the vector space. Smaller distances indicate greater similarity. This metric is used when the magnitude and direction of vectors are both important.

If so, then I think there is a problem with PgVectorStore's scoring logic and result ordering.
More context: https://github.com/jerryjliu/llama_index/issues/7214
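
For reference, with cosine similarity higher means more similar (its range is [-1, 1], so not strictly [0, 1]), while distances run the other way. pgvector's `<=>` operator returns cosine distance, i.e. 1 minus cosine similarity, so a store that reports that value as a "similarity" without converting it would show inverted scores. The sketch below only illustrates the two directions; it is not PgVectorStore's code.

```python
import math


def cosine_similarity(a, b):
    """Higher = more similar; 1.0 for vectors pointing the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def cosine_distance(a, b):
    """What pgvector's <=> operator computes: lower = more similar."""
    return 1.0 - cosine_similarity(a, b)


def euclidean_distance(a, b):
    """Straight-line distance: lower = more similar."""
    return math.dist(a, b)


if __name__ == "__main__":
    query = (1.0, 0.0)
    docs = {"close": (0.9, 0.1), "orthogonal": (0.0, 1.0), "opposite": (-1.0, 0.0)}
    by_similarity = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)
    by_distance = sorted(docs, key=lambda d: cosine_distance(query, docs[d]))
    print(by_similarity, by_distance)  # same order: most similar first
```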
Quick question

For VectorStoreIndex, we have the option to store the index in various vector stores, such as Chroma, Pinecone, and so on.

For other types of indexes (KeywordTableIndex, DocumentSummaryIndex, etc.), is there a similar storage solution? Or is it just in-memory with persistence to disk?
I think this new release will break many user implementations
https://github.com/jerryjliu/llama_index/pull/7223

To name a few:
  • With the switch from the text splitter to the sentence splitter, users need to install or update additional packages (e.g., nltk, and a newer version of langchain that includes HuggingFaceBgeEmbeddings).
  • Changing the default text splitter to the sentence splitter may also change end-user behavior, which could break users' test cases.
  • The sentence splitter uses the nltk package, and as described in https://github.com/jerryjliu/llama_index/pull/6579, nltk tries to download its data files to /home, which is not available on many cloud serverless platforms (e.g., Lambda, Fargate). So unless the user explicitly sets NLTK_DATA, it won't run.
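
Until that is addressed, one workaround is to point NLTK at a writable directory before nltk (or anything that imports it, such as llama_index) is imported, since NLTK reads the `NLTK_DATA` environment variable when it builds its data search path. On Lambda, `/tmp` is the usual writable location. The helper name and directory layout below are assumptions; this is a sketch, not an official fix.

```python
import os
import tempfile


def configure_nltk_data_dir(base_dir=None):
    """Create <base_dir>/nltk_data and point NLTK_DATA at it (hypothetical helper).

    Must run before nltk is imported, because nltk reads NLTK_DATA at import time.
    Defaults to the system temp dir (/tmp on Linux, including AWS Lambda).
    """
    path = os.path.join(base_dir or tempfile.gettempdir(), "nltk_data")
    os.makedirs(path, exist_ok=True)
    os.environ["NLTK_DATA"] = path
    return path


if __name__ == "__main__":
    data_dir = configure_nltk_data_dir()
    # Import nltk / llama_index only after this point, e.g.:
    # import nltk; nltk.download("punkt", download_dir=data_dir)
    print("NLTK_DATA =", data_dir)
```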