Hmmm, I don't think anything like that exists right now. Or at least nothing that isn't super hacky lol
Ok so basically choosing a vector index in the beginning is a very important decision lol?
And also just curious, I'm currently using the SimpleVectorIndex. Does llama-index perform the query similarity search and retrieval manually under the hood, or is it using some other techniques?
Because my model is a bit slow to respond, so I'm just wondering what the key components are so I can speed things up
Well kinda I suppose lol. But thankfully embeddings are very cheap to calculate 🫠

Under the hood it's just using cosine similarity, nothing too special.

In my experience the slowest part by far is the LLM calls
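
If you're curious, the lookup boils down to something like this (a toy sketch of the idea, not the actual llama-index code; the function names are made up):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_k_chunks(query_emb: np.ndarray, chunk_embs: list[np.ndarray], k: int = 3) -> list[int]:
    """Indices of the k stored chunks most similar to the query embedding."""
    scores = [cosine_similarity(query_emb, e) for e in chunk_embs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
```

The scoring itself is just a few dot products, which is why the LLM call dominates the latency.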
One issue I found is the LLM model. GPT-3.5 seems to be slow, especially when the server is overloaded
Nice!! So when would we use things like Faiss and Pinecone?
And would using them in llama-index use their retrieval methods, or still the llama-index cosine search?
Only if you have an extreme amount of embeddings (like your index.json is over 5GB or something silly), or maybe you want something where it's a little easier to manage the embeddings per user (if applicable)

Using another vector store would use their retrieval methods. So it calculates an embedding for the query text and then ships that off to the vector store to do its thing
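
Roughly what "shipping it off" looks like with Faiss, for example (a toy sketch; the vectors here are random stand-ins for real embeddings):

```python
import numpy as np
import faiss  # pip install faiss-cpu

dim = 1536  # e.g. the size of OpenAI ada-002 embeddings
chunk_embs = np.random.rand(1000, dim).astype("float32")  # stand-in for stored chunk embeddings

store = faiss.IndexFlatL2(dim)  # exact nearest-neighbor search (Faiss also has inner-product/ANN variants)
store.add(chunk_embs)

query_emb = np.random.rand(1, dim).astype("float32")  # stand-in for the embedded query text
distances, ids = store.search(query_emb, 3)  # the store does the retrieval, not llama-index
print(ids[0])  # positions of the 3 closest chunks
```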
And you can still save the index as json right?
irrespective of the vector index used
"maybe you want something where it's a little easier to manage the embeddings per user (if applicable)" What do you mean by this?
Well, with third-party vector stores everything actually lives in the vector store itself, so there's nothing to save or load really

When you initialize the index, you can connect to the existing documents and just pass in an empty list of documents
Some vector stores have ways of separating the data, i.e. per user in your application maybe.

You could totally do this yourself too with the simple vector index, just separating the index json per user
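
For the do-it-yourself version, something like this, going off the save_to_disk/load_from_disk API llama-index had around the time of this thread (treat the exact calls as an assumption; the helper names and paths are made up):

```python
from llama_index import GPTSimpleVectorIndex, SimpleDirectoryReader

def index_path(user_id: str) -> str:
    # one index json per user (hypothetical layout)
    return f"indexes/{user_id}.json"

def build_index_for_user(user_id: str, docs_dir: str) -> None:
    documents = SimpleDirectoryReader(docs_dir).load_data()
    index = GPTSimpleVectorIndex(documents)
    index.save_to_disk(index_path(user_id))

def query_for_user(user_id: str, question: str):
    index = GPTSimpleVectorIndex.load_from_disk(index_path(user_id))
    return index.query(question)
```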
Ohh so it's kinda stored on their server?
And just a final question: when the chunks are stored in the index, isn't the embedding added in as a type of metadata/look-up data? So that every time you retrieve a chunk, it's not "de-embedded" or anything, right? haha
Yessir, at query time, only the query text needs to be embedded, the rest are saved
Awesome thank you so much!! This has been a great learning opportunity!!!
One more quick q haha, what is this async query lol?
And when would it be useful?
It's useful so that the query call doesn't block your application: you can kick it off and then fetch the actual answer when it's done

In a server setting this kind of makes sense
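
Here's a toy sketch of the non-blocking idea using plain asyncio (the `index` object is hypothetical; asyncio.to_thread just pushes a blocking .query() call onto a worker thread so the event loop stays free):

```python
import asyncio

async def answer(index, question: str):
    # needs Python 3.9+; runs the blocking query without stalling the event loop
    return await asyncio.to_thread(index.query, question)

async def handle_two_requests(index):
    # e.g. a server answering two users concurrently instead of one after the other
    return await asyncio.gather(
        answer(index, "What is a vector index?"),
        answer(index, "Why are LLM calls slow?"),
    )
```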
Hmm, I might need to do a deep dive into this haha. Apologies if this was a stupid question lol, I don't have a background in compsci
Yea no worries! It's pretty non-obvious tbh haha