
Hi I am thinking about using Qdrant as my Vector Store database but before using it I have some queries can you help me on clearing those queries?
Hi, sure! Could you please write them down here?
Below are the queries I have right now:
  1. Can I use llama_index's SimpleDirectoryReader to directly read and save documents as vector stores when working with PDF files?
  2. Is it possible to save indexes created by llama_index into Qdrant?
  3. What is the recommended approach for saving multiple indexes in Qdrant?
  4. While building indexes using llama_index with HuggingFaceEmbeddings, I pass the embeddings in service_context. However, when creating a new collection in Qdrant, I need to declare an embedding size and distance. How can I use my own embeddings in this case?
Hi @Kacper Łukawski, I hope you will be able to clear up these doubts. Thanks.
@theOldPhilosopher These questions are generally related to LlamaIndex, as Qdrant is just a provider. Let me check this out, and hopefully come back with some answers. However, some code snippets would be of great help.
Okay I will provide you with some code snippets. Thanks for the help again.
  1. Yes, and the vector store will work with any data loader from LlamaIndex.
  2. You'll have to set up Qdrant as the vector_store in the storage context before creating your index; then it will be saved to Qdrant (see the sketch after this list).
  3. Qdrant supports setting a collection_name, so I would use that to denote separate indexes.
  4. I'm unsure on this one, but llama-index will always use the embeddings on the service context and insert the node + embeddings into Qdrant (this is true for any vector db in llama-index).
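A minimal sketch of points 2–4, assuming the pre-0.10 llama_index API used in this thread, a local Qdrant instance, and the LangChain HuggingFaceEmbeddings wrapper from the original question; the URL, collection name, and model name are placeholders:

Plain Text
from qdrant_client import QdrantClient
from langchain.embeddings import HuggingFaceEmbeddings
from llama_index import VectorStoreIndex, ServiceContext, StorageContext, SimpleDirectoryReader
from llama_index.embeddings import LangchainEmbedding
from llama_index.vector_stores import QdrantVectorStore

client = QdrantClient(url="http://localhost:6333")

# Each index gets its own collection_name (point 3)
vector_store = QdrantVectorStore(client=client, collection_name="my_index")
storage_context = StorageContext.from_defaults(vector_store=vector_store)  # point 2

# Custom embeddings go on the service context (point 4)
embed_model = LangchainEmbedding(
    HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
)
service_context = ServiceContext.from_defaults(embed_model=embed_model)

documents = SimpleDirectoryReader("./pdfs").load_data()  # point 1
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, service_context=service_context
)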
Thanks @Logan M. For query 4: when we create a client we have to give a dimension size, so should I set it according to my custom embeddings' size?
Yea exactly 👍
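For reference, a minimal sketch of creating the collection so its dimension matches the embedding model; the URL and collection name are placeholders, and 768 assumes a 768-dimensional model such as all-mpnet-base-v2:

Plain Text
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient(url="http://localhost:6333")

# size must match the output dimension of your embedding model
client.recreate_collection(
    collection_name="my_index",
    vectors_config=VectorParams(size=768, distance=Distance.COSINE),
)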
Hi @Logan M, I have created a vector store and saved the index in Qdrant, and querying it works well. But what if I don't want to load the index into memory? I want to query Qdrant directly. Can you guide me on how to do that?
You can set up the vector_store object to point to an existing Qdrant collection/index, and then you can do this

(service context is only needed if you've customized it)
Plain Text
index = VectorStoreIndex.from_vector_store(vector_store, service_context=service_context)
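A minimal sketch of that pattern, reusing the placeholder client and collection name from above:

Plain Text
from qdrant_client import QdrantClient
from llama_index import VectorStoreIndex
from llama_index.vector_stores import QdrantVectorStore

client = QdrantClient(url="http://localhost:6333")

# Point at the existing collection; nothing is re-read or re-embedded
vector_store = QdrantVectorStore(client=client, collection_name="my_index")

# Pass service_context=... only if you customized it (e.g. your own embeddings)
index = VectorStoreIndex.from_vector_store(vector_store)
query_engine = index.as_query_engine()
response = query_engine.query("What do the documents say?")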
Yeah, I have done it and it's working. It's fast, thanks.
Hi @Logan M, I have a query; I hope you can help me with it.
What's up? 👀
I am using Qdrant as my vector database. I'm using a chunk_size of 1024, but during creation of the collection in the VDB I had to define size=768 in VectorParams. After creating the index inside the VDB, I checked the node size, but it is neither 768 nor 1024; it's something different. Can you help me clear up this topic?
chunk_size is the number of tokens per chunk

meanwhile, this other size parameter is actually the embedding dimensions. Embedding models take a text chunk and turn it into a list of numbers. This list can have many different lengths depending on the model you used. In this case, the embeddings have 768 dimensions

The embedding dimensions will be the same, no matter the chunk_size
Then what will the node size be?
The length of each node will be up to 1024 tokens, since you set the chunk_size to 1024
the embedding is just a numerical representation of the text. It's unrelated to the chunk_size
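To make the distinction concrete, a small sketch using sentence-transformers directly; all-mpnet-base-v2 is an assumed example of a 768-dimensional model:

Plain Text
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

short_text = "Qdrant is a vector database."
long_text = "Qdrant is a vector database. " * 30  # a much longer chunk

# Both vectors have the same dimensionality, regardless of text length
print(len(model.encode(short_text)))  # 768
print(len(model.encode(long_text)))   # 768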
Yes, thanks man, it was a great help.
I forgot the basic concept of embeddings 😅
haha no worries! There's lots to remember with this stuff 😅
Hi @Logan M I have a question about the node size. When I ran the query with response.source_nodes, I got two nodes as the result. I want to check if the chunk size of each node matches the parameter chunk_size=1024. How can I do that? Thank you for your time and guidance.
I thiiiink something like this should work

Note that we use the metadata mode here, since metadata is injected into the text unless otherwise configured. Although, your nodes may not have any metadata anyways 🙂

Plain Text
from llama_index.utils import GlobalsHelper
from llama_index.schema import MetadataMode
tokens = GlobalsHelper.tokenizer(response.source_nodes[0].get_content(metadata_mode=MetadataMode.LLM))
print(len(tokens))
Hi @Logan M, I tried the above code after updating llama_index to its latest version.
[Attachment: image.png]
typo on my part
tokens = GlobalsHelper.tokenizer(response.source_nodes[0].node.get_content(metadata_mode=MetadataMode.LLM))
needs the extra .node
Got this error
[Attachment: image.png]
Well that's annoying, GlobalsHelper.tokenizer returns a function 🤔 why is it complaining
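For what it's worth, a likely fix, assuming the pre-0.10 llama_index layout: tokenizer is a property on GlobalsHelper, so it should be accessed through the module-level globals_helper instance rather than through the class itself:

Plain Text
from llama_index.utils import globals_helper
from llama_index.schema import MetadataMode

# globals_helper is an instance, so .tokenizer resolves to a callable
text = response.source_nodes[0].node.get_content(metadata_mode=MetadataMode.LLM)
tokens = globals_helper.tokenizer(text)
print(len(tokens))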