Any way to find out how many documents I

At a glance

The community members are discussing how to find information about a vector index, such as the number of documents, the model used, dimensionality, and other basic details. One community member suggests storing this information in the metadata, using document.metadata['llm_used'] = LLM name to store the language model used. They also mention that the number of nodes in the index can be checked using print(len(index.docstore.docs)).

Another community member is having issues with the OpenAI embedding model, where they only see "ada" usage even after upgrading to version 0.10 and using the "text-embedding-3-large" model. They ask if this is normal or if there might be a bug. Other community members suggest checking the Settings.embed_model information to ensure the new model is being used, and to interact with the OpenAI embedding directly to verify the model name.

The community members also discuss whether it's possible to store the embedding model information in the vector_index metadata, rather than just the individual document metadata. They ask if they can find the embedding model used to create the vector index after it has been created, or how to save the model name when creating the index.

Andre Tättar

Any way to find out how many documents I have in a vector index and some basic information - like size, model used, dimensionality etc?

5 comments

WhiteFang_Jr

You can define all this in the metadata if you want.

document.metadata['llm_used'] = "LLM name"

Also, to check how many nodes are in your index, you can do print(len(index.docstore.docs))

Andre Tättar

I upgraded to 0.10 and started using "text-embedding-3-large". I tried it out with the new Settings object:
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-large", embed_batch_size=10)
But under OpenAI usage I only see ada usage, and it has been at least 3 hours now. Does it normally take such a long time to update, or might there be a bug?

By the way - thanks for the answer 🙏

WhiteFang_Jr

Check the embed_model info: print(Settings.embed_model)
It should reflect the new model name. You can check this with a Python script (just to be sure that the model is not being replaced under the hood).

Create a Python script, interact with the OpenAI embedding directly, and then check whether it shows the new model name or not.
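One way to act on this suggestion without waiting for the usage dashboard is to look at the length of a returned vector. The sketch below is only an illustration: the helper names are hypothetical, and the table holds the default output sizes OpenAI documents for these models (the text-embedding-3 models can also be asked for shorter vectors through a `dimensions` parameter, which this check does not cover). Fetching the vector itself needs an API key, so that call is left as a comment.

```python
# Default output sizes documented by OpenAI for these embedding models.
DEFAULT_DIMS = {
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}


def models_matching_dim(dim):
    """Return the known models whose default vector length equals dim."""
    return sorted(name for name, d in DEFAULT_DIMS.items() if d == dim)


def check_embedding(vector, expected_model):
    """True if the vector length matches the expected model's default size."""
    return len(vector) == DEFAULT_DIMS[expected_model]


# With a configured API key, the vector could be obtained like this:
#   vector = Settings.embed_model.get_text_embedding("hello")
#   print(len(vector), models_matching_dim(len(vector)))
#   print(check_embedding(vector, "text-embedding-3-large"))
```

A length of 1536 is ambiguous between ada and text-embedding-3-small, but a length of 3072 would indicate that text-embedding-3-large is really being used.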

Andre Tättar

Sorry to check in again, that is a really helpful thing actually for later evaluation with multiple indexes. One question however -
"You can define all this in the metadata if you want.

document.metadata['llm_used'] = LLM name"

Can I do that for the
vector_index = load_index_from_storage(storage_context)

So that I can do vector_index.metadata instead of the document.metadata?

Andre Tättar

To be specific: can I find out the embedding model used to create the vector_index after it is created? Or, how can I save the LLM_NAME in the vector_index when creating it?
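The thread does not settle whether the index object itself can carry this information. One possible workaround, sketched here under that assumption, is to write a small JSON file next to the persisted index and read it back after loading. The file name `index_info.json` and both helper functions are hypothetical and not part of LlamaIndex; the index calls appear only as comments because they need a real index.

```python
import json
from pathlib import Path

INFO_FILE = "index_info.json"  # hypothetical name, not a LlamaIndex convention


def save_index_info(persist_dir, embed_model_name, dimensions, num_nodes):
    """Write a small JSON file describing the index next to its persisted data."""
    info = {
        "embed_model": embed_model_name,
        "dimensions": dimensions,
        "num_nodes": num_nodes,
    }
    path = Path(persist_dir) / INFO_FILE
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(info, indent=2), encoding="utf-8")
    return info


def load_index_info(persist_dir):
    """Read the file back; returns None if it was never written."""
    path = Path(persist_dir) / INFO_FILE
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))


# Possible use around the calls mentioned in the thread:
#   index.storage_context.persist(persist_dir="./storage")
#   save_index_info("./storage", "text-embedding-3-large", 3072,
#                   len(index.docstore.docs))
#   ...later, after vector_index = load_index_from_storage(storage_context):
#   info = load_index_info("./storage")
```

Keeping the record in a separate file means it survives reloads and can be compared across several indexes during later evaluation, without touching each document's metadata.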
