The community member is asking about PDF vectorization and how to handle multiple PDF files in a folder. They want to know if it's possible to directly vectorize PDFs instead of extracting the text first, and how to update the vector store when adding new PDF files to the folder.
In the comments, another community member responds that it's not possible to create embeddings without having the text, and suggests using the filename_as_id option in the simple directory reader to refresh the index when adding new documents. However, they note that this feature currently doesn't work with vector database integrations, only the default vector database.
The original community member thanks the other for the information.
Hello mates, can you tell me something about PDF vectorization?
Is it possible to vectorize a PDF directly, instead of pulling out the text first and then doing the vectorization?
Second: if I have a folder /data that contains a PDF that needs to be vectorized, and I then add a second file to the same folder, do I need to repeat the process for both files, or can the second file be added to the current vector store that is saved locally on disk?
Or is it possible to create multiple vector stores and then use them all together?
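For the second question, the filename_as_id suggestion from the reply comes down to simple bookkeeping: use each file's name as its document id, remember what was embedded last time, and re-process only files that are new or changed. Below is a rough sketch of that idea using only the Python standard library. It does not call any vector store or reader library, and the names `files_to_refresh`, `seen`, and `demo` are made up for illustration. Comparing content hashes is an assumption about how "changed" is detected, not a description of how any particular library does it.

```python
from __future__ import annotations

import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of the file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def files_to_refresh(folder: Path, seen: dict[str, str]) -> list[str]:
    """Return names of PDFs that are new or changed since the last call.

    `seen` maps a filename (used here as the document id) to the digest
    recorded when that file was last processed. It is updated in place.
    """
    changed = []
    for path in sorted(folder.glob("*.pdf")):
        digest = file_digest(path)
        if seen.get(path.name) != digest:
            changed.append(path.name)  # only these need text extraction + embedding
            seen[path.name] = digest
    return changed


def demo() -> tuple[list[str], list[str], list[str]]:
    """Simulate adding a second file to a folder that was already processed."""
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        seen: dict[str, str] = {}
        (folder / "first.pdf").write_bytes(b"placeholder bytes for file one")
        run1 = files_to_refresh(folder, seen)
        (folder / "second.pdf").write_bytes(b"placeholder bytes for file two")
        run2 = files_to_refresh(folder, seen)
        run3 = files_to_refresh(folder, seen)  # nothing changed in between
    return run1, run2, run3
```

In a real setup, `seen` would be persisted next to the locally saved vector store, and only the files returned by `files_to_refresh` would go through text extraction and embedding, so the first PDF is not processed again when the second one is added.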