
Updated 3 months ago

Hello

Hello,
I want to create a search-based chatbot over users' files (PDF, TXT, RTF, etc.). Some questions around that:
  1. In an environment like AWS, should we use a vector DB (like Qdrant), or just create an index for each user, given that we have a multi-tenant system?
  2. Can we save an index to, and load it from, something like AWS S3, EFS, etc.?
  3. If we use Qdrant, can we specify metadata like a user ID so that we don't create too many indexes (collections) in the vector DB, and instead define a filter when querying so that only documents specific to that user are retrieved?
  4. If we are querying PDF or TXT files, how can we get the source page number, line number, references, etc.?
7 comments
  1. Depends on how much data will be in each index. At a certain point, GPTSimpleVectorIndex will slow down because everything (embedding vectors AND documents) is loaded into memory.
  2. You might be interested in the save_to_string and load_from_string functions, which make the index data easy to upload to S3 or similar storage.
  3. You are actually the second person to ask about this today! @jerryjliu0 I think this isn't supported for Qdrant yet, right? Very open to a PR if you want to take a stab at it @hammad
  4. You can set the "extra_info" dict of each document and/or node object before inserting into the index:
document.extra_info = {"file_name": "my_file.txt"}, or with nodes, node.node_info = {"file_name": "my_file.txt"}
Then you can check response.source_nodes after getting a query response to see that info dict.
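For answer 2, here is a minimal sketch of how the serialized index string could be stored per user in S3. It assumes `index` is any object exposing the `save_to_string()` method mentioned above and that `s3_client` behaves like a boto3 S3 client (`put_object` / `get_object`). The key layout in `index_key` and the helper names are hypothetical, not part of any library.

```python
def index_key(user_id: str) -> str:
    """Build a per-user object key (hypothetical naming scheme)."""
    return f"indexes/{user_id}/index.json"


def upload_index(s3_client, bucket: str, user_id: str, index) -> str:
    """Serialize the index and store it under the user's key."""
    body = index.save_to_string()  # string form of the index, as mentioned above
    key = index_key(user_id)
    s3_client.put_object(Bucket=bucket, Key=key, Body=body.encode("utf-8"))
    return key


def download_index_string(s3_client, bucket: str, user_id: str) -> str:
    """Fetch the serialized index string for one user."""
    response = s3_client.get_object(Bucket=bucket, Key=index_key(user_id))
    return response["Body"].read().decode("utf-8")
```

With boto3 installed, `s3_client` would typically come from `boto3.client("s3")`, and the downloaded string could then be handed to `load_from_string`. Check the exact signatures against the library version you have installed.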
Can we specify this while reading a directory? And for PDFs, should we parse them manually and specify the page number, etc.?
Yeah, you can do this in the directory reader!

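from llama_index import SimpleDirectoryReader

# Attach the file name to every loaded document as metadata
filename_fn = lambda filename: {'file_name': filename}
documents = SimpleDirectoryReader('data/', file_metadata=filename_fn).load_data()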
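Since `file_metadata` is just a callable that takes a file name, it can branch on the file and return different metadata per file. A minimal sketch, assuming the reader passes the file path as a string as in the lambda above. The function name `file_metadata_fn` and the `file_type` and `needs_page_numbers` keys are hypothetical and only for illustration.

```python
from pathlib import Path


def file_metadata_fn(filename: str) -> dict:
    """Return metadata for one file, varying by file type."""
    path = Path(filename)
    suffix = path.suffix.lower()
    metadata = {
        "file_name": path.name,
        "file_type": suffix.lstrip(".") or "unknown",
    }
    if suffix == ".pdf":
        # Hypothetical flag: mark PDFs so page numbers can be attached later
        metadata["needs_page_numbers"] = True
    return metadata
```

It would be passed the same way as the lambda, for example `SimpleDirectoryReader('data/', file_metadata=file_metadata_fn)`. Because the callable only sees the file name, page-level details such as page numbers would still have to be set separately on each document or node.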
We got a long way just using a single file + index. It works really nicely. A vector DB only seems necessary if you need to support lots of files and lots of users.
@Logan M can we set extra_info via SimpleDirectoryReader so that each file is treated differently?
for pinecone we do, don't think we have one for qdrant yet 😮
super open to contributions here
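To make the idea behind question 3 concrete, here is a toy, in-memory sketch of filtering by a user ID stored in metadata before ranking by similarity, so that all tenants can share one collection. It is not the Qdrant or Pinecone API. The record layout and the `search` and `cosine` helpers are assumptions made purely for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(records, query_vector, user_id, top_k=3):
    """Rank only the records whose metadata matches user_id."""
    candidates = [r for r in records if r["metadata"].get("user_id") == user_id]
    ranked = sorted(
        candidates,
        key=lambda r: cosine(r["vector"], query_vector),
        reverse=True,
    )
    return ranked[:top_k]


# One shared "collection" holding documents from two tenants
records = [
    {"text": "alice invoice", "vector": [1.0, 0.0], "metadata": {"user_id": "alice"}},
    {"text": "bob invoice", "vector": [1.0, 0.0], "metadata": {"user_id": "bob"}},
    {"text": "alice notes", "vector": [0.0, 1.0], "metadata": {"user_id": "alice"}},
]
results = search(records, [0.9, 0.1], user_id="alice")
print([r["text"] for r in results])  # ['alice invoice', 'alice notes']
```

Bob's document never appears in Alice's results even though its vector matches the query equally well, which is the behaviour a metadata filter in a real vector DB would be expected to provide.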