The community member built a vector index from a CSV file with about 20 rows, where each row is a Document. When asked how many rows there are, the query response only reports 2, and the count tracks the similarity_top_k setting: with similarity_top_k set to 3, the response reported 3 rows. The community member wants to get the exact number of rows in the CSV file.
In the comments, another community member suggests adding the total number of rows to the metadata of each Document, like this: Document(text=content_from_csv, doc_id=doc_id, metadata={"total_rows": N}). Another community member agrees, noting that once indexed the rows have become documents, so storing each row's number in the metadata keeps it available when needed.
The original community member says they will try this solution.
Hi all, I built a vector index from a CSV (about 20 rows) with each row as a Document. When I query something like "How many rows...", it only reports 2 rows, and the answer depends on the similarity_top_k variable: I tried 3 and it returned 3. Can I get the exact number of rows in the CSV file?
import csv

documents = []
with open(filename, newline="") as file_obj:
    reader_obj = csv.reader(file_obj)
    # Read the header through the CSV reader so quoted commas are handled
    header = next(reader_obj)
    for row in reader_obj:
        record = {}
        for i, value in enumerate(row):
            # Collapse runs of whitespace inside each cell
            record[header[i]] = ' '.join(value.split())
Once indexed, they are no longer rows; they have become documents. So, as @WhiteFang_Jr said, you can store each row's number in its metadata and use it when needed.
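For anyone landing here later, this is a rough sketch of the metadata idea, not the exact code from the thread. It assumes the row count is taken straight from the CSV, since retrieval only ever sees similarity_top_k documents. `RowDocument` and `load_row_documents` are hypothetical names used so the example runs without any vector library; in practice the same text and metadata would go into the library's own Document, as in the suggestion above.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class RowDocument:
    """Hypothetical stand-in for a vector-store Document (text plus metadata)."""
    text: str
    doc_id: str
    metadata: dict = field(default_factory=dict)


def load_row_documents(file_obj):
    """Turn each CSV data row into one RowDocument carrying row metadata."""
    reader = csv.DictReader(file_obj)
    # Read all rows first so the total is known before any document is built.
    rows = list(reader)
    total_rows = len(rows)
    documents = []
    for row_number, row in enumerate(rows, start=1):
        # Collapse repeated whitespace in each cell, as the original snippet does.
        record = {key: " ".join((value or "").split()) for key, value in row.items()}
        text = ", ".join(f"{key}: {value}" for key, value in record.items())
        documents.append(
            RowDocument(
                text=text,
                doc_id=f"row-{row_number}",
                # Assumed metadata keys: position of this row and the overall count.
                metadata={"row_number": row_number, "total_rows": total_rows},
            )
        )
    return documents


# Small in-memory CSV so the sketch runs without a file on disk.
sample_csv = io.StringIO("name,city\nAda,  London\nLinus,Helsinki\nGrace,New   York\n")
docs = load_row_documents(sample_csv)
print(len(docs), docs[0].metadata)
```

With this layout, any single retrieved document carries total_rows, so a "how many rows" question no longer depends on how many documents similarity_top_k returns.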