
Updated 6 months ago

Hi all, I built a vector from csv about

At a glance

The community member built a vector index from a CSV file with about 20 rows, where each row is a Document. When they ask how many rows there are, the answer reports only 2, and that number follows the similarity_top_k setting: with it set to 3, the answer reports 3. The community member wants to get the exact number of rows in the CSV file.

In the comments, another community member suggests adding the total number of rows to the metadata of each Document, like this: Document(text=content_from_csv, doc_id=doc_id, metadata={"total_rows": N}). Another community member agrees, noting that once loaded, each row has become a Document, so its row number can also be stored in the metadata and used when needed.

The original community member says they will try this solution.

Hi all, I built a vector index from a CSV file (about 20 rows), with each row as a Document. When I ask something like "How many rows...", the answer says only 2 rows. The number depends on the similarity_top_k variable: when I set it to 3, the answer says 3. Can I get the exact number of rows in the CSV file?

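import csv
import json

from llama_index.core import Document  # older releases: from llama_index import Document


def load_documents(filename):  # hypothetical name: the post omits the def line
    """Turn each CSV row into one Document whose text is the row as JSON."""
    documents = []
    with open(filename, newline='') as file_obj:
        reader_obj = csv.reader(file_obj)
        # Read the header through the csv reader so quoted commas are handled
        header = [name.strip() for name in next(reader_obj)]

        for row in reader_obj:
            # Map each column name to its value, collapsing extra whitespace
            record = {}
            for i, value in enumerate(row):
                record[header[i]] = ' '.join(value.split())

            doc_id = row[0]  # the first column is used as the document ID
            content_from_csv = json.dumps(record)
            documents.append(Document(text=content_from_csv, doc_id=doc_id))
    return documents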
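Why the count follows similarity_top_k: a vector index query retrieves only the similarity_top_k Documents most similar to the question and passes just those to the LLM, so the model sees 2 rows (LlamaIndex's default) or 3, never all 20. The toy sketch below is plain Python, not LlamaIndex's actual retriever; the `top_k` helper and the similarity scores are made up for illustration.

```python
from typing import List, Tuple


def top_k(scored_docs: List[Tuple[str, float]], k: int) -> List[str]:
    """Return the texts of the k documents with the highest similarity scores."""
    ranked = sorted(scored_docs, key=lambda pair: pair[1], reverse=True)
    return [text for text, _ in ranked[:k]]


# 20 hypothetical row documents with made-up similarity scores
docs = [(f"row {i}", 1.0 / (i + 1)) for i in range(20)]

print(top_k(docs, k=2))  # ['row 0', 'row 1']
print(top_k(docs, k=3))  # ['row 0', 'row 1', 'row 2']
```

Whatever the question is, the LLM only counts what is in that retrieved list, so "How many rows?" cannot be answered reliably through retrieval alone.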
3 comments
You can add details like this in the metadata:

Document(text=content_from_csv, doc_id=doc_id, metadata={"total_rows":N})
Once loaded, they are no longer rows: each one has become a Document. So, as @WhiteFang_Jr said, you can store each row's number in its metadata and use it when needed.
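A minimal sketch of that suggestion, assuming the CSV fits in memory: read all data rows first so the exact total is known, then attach it to every row's metadata together with the row's own position. It uses plain Python only, so it runs without LlamaIndex; `rows_to_doc_args` and the `row_number` key are hypothetical names, and each returned tuple is meant to be passed on as `Document(text=text, doc_id=doc_id, metadata=metadata)`.

```python
import csv
import io
import json
from typing import Dict, List, Tuple


def rows_to_doc_args(file_obj) -> List[Tuple[str, str, Dict[str, int]]]:
    """Return (text, doc_id, metadata) for each data row of a CSV file."""
    reader = csv.reader(file_obj)
    header = [name.strip() for name in next(reader)]
    rows = list(reader)        # read every data row first so the total is known
    total_rows = len(rows)     # exact count, computed outside the LLM

    results = []
    for row_number, row in enumerate(rows, start=1):
        # Same JSON-per-row text as the loader in the question
        record = {col: " ".join(value.split()) for col, value in zip(header, row)}
        metadata = {"row_number": row_number, "total_rows": total_rows}
        results.append((json.dumps(record), row[0], metadata))
    return results


sample = io.StringIO("id,name\n1,Alice\n2,Bob\n3,Carol\n")
for text, doc_id, metadata in rows_to_doc_args(sample):
    print(doc_id, metadata)
# 1 {'row_number': 1, 'total_rows': 3}
# 2 {'row_number': 2, 'total_rows': 3}
# 3 {'row_number': 3, 'total_rows': 3}
```

Because the total comes from `len(rows)` rather than from retrieval, it stays correct whatever similarity_top_k is. LlamaIndex includes Document metadata in the text it sends to the LLM by default (unless a key is listed in `excluded_llm_metadata_keys`), so even a single retrieved row carries `total_rows`.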
I will try this solution, thanks everyone!