Hi all I built a vector from csv about

At a glance

The community member built a vector index from a CSV file with about 20 rows, where each row is a Document. When they query the index with a question such as "How many rows...", the response reports only 2 rows. The number tracks the similarity_top_k setting: after raising it to 3, the response reported 3 rows. The community member wants to get the exact number of rows in the CSV file.
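
This behaviour is what one would expect from top-k retrieval in general: only the k best-scoring documents are passed on as context, so an answer built from that context can count at most k rows. A toy sketch of the effect, using plain Python lists and a dot-product score rather than the library's actual retriever (the function top_k and the vectors are made up for illustration):

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query_vec, doc_vecs, k):
    """Return the indices of the k document vectors that score highest."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: dot(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


# 20 toy "rows", each represented by a 2-dimensional vector.
doc_vecs = [[float(i), float(20 - i)] for i in range(20)]
query = [1.0, 0.0]

print(len(top_k(query, doc_vecs, 2)))  # 2
print(len(top_k(query, doc_vecs, 3)))  # 3
```

However many rows were indexed, the number of results equals k, which matches the 2-then-3 pattern described in the question.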

In the comments, another community member suggests adding the total number of rows to the metadata of each Document, like this: Document(text=content_from_csv, doc_id=doc_id, metadata={"total_rows": N}). Another community member agrees, noting that the rows have now become Documents and that each row number can be stored in the metadata and used when needed.

The original community member says they will try this solution.

ngle

Hi all, I built a vector index from a CSV (about 20 rows) with each row as a Document. When I ask something like "How many rows...", it only reports 2 rows. It depends on the similarity_top_k variable: I tried 3, and it returned 3. Can I get the exact number of rows in the CSV file?

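import csv
import json

# Assumed: Document is LlamaIndex's class; the import path varies by version.
from llama_index.core import Document  # older releases: from llama_index import Document


def load_documents(filename):  # function name is hypothetical; the post omits the def line
    documents = []
    with open(filename, newline="") as file_obj:
        reader_obj = csv.reader(file_obj)
        header = next(reader_obj)  # first row holds the column names

        for row in reader_obj:
            record = {}
            for i, value in enumerate(row):
                # Collapse runs of whitespace inside each cell
                record[header[i]] = ' '.join(value.split())

            doc_id = row[0]  # first column is used as the document id
            content_from_csv = json.dumps(record)
            documents.append(Document(text=content_from_csv, doc_id=doc_id))
    return documents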

3 comments

WhiteFang_Jr

You can add details like this in the metadata.


Document(text=content_from_csv, doc_id=doc_id, metadata={"total_rows":N})
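
A minimal sketch of how N could be computed before the Documents are built, assuming the whole CSV fits in memory. The helper name rows_with_total is hypothetical, and the dicts it returns only stand in for the keyword arguments that would be passed to Document(...), so the sketch runs without the library installed:

```python
import csv
import io
import json


def rows_with_total(file_obj):
    """Return one dict per CSV row, each carrying the total row count."""
    reader = csv.DictReader(file_obj)
    rows = list(reader)   # read every row first so the total is known
    total = len(rows)     # N: number of data rows, header excluded
    return [
        {"text": json.dumps(row), "metadata": {"total_rows": total}}
        for row in rows
    ]


# Small in-memory CSV standing in for the real file.
sample = io.StringIO("id,name\n1,apple\n2,pear\n3,plum\n")
doc_kwargs = rows_with_total(sample)
print(doc_kwargs[0]["metadata"])  # {'total_rows': 3}
```

Because every entry carries the same total, the count is available from whichever documents are retrieved, regardless of similarity_top_k.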

Emanuel Ferreira

Also, these are no longer rows; they have been turned into documents. So, as @WhiteFang_Jr said, you can store each row number in the metadata and use it when needed.
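
One way this could look, sketched with plain dicts in place of Documents: enumerate records each row's position, so any retrieved subset still knows which CSV row it came from. The row_number key and the build_row_metadata helper are illustrative names, not part of any library:

```python
import csv
import io


def build_row_metadata(file_obj):
    """Pair each row's first-column value with its 1-based row number."""
    reader = csv.reader(file_obj)
    next(reader)  # skip the header row
    return [
        {"doc_id": row[0], "metadata": {"row_number": n}}
        for n, row in enumerate(reader, start=1)
    ]


sample = io.StringIO("id,name\na,apple\nb,pear\nc,plum\n")
docs = build_row_metadata(sample)

# Pretend a query retrieved only two of the three documents.
retrieved = [d for d in docs if d["doc_id"] in {"b", "c"}]
print([d["metadata"]["row_number"] for d in retrieved])  # [2, 3]
```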

ngle

I will try this solution. Thanks, everyone.
