Could we give a directory of CSV files here?

Ahsan Mirza

Could we give a directory of CSV files here?

21 comments

Ahsan Mirza

@Logan M

WhiteFang_Jr

You can check the load_data method to see whether it supports a folder path.

Ahsan Mirza

Could you please guide me on how to load 20+ CSV files, or point me to a loader we can use for that purpose?

Ahsan Mirza

please help here

Ahsan Mirza

@Logan M

Logan M

You can pass the loader to use into SimpleDirectoryReader:

Plain Text

file_extractor = {".csv": PandasCSVReader()}

documents = SimpleDirectoryReader("./data", file_extractor=file_extractor).load_data()

Ahsan Mirza

With this, llama_index does not give the right answers. It picks answers from other files, not exactly what I'm asking for.

Logan M

You asked how to load data, not how to query it 😉

CSVs are tough to work with. Usually, I find converting to SQLite and using a SQL index works best. (Alternatively, you can also use a pandas index.)
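As a rough illustration of the "convert to SQLite" step, here is a minimal standard-library sketch that loads every CSV file in a folder into its own SQLite table. The function names (`table_name_for`, `load_csv_dir_to_sqlite`) and the table-naming rule are assumptions made for this example, not part of llama_index. It assumes each file has a header row and that every data row has the same number of fields as the header.

```python
import csv
import re
import sqlite3
from pathlib import Path
from typing import List


def table_name_for(csv_path: Path) -> str:
    """Derive a safe table name from a file name (hypothetical naming rule)."""
    name = re.sub(r"\W+", "_", csv_path.stem).strip("_").lower()
    # Prefix names that are empty or start with a digit, e.g. "19 inventory".
    return name if name and not name[0].isdigit() else f"t_{name}"


def load_csv_dir_to_sqlite(csv_dir: str, conn: sqlite3.Connection) -> List[str]:
    """Create one table per CSV file in csv_dir and return the table names."""
    tables = []
    for csv_path in sorted(Path(csv_dir).glob("*.csv")):
        with csv_path.open(newline="", encoding="utf-8") as f:
            rows = list(csv.reader(f))
        if not rows:
            continue  # skip empty files
        header, data = rows[0], rows[1:]
        table = table_name_for(csv_path)
        columns = ", ".join(f'"{col}"' for col in header)
        placeholders = ", ".join("?" for _ in header)
        # Columns are created without types, so values are stored as text.
        conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', data)
        tables.append(table)
    conn.commit()
    return tables
```

Because the values are stored as text, numeric questions would need a cast in the query, for example `SUM(CAST(quantity AS INTEGER))`. The resulting database file could then be handed to whichever SQL index or query engine your llama_index version provides.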

Or, you can write a custom loader for your csv, which can also help since you have the chance to structure the data the way you want the LLM to read it
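To make the custom-loader idea concrete, here is a small sketch that turns each CSV row into one self-describing text record. The function name `rows_to_records` and the record layout are assumptions for illustration. It returns plain dictionaries rather than llama_index objects, so it runs without the library; each record could then be wrapped as `Document(text=..., metadata=...)`.

```python
import csv
from pathlib import Path
from typing import Dict, List


def rows_to_records(csv_path: str) -> List[Dict[str, object]]:
    """Turn each CSV row into one text record plus metadata (hypothetical layout)."""
    path = Path(csv_path)
    records = []
    with path.open(newline="", encoding="utf-8") as f:
        for row_number, row in enumerate(csv.DictReader(f), start=1):
            # Spell out "column: value" pairs so a row is readable without the header.
            # Empty cells are skipped.
            text = "\n".join(f"{col}: {val}" for col, val in row.items() if val)
            records.append(
                {"text": text, "metadata": {"file_name": path.name, "row": row_number}}
            )
    return records
```

Keeping the file name in the metadata is one way to tell rows from different files apart at query time, which is the problem described above.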

Ahsan Mirza

@Logan M how can we do this? Could you please share an example?

Ahsan Mirza

@Logan M please help here

Ahsan Mirza

PandasCSVReader = download_loader("PandasCSVReader")

file_extractor = {".csv": PandasCSVReader()}

property_inventory_docs = SimpleDirectoryReader("data_folder/Property Inventories", file_extractor=file_extractor).load_data()
property_inventory_index = PandasIndex(df=property_inventory_docs)
storage_context = property_inventory_index.storage_context

# Persist the index

storage_context.persist("property-inventory-index")

This is my code, which I use to create the index on my machine.

Ahsan Mirza

@Logan M

Ahsan Mirza

@Logan M please help here

Ahsan Mirza

I've created many indexes, more than 10. When I ask a question, it creates 10 sub-questions. I want to run the query against only a specific index, not all of them. How can I do this?

Ahsan Mirza

query_engine_tools = [
    QueryEngineTool(
        query_engine=policies_engine,
        metadata=ToolMetadata(name='policies', description="Provide information about policies of the company")
    ),
    QueryEngineTool(
        query_engine=property_info_engine,
        metadata=ToolMetadata(name='property_information', description="Provide information about information of properties")
    ),
    QueryEngineTool(
        query_engine=property_19_engine,
        metadata=ToolMetadata(name='19 inventory', description="Provide information about Inventory of Properties")
    ),
    QueryEngineTool(
        query_engine=property_224_engine,
        metadata=ToolMetadata(name='224 inventory', description="Provide information about Inventory of Properties")
    ),
]

This is my code for the QueryEngineTool list.

Ahsan Mirza

What is the address of 318 property from property_information?

If I ask a question like this, which mentions "property_information" (the index name), it sometimes creates only one sub-question, but most of the time it creates many sub-questions across different indexes.
How can we avoid this?

Logan M

You'll need to write better descriptions for each query engine tool

e.g.
QueryEngineTool(..., description="Useful for answering questions about property information at X")

Ahsan Mirza

There is an inventory file for a property that I have shared with you, @Logan M.
The file has sections for General Information, Home Information, Kitchen Information, and Bedrooms and Bathrooms Information.
When creating the index, I want each of these sections to be stored as a separate node. Is this possible?
Or could you guide me on how to create an index for that file, so we can ask different types of questions, such as questions about quantities and total item counts? Could you please help here, as I'm new to llama_index?

Ahsan Mirza

@Logan M help here

Ahsan Mirza

@Logan M

Logan M

Sorry man, but I don't have time to code an entire application for you 😆

If you want to split data by section, you'll probably have to write a parser to split your document into sections, and then make a Document object per section

Something like this I suppose?

Plain Text

# get a dict of section name -> text
sections = split_text_into_sections(text)

documents = []
for section_name, text in sections.items():
  documents.append(Document(text=text, metadata={'title': section_name}))

If you want to ask questions like those you gave though, you might be better off using a SQL or pandas index? Analytical questions usually don't work well with semantic search 🙂
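The `split_text_into_sections` helper in the snippet above is left undefined, and the real layout of the inventory file is not shown in this thread. As one possible sketch, assuming each section starts with a heading on its own line (for example "Kitchen Information") and that you pass the list of headings in yourself, it could look like this. Note the extra `headings` parameter, which the original one-argument call does not have.

```python
from typing import Dict, Iterable, List


def split_text_into_sections(text: str, headings: Iterable[str]) -> Dict[str, str]:
    """Return {section name: section text}, starting a new section at each known heading line."""
    # Match headings case-insensitively, ignoring surrounding whitespace.
    known = {h.strip().lower(): h for h in headings}
    sections: Dict[str, List[str]] = {}
    current = None
    for line in text.splitlines():
        key = line.strip().lower()
        if key in known:
            current = known[key]
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
        # Lines before the first heading are ignored.
    return {name: "\n".join(lines).strip() for name, lines in sections.items()}
```

The returned dict has the same shape the loop above expects, so each entry can become one Document with the section name stored in its metadata.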
