
Updated 3 months ago

Could we give a directory of CSV files here?
Attachment
image.png
21 comments
You can check the load_data method to see whether it supports a folder path.
Could you please guide me on how to load 20+ CSV files, or suggest a loader we can use for that purpose?
please help here
You can pass the loader you want to use into SimpleDirectoryReader:

Plain Text
file_extractor = {".csv": PandasCSVReader()}

documents = SimpleDirectoryReader("./data", file_extractor=file_extractor).load_data()
With this approach, llama_index does not give the right answers. It picks answers from other files, not from the one I am asking about.
You asked how to load data, not how to query it 😉

CSVs are tough to work with. Usually, I find converting to SQLite and using a SQL index works best. (Alternatively, you can also use a pandas index)

Or, you can write a custom loader for your csv, which can also help since you have the chance to structure the data the way you want the LLM to read it
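For the SQLite route mentioned above, a minimal sketch using only the Python standard library could look like this. The function name `load_csv_dir_to_sqlite` is hypothetical, and the sketch assumes each CSV has a header row with unique, non-empty column names. Every column is stored as TEXT, so numeric columns would need casting at query time.

```python
import csv
import sqlite3
from pathlib import Path


def load_csv_dir_to_sqlite(csv_dir, conn):
    """Create one SQLite table per CSV file in csv_dir and return the table names."""
    tables = []
    for path in sorted(Path(csv_dir).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as f:
            rows = list(csv.reader(f))
        if not rows:
            continue  # skip empty files
        header, data = rows[0], rows[1:]
        # Table name comes from the file name; strip double quotes so the
        # quoted identifiers below stay valid.
        table = path.stem.replace('"', "")
        columns = ", ".join('"{}" TEXT'.format(c.replace('"', "")) for c in header)
        placeholders = ", ".join("?" for _ in header)
        conn.execute('DROP TABLE IF EXISTS "{}"'.format(table))
        conn.execute('CREATE TABLE "{}" ({})'.format(table, columns))
        # Rows whose length does not match the header are skipped (an assumption).
        conn.executemany(
            'INSERT INTO "{}" VALUES ({})'.format(table, placeholders),
            [r for r in data if len(r) == len(header)],
        )
        tables.append(table)
    conn.commit()
    return tables
```

Calling `load_csv_dir_to_sqlite("./data", sqlite3.connect("inventory.db"))` would give one table per file, which keeps questions about one file from pulling rows out of another.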
@Logan M how can we do this? Could you please share an example?
@Logan M please help here
PandasCSVReader = download_loader("PandasCSVReader")

file_extractor = {".csv": PandasCSVReader()}

property_inventory_docs = SimpleDirectoryReader("data_folder/Property Inventories", file_extractor=file_extractor).load_data()
property_inventory_index = PandasIndex(df=property_inventory_docs)
storage_context = property_inventory_index.storage_context

# Persist the index

storage_context.persist("property-inventory-index")


This is my code, which I use to create the index on my machine.
@Logan M please help here
I've created many indexes, more than 10. When I ask a question, it creates 10 sub-questions. I want to run the query against one specific index only, not all of them. How can I do this?
query_engine_tools = [
    QueryEngineTool(
        query_engine=policies_engine,
        metadata=ToolMetadata(name='policies', description="Provide information about policies of the company")
    ),
    QueryEngineTool(
        query_engine=property_info_engine,
        metadata=ToolMetadata(name='property_information', description="Provide information about information of properties")
    ),
    QueryEngineTool(
        query_engine=property_19_engine,
        metadata=ToolMetadata(name='19 inventory', description="Provide information about Inventory of Properties")
    ),
    QueryEngineTool(
        query_engine=property_224_engine,
        metadata=ToolMetadata(name='224 inventory', description="Provide information about Inventory of Properties")
    ),
]


This is my code for the QueryEngineTool list.
what is the address of 318 property from property_information?

If I ask a question like the one above, which mentions "property_information" (the index name), it sometimes creates only one sub-question, but most of the time it creates many sub-questions against different indexes.
How can we avoid this?
You'll need to write better descriptions for each query engine tool

i.e.
QueryEngineTool(..., metadata=ToolMetadata(..., description="Useful for answering questions about property information at X"))
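One concrete thing to check: in the tool list earlier in the thread, '19 inventory' and '224 inventory' share the exact same description, so nothing in the description tells them apart. A small plain-Python sketch for spotting that is below. The helper name `find_duplicate_descriptions` is hypothetical, and the (name, description) pairs are copied by hand from that list rather than read from llama_index objects.

```python
from collections import defaultdict

# (name, description) pairs copied from the QueryEngineTool list in the thread.
tools = [
    ("policies", "Provide information about policies of the company"),
    ("property_information", "Provide information about information of properties"),
    ("19 inventory", "Provide information about Inventory of Properties"),
    ("224 inventory", "Provide information about Inventory of Properties"),
]


def find_duplicate_descriptions(tools):
    """Return {normalized description: [tool names]} for descriptions used more than once."""
    by_description = defaultdict(list)
    for name, description in tools:
        # Compare case-insensitively and ignore surrounding whitespace.
        by_description[description.strip().lower()].append(name)
    return {d: names for d, names in by_description.items() if len(names) > 1}


duplicates = find_duplicate_descriptions(tools)
```

Any tools that show up in `duplicates` are good candidates for a more specific description, for example one that names the property each inventory belongs to.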
@Logan M I have shared an inventory file for a property with you.
The file has sections for General Information, Home Information, Kitchen Information, and Bedrooms and Bathrooms information.
When creating the index, I want each of these sections to be stored as a separate node. Is this possible?
Or could you guide me on how to create an index for that file, so we can ask different types of questions, such as quantities and total item counts? Could you please help here, as I'm new to llama_index.
@Logan M help here
Sorry man, but I don't have time to code an entire application for you 😆

If you want to split data by section, you'll probably have to write a parser to split your document into sections, and then make a Document object per section

Something like this I suppose?
Plain Text
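# Get a dict of section name -> text.
# split_text_into_sections is a helper you would write yourself.
sections = split_text_into_sections(text)

# Build one Document per section, keeping the section name as metadata.
documents = []
for section_name, section_text in sections.items():
    documents.append(Document(text=section_text, metadata={'title': section_name}))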
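The `split_text_into_sections` helper is not part of llama_index, so it has to be written for the file at hand. Here is a rough sketch of one way to do it, assuming each section title sits alone on its own line (possibly followed by empty CSV cells). It takes the list of expected titles as a second argument, which the call above does not have, and the sample text in the usage note is made up for illustration.

```python
def split_text_into_sections(text, section_titles):
    """Return {section title: section text}.

    Assumes every title appears alone on its own line. Lines before the
    first title are dropped, and a repeated title overwrites the earlier one.
    """
    wanted = {title.lower(): title for title in section_titles}
    sections = {}
    current = None
    for line in text.splitlines():
        # Normalize the line: trim whitespace and trailing empty CSV cells.
        key = line.strip().strip(",").strip().lower()
        if key in wanted:
            current = wanted[key]
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {title: "\n".join(lines).strip() for title, lines in sections.items()}
```

For example, `split_text_into_sections(raw_text, ["General Information", "Kitchen Information"])` would return one entry per title, which can then be turned into one Document per section.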


If you want to ask questions like the ones you gave, though, you might be better off using a SQL or pandas index. Analytical questions usually don't work well with semantic search 🙂
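To illustrate why, quantity and total-count questions map directly onto aggregate queries, which compute an exact answer instead of retrieving similar text. A small sketch with the standard-library `sqlite3` module follows. The table layout and the rows are invented for illustration and are not taken from the inventory file discussed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (section TEXT, item TEXT, quantity INTEGER)")

# Illustrative rows only; a real table would be loaded from the CSV file.
conn.executemany(
    "INSERT INTO inventory VALUES (?, ?, ?)",
    [("Kitchen", "Plates", 8), ("Kitchen", "Cups", 6), ("Bedrooms", "Pillows", 4)],
)

# Number of distinct items and total quantity per section.
totals = conn.execute(
    "SELECT section, COUNT(*), SUM(quantity) "
    "FROM inventory GROUP BY section ORDER BY section"
).fetchall()
```

A SQL index would have the LLM write a query like this one and run it, so the totals come from the database rather than from whichever text chunks happen to be retrieved.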