PandasQueryEngine

I am trying to replicate the PandasQueryEngine example from the docs (with non-default LLMs and embeddings, if it matters):

import pandas as pd
from llama_index.query_engine import PandasQueryEngine
# Test on some sample data
df = pd.DataFrame(
    {
        "city": ["Toronto", "Tokyo", "Berlin"],
        "population": [2930000, 13960000, 3645000],
    }
)
query_engine = PandasQueryEngine(df=df, verbose=True)
response = query_engine.query(
    "What is the city with the highest population?",
)

but I keep getting an error

> Pandas Instructions:

 df['city'][df['population'].idxmax()]
Traceback (most recent call last):
  File "C:\Users\Stephane\miniconda3\envs\llamaindex\lib\site-packages\llama_index\query_engine\pandas_query_engine.py", line 61, in default_output_processor
    tree = ast.parse(output)
  File "C:\Users\Stephane\miniconda3\envs\llamaindex\lib\ast.py", line 50, in parse
    return compile(source, filename, mode, flags,
  File "<unknown>", line 2
    df['city'][df['population'].idxmax()]
IndentationError: unexpected indent
> Pandas Output: There was an error running the output as Python code. Error message: unexpected indent (<unknown>, line 2)

Any idea why?
Side-question: is PandasQueryEngine the best way to build a RAG app when the information is to be found in a table?

4 comments

WhiteFang_Jr

One reason could be the choice of LLM: it may not be capable of producing code output correctly.

As you can see, you got an indentation error here.

Stéphane

@WhiteFang_Jr OK. Is there a better way to index a CSV table? I don’t need to perform calculations on the table (and thus output Python code). I just need to be able to read the cell corresponding to the row and column entered by the user.

WhiteFang_Jr

You could use a CSV reader to read the file and then query it, but queries like "What is the city with the highest population?" might not work then.

Another way would be to save the records in a SQL table and then try a Text2SQL approach: https://docs.llamaindex.ai/en/stable/examples/index_structs/struct_indices/SQLIndexDemo.html
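To make the SQL-table suggestion concrete, here is a minimal sketch using only Python's built-in sqlite3 module. It is not the LlamaIndex Text2SQL API: the table name `city_stats` is made up for illustration, and the SQL strings are hand-written stand-ins for what a Text2SQL step might generate from the user's question.

```python
import sqlite3

# In-memory database holding the same sample data as the DataFrame above.
# The table name `city_stats` is a hypothetical choice for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city_stats (city TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO city_stats VALUES (?, ?)",
    [("Toronto", 2930000), ("Tokyo", 13960000), ("Berlin", 3645000)],
)

# Hand-written stand-in for the query a Text2SQL step might produce for
# "What is the city with the highest population?"
top_city = conn.execute(
    "SELECT city FROM city_stats ORDER BY population DESC LIMIT 1"
).fetchone()[0]

# A plain cell lookup (row chosen by city, column chosen by name),
# which needs no calculation at all.
berlin_population = conn.execute(
    "SELECT population FROM city_stats WHERE city = ?", ("Berlin",)
).fetchone()[0]

print(top_city, berlin_population)
conn.close()
```

With this layout, both the aggregate-style question and the simple "read one cell" case reduce to a single SQL statement, so the LLM only has to emit SQL rather than Python.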

Logan M

Seems like the string has some unexpected spaces at the start -- I'm surprised it's not calling strip on the output before running it 😅
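The whitespace explanation can be checked with the standard library alone. The sketch below assumes the raw LLM output was a newline followed by an indented expression (inferred from the "line 2 ... unexpected indent" message in the traceback, not taken from the library's code) and shows that `ast.parse` rejects it until it is stripped. The helper `parses` is a made-up name for this example.

```python
import ast

# Assumed shape of the raw LLM output, inferred from the traceback:
# a leading newline, then an expression indented by one space.
raw_output = "\n df['city'][df['population'].idxmax()]"


def parses(source: str) -> bool:
    """Return True if `source` is syntactically valid Python."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False


print(parses(raw_output))          # False: unexpected indent on line 2
print(parses(raw_output.strip()))  # True: leading whitespace removed
```

So one possible workaround, if the installed version lets you supply your own output processing step, is to strip the generated string before it is parsed; whether that hook is available depends on the version in use.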
