Find answers from the community

Updated 3 months ago

LlamaIndex + Pandas

Running into a bit of an annoying issue with LlamaIndex + Pandas

Sometimes the query that GPT / LlamaIndex comes up with queries (e.g.
Plain Text
df.loc[(slice('07/2023','10/2023'), 'restaurants'), 'breakdown']
) that return data from Pandas like this

Plain Text
07/2023  restaurants    {'Evroulla': 179.0, 'Foukou tou Yiakoumi': 60....
08/2023  restaurants    {'Evroulla': 181.0, 'Foukou tou Yiakoumi': 58....
09/2023  restaurants    {'Evroulla': 176.0, 'Foukou tou Yiakoumi': 59....
10/2023  restaurants    {'Evroulla': 48.0, 'Foukou tou Yiakoumi': 15.0...


but the data it returns is concatenated as shown above.


This is what the whole df looks like

Plain Text
                                                             breakdown  total
month   category                                                             
01/2023 Total        {'transport': 307, 'groceries': 499, 'restaura...   7354
        groceries    {'Alphamega Hypermarket': 368.0, 'Athienitis':...    499
        home                   {'Rent': 3987.0, 'Electricity': 1824.0}   5811
        restaurants  {'Evroulla': 178.0, 'Foukou tou Yiakoumi': 59....    447
        shopping         {'ZARA': 191.0, 'Amazon': 52.0, 'Ebay': 47.0}    290
        transport    {'ESSO': 186.0, 'Petrolina': 90.0, 'Panikos Ca...    307
02/2023 Total        {'transport': 307, 'groceries': 522, 'restaura...   7290
        groceries    {'Alphamega Hypermarket': 405.0, 'Athienitis':...    522
        home                   {'Rent': 3891.0, 'Electricity': 1813.0}   5704
        restaurants  {'Evroulla': 181.0, 'Foukou tou Yiakoumi': 61....    444
        shopping         {'ZARA': 207.0, 'Amazon': 52.0, 'Ebay': 54.0}    313
        transport    {'ESSO': 185.0, 'Petrolina': 90.0, 'Panikos Ca...    307


what to do, lol?
r
a
9 comments
@andreas are you using PandasQueryEngine?

Can you share the code, data and query in the format you are trying to out?
@ravitheja thanks for taking a look at this


System prompt (you can prob skip this)
Plain Text
SYSTEM_PROMPT = "You are AskGeorge, an expert personal financial helper with context about my checking and\
                saving account balances. The date today is 2023-10-08 (8th of October, 2023). If I ask you how\
                much money I have all together, add the latest entries from my current account to my savings account. You are almost always talking\
                about money or percentage changes, the currency is euros, so when you format numbers make \
                sure to add the currency sign and if it's a decimal number make it 2 decimal points. Use chain of thought reasoning to make sure you're answering what the user is actually asking. \
                Be conversational and informative in your responses. For example, when asked 'How much has my savings account \
                grown in the last 3 months', you will respond along the lines of 'Your savings account grew \
                X euros, from Y to Z, which is a A% increase. Well done!'"


building the agent + engines (yes, I'm using PandasQueryEngine) Important parts here are spending_engine and spending_df, that's what is having the issues rn
Plain Text
# build agent
llm = OpenAI(model="gpt-4", temperature=0)
service_context = ServiceContext.from_defaults(llm=llm)


# build balance engine

balance_df = getBalancePandas()

balance_engine = PandasQueryEngine(
    balance_df, 
    service_context=service_context,
    verbose=True,
)

# build spending engine

spending_df = getSpendingPandas()

spending_engine = PandasQueryEngine(
    spending_df,
    service_context=service_context,
    verbose=True
)

query_engine_tools = [
    QueryEngineTool(
        query_engine=balance_engine,
        metadata=ToolMetadata(
            name="balance_pandas_df_tool",
            description="Useful for translating a natural language query into a Pandas query over a dataframe containing the end of day balance of my checkings and savings account for a range of dates. It is extremely important that nothing is put after the Pandas query is outputted. df.columns: Index(['id', 'checkingAccountBalance', 'savingsAccountBalance'], dtype='object') and this is df.index: DatetimeIndex(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', ...., '2023-10-07', '2023-10-08'], dtype='datetime64[ns]', name='date', length=281, freq=None)"
        ),
    ),
    QueryEngineTool(
        query_engine=spending_engine,
        metadata=ToolMetadata(
            name="spending_pandas_df_tool",
            description="Useful for translating a natural language query into a Pandas query over a dataframe containing my monthly spending breakdown. It is extremely important that nothing is put after the Pandas query is outputted. The two columns are 'breakdown' and 'total'. df['breakdown']['01/2023']['transport'] will give you the breakdown of spending in the transport category for January 2023. df['total']['02/2023]['Total'] will give you the total spending for that month. The pandas dataframe is MultiIndex. If you're querying the Pandas df and extract data using loc, add a .apply(pd.Series) after your query to make things easier to analyze and so that data isn't concatenated"
        ),
    ),
]


Agent
Plain Text
agent = OpenAIAgent.from_tools(
        query_engine_tools, 
        llm=llm, 
        verbose=True,
        system_prompt=SYSTEM_PROMPT
)

def DataAgent(query_text):
    response = agent.chat(query_text)
    return response
spending_engine is what is causing the trouble here
I'm going to copy the spending_df rn
spending_df is here. It's a bit complex (MultiIndex) which might be mucking things up
@andreas can you share data and code for getBalancePandas, getSpendingPandas functions. Can try out and see where it is going wrong.
they just return two df's, nothing complicated. The problem is that the query that the agent generates outputs a concatenated version of the DF, because the DF is perhaps too big
Thank you again for offering to take a look
Add a reply
Sign up and join the conversation on Discord