hi, I'm trying to build an AI financial

aandreas

hi, I'm trying to build an AI financial advisor. I have a bunch of mostly numerical / quantitative data. I'm trying to get it to answer the question "How much has my savings account grown by in the last 3 months"

Currently, when I ask this question, it passes a whole lot of context (it passes some daily info, some weekly info and all monthly info when it should really only be passing monthly info). Even with the unnecessary/wasteful context, it looks like OpenAI is returning the context correctly

Plain Text

The balance of your current account for the last 3 months was as follows:\\n\\n- 2023-08-01: $8741.37\\n- 2023-09-01: $9732.43\\n- 2023-10-01: $10569.80

That context is then injected into the next prompt correctly as well, but the output is wrong

Plain Text

  "content": "Your savings account has grown by €828.43 over the last 3 months, from €8741.37 to €10569.80. That\'s a growth of approximately 9.48%. Well done!"

this is the data I'm working with

Plain Text

{
    "account_number": "1234567890",
    "type": "savings",
    "frequency": "daily",
    "data": [
        {
            "date": "2023-01-01",
            "amount": 411.97,
            "changeFromYesterday": 11.97
        },
        {
            "date": "2023-01-02",
            "amount": 425.99,
            "changeFromYesterday": 14.02
        },
        {
            "date": "2023-01-03",
            "amount": 435.67,
            "changeFromYesterday": 9.68
        },

and

Plain Text

{
    "account_number": "1234567890",
    "type": "savings",
    "frequency": "monthly",
    "data": [
        {
            "date": "2023-01-01",
            "amount": 411.97,
            "changeFromLastMonth": 0
        },
        {
            "date": "2023-02-01",
            "amount": 753.52,
            "changeFromLastMonth": 341.55
        },
        {
            "date": "2023-03-01",
            "amount": 1028.29,
            "changeFromLastMonth": 274.77
        },

..and a similar file for weekly

So basically 3 json files, one containing the account balance at a monthly interval, the other at a weekly interval, and the other at a daily interval.

I'm just a bit confused as to how to approach structuring this. I currently am using a Vector Store index over the JSON documents that's being queried by a QueryEngineTool that's being controlled by a GPT 4 agent.

1) Is there any better way to structure the quantitative data other than JSON?
2) Am I going overkill by giving it monthly and weekly data as well? Should I just give it daily and let it handle everything?
3) Is Agent+QueryEngine the right approach here?
4) Is a VectorStoreIndex the best approach for timeseries type data?
5) Why is it passing so much unnecessary context?

13 comments

llucastonon

This is tabular data right?
I would probably setup a database and use a SQL agent.
Maybe using function calls + pandas.
Embedding table data isnt really a good thing to do, according to my experience so far

aandreas

Yeah it is. I had similar thoughts and am now moving it to SQL. Thank you

aandreas

What are you trying to do with table data?

llucastonon

I've done some stuff very similar ot the SEC Filing demo, some stuff for PDF Document with data and also SQL agents for relational database in order to retrieve personal credit information

llucastonon

Keep us up to date please :D, if it works or it doenst, I would love to learn from your experience as well

aandreas

Tbh I think I'm getting rid of the monthly and weekly tables all together. Just going to try and work with a single table that tracks daily balances. If it wants weekly or monthly it can work from there

aandreas

@lucastonon will do!

aandreas

Thank you for your help

aandreas

@lucastonon had some successes, going to sleep now but I think the limiting factor I just ran into is that SQLite3 doesn't support some complex queries. For some reason, it's calling this an invalid SQL query

SELECT AVG(checkingAccountBalance - LAG(checkingAccountBalance, 1) OVER (ORDER BY date)) AS average_daily_change FROM daily_balances

aandreas

But overall what I built works, I have a SQL database and an agent that can talk to it

aandreas

Another small quirk I noticed was that it once added some extra text after the query, which messed things up

OpenAI response that caused LlamaIndex to throw:

Plain Text

"The question seems to be a SQL query itself, but it's not correct according to the schema provided. The correct query to get the average balance of the checking account for the last day would be:

SQLQuery: SELECT AVG(checkingAccountBalance) FROM daily_balances WHERE date >= DATE(\'now\',\'-1 day\') ORDER BY date ASC

Please note that the DATE_SUB and CURDATE functions are not supported in SQLite. Instead, we use the DATE function with 'now' and '-1 day' as arguments to get the date of one day ago. Also, there is no account_type column in the daily_balances table, so we directly use the checkingAccountBalance column.

aandreas

This Please note thing after was included in the SQLQuery, which threw an error. Temp fix was to add this in the description of the QueryEngine

Plain Text

"It is extremely important that nothing is put after the SQL query is outputted"

llucastonon

Humm. I think it is just natural that some SQL errors can happen when generating it with GPT3.5/4.

https://github.com/run-llama/llama_index/blob/main/llama_index/prompts/default_prompts.py#L188

I would maybe try to improve also the prompt and change some stuff - you can create some custom llama_index classes if you'd like.

passing examples that are more accurate to your usecase (if it is not something quite general) could potentially help to do some more sophisticated queries 🙂

Add a reply

Find answers from the community

hi, I'm trying to build an AI financial