Updated 4 months ago

Discover LlamaIndex: Joint Text to SQL a...

At a glance

The community member is following a "Text to SQL" video tutorial, but is encountering issues with the code. Specifically, they are getting an error related to the "extra_info" field, which they believe has been deprecated. The community member is seeking help on what to use instead of "extra_info".

In the comments, another community member suggests using "node.metadata" instead of "extra_info". The community member then runs into other issues, such as errors related to the "SQLAutoVectorQueryEngine" and "BaseSQLTableQueryEngine" classes. They try to work around these issues, but continue to face challenges.

Eventually, the community members are directed to use the "SQLJoinQueryEngine" example from the llama_index documentation, which seems to work better than the original code. They also discuss the potential benefits of using GPT-4 instead of GPT-3.5 for this task.

Hi guys, I'm following the Text to SQL video (https://www.youtube.com/watch?v=ZIvcVJGtCrY) and I think it may be out of date.

At this part of the code:

Plain Text
# Insert documents into vector index
# Each document has metadata of the city attached
for city, wiki_doc in zip(cities, wiki_docs):
    nodes = node_parser.get_nodes_from_documents([wiki_doc])
    # add metadata to each node
    for node in nodes:
        node.extra_info = {"title": city}
    vector_index.insert_nodes(nodes)

I get this error:

Plain Text
python3 main.py
dict_keys(['city_stats'])
[('Toronto', 2930000, 'Canada'), ('Tokyo', 13960000, 'Japan'), ('Berlin', 3645000, 'Germany')]
Traceback (most recent call last):
  File "/home/bi-ai/ai/txt-to-sql/main.py", line 107, in <module>
    node.extra_info = {"title": city}
  File "pydantic/main.py", line 357, in pydantic.main.BaseModel.__setattr__
ValueError: "TextNode" object has no field "extra_info"


I think extra_info has been deprecated because when I hover over it, VSCode says "TO DO: DEPRECATED"

but I'm having trouble finding what it was replaced with. What should I use instead?

Sorry for the dumb question, I'm really new to the whole python ecosystem.

Please halp. Many thanks.
26 comments
Full code in my main.py so far:
node.metadata is what you are looking for 👍
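To illustrate the suggested fix, here is a minimal sketch of the tagging loop with `metadata` in place of `extra_info`. `StubNode` and `tag_nodes_with_city` are hypothetical stand-ins so the snippet runs without llama_index installed; they are not the library's real classes. In the original script, the only change would presumably be `node.metadata = {"title": city}` inside the existing loop.

```python
from dataclasses import dataclass, field


@dataclass
class StubNode:
    """Hypothetical stand-in for llama_index's TextNode (not the real class)."""
    text: str
    metadata: dict = field(default_factory=dict)


def tag_nodes_with_city(nodes, city):
    """Attach the city name to every node via `metadata` (formerly `extra_info`)."""
    for node in nodes:
        node.metadata = {"title": city}
    return nodes


cities = ["Toronto", "Tokyo", "Berlin"]
# Two placeholder nodes per city, standing in for the node parser's output.
nodes_per_city = [
    [StubNode(text=f"{city} chunk {i}") for i in range(2)] for city in cities
]

for city, nodes in zip(cities, nodes_per_city):
    tag_nodes_with_city(nodes, city)

print(nodes_per_city[1][0].metadata)
```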
bruh your turn around time is absolutely ruthless
thank you much GRC!! (Giga RAG Chad)
Sorry to bug you again about this @Logan M but perhaps something broke in the newer releases?

I copied the code exactly as is from the video--check out master branch below:

https://github.com/biphan380/txt-to-sql/tree/master

But I run into this error:

Plain Text
python3 main.py
dict_keys(['city_stats'])
[('Toronto', 2930000, 'Canada'), ('Tokyo', 13960000, 'Japan'), ('Berlin', 3645000, 'Germany')]
Upserted vectors: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 17/17 [00:04<00:00,  4.15it/s]
Upserted vectors: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 18/18 [00:02<00:00,  7.85it/s]
Upserted vectors: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 11/11 [00:02<00:00,  4.62it/s]
<class 'llama_index.indices.struct_store.sql_query.NLStructStoreQueryEngine'>
Traceback (most recent call last):
  File "/home/bi-ai/ai/txt-to-sql/main.py", line 157, in <module>
    query_engine = SQLAutoVectorQueryEngine(
  File "/home/bi-ai/.local/lib/python3.10/site-packages/llama_index/query_engine/sql_vector_query_engine.py", line 94, in __init__
    raise ValueError(
ValueError: sql_query_tool.query_engine must be an instance of BaseSQLTableQueryEngine or NLSQLTableQueryEngine
so I tried to use the BaseSQLTableQueryEngine -- check out the playground branch here:

https://github.com/biphan380/txt-to-sql/commit/5a82d4b57438fae0f84749273752f0f44836ffc6

but I run into this:

Plain Text
python3 main.py 
dict_keys(['city_stats'])
[('Toronto', 2930000, 'Canada'), ('Tokyo', 13960000, 'Japan'), ('Berlin', 3645000, 'Germany')]
Traceback (most recent call last):
  File "/home/bi-ai/ai/txt-to-sql/main.py", line 99, in <module>
    sql_query_engine = BaseSQLTableQueryEngine([], sql_database, table_name="city_stats")
TypeError: Can't instantiate abstract class BaseSQLTableQueryEngine with abstract method _get_table_context
I understand the OO concepts of abstract classes, but am new to Python so finding a workaround is taking a hot minute 😂
Any help is blessed ❤️
Yea that class isn't meant to be used directly 😅

Use the class that implements the abstract methods
https://github.com/jerryjliu/llama_index/blob/518783f712e97a26837fd1d212ce67f0c30f6c09/llama_index/indices/struct_store/sql_query.py#L319
Should work for you
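For anyone new to Python's abstract classes, the `TypeError` above can be reproduced with the standard-library `abc` module alone. This is a rough sketch: `BaseTableQueryEngine` and `SimpleTableQueryEngine` are hypothetical names that only loosely mirror the shape of the real classes, and the method bodies are invented for illustration.

```python
from abc import ABC, abstractmethod


class BaseTableQueryEngine(ABC):
    """Hypothetical abstract base; not llama_index's real class."""

    def __init__(self, table_name):
        self.table_name = table_name

    @abstractmethod
    def _get_table_context(self, query):
        """Subclasses must say how to describe the table for a query."""

    def describe(self, query):
        return f"{query!r} -> {self._get_table_context(query)}"


class SimpleTableQueryEngine(BaseTableQueryEngine):
    """Concrete subclass: it implements the abstract method, so it can be built."""

    def _get_table_context(self, query):
        return f"schema of {self.table_name}"


# Instantiating the abstract base fails, just like in the traceback.
try:
    BaseTableQueryEngine("city_stats")
    base_error = None
except TypeError as exc:
    base_error = str(exc)

# The concrete subclass works as expected.
engine = SimpleTableQueryEngine("city_stats")
print(base_error)
print(engine.describe("largest city"))
```

The same reasoning applies in the thread: instead of the base class, instantiate a subclass that already implements `_get_table_context`.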
ahhh the second option the error gives haha
ok gimme a hot min
Ser I made a bunch of commits to the playground branch and finally got results!

As you can see I sometimes get really good results, but I sometimes get "none is not an allowed value"

Can you give me some guidance on how I can begin to make sense of why it's behaving this way?
I've lost some context on this maybe -- why is vector store query spec being used? Is this a sql-join query engine?
I'll be honest, I was just doing copy pasta from the txt-to-sql tutorial video, and the copy pasta was not working lol
so I tried to change some things
but yes, I believe it's supposed to be a sql-join query engine?
Maybe just follow the example notebook for this instead 🙂

Just ran it and it seems to work fine ok-ish, would probably use gpt-4 for this though, gpt-3.5 sucks at this it seems haha
https://gpt-index.readthedocs.io/en/stable/examples/query_engine/SQLJoinQueryEngine.html#define-sqljoinqueryengine

Just noticed the notebook sets up a gpt-4 service context and doesn't use it :PSadge:
Plain Text
from llama_index import set_global_service_context
# Register the notebook's existing gpt-4 service_context as the global default
set_global_service_context(service_context)
ah yea, using gpt-4 is a million times better lol
oh RIP, the video didn't have a link to the code so I just manually copy pasta 😂
glad I did it though, learned a lot
I'll check out this guide also!
thank you for carrying this chubby panda ser ❤️