Will take a look, thanks!
@jerryjliu0 if it's just text-to-SQL and not ingestion, you can probably just use the langchain chain, right? didn't you say it was just doing that under the hood?
definitely possible to use the langchain one. honestly, either one works if you're just looking for text-to-SQL (we also reuse the SQLAlchemy wrapper from langchain) - the index leaves open the possibility that you'd want to add more unstructured docs later
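for reference, the pure text-to-SQL path with just the langchain chain looks roughly like this (a minimal sketch - exact imports/kwargs may vary a bit depending on your langchain version, and the connection string/question are just placeholders):

from langchain import OpenAI, SQLDatabase, SQLDatabaseChain

# placeholder connection string - point this at your own db
db = SQLDatabase.from_uri("sqlite:///example.db")
llm = OpenAI(temperature=0)

# the chain generates SQL from the question, runs it, and summarizes the result
db_chain = SQLDatabaseChain(llm=llm, database=db, verbose=True)
db_chain.run("How many cities have a population over 1 million?")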
thanks @hwchase17 I'll try that out too. Was hoping to use gptindex to help reduce prompt size since some of my db schemas are quite large
ah got it, i didn't know gpt index already supported that! we've got some ideas for using chains to reduce that pain, but probably won't be in for a few days
tbf i'm not sure if gpt index supports that, but i was planning on testing it out to see if the indexes reduce prompt size
@hwchase17 are you a contributor to langchain?
@scuba.steve.0 oh interesting, you're looking to create a more condensed representation of the table schema?
gpt index doesn't explicitly support that atm, but i can help you brainstorm a few solutions
Currently we just use the table schema directly, but I can add a wrapper function for you to use a compressed representation! And you can then use gpt index or langchain to synthesize the compressed representation
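as a rough sketch of that synthesis step (the schema file path and summarization prompt are just placeholders, and i haven't tested this exact snippet):

from gpt_index import Document, GPTListIndex

# placeholder: the verbose schema text you want to condense
with open("schema.sql") as f:
    schema_text = f.read()

# build a simple list index over the schema and ask for a condensed version
index = GPTListIndex([Document(schema_text)])
condensed_schema = index.query(
    "Summarize this schema: keep only table names, column names, and column types."
)
print(condensed_schema)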
@jerryjliu0 ooo amazing! How is the table schema currently represented in the prompt when using GPTSQLStructStoreIndex?
Actually, this is interesting. GPT index data structures themselves can be used over your table schema if your schema is very big. I can think about adding a "composability" feature to the SQL index where you can 1) define a gpt index data structure (e.g. the list index) over your table schema, 2) use that data structure in the SQL index to extract fields from unstructured data
yea I like that a lot. Another question (sorry to bombard you here, just very excited about what you've built): are there plans to support passing multiple tables into a GPTSQLStructStoreIndex? For example, I have multiple tables in my db schema and want to run text-to-SQL that can potentially write joins across several tables.
maybe overkill for gpt index - looking at the langchain docs, it seems like I can do something like that in langchain. Would be great to combine that feature with gpt index's indexing feature
@scuba.steve.0 yes! Definitely plans here as well. I want to expand GPT Index's support for structured data, and doing joins + being able to infer data for multiple tables are all potential options.
Would love your feedback in the process if you use this feature more
also, to help me better understand: what is being passed in as schema in the prompt when using GPTSQLStructStoreIndex? Referencing this line:
https://github.com/jerryjliu/gpt_index/blob/e685bbf6d04aed0b1f17c07bf0bc9a6a19b9973e/gpt_index/prompts/default_prompts.py#L176
Is it the actual metadata table representation, e.g. something like this?
Table(
    table_name,
    metadata_obj,
    Column("city_name", String(16), primary_key=True),
    Column("population", Integer),
    Column("country", String(16), nullable=False),
)
yep! it's a formatted string
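since we reuse langchain's SQLDatabase wrapper under the hood, you can get a feel for it by inspecting the wrapper directly - something like this (a sketch; db/table names are placeholders, and the exact string gpt index formats into the prompt may differ slightly by version):

from sqlalchemy import create_engine
from gpt_index import GPTSQLStructStoreIndex, SQLDatabase

engine = create_engine("sqlite:///example.db")  # placeholder db
sql_database = SQLDatabase(engine, include_tables=["city_stats"])

# peek at the schema text the wrapper produces for your tables
print(sql_database.table_info)

# pure text-to-SQL over an existing table: no unstructured docs needed
index = GPTSQLStructStoreIndex([], sql_database=sql_database, table_name="city_stats")
response = index.query("Which city has the highest population?")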
let me know if you have feedback in this area
@jerryjliu0 if gpt index inherits from the sqldb wrapper under the hood (which it sounds like it does? although maybe I'm misunderstanding?) is it possible to start using these chains in gpt index for free?
@hwchase17 re: the first point, your SQLAlchemy wrapper already has an include_tables param, right? that would probably be the initial place to start - manually specifying the set of tables. but being able to automatically determine the set of tables is also a cool additional step
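i.e. something like this (connection string and table names are placeholders):

from langchain import SQLDatabase

# manually restrict the wrapper to the tables you care about,
# so only their schemas get formatted into the prompt
db = SQLDatabase.from_uri(
    "postgresql://user:pass@localhost/mydb",  # placeholder connection string
    include_tables=["users", "orders"],       # placeholder table names
)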