I think it's possible, after looking at the source code
Since you have the vector_store set up, you just need to pass it into a storage context and pass that in
from llama_index import StorageContext

vector_store = ...
storage_context = StorageContext.from_defaults(vector_store=vector_store)
obj_index = ObjectIndex.from_objects(
    self.table_schema_objs,
    self.table_node_mapping,
    VectorStoreIndex,
    storage_context=storage_context,
)
Hi @Logan M, I tried that, but every time I run db_query_engine.query(question), it's generating the embeddings again instead of just searching from the table name I provided in PGVectorStore(). Here is my full code:
self.llm = ChatOpenAI(temperature=temperature, model=model_name)
self.engine = create_engine(db_uri)
self.sql_database = SQLDatabase(self.engine)
self.metadata_obj = MetaData()
self.metadata_obj.reflect(bind=self.engine)
all_table_names = list(self.metadata_obj.tables.keys())
self.table_node_mapping = SQLTableNodeMapping(self.sql_database)
self.table_schema_objs = []
for table_name in all_table_names:
    self.table_schema_objs.append(SQLTableSchema(table_name=table_name))
logger.info("sql_database created: %s", self.sql_database)
self.collection_table_name = f"user_{user_id}_dbconn_{connection_id}"
logger.info("collection_table_name created: %s", self.collection_table_name)
self.vector_store = PGVectorStore.from_params(
    database=DATABASE_NAME,
    host=DATABASE_HOST,
    password=DATABASE_PASSWORD,
    port=DATABASE_PORT,
    user=DATABASE_USER,
    table_name=self.collection_table_name,
)
self.storage_context = StorageContext.from_defaults(vector_store=self.vector_store)
# self.vector_store_index = VectorStoreIndex.from_vector_store(vector_store=self.vector_store)
self.obj_index = ObjectIndex.from_objects(
    self.table_schema_objs,
    self.table_node_mapping,
    VectorStoreIndex,
    storage_context=self.storage_context,
)
self.db_query_engine = SQLTableRetrieverQueryEngine(
    self.sql_database,
    self.obj_index.as_retriever(similarity_top_k=3),
)
My objective here is just to retrieve the top 3 results from the already-created vector collection table in my PGVectorStore for the specific query. One thing I'm confused about: do I have to create and pass the variables self.table_schema_objs and self.table_node_mapping to ObjectIndex.from_objects()? It's taking a lot of time for a database with 300+ tables. Can you please provide any guidance here?
I think they are currently needed to create/load the index, yes. But normally in an app you would only do this once at startup and keep it all in memory
ohhh well, I see one issue actually
Once you've created the object index the first time, you'll need to avoid using from_objects() to recreate it
Instead, you can instantiate it directly
index = VectorStoreIndex.from_vector_store(vector_store)
object_index = ObjectIndex(index, table_node_mapping)
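Putting the two paths together, a minimal sketch using the names from the snippets above (the first_run flag is hypothetical; use whatever signal your app has for "index already built"):
from llama_index import StorageContext, VectorStoreIndex
from llama_index.objects import ObjectIndex

if first_run:
    # First run: embed the table schemas and persist them to the vector store
    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    obj_index = ObjectIndex.from_objects(
        table_schema_objs,
        table_node_mapping,
        VectorStoreIndex,
        storage_context=storage_context,
    )
else:
    # Later runs: rebuild from the existing vector store, no re-embedding
    index = VectorStoreIndex.from_vector_store(vector_store)
    obj_index = ObjectIndex(index, table_node_mapping)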
Hi @Logan M, I did make the changes exactly as you mentioned above to query from the existing pgvector collection table. But now I'm getting an error. Here is my code:
DATABASE_NAME = os.environ.get('DATABASE_NAME')
DATABASE_USER = os.environ.get('DATABASE_USER')
DATABASE_HOST = os.environ.get('DATABASE_HOST')
DATABASE_PORT = os.environ.get('DATABASE_PORT')
DATABASE_PASSWORD = os.environ.get('DATABASE_PASSWORD')
DATABASE_URL = 'mysql+pymysql://magento-ai-user2:<password>@68.183.84.131/magento_ai_db1'
engine = create_engine(DATABASE_URL)
sql_database = SQLDatabase(engine)
user_id = 5
connection_id = 59
collection_table_name = f"user_{user_id}_dbconn_{connection_id}"
table_node_mapping1 = SQLTableNodeMapping(sql_database)
vector_store = PGVectorStore.from_params(
    database=DATABASE_NAME,
    host=DATABASE_HOST,
    password=DATABASE_PASSWORD,
    port=DATABASE_PORT,
    user=DATABASE_USER,
    table_name=collection_table_name,
)
vector_store_index = VectorStoreIndex.from_vector_store(vector_store=vector_store)
object_index = ObjectIndex(vector_store_index, table_node_mapping1)
# query_engine = index.as_query_engine()
query_engine = SQLTableRetrieverQueryEngine(
    sql_database,
    object_index.as_retriever(similarity_top_k=3),
)
response = query_engine.query("Give me all the tables in the database")
And this is the error I get, with node.node_info["name"]:
KeyError Traceback (most recent call last)
Cell In[322], line 30
23 # query_engine = index.as_query_engine()
25 query_engine = SQLTableRetrieverQueryEngine(
26 sql_database,
27 object_index.as_retriever(similarity_top_k=3),
28 )
---> 30 response = query_engine.query("Give me all the tables in the database")
File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/llama_index/indices/query/base.py:23, in BaseQueryEngine.query(self, str_or_query_bundle)
21 if isinstance(str_or_query_bundle, str):
22 str_or_query_bundle = QueryBundle(str_or_query_bundle)
---> 23 response = self._query(str_or_query_bundle)
24 return response
File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/llama_index/token_counter/token_counter.py:78, in llm_token_counter.<locals>.wrap.<locals>.wrapped_llm_predict(_self, *args, **kwargs)
     76 def wrapped_llm_predict(_self: Any, *args: Any, **kwargs: Any) -> Any:
     77     with wrapper_logic(_self):
---> 78         f_return_val = f(_self, *args, **kwargs)
     80     return f_return_val
File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/llama_index/indices/struct_store/sql_query.py:260, in BaseSQLTableQueryEngine._query(self, query_bundle)
257 @llm_token_counter("query")
258 def _query(self, query_bundle: QueryBundle) -> Response:
...
61 return SQLTableSchema(
---> 62 table_name=node.node_info["name"], context_str=node.node_info["context"]
63 )
KeyError: 'name'
@Logan M Can you please help me here?
So, the old version of llama-index did not store the node-info stuff in the vector db, which is why it's failing with a key error on node info
Looking at the code though, the latest version of llama_index should remedy this, if you upgrade and create the db again 🙂
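If you need to force the rebuild, one option is to drop the old collection table so it gets re-created on the next index build. A sketch, not from the library; PGVectorStore derives the actual table name from the table_name you pass in, so verify the name in your database first:
from sqlalchemy import create_engine, text

# Drop the old pgvector collection table so it is re-created (with node info)
# on the next index build. The "data_" prefix is an assumption; check your db.
pg_engine = create_engine(
    f"postgresql+psycopg2://{DATABASE_USER}:{DATABASE_PASSWORD}"
    f"@{DATABASE_HOST}:{DATABASE_PORT}/{DATABASE_NAME}"
)
with pg_engine.begin() as conn:
    conn.execute(text(f'DROP TABLE IF EXISTS "data_{collection_table_name}"'))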
Apologies for the trouble here, but we are in uncharted territory haha
Okay, let me upgrade llama_index and try again
@Logan M I tried with the latest version, v0.6.35. Now I'm facing this error initializing the PGVectorStore with the table:
InvalidRequestError Traceback (most recent call last)
Cell In[7], line 25
22 for table_name in all_table_names:
23 table_schema_objs.append(SQLTableSchema(table_name=table_name))
---> 25 vector_store = PGVectorStore.from_params(
26 database=DATABASE_NAME,
27 host=DATABASE_HOST,
28 password=DATABASE_PASSWORD,
29 port=DATABASE_PORT,
30 user=DATABASE_USER,
31 table_name=collection_table_name)
33 storage_context = StorageContext.from_defaults(vector_store=vector_store)
35 obj_index = ObjectIndex.from_objects(
36 table_schema_objs,
37 table_node_mapping,
38 storage_context=storage_context
39 )
File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/llama_index/vector_stores/postgres.py:80, in PGVectorStore.from_params(cls, host, port, database, user, password, table_name)
78 """Return connection string from database parameters."""
79 conn_str = f"postgresql+psycopg2://{user}:{password}@{host}:{port}/{database}"
---> 80 return cls(connection_string=conn_str, table_name=table_name)
File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/llama_index/vector_stores/postgres.py:60, in PGVectorStore.__init__(self, connection_string, table_name)
...
1512 _undefer_column_name(
1513 k, self.column_copies.get(value, value) # type: ignore
1514 )
InvalidRequestError: Attribute name 'metadata' is reserved when using the Declarative API.
ha, right, just merged/fixed that a few hours ago 🙂 Didn't make it to a PyPI release yet
do pip install --upgrade git+https://github.com/jerryjliu/llama_index
Logan, even after pip3-upgrading from the git repo above, I don't see any version increase. It's still "0.6.35". Is it supposed to be like that? And I'm still facing the error: InvalidRequestError: Attribute name 'metadata' is reserved when using the Declarative API
If you can't get it, we can monkey-patch the file manually lol
Monkey patch guide:
pip show llama-index will show where llama-index is installed:
(llama-index) loganm@gamingpc:~/llama_index_proper/llama_index/examples/docstore$ pip show llama-index
Name: llama-index
Version: 0.6.35
Summary: Interface between LLMs and your data
Home-page: https://github.com/jerryjliu/llama_index
Author: Jerry Liu
Author-email:
License: MIT
Location: /home/loganm/miniconda3/envs/llama-index/lib/python3.11/site-packages
Requires: beautifulsoup4, dataclasses-json, fsspec, langchain, numpy, openai, pandas, sqlalchemy, tenacity, tiktoken, typing-extensions, typing-inspect, urllib3
Required-by: llama-agi, llama-hub
From the location there, we can edit the source file directly. For me, the file path is
/home/loganm/miniconda3/envs/llama-index/lib/python3.11/site-packages/llama_index/vector_stores/postgres.py
In that file, you'll want to change:
- line 25 to say: metadata_ = Column(JSON)
- line 115 to say: metadata_=node_to_metadata_dict(
- line 146 to say: node = metadata_dict_to_node(item.metadata_)
- line 153 to say: metadata=item.metadata_,
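For context on why the rename is needed: metadata is a reserved attribute on SQLAlchemy Declarative models (it holds the class-level MetaData), so a mapped column can't use that name. A minimal reproduction, assuming SQLAlchemy 1.4+:
from sqlalchemy import Column, Integer, JSON
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Item(Base):
    __tablename__ = "items"
    id = Column(Integer, primary_key=True)
    # metadata = Column(JSON)  # InvalidRequestError: 'metadata' is reserved
    metadata_ = Column(JSON)   # trailing underscore sidesteps the reserved name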
I just realized the line 115 was not fixed in my PR, oof
Okay, should I wait until you fix the issue with line 115, or can I go ahead and follow the steps you mentioned above?
I just merged a fix, maybe try the pip install upgrade from git again 🙂
if that doesn't work, then yeah, at least inspect the file to see if it has those changes... maybe there's something else missing if it still doesn't work 🤔
Okay, let me reinstall and see
Hi @Logan M, I tried it. The metadata issue is resolved, but now there's a different issue with ObjectIndex creation. Here is the error I'm running into:
TypeError Traceback (most recent call last)
Cell In[13], line 35
25 vector_store = PGVectorStore.from_params(
26 database=DATABASE_NAME,
27 host=DATABASE_HOST,
(...)
30 user=DATABASE_USER,
31 table_name=collection_table_name)
33 storage_context = StorageContext.from_defaults(vector_store=vector_store)
---> 35 obj_index = ObjectIndex.from_objects(
36 table_schema_objs,
37 table_node_mapping,
38 storage_context=storage_context
39 )
41 query_engine = SQLTableRetrieverQueryEngine(
42 sql_database,
43 obj_index.as_retriever(similarity_top_k=1),
44 )
46 # # llm = OpenAI(temperature=0)
47 # db_chain = SQLDatabaseChain(llm=llm, database=sql_database, return_intermediate_steps=True, verbose=True, top_k=200)
File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/llama_index/objects/base.py:51, in ObjectIndex.from_objects(cls, objects, object_mapping, index_cls, **index_kwargs)
49 object_mapping = SimpleObjectNodeMapping.from_objects(objects)
50 nodes = object_mapping.to_nodes(objects)
...
2135 "%r is an invalid keyword argument for %s" % (k, cls_.__name__)
2136 )
2137 setattr(self, k, kwargs[k])
TypeError: 'id_' is an invalid keyword argument for Datauser_default_user_dbconn_default_connection
Any idea why?
Here user_default_user_dbconn_default_connection is the collection table name I gave to the PGVectorStore.
Ugh, imma have to spin up postgres locally and figure this out. Too much back and forth lol
Give me a sec
The traceback is cut off, but let's see if I can reproduce
I know, I'm sorry for the trouble. Did you get a chance to reproduce the issue? @Logan M
Took way too long to get postgres set up lol, and also got distracted
But now I've managed to test, figure out the bug, and am about to commit a fix 🙂
aaaand just merged it, phew
Perfect, let me reinstall and try it again.
@Logan M Thanks a ton for your quick fixes. Appreciate it very much 🙂
I have two quick questions:
1. Is there a way to add custom prompts (few-shot examples) to the given example SQL Index guides?
2. The extra_info option is not working to extract the sql_query from the response of the SQL index (response = query_engine.query("Give me all the tables in the database"); response.extra_info). Any idea why?
One way to get the sql_query from the response is this, as it's available in metadata: response.metadata['sql_query']
Yes, it is response.metadata now actually. Is there some stale documentation somewhere?
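For example (the query string here is just an illustration):
response = query_engine.query("How many orders were placed last month?")
print(response.metadata["sql_query"])  # the SQL that was generated and run
print(response.response)               # the natural-language answer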
And you want to add few-shot examples to the prompt when generating text to SQL?
Also, @Logan M I think there is an issue with PGVectorStore. as_retriever(similarity_top_k=...) is no longer working with object_index. Here is the error:
File "/app/langchain_ai.py", line 70, in __init__
    self.object_index.as_retriever(similarity_top_k=3),
File "/usr/local/lib/python3.10/site-packages/llama_index/objects/base.py", line 61, in as_retriever
    retriever=self._index.as_retriever(**kwargs),
AttributeError: 'PGVectorStore' object has no attribute 'as_retriever'
I think there were some changes to PGVectorStore an hour ago that might be causing this. Here is how I'm using it:
self.vector_store = PGVectorStore.from_params(
    database=DATABASE_NAME,
    host=DATABASE_HOST,
    password=DATABASE_PASSWORD,
    port=DATABASE_PORT,
    user=DATABASE_USER,
    table_name=self.collection_table_name,
)
self.table_node_mapping = SQLTableNodeMapping(self.sql_database)
self.object_index = ObjectIndex(self.vector_store, self.table_node_mapping)
self.db_query_engine = SQLTableRetrieverQueryEngine(
    self.sql_database,
    self.object_index.as_retriever(similarity_top_k=3),
)
You have an error in your usage
You need to create a vector index with the vector store, and pass the index into the object index
Right now you are passing in the bare vector store, hence the error
I think there's an example in this thread actually lol
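Concretely, mirroring the earlier example:
# Wrap the bare vector store in an index first, then hand the index to ObjectIndex
vector_store_index = VectorStoreIndex.from_vector_store(vector_store=self.vector_store)
self.object_index = ObjectIndex(vector_store_index, self.table_node_mapping)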
Hi @Logan M, yes, you are right. I was able to solve the problem. And do you have any insight on customizing the prompt with few-shot examples when generating text to SQL?
What is the sql_index here? Can you point me to where I should add text_to_sql_prompt=my_prompt in my code below?
engine = create_engine(db_uri)
sql_database = SQLDatabase(engine)
collection_table_name = f"user_{user_id}_dbconn_{connection_id}"
table_node_mapping1 = SQLTableNodeMapping(sql_database)
vector_store = PGVectorStore.from_params(
    database=DATABASE_NAME,
    host=DATABASE_HOST,
    password=DATABASE_PASSWORD,
    port=DATABASE_PORT,
    user=DATABASE_USER,
    table_name=collection_table_name,
)
vector_store_index = VectorStoreIndex.from_vector_store(vector_store=vector_store)
object_index = ObjectIndex(vector_store_index, table_node_mapping1)
query_engine = SQLTableRetrieverQueryEngine(
    sql_database,
    object_index.as_retriever(similarity_top_k=3),
)
response = query_engine.query("Give me all the tables in the database")
query_engine = SQLTableRetrieverQueryEngine(
    sql_database,
    object_index.as_retriever(similarity_top_k=3),
    text_to_sql_prompt=my_prompt,
)
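One way to build my_prompt with few-shot examples (a sketch; the import path and required template variables vary across llama_index versions, so double-check against your install — the text-to-SQL template expects {dialect}, {schema}, and {query_str}):
from llama_index import Prompt  # import path may differ by version

# Few-shot text-to-SQL template; the example Q/A pair is illustrative only
FEW_SHOT_TEXT_TO_SQL_TMPL = (
    "Given an input question, write a syntactically correct {dialect} query.\n"
    "Only use tables and columns from this schema:\n"
    "{schema}\n\n"
    "Example:\n"
    "Question: How many customers signed up last week?\n"
    "SQLQuery: SELECT COUNT(*) FROM customers "
    "WHERE created_at >= NOW() - INTERVAL '7 days';\n\n"
    "Question: {query_str}\n"
    "SQLQuery: "
)
my_prompt = Prompt(FEW_SHOT_TEXT_TO_SQL_TMPL)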
Also, I'm very sorry for asking all these primitive questions. I'm having a hard time figuring out a way to create table_node_mapping = SQLTableNodeMapping(sql_database) quickly for large databases. The use case I'm working on should allow creating multiple database connections and switching between them to ask questions in plain English. Currently, if the database has 300+ tables, it's taking a very long time to get an answer, because the table node mapping creation itself takes a lot of time. Do you have any idea how to approach this problem?
@Logan M Does adding multithreading/parallel processing help here? Or do you have any workaround in mind?