How to use the latest ObjectIndex for working with a vector store (PGVector) instead of the default VectorStoreIndex

How can I use the latest ObjectIndex for working with a vector store (PGVector) instead of the default VectorStoreIndex, so I can retrieve data from an index already created in my PGVector table?

Plain Text
self.vector_store = PGVectorStore.from_params(
    database=DATABASE_NAME,
    host=DATABASE_HOST,
    password=DATABASE_PASSWORD,
    port=DATABASE_PORT,
    user=DATABASE_USER,
    table_name=self.collection_table_name)

self.obj_index = ObjectIndex.from_objects(
    self.table_schema_objs,
    self.table_node_mapping,
    VectorStoreIndex)

self.db_query_engine = SQLTableRetrieverQueryEngine(
    self.sql_database,
    self.obj_index.as_retriever(similarity_top_k=3))

So far I have my code like this, but I'm not able to use my vector_store inside ObjectIndex.from_objects(). How should I approach this? My use case: I'm creating embeddings for an entire database (100+ tables) and storing them in a pgvector table. Now I just want to use that table to retrieve the top 3 results. Any help here would be greatly appreciated. I'm following this example: https://gpt-index.readthedocs.io/en/latest/examples/index_structs/struct_indices/SQLIndexDemo-ManyTables.html
I thiiiiiink it's possible, after looking at the source code

Since you have the vector_store set up, you just need to pass it into a storage context and pass that in

Plain Text
from llama_index import StorageContext

vector_store = ...
storage_context = StorageContext.from_defaults(vector_store=vector_store)

obj_index = ObjectIndex.from_objects(
    self.table_schema_objs,
    self.table_node_mapping,
    VectorStoreIndex,
    storage_context=storage_context,
)
Let me try that and see
Hi @Logan M, I tried that, but every time I run db_query_engine.query(question), it generates the embeddings again instead of just searching the table I named in PGVectorStore(). Here is my full code:

Plain Text
self.llm = ChatOpenAI(temperature=temperature, model=model_name)
self.engine = create_engine(db_uri)
self.sql_database = SQLDatabase(self.engine)

self.metadata_obj = MetaData()
self.metadata_obj.reflect(bind=self.engine)
all_table_names = list(self.metadata_obj.tables.keys())

self.table_node_mapping = SQLTableNodeMapping(self.sql_database)
self.table_schema_objs = []
for table_name in all_table_names:
    self.table_schema_objs.append(SQLTableSchema(table_name=table_name))
logger.info("sql_database created: %s", self.sql_database)

self.collection_table_name = f"user_{user_id}_dbconn_{connection_id}"
logger.info("collection_table_name created: %s", self.collection_table_name)

self.vector_store = PGVectorStore.from_params(
    database=DATABASE_NAME,
    host=DATABASE_HOST,
    password=DATABASE_PASSWORD,
    port=DATABASE_PORT,
    user=DATABASE_USER,
    table_name=self.collection_table_name)

self.storage_context = StorageContext.from_defaults(vector_store=self.vector_store)

# self.vector_store_index = VectorStoreIndex.from_vector_store(vector_store=self.vector_store)

self.obj_index = ObjectIndex.from_objects(
    self.table_schema_objs,
    self.table_node_mapping,
    VectorStoreIndex,
    storage_context=self.storage_context)

self.db_query_engine = SQLTableRetrieverQueryEngine(
    self.sql_database,
    self.obj_index.as_retriever(similarity_top_k=3),
)
My objective here is just to retrieve the top 3 results from the already-created vector collection table in my PGVectorStore for a specific query. One thing I'm confused about: do I have to create and pass the variables self.table_schema_objs and self.table_node_mapping to ObjectIndex.from_objects()? It's taking a lot of time for a database with 300+ tables. Can you please provide any guidance here?
I think they are currently needed to create/load the index, yes. But normally in an app you would only do this once at startup and keep it all in memory
ohhh well i see one issue actually
Once you've initially created the object index, you'll need to avoid using from_objects() to re-create it

Instead, you can instantiate it directly

Plain Text
index = VectorStoreIndex.from_vector_store(vector_store)
object_index = ObjectIndex(index, table_node_mapping)
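Putting the two phases together, here's a minimal sketch of that pattern, assuming the 0.6.x-era API discussed in this thread; the helper names build_object_index / load_object_index are illustrative, not part of llama_index:

Plain Text
from llama_index import StorageContext, VectorStoreIndex
from llama_index.objects import ObjectIndex, SQLTableNodeMapping, SQLTableSchema

def build_object_index(sql_database, table_names, vector_store):
    # First run only: embeds every table schema and persists it to pgvector.
    table_node_mapping = SQLTableNodeMapping(sql_database)
    table_schema_objs = [SQLTableSchema(table_name=t) for t in table_names]
    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    return ObjectIndex.from_objects(
        table_schema_objs,
        table_node_mapping,
        VectorStoreIndex,
        storage_context=storage_context,
    )

def load_object_index(sql_database, vector_store):
    # Later runs: reconnect to the existing pgvector table, no re-embedding.
    table_node_mapping = SQLTableNodeMapping(sql_database)
    index = VectorStoreIndex.from_vector_store(vector_store)
    return ObjectIndex(index, table_node_mapping)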
Hi @Logan M, I made the changes exactly as you mentioned above to query from the existing pgvector collection table, but now I'm getting an error. Here is my code:

Plain Text
DATABASE_NAME = os.environ.get('DATABASE_NAME')
DATABASE_USER = os.environ.get('DATABASE_USER')
DATABASE_HOST = os.environ.get('DATABASE_HOST')
DATABASE_PORT = os.environ.get('DATABASE_PORT')
DATABASE_PASSWORD = os.environ.get('DATABASE_PASSWORD')
DATABASE_URL2 = 'mysql+pymysql://magento-ai-user2:23RolsAj3c8R2@68.183.84.131/magento_ai_db1'

engine = create_engine(DATABASE_URL2)
sql_database = SQLDatabase(engine)

user_id = 5
connection_id = 59
collection_table_name = f"user_{user_id}_dbconn_{connection_id}"

table_node_mapping1 = SQLTableNodeMapping(sql_database)

vector_store = PGVectorStore.from_params(
    database=DATABASE_NAME,
    host=DATABASE_HOST,
    password=DATABASE_PASSWORD,
    port=DATABASE_PORT,
    user=DATABASE_USER,
    table_name=collection_table_name)

vector_store_index = VectorStoreIndex.from_vector_store(vector_store=vector_store)
object_index = ObjectIndex(vector_store_index, table_node_mapping1)

# query_engine = index.as_query_engine()
query_engine = SQLTableRetrieverQueryEngine(
    sql_database,
    object_index.as_retriever(similarity_top_k=3),
)

response = query_engine.query("Give me all the tables in the database")
And this is the error I get, failing on node.node_info["name"]:

Plain Text
KeyError                                  Traceback (most recent call last)
Cell In[322], line 30
     23 # query_engine = index.as_query_engine()
     25 query_engine = SQLTableRetrieverQueryEngine(
     26     sql_database,
     27     object_index.as_retriever(similarity_top_k=3),
     28 )
---> 30 response = query_engine.query("Give me all the tables in the database")

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/llama_index/indices/query/base.py:23, in BaseQueryEngine.query(self, str_or_query_bundle)
     21 if isinstance(str_or_query_bundle, str):
     22     str_or_query_bundle = QueryBundle(str_or_query_bundle)
---> 23 response = self._query(str_or_query_bundle)
     24 return response

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/llama_index/token_counter/token_counter.py:78, in llm_token_counter.<locals>.wrap.<locals>.wrapped_llm_predict(_self, *args, **kwargs)
     76 def wrapped_llm_predict(_self: Any, *args: Any, **kwargs: Any) -> Any:
     77     with wrapper_logic(_self):
---> 78         f_return_val = f(_self, *args, **kwargs)
     80     return f_return_val

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/llama_index/indices/struct_store/sql_query.py:260, in BaseSQLTableQueryEngine._query(self, query_bundle)
    257 @llm_token_counter("query")
    258 def _query(self, query_bundle: QueryBundle) -> Response:
    ...
     61 return SQLTableSchema(
---> 62     table_name=node.node_info["name"], context_str=node.node_info["context"]
     63 )

KeyError: 'name'
@Logan M Can you please help me here?
So, the old version of llama-index did not store the node-info stuff in the vector db, which is why it's failing with a key error on node info

Looking at the code, the latest version of llama_index should remedy this, if you upgrade and create the db again πŸ™

Apologies for the trouble here, but we are in uncharted territory haha
Okay, let me upgrade llama_index and try again
@Logan M I tried with the latest version, v0.6.35. Now I'm facing this error when initiating the PGVectorStore with the table:

Plain Text
InvalidRequestError                       Traceback (most recent call last)
Cell In[7], line 25
     22 for table_name in all_table_names:
     23     table_schema_objs.append(SQLTableSchema(table_name=table_name))
---> 25 vector_store = PGVectorStore.from_params(
     26     database=DATABASE_NAME,
     27     host=DATABASE_HOST,
     28     password=DATABASE_PASSWORD,
     29     port=DATABASE_PORT,
     30     user=DATABASE_USER,
     31     table_name=collection_table_name)
     33 storage_context = StorageContext.from_defaults(vector_store=vector_store)
     35 obj_index = ObjectIndex.from_objects(
     36     table_schema_objs,
     37     table_node_mapping,
     38     storage_context=storage_context
     39 )

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/llama_index/vector_stores/postgres.py:80, in PGVectorStore.from_params(cls, host, port, database, user, password, table_name)
     78 """Return connection string from database parameters."""
     79 conn_str = f"postgresql+psycopg2://{user}:{password}@{host}:{port}/{database}"
---> 80 return cls(connection_string=conn_str, table_name=table_name)

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/llama_index/vector_stores/postgres.py:60, in PGVectorStore.__init__(self, connection_string, table_name)
...
   1512 _undefer_column_name(
   1513     k, self.column_copies.get(value, value)  # type: ignore
   1514 )

InvalidRequestError: Attribute name 'metadata' is reserved when using the Declarative API.
ha, right, just merged/fixed that a few hours ago πŸ˜‰ Didn't make it to a pypi release yet

do pip install --upgrade git+https://github.com/jerryjliu/llama_index
Logan, even after pip3-upgrading from the git repo above, I don't see any version increase. It's still "0.6.35". Is it supposed to be like that? And I'm still facing the error: InvalidRequestError: Attribute name 'metadata' is reserved when using the Declarative API
The version will technically be the same, but it should have installed the latest version from source πŸ€”

It's no longer using the metadata name -- now it is metadata_ πŸ€” https://github.com/jerryjliu/llama_index/blob/a28402bbd928a0fed4b912490950d9f4af65fde3/llama_index/vector_stores/postgres.py#L25C8-L25C33

Not sure what happened during the install, maybe uninstall and re-install from git?
If you can't get it working, we can monkey patch the file manually lol
Monkey patch guide:

pip show llama-index will show where llama-index is installed

Plain Text
(llama-index) loganm@gamingpc:~/llama_index_proper/llama_index/examples/docstore$ pip show llama-index
Name: llama-index
Version: 0.6.35
Summary: Interface between LLMs and your data
Home-page: https://github.com/jerryjliu/llama_index
Author: Jerry Liu
Author-email: 
License: MIT
Location: /home/loganm/miniconda3/envs/llama-index/lib/python3.11/site-packages
Requires: beautifulsoup4, dataclasses-json, fsspec, langchain, numpy, openai, pandas, sqlalchemy, tenacity, tiktoken, typing-extensions, typing-inspect, urllib3
Required-by: llama-agi, llama-hub


From the location there, we can edit the source file directly. For me, the file path is /home/loganm/miniconda3/envs/llama-index/lib/python3.11/site-packages/llama_index/vector_stores/postgres.py

In that file, you'll want to change:

line 25 to say metadata_ = Column(JSON)

line 115 to say metadata_=node_to_metadata_dict(

line 146 to say node = metadata_dict_to_node(item.metadata_)

line 153 to say metadata=item.metadata_,
I just realized the line 115 was not fixed in my PR, oof
doing that now
Okay, should I wait until you fix the issue with line 115, or can I go ahead and follow the steps you mentioned above?
I just merged a fix, maybe try the pip install upgrade from git again πŸ™‚
if that doesn't work, then yea at least inspect the file to see if it has those changes... maybe there's something else missing if it still doesn't work πŸ€”
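If you want to check programmatically instead of opening the file, here's a quick sketch using only the standard library -- it prints where the installed module actually lives, and whether the renamed metadata_ column is in the source on disk:

Plain Text
import inspect
import llama_index.vector_stores.postgres as pg

# Where pip actually installed the module
print(inspect.getsourcefile(pg))

# True if the source on disk already uses the renamed column
print("metadata_" in inspect.getsource(pg))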
Okay, let me reinstall and see
Hi @Logan M, I tried it. The metadata issue is resolved, but now there's a different issue with the ObjectIndex creation. Here is the error I'm running into:

Plain Text
TypeError                                 Traceback (most recent call last)
Cell In[13], line 35
     25 vector_store = PGVectorStore.from_params(
     26     database=DATABASE_NAME,
     27     host=DATABASE_HOST,
    (...)
     30     user=DATABASE_USER,
     31     table_name=collection_table_name)
     33 storage_context = StorageContext.from_defaults(vector_store=vector_store)
---> 35 obj_index = ObjectIndex.from_objects(
     36     table_schema_objs,
     37     table_node_mapping,
     38     storage_context=storage_context
     39 )
     41 query_engine = SQLTableRetrieverQueryEngine(
     42     sql_database,
     43     obj_index.as_retriever(similarity_top_k=1),
     44 )
     46 # # llm = OpenAI(temperature=0)
     47 # db_chain = SQLDatabaseChain(llm=llm, database=sql_database, return_intermediate_steps=True, verbose=True, top_k=200)

File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/llama_index/objects/base.py:51, in ObjectIndex.from_objects(cls, objects, object_mapping, index_cls, **index_kwargs)
     49 object_mapping = SimpleObjectNodeMapping.from_objects(objects)
     50 nodes = object_mapping.to_nodes(objects)
...
   2135     "%r is an invalid keyword argument for %s" % (k, cls_.__name__)
   2136 )
   2137 setattr(self, k, kwargs[k])

TypeError: 'id_' is an invalid keyword argument for Datauser_default_user_dbconn_default_connection

Any idea why?
Here, user_default_user_dbconn_default_connection is the collection table name I gave to the PGVectorStore.
Ugh, imma have to spin up postgres locally and figure this out. Too much back and forth lol

Give me a sec
The traceback is cut off, but let's see if I can reproduce
I know, I'm sorry for the trouble. Did you get a chance to reproduce the issue, @Logan M?
Took way too long to get postgres setup lol and also got distracted

But now I've managed to test, figure out the bug, and am about to commit a fix πŸ™‚
aaaand just merged it, phew
Perfect, let me reinstall and try it again.
@Logan M Thanks a ton for your quick fixes. Appreciate it very much πŸ™‚
I have two quick questions:
1. Is there a way to add custom prompts (a few-shot examples) to the given example SQL Index guides?
2. The extra_info option is not working to extract the sql_query from the response of the SQL index (response = query_engine.query("Give me all the tables in the database"); response.extra_info). Any idea why?
One way to get the sql_query from the response is this, as it's available in metadata: response.metadata['sql_query']
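For example (a minimal sketch, reusing the query_engine from above):

Plain Text
response = query_engine.query("Give me all the tables in the database")
print(response.response)               # the natural-language answer
print(response.metadata["sql_query"])  # the SQL that was generated and run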
Yes it is response.metadata now actually. Is there some stale documentation somewhere?
And you want to add few shot examples to the prompt when generating text to sql?
Also, @Logan M, I think there is an issue with PGVectorStore: as_retriever(similarity_top_k=...) is no longer working with the object_index. Here is the error:

Plain Text
File "/app/langchain_ai.py", line 70, in __init__
    self.object_index.as_retriever(similarity_top_k=3),
File "/usr/local/lib/python3.10/site-packages/llama_index/objects/base.py", line 61, in as_retriever
    retriever=self._index.as_retriever(**kwargs),
AttributeError: 'PGVectorStore' object has no attribute 'as_retriever'
I think there were some changes to PGVectorStore an hour ago that might be causing this. Here is how I'm using it:

Plain Text
self.vector_store = PGVectorStore.from_params(
    database=DATABASE_NAME,
    host=DATABASE_HOST,
    password=DATABASE_PASSWORD,
    port=DATABASE_PORT,
    user=DATABASE_USER,
    table_name=self.collection_table_name)

self.table_node_mapping = SQLTableNodeMapping(self.sql_database)
self.object_index = ObjectIndex(self.vector_store, self.table_node_mapping)

self.db_query_engine = SQLTableRetrieverQueryEngine(
    self.sql_database,
    self.object_index.as_retriever(similarity_top_k=3),
)

How can I fix this?
You have an error in your usage
You need to create a vector index with the vector store, and pass the index into the object index
Right now you are passing in the bare vector store, hence the error
I think there's an example in this thread actually lol
I will find it
That message πŸ‘
Hi @Logan M, yes, you are right. I was able to solve the problem. And do you have any insight on customizing the prompt with a few-shot example when generating text to SQL?
yea, you can pass in the template here: sql_index.as_query_engine(text_to_sql_prompt=my_prompt)

The default prompt is here for guidance
https://github.com/jerryjliu/llama_index/blob/b7697c006499f7e48e3164152092c07ba0327aa9/llama_index/prompts/default_prompts.py#L171-L198
What is the sql_index here? Can you point out where I should add text_to_sql_prompt=my_prompt in my code below?

Plain Text
engine = create_engine(db_uri)
sql_database = SQLDatabase(engine)

collection_table_name = f"user_{user_id}_dbconn_{connection_id}"
table_node_mapping1 = SQLTableNodeMapping(sql_database)

vector_store = PGVectorStore.from_params(
    database=DATABASE_NAME,
    host=DATABASE_HOST,
    password=DATABASE_PASSWORD,
    port=DATABASE_PORT,
    user=DATABASE_USER,
    table_name=collection_table_name)

vector_store_index = VectorStoreIndex.from_vector_store(vector_store=vector_store)
object_index = ObjectIndex(vector_store_index, table_node_mapping1)

query_engine = SQLTableRetrieverQueryEngine(
    sql_database,
    object_index.as_retriever(similarity_top_k=3),
)

response = query_engine.query("Give me all the tables in the database")
Plain Text
query_engine = SQLTableRetrieverQueryEngine(
                sql_database,
                object_index.as_retriever(similarity_top_k=3),
                text_to_sql_prompt=my_prompt
            )
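And for building my_prompt itself, here's a rough sketch of a few-shot template. It assumes the same {dialect}, {schema}, and {query_str} variables as the default prompt linked above, and the example question/SQL pair is a placeholder you'd swap for ones matching your own schema. Depending on your exact release, the class may be Prompt from llama_index or TextToSQLPrompt from llama_index.prompts.prompts:

Plain Text
from llama_index import Prompt

# Hypothetical few-shot text-to-SQL template; mirrors the default prompt's
# {dialect}, {schema}, and {query_str} placeholders.
my_prompt = Prompt(
    "Given an input question, write a syntactically correct {dialect} query.\n"
    "Only use tables and columns listed in the schema below.\n\n"
    "{schema}\n\n"
    "Example:\n"
    "Question: How many orders were placed in the last 7 days?\n"
    "SQLQuery: SELECT COUNT(*) FROM orders WHERE created_at >= NOW() - INTERVAL '7 days';\n\n"
    "Question: {query_str}\n"
    "SQLQuery: "
)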
Super. Thanks!!
Also, I'm very sorry for asking all these primitive questions. I'm having a hard time figuring out a way to quickly create table_node_mapping = SQLTableNodeMapping(sql_database) for large databases. The use case I'm working on should allow creating multiple database connections and switching between them to ask questions in plain English. Currently, if the database has 300+ tables, it takes a very long time to get an answer, because the table node mapping creation itself takes a lot of time. Do you have any ideas on how to approach this problem?
@Logan M Would adding multi-threading/parallel processing help here? Or do you have any workaround in mind?