I would like to run a query on my index. However, before running the query I do not know whether the index has information relevant to it. If it doesn't, I would like to fetch data from a database, add that data to the index, and then run the query. Here is how it might look:
Plain Text
query = 'xyz'
if index capable of answering the query:
    #run the query on index
else:
    #fetch data
    #add the data to the index
    #run the query by index

Has anyone done something like this?
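The branch-and-backfill flow in the pseudocode above can be sketched in plain Python. Here a dict stands in for the index, and `can_answer`, `fetch_data` and `add_to_index` are hypothetical helpers, not LlamaIndex calls; with a real vector index you would check retrieval scores and call `index.insert()` on new documents instead.

```python
# Runnable sketch of the check-then-backfill flow, with a plain dict as
# a stand-in for the index. All helper names here are illustrative.

def can_answer(index, query):
    # Stand-in check; a real version might compare similarity scores.
    return query in index

def fetch_data(query):
    # Stand-in for a database lookup.
    return f"data about {query}"

def add_to_index(index, query, data):
    index[query] = data

def run_query(index, query):
    return index.get(query, "no answer")

index = {"abc": "data about abc"}
query = "xyz"

if not can_answer(index, query):
    # Backfill the index before querying.
    add_to_index(index, query, fetch_data(query))

answer = run_query(index, query)
print(answer)  # data about xyz
```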
Please stand by @Rouzbeh as I search the knowledge base. We are experiencing especially high traffic right now and might take longer than usual. :wave:
----------------------
Yes, you can achieve this by using LlamaIndex with an SQL database. First, you need to build an SQL index using the GPTSQLStructStoreIndex.from_documents() method, as shown in this source (https://gpt-index.readthedocs.io/en/latest/examples/query_engine/SQLRouterQueryEngine.html):

Plain Text
sql_database = SQLDatabase(engine, include_tables=["city_stats"])

sql_index = GPTSQLStructStoreIndex.from_documents(
    [], 
    sql_database=sql_database, 
    table_name="city_stats",
)


Then, you can run a query on the index using the query() method. If the index is not capable of answering the query, you can fetch the data from the database, add it to the index, and then run the query again. Here's an example of how you can achieve this:

Plain Text
query = "xyz"

# Check if the index is capable of answering the query
response = index.query(query, mode="default")

if response.response:  # The index was able to answer the query
    print(response)
else:
    # Fetch data from the database
    # Add the data to the index (e.g. via index.insert())
    # Run the query on the index again
    response = index.query(query, mode="default")
    print(response)


Keep in mind that this is a general example, and you might need to adapt it to your specific use case and data fetching process.
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
I think you could run a sort of "test query" and check the similarity of the source nodes. If the similarity is too low, then add the data to the index (this assumes you are using a vector index though)

query_engine = index.as_query_engine(response_mode="no_text")

Then when you query, you can check the similarity of each source node

for source in response.source_nodes:
    print(source.score)
Just an idea though, not sure how well it will work lol
With the default embedding model, anything below 0.77 similarity is probably not helpful. It might take some testing though
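The threshold check suggested above could look something like this; `index_can_answer` is a hypothetical helper name, and the 0.77 cutoff is the rule-of-thumb value from the comment above, which will need tuning for your data. In LlamaIndex the scores would come from `response.source_nodes` after querying with `response_mode="no_text"`.

```python
# Sketch of a similarity-threshold check over retrieved node scores.
DEFAULT_THRESHOLD = 0.77  # rule of thumb for the default embedding model

def index_can_answer(scores, threshold=DEFAULT_THRESHOLD):
    """Return True if at least one retrieved node is similar enough."""
    return bool(scores) and max(scores) >= threshold

# Example scores like those printed from source.score:
print(index_can_answer([0.83, 0.82]))  # True  (high similarity)
print(index_can_answer([0.61, 0.55]))  # False (too low -> backfill)
print(index_can_answer([]))            # False (nothing retrieved)
```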
Thanks for the tip @Logan M I'll try this out 🤩
@Logan M I tried it and the response I get is
Plain Text
I'm sorry, I cannot answer the question as the context information provided is ...
 source_nodes: [
    { node: [Object], score: 0.8328059527937333 },
    { node: [Object], score: 0.8255972873586926 }
  ]

The scores seem a bit high, but the response clearly says it cannot answer the question. I need to figure out a programmatic way to infer from the response that it cannot answer the question
Basically, it is confident that it doesn't know the answer
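One crude programmatic heuristic for this situation: even when scores are high, the LLM may still refuse, so scan the response text for refusal phrases. The phrase list and function name below are assumptions that would need tuning to your model's typical wording; it's a sketch, not a robust classifier.

```python
# Heuristic refusal detector for LLM responses. The phrase list is an
# assumption based on the refusal seen above and will need tuning.
REFUSAL_PHRASES = (
    "cannot answer",
    "can't answer",
    "context information provided",
    "not enough information",
)

def looks_like_refusal(response_text):
    text = response_text.lower()
    return any(phrase in text for phrase in REFUSAL_PHRASES)

print(looks_like_refusal(
    "I'm sorry, I cannot answer the question as the context ..."
))  # True
print(looks_like_refusal("Paris is the capital of France."))  # False
```

If a refusal is detected, you could fall back to the fetch-and-insert branch even when the similarity scores looked acceptable.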
Yea, seems like the problem is a little more tricky then 🤔