What's the best way to use llama-index

What's the best way to use llama-index to retrieve row(s) and cell value from a pandas dataframe based on a natural language user query?
41 comments
Thanks @Logan M. How is this different from tool use / function calling? https://discord.com/channels/1059199217496772688/1282840257800175616/1282844352929988663
This is just prompting the LLM to write a pandas query, executing it, and then getting the LLM to interpret the result

You could certainly create a tool for an llm/agent that does the same thing
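To make the pattern concrete, here is a minimal sketch of what that loop looks like, with a hard-coded string standing in for the LLM call (the dataframe, column names, and query string are made up for illustration):

```python
import pandas as pd

# Sketch of the PandasQueryEngine pattern: the LLM is prompted to emit a
# one-line pandas expression, which is then eval'd against the dataframe.
# A hard-coded string stands in for the LLM call here.
df = pd.DataFrame({"city": ["London", "Paris"], "population": [9_000_000, 2_100_000]})

# In the real engine, this string comes back from the LLM.
generated_code = "df[df['city'] == 'Paris']['population'].iloc[0]"

# Execute the generated expression with the frame bound to `df`.
result = eval(generated_code, {"df": df})
print(result)  # 2100000
```

In the real engine, a final LLM call then turns `result` back into a natural-language answer.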
ok. any tips for improving pandas query writing based on user query? I implemented a vanilla version and the results are not good.
writing the prompts from scratch?
Writing the prompts from scratch is probably the way to go imo. The query engine does expose hooks for your own prompts etc., but I usually like to encourage from scratch when needed

I might adapt that query pipeline to use our new workflows abstraction though, query pipelines are an older way to do this sort of thing
gotcha, i will check out workflows
can you share a link with example workflow using pandas df, if it exists or something similar?
Sadly, we haven't gotten that example built yet, but we have a ton of other docs and examples
https://docs.llamaindex.ai/en/stable/module_guides/workflow/#examples
That entire page is pretty helpful
for working with pandas df
i am trying to figure out how workflows would work with a pandas df?
In my case, I am trying to retrieve a single value from a df column
basically, write a pandas query like `df[(df['col1'] == 'val1') & (df['col2'] == 'val2')]['col3']` <-- this is what PandasQueryEngine was doing
not sure how i'd do this with workflow - do I pass the pandas df as a tool?
You'd have to actually execute that code, using eval()
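For illustration, executing such a generated expression with restricted globals (in the spirit of llama-index's `safe_eval`, which the traceback below goes through) can be sketched like this; the dataframe contents are made up:

```python
import pandas as pd

# Sketch of executing a generated expression with restricted globals:
# only `df` and `pd` are exposed to the eval'd code, so the expression
# cannot reach other names or builtins.
df = pd.DataFrame({"col1": ["a", "b"], "col2": ["x", "y"], "col3": [1, 2]})

expr = "df[(df['col1'] == 'b') & (df['col2'] == 'y')]['col3'].iloc[0]"
allowed = {"df": df, "pd": pd, "__builtins__": {}}
value = eval(expr, allowed)
print(value)  # 2
```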
Ok - the code generation itself would be done in workflows which will orchestrate prompt template, llm, response synthesis ?
@Logan M I followed the example to update the prompts: https://docs.llamaindex.ai/en/stable/examples/query_engine/pandas_query_engine/ and i am getting the error below:

Pandas Instructions:
```
df_uk[df_uk['Level 1'] == 'Business Travel'][df_uk['Level 2'] == 'Petrol car']['GHG Conversion Factor 2020']
```

Pandas Output: There was an error running the output as Python code. Error message: name 'df_uk' is not defined
```
Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/llama_index/experimental/query_engine/pandas/output_parser.py", line 54, in default_output_processor
    output_str = str(safe_eval(module_end_str, global_vars, local_vars))
  File "/home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/llama_index/experimental/exec_utils.py", line 159, in safe_eval
    return eval(source, _get_restricted_globals(globals), __locals)
  File "<string>", line 1, in <module>
NameError: name 'df_uk' is not defined
```
is df_uk a valid variable name though? Isn't it passed in as `df`?
it is passed as a df

```
# Read an Excel sheet and preview 5 random rows
df_uk = pd.read_excel(os.getcwd() + "/data/file.xlsx", sheet_name="data")
df_uk.sample(5)
```
```
query_engine = PandasQueryEngine(df=df_uk, verbose=True)
prompts = query_engine.get_prompts()
```

```
new_prompt = PromptTemplate(
    """\
You are working with a pandas dataframe in Python.
The name of the dataframe is `df_uk`.
This is the result of `print(df_uk.head())`:
{df_str}

Follow these instructions: {instruction_str}

Query: {query_str}

Expression:
"""
).partial_format(
    instruction_str=instruction_str,
    df_str=df_uk.head(5),
)

query_engine.update_prompts({"pandas_prompt": new_prompt})
```
after I call query_engine.update_prompts(...) it doesn't work, maybe I need to re-pass it?
Seems like the llm is hallucinating the name of the df?
it's included in the PromptTemplate
Are you using an open source llm? Not totally unexpected
nope - gpt-4o-mini
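A likely cause, judging from the NameError above (this is an inference, not confirmed in the thread): the engine's output processor evaluates the generated code with the dataframe bound to the local name `df`, regardless of the variable name in your own script. A prompt that advertises `df_uk` therefore makes the LLM emit code referencing a name that doesn't exist at eval time. A minimal fix is to keep `df` in the template:

```python
# Hypothetical fix: refer to the dataframe as `df` in the prompt, since
# the eval'd code sees the frame under the name `df`, not the outer
# variable name (df_uk).
pandas_prompt_str = """\
You are working with a pandas dataframe in Python.
The name of the dataframe is `df`.
This is the result of `print(df.head())`:
{df_str}

Follow these instructions: {instruction_str}

Query: {query_str}

Expression:
"""

# Pass this string to PromptTemplate(...) exactly as in the snippet above.
```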
@Logan M I am using workflows, in particular RAG with re-ranking and vector DBs. In the linked example, https://docs.llamaindex.ai/en/stable/examples/workflow/rag/ , instead of the plain VectorStoreIndex I am using MilvusVectorStore and pass the new_index in `def ingest` in the RAGWorkflow class.

```
vector_store = MilvusVectorStore(
    uri="http://localhost:19530",  # set local / docker / k8s
    dim=384,
    collection_name=collection_name,
    overwrite=True,
)

storage_context = StorageContext.from_defaults(vector_store=vector_store)

new_index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
)
```
```
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[23], line 4
      1 # Run a query
      2 result = await w.run(query="What is the conversion factor for Business Travel by Diesel car in miles?", index=uk_index)
----> 4 async for chunk in result.async_response_gen():
      5     print(chunk, end="", flush=True)

AttributeError: 'VectorStoreIndex' object has no attribute 'async_response_gen'
```

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
The provided information does not include details about Business Travel by Diesel car, so a conversion factor for that specific category cannot be determined from the available data.
what object do I use to iterate over or extract from <llama_index.core.indices.vector_store.base.VectorStoreIndex at 0x7fd7be5a1d50> ?
You can't iterate over an index πŸ‘€ you need to use a query engine and query it with aquery
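For reference, the consumption side of that pattern looks like the sketch below. A stub async generator stands in for the real streaming response so the example runs standalone; in the actual workflow the object comes from something like `await query_engine.aquery(...)` on a streaming query engine (e.g. `index.as_query_engine(streaming=True)`):

```python
import asyncio

# Minimal sketch of consuming a streaming response. The stub generator
# mimics what async_response_gen() yields on a real streaming response.
async def async_response_gen():
    for chunk in ["streamed ", "chunks ", "arrive here"]:
        yield chunk

async def main():
    parts = []
    async for chunk in async_response_gen():
        parts.append(chunk)
    return "".join(parts)

text = asyncio.run(main())
print(text)  # streamed chunks arrive here
```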
this code works in the linked example using VectorStoreIndex

```
w = RAGWorkflow()

result = await w.run(query="How was Llama2 trained?", index=index)

async for chunk in result.async_response_gen():
    print(chunk, end="", flush=True)
```
the only change is using MilvusVectorStore
what object would you call aquery on?
You'd attach milvus to a storage context, and use that in the index in the ingest step

```
VectorStoreIndex.from_documents(..., storage_context=storage_context)
```
That's the only change you'd need