What's the best way to use llama-index

What's the best way to use llama-index to retrieve row(s) and cell value from a pandas dataframe based on a natural language user query?
41 comments
Thanks @Logan M. How is this different from tool use / function calling? https://discord.com/channels/1059199217496772688/1282840257800175616/1282844352929988663
This is just prompting the LLM to write a pandas query, executing it, and then getting the LLM to interpret the result

You could certainly create a tool for an llm/agent that does the same thing
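To make the pattern concrete, here is a minimal sketch of what that loop looks like, with a hard-coded string standing in for the LLM call (the dataframe, column names, and query string are made up for illustration):

```python
import pandas as pd

# Sketch of the PandasQueryEngine pattern: the LLM is prompted to emit a
# one-line pandas expression, which is then eval'd against the dataframe.
# A hard-coded string stands in for the LLM call here.
df = pd.DataFrame({"city": ["London", "Paris"], "population": [9_000_000, 2_100_000]})

# In the real engine, this string comes back from the LLM.
generated_code = "df[df['city'] == 'Paris']['population'].iloc[0]"

# Execute the generated expression with the frame bound to `df`.
result = eval(generated_code, {"df": df})
print(result)  # 2100000
```

In the real engine, a final LLM call then turns `result` back into a natural-language answer.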
ok. any tips for improving pandas query writing based on user query? I implemented a vanilla version and the results are not good.
writing the prompts from scratch?
Writing the prompts from scratch is probably the way to go imo. The query engine does expose hooks for your own prompts etc., but I usually like to encourage from scratch when needed

I might adapt that query pipeline to use our new workflows abstraction though, query pipelines are an older way to do this sort of thing
gotcha, i will check out workflows
can you share a link with example workflow using pandas df, if it exists or something similar?
Sadly, we haven't gotten that example built yet, but we have a ton of other docs and examples
https://docs.llamaindex.ai/en/stable/module_guides/workflow/#examples
That entire page is pretty helpful
for working with pandas df
i am trying to figure out how workflows would work with a pandas df?
In my case, I am trying to retrieve a single value from a df column
basically, write a pandas query like `df[(df['col1'] == 'val1') & (df['col2'] == 'val2')]['col3']` <-- this is what PandasQueryEngine was doing
not sure how i'd do this with workflow - do I pass the pandas df as a tool?
You'd have to actually execute that code, using eval()
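For illustration, executing such a generated expression with restricted globals (in the spirit of llama-index's `safe_eval`, which the traceback below goes through) can be sketched like this; the dataframe contents are made up:

```python
import pandas as pd

# Sketch of executing a generated expression with restricted globals:
# only `df` and `pd` are exposed to the eval'd code, so the expression
# cannot reach other names or builtins.
df = pd.DataFrame({"col1": ["a", "b"], "col2": ["x", "y"], "col3": [1, 2]})

expr = "df[(df['col1'] == 'b') & (df['col2'] == 'y')]['col3'].iloc[0]"
allowed = {"df": df, "pd": pd, "__builtins__": {}}
value = eval(expr, allowed)
print(value)  # 2
```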
Ok - the code generation itself would be done in workflows which will orchestrate prompt template, llm, response synthesis ?
@Logan M I followed the example to update the prompts: https://docs.llamaindex.ai/en/stable/examples/query_engine/pandas_query_engine/ and i am getting the error below:

Pandas Instructions:
```
df_uk[df_uk['Level 1'] == 'Business Travel'][df_uk['Level 2'] == 'Petrol car']['GHG Conversion Factor 2020']
```

Pandas Output: There was an error running the output as Python code. Error message: name 'df_uk' is not defined
```
Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/llama_index/experimental/query_engine/pandas/output_parser.py", line 54, in default_output_processor
    output_str = str(safe_eval(module_end_str, global_vars, local_vars))
  File "/home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/llama_index/experimental/exec_utils.py", line 159, in safe_eval
    return eval(source, _get_restricted_globals(globals), __locals)
  File "<string>", line 1, in <module>
NameError: name 'df_uk' is not defined
```
is df_uk a valid variable name though? Isn't it passed in as `df`?
it is passed as a df

```
# Read an Excel sheet and preview 5 random rows
df_uk = pd.read_excel(os.getcwd() + "/data/file.xlsx", sheet_name="data")
df_uk.sample(5)
```
```
query_engine = PandasQueryEngine(df=df_uk, verbose=True)
prompts = query_engine.get_prompts()
```

```
new_prompt = PromptTemplate(
    """\
You are working with a pandas dataframe in Python.
The name of the dataframe is `df_uk`.
This is the result of `print(df_uk.head())`:
{df_str}

Follow these instructions: {instruction_str}

Query: {query_str}

Expression:
"""
).partial_format(
    instruction_str=instruction_str,
    df_str=df_uk.head(5),
)

query_engine.update_prompts({"pandas_prompt": new_prompt})
```
after I call query_engine.update_prompts(...) it doesn't work, maybe I need to re-pass it?
Seems like the llm is hallucinating the name of the df?
it's included in the PromptTemplate
Are you using an open source llm? Not totally unexpected
nope - gpt-4o-mini
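A likely cause, judging from the NameError above (this is an inference, not confirmed in the thread): the engine's output processor evaluates the generated code with the dataframe bound to the local name `df`, regardless of the variable name in your own script. A prompt that advertises `df_uk` therefore makes the LLM emit code referencing a name that doesn't exist at eval time. A minimal fix is to keep `df` in the template:

```python
# Hypothetical fix: refer to the dataframe as `df` in the prompt, since
# the eval'd code sees the frame under the name `df`, not the outer
# variable name (df_uk).
pandas_prompt_str = """\
You are working with a pandas dataframe in Python.
The name of the dataframe is `df`.
This is the result of `print(df.head())`:
{df_str}

Follow these instructions: {instruction_str}

Query: {query_str}

Expression:
"""

# Pass this string to PromptTemplate(...) exactly as in the snippet above.
```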
@Logan M I am using workflows, in particular RAG with re-ranking and vector DBs. In the linked example, https://docs.llamaindex.ai/en/stable/examples/workflow/rag/ , instead of the plain VectorStoreIndex I am using MilvusVectorStore and pass the new_index in `def ingest` in the RAGWorkflow class.

```
vector_store = MilvusVectorStore(
    uri="http://localhost:19530",  # set local / docker / k8s
    dim=384,
    collection_name=collection_name,
    overwrite=True,
)

storage_context = StorageContext.from_defaults(vector_store=vector_store)

new_index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
)
```
```
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[23], line 4
      1 # Run a query
      2 result = await w.run(query="What is the conversion factor for Business Travel by Diesel car in miles?", index=uk_index)
----> 4 async for chunk in result.async_response_gen():
      5     print(chunk, end="", flush=True)

AttributeError: 'VectorStoreIndex' object has no attribute 'async_response_gen'
```

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
The provided information does not include details about Business Travel by Diesel car, so a conversion factor for that specific category cannot be determined from the available data.
what object do I use to iterate over or extract from <llama_index.core.indices.vector_store.base.VectorStoreIndex at 0x7fd7be5a1d50> ?
You can't iterate over an index πŸ‘€ you need to use a query engine and query it with aquery
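For reference, the consumption side of that pattern looks like the sketch below. A stub async generator stands in for the real streaming response so the example runs standalone; in the actual workflow the object comes from something like `await query_engine.aquery(...)` on a streaming query engine (e.g. `index.as_query_engine(streaming=True)`):

```python
import asyncio

# Minimal sketch of consuming a streaming response. The stub generator
# mimics what async_response_gen() yields on a real streaming response.
async def async_response_gen():
    for chunk in ["streamed ", "chunks ", "arrive here"]:
        yield chunk

async def main():
    parts = []
    async for chunk in async_response_gen():
        parts.append(chunk)
    return "".join(parts)

text = asyncio.run(main())
print(text)  # streamed chunks arrive here
```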
this code works in the linked example using VectorStoreIndex

```
w = RAGWorkflow()

result = await w.run(query="How was Llama2 trained?", index=index)

async for chunk in result.async_response_gen():
    print(chunk, end="", flush=True)
```
the only change is using MilvusVectorStore
what object would you call aquery on?
You'd attach milvus to a storage context, and use that in the index in the ingest step

```
VectorStoreIndex.from_documents(..., storage_context=storage_context)
```
That's the only change you'd need