Hi, I'm trying to do this: https://gpt-index.readthedocs.io/en/latest/examples/query_engine/SQLAutoVectorQueryEngine.html
The modification that I've done is: I replaced the OpenAI LLM with LlamaCPP (llama-2-13b-chat) and the VectorStore with Qdrant
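For reference, the Qdrant part of the swap looks roughly like this (the local path and collection name are placeholders I'm using here, not from the notebook):

Plain Text
import qdrant_client
from llama_index import StorageContext, VectorStoreIndex
from llama_index.vector_stores import QdrantVectorStore

# local Qdrant instance; the path and collection name are just placeholders
client = qdrant_client.QdrantClient(path="./qdrant_data")
vector_store = QdrantVectorStore(client=client, collection_name="city_vectors")
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# same as the notebook, except the nodes get stored in Qdrant instead of the default vector store
vector_index = VectorStoreIndex([], storage_context=storage_context)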

I'm experiencing an error when querying
Plain Text
response = query_engine.query(
    "Tell me about the arts and culture of the city with the highest population"
)


Plain Text
JSONDecodeError                           Traceback (most recent call last)
./SQLAutoVectorQueryEngine.ipynb Cell 32 line 1
----> 1 response = query_engine.query(
      2     "Tell me about the arts and culture of the city with the highest population"
      3 )

File ./venv/lib/python3.8/site-packages/llama_index/indices/query/base.py:23, in BaseQueryEngine.query(self, str_or_query_bundle)
     21 if isinstance(str_or_query_bundle, str):
     22     str_or_query_bundle = QueryBundle(str_or_query_bundle)
---> 23 response = self._query(str_or_query_bundle)
     24 return response

File ./venv/lib/python3.8/site-packages/llama_index/query_engine/sql_join_query_engine.py:286, in SQLJoinQueryEngine._query(self, query_bundle)
    284 # TODO: see if this can be consolidated with logic in RouterQueryEngine
    285 metadatas = [self._sql_query_tool.metadata, self._other_query_tool.metadata]
--> 286 result = self._selector.select(metadatas, query_bundle)
    287 # pick sql query
    288 if result.ind == 0:

File ./venv/lib/python3.8/site-packages/llama_index/selectors/types.py:77, in BaseSelector.select(self, choices, query)
     75 metadatas = [_wrap_choice(choice) for choice in choices]
     76 query_bundle = _wrap_query(query)
---> 77 return self._select(choices=metadatas, query=query_bundle)
...
    354 except StopIteration as err:
--> 355     raise JSONDecodeError("Expecting value", s, err.value) from None
    356 return obj, end


I'm not sure why it's happening. Can someone help me? Thanks
The auto-vector query engine relies on the LLM to write JSON

Open-source models are generally bad at this. Llama2 especially is very verbose
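To make that concrete: the selector prompts the LLM to answer with a small JSON blob naming which tool to pick, then runs json.loads on whatever comes back. A chatty Llama2 reply breaks that parse, which is where your JSONDecodeError comes from. Illustrative only, not the actual selector code:

Plain Text
import json

# what the selector hopes to get back
json.loads('{"choice": 1, "reason": "the question needs SQL"}')  # parses fine

# what a verbose chat model tends to return instead
json.loads("Sure! I would pick choice 1 because ...")  # raises json.JSONDecodeError: Expecting value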
How did you setup the LLM?
Here's the code that I used

Plain Text
# define node parser and LLM
from llama_index import ServiceContext
from llama_index.embeddings import HuggingFaceEmbedding
from llama_index.llms import LlamaCPP

chunk_size = 1024
llm = LlamaCPP(
    model_path="./models/llama-2-13b-chat/ggml-model-q4_0.gguf",
    model_kwargs={"n_gpu_layers": 0, "streaming": True},
)
embed_model = HuggingFaceEmbedding()
service_context = ServiceContext.from_defaults(chunk_size=chunk_size, llm=llm, embed_model=embed_model)

and that's pretty much all I've changed LLM-wise
Aha. So, llama2 has veeeerry specific prompt requirements. We have utils to help format prompts. Take a look at this example (specifically the functions used from llama_utils)
https://gpt-index.readthedocs.io/en/stable/examples/llm/llama_2_llama_cpp.html#setup-llm
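If it helps, a quick way to see what those helpers produce (just a sanity-check snippet, not from the docs page):

Plain Text
from llama_index.llms import ChatMessage
from llama_index.llms.llama_utils import completion_to_prompt, messages_to_prompt

# wraps a plain completion prompt in Llama-2's [INST] ... [/INST] format with a default system prompt
print(completion_to_prompt("Tell me about the history of Berlin"))

# does the same for a list of chat messages
print(messages_to_prompt([ChatMessage(role="user", content="Tell me about the history of Berlin")]))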
ahhhhhhhhh I seeeee. I'll try this out. Thanks!!
hopefully adding those will help improve results 🙏
So I tried using the helpers, then finally just copied args from the docs
Plain Text
from llama_index.llms import LlamaCPP
from llama_index.llms.llama_utils import messages_to_prompt, completion_to_prompt

llm = LlamaCPP(
    model_path="./models/llama-2-13b-chat/ggml-model-q4_0.gguf",
    model_kwargs={"n_gpu_layers": 0, "streaming": True},
    temperature=0.1,
    max_new_tokens=256,
    # llama2 has a context window of 4096 tokens, but we set it lower to allow for some wiggle room
    context_window=3900,
    # kwargs to pass to __call__()
    generate_kwargs={},
    # transform inputs into Llama2 format
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=True,
)


And it seems that, for query engines, I pretty much just need to pass the llm into the service_context. https://gpt-index.readthedocs.io/en/stable/examples/llm/llama_2_llama_cpp.html#query-engine-set-up-with-llamacpp
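For reference, my understanding is that the service_context (and so the LlamaCPP llm) gets threaded into both sides roughly like this. Simplified sketch only: sql_database and vector_index come from earlier cells of the notebook, and the real notebook wraps the vector side in an auto-retriever query engine.

Plain Text
from llama_index.indices.struct_store.sql_query import NLSQLTableQueryEngine
from llama_index.query_engine import SQLAutoVectorQueryEngine
from llama_index.tools import QueryEngineTool

# text-to-SQL over the city_stats table; the LlamaCPP llm comes in via service_context
sql_query_engine = NLSQLTableQueryEngine(
    sql_database=sql_database,
    tables=["city_stats"],
    service_context=service_context,
)
sql_tool = QueryEngineTool.from_defaults(
    query_engine=sql_query_engine,
    description="Useful for translating a natural language query into a SQL query over the city_stats table",
)

# vector side: a query engine over the Qdrant-backed index (built with the same service_context)
vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_index.as_query_engine(),
    description="Useful for answering semantic questions about the cities",
)

query_engine = SQLAutoVectorQueryEngine(sql_tool, vector_tool, service_context=service_context)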

Still the same error
Plain Text
response = query_engine.query(
    "Tell me about the arts and culture of the city with the highest population",
)


I wasn't able to try this earlier, but I ran the new setup against the other queries

Not working for this one (same error)
Plain Text
response = query_engine.query("Tell me about the history of Berlin")
But there's some output for the last query
Plain Text
response = query_engine.query("Can you give me the country corresponding to each city?")


Here's the output:
Plain Text
Llama.generate: prefix-match hit
Querying SQL database: Can be used to translate natural language queries into SQL queries over a table containing city_stats
INFO:llama_index.query_engine.sql_join_query_engine:> Querying SQL database: Can be used to translate natural language queries into SQL queries over a table containing city_stats
> Querying SQL database: Can be used to translate natural language queries into SQL queries over a table containing city_stats
INFO:llama_index.indices.struct_store.sql_query:> Table desc str: Table 'city_stats' has columns: city_name (VARCHAR(16)), population (INTEGER), country (VARCHAR(16)), and foreign keys: .
> Table desc str: Table 'city_stats' has columns: city_name (VARCHAR(16)), population (INTEGER), country (VARCHAR(16)), and foreign keys: .

llama_print_timings:        load time = 128847.69 ms
llama_print_timings:      sample time =   148.87 ms /   177 runs   (    0.84 ms per token,  1188.96 tokens per second)
llama_print_timings: prompt eval time = 82293.66 ms /   279 tokens (  294.96 ms per token,     3.39 tokens per second)
llama_print_timings:        eval time = 52739.85 ms /   176 runs   (  299.66 ms per token,     3.34 tokens per second)
llama_print_timings:       total time = 135561.23 ms
Llama.generate: prefix-match hit

llama_print_timings:        load time = 128847.69 ms
llama_print_timings:      sample time =    54.47 ms /    66 runs   (    0.83 ms per token,  1211.61 tokens per second)
llama_print_timings: prompt eval time = 70320.81 ms /   230 tokens (  305.74 ms per token,     3.27 tokens per second)
llama_print_timings:        eval time = 18608.56 ms /    65 runs   (  286.29 ms per token,     3.49 tokens per second)
llama_print_timings:       total time = 89121.04 ms


Then the error:

Plain Text
OperationalError                          Traceback (most recent call last)
File ./venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py:1965, in Connection._exec_single_context(self, dialect, context, statement, parameters)
   1964     if not evt_handled:
-> 1965         self.dialect.do_execute(
   1966             cursor, str_statement, effective_parameters, context
   1967         )
   1969 if self._has_events or self.engine._has_events:

File ./venv/lib/python3.8/site-packages/sqlalchemy/engine/default.py:921, in DefaultDialect.do_execute(self, cursor, statement, parameters, context)
    920 def do_execute(self, cursor, statement, parameters, context=None):
--> 921     cursor.execute(statement, parameters)

OperationalError: near "Sure": syntax error

The above exception was the direct cause of the following exception:

OperationalError                          Traceback (most recent call last)
./SQLAutoVectorQueryEngine.ipynb Cell 35 line 1
----> 1 response = query_engine.query("Can you give me the country corresponding to each city?")

File ./venv/lib/python3.8/site-packages/llama_index/indices/query/base.py:23, in BaseQueryEngine.query(self, str_or_query_bundle)
     21 if isinstance(str_or_query_bundle, str):
     22     str_or_query_bundle = QueryBundle(str_or_query_bundle)
---> 23 response = self._query(str_or_query_bundle)
...

SQLQuery: SELECT country FROM city_stats WHERE city_name = ?

Please provide the city name as a parameter. I'll wait for your response before running the query.]
(Background on this error at: https://sqlalche.me/e/20/e3q8)
Based on the error from that last query, it seems the helpers either didn't help or I still need to be more specific 🤔
Much better now actually!

What version of llamaindex do you have? I feel like I fixed this recently for llama2

Now it's just a matter of properly parsing the SQL from the response
Well actually, it didn't even write the full query lol
It left the city name blank 😅
The joys of open source llms :PepeHands:
Maybe, might've had some trouble parsing stuff since Llama is still saying "Sure, blablabla"
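One workaround I might experiment with (purely a sketch, not a built-in llama_index hook): post-process the model's reply and pull out just the SELECT statement before it reaches SQLite:

Plain Text
import re

def extract_sql(llm_output: str) -> str:
    """Pull the first SELECT ... statement out of a chatty LLM reply (hypothetical helper)."""
    match = re.search(r"(SELECT\b.*?)(?:;|$)", llm_output, re.IGNORECASE | re.DOTALL)
    if not match:
        raise ValueError(f"No SQL statement found in: {llm_output!r}")
    return match.group(1).strip()

print(extract_sql("Sure! Here's the query:\nSELECT country FROM city_stats;"))
# -> SELECT country FROM city_stats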