Hello, I'm getting a 'LLMPredictor' object has no attribute '_llm' error when attempting to perform RAG inference on an index within a demo web app I'm building with Plotly Dash.
relevant part of the traceback:
Plain Text
File "D:\LLM_Work\llm-server-webapp\bring-your-own-documents\app-bring-your-own-docs3.py", line 319, in response_stream
    yield from (line for line in query_engine.query(user_question).response_gen)
  File "D:\LLM_Work\llm-server-webapp\.venv\lib\site-packages\llama_index\core\base_query_engine.py", line 30, in query
    return self._query(str_or_query_bundle)
  File "D:\LLM_Work\llm-server-webapp\.venv\lib\site-packages\llama_index\query_engine\retriever_query_engine.py", line 171, in _query
    response = self._response_synthesizer.synthesize(
  File "D:\LLM_Work\llm-server-webapp\.venv\lib\site-packages\llama_index\response_synthesizers\base.py", line 146, in synthesize
    response_str = self.get_response(
  File "D:\LLM_Work\llm-server-webapp\.venv\lib\site-packages\llama_index\response_synthesizers\compact_and_refine.py", line 38, in get_response
    return super().get_response(
  File "D:\LLM_Work\llm-server-webapp\.venv\lib\site-packages\llama_index\response_synthesizers\refine.py", line 146, in get_response
    response = self._give_response_single(
  File "D:\LLM_Work\llm-server-webapp\.venv\lib\site-packages\llama_index\response_synthesizers\refine.py", line 194, in _give_response_single
    program = self._program_factory(text_qa_template)
  File "D:\LLM_Work\llm-server-webapp\.venv\lib\site-packages\llama_index\response_synthesizers\refine.py", line 177, in _default_program_factory
    llm=self._service_context.llm,
  File "D:\LLM_Work\llm-server-webapp\.venv\lib\site-packages\llama_index\service_context.py", line 322, in llm
    return self.llm_predictor.llm
  File "D:\LLM_Work\llm-server-webapp\.venv\lib\site-packages\llama_index\llm_predictor\base.py", line 143, in llm
    return self._llm
AttributeError: 'LLMPredictor' object has no attribute '_llm'
this is the callback in my app which results in the error:

Plain Text
@app.server.route("/esic-rag/streaming-chat", methods=["POST"])
def streaming_chat():
    sys_prompt = request.json["sys_prompt"]
    user_prompt = request.json["prompt"]
    user_question = request.json["question"]
    sim_top_k = request.json["sim_top_k"]
    session_id = request.json["session_id"]

    llm = OpenAILike(
        model="local:llama-2-13b-chat.Q4_K_S",
        api_base="http://localhost:8000/v1",
        api_key="fake",
        api_type="fake",
        max_tokens=3900,
        is_chat_model=True
    )
    service_context = ServiceContext.from_defaults(llm=llm, embed_model="local:BAAI/bge-small-en-v1.5", chunk_size=256, num_output=256)
    set_global_service_context(service_context)

    index = pickle.loads(cache.get(session_id))
    cache.set(session_id, pickle.dumps(index))

    # Create a system message
    system_message = ChatMessage(role=MessageRole.SYSTEM, content=sys_prompt)
    user_prompt = ChatMessage(role=MessageRole.USER, content=user_prompt)
    text_qa_template = ChatPromptTemplate(message_templates=[system_message, user_prompt])

    ### QUERY ENGINE SECTION
    query_engine = index.as_query_engine(streaming=True, text_qa_template=text_qa_template, similarity_top_k=sim_top_k)
    def response_stream():
        yield from (line for line in query_engine.query(user_question).response_gen)

    return Response(response_stream(), mimetype="text/response-stream")
What is confusing is that this version of the app, which is throwing the error, is very similar to an earlier iteration that works just fine. The main difference is that I am now caching the index in the filesystem, whereas before I was creating the index more globally without the need to cache different indices.

but I can't figure out why self._llm is nonexistent all of a sudden 🤔
what version of llama-index do you have?
llama-index 0.9.24
you are using pickle :PSadge:
that's almost certainly why -- our pickle support is a little flakey 😅
I wouldn't advise pickling an index, if possible
I'll try to look into why that's happening though
ah darn... thanks for the info!

One interesting point: pickling the index seems to work fine in this other callback, which just unpickles the cached index and returns the top N most relevant excerpts:

Plain Text
@app.callback(
    [Output('context-results', 'children'),
     Output('search-result-header', 'children')],
    Input("submit-prompt", "n_clicks"),
    [State('text-question', 'value'),
     State("num-excerpts", "value"),
     State('session-id', 'data')]
)
def query_vector_db(clicks, question, sim_top_k, session_id):
    if clicks == None:
        raise PreventUpdate

    # get previously created cached index from filesystem
    index = pickle.loads(cache.get(session_id))

    retriever = index.as_retriever(similarity_top_k=sim_top_k)
    nodes = retriever.retrieve(question)

    search_results = []

    for node in nodes:
        doc = node.metadata['file_name']
        page = node.metadata['page_label']
        url = 'https://www.hello.com/00000000' + doc.split('_')[0]
        excerpt_text = node.text

        search_results.append(
            dbc.Card(
                [
                    dbc.CardBody(
                        [
                            html.H4(doc, className="card-title"),
                            html.H6('Page ' + page, className="card-title"),
                            html.P(excerpt_text, className="card-text"),
                            dbc.Button("Get the Report", color="primary", href=url, target='_blank'),
                        ]
                    ),
                ],
            )
        )

    return [search_results, "Top %d Search Results (input context for LLM):" % sim_top_k]


This works perfectly well
but I agree that there seems to be something going on that was brought about by the introduction of this caching/pickling mechanism

Unfortunately I kind of need to be able to cache these indices, which I believe requires serializing them. Do you have any other recommendations for how to approach this? I started with pickle because that's what I'm familiar with.
There are a few alternatives.

You can instead serialize the storage context, and then reload the index from the storage context

Plain Text
import json

# "save"
storage_string = json.dumps(index.storage_context.to_dict())

# "load"
from llama_index import load_index_from_storage, StorageContext
storage_dict = json.loads(storage_string)
storage_context = StorageContext.from_dict(storage_dict)

# optional service context
index = load_index_from_storage(storage_context, service_context=service_context)
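
Dropping that into the caching flow from the callback above would look roughly like this (just a sketch; it assumes the same Flask-Caching cache, session_id, and service_context objects defined in the question's code):

Plain Text
import json

from llama_index import StorageContext, load_index_from_storage

# "save": serialize the storage context to a JSON string and store it in the cache
cache.set(session_id, json.dumps(index.storage_context.to_dict()))

# "load": rebuild the storage context from the cached string and reload the index
storage_dict = json.loads(cache.get(session_id))
storage_context = StorageContext.from_dict(storage_dict)
index = load_index_from_storage(storage_context, service_context=service_context)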



Or you can use a remotely hosted vectordb integration, so that you can just do VectorStoreIndex.from_vector_store(vector_store)
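For example, with a hosted Qdrant collection (a sketch only; the URL, API key, and collection name are placeholders, and it assumes llama-index 0.9.x import paths plus the same service_context as above):

Plain Text
import qdrant_client

from llama_index import VectorStoreIndex
from llama_index.vector_stores import QdrantVectorStore

# connect to the remotely hosted vector database (placeholder URL and key)
client = qdrant_client.QdrantClient(url="https://my-qdrant-host:6333", api_key="my-api-key")
vector_store = QdrantVectorStore(client=client, collection_name="my-docs")

# no pickling needed: the index is reconstructed directly from the remote vector store
index = VectorStoreIndex.from_vector_store(vector_store, service_context=service_context)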
thanks, I will try serializing the storage context first since this is just a proof of concept.
I wrote that without testing haha but it shouuuuld work 🙂
oh, it was spot on! thanks for the suggestion! and thanks again for sharing your knowledge!
I guess my only gripe with this approach is that the storage_context = StorageContext.from_dict(storage_dict) step is taking many seconds.

@Logan M Do you have a sense of what is happening in this step that is taking so long? Is it having to recompute the embeddings for each chunk of text in the storage_context?
it's not recomputing everything, but it is moving a bunch of stuff around in memory
actually, pickling the storage context itself will probably work, and be faster
instead of putting it to/from a string/dict/object
I'll try that! because it definitely seems like rebuilding the storage_context from a dictionary is the bottleneck
oh yea, skipping the dictionary step did the trick! it's a few orders of magnitude faster now. Thanks again!
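
For reference, that faster path boils down to something like this (a minimal sketch, assuming the same cache, session_id, and service_context objects from the callbacks above):

Plain Text
import pickle

from llama_index import load_index_from_storage

# "save": pickle the storage context object directly, skipping the dict round-trip
cache.set(session_id, pickle.dumps(index.storage_context))

# "load": unpickle the storage context and reload the index with the active service context
storage_context = pickle.loads(cache.get(session_id))
index = load_index_from_storage(storage_context, service_context=service_context)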