Find answers from the community

Niels
Joined September 25, 2024
We are randomly getting this error with the multimodal Azure OpenAI package:

Plain Text
vision_service-1        | pydantic_core._pydantic_core.ValidationError: 2 validation errors for ChatMessage
vision_service-1        | blocks.0
vision_service-1        |   Unable to extract tag using discriminator 'block_type' [type=union_tag_not_found, input_value={'type': 'text', 'text': 'Describe what you see'}, input_type=dict]
vision_service-1        |     For further information visit https://errors.pydantic.dev/2.9/v/union_tag_not_found
vision_service-1        | blocks.1
vision_service-1        |   Unable to extract tag using discriminator 'block_type' [type=union_tag_not_found, input_value={'type': 'image_url', 'im...gg==', 'detail': 'low'}}, input_type=dict]

Does anyone have an idea what's going on? Nothing changed in our code.
17 comments
Hey guys, is there a way to force JSON mode over function calling for JSON output with Pydantic models (OpenAI)? We are noticing that 4o-mini is much worse at function calling than 3.5-turbo (which we want to deprecate).

Also, sources talking about this: https://news.ycombinator.com/item?id=41173223
9 comments
Our CI test pipelines using LlamaIndex are suddenly failing:

Plain Text
ImportError while importing test module '/home/runner/work/xxx/test_prompts.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/hostedtoolcache/Python/3.12.4/x64/lib/python3.12/importlib/__init__.py:90: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
test_prompts.py:10: in <module>
    from index import add_documents_to_index
../../index.py:22: in <module>
    from llama_cloud import FilterCondition
/opt/hostedtoolcache/Python/3.12.4/x64/lib/python3.12/site-packages/llama_cloud/__init__.py:3: in <module>
    from .types import (
/opt/hostedtoolcache/Python/3.12.4/x64/lib/python3.12/site-packages/llama_cloud/types/__init__.py:21: in <module>
    from .base import Base
E   ImportError: cannot import name 'Base' from 'llama_cloud.types.base' 


Does anyone have an idea why?
12 comments
Hey guys, just wondering if there are any best practices for querying with structured outputs using Pydantic. We are running a structured output call where the class has one property that is a union (it can be one of two Pydantic classes). This causes a lot of validation errors, because the LLM doesn't seem to understand the union.

Would love to hear if anyone has ideas or has also experienced this. For context, we are using GPT-4o.
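For illustration, a minimal sketch of the shape described (class and field names are hypothetical); giving the union an explicit Literal tag and a discriminator is one thing we are experimenting with, since it makes the branch unambiguous for both Pydantic and the LLM:

Plain Text
from typing import Literal, Union

from pydantic import BaseModel, Field


class TextSlide(BaseModel):
    kind: Literal["text"] = "text"  # explicit tag the LLM can emit verbatim
    body: str


class ChartSlide(BaseModel):
    kind: Literal["chart"] = "chart"
    title: str


class Slide(BaseModel):
    # The union property described above; the discriminator tells Pydantic
    # (and, via the JSON schema, the LLM) exactly which branch applies.
    content: Union[TextSlide, ChartSlide] = Field(discriminator="kind")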
11 comments
We're seeing this SyntaxWarning a lot locally. Is this an issue?

/usr/local/lib/python3.12/site-packages/llama_cloud/types/metadata_filter.py:20: SyntaxWarning: invalid escape sequence '*'
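If it's just noise, a warnings filter should quiet it until the upstream escape sequence is fixed (a sketch; it has to run before llama_cloud is first imported, since SyntaxWarnings fire when the module is compiled):

Plain Text
import warnings

# Install the filter before importing llama_cloud; the warning is emitted
# when the offending module is first compiled.
warnings.filterwarnings("ignore", category=SyntaxWarning, module="llama_cloud.*")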
6 comments
Are there any helpers in LlamaIndex that we can use to check rate limits for specific LLMs?

We are trying to use our Azure OpenAI rate limits to the fullest and want to use OpenAI directly as a backup. Would this be possible?
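Something like this wrapper is what we have in mind (a minimal sketch; the package paths assume the post-0.10 split, the deployment name is made up, and endpoint/key config is assumed to come from env vars):

Plain Text
from llama_index.llms.azure_openai import AzureOpenAI
from llama_index.llms.openai import OpenAI
from openai import RateLimitError

azure_llm = AzureOpenAI(engine="my-gpt4o-deployment", model="gpt-4o")  # hypothetical deployment
fallback_llm = OpenAI(model="gpt-4o")


def complete_with_fallback(prompt: str) -> str:
    # Spend the Azure quota first; fall back to plain OpenAI once it runs out.
    try:
        return azure_llm.complete(prompt).text
    except RateLimitError:
        return fallback_llm.complete(prompt).text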
6 comments
Niels
·

Type

Why do I not get type safety here if I explicitly pass the output_cls?
9 comments
What do these options mean?

class PydanticProgramMode(str, Enum):
    """Pydantic program mode."""

    DEFAULT = "default"
    OPENAI = "openai"
    LLM = "llm"
    GUIDANCE = "guidance"
    LM_FORMAT_ENFORCER = "lm-format-enforcer"

(llama_index/core/types.py)
5 comments
Is there any way I can improve the performance of the retrieve call LlamaIndex makes? It is very slow in our chat application in production:

_CBEventType.RETRIEVE -> 7.412136 seconds
26 comments
Hi there, are there any ways to force the LLM to return a response in Markdown format?

I have the following function and it does not return a response in .md format:

Plain Text
def initialize_index(self, namespace: str, model_name="gpt-3.5-turbo-1106"):
    service_context = ServiceContext.from_defaults(
        chunk_size=512,
        llm=OpenAI(temperature=0.7, model=model_name),
        callback_manager=self.callback_manager,
    )
    pinecone_index = pinecone.Index(PINECONE_INDEX_ID)
    vector_store = PineconeVectorStore(
        pinecone_index=pinecone_index,
        namespace=namespace,
    )
    storage_context = StorageContext.from_defaults(
        docstore=self.docstore,
        index_store=self.index_store,
        vector_store=vector_store,
    )
    self.index = VectorStoreIndex.from_documents(
       [], storage_context=storage_context, service_context=service_context
    )


def query_stream(self, query: str, namespace: str, model: str):
    full_query = 'Please make sure to respond ONLY with content in the .md format as the response, here is my prompt: ' + query
    self.initialize_index(namespace, model)
    streaming_response = self.index.as_query_engine(
        streaming=True, similarity_top_k=20,
    ).query(full_query)

    for text in streaming_response.response_gen:
        yield text
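We also considered baking the Markdown instruction into the QA prompt template instead of prefixing every query (a sketch; the template wording is our own):

Plain Text
from llama_index.prompts import PromptTemplate

MD_QA_TEMPLATE = PromptTemplate(
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Using the context and no prior knowledge, answer the query. "
    "Format the entire answer as Markdown.\n"
    "Query: {query_str}\n"
    "Answer: "
)

streaming_response = self.index.as_query_engine(
    streaming=True,
    similarity_top_k=20,
    text_qa_template=MD_QA_TEMPLATE,
).query(query)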
1 comment
Niels
·

4o-mini

Has anyone else experienced that 4o-mini is a lot slower than 3.5-turbo?
7 comments
How do I set output_cls and similarity_top_k with the retry query engine?

Plain Text
# This is what I want, but output_cls and similarity_top_k are not accepted as args
base_query_engine = index.as_query_engine(llm=llm, filters=filters)

query_engine_presentation_content = RetryQueryEngine(
    query_engine=base_query_engine,
    output_cls=PresentationContentListV1,
    similarity_top_k=10,
)
query_engine_presentation_outline = RetryQueryEngine(
    query_engine=base_query_engine,
    output_cls=PresentationOutlineV1,
    similarity_top_k=10,
)
13 comments
Is there a way to directly create a query engine object/instance from OpenAI without having an index? I want to make sure my APIs for different use cases are the same.
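Something like a CustomQueryEngine that skips retrieval entirely is what we're picturing (a minimal sketch, assuming the post-0.10 core import paths):

Plain Text
from llama_index.core.llms import LLM
from llama_index.core.query_engine import CustomQueryEngine


class LLMOnlyQueryEngine(CustomQueryEngine):
    """Query engine that calls the LLM directly, with no index or retrieval."""

    llm: LLM

    def custom_query(self, query_str: str) -> str:
        # Exposes the same .query() API surface as index-backed engines.
        return str(self.llm.complete(query_str))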
16 comments
Hi, has anyone else suddenly run into issues where the Pydantic models returned by LlamaIndex are based on v1 and empty? Our app randomly starts throwing AttributeError: 'BaseModel' object has no attribute 'model_dump', which comes from pydantic.v1.

Versions:
pydantic>=2.6.4
llama-index==0.10.20
llama-index-embeddings-openai==0.1.6
llama-index-llms-openai==0.1.12
llama-index-program-openai==0.1.4
llama-index-vector-stores-postgres==0.1.3

--

We follow this exact example and are running into this issue:

https://docs.llamaindex.ai/en/stable/examples/query_engine/pydantic_query_engine/
2 comments
We're getting a lot of Pydantic validation errors in our prod environment for queries using GPT-3.5-turbo. Is this a common thing? Is there anything we can do to make sure this happens less? I would assume LlamaIndex uses JSON mode for OpenAI under the hood?
11 comments
Niels
·

Keyword

Is it a known issue that when using the keyword extractor on a document with some English terms but mostly other-language content, it stores all of the keywords in English?

Or, if we wanted to fix that, would we need to write a custom extractor?
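If a custom extractor is the way to go, this is roughly the shape we'd try (a sketch; the prompt wording is our own, and we're assuming "excerpt_keywords" mirrors the key the stock KeywordExtractor writes):

Plain Text
from typing import Any, Dict, List, Sequence

from llama_index.core.extractors import BaseExtractor
from llama_index.core.llms import LLM
from llama_index.core.schema import BaseNode

KEYWORD_PROMPT = (
    "Give {num} keywords for the following text, in the same language as "
    "the text itself (do not translate them into English):\n\n{text}"
)


class SameLanguageKeywordExtractor(BaseExtractor):
    llm: LLM
    keywords: int = 5

    async def aextract(self, nodes: Sequence[BaseNode]) -> List[Dict[str, Any]]:
        metadata_list = []
        for node in nodes:
            response = await self.llm.acomplete(
                KEYWORD_PROMPT.format(num=self.keywords, text=node.get_content())
            )
            # Same metadata key as the stock KeywordExtractor (assumption).
            metadata_list.append({"excerpt_keywords": response.text.strip()})
        return metadata_list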
10 comments
Hi there, can someone help me debug why my app is so slow?

For a basic "Summarize this document" query it takes 10 seconds to even start the streaming response (for a doc that has a few words of content and one node):

Plain Text
@app.post("/document/query")
def query_stream(
    query: str = Body(...),
    uuid_filename: str = Body(...),
    email: str = Body(...),
) -> StreamingResponse:
    subscription = get_user_subscription(email)
    model = MODEL_BASIC if subscription == "FREE" else MODEL_PREMIUM
    with token_counter(model, query_stream.__name__):
        filename_without_ext = uuid_filename.split(".")[0]

        # Create index
        index = initialize_index(model)

        document_is_indexed = does_document_exist_in_index(filename_without_ext)

        if document_is_indexed is False:
            logging.info("Re-adding to index...")
            reindex_document(filename_without_ext)

        if is_summary_request(query):
            query = modify_query_for_summary(query, filename_without_ext, model)

        chat_engine = initialize_chat_engine(index, filename_without_ext)
        streaming_response = chat_engine.stream_chat(query) # takes 10 seconds!!

        def generate() -> Generator[str, None, None]:
            yield from streaming_response.response_gen

        return StreamingResponse(generate(), media_type="text/plain")
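One thing we're testing is whether rebuilding the index on every request accounts for part of the 10 seconds; caching it per model is a cheap experiment (a sketch, assuming initialize_index is deterministic per model name):

Plain Text
from functools import lru_cache


@lru_cache(maxsize=4)
def get_cached_index(model: str):
    # Build the index once per model name and reuse it across requests.
    return initialize_index(model)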
19 comments
Hi @Logan M. We noticed something strange whilst removing our document store and index store from our LlamaIndex storage context. Basically, we had the idea that our docstore and index store were not really useful, since our vector store (PGVector) already contains all of the necessary info for our app to function.

So our question is: why does our app still fully work after removing the MongoDB-backed document store and index store from our storage context?

We changed:

Plain Text
document_store = MongoDocumentStore.from_uri(uri=MONGO_DB_URL)
index_store = MongoIndexStore.from_uri(uri=MONGO_DB_URL)

vector_store = PGVectorStore.from_params(
    async_connection_string=f"postgresql+asyncpg://{user}:{password}@{host}:{port}/{database}",
    connection_string=f"postgresql+psycopg2://{user}:{password}@{host}:{port}/{database}?sslmode=require",
    table_name=PG_VECTOR_DATABASE_DOC_TABLE_NAME,
    embed_dim=1536,
    hybrid_search=True,
    use_jsonb=True,
)

storage_context = StorageContext.from_defaults(
    docstore=document_store,
    index_store=index_store,
    vector_store=vector_store,
)


To

Plain Text
vector_store = PGVectorStore.from_params(
    async_connection_string=f"postgresql+asyncpg://{user}:{password}@{host}:{port}/{database}",
    connection_string=f"postgresql+psycopg2://{user}:{password}@{host}:{port}/{database}?sslmode=require",
    table_name=PG_VECTOR_DATABASE_DOC_TABLE_NAME,
    embed_dim=1536,
    hybrid_search=True,
    use_jsonb=True,
)

storage_context = StorageContext.from_defaults(
    vector_store=vector_store,
)


It would be really nice if you could help us understand why the docstore and index store are relevant in the first place, and what the implications of removing them could be.
3 comments
Hi, what does num_workers do in the keyword extractor? Is this the number of CPU threads used?

Example:
Plain Text
KeywordExtractor(llm, keywords=5, num_workers=24),
4 comments
Niels
·

OpenAI

Is there a way to make sure that LlamaIndex does not send a request to OpenAI that is bigger than the context window allowed for a specific LLM?

For example, some of our queries use a high similarity_top_k, which results in a lot of context data. This works fine for GPT-4 but exceeds the GPT-3.5-turbo context window.
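The direction we've been considering is a node postprocessor that trims retrieved nodes to a token budget before synthesis (a sketch; the class is our own, and the budget number is arbitrary):

Plain Text
from typing import List, Optional

import tiktoken
from llama_index.core.postprocessor.types import BaseNodePostprocessor
from llama_index.core.schema import NodeWithScore, QueryBundle


class TokenBudgetPostprocessor(BaseNodePostprocessor):
    """Drop lowest-ranked nodes once a running token budget is exceeded."""

    budget: int = 12000  # arbitrary; pick per target model

    def _postprocess_nodes(
        self,
        nodes: List[NodeWithScore],
        query_bundle: Optional[QueryBundle] = None,
    ) -> List[NodeWithScore]:
        enc = tiktoken.get_encoding("cl100k_base")
        kept, used = [], 0
        for node_with_score in nodes:  # nodes arrive sorted by similarity
            tokens = len(enc.encode(node_with_score.node.get_content()))
            if used + tokens > self.budget:
                break
            kept.append(node_with_score)
            used += tokens
        return kept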
24 comments
Just tried upgrading llama-index (Python) from 0.9.30 to 0.9.40 and got this error. Is this common, and how do I fix it?

Plain Text
web-1          | [2024-02-01 10:55:21 +0000] [8] [ERROR] Exception in worker process
web-1          | Traceback (most recent call last):
web-1          |   File "/usr/local/lib/python3.11/site-packages/llama_index/storage/kvstore/mongodb_kvstore.py", line 69, in from_uri
web-1          |     from motor.motor_asyncio import AsyncIOMotorClient
web-1          | ModuleNotFoundError: No module named 'motor'
web-1          | 
web-1          | During handling of the above exception, another exception occurred:


Edit: It seems the Mongo driver changed from pymongo to motor for some reason. Is this correct? Unfortunately, I needed to change my requirements.txt.
16 comments
Hi there, I was curious whether it's possible to quickly check if a specific document is indexed or not. We currently do this the following way, but it is very slow (~4 seconds).

Plain Text
filename_without_ext = "bla"
index = initialize_index(model)
filters = MetadataFilters(filters=[ExactMatchFilter(key="doc_id", value=filename_without_ext)])
document_is_not_indexed = len(
    index.as_retriever(filters=filters, similarity_top_k=1).retrieve("some text"),
) == 0
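For comparison, something like this would skip the embedding round-trip entirely (a sketch; it assumes the docstore is populated and keyed by the same id as our doc_id metadata, which we haven't verified):

Plain Text
# Sketch: ask the docstore directly instead of running a retrieval.
document_is_indexed = index.docstore.document_exists(filename_without_ext)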
6 comments