Find answers from the community

Niels
Joined September 25, 2024
We are randomly getting this error with the multimodal Azure OpenAI package:

Plain Text
vision_service-1        | pydantic_core._pydantic_core.ValidationError: 2 validation errors for ChatMessage
vision_service-1        | blocks.0
vision_service-1        |   Unable to extract tag using discriminator 'block_type' [type=union_tag_not_found, input_value={'type': 'text', 'text': 'Describe what you see'}, input_type=dict]
vision_service-1        |     For further information visit https://errors.pydantic.dev/2.9/v/union_tag_not_found
vision_service-1        | blocks.1
vision_service-1        |   Unable to extract tag using discriminator 'block_type' [type=union_tag_not_found, input_value={'type': 'image_url', 'im...gg==', 'detail': 'low'}}, input_type=dict]

Does anyone have an idea what's going on? Nothing changed in our code.
17 comments
Hey guys, is there a way to force JSON mode over function calling for JSON output with Pydantic models (OpenAI)? We are noticing that 4o-mini is much worse at function calling than 3.5-turbo (which we want to deprecate).

Also, sources talking about this: https://news.ycombinator.com/item?id=41173223
9 comments
Our CI test pipelines using LlamaIndex are suddenly failing:

Plain Text
ImportError while importing test module '/home/runner/work/xxx/test_prompts.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/hostedtoolcache/Python/3.12.4/x64/lib/python3.12/importlib/__init__.py:90: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
test_prompts.py:10: in <module>
    from index import add_documents_to_index
../../index.py:22: in <module>
    from llama_cloud import FilterCondition
/opt/hostedtoolcache/Python/3.12.4/x64/lib/python3.12/site-packages/llama_cloud/__init__.py:3: in <module>
    from .types import (
/opt/hostedtoolcache/Python/3.12.4/x64/lib/python3.12/site-packages/llama_cloud/types/__init__.py:21: in <module>
    from .base import Base
E   ImportError: cannot import name 'Base' from 'llama_cloud.types.base' 


Does anyone have an idea why?
12 comments
Hey guys, just wondering if there are any best practices for querying with structured outputs using Pydantic. We are running a structured output call where the class has one property that is a union (it can be one of two Pydantic classes). This causes a lot of validation errors, because the LLM doesn't seem to understand the union.

Would love to hear if anyone has ideas or has also experienced this. For context, we are using GPT-4o.
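For illustration, a minimal sketch of the shape described (class and field names are hypothetical); giving the union an explicit Literal tag and a discriminator is one thing we are experimenting with, since it makes the branch unambiguous for both Pydantic and the LLM:

Plain Text
from typing import Literal, Union

from pydantic import BaseModel, Field


class TextSlide(BaseModel):
    kind: Literal["text"] = "text"  # explicit tag the LLM can emit verbatim
    body: str


class ChartSlide(BaseModel):
    kind: Literal["chart"] = "chart"
    title: str


class Slide(BaseModel):
    # The union property described above; the discriminator tells Pydantic
    # (and, via the JSON schema, the LLM) exactly which branch applies.
    content: Union[TextSlide, ChartSlide] = Field(discriminator="kind")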
11 comments
We're seeing this SyntaxWarning a lot locally. Is this an issue?

/usr/local/lib/python3.12/site-packages/llama_cloud/types/metadata_filter.py:20: SyntaxWarning: invalid escape sequence '*'
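If it's just noise, a warnings filter should quiet it until the upstream escape sequence is fixed (a sketch; it has to run before llama_cloud is first imported, since SyntaxWarnings fire when the module is compiled):

Plain Text
import warnings

# Install the filter before importing llama_cloud; the warning is emitted
# when the offending module is first compiled.
warnings.filterwarnings("ignore", category=SyntaxWarning, module="llama_cloud.*")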
6 comments
Are there any helpers in LlamaIndex that we can use to check rate limits for specific LLMs?

We are trying to use our Azure OpenAI rate limits to the fullest and want to use OpenAI directly as a backup. Would this be possible?
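Something like this wrapper is what we have in mind (a minimal sketch; the package paths assume the post-0.10 split, the deployment name is made up, and endpoint/key config is assumed to come from env vars):

Plain Text
from llama_index.llms.azure_openai import AzureOpenAI
from llama_index.llms.openai import OpenAI
from openai import RateLimitError

azure_llm = AzureOpenAI(engine="my-gpt4o-deployment", model="gpt-4o")  # hypothetical deployment
fallback_llm = OpenAI(model="gpt-4o")


def complete_with_fallback(prompt: str) -> str:
    # Spend the Azure quota first; fall back to plain OpenAI once it runs out.
    try:
        return azure_llm.complete(prompt).text
    except RateLimitError:
        return fallback_llm.complete(prompt).text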
6 comments
Niels
·

Type

Why do I not get type safety here if I explicitly pass the output_cls?
9 comments
What do these options mean?

class PydanticProgramMode(str, Enum):
    """Pydantic program mode."""

    DEFAULT = "default"
    OPENAI = "openai"
    LLM = "llm"
    GUIDANCE = "guidance"
    LM_FORMAT_ENFORCER = "lm-format-enforcer"

(llama_index/core/types.py)
5 comments
Is there any way I can improve the performance of the retrieve call LlamaIndex makes? It is very slow in our chat application in production:

_CBEventType.RETRIEVE -> 7.412136 seconds
26 comments
Hi there, are there any ways to force the LLM to return a response in Markdown format?

I have the following function and it does not return a response in .md format:

Plain Text
def initialize_index(self, namespace: str, model_name="gpt-3.5-turbo-1106"):
    service_context = ServiceContext.from_defaults(
        chunk_size=512,
        llm=OpenAI(temperature=0.7, model=model_name),
        callback_manager=self.callback_manager,
    )
    pinecone_index = pinecone.Index(PINECONE_INDEX_ID)
    vector_store = PineconeVectorStore(
        pinecone_index=pinecone_index,
        namespace=namespace,
    )
    storage_context = StorageContext.from_defaults(
        docstore=self.docstore,
        index_store=self.index_store,
        vector_store=vector_store,
    )
    self.index = VectorStoreIndex.from_documents(
       [], storage_context=storage_context, service_context=service_context
    )


def query_stream(self, query: str, namespace: str, model: str):
    full_query = 'Please make sure to respond ONLY with content in the .md format as the response, here is my prompt: ' + query
    self.initialize_index(namespace, model)
    streaming_response = self.index.as_query_engine(
        streaming=True, similarity_top_k=20,
    ).query(full_query)

    for text in streaming_response.response_gen:
        yield text
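We also considered baking the Markdown instruction into the QA prompt template instead of prefixing every query (a sketch; the template wording is our own):

Plain Text
from llama_index.prompts import PromptTemplate

MD_QA_TEMPLATE = PromptTemplate(
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Using the context and no prior knowledge, answer the query. "
    "Format the entire answer as Markdown.\n"
    "Query: {query_str}\n"
    "Answer: "
)

streaming_response = self.index.as_query_engine(
    streaming=True,
    similarity_top_k=20,
    text_qa_template=MD_QA_TEMPLATE,
).query(query)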
1 comment
Niels
·

4o-mini

Has anyone else experienced that 4o-mini is a lot slower than 3.5-turbo?
7 comments
How do I set output_cls and similarity_top_k with the retry query engine?

Plain Text
# This is what I want, but output_cls and similarity_top_k are not accepted as args
base_query_engine = index.as_query_engine(llm=llm, filters=filters)

query_engine_presentation_content = RetryQueryEngine(
    query_engine=base_query_engine,
    output_cls=PresentationContentListV1,
    similarity_top_k=10,
)
query_engine_presentation_outline = RetryQueryEngine(
    query_engine=base_query_engine,
    output_cls=PresentationOutlineV1,
    similarity_top_k=10,
)
13 comments
Is there a way to directly create a query engine object/instance from OpenAI without having an index? I want to make sure my APIs for different use cases are the same.
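Something like a CustomQueryEngine that skips retrieval entirely is what we're picturing (a minimal sketch, assuming the post-0.10 core import paths):

Plain Text
from llama_index.core.llms import LLM
from llama_index.core.query_engine import CustomQueryEngine


class LLMOnlyQueryEngine(CustomQueryEngine):
    """Query engine that calls the LLM directly, with no index or retrieval."""

    llm: LLM

    def custom_query(self, query_str: str) -> str:
        # Exposes the same .query() API surface as index-backed engines.
        return str(self.llm.complete(query_str))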
16 comments
Hi, has anyone else suddenly run into issues where the Pydantic models returned by LlamaIndex are based on v1 and empty? Our app randomly starts throwing AttributeError: 'BaseModel' object has no attribute 'model_dump', which comes from pydantic.v1.

Versions:
pydantic>=2.6.4
llama-index==0.10.20
llama-index-embeddings-openai==0.1.6
llama-index-llms-openai==0.1.12
llama-index-program-openai==0.1.4
llama-index-vector-stores-postgres==0.1.3

--

We follow this exact example and are running into this issue:

https://docs.llamaindex.ai/en/stable/examples/query_engine/pydantic_query_engine/
2 comments
We're getting a lot of Pydantic validation errors in our prod environment for queries using GPT-3.5-turbo. Is this a common thing? Is there anything we can do to make sure this happens less? I would assume LlamaIndex uses JSON mode for OpenAI under the hood?
11 comments
Niels
·

Keyword

Is it a known issue that when using the keyword extractor on a document with some English terms but mostly other-language content, it stores all of the keywords in English?

Or, if we wanted to fix that, would we need to write a custom extractor?
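If a custom extractor is the way to go, this is roughly the shape we'd try (a sketch; the prompt wording is our own, and we're assuming "excerpt_keywords" mirrors the key the stock KeywordExtractor writes):

Plain Text
from typing import Any, Dict, List, Sequence

from llama_index.core.extractors import BaseExtractor
from llama_index.core.llms import LLM
from llama_index.core.schema import BaseNode

KEYWORD_PROMPT = (
    "Give {num} keywords for the following text, in the same language as "
    "the text itself (do not translate them into English):\n\n{text}"
)


class SameLanguageKeywordExtractor(BaseExtractor):
    llm: LLM
    keywords: int = 5

    async def aextract(self, nodes: Sequence[BaseNode]) -> List[Dict[str, Any]]:
        metadata_list = []
        for node in nodes:
            response = await self.llm.acomplete(
                KEYWORD_PROMPT.format(num=self.keywords, text=node.get_content())
            )
            # Same metadata key as the stock KeywordExtractor (assumption).
            metadata_list.append({"excerpt_keywords": response.text.strip()})
        return metadata_list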
10 comments
Hi there, can someone help me debug why my app is so slow?

For a basic "Summarize this document" query it takes 10 seconds to even start the streaming response (for a doc that has a few words of content and one node):

Plain Text
@app.post("/document/query")
def query_stream(
    query: str = Body(...),
    uuid_filename: str = Body(...),
    email: str = Body(...),
) -> StreamingResponse:
    subscription = get_user_subscription(email)
    model = MODEL_BASIC if subscription == "FREE" else MODEL_PREMIUM
    with token_counter(model, query_stream.__name__):
        filename_without_ext = uuid_filename.split(".")[0]

        # Create index
        index = initialize_index(model)

        document_is_indexed = does_document_exist_in_index(filename_without_ext)

        if document_is_indexed is False:
            logging.info("Re-adding to index...")
            reindex_document(filename_without_ext)

        if is_summary_request(query):
            query = modify_query_for_summary(query, filename_without_ext, model)

        chat_engine = initialize_chat_engine(index, filename_without_ext)
        streaming_response = chat_engine.stream_chat(query) # takes 10 seconds!!

        def generate() -> Generator[str, None, None]:
            yield from streaming_response.response_gen

        return StreamingResponse(generate(), media_type="text/plain")
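One thing we're testing is whether rebuilding the index on every request accounts for part of the 10 seconds; caching it per model is a cheap experiment (a sketch, assuming initialize_index is deterministic per model name):

Plain Text
from functools import lru_cache


@lru_cache(maxsize=4)
def get_cached_index(model: str):
    # Build the index once per model name and reuse it across requests.
    return initialize_index(model)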
19 comments
Hi @Logan M. We noticed something strange whilst removing our document store and index store from our LlamaIndex storage context. Basically, we had the idea that our docstore and index store were not really useful, since our vector store (PGVector) already contains all of the necessary info for our app to function.

So our question is: why does our app still fully work after removing the MongoDB-backed document store and index store from our storage context?

We changed:

Plain Text
document_store = MongoDocumentStore.from_uri(uri=MONGO_DB_URL)
index_store = MongoIndexStore.from_uri(uri=MONGO_DB_URL)

vector_store = PGVectorStore.from_params(
    async_connection_string=f"postgresql+asyncpg://{user}:{password}@{host}:{port}/{database}",
    connection_string=f"postgresql+psycopg2://{user}:{password}@{host}:{port}/{database}?sslmode=require",
    table_name=PG_VECTOR_DATABASE_DOC_TABLE_NAME,
    embed_dim=1536,
    hybrid_search=True,
    use_jsonb=True,
)

storage_context = StorageContext.from_defaults(
    docstore=document_store,
    index_store=index_store,
    vector_store=vector_store,
)


To

Plain Text
vector_store = PGVectorStore.from_params(
    async_connection_string=f"postgresql+asyncpg://{user}:{password}@{host}:{port}/{database}",
    connection_string=f"postgresql+psycopg2://{user}:{password}@{host}:{port}/{database}?sslmode=require",
    table_name=PG_VECTOR_DATABASE_DOC_TABLE_NAME,
    embed_dim=1536,
    hybrid_search=True,
    use_jsonb=True,
)

storage_context = StorageContext.from_defaults(
    vector_store=vector_store,
)


It would be really nice if you could help us understand why the docstore and index store are relevant in the first place, and what the implications of removing them could be.
3 comments
Hi, what does num_workers do in the keyword extractor? Is this the number of CPU threads used?

Example:
Plain Text
KeywordExtractor(llm, keywords=5, num_workers=24),
4 comments
Niels
·

OpenAI

Is there a way to make sure that LlamaIndex does not send a request to OpenAI that is bigger than the context window allowed for a specific LLM?

For example, some of our queries use a high similarity_top_k, which results in a lot of context data. This works fine for GPT-4 but exceeds the GPT-3.5-turbo context window.
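The direction we've been considering is a node postprocessor that trims retrieved nodes to a token budget before synthesis (a sketch; the class is our own, and the budget number is arbitrary):

Plain Text
from typing import List, Optional

import tiktoken
from llama_index.core.postprocessor.types import BaseNodePostprocessor
from llama_index.core.schema import NodeWithScore, QueryBundle


class TokenBudgetPostprocessor(BaseNodePostprocessor):
    """Drop lowest-ranked nodes once a running token budget is exceeded."""

    budget: int = 12000  # arbitrary; pick per target model

    def _postprocess_nodes(
        self,
        nodes: List[NodeWithScore],
        query_bundle: Optional[QueryBundle] = None,
    ) -> List[NodeWithScore]:
        enc = tiktoken.get_encoding("cl100k_base")
        kept, used = [], 0
        for node_with_score in nodes:  # nodes arrive sorted by similarity
            tokens = len(enc.encode(node_with_score.node.get_content()))
            if used + tokens > self.budget:
                break
            kept.append(node_with_score)
            used += tokens
        return kept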
24 comments
Just tried upgrading llama-index (Python) from 0.9.30 to 0.9.40 and got this error. Is this common, and how do I fix it?

Plain Text
web-1          | [2024-02-01 10:55:21 +0000] [8] [ERROR] Exception in worker process
web-1          | Traceback (most recent call last):
web-1          |   File "/usr/local/lib/python3.11/site-packages/llama_index/storage/kvstore/mongodb_kvstore.py", line 69, in from_uri
web-1          |     from motor.motor_asyncio import AsyncIOMotorClient
web-1          | ModuleNotFoundError: No module named 'motor'
web-1          | 
web-1          | During handling of the above exception, another exception occurred:


Edit: It seems the Mongo driver changed from pymongo to motor for some reason. Is this correct? Unfortunately, I needed to change my requirements.txt.
16 comments
Hi there, I was curious whether it's possible to quickly check if a specific document is indexed or not. We currently do this the following way, but it is very slow (~4 seconds).

Plain Text
filename_without_ext = "bla"
index = initialize_index(model)
filters = MetadataFilters(filters=[ExactMatchFilter(key="doc_id", value=filename_without_ext)])
document_is_not_indexed = len(
    index.as_retriever(filters=filters, similarity_top_k=1).retrieve("some text"),
) == 0
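For comparison, something like this would skip the embedding round-trip entirely (a sketch; it assumes the docstore is populated and keyed by the same id as our doc_id metadata, which we haven't verified):

Plain Text
# Sketch: ask the docstore directly instead of running a retrieval.
document_is_indexed = index.docstore.document_exists(filename_without_ext)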
6 comments