We use PydanticProgramExtractor to get a list of tags as well as a summary, and we see a strange error where content is repeated endlessly. This then causes the validation to fail.

This is our code:

Plain Text
# Imports implied by the snippet (paths as of llama_index 0.10.x)
from llama_index.core.extractors import PydanticProgramExtractor
from llama_index.program.openai import OpenAIPydanticProgram

EXTRACT_TEMPLATE_STR = """\
Here is the content of a section:
----------------
{context_str}
----------------
Given the contextual information, extract out a {class_name} object.\
"""

openai_program_summary = OpenAIPydanticProgram.from_defaults(
    llm=get_llm(model=MODEL_BASIC),
    output_cls=NodeSummaryMetadata,
    prompt_template_str="You must answer in the same language as the context given. {input}",
    extract_template_str=EXTRACT_TEMPLATE_STR,
)

openai_program_keywords = OpenAIPydanticProgram.from_defaults(
    llm=get_llm(model=MODEL_BASIC),
    output_cls=NodeKeywordsMetadata,
    prompt_template_str="You must answer in the same language as the context given. {input}",
    extract_template_str=EXTRACT_TEMPLATE_STR,
)

summary_extractor = PydanticProgramExtractor(program=openai_program_summary, input_key="input", num_workers=12)
keywords_extractor = PydanticProgramExtractor(program=openai_program_keywords, input_key="input", num_workers=12)
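For context, the thread never shows the two `output_cls` models. Judging from the `excerpt_keywords` field visible in the error payload, they are presumably small Pydantic models along these lines (everything here except the `excerpt_keywords` field name is an assumption):

```python
from typing import List

from pydantic import BaseModel, Field


class NodeSummaryMetadata(BaseModel):
    # Assumed shape -- this model is not shown in the thread.
    section_summary: str = Field(description="Concise summary of the node's content.")


class NodeKeywordsMetadata(BaseModel):
    # `excerpt_keywords` matches the field name visible in the error payload.
    excerpt_keywords: List[str] = Field(description="Keywords extracted from the node.")
```

Bounding the keyword list (e.g. "at most 10 keywords" in the field description, or a length validator) can reduce the chance of the runaway generation shown in the error below.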
And here is an example of an error:

Plain Text
1 validation error for NodeKeywordsMetadata
__root__
  Unterminated string starting at: line 1 column <NUMBER> (char <NUMBER>) (type=value_error.jsondecode; msg=Unterminated string starting at; doc={"excerpt_keywords":["MMS-feasibility","EFX-feasible","strong envy","reallocation","valuation function","PR algorithm","2-partition","3-partition","invariants","Lemma 4","allocation","agents","bundle","scenario","allocation scenario","allocation X","agent 1","agent 2","agent 3","valuation functions","valid partition","output","favourite bundle","favourite","max","min","feasibility","case","observation","proof","analysis","valid","run","pick","choose","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe","observe; pos=<NUMBER>; lineno=1; colno=<NUMBER>)
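The validation failure wraps a plain JSON decode error: the model kept emitting `"observe"` tokens until it hit its output limit, leaving the final string unterminated. A minimal stdlib reproduction of that decode error (the payload here is made up):

```python
import json

# Raw LLM output cut off mid-string, so the JSON document never closes.
truncated = '{"excerpt_keywords": ["allocation", "agents", "observe'

try:
    json.loads(truncated)
except json.JSONDecodeError as exc:
    print(exc.msg)  # Unterminated string starting at
```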


Anything we are doing wrong here?
Nothing you are doing wrong, just the LLM having a freak out it seems lol
Something to do with the content it is reading, I guess?
What LLM are you using?
Try playing with the temperature a bit
We use a temp of 0.0. This is Azure GPT-3.5 Turbo 0125, but we also saw this same situation using the same model with OpenAI's API.
It is quite rare though so can't really reproduce it easily...
yea seems like something in the input prompt is causing the LLM to just freak out -- not much can be done I think, besides maybe figuring out what piece of text caused this and if it needs to be cleaned?
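One cheap guard along those lines: before handing the raw output to validation (or when deciding which nodes to clean and retry), flag outputs where a single token repeats many times in a row, as in the "observe" run above. A hypothetical stdlib sketch (`looks_degenerate` and the threshold are made up for illustration):

```python
import re


def looks_degenerate(text: str, threshold: int = 10) -> bool:
    """Return True if any single word repeats `threshold`+ times in a row.

    Catches runaway generations like "observe","observe","observe",...
    before (or instead of) letting JSON/Pydantic validation blow up.
    """
    pattern = rf"\b(\w+)\b(?:\W+\1\b){{{threshold},}}"
    return re.search(pattern, text) is not None


raw = '{"excerpt_keywords":["' + '","'.join(["observe"] * 40) + '"]}'
print(looks_degenerate(raw))   # True
print(looks_degenerate('{"excerpt_keywords":["allocation","agents"]}'))  # False
```

Nodes that trip this check could be logged so you can inspect the source text and decide whether it needs cleaning before extraction.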