Structured input to output

Hey my dudes, I'm back at it again. What's the best tool for doing specific data retrieval from an unstructured input, where the fields to fill are defined up front by a specific structured input?
I thought of something like this, either with a program that knows what to search for and fills in multiple values

At first, the input:

Plain Text
{
  "languages": [
    { "Type": "", "years": "" },
    { "Type": "", "years": "" }
  ]
}

And the output:

Plain Text
{
  "languages": [
    { "Type": "Python", "years": "3" },
    { "Type": "java", "years": "1" }
  ]
}

(each value found in the doc, ideally all at the same time)
Is it possible to do a query that can fill all the values at once, in one query? Or do I need to retrieve one value at a time and fill it into my structured output?
Since I want to retrieve real information and not have the model imagine things (like "create an example album"), what's the best way to do that?
I don't think any existing tool matches my request for now
Or even if I define the language type up front and only want the years filled in
I think this is possible with the pydantic programs (or the newer pydantic output parsers)

For example

Plain Text
from typing import List

from pydantic import BaseModel

class Experience(BaseModel):
    skill_name: str
    num_years_experience: int

class Experiences(BaseModel):
    experiences: List[Experience]
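
One hedged sketch of how that schema could be plugged into a pydantic program (assuming the OpenAIPydanticProgram API; the prompt wording and the resume_text variable are made up for illustration, and it relies on the OpenAI function calling API):

Plain Text
from typing import List

from pydantic import BaseModel
from llama_index.program import OpenAIPydanticProgram

class Experience(BaseModel):
    skill_name: str
    num_years_experience: int

class Experiences(BaseModel):
    experiences: List[Experience]

# Program that asks the LLM to fill the Experiences schema from free text
program = OpenAIPydanticProgram.from_defaults(
    output_cls=Experiences,
    prompt_template_str=(
        "Extract every programming language and its years of experience "
        "from the following text:\n{resume_text}"
    ),
    verbose=True,
)

# Returns an Experiences instance, e.g. skill_name="Python", num_years_experience=3
result = program(resume_text="3 years of Python and 1 year of Java ...")
print(result.experiences)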
isn't it just hallucinating info here?
(attachment: image.png)
just generating a fake album that matches the structured request?
or does that info come from specific data?
Nah, it's hallucinating. The only input it has is that input string a cell or two above. You could modify that string to include details from the document you are parsing

Or, you can use the output parser to use this with an index in a more normal manner lol
actually nvm
I thought we merged the output parser
yeah so the schema I created on the first screen isn't possible for now
haha lets go
Can I have more details on the JSON format? Like having this:
Plain Text
class Experience(BaseModel):
    skill_name: str
    num_years_experience: int

class Experiences(BaseModel):
    experiences: List[Experience]
I mean, I still think it's possible imo
Yea this might work too! I haven't tried this yet either
So using pydantic, you can convert to/from JSON super easily. But constructing it as a class is just really easy (and it's what the function call API from OpenAI expects under the hood: a pydantic class converted to its JSON schema)
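
A quick hedged sketch of that pydantic round-trip (pydantic v1 style, reusing the illustrative Experience classes from above):

Plain Text
from typing import List

from pydantic import BaseModel

class Experience(BaseModel):
    skill_name: str
    num_years_experience: int

class Experiences(BaseModel):
    experiences: List[Experience]

# class instance -> JSON string
exp = Experiences(experiences=[Experience(skill_name="Python", num_years_experience=3)])
print(exp.json())

# JSON string -> class instance
parsed = Experiences.parse_raw('{"experiences": [{"skill_name": "java", "num_years_experience": 1}]}')

# JSON schema (this is what the OpenAI function call API consumes under the hood)
print(Experiences.schema_json(indent=2))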
I tried to use pydantic but I couldn't use it with Azure :/
So I couldn't use the class I created
The only thing that "matched" what I wanted was the langchain thing
Plain Text
from langchain.output_parsers import ResponseSchema, StructuredOutputParser
from llama_index.output_parsers import LangchainOutputParser
from llama_index.prompts.prompts import QuestionAnswerPrompt, RefinePrompt
from llama_index.prompts.default_prompts import DEFAULT_REFINE_PROMPT_TMPL

# define output schema
response_schemas = [
    ResponseSchema(name="doc_name", description="document name"),
    ResponseSchema(name="data", description="document creation date"),
    ResponseSchema(name="site", description="which organization created the document?"),
    ResponseSchema(name="apave", description="what is the name of the APAVE representative"),
    ResponseSchema(name="8.5", description="what is the title of section 8.5"),
    ResponseSchema(name="page8.5", description="starting page of section 8.5"),
    ResponseSchema(name="19011", description="which standard should be referenced for the guidelines of section 9.2.2"),
]

# define output parser
lc_output_parser = StructuredOutputParser.from_response_schemas(response_schemas)
output_parser = LangchainOutputParser(lc_output_parser)

# base prompt for the chatbot
# from llama_index import Prompt
template = (
...
)

# format each prompt with output parser instructions
fmt_qa_tmpl = output_parser.format(template)
fmt_refine_tmpl = output_parser.format(DEFAULT_REFINE_PROMPT_TMPL)
qa_prompt = QuestionAnswerPrompt(fmt_qa_tmpl, output_parser=output_parser)
refine_prompt = RefinePrompt(fmt_refine_tmpl, output_parser=output_parser)

# query index
query_engine = index.as_query_engine(
    similarity_top_k=3,
    text_qa_template=qa_prompt,
    refine_template=refine_prompt,
)
response = query_engine.query(
    "fill in the answers in a detailed and precise way, be 100% sure",
)

print(str(response))

with open("donnes.json", "w") as dt:
    dt.write(str(response))
doing something like that
Yea, it needs support for the function calling API. Has Azure added that yet? If they have, we should patch that on our end.

We are actually also about to merge our own Azure LLM class, which is hopefully less error-prone compared to langchain lol
I'll take a look, I'm not sure but I think so
https://stackoverflow.com/questions/76543136/how-to-do-function-calling-using-azure-openai I guess I was wrong; someone there says it's "supposed" to support it, but in fact it's not working
Lol sheesh, sounds like Microsoft 🥲
Classic Microsoft hahah
one day it's working, one day it's not
we can only pray with them