The community member is working on getting the "unstructured" library to use PDF instead of HTML, and is running into an issue with OpenAI requests. The comments discuss the community member's attempts to pass their own language model (LLM) to the UnstructuredElementNodeParser, but they are encountering issues with the LLM not being able to output the expected JSON structure. The community members suggest trying a different LLM, such as Zephyr-7b-beta, but the issues persist. Ultimately, the consensus is that there may not be a reliable way to do this with open-source LLMs at the moment, and the library maintainers are working on improvements to the structured output functionality.
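For reference, the approach under discussion looks roughly like the sketch below (using pre-0.10 llama_index import paths, which match the traceback later in the thread). The PDF/HTML loading step and the choice of Zephyr-7b-beta via HuggingFaceLLM are assumptions for illustration, not the community member's exact code; the key point is passing a custom LLM to UnstructuredElementNodeParser instead of falling back to OpenAI.

```python
from pathlib import Path

from llama_index import ServiceContext, VectorStoreIndex
from llama_index.llms import HuggingFaceLLM  # assumed local-LLM wrapper for this sketch
from llama_index.node_parser import UnstructuredElementNodeParser
from llama_index.readers.file.flat_reader import FlatReader

# Load documents (assumed here to come from a PDF-to-HTML/text step via unstructured).
docs = FlatReader().load_data(Path("my_report.html"))

# Use a local LLM for the element/table summarization step instead of OpenAI.
local_llm = HuggingFaceLLM(
    model_name="HuggingFaceH4/zephyr-7b-beta",
    tokenizer_name="HuggingFaceH4/zephyr-7b-beta",
)

# Pass the custom LLM to the node parser.
node_parser = UnstructuredElementNodeParser(llm=local_llm)
nodes = node_parser.get_nodes_from_documents(docs)
base_nodes, node_mappings = node_parser.get_base_nodes_and_mappings(nodes)

# Keep OpenAI out of the indexing step as well by supplying a local setup.
service_context = ServiceContext.from_defaults(llm=local_llm, embed_model="local")
index = VectorStoreIndex(base_nodes, service_context=service_context)
```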
@Logan M I am just gonna make a thread. I am working on getting unstructured to use PDF instead of HTML, and I am running into an OpenAI request issue again.
Embeddings have been explicitly disabled. Using MockEmbedding.
0%| | 0/34 [00:00<?, ?it/s]
INFO:openai._base_client:Retrying request to /chat/completions in 0.945186 seconds
Retrying request to /chat/completions in 0.945186 seconds
This is where I am currently sitting.
File "/home/adctme/.local/lib/python3.10/site-packages/llama_index/output_parsers/utils.py", line 69, in extract_json_str raise ValueError(f"Could not extract json string from output: {text}") ValueError: Could not extract json string from output: