The community member is working on getting the "unstructured" library to use PDF instead of HTML, and is running into an issue with OpenAI requests. The comments discuss the community member's attempts to pass their own language model (LLM) to the UnstructuredElementNodeParser, but they are encountering issues with the LLM not being able to output the expected JSON structure. The community members suggest trying a different LLM, such as Zephyr-7b-beta, but the issues persist. Ultimately, the consensus is that there may not be a reliable way to do this with open-source LLMs at the moment, and the library maintainers are working on improvements to the structured output functionality.
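For reference, the approach under discussion looks roughly like the sketch below (using pre-0.10 llama_index import paths, which match the traceback later in the thread). The PDF/HTML loading step and the choice of Zephyr-7b-beta via HuggingFaceLLM are assumptions for illustration, not the community member's exact code; the key point is passing a custom LLM to UnstructuredElementNodeParser instead of falling back to OpenAI.

```python
from pathlib import Path

from llama_index import ServiceContext, VectorStoreIndex
from llama_index.llms import HuggingFaceLLM  # assumed local-LLM wrapper for this sketch
from llama_index.node_parser import UnstructuredElementNodeParser
from llama_index.readers.file.flat_reader import FlatReader

# Load documents (assumed here to come from a PDF-to-HTML/text step via unstructured).
docs = FlatReader().load_data(Path("my_report.html"))

# Use a local LLM for the element/table summarization step instead of OpenAI.
local_llm = HuggingFaceLLM(
    model_name="HuggingFaceH4/zephyr-7b-beta",
    tokenizer_name="HuggingFaceH4/zephyr-7b-beta",
)

# Pass the custom LLM to the node parser.
node_parser = UnstructuredElementNodeParser(llm=local_llm)
nodes = node_parser.get_nodes_from_documents(docs)
base_nodes, node_mappings = node_parser.get_base_nodes_and_mappings(nodes)

# Keep OpenAI out of the indexing step as well by supplying a local setup.
service_context = ServiceContext.from_defaults(llm=local_llm, embed_model="local")
index = VectorStoreIndex(base_nodes, service_context=service_context)
```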
@Logan M I am just gonna make a thread. I am working on getting unstructured to use PDF instead of HTML, and I am running into an OpenAI request issue again.
Embeddings have been explicitly disabled. Using MockEmbedding.
0%| | 0/34 [00:00<?, ?it/s]
INFO:openai._base_client:Retrying request to /chat/completions in 0.945186 seconds
Retrying request to /chat/completions in 0.945186 seconds
This is where I am currently sitting.
File "/home/adctme/.local/lib/python3.10/site-packages/llama_index/output_parsers/utils.py", line 69, in extract_json_str raise ValueError(f"Could not extract json string from output: {text}") ValueError: Could not extract json string from output: