Node parser

Hello, when using the get_nodes_from_documents method of UnstructuredElementNodeParser together with OpenAI, I get:

Embeddings have been explicitly disabled. Using MockEmbedding. 0it [00:00, ?it/s]

Then, whenever I try to get the node_mappings dictionary, it is always empty, no matter which HTML file I use.
Below is the full code and the output:

from llama_index.readers.file.flat_reader import FlatReader
from llama_index.node_parser import UnstructuredElementNodeParser
from llama_index.llms import OpenAI
from pathlib import Path

llm = OpenAI(model="gpt-3.5-turbo", api_key="sk-")

# !wget "https://www.dropbox.com/scl/fi/mlaymdy1ni1ovyeykhhuk/tesla_2021_10k.htm?rlkey=qf9k4zn0ejrbm716j0gg7r802&dl=1" -O tesla_2021_10k.htm

reader = FlatReader()
docs_2021 = reader.load_data(Path("tesla_2021_10k.htm"))

node_parser = UnstructuredElementNodeParser(llm=llm)
raw_nodes_2021 = node_parser.get_nodes_from_documents(docs_2021)
base_nodes_2021, node_mappings_2021 = node_parser.get_base_nodes_and_mappings(raw_nodes_2021)
print(len(node_mappings_2021))

Output:

Embeddings have been explicitly disabled. Using MockEmbedding. 0it [00:00, ?it/s]
0

12 comments

Logan M

I don't think that block of code is creating that warning?

That block of code specifically happens when you set embed_model=None somewhere, like in the service context

Chris

That's the only block of code I run, and I get the warning from the get_nodes_from_documents method. I didn't set embed_model=None anywhere.
Also, even if I explicitly set the embedding model, I still get the warning:

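import os
from pathlib import Path

from llama_index import ServiceContext, set_global_service_context
from llama_index.embeddings import OpenAIEmbedding
from llama_index.llms import OpenAI
from llama_index.node_parser import UnstructuredElementNodeParser
from llama_index.readers.file.flat_reader import FlatReader

# Set the key before creating the OpenAI clients so both can read it
os.environ['OPENAI_API_KEY'] = 'sk-'
embed_model = OpenAIEmbedding()
llm = OpenAI(model="gpt-3.5-turbo")
# Register the LLM and the embedding model globally
service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)
set_global_service_context(service_context)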

reader = FlatReader()
docs_2021 = reader.load_data(Path("tesla_2021_10k.htm"))

node_parser = UnstructuredElementNodeParser(llm=llm)
raw_nodes_2021 = node_parser.get_nodes_from_documents(docs_2021)

It seems get_nodes_from_documents doesn't recognize any embedding model, and maybe that is why node_mappings is always an empty dict.

Stelios

@Chris
@Logan M
I also tried to reproduce the error using this notebook:
https://docs.llamaindex.ai/en/stable/examples/query_engine/sec_tables/tesla_10q_table.html
I see the same issue...

Logan M

somehow, unstructured is not finding any tables... not sure if they updated their package or what. Trying to debug

Logan M

unstructured is just failing hard I think. All the tables it finds are irregular, and we can't convert them into dataframes. I might try downgrading my unstructured version a bit and see if something changed.
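
To illustrate what an "irregular" table might look like here, below is a minimal sketch. It is not from the thread and does not use unstructured or llama_index. It assumes "irregular" means that the rows of a table have differing numbers of cells, which is only one plausible reading, and it ignores colspan and rowspan. TableRowCounter and is_regular_table are hypothetical names.

```python
from html.parser import HTMLParser


class TableRowCounter(HTMLParser):
    """Counts the <td>/<th> cells in each <tr> of an HTML snippet."""

    def __init__(self):
        super().__init__()
        self.row_lengths = []   # one entry per completed <tr>
        self._current = None    # cell count of the row being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._current = 0
        elif tag in ("td", "th") and self._current is not None:
            self._current += 1

    def handle_endtag(self, tag):
        if tag == "tr" and self._current is not None:
            self.row_lengths.append(self._current)
            self._current = None


def is_regular_table(table_html: str) -> bool:
    """True if the snippet has at least one row and all rows have the same cell count.

    Assumption: this is only a rough stand-in for "can be turned into a dataframe".
    """
    parser = TableRowCounter()
    parser.feed(table_html)
    parser.close()
    lengths = parser.row_lengths
    return len(lengths) > 0 and len(set(lengths)) == 1
```

A check like this could be run on the raw HTML of a few tables from the filing to see whether ragged rows are a likely cause before changing package versions.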

Logan M

aha figured it out

Logan M

pip install unstructured==0.10.30 seems to work. Something after that changed their table parsing 🤔
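
Since the fix is a version pin, it can help to check which unstructured version is actually installed when reproducing this. Below is a minimal sketch using only the Python standard library. The 0.10.30 threshold comes from the comment above; version_tuple, installed_unstructured_version and is_newer_than_known_good are hypothetical helper names, and the plain numeric comparison ignores pre-release suffixes.

```python
from importlib import metadata

# Version reported in this thread as working
KNOWN_GOOD = (0, 10, 30)


def version_tuple(version: str) -> tuple:
    """Turn '0.10.30' into (0, 10, 30); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def installed_unstructured_version():
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version("unstructured")
    except metadata.PackageNotFoundError:
        return None


def is_newer_than_known_good(version: str) -> bool:
    """True if the given version is strictly newer than the known-good pin."""
    return version_tuple(version) > KNOWN_GOOD


installed = installed_unstructured_version()
if installed is None:
    print("unstructured is not installed")
elif is_newer_than_known_good(installed):
    print(f"unstructured {installed} is newer than 0.10.30; table parsing may differ")
else:
    print(f"unstructured {installed} is at or below 0.10.30")
```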

Chris

Nice, this version seems to be working fine, but it still doesn't recognize my embed models. Did you also get the warning "Embeddings have been explicitly disabled. Using MockEmbedding." when testing?

Logan M

Yea that's expected actually

Logan M

It's fine. It's using a summary index under the hood to create summaries of tables

Logan M

And it's disabling the embed model specifically at that step

Logan M

So it's all good
