Hey. I'm getting this weird key error

At a glance

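The community member is getting a KeyError when creating an index from HTML documents extracted with the unstructured module. They rebuilt the index, but the key in the error kept changing, from -1 to 0, and is now stuck on 1. Another member could not reproduce the error with other HTML files and suggested that it may mean the vector store and docstore hold different data. The original poster later found that only one specific HTML file triggered the error, and that the same document indexed without errors in Chroma DB. The thread also confirms that, with only an OpenAI API key set, the query engine defaults to GPT-3.5 and text-embedding-ada-002. There is no explicitly marked answer to the issue.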


Fried cheese

Hey. I'm getting this weird KeyError after creating an index from documents extracted via the unstructured module (HTML in this case). I rebuilt the index, and the key changed from -1 to 0, and now it's stuck on 1. Any idea?

Attachment


Logan M

How did you build your index?

Fried cheese

Attachment

Fried cheese

Attachment

Fried cheese

https://github.com/run-llama/llama_index/issues/1769

I referred to this, but I built the index in a fresh notebook.

Logan M

Seems to work for me

https://colab.research.google.com/drive/1vmeIUYmlUHodixrsYYzviHAUcaoLTTvl?usp=sharing

Fried cheese

It worked for me with a PDF, so I was thinking it was an issue with HTML files. I tried another file, but I got the same error. I'm currently trying another HTML file in a new notebook; it takes a while to load llama.

Logan M

I'll try with a few other files

Logan M

Hmm, I tried with a random HTML file as well, and it still worked:


!wget 'https://www.engadget.com/hyperloop-one-is-shutting-down-030049106.html'

Logan M

If I can reproduce it, I can look into it further.

Fried cheese

Alright. I'll try a file or two more; if I get it again, can I share the notebook with you in a DM, if you don't mind?

If this doesn't work, I'm planning to get the text from the HTML directly (with BeautifulSoup), and then find a way to create embeddings and store them in my index.
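For anyone taking the route described above, here is a minimal sketch of pulling the visible text out of an HTML file. It swaps in the standard library's html.parser.HTMLParser in place of BeautifulSoup so it runs without extra packages. The class name TextExtractor, the helper html_to_text, and the set of skipped tags are illustrative assumptions, not part of LlamaIndex or unstructured. The resulting string could then be passed to whatever embedding and indexing step you use.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text from an HTML document (hypothetical helper)."""

    # Assumption: contents of these tags are not treated as visible text.
    SKIP_TAGS = {"script", "style", "noscript", "head"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside a skipped element
        self._parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self._parts.append(data.strip())

    def text(self) -> str:
        # Join the fragments and collapse all runs of whitespace to one space.
        return " ".join(" ".join(self._parts).split())


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    sample = (
        "<html><head><title>T</title></head><body><h1>Hello</h1>"
        "<script>var x=1;</script><p>World  !</p></body></html>"
    )
    print(html_to_text(sample))  # Hello World !
```

Comparing this plain-text output for the failing file and a working file may also help spot what is unusual about the problematic document.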

Logan M

Yea feel free to share any notebooks that reproduce the issue 🙏 Would love to know why that happens.

To me, the error indicates that the vector store and the docstore somehow do not contain the same data.
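To make the suspected failure mode concrete, here is a toy sketch of that kind of mismatch: if a vector store returns a node ID that the docstore does not hold, a plain dictionary lookup raises KeyError. The lists, dictionaries, IDs, and helper names are hypothetical stand-ins, not LlamaIndex's actual storage classes; the point is only to compare the two sets of keys.

```python
def find_mismatched_ids(vector_ids, docstore):
    """Compare the IDs held by a toy vector store with the keys of a toy docstore.

    Returns (missing_from_docstore, missing_from_vector_store) as sorted lists.
    """
    vector_set = set(vector_ids)
    doc_set = set(docstore)
    return sorted(vector_set - doc_set), sorted(doc_set - vector_set)


def fetch_nodes(retrieved_ids, docstore):
    """Resolve retrieved IDs to stored text; an ID absent from the docstore raises KeyError."""
    return [docstore[node_id] for node_id in retrieved_ids]


if __name__ == "__main__":
    # Hypothetical state: the vector store has three IDs, the docstore only two.
    vector_ids = ["node-a", "node-b", "node-c"]
    docstore = {"node-a": "chunk A", "node-c": "chunk C"}

    print(find_mismatched_ids(vector_ids, docstore))  # (['node-b'], [])

    try:
        # A query that retrieves "node-b" fails at the docstore lookup.
        fetch_nodes(["node-c", "node-b"], docstore)
    except KeyError as err:
        print("KeyError on lookup:", err)
```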

Fried cheese

Hey. I tried all of it in a fresh notebook, and it turns out the error only occurs with that specific file. I wasn't expecting that, though.

Fried cheese

A different html file worked

Fried cheese

To confirm, I built the index with both files, back and forth, in the same session.

Fried cheese

Hey. So does just defining my OpenAI API key in the environment make the query engine use GPT-3.5? (I don't see any LLM defined here.)

Logan M

Yeah, 3.5 is the default, and text-embedding-ada-002 is the default embedding model.
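A minimal sketch of the setup being discussed: the OpenAI Python client reads the key from the OPENAI_API_KEY environment variable, so setting that variable before building the index is typically all that is needed for the defaults mentioned above to be used. The helper name openai_key_configured is hypothetical, and the defaults (GPT-3.5 and text-embedding-ada-002) are as stated in this thread; they may differ in other LlamaIndex releases.

```python
import os

# The environment variable the OpenAI Python client reads by default.
OPENAI_KEY_VAR = "OPENAI_API_KEY"


def openai_key_configured(environ=None) -> bool:
    """Return True if a non-blank OpenAI API key is present (hypothetical helper)."""
    env = os.environ if environ is None else environ
    return bool(env.get(OPENAI_KEY_VAR, "").strip())


if __name__ == "__main__":
    if not openai_key_configured():
        # In a notebook you would normally set this once, before building the index:
        # os.environ["OPENAI_API_KEY"] = "sk-..."
        print(f"{OPENAI_KEY_VAR} is not set; the default OpenAI models cannot be called.")
    else:
        print(f"{OPENAI_KEY_VAR} is set.")
```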

Fried cheese

understood!

Fried cheese

Just for anyone who stumbles on this looking for help: I used the same document with Chroma DB instead and didn't get any errors.
