Updated 2 years ago

Hello, I'm getting random crashes when using GPT3 to calculate embeddings using GPTSimpleVectorIndex. This is the error msg:
01:17:38.145 error_code=None error_message="[''] is not valid under any of the given schemas - 'input'" error_param=None error_type=invalid_request_error message='OpenAI API error received' stream_error=False

Note that I upgraded GPT-Index to the latest version, and that it sometimes works fine when the document store contains fewer documents (so it's not an issue with my OpenAI API key).
25 comments
Hey @yoelk , hmm could you post a full stack trace?
Hey @jerryjliu0 Sure. Here's the full output (note that I've printed the document store which contains only a single 70KB text file)
what's the code that's triggering this error ?
documents = SimpleDirectoryReader(folder).load_data()
print(documents)
index = GPTSimpleVectorIndex(documents)
index.save_to_disk(filename)
st.session_state[f'{filename}'] = index
I guess the issue here is with creating the embeddings
When trying other text files, after I print the document store I get the below output from the embedding process which works fine:

2023-02-19 09:05:11.276 > [build_index_from_documents] Total LLM token usage: 0 tokens
2023-02-19 09:05:11.276 > [build_index_from_documents] Total embedding token usage: 98947 tokens
oh hm. under the hood by default we just call openai embedding api. do you think it's just not able to recognize certain chars?
It might be the case
I can send you the original doc
Unfortunately I didn't manage to debug it
@jerryjliu0 I came across this thread
https://community.openai.com/t/embeddings-create-improve-invalidrequesterror-message-a-is-not-valid-under-any-of-the-given-schemas-input-for-large-arrays/48982

Could it be that when chunking up the text you allow empty chunks to be created and forwarded to OpenAI's embedding API?
@yoelk nice find, that's very possible...
would you be able to send me some sample data/code? i can try to look into a fix
@jerryjliu0 Took me some time to find an example, but I randomly tried different documents I found on the net and found the attached one, which causes the same error
thanks @yoelk ! i'll try taking a stab at this
Thanks, @jerryjliu0 I basically used the directory reader with only this file in the folder and then I used the GPTSimpleVectorIndex with chunk size=256
Hey @jerryjliu0 , did you get the chance to look at it? I'm still getting the same error on some files
INFO:openai:error_code=None error_message="[''] is not valid under any of the given schemas - 'input'" error_param=None error_type=invalid_request_error message='OpenAI API error received' stream_error=False
@yoelk that usually means that you're sending a blank string as a doc to openai's embedding api
can you double check your Document objects and make sure that none of them contain blank strings?
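A minimal sketch of that check, assuming the text of each document or chunk has already been pulled out into a plain list of strings. The helper names `find_blank_texts` and `drop_blank_texts` are hypothetical and not part of GPT-Index or the OpenAI client. Treating whitespace-only strings as blank is a precaution on my part; the error in this thread was only observed for `['']`.

```python
def find_blank_texts(texts):
    """Return the indices of entries that are None, empty, or whitespace-only."""
    return [i for i, t in enumerate(texts) if t is None or not t.strip()]


def drop_blank_texts(texts):
    """Return a new list without the blank entries, preserving order."""
    return [t for t in texts if t is not None and t.strip()]


# Hypothetical chunks, standing in for whatever would be sent for embedding.
chunks = ["first chunk", "", "   \n", "second chunk"]

print(find_blank_texts(chunks))   # [1, 2]
print(drop_blank_texts(chunks))   # ['first chunk', 'second chunk']
```

Running `find_blank_texts` on the strings right before they are embedded should show whether a blank entry is what triggers the `invalid_request_error`.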
@jerryjliu0 I agree, but that happens with the default chunking in GPTSimpleVectorIndex (tried different chunk sizes and issue reproduced in some of them). When I added Langchain's text splitter I had no issues.
Got it. You're saying you can repro this with the text above right? I can try it out
sounds good. i was able to repro. will look into a fix!
in the meantime yeah you can manually try plugging in a langchain text splitter
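For anyone who wants to see the idea behind the workaround without pulling in another library, here is a rough stand-in splitter. It is not LangChain's splitter and not GPT-Index's default chunking, just a sketch of a splitter that never emits an empty chunk. The function name `split_text`, the `"\n\n"` separator, and measuring `chunk_size` in characters rather than tokens are all assumptions made for the example.

```python
def split_text(text, chunk_size=256, separator="\n\n"):
    """Greedily pack separator-delimited pieces into chunks of at most
    chunk_size characters, skipping empty pieces along the way."""
    chunks = []
    current = ""
    for piece in text.split(separator):
        piece = piece.strip()
        if not piece:
            # Consecutive separators produce empty pieces; never keep them.
            continue
        # Hard-wrap any single piece that is longer than chunk_size.
        while len(piece) > chunk_size:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(piece[:chunk_size])
            piece = piece[chunk_size:]
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    # Final guard: drop anything that is still blank or whitespace-only.
    return [c for c in chunks if c.strip()]


# Hypothetical input with runs of blank lines, the kind of text that
# could yield empty chunks with a naive split.
sample = "a\n\n\n\n\n\nb\n\n" + "x" * 600
for chunk in split_text(sample):
    print(len(chunk))
```

The chunks from a splitter like this can then be wrapped in documents and passed to the index, so that no blank string reaches the embedding call.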
can confirm this also worked to fix it for me