Updated 4 months ago

Hi there! I am getting embeddings from a 20k-page PDF, but suddenly I got this error:
BadRequestError: Error code: 400 - {'error': {'message': "'$.input' is invalid. Please check the API reference: https://platform.openai.com/docs/api-reference.", 'type': 'invalid_request_error', 'param': None, 'code': None}}

Why is this happening?

8 comments

Logan M

hmm that usually means there was a node/document without any content

Logan M

I thought I fixed this when doing from_documents() 🤔 Unless you are embedding another way?

alvarojauna

embed_model = OpenAIEmbedding()
splitter = SemanticChunker(
    buffer_size=1, breakpoint_percentile_threshold=95, embed_model=embed_model
)

nodes = splitter.get_nodes_from_documents(documents)

print(nodes[6].get_content())

alvarojauna

I'm doing it this way

Logan M

ahhh

alvarojauna

and I'm loading the data this way: documents = SimpleDirectoryReader("/content/data").load_data()

Logan M

Yea I'm betting either a) the semantic chunker made a chunk with no text, or b) one of your initial documents has no text

a) needs a PR to fix

b) you can probably fix yourself

filtered_docs = [doc for doc in documents if doc.text]
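
For anyone who wants to see which items are empty before dropping them, here is a rough sketch of the same filtering idea. It is not part of the library: `FakeDoc`, `has_content`, and `split_empty` are hypothetical names made up for illustration, and `FakeDoc` only stands in for a real document so the snippet runs on its own. It also drops whitespace-only text, which is stricter than `if doc.text`; whether whitespace-only input triggers the same 400 error is an assumption.

```python
from dataclasses import dataclass


@dataclass
class FakeDoc:
    """Hypothetical stand-in for a loaded document; only `.text` is modeled."""
    text: str


def has_content(text):
    """True when text is a string containing at least one non-whitespace character."""
    return isinstance(text, str) and bool(text.strip())


def split_empty(items, get_text):
    """Return (kept_items, dropped_indices) based on whether each item's text has content."""
    kept, dropped = [], []
    for i, item in enumerate(items):
        if has_content(get_text(item)):
            kept.append(item)
        else:
            dropped.append(i)  # remember the position so the empty page can be inspected
    return kept, dropped


# Illustrative data only: two real pages, one empty page, one whitespace-only page.
docs = [FakeDoc("page one"), FakeDoc(""), FakeDoc("   \n"), FakeDoc("page four")]
filtered_docs, empty_idx = split_empty(docs, lambda d: d.text)
print(len(filtered_docs), empty_idx)
```

With real objects, the same helper could presumably be called as `split_empty(documents, lambda d: d.text)` before splitting, and as `split_empty(nodes, lambda n: n.get_content())` on the splitter output before embedding, as a workaround for case a) until it is fixed upstream.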

alvarojauna

Okay, I will try 👍