Updated 2 months ago

Hi there! I am getting embeddings from a 20k-page PDF, but suddenly I got this error:
BadRequestError: Error code: 400 - {'error': {'message': "'$.input' is invalid. Please check the API reference: https://platform.openai.com/docs/api-reference.", 'type': 'invalid_request_error', 'param': None, 'code': None}}

Why is this happening?
8 comments
hmm that usually means there was a node/document without any content
I thought I fixed this when doing from_documents() 🤔 Unless you are embedding another way
embed_model = OpenAIEmbedding()
splitter = SemanticChunker(
buffer_size=1, breakpoint_percentile_threshold=95, embed_model=embed_model
)

nodes = splitter.get_nodes_from_documents(documents)

print(nodes[6].get_content())
I'm doing it this way,
and loading the data this way: documents = SimpleDirectoryReader("/content/data").load_data()
Yea, I'm betting either a) the semantic chunker made a chunk with no text, or b) one of your initial documents has no text

a) needs a PR to fix

b) you can probably fix yourself

filtered_docs = [doc for doc in documents if doc.text]
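A slightly stricter variant, in case some pages contain only whitespace (which `if doc.text` would still let through). This is only a sketch: `FakeDoc` and `drop_empty` are made-up names for illustration, and it assumes your documents expose `get_content()` the same way the nodes in the snippet above do. The same helper could be applied to the chunker's output as a stopgap for case a), before the nodes are embedded.

```python
from dataclasses import dataclass


@dataclass
class FakeDoc:
    """Hypothetical stand-in for a document or node; not a LlamaIndex class."""
    text: str

    def get_content(self) -> str:
        return self.text


def drop_empty(items):
    """Keep only items whose content has at least one non-whitespace character."""
    kept = []
    for item in items:
        content = item.get_content() or ""  # guard against None content
        if content.strip():
            kept.append(item)
    return kept


# Two of these four entries are empty or whitespace-only.
documents = [FakeDoc("page one"), FakeDoc(""), FakeDoc("   \n"), FakeDoc("page four")]
filtered_docs = drop_empty(documents)
print(f"kept {len(filtered_docs)} of {len(documents)}")
```

Printing the kept/total counts is a quick way to confirm whether empty pages were actually the cause before re-running the embedding job.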
okay i will try 👍