I am running into a weird issue when trying to parse a CSV file: I hit the OpenAI token limit when generating embeddings with text-embedding-3-large. Does anything stand out in this code that would cause the issue?
from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SentenceWindowNodeParser
from llama_index.core.schema import MetadataMode
from llama_index.embeddings.openai import OpenAIEmbedding

embedding = OpenAIEmbedding(api_key="XXX", model="text-embedding-3-large")
node_parser = SentenceWindowNodeParser.from_defaults(window_size=3)

dir_reader = SimpleDirectoryReader(input_files=[tmpfile])
docs = dir_reader.load_data(show_progress=True)
for doc in docs:
    doc.metadata["external_id"] = external_id

nodes = node_parser.get_nodes_from_documents(docs, show_progress=True)
print(f"Getting batched embeddings for nodes from embedding {embedding.model_name}...")
text_chunks = [node.get_content(metadata_mode=MetadataMode.EMBED) for node in nodes]
embeddings = embedding.get_text_embedding_batch(text_chunks, show_progress=True)
This then errors out with:
openai.BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 8192 tokens, however you requested 71420 tokens (71420 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.", 'type': 'invalid_request_error', 'param': None, 'code': None}}
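To narrow this down, I tried checking the per-chunk token counts before calling the embedding API, to see which chunk blows past the 8192-token window. This is just a sketch: `count_tokens` and `oversized_chunks` are helpers I wrote for the check (not LlamaIndex APIs), and the word-split count is only a rough lower bound; for real token counts you would swap in tiktoken's `cl100k_base` encoding.

```python
def count_tokens(text: str) -> int:
    # Rough stand-in: whitespace-separated words as a lower bound on tokens.
    # For accurate counts, use tiktoken:
    #   enc = tiktoken.get_encoding("cl100k_base"); return len(enc.encode(text))
    return len(text.split())

def oversized_chunks(chunks, limit=8192, count=count_tokens):
    """Return (index, token_count) for every chunk above the limit."""
    return [(i, n) for i, c in enumerate(chunks) if (n := count(c)) > limit]

# Example: the second chunk is far past the limit even by the rough count.
chunks = ["short text", "word " * 10000]
print(oversized_chunks(chunks))
```

Running this over my `text_chunks` would show whether one giant node is responsible (e.g. a CSV read in as a single document with no sentence boundaries for the parser to split on) rather than the batch as a whole.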