The community member is hitting an issue in production and cannot tell whether it is caused by chunks that are too large when requesting embeddings or by a Qdrant client batch size that is too big. They are currently using the SemanticSplitterNodeParser and are receiving an error indicating that the embedding model's maximum context length has been exceeded. The comments point out that the semantic splitter does not enforce a chunk size, so the community member may need to run its outputs through a secondary splitter to cap the maximum size (a sketch follows the error log below).
Hi guys, I'm running into an issue in production. I can't tell if this is because my chunks are too big when trying to get embeddings, or if my batch size is too big in the Qdrant client (right now the batch size is 16). If the chunks are too big, shouldn't that be handled by the NodeParser? I'm currently using SemanticSplitterNodeParser.
"2024-07-17 14:03:22,630 - ERROR - 13 - ThreadPoolExecutor-0_0 - root - index_asset - index_asset.py:39 - index_asset() >>> Error indexing asset into Qdrant: Error code: 400 - {'error': {'message': "This model's maximum context length is 8192 tokens, however you requested 10125 tokens (10125 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.", 'type': 'invalid_request_error', 'param': None, 'code': None}}"