Updated 4 months ago

Token indices sequence length is longer

At a glance

The community member is encountering an error where the token indices sequence length is longer than the model's specified maximum sequence length. They have updated the library to the latest version but still see the error. The comments suggest the message comes from using the transformers GPT-2 tokenizer with Python 3.8, and that it is a benign warning that does not affect the functioning of the chat engine. The community members discuss the cause of the warning, but there is no explicitly marked answer.

Token indices sequence length is longer than the specified maximum sequence length for this model (1215 > 1024). Running this sequence through the model will result in indexing errors. I updated the library to the latest version and still saw this error. Wonder where I could specify the length to be more than 1024.
5 comments
Are you using python3.8?

This error is very benign; it's due to using a transformers tokenizer with Python 3.8
At least, the last time I tracked this down that was the cause lol
It's using the GPT2 tokenizer, which is why it gives that warning (but we aren't using a GPT2 model, so it doesn't matter)
Oh, no wonder the chat engine was working alright.
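For anyone hitting the same message, a minimal sketch of how to silence it. This assumes, per the comments above, that the warning is emitted through Python's standard logging by the transformers tokenizer code; the logger name `transformers.tokenization_utils_base` is an assumption about where that warning originates, not something stated in the thread:

```python
import logging

# The "Token indices sequence length is longer ..." message is a logged
# warning from the transformers tokenizer base class, not an exception.
# If the GPT-2 tokenizer is only used for token counting (no actual GPT-2
# model consumes the token ids), the warning is benign and can be silenced:
logging.getLogger("transformers.tokenization_utils_base").setLevel(logging.ERROR)

# Alternatively, if you have the tokenizer object in hand, you can raise
# its declared maximum so the length check never trips (again, safe only
# when no real GPT-2 model will see these token ids):
#
#   tokenizer.model_max_length = 10**9
```

Raising `model_max_length` does not change any model; it only adjusts the threshold the tokenizer compares against before logging the warning.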