KasparTr

Hi, I'm new here and learning to use GPTSimpleVectorIndex.
I am attempting to index a larger documentation set (it consists of multiple files).

When I load the documents array with SimpleDirectoryReader, calling GPTSimpleVectorIndex(documents) throws this error:

Token indices sequence length is longer than the specified maximum sequence length for this model

I know it's too much content for the model, but what I don't understand is how to index the entire documentation.

I attempted to break the information into chunks and index them separately, but indexing in the following way throws the same warning:

indexes = []  # one index per loaded document
for doc in documents:
    indexes.append(GPTSimpleVectorIndex([doc]))
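
Note that the loop above still passes each document in whole, so a single large file is not actually split. A minimal sketch of splitting raw text into smaller overlapping pieces, assuming whitespace-separated words as a rough stand-in for model tokens (the real tokenizer counts differently). The names `chunk_text`, `max_words` and `overlap` are hypothetical and not part of the library:

```python
def chunk_text(text, max_words=200, overlap=20):
    """Split text into overlapping chunks of at most max_words words.

    Words are only an approximation of tokens (assumption), so max_words
    should be chosen well below the model's actual token limit.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must satisfy 0 <= overlap < max_words")

    words = text.split()
    step = max_words - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the current window has reached the end of the text.
        if start + max_words >= len(words):
            break
    return chunks


if __name__ == "__main__":
    sample = " ".join(str(i) for i in range(10))
    for piece in chunk_text(sample, max_words=4, overlap=1):
        print(piece)
```

Each resulting chunk could then be wrapped in its own document object and indexed, instead of passing a whole file at once.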

I've been trying to debug this with GPT-4, but I think I need some human intellect here 🙂

Any tips on how to index a larger dataset?
