Hello everyone I am trying to read in

rse

Hello everyone, I am trying to read in and index a set of large PDFs, but I get the error below. Do you know how I could fix this?

InvalidRequestError: This model's maximum context length is 8191 tokens, however you requested 38416 tokens (38416 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.

The above exception was the direct cause of the following exception:

RetryError Traceback (most recent call last)
/tmp/ipykernel_5663/105381258.py in <module>
----> 1 index = GPTVectorStoreIndex(documents)

7 comments

Teemu

How are you chunking the text?

Teemu

@rse Which version are you on? Should just work like this:

from llama_index import VectorStoreIndex, SimpleDirectoryReader

# Load documents
documents = SimpleDirectoryReader('data').load_data()

# Create an index
index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
print(response)

Teemu

https://gpt-index.readthedocs.io/en/stable/deprecated_terms.html#gptvectorstoreindex

rse

I installed it just last week, so it must be the latest version. But VectorStoreIndex gives the same error. Do I need to manually chunk the data then?

Teemu

The chunk size should be 1024 by default, what does your code look like?
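For anyone wondering what chunking buys you here, below is a rough sketch in plain Python, not what llama_index does internally. The helper name `chunk_tokens` is made up for illustration, and the placeholder strings stand in for real tokens (a real tokenizer would count differently). It only shows that 38416 tokens cut into pieces of 1024 stay well under the 8191-token limit from the error message.

```python
from typing import List


def chunk_tokens(tokens: List[str], chunk_size: int = 1024, overlap: int = 0) -> List[List[str]]:
    """Split a token list into consecutive chunks of at most chunk_size tokens.

    Hypothetical helper for illustration only; not part of llama_index.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks: List[List[str]] = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once this chunk reaches the end, so no redundant tail chunk is added.
        if start + chunk_size >= len(tokens):
            break
        start += step
    return chunks


# Assumption: placeholder strings stand in for real tokenizer output.
MODEL_LIMIT = 8191                 # limit quoted in the error message
tokens = ["tok"] * 38416           # prompt size quoted in the error message

chunks = chunk_tokens(tokens, chunk_size=1024)
print(len(chunks))                              # 38
print(len(chunks[-1]))                          # 528
print(all(len(c) <= MODEL_LIMIT for c in chunks))  # True
```

If every chunk is that small, a request for 38416 tokens suggests a whole document was sent in one piece, which is why it helps to see the code and the installed version.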

rse

I re-ran my notebook and it worked now. Thanks Teemu! My code looks like this:

rse

from llama_index import VectorStoreIndex, SimpleDirectoryReader

data_directory = '../data/raw/'

# Load every file in the directory as documents
documents = SimpleDirectoryReader(data_directory).load_data()

# Build the vector index from the loaded documents
index = VectorStoreIndex.from_documents(documents)

# Save the index to disk so it can be reloaded later
index.storage_context.persist(persist_dir='index')
