
Hey guys, I am new to these toolkits. When I run the TestEssay.ipynb example from the "Starter Tutorial" with the default settings (LlamaCPP=llama2-13b-chat and model_name=BAAI/bge-small-en), I get the error "ValueError: Requested tokens (4005) exceed context window of 3900". Thanks so much. The loading is successful:
llama.cpp: loading model from ~/Library/Caches/llama_index/models/llama-2-13b-chat.ggmlv3.q4_0.bin
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 3900
llama_model_load_internal: n_embd = 5120
...
llama_model_load_internal: mem required = 6983.72 MB (+ 3046.88 MB per state)
llama_new_context_with_model: kv self size = 3046.88 MB
AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
llama_new_context_with_model: compute buffer total size = 336.03 MB
INFO:sentence_transformers.SentenceTransformer:Load pretrained SentenceTransformer: BAAI/bge-small-en
Load pretrained SentenceTransformer: BAAI/bge-small-en
~/Documents/University/llamaIndex/.venv/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
from .autonotebook import tqdm as notebook_tqdm
INFO:sentence_transformers.SentenceTransformer:Use pytorch device: cpu
Use pytorch device: cpu
INFO:llama_index.indices.common_tree.base:> Building index from nodes: 1 chunks
Building index from nodes: 1 chunks
llama_tokenize_with_model: too many tokens
4 comments
Do you have the full traceback?
Thanks, here is the full traceback. I think the error is triggered by llama_cpp, but I am curious whether there is any way to fix it by changing settings in LlamaIndex. To reproduce, just run TestEssay.ipynb without an OpenAI API key.
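For context, this is roughly the local-fallback setup the run ends up with — a sketch reconstructed from the log above, not the notebook verbatim. The model path and data directory are illustrative, and the tree-index step is inferred from the common_tree log line:

```python
from llama_index import ServiceContext, SimpleDirectoryReader, TreeIndex
from llama_index.llms import LlamaCPP

# Local fallback LLM used when no OpenAI API key is configured.
# The model path is illustrative (it mirrors the cache path in the log).
llm = LlamaCPP(
    model_path="/path/to/llama-2-13b-chat.ggmlv3.q4_0.bin",
    context_window=3900,  # the window the error message reports
    max_new_tokens=256,
)

service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model="local:BAAI/bge-small-en",  # the default local embedding model
)

# The "Building index from nodes" line comes from the tree-index build,
# which sends batches of chunks to the LLM -- the overflow happens there.
documents = SimpleDirectoryReader("data").load_data()
index = TreeIndex.from_documents(documents, service_context=service_context)
```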
Hmm yea my guess is the token counting is causing some minor issue 😔

Easiest fix is just lowering context_window slightly in the service context, like to 3800
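Something like this (an untested sketch; parameter names follow the legacy LlamaCPP/ServiceContext API, and the model path is illustrative):

```python
from llama_index import ServiceContext
from llama_index.llms import LlamaCPP

llm = LlamaCPP(
    model_path="/path/to/llama-2-13b-chat.ggmlv3.q4_0.bin",  # illustrative
    context_window=3800,  # a bit under the model's n_ctx for headroom
    max_new_tokens=256,
)

# Give the service context the same lowered window so LlamaIndex's prompt
# packing stays inside what llama.cpp will actually accept.
service_context = ServiceContext.from_defaults(
    llm=llm,
    context_window=3800,
    embed_model="local:BAAI/bge-small-en",
)
```

The slack presumably covers the mismatch between the tokenizer LlamaIndex counts with and llama.cpp's own tokenizer, which is why a few tokens can slip over the limit.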
Thanks for that 😄 after reading the documentation page, I think it is easy to solve.