So what are you saying?
If the code were only making a single LLM call, do you think the index could still influence it?
I think I haven't updated LlamaIndex in a while, so I could try that too.
But here is more of my code, including the index:
from llama_index import ServiceContext, StorageContext, VectorStoreIndex
from llama_index.embeddings import HuggingFaceEmbedding
from llama_index.memory import ChatMemoryBuffer

# BGE embeddings on the GPU
embed_model_name = 'BAAI/bge-small-en-v1.5'
embed_model = HuggingFaceEmbedding(
    model_name=embed_model_name,
    device='cuda',
    normalize=True,  # was normalize='True'; the flag expects a bool, and a non-empty string is always truthy
)

service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model=embed_model,
)

# wrap the existing vector store in an index
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_vector_store(
    vector_store,
    service_context=service_context,
    storage_context=storage_context,
)

# chat memory, capped at 1500 tokens
memory = ChatMemoryBuffer.from_defaults(token_limit=1500)
I even tried calling the reset before prompting the first question, because I noticed that if I don't call the reset, it gets buggy after a while. But starting with it didn't change anything (I do end with a reset after the 3 questions to keep it bug-free).
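To make the flow concrete, here is a minimal sketch of what I described, assuming the chat engine is built with index.as_chat_engine (the chat_mode and the question strings are placeholders, not my exact setup):

# minimal sketch of the reset flow described above (placeholder questions)
chat_engine = index.as_chat_engine(chat_mode="context", memory=memory)

chat_engine.reset()  # the reset I tried before prompting the first question

for question in ["question 1", "question 2", "question 3"]:
    response = chat_engine.chat(question)
    print(response)

chat_engine.reset()  # the reset I always do after the 3 questions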