Same thing with StorageContext, I haven't gotten a chance to try, but I have that importing like this:
from llama_index.core.storage.storage_context import StorageContext
Are you running in a notebook or just py scripts?
in fact the automatic upgrade fails too:
C:\Users\thecr>llamaindex-cli upgrade Z:\Documents\GitHub\FrogBot\modules\utils
Traceback (most recent call last):
File "C:\Users\thecr\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\thecr\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Users\thecr\AppData\Local\Programs\Python\Python310\Scripts\llamaindex-cli.exe\__main__.py", line 4, in <module>
File "C:\Users\thecr\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_index\core\command_line\command_line.py", line 4, in <module>
from llama_index.core.command_line.rag import RagCLI, default_ragcli_persist_dir
File "C:\Users\thecr\AppData\Local\Programs\Python\Python310\lib\site-packages\llama_index\core\command_line\rag.py", line 9, in <module>
from llama_index.core import (
ImportError: cannot import name 'Response' from 'llama_index.core' (unknown location)
I see you aren't using a venv
python -m venv venv
source venv/bin/activate
pip install -U llama-index
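(heads up: since your paths are Windows, the activation step will be venv\Scripts\activate rather than source venv/bin/activate)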
I've confirmed this works both locally (in a fresh venv) and in google colab
10-4 I'll give it a shot when I can
I actually took this chance to upgrade from Python 3.10 to 3.12, which fixed the import issues. Got a new error with the auto-upgrade for llama-index:
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "C:\Users\thecr\AppData\Local\Programs\Python\Python312\Scripts\llamaindex-cli.exe\__main__.py", line 7, in <module>
File "C:\Users\thecr\AppData\Local\Programs\Python\Python312\Lib\site-packages\llama_index\core\command_line\command_line.py", line 269, in main
args.func(args)
File "C:\Users\thecr\AppData\Local\Programs\Python\Python312\Lib\site-packages\llama_index\core\command_line\command_line.py", line 227, in <lambda>
upgrade_parser.set_defaults(func=lambda args: upgrade_dir(args.directory))
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\thecr\AppData\Local\Programs\Python\Python312\Lib\site-packages\llama_index\core\command_line\upgrade.py", line 283, in upgrade_dir
upgrade_file(str(file_ref))
File "C:\Users\thecr\AppData\Local\Programs\Python\Python312\Lib\site-packages\llama_index\core\command_line\upgrade.py", line 267, in upgrade_file
upgrade_py_md_file(file_path)
File "C:\Users\thecr\AppData\Local\Programs\Python\Python312\Lib\site-packages\llama_index\core\command_line\upgrade.py", line 249, in upgrade_py_md_file
lines = f.readlines()
^^^^^^^^^^^^^
File "C:\Users\thecr\AppData\Local\Programs\Python\Python312\Lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 1798: character maps to <undefined>
yes, I'm still not using a venv, I know...
Hmmm, seems like a bug reading a file/directory actually
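From the traceback it's reading the file with the default Windows cp1252 codec; the kind of fix I'd expect (just a sketch, not the actual upgrade.py code) is opening it as UTF-8 explicitly:
# hypothetical sketch: force UTF-8 instead of the platform default (cp1252 on Windows)
with open(file_path, "r", encoding="utf-8") as f:
    lines = f.readlines()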
I did each file manually, and it was 'successful'. I'm setting up a venv for vs code right now
jk to Python 3.12, some of the packages y'all use have to be above 3.7 and below 3.12
yea was going to say, I've never even tried 3.12 lol
nice, everything is working for imports, now to fix everything that broke
hopefully not too much work! Feel free to ask any questions!
the blog says that service context is no more, yet I'm looking at the docs and seeing it mentioned, and I'm still using it to pass the llm for my chat engine. Am I supposed to do it another way? Cause when I pass the llm directly to the chat engine, it uses 3.5 and not 4-turbo like I wanted.
Service context is deprecated, but is supposed to still work
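For reference, the v0.10 replacement is the global Settings object; roughly something like this (just a sketch with your models dropped in):
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

# set once at startup; anything that doesn't get an explicit llm/embed_model falls back to these
Settings.llm = OpenAI(model="gpt-4-turbo-preview", max_tokens=1000)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")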
What does your code look like?
this one works:
client = QdrantClient(os.getenv('QDRANT_URL'), api_key=os.getenv('QDRANT_API'))
vector_store = QdrantVectorStore(client=client, collection_name="openpilot-data-nochunk")
llm = OpenAI(model="gpt-4-turbo-preview", max_tokens=1000)
embed_model = OpenAIEmbedding(model="text-embedding-3-small")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)
index = VectorStoreIndex.from_vector_store(vector_store, service_context=service_context)

async def process_message_with_llm(message, client):
    content = message.content.replace(client.user.mention, '').strip()
    if content:
        try:
            async with message.channel.typing():
                memory = ChatMemoryBuffer.from_defaults(token_limit=8000)
                context = await fetch_context_and_content(message, client, content)
                memory.set(context + [HistoryChatMessage(f"{content}", Role.USER)])
                chat_engine = index.as_chat_engine(
                    chat_mode="condense_plus_context",
                    memory=memory,
                    similarity_top_k=5,
                    context_prompt=(
                        f"You are {client.user.name}, a Discord bot, format responses as such."
                        "\nTopic: OpenPilot and its various forks."
                        "\n\nRelevant documents for the context:\n"
                        "{context_str}"
                        "\n\nInstruction: Use the previous chat history or the context above to interact and assist the user."
                    )
                )
                chat_response = chat_engine.chat(content)
                if not chat_response or not chat_response.response:
                    await message.channel.send("There was an error processing the message." if not chat_response else "I didn't get a response.")
                    return
                response_text = chat_response.response
                response_text = re.sub(r'^[^:]+:\s(?=[A-Z])', '', response_text)
                await send_long_message(message, response_text)
        except Exception as e:
            await message.channel.send(f"An error occurred: {str(e)}")
This one doesn't:
client = QdrantClient(os.getenv('QDRANT_URL'), api_key=os.getenv('QDRANT_API'))
vector_store = QdrantVectorStore(client=client, collection_name="openpilot-data-nochunk")
llm = OpenAI(model="gpt-4-turbo-preview", max_tokens=1000)
embed_model = OpenAIEmbedding(model="text-embedding-3-small")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_vector_store(vector_store, embed_model=embed_model)

async def process_message_with_llm(message, client):
    content = message.content.replace(client.user.mention, '').strip()
    if content:
        try:
            async with message.channel.typing():
                memory = ChatMemoryBuffer.from_defaults(token_limit=8000)
                context = await fetch_context_and_content(message, client, content)
                memory.set(context + [HistoryChatMessage(f"{content}", Role.USER)])
                chat_engine = index.as_chat_engine(
                    chat_mode="condense_plus_context",
                    llm=llm,
                    memory=memory,
                    similarity_top_k=5,
                    context_prompt=(
                        f"You are {client.user.name}, a Discord bot, format responses as such."
                        "\nTopic: OpenPilot and its various forks."
                        "\n\nRelevant documents for the context:\n"
                        "{context_str}"
                        "\n\nInstruction: Use the previous chat history or the context above to interact and assist the user."
                    )
                )
                chat_response = chat_engine.chat(content)
                if not chat_response or not chat_response.response:
                    await message.channel.send("There was an error processing the message." if not chat_response else "I didn't get a response.")
                    return
                response_text = chat_response.response
                response_text = re.sub(r'^[^:]+:\s(?=[A-Z])', '', response_text)
                await send_long_message(message, response_text)
        except Exception as e:
            await message.channel.send(f"An error occurred: {str(e)}")
will make a PR -- thanks for explaining!
Just an update, it only breaks when you do index.as_chat_engine; if you do something like CondensePlusContextChatEngine, it works as intended
Gotcha 🫡 Yea, as_chat_engine isn't passing in the llm as needed
uhh, I might have lied, CondensePlusContextChatEngine might not be working with llm=llm either...
Actually, I don't think the chat engine is working with any passed llm in any way right now...
I threw service_context back in to test, and it's not passing the llm either from the looks of it
client = QdrantClient(os.getenv('QDRANT_URL'), api_key=os.getenv('QDRANT_API'))
vector_store = QdrantVectorStore(client=client, collection_name="openpilot-data-nochunk")
llm = OpenAI(model="gpt-4-turbo-preview", max_tokens=1000)
embed_model = OpenAIEmbedding(model="text-embedding-3-small")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
service_context = ServiceContext.from_defaults(embed_model=embed_model, llm=llm)
index = VectorStoreIndex.from_vector_store(vector_store, embed_model=embed_model, llm=llm, service_context=service_context)

async def process_message_with_llm(message, client):
    content = message.content.replace(client.user.mention, '').strip()
    if content:
        try:
            async with message.channel.typing():
                memory = ChatMemoryBuffer.from_defaults(token_limit=8000)
                context = await fetch_context_and_content(message, client, content)
                memory.set(context + [HistoryChatMessage(f"{content}", Role.USER)])
                chat_engine = CondensePlusContextChatEngine.from_defaults(
                    retriever=index.as_retriever(similarity_top_k=5, llm=llm),
                    llm=llm,
                    memory=memory,
                    context_prompt=(
                        f"You are {client.user.name}, a Discord bot, format responses as such."
                        "\nTopic: OpenPilot and its various forks."
                        "\n\nRelevant documents for the context:\n"
                        "{context_str}"
                        "\n\nInstruction: Use the previous chat history or the context above to interact and assist the user."
                    )
                )
                chat_response = chat_engine.chat(content)
                if not chat_response or not chat_response.response:
                    await message.channel.send("There was an error processing the message." if not chat_response else "I didn't get a response.")
                    return
                response_text = chat_response.response
                response_text = re.sub(r'^[^:]+:\s(?=[A-Z])', '', response_text)
                await send_long_message(message, response_text)
        except Exception as e:
            await message.channel.send(f"An error occurred: {str(e)}")
I added llm=llm everywhere I could, and none of them are passing to the chat engine for use:
An error occurred: Error code: 400 - {'error': {'message': "This model's maximum context length is 4097 tokens. However, your messages resulted in 4572 tokens. Please reduce the length of the messages.", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}
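(that 4097-token limit is the old gpt-3.5-turbo context window, so it's clearly not using gpt-4-turbo-preview)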
ugh, we got a case of the "million merge conflicts" here I think
We had a branch for v0.10 and a branch for service context, and merged them.... seems like some changes got washed away :PSadge: Slightly scary to think about
Let me fix the chat engines first
I'm only on my test env, my 'production' is on a server and isn't being touched for a while
good news is, this is only for CondensePlusContext
Basically, it had llm = llm_from_settings_or_context(Settings, service_context)
What it needed was llm = llm or llm_from_settings_or_context(Settings, service_context)
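Until that fix is released, setting the llm globally should sidestep it; just a sketch:
from llama_index.core import Settings

# stopgap: make gpt-4-turbo-preview the global default so nothing falls back to gpt-3.5
Settings.llm = llm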
It's less bad than I thought, I was looking at an old checkout when I got scared lol
actual v0.10.0 is good (besides this)