Part 1 (saving the index into Chroma):
# import
from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext
from llama_index.vector_stores import ChromaVectorStore
from llama_index.storage.storage_context import StorageContext
from llama_index.embeddings import HuggingFaceEmbedding
import chromadb
from llama_index.llms import LlamaCPP
print("Calling LLM")
llm = LlamaCPP(
    # a model_url can be passed to download the model automatically;
    # here model_path points at a pre-downloaded GGUF file instead
    model_path="./models/em_german_13b_v01.Q8_0.gguf",
    temperature=0.1,
    max_new_tokens=4048,
    # context window given to llama.cpp; make sure the model actually supports this size
    context_window=8128,
    # kwargs to pass to __call__()
    generate_kwargs={},
    # kwargs to pass to __init__()
    # set n_gpu_layers to at least 1 to use the GPU
    # model_kwargs={"n_gpu_layers": 1},
    # transform inputs into Llama2 format
    # messages_to_prompt=messages_to_prompt,
    # completion_to_prompt=completion_to_prompt,
    verbose=True,
)
print("Called LLM")
print("Making PersistentClient")
db = chromadb.PersistentClient(path="./chroma_db")
print("Made PersistentClient")
print("Making client")
print("Made client")
print("Creating collection")
chroma_collection = db.get_or_create_collection("sampledata")#
print(chroma_collection)
print("Created collection")
# define embedding function
print("Creating embedding model")
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5")
print("Created embedding model")
# load documents
print("Loading documents")
documents = SimpleDirectoryReader("./sample_data").load_data()
print("Loaded documents")
# set up ChromaVectorStore and load in data
print("Creating vector store")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
print("Created vector store")
print("Loading data into vector store")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
print("Loaded data into vector store")
print("Making ServiceContext")
service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)
print("Made ServiceContext")
print("Making index")
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
print("Made index")
# Query Data
print("Querying data")
query_engine = index.as_query_engine()
print("Queried data")
print("Getting response")
response = query_engine.query("Was ist der Prozess \"Düngen\"? Bitte nutze Leerzeichen zwischen den Wörtern. Und nutze Satzzeichen.")
print("Got response")
print("Printing response")
print(response)
print("Printed response")
print("Printing response source nodes")
print(response.source_nodes)
print("Printed response source nodes")
Part 2 (calling it in another file):
import chromadb
from llama_index import VectorStoreIndex, ServiceContext
from llama_index.vector_stores import ChromaVectorStore
from llama_index.storage.storage_context import StorageContext
from llama_index.llms import LlamaCPP
# initialize client
db = chromadb.PersistentClient(path="./chroma_db")
llm = LlamaCPP(
    # a model_url can be passed to download the model automatically;
    # here model_path points at a pre-downloaded GGUF file instead
    model_path="./models/em_german_13b_v01.Q8_0.gguf",
    temperature=0.1,
    max_new_tokens=4048,
    # context window given to llama.cpp; make sure the model actually supports this size
    context_window=8128,
    # kwargs to pass to __call__()
    generate_kwargs={},
    # kwargs to pass to __init__()
    # set n_gpu_layers to at least 1 to use the GPU
    # model_kwargs={"n_gpu_layers": 1},
    # transform inputs into Llama2 format
    # messages_to_prompt=messages_to_prompt,
    # completion_to_prompt=completion_to_prompt,
    verbose=True,
)
# get collection
chroma_collection = db.get_or_create_collection("sampledata")
print(chroma_collection)
# assign chroma as the vector_store to the context
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
service_context = ServiceContext.from_defaults(llm=llm, embed_model="local:BAAI/bge-base-en-v1.5")
# load your index from stored vectors
index = VectorStoreIndex.from_vector_store(
vector_store, storage_context=storage_context, service_context=service_context
)
# create a query engine
query_engine = index.as_query_engine()
response = query_engine.query("Was ist der Prozess \"Düngen\"? Bitte nutze Leerzeichen zwischen den Wörtern. Und nutze Satzzeichen.")
print(response)
print(response.source_nodes)
@Tay you never used the storage context in the first part
Should be
index = VectorStoreIndex.from_documents(documents, service_context=service_context, storage_context=storage_context)
In the first part I got a response tho, in the second I didn't
Because in the first, it just created the default in-memory vector db
Then in the second part, you actually use the storage context properly, but it's empty because it was unused in part 1
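A minimal sketch of that fix for part 1, reusing the variables already defined above (documents, service_context, storage_context, chroma_collection); the count() call is just an optional sanity check:
# build the index against the persistent Chroma collection instead of the default in-memory store
index = VectorStoreIndex.from_documents(
    documents,
    service_context=service_context,
    storage_context=storage_context,  # this was the missing piece in part 1
)
# optional sanity check: the collection should now contain the ingested nodes
print(chroma_collection.count())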
Part 1 and part 2 are in the same file, btw
Heya @Logan M,
I now have it in a FastAPI app. I would like to keep a chat history per IP. How do I add the chat history to the LLM as context if I have my chat history as an array?
Ah, and atm I get this.
Can I request the filenames instead of the doc IDs?
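For the filenames question, a rough sketch that might work, assuming the documents were loaded with SimpleDirectoryReader (which normally puts a "file_name" entry into each node's metadata; the key name isn't verified against this llama_index version):
# print file names (falling back to the node id) for each source node of a response
for source in response.source_nodes:
    name = source.node.metadata.get("file_name", source.node.node_id)
    print(name, "score:", source.score)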
chat_engine.chat("hello", chat_history=chat_history)
will let you pass in the chat history as a list
or you can call the LLM directly with llm.chat(chat_history)
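A sketch of one way to keep a chat history per client IP in FastAPI, assuming chat_engine is a chat engine built from the index above; the endpoint name, the chat_histories dict, and the explicit appends are illustrative (some engine versions may update the passed-in list themselves):
from fastapi import FastAPI, Request
from llama_index.llms import ChatMessage

app = FastAPI()
chat_histories = {}  # maps client IP -> list of ChatMessage

@app.post("/chat")
async def chat(request: Request, message: str):
    ip = request.client.host
    history = chat_histories.setdefault(ip, [])
    result = chat_engine.chat(message, chat_history=history)
    # record both turns ourselves (assumes chat() does not mutate the passed-in list)
    history.append(ChatMessage(role="user", content=message))
    history.append(ChatMessage(role="assistant", content=result.response))
    return {"answer": result.response, "chat_history": [m.json() for m in history]}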
I have a query engine, which comes from my index. Would this work:
chat_engine = index.as_chat_engine()
Instead of as_query_engine?
Yea if you want chat history, you'll need to use a chat engine or agent 👍
You might have to try a few different chat modes to find one that works best for you
Is there a layout for how the chat history has to look?
Hmm just a list of ChatMessage objects 👀
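For reference, such a list would look roughly like this (the MessageRole enum values are assumed; plain strings like "user"/"assistant" usually work too):
from llama_index.llms import ChatMessage, MessageRole

chat_history = [
    ChatMessage(role=MessageRole.USER, content="Was ist der Prozess \"Düngen\"?"),
    ChatMessage(role=MessageRole.ASSISTANT, content="Düngen ist der Prozess ..."),
]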
Is it somewhere in the docs?
Does the assistant provide its source text? Like where it got the information from?
aaand now it's fully broken
Error:
TypeError: Object of type ChatMessage is not JSON serializable
Code:
Attached
Yea, need to convert it to JSON first (can't leave it as pydantic for the API response)
response_data = {
    "message": message.json(),
    "answer": responsestuff.response,
    "timeinfo": time,
    # "sourcetext": responsestuff.get_formatted_sources()
    "chat_history": [x.json() for x in custom_chat_history],
}
Then, if you need to go from JSON back to a ChatMessage object:
from llama_index.llms import ChatMessage
chat_history = [ChatMessage.parse_raw(x) for x in json_chat_history]
I think it'd be responsestuff.json().response cuz message is a string
AttributeError: 'AgentChatResponse' object has no attribute 'json'
whoops -- probably just the chat_history that needs to be modified then
I didn't look close enough 😅
response_data = {
    "message": message,
    "answer": responsestuff.response,
    "timeinfo": time,
    # "sourcetext": responsestuff.get_formatted_sources()
    "chat_history": [x.json() for x in custom_chat_history],
}
As seen here, it responds to the initial message after 2 more questions
For that --- I don't have an answer 😅 Maybe try printing query_engine.chat_history after each chat, to make sure it looks correct?
tbh it could be llamacpp too -- if you have access to openai or similar, maybe confirm the behaviour with it
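If an OpenAI key were available, swapping the LLM in the ServiceContext would look roughly like this (the model name is just an example):
from llama_index.llms import OpenAI

# temporarily swap LlamaCPP for a hosted model to see if the blank responses persist
service_context = ServiceContext.from_defaults(
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0.1),
    embed_model="local:BAAI/bge-base-en-v1.5",
)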
Sadly I don't. So I can't test it
If it's the same as custom_chat_history, then it shouldn't be an issue
I could try appending the user's message to the chat history before it asks the chat engine tho
maaaybe, although it should already be doing that under the hood. But worth a shot
Yea it should be the same, but just a sanity check
Lemme try that in a sec, rebooting for an update
okay it's back, now imma try
Just got that after the second message
Something is going into response_data that's a chat message object
you kept the x.json() stuff?
But the responsestuff.json() or something gave me the error, so I didn't
ah yea that's what I meant -- might have to manually debug what part of response_data is a chat message object
if responsestuff.response == "":
response_data = {
"message": message,
"answer": "Ich weiß nicht, es tut mir leid.",
"timeinfo": time,
# "sourcetext": responsestuff.get_formatted_sources()
"chat_history": custom_chat_history,
}
responsestuff.response = "Ich weiß nicht, es tut mir leid."
I FORGOT TO ADD THE x.json() STUFF ON THE NO RESPONSE FALLBACK
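For completeness, a sketch of that fallback branch with the same serialization applied as in the normal branch:
if responsestuff.response == "":
    responsestuff.response = "Ich weiß nicht, es tut mir leid."
    response_data = {
        "message": message,
        "answer": responsestuff.response,
        "timeinfo": time,
        # "sourcetext": responsestuff.get_formatted_sources()
        "chat_history": [x.json() for x in custom_chat_history],
    }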
Sooo can I somehow implement a prompt?
As it doesn't know about what I asked before
It might have gotten dumber tho, idk
Good thing I recently went to Austria, somehow I am understanding some of these messages 😆
responsestuff.response = "Ich weiß nicht, es tut mir leid."
is some hardcoded message when the response is empty right?
Ooo that's good. Yes. It's just so the bot doesn't just send "insert void here"
So the real issue is that it's responding with a blank on the second message 🤔
but it does that every time it doesn't know an answer
even tho that was the question we always used, and it worked
Every time it doesn't know an answer it just gives me a void
That seems weird 🤔 Did you tell it to do that?
interesting 😅 Hmm, not sure then. I think it's another case where we need to double-check the chat history?
The default as_chat_engine() is a react agent. You might have better luck with another mode -- maybe try index.as_chat_engine(chat_mode="condense_plus_context")
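A sketch of that mode, with an optional system_prompt added for the "can I implement a prompt" question above (whether system_prompt is accepted, and whether the mode exists at all, depends on the installed llama_index version):
chat_engine = index.as_chat_engine(
    chat_mode="condense_plus_context",
    # condenses the follow-up question using the chat history, then answers from retrieved context
    system_prompt="You are a helpful assistant. Answer in German, with spaces between words and proper punctuation.",
    verbose=True,
)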
Nope, didn't help in any way
@Logan M Imma let my laptop run through the big DB, maybe that helps. ATM I use a small portion of my whole data
nvm, misread where to add that. buuut now it's telling me ValueError: Unknown chat mode: condense_plus_context
Okay, then imma look into updating llama_index
Now it just doesn't reply after the first message
I feeeeel like this is an LLM error -- working with llamacpp is really annoying tbh 😅 If this were me, I'd be putting a breakpoint in the actual LLM code, or making sure the inputs aren't too big
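A very rough way to check the input size without touching the real tokenizer, assuming roughly 4 characters per token and the context_window/max_new_tokens values from the LlamaCPP setup above:
def rough_prompt_tokens(chat_history, new_message):
    # crude estimate: ~4 characters per token
    chars = sum(len(m.content or "") for m in chat_history) + len(new_message)
    return chars // 4

estimate = rough_prompt_tokens(custom_chat_history, message)
if estimate > 8128 - 4048:  # context_window minus max_new_tokens
    print(f"Warning: roughly {estimate} prompt tokens, the history may need trimming")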