
@Logan M hey, so I'm trying to use a custom LLM inside create_llama_chat_agent: llm = customLLM(); ac = create_llama_chat_agent( toolkit=toolkit, llm=llm, memory=memory, verbose=True ). I'm unable to use the custom LLM inside this.
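For context, create_llama_chat_agent hands the llm through to LangChain's agent machinery, so the custom model generally has to be a LangChain LLM subclass rather than a LlamaIndex-only wrapper. A minimal sketch, assuming the LangChain API of that era; CustomLLM is a placeholder name and the stubbed return stands in for the real model call:

Python
# Minimal custom LLM wrapper for LangChain (sketch)
from typing import List, Optional
from langchain.llms.base import LLM

class CustomLLM(LLM):
    @property
    def _llm_type(self) -> str:
        return "custom"

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        # replace this stub with the real model call
        return "stubbed answer for: " + prompt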
It's a question-answering chain, but when I try to use it, it just gives me an error
The llama index retriever is not compatible with langchain as far as I know 🤔
@Logan M I FIGURED OUT THE STREAMING OVER API!!! with custom LLM
figured out that queue stuff!!!
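The "queue stuff" isn't spelled out in the thread; one common pattern at the time for streaming a LangChain chain over an API looked roughly like the sketch below (an assumption about the approach, not the exact code from this conversation): a callback pushes tokens onto a queue while the chain runs in a background thread, and the HTTP response drains the queue. It assumes the underlying LLM was created with streaming enabled.

Python
# Queue-based token streaming for a LangChain chain (sketch)
import threading
from queue import Queue
from langchain.callbacks.base import BaseCallbackHandler

class QueueCallback(BaseCallbackHandler):
    """Push each generated token onto a queue."""
    def __init__(self, q: Queue):
        self.q = q

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        self.q.put(token)

    def on_llm_end(self, *args, **kwargs) -> None:
        self.q.put(None)  # sentinel: generation finished

def stream_tokens(chain, question: str):
    """Run the chain in a background thread and yield tokens as they arrive."""
    q = Queue()
    threading.Thread(
        target=chain.run,
        args=(question,),
        kwargs={"callbacks": [QueueCallback(q)]},
    ).start()
    while True:
        token = q.get()
        if token is None:
            break
        yield token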
Yooooo big news!!
with success comes new problems
im bout to try it in deploy mode
but i wonder how memory plays lol
Oo the chat engine is out
Oh also @Logan M, you know that iterator with the pipeline is actually an issue, that's why it was giving us that thread lock error
It wasn't us doing anything wrong, it's an active issue. It's exactly due to the shallow copy thing I was talking about
Ohhhh that explains a lot!
The guy is on vacation or smth so when he comes back he will look at it
Could also use the raw model/tokenizer to get around it too lol
It's on the transformers pipeline issues on GitHub
That would work
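For reference, the raw model/tokenizer workaround usually means skipping the pipeline entirely and streaming from generate() with transformers' TextIteratorStreamer. A minimal sketch, with gpt2 standing in for whatever model was actually being served:

Python
# Stream from the raw model/tokenizer instead of the pipeline (sketch)
from threading import Thread
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def stream_generate(prompt: str, max_new_tokens: int = 256):
    inputs = tokenizer(prompt, return_tensors="pt")
    streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    generation_kwargs = dict(**inputs, streamer=streamer, max_new_tokens=max_new_tokens)
    Thread(target=model.generate, kwargs=generation_kwargs).start()
    for text_chunk in streamer:  # decoded text pieces as they are generated
        yield text_chunk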
But this required the whole custom LLM lol
Wish langchain could have just implemented the predictor as an option, would have made life so much easier, which I guess is what the chat engine is accomplishing, so that's good. Looking forward to the streaming part of it and then using it
But then also need to find out how to deal with memory for each person
Yea the memory will be annoying. I know some people create an ID for each user and store the memory object in redis somehow
from langchain.memory import ConversationBufferMemory
from langchain import LLMChain

# create a dictionary to store the memory for each user
user_memory = {}

# create a function to get the memory for a user
def get_user_memory(user_id):
    if user_id not in user_memory:
        user_memory[user_id] = ConversationBufferMemory()
    return user_memory[user_id]

# create the LLMChain for a user
user_id = "example_user"
llm_chain = LLMChain(llm=my_llm, memory=get_user_memory(user_id), prompt=my_prompt)
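A hedged sketch of the redis idea mentioned above, assuming a langchain version that ships the Redis-backed chat history and a local redis instance; it swaps the in-process dict for per-user histories that survive restarts:

Python
# Per-user memory backed by redis instead of an in-process dict (sketch)
from langchain.memory import ConversationBufferMemory, RedisChatMessageHistory

def get_user_memory(user_id):
    history = RedisChatMessageHistory(
        session_id=user_id,                # one history per user id
        url="redis://localhost:6379/0",    # assumed local redis
    )
    return ConversationBufferMemory(chat_memory=history)

As for telling users apart, the id is usually whatever the web layer already has: a session cookie, an authenticated user id, or a uuid minted when a new conversation starts.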
But then again how would you distinguish ids and create them as you go?
I see what i can try
memory works but not multi user lol
it fks up the whole thing
gotta put more time into figuring that out now lol
Good thing is day by day things are being fixed and solutions are coming up for me
Slowly slowly i should achieve everything 🙂
Yea man! Every day another problem solved 💪
@Logan M bro I just set up text-generation-inference
I just built a submachine
Actually more like an smg
Holy shit tokens are flying
How is that thing so good
And why did i not use this before
It's like, super optimized haha
Bro its firinggg
Glad it works well!
Legit one command
Now i have a smg streaming tokens
Are the models it supports very smart though?
Wachu mean? U can provide ur own model
It legit sets up the whole thing as a predictor itself too so u can take the code and say llm = client()
Now u legit have a llm class none of that custom class bs
I can straight take that inside langchain
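For reference, the server in question is Hugging Face's text-generation-inference, launched with a single docker run of the ghcr.io/huggingface/text-generation-inference image, and "llm = client()" presumably refers to its Python client. A sketch of using the raw client and, if your langchain version includes it, the HuggingFaceTextGenInference wrapper; the URL and parameters are assumptions:

Python
# Talking to a running text-generation-inference server (sketch)
from text_generation import Client
from langchain.llms import HuggingFaceTextGenInference

# raw client against the local server
client = Client("http://127.0.0.1:8080")
print(client.generate("what is NLB?", max_new_tokens=200).generated_text)

# the same server exposed as a LangChain LLM, usable inside any chain
llm = HuggingFaceTextGenInference(
    inference_server_url="http://127.0.0.1:8080",
    max_new_tokens=200,
    temperature=0.1,
)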
This thing legit just completely outdid the whole month of stuff I have been fiddling with
Fuhh should have tried it when i asked u about it last time
I just randomly saw it again and I'm like let me see
This thing is damn easy to set up
oh what, I thought it only supported certain models
ohhh, certain architectures have special optimizations
its actually nuts
im trying to implement it into flask
i have to change the structure lol cause it needs json or else it throws this token error
text_generation.errors.ValidationError: Input validation error: inputs tokens + max_new_tokens must be <= 1512. Given: 48 inputs tokens and 1500 max_new_tokens
weird, I don't get why it's doing this
Plain Text
message = request.get_data()
#message = message.replace("message=", "")
def stream():
    for response in client.generate_stream("what is NLB?", temperature=0.1, max_new_tokens=1500):
        if not response.token.special:
            yield response.token.text
if i do this and hardcode the question it works fine
but when i use the message variable it gives that error
Plain Text
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/text_generation/client.py", line 251, in generate_stream
    response = StreamResponse(**json_payload)
  File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for StreamResponse
token
  field required (type=value_error.missing)
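A sketch of what the route might look like once it reads a JSON body and trims max_new_tokens so longer prompts fit under the server limit diagnosed just below; the route name and payload shape are assumptions:

Python
# Flask route reading a JSON body and streaming tokens back (sketch)
from flask import Flask, Response, request
from text_generation import Client

app = Flask(__name__)
client = Client("http://127.0.0.1:8080")

@app.route("/chat", methods=["POST"])
def chat():
    message = request.get_json()["message"]  # JSON body instead of raw form data

    def stream():
        # 1024 instead of 1500 leaves room for the prompt under the server's total cap
        for response in client.generate_stream(message, temperature=0.1, max_new_tokens=1024):
            if not response.token.special:
                yield response.token.text

    return Response(stream(), mimetype="text/plain")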
Seems like somewhere there's a limit on the model of 1512 tokens. So the long message is causing an issue since max_new_tokens is so big: 48 input tokens + 1500 new tokens = 1548, which is over 1512 (try lowering it to 512 or something smaller)
🤷‍♂️ not sure on that one lol seems like a problem with the text-generation library?
this isn't the problem cause it's only happening with the message variable
if i hardcode it then it works fine?
wait, i wasn't reading the error properly
it needs to be 1512 or less as a whole, input tokens plus max_new_tokens
can't seem to find this limit anywhere
ahhh it's a hard limit inside the actual inference server code
oh wow, that's super low
that's annoying
damn that's bs
why would they do that
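Since the server enforces input tokens + max_new_tokens <= 1512 as a whole, one way to live with it is to budget max_new_tokens from the prompt length; a sketch, with gpt2 standing in for the served model's tokenizer. (Later releases of the server appear to expose launch flags for these limits, but whether that applied to this version is unclear.)

Python
# Budget max_new_tokens so prompt tokens + new tokens stays <= the server total (sketch)
from transformers import AutoTokenizer

MAX_TOTAL_TOKENS = 1512  # the limit reported in the error above
tokenizer = AutoTokenizer.from_pretrained("gpt2")  # use the served model's tokenizer

def budget_new_tokens(prompt: str, cap: int = MAX_TOTAL_TOKENS) -> int:
    n_input = len(tokenizer(prompt)["input_ids"])
    return max(cap - n_input, 1)

# e.g. 48 input tokens leaves at most 1464 new tokens under a 1512 total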
check dm @Logan M
@Logan M this thing sits within llama index so nicely, but I'm still having that issue where it's doing multiple calls to the server
I don't get why it's making multiple calls when it gets the right answer. I'm no longer using the custom LLM class but the predictor-style class that's made for the inference server, it's sick
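A hedged sketch of how a text-generation-inference-backed LLM slotted into the llama_index of that era (0.6-style LLMPredictor/ServiceContext); the class names, URL, and data path are assumptions:

Python
# Wiring a text-generation-inference-backed LLM into llama_index (sketch, 0.6-era API)
from langchain.llms import HuggingFaceTextGenInference
from llama_index import LLMPredictor, ServiceContext, SimpleDirectoryReader, VectorStoreIndex

llm = HuggingFaceTextGenInference(
    inference_server_url="http://127.0.0.1:8080",
    max_new_tokens=512,
)
service_context = ServiceContext.from_defaults(llm_predictor=LLMPredictor(llm=llm))

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
query_engine = index.as_query_engine()
print(query_engine.query("what is NLB?"))

As for the repeated calls, llama_index's default response synthesis refines the answer with one LLM call per retrieved chunk, so several calls per query can be normal rather than a bug; that may be what is happening here.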
Memory works so well! Now time to mess around with tools with langchain
Maybe enable internet with it hehe