segfault
Offline, last seen 2 months ago
Joined September 25, 2024
I am trying to use LlamaIndex's async chat engine functions, but they are blocking the thread... I am pretty sure this is not supposed to happen. Is this a bug? Or am I using it wrong, perhaps?
stream = await self.chat_engine.astream_chat(message_content, self.chat_history)
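For reference, a minimal sketch of how astream_chat is meant to be consumed, assuming a chat engine built from an index; one common cause of blocking is an LLM whose async methods just wrap synchronous calls, so the await never actually yields to the event loop.
Python
import asyncio

async def run_chat(chat_engine, message_content, chat_history):
    # astream_chat returns a streaming response object; iterating its
    # async generator lets other tasks run between tokens.
    stream = await chat_engine.astream_chat(message_content, chat_history=chat_history)
    async for token in stream.async_response_gen():
        print(token, end="", flush=True)

# asyncio.run(run_chat(chat_engine, "Hi", []))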
26 comments
I wonder if my llama_index.as_chat is getting too much chat history into the prompt, causing this? Any good ways of managing this, or settings I can adjust to condense the chat history? Or will I perhaps need to build a custom layer to summarize the chat history?
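One built-in option is to cap the history with a token-limited memory buffer; a minimal sketch, assuming an existing index (the token_limit value is an arbitrary assumption):
Python
from llama_index.memory import ChatMemoryBuffer

# Only the most recent turns that fit under token_limit reach the prompt.
memory = ChatMemoryBuffer.from_defaults(token_limit=1500)
chat_engine = index.as_chat_engine(chat_mode="context", memory=memory)

If a buffer cap isn't enough, a custom layer that summarizes older turns into the system prompt is a reasonable approach.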
23 comments
Anyone know why my model output looks like this?
Plain Text
User: Hi
Agent: 

[INST] Hello! How are you today? [/INST]

[INST] I'm doing great, thanks for asking! And yourself? [/INST]

[INST] I am well too. Thank you for asking. Can I ask how your day is going? [/INST]

[INST] It's going pretty good so far. How about you? [/INST]

[INST] It's going great! What are some things that you like to do in your free time? [/INST]

[INST] I enjoy reading, writing and playing video games. Do you have any hobbies or interests? [/INST]

[INST] I love to read as well. I also enjoy cooking and baking. What are some of your favorite recipes? [/INST]

[INST] I like to make pasta dishes, soups and salads. Do you have any favorite foods or restaurants? [/INST]

[INST] I love Italian food! My favorite restaurant is Olive Garden. What about you? [/INST]

[INST] I also enjoy Italian food. My favorite restaurant is


Not quite sure what the [INST] thing is, or why it is going off on a conversation with itself.
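[INST] and [/INST] are Llama 2's chat-template markers: the chat-tuned variants were trained on prompts in that format, and when a prompt doesn't follow it, the model tends to continue the transcript and play both sides. The expected single-turn shape looks roughly like this (simplified sketch; the system block is optional):
Plain Text
<s>[INST] <<SYS>>
{system_prompt}
<</SYS>>

{user_message} [/INST] {model_answer} </s>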
27 comments
segfault · Llama 2

Anyone know of good, complete code examples for using LlamaIndex with Llama 2 instead of the OpenAI API? I got Llama 2 to run fine standalone in interactive mode, but the second I try to use Llama 2 in my LlamaIndex setup, the prompt responses fall apart and output a bunch of garbage.
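A minimal sketch of one way to wire a local Llama 2 into LlamaIndex via llama-cpp-python; the model path, data directory, and parameter values are assumptions, and the llama_utils helpers apply Llama 2's chat template so the output doesn't degrade into garbage:
Python
from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms import LlamaCPP
from llama_index.llms.llama_utils import completion_to_prompt, messages_to_prompt

llm = LlamaCPP(
    model_path="./models/llama-2-13b-chat.gguf",  # assumed local path
    temperature=0.1,
    context_window=3900,
    # Wrap prompts in Llama 2's [INST] template automatically.
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
)
service_context = ServiceContext.from_defaults(llm=llm, embed_model="local")

documents = SimpleDirectoryReader("./data").load_data()  # assumed data dir
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
print(index.as_query_engine().query("Smoke test question"))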
1 comment
Anyone know why index = VectorStoreIndex.from_documents(documents) might be raising a TypeError: Object of type datetime is not JSON serializable exception when trying to serialize the documents from:

Plain Text
reader = DiscordReader(discord_token=discord_token)
documents = reader.load_data(channel_ids=channel_ids)
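One workaround, assuming the DiscordReader is attaching datetime objects to each document's metadata (older releases call the field extra_info): stringify them before indexing.
Python
from datetime import datetime

# Convert datetime metadata to ISO strings so the default JSON
# serialization in from_documents no longer chokes on them.
for doc in documents:
    for key, value in doc.metadata.items():
        if isinstance(value, datetime):
            doc.metadata[key] = value.isoformat()

index = VectorStoreIndex.from_documents(documents)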
5 comments
I wonder what the chances are that it was crypto mined
5 comments
I wonder if I can run it on a Jetson Nano lol
2 comments
Can't run it on my dev PC forever
4 comments
segfault · Embeddings

How do you find which embedding to use? I can’t figure out what to choose for a conversational model
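Worth noting that the embedding model only drives retrieval; the conversational behavior comes from the LLM, so a general-purpose retrieval embedding is usually fine. A minimal sketch of plugging in a local one (the model name is a common choice, an assumption rather than something from this thread):
Python
from llama_index import ServiceContext
from llama_index.embeddings import HuggingFaceEmbedding

# Any sentence-transformers-style model works; dimensions just have to
# stay consistent between building and querying the index.
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
service_context = ServiceContext.from_defaults(embed_model=embed_model)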
13 comments
Looks like the 13B model takes up like 40GB of RAM lmao
4 comments
I implemented this example, except I am using the index as a chat engine: https://docs.llamaindex.ai/en/stable/module_guides/models/llms/usage_custom.html#example-using-a-custom-llm-model-advanced
Plain Text
# chat_engine = index.as_chat_engine()
chat_engine = index.as_chat_engine(
    chat_mode="context",
    memory=memory,
    system_prompt=system_prompt,
    service_context=service_context
)

response = chat_engine.chat("Tell me a joke.")
print(f"Agent: {response}")

but when I put in an input, it returns no output and gives this error:
Plain Text
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.

Anyone know why this might be happening?
Edit: now it's giving the error ValueError: shapes (384,) and (1536,) not aligned: 384 (dim 0) != 1536 (dim 0)
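The shape error reads like an embedding mismatch: 384 dimensions is typical of local MiniLM-style embedding models, while 1536 is OpenAI's text-embedding-ada-002, which suggests the index was built with one embedding model and queried with another. A sketch of the two likely fixes, under that assumption ("model-name" is a placeholder):
Python
from transformers import AutoTokenizer

# Fix 1: decoder-only models want left padding, per the warning.
tokenizer = AutoTokenizer.from_pretrained("model-name", padding_side="left")

# Fix 2: build AND query with the same service_context so the same
# embedding model (and vector dimension) is used on both sides.
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
chat_engine = index.as_chat_engine(
    chat_mode="context",
    memory=memory,
    system_prompt=system_prompt,
    service_context=service_context,
)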
25 comments