Bedrock

Hi, has anybody tried AWS Bedrock with LlamaIndex? I have tried it and it does not give any error, but it doesn't take the prompt template, nor does it interact with the results:

This is a piece of code showing how I set it up:

Plain Text
from llama_index.core import ServiceContext
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.bedrock import Bedrock

llm = Bedrock(model="meta.llama2-13b-chat-v1", profile_name="machineuser1")

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model=embed_model,
    chunk_size=256,
)
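
For context, the service_context then gets wired into an index and query engine roughly like this (a sketch; the data path and query text are placeholders, not part of the original setup):

Plain Text
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# placeholder data directory, assumed for illustration
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)

query_engine = index.as_query_engine()
print(query_engine.query("What does the context say about X?"))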
I'm not sure what you mean, what's the issue?
The result has a "Human:" and "Assistant:" dialog format, but my template doesn't have that format.
Attachments: image.png, image.png
If I use my local llama2 model, the result matches what's expected according to the template:
Plain Text
llm = Ollama(model="llama2", base_url="http://192.168.1.245:11435")
You are using Ollama now? I thought this was Bedrock 😅

Ollama will convert completion prompts to chat templates

Ollama handles the proper template formatting for chat models, but when it's printed to console, it just uses str(message) which makes both the role and content into a string
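
For example (a quick sketch, assuming llama-index-core's default ChatMessage string formatting):

Plain Text
from llama_index.core.llms import ChatMessage, MessageRole

msg = ChatMessage(role=MessageRole.USER, content="What is RAG?")
# str(msg) just concatenates role and content, something like "user: What is RAG?",
# which is what shows up in the console even though the model itself
# receives the properly formatted chat template.
print(str(msg))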
Hi @Logan M, ah no, I have just posted the output of the RAG both when I use Bedrock and when I use Ollama. With Ollama the template is used correctly, but with Bedrock you can see it is not using the template; instead it uses a "Human" and "Assistant" dialog that I haven't specified. This is the issue I don't know how to tackle.
You can use the messages_to_prompt and completion_to_prompt hooks to have more control over the input format

Plain Text
def completion_to_prompt(completion):
  # pass the completion text through unchanged
  return completion

def messages_to_prompt(messages):
  # join the chat messages into one plain-text prompt
  return "\n".join([str(x) for x in messages])

llm = Bedrock(..., completion_to_prompt=completion_to_prompt, messages_to_prompt=messages_to_prompt)
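
With this pair of pass-through hooks, the idea is that nothing extra gets wrapped around the prompt, so whatever your query engine templates produce reaches the Bedrock model verbatim instead of being recast into a default "Human:"/"Assistant:" dialog.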
Hi @Logan M, thanks, but I don't get how to do it. With Ollama llama2 I have to use the synthesizer, for example:

Plain Text
response_synthesizer = get_response_synthesizer(  # try compact?
    text_qa_template=qa_template,
    refine_template=new_summary_tmpl,
    # streaming=True,
)

Why is it different with Bedrock? And how can I translate what I have for local llama2 to Bedrock?
What do these two functions, completion_to_prompt and messages_to_prompt, do? Does completion_to_prompt append something to the output of the completion, and does messages_to_prompt apply some text treatment to the retrieved text fragments before sending them? Is there an example somewhere? I only found this in the source code:
https://github.com/run-llama/llama_index/blob/0ae69d46e3735a740214c22a5f72e05d46d92635/llama-index-core/llama_index/core/llms/chatml_utils.py#L26
Ollama automatically handles the prompt formatting in their library. Bedrock does not; I guess that's a decision Bedrock made 🤷‍♂️
The prompt formatting depends on the LLM being used

For llama2, there is one here
https://github.com/run-llama/llama_index/blob/0ae69d46e3735a740214c22a5f72e05d46d92635/llama-index-integrations/llms/llama-index-llms-llama-cpp/llama_index/llms/llama_cpp/llama_utils.py
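
As a rough illustration, a llama2-style pair of hooks would look something like this (a simplified sketch of what llama_utils does; the real helpers also inject a default system prompt and handle more edge cases):

Plain Text
BOS, EOS = "<s>", "</s>"
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"

def messages_to_prompt(messages):
  prompt = ""
  system = ""
  for message in messages:
    if message.role == "system":
      # the system prompt gets folded into the first user turn
      system = f"{B_SYS}{message.content}{E_SYS}"
    elif message.role == "user":
      prompt += f"{BOS}{B_INST} {system}{message.content} {E_INST}"
      system = ""
    elif message.role == "assistant":
      prompt += f" {message.content} {EOS}"
  return prompt

def completion_to_prompt(completion):
  return f"{BOS}{B_INST} {completion} {E_INST}"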

For something like zephyr, it would look like this

Plain Text
def messages_to_prompt(messages):
  prompt = ""
  for message in messages:
    if message.role == 'system':
      prompt += f"<|system|>\n{message.content}</s>\n"
    elif message.role == 'user':
      prompt += f"<|user|>\n{message.content}</s>\n"
    elif message.role == 'assistant':
      prompt += f"<|assistant|>\n{message.content}</s>\n"

  # ensure we start with a system prompt, insert blank if needed
  if not prompt.startswith("<|system|>\n"):
    prompt = "<|system|>\n</s>\n" + prompt

  # add final assistant prompt
  prompt = prompt + "<|assistant|>\n"

  return prompt


def completion_to_prompt(completion):
  return "<|system|>\n</s>\n<|user|>\n{completion}</s>\n<|assistant|>\n"
Hi @Logan M, ok, more or less I get the point, but what if I just want to do question answering without the assistant, system, and user roles? Let's say I just want to provide the text fragments and also specify that multiple completions have to be taken into account, because not all fragments will fit in the LLM context?

example:

Plain Text
template = (
    "We have provided trusted context information below. \n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Given this trusted and scientific information, please answer the question: {query_str}. "
    "Remember that the statements of the context are verified and come from trusted sources.\n"
)
qa_template = Prompt(template)

new_summary_tmpl_str = (
    "The original query is as follows: {query_str}"
    "We have provided an existing answer: {existing_answer}"
    "We have the opportunity to refine the existing answer (only if needed) with some more trusted context below. "
    "Remember that the statements of the context are verified and come from trusted sources."
    "------------"
    "{context_msg}"
    "------------"
    "Given the new trusted context, refine the original answer to better answer the query. "
    "If the context isn't useful, return the original answer. "
    "Remember that the statements of the new context are verified and come from trusted sources."
    "Refined Answer: sure thing! "
)
new_summary_tmpl = PromptTemplate(new_summary_tmpl_str)
Yea, that looks fine to me. But if your model is in this list, any prompts will be converted to chat, because it's a chat model
Attachment: image.png
Ok @Logan M, I'll first use one of the completion models to simplify this:
Plain Text
COMPLETION_MODELS = {
    "amazon.titan-tg1-large": 8000,

But how do I convert from the local llama templates to the templates (or whatever is required) in Bedrock?
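
For what it's worth, here is a sketch of how the pieces could fit together for Bedrock, reusing the llama2-style hooks sketched above and the templates defined earlier in the thread (treat this as an approximation, not tested code):

Plain Text
llm = Bedrock(
    model="meta.llama2-13b-chat-v1",
    profile_name="machineuser1",
    # Bedrock does not apply the llama2 chat template for you,
    # so pass the formatting hooks in explicitly
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
)

service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model=embed_model,
    chunk_size=256,
)

response_synthesizer = get_response_synthesizer(
    text_qa_template=qa_template,
    refine_template=new_summary_tmpl,
)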