Max tokens

I have chunk_size=512 and max_tokens=512, with context_window=2048 on both the model and the PromptHelper.
Max tokens means that you've set the max input size on the model to 512 👀

Can you share the code? I can help correct it
I'm sorry, I wasn't clear. By max_tokens I meant the parameter that PromptHelper calls num_output.

Plain Text
from llama_index import (
    StorageContext,
    load_index_from_storage,
    Prompt,
    LLMPredictor,
    ServiceContext,
    SimpleDirectoryReader,
    VectorStoreIndex,
    LangchainEmbedding,
    Document,
    ListIndex,
    PromptHelper
)
from os import listdir
from os.path import isfile, join
import json
from llama_index.optimization.optimizer import SentenceEmbeddingOptimizer
from langchain.llms import LlamaCpp
from langchain.embeddings import HuggingFaceEmbeddings
import os
import sys
import logging

logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))


MODEL = LlamaCpp(
    model_path="models/wizardlm-30B-uncensored.ggmlv3.q4_0.bin",
    verbose=False,
    max_tokens=512,
    n_ctx=2048
)

model_name = "sentence-transformers/all-mpnet-base-v2"
model_kwargs = {'device': 'cpu'}
encode_kwargs = {'normalize_embeddings': False}
EMBEDDINGS_MODEL = HuggingFaceEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs,
    cache_folder="models/transformers/"
)

template = (
    "Context: \n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    " ### Human: {query_str}\n"
    "### Assistant: "
)

QA_TEMPLATE = Prompt(template)

service_context = ServiceContext.from_defaults(
    llm_predictor=LLMPredictor(llm=MODEL),
    embed_model=LangchainEmbedding(EMBEDDINGS_MODEL),
    # With these values the prompt occasionally went beyond the 2048-token context;
    # context_window=1536 worked flawlessly.
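    # (PromptHelper reserves num_output tokens for the response and packs the retrieved
    # chunks into what remains; the overruns are likely because it estimates token counts
    # with a different tokenizer than llama.cpp uses, so a safety margin helps.)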
    prompt_helper=PromptHelper(context_window=2048, num_output=512),
    chunk_size=512
)

storage_context = StorageContext.from_defaults()

topic_indexes = []
topic_index_summaries = []

mypath = "data"

files = [...]
Plain Text
for i, filename in enumerate(files):
    with open(join(mypath, filename), "r") as file:
        print(f"### Processing file {filename} ({i + 1}/{len(files)}) ###")
        topic = json.load(file)
        docs = transform_dataset_topic(topic)
        index = ListIndex.from_documents(docs, service_context=service_context, storage_context=storage_context)
        topic_indexes.append(index)
        summary = index.as_query_engine(
#            optimizer=SentenceEmbeddingOptimizer(percentile_cutoff=0.5, embed_model=LangchainEmbedding(EMBEDDINGS_MODEL)),
            text_qa_template=QA_TEMPLATE,
            response_mode="refine"
        ).query(
            "Provide a detailed summary of the topic."
        )
        topic_index_summaries.append(str(summary))
        print(f"### Summary for {filename}: {str(summary)} ###")


...
I've shortened the code.
Right. Since you are using a list index, it will use the template no matter what, because a list index sends every node in the index to the LLM.

You can avoid this, though, by using index.as_query_engine(response_mode="tree_summarize"), which is the ideal mode for creating summaries.
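(For reference, a minimal sketch of that suggestion, reusing the index and the query string from the code shared above:)

Plain Text
# Sketch only: same index as in the snippet above, with the suggested response mode
summary_engine = index.as_query_engine(response_mode="tree_summarize")
summary = summary_engine.query("Provide a detailed summary of the topic.")
print(str(summary))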
It produces inaccurate summaries in that case.
I'm more interested in why the response gets lost during the refine process.
Yeah, the refine prompt is complex, like I think you mentioned earlier. Open-source models are not good at following it.

You could try customizing index.as_query_engine(refine_template=my_refine_template)

The default refine template is here
https://github.com/jerryjliu/llama_index/blob/0cf7f9983b6ec0528a327e8bc0e64bf0321b73fc/llama_index/prompts/default_prompts.py#L81
@Logan M Thank you very much for your help. Seems like the following template works:
Plain Text
HumanMessagePromptTemplate.from_template(
    "-----------\n"
    "Complete the answer to the question \"{query_str}\" based on the following context.\n"
    "Original answer:\n"
    "------------\n"
    "{existing_answer}\n"
    "------------\n"
    "Context provided:\n"
    "------------\n"
    "{context_msg}\n"
    "------------\n"
),
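(For completeness, a hedged sketch of how that message might be wired into the query engine, following the chat-prompt pattern used by llama_index's own default chat prompts in that era. The import paths, the RefinePrompt.from_langchain_prompt call, and the refine_prompt_lc / MY_REFINE_TEMPLATE names are assumptions based on that version of the API, not something posted in the thread; index and QA_TEMPLATE are reused from the earlier snippet:)

Plain Text
from langchain.prompts.chat import ChatPromptTemplate, HumanMessagePromptTemplate
from llama_index.prompts.prompts import RefinePrompt

# Wrap the single chat message above into a LangChain chat prompt, then into a
# llama_index RefinePrompt (pattern assumed from this version's chat_prompts module).
refine_prompt_lc = ChatPromptTemplate.from_messages([
    HumanMessagePromptTemplate.from_template(
        "-----------\n"
        "Complete the answer to the question \"{query_str}\" based on the following context.\n"
        "Original answer:\n"
        "------------\n"
        "{existing_answer}\n"
        "------------\n"
        "Context provided:\n"
        "------------\n"
        "{context_msg}\n"
        "------------\n"
    ),
])
MY_REFINE_TEMPLATE = RefinePrompt.from_langchain_prompt(refine_prompt_lc)

# Pass it alongside the QA template, as suggested earlier in the thread
query_engine = index.as_query_engine(
    text_qa_template=QA_TEMPLATE,
    refine_template=MY_REFINE_TEMPLATE,
    response_mode="refine",
)
summary = query_engine.query("Provide a detailed summary of the topic.")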