Chunk size

Hey everyone,

I am trying to follow the full-stack web app tutorial on LlamaIndex, but using a Hugging Face model. Whenever I try to run it, this is the error I get:

Got a larger chunk overlap (-3) than chunk size (-39), should be smaller.

and here's a snippet of my code:
Plain Text
# imports (legacy llama_index 0.6.x-era paths; adjust for your version)
import os

from langchain.embeddings import HuggingFaceEmbeddings
from llama_index import (
    GPTListIndex,
    LangchainEmbedding,
    PromptHelper,
    ServiceContext,
    SimpleDirectoryReader,
    StorageContext,
    load_index_from_storage,
)
from llama_index.llm_predictor import HuggingFaceLLMPredictor


def initialize_index():
    global index


    llm_predictor = HuggingFaceLLMPredictor(
        max_input_size=512, 
        max_new_tokens=512,
        tokenizer_name="facebook/opt-iml-max-1.3b",
        model_name="facebook/opt-iml-max-1.3b",
        model_kwargs={"load_in_8bit": True},
        generate_kwargs={
            "do_sample": True,
            "top_k": 4,
            "penalty_alpha": 0.6, 
        }
    )

    prompt_helper = PromptHelper(context_window=512, chunk_size_limit=256, num_output=512)
    embed_model = LangchainEmbedding(HuggingFaceEmbeddings())
    service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, embed_model=embed_model, 
                                                   prompt_helper=prompt_helper)

    
    if os.path.exists("../indices"):
        storage_context = StorageContext.from_defaults(persist_dir="../indices")
        index = load_index_from_storage(storage_context=storage_context, 
                                        service_context=service_context)

    else:
        storage_context = StorageContext.from_defaults()
        documents = SimpleDirectoryReader("../data").load_data()
        index = GPTListIndex.from_documents(documents=documents, service_context=service_context, storage_context=storage_context)
        
        index.set_index_id("paul_graham_essay")
        index.storage_context.persist("../indices")

    return index, service_context


Would appreciate your help in solving this error.
13 comments
Yea, a few errors/issues:

  1. num_output can't be the same size as max_input_size.
  2. Your max input size is 512. This isn't an error per se, but if the model only supports up to 512 input tokens, it's not a great fit for llama index.
You can likely solve your error by setting num_output to something like 100.
You might also want to lower the chunk size a bit too, maybe to 200.
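To make the token math concrete, here's a rough sketch of the budget PromptHelper has to work with (the template-overhead figure is illustrative, not PromptHelper's exact internals):
Plain Text
# Rough budget behind the error (illustrative numbers, not PromptHelper's exact internals)
context_window = 512      # max tokens the model accepts
num_output = 512          # tokens reserved for the model's answer
template_overhead = 39    # hypothetical tokens eaten by the prompt template itself

room_for_chunks = context_window - num_output - template_overhead
print(room_for_chunks)    # -39 -> the negative "chunk size" in the error message

# Reserving fewer tokens for output leaves real room for retrieved text:
room_for_chunks = context_window - 100 - template_overhead   # num_output=100
print(room_for_chunks)    # 373, so chunk_size_limit=200 fits comfortably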

That small input size is difficult to work with πŸ˜…
Aah I see, will do that
So, I applied the above settings and it worked amazingly 😊. But right now I am trying to use the chat engine with a Hugging Face LLM, and this is my code:

Custom LLM
Plain Text
# imports (legacy langchain custom-LLM base class; adjust for your version)
from typing import Any, List, Mapping

from langchain.llms.base import LLM
from transformers import pipeline


class LocalOptModel(LLM):

    model_name = "facebook/opt-iml-max-1.3b"
    generation_pipeline = pipeline("text-generation", model=model_name, model_kwargs={"load_in_8bit": True, "device_map":"auto"})
    def _call(self, prompt: str, stop: List[str]|None=None) -> str:
        
        prompt_len = len(prompt)
        response = self.generation_pipeline(prompt, do_sample=True, max_new_tokens=256, top_k=4, penalty_alpha=.6)[0]["generated_text"]

        return response[prompt_len:]
    
    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        return {"name_of_model":self.model_name}
    
    @property
    def _llm_type(self) -> str:
        return "custom"
index initialization
Plain Text
def initialize_index():
    global index, service_context

    llm_predictor = LLMPredictor(llm=LocalOptModel(verbose=True))

    prompt_helper = PromptHelper(context_window=512, chunk_size_limit=200, num_output=100)
    embed_model = LangchainEmbedding(HuggingFaceEmbeddings())
    service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, embed_model=embed_model, 
                                                   prompt_helper=prompt_helper)

    
    if os.path.exists("../indices"):
        storage_context = StorageContext.from_defaults(persist_dir="../indices")
        index = load_index_from_storage(storage_context=storage_context, 
                                        service_context=service_context)

    else:
        storage_context = StorageContext.from_defaults()
        documents = SimpleDirectoryReader("../data").load_data()
        index = GPTVectorStoreIndex.from_documents(documents=documents, service_context=service_context, storage_context=storage_context)
        
        index.set_index_id("paul_graham_essay")
        index.storage_context.persist(persist_dir="../indices")

    return index, service_context
and my query function:

Plain Text
import traceback

from flask import request


def generateResponse():
    global index, service_context

    try:

        chat_engine = index.as_chat_engine(
            chat_mode="react",
            verbose=True,
            service_context=service_context
        )

        response = chat_engine.chat(request.args.get("q", None))
        print(response)

        return {"message": "Success", 
                "status": 200,
                "data" : response}
    except Exception as e:
        print(traceback.format_exc())
        return {"message": "Request could not be processed", "status":503}
and this is the error I am getting:
Plain Text
langchain.schema.OutputParserException: Could not parse LLM output:
Previous conversation history:
Can you please help me with this?
Yea, that's coming from langchain, because the model did not follow their parsing expectations 😅 Tbh this is pretty expected with a custom LLM, especially since opt-1.3b is not great at following the complex instruction format langchain needs.
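For context, the react chat mode hands every turn to LangChain's ReAct-style agent, whose output parser only accepts completions in a fixed shape, roughly like this (the exact template wording and tool names vary by LangChain version and setup):
Plain Text
# Roughly the shape the ReAct-style parser expects the raw completion to have
# (the tool name below is a placeholder; the real one comes from the agent's tool list)
tool_call_turn = """Thought: I should look this up in the index.
Action: <tool name>
Action Input: What did the author work on?"""

final_turn = """Thought: I now know the answer.
Final Answer: The author worked on ..."""

# If the completion contains neither an "Action:" nor a "Final Answer:" line,
# langchain raises OutputParserException("Could not parse LLM output: ..."),
# which a small model like opt-iml-max-1.3b triggers easily.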
Aah I see, thanks a lot @Logan M 😊
Just one question though, can we use the GPT4All model?
You can! But it's actually super limited, because its max input size is only 512 😢
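If you do want to try it, a minimal sketch, assuming LangChain's GPT4All wrapper and a locally downloaded model file (the path/filename below is a placeholder), would be to swap it in for LocalOptModel in initialize_index():
Plain Text
# Sketch only: assumes LangChain's GPT4All wrapper; path/filename is a placeholder
from langchain.llms import GPT4All
from llama_index import LLMPredictor

gpt4all_llm = GPT4All(model="./models/gpt4all-model.bin")

# drop this into initialize_index() in place of LocalOptModel; the PromptHelper /
# ServiceContext setup stays the same, with context_window still capped at 512
llm_predictor = LLMPredictor(llm=gpt4all_llm)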
Yeah, I saw that, but the thing is, every other model I try to fit on my GPU takes way too much time.