Yea, definitely nodes πŸ‘€
I see... so when I do
Plain Text
 docstore.get_document(d) 
it actually returns nodes?
yea, seems like it
you can also do docstore.docs to pull a dict of every id -> node
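Something like this, for example (just a rough sketch against your MongoDocumentStore, assuming the nodes expose get_text()):
Plain Text
# docstore.docs -> dict of node_id -> node
for node_id, node in docstore.docs.items():
    print(node_id, node.get_text()[:80])  # assuming the node exposes get_text()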
Alright, so I figured that piece out. Finally, after the changes with the new update and rewriting everything, I got the query to run, but now I'm getting some of the same errors I was getting before. Not so much the "Larger Chunk" error, but this one:
Plain Text
RuntimeError: The size of tensor a (2048) must match the size of tensor b (2049) at non-singleton dimension 3
Ooo closer!

What do your prompt helper settings/chunk size settings look like again?
so, for the new changes, I'm not using prompt_helper. My old prompt_helper was:
Plain Text
 # define prompt helper
# set maximum input size
max_input_size = 2048
# set number of output tokens
num_output = 1500
# set maximum chunk overlap
max_chunk_overlap = 20
# Set the chunk size limit
chunk_size_limit = 100
prompt_helper = PromptHelper(max_input_size,
                             num_output,
                             max_chunk_overlap,
                             chunk_size_limit=chunk_size_limit)
my custom LLM class is:
Plain Text
# Custom LLM Class
from typing import Any, Mapping

from langchain.llms.base import LLM
from transformers import pipeline


class CustomLLM(LLM):

    model_name = "EleutherAI/pythia-70m"
    pipeline = pipeline(model=model_name,
                        model_kwargs={'pad_token_id': 0},
                        # torch_dtype=torch.bfloat16,
                        trust_remote_code=True,
                        max_new_tokens=1026,
                        device_map="auto")

    def _call(self, prompt, stop=None):
        prompt_length = len(prompt)
        response = self.pipeline(prompt)[0]['generated_text']
        return response

    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        return {"name_of_model": self.model_name}

    @property
    def _llm_type(self) -> str:
        return "custom"
I changed the 1026 from the previous 1028, as a test. Seems promising.
welp, nevermind
Hmm, make sure your max_new_tokens is the same as num_output
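Something like this is what I mean (just a sketch, reusing the names from your snippet; num_output = 256 is only an example value):
Plain Text
num_output = 256  # example value; whatever you pass to PromptHelper

pipeline = pipeline(model=model_name,
                    model_kwargs={'pad_token_id': 0},
                    trust_remote_code=True,
                    max_new_tokens=num_output,  # keep in sync with num_output
                    device_map="auto")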
with the prompt_helper enabled, I got this error:
ValueError: Got a larger chunk overlap (20) than chunk size (-1489), should be smaller.
A classic lol

Might have to tweak the numbers a bit.

I'm not sure you actually need num_output that big. With num_output = 1500 against a 2048-token max input size, there is almost nothing left for context once the prompt template is accounted for, which is where the negative chunk size comes from. I would try maybe something like this:

Plain Text
# define prompt helper
# set maximum input size
max_input_size = 2048
# set number of output tokens
num_output = 256
# set maximum chunk overlap
max_chunk_overlap = 20

prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap, chunk_size_limit=512)
Maybe even try removing the chunk size limit... it's so finicky haha
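Without the limit it would just be something like this (same variables as above):
Plain Text
prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)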
Now I'm here again:

RuntimeError: The size of tensor a (2048) must match the size of tensor b (2049) at non-singleton dimension 3
OK, I'm going to post my build_index function, my query function, and my LLM class and prompt helper portions. The problems I am having:

When I use the "refresh_documents" argument, it clears the docstore and index store and rebuilds, but then I get the error above (tensor size).

When I don't use that argument, it is supposed to load the index from storage, but when I do THAT, I get this...
Plain Text
2023-05-07 18:38:47 Building Attachments Index...

                --- Hashing /home/gabri/AkoGPT/attachments


2023-05-07 18:38:47 Building Base Knowledge Index...

                --- Hashing /home/gabri/AkoGPT/base


INFO:llama_index.indices.loading:Loading all indices.


Querying...


Batches: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1/1 [00:00<00:00,  7.93it/s]
INFO:llama_index.token_counter.token_counter:> [retrieve] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [retrieve] Total embedding token usage: 13 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens


Query Complete...


None
I am clearly missing some kind of understanding here....
Build_index:
Plain Text
def build_index(prompt):
    additions = False
    documents_array = []
    parser = SimpleNodeParser()
    docstore = MongoDocumentStore.from_uri(uri=MONGO_URI)
    index_store = MongoIndexStore.from_uri(uri=MONGO_URI)

    storage_context = StorageContext.from_defaults(
        index_store = index_store,
        docstore = docstore
    )

    if arg_present('show_index'):
        for i, v in enumerate(index_store.index_structs()):
            print(f'\nIndex {i}:\n{v}\n')

    if arg_present('refresh_documents'):

        log(f'Refresh documents called', True, False)

        log(f'Clearing Document Store...', False, True)
        for d in docstore.docs:
            storage_context.docstore.delete_document(d)

        log(f'Clearing Index Store...', False, True)
        client = pymongo.MongoClient("mongodb://localhost:27017/")
        db = client["db_docstore"]
        col = db["index_store/data"]
        col.drop()
        client.close()

        log(f'Removing Hash files...', False, True)
        os.remove(attachments_hash)
        os.remove(base_knowledge_hash)

    
    # Check if attachments index file exists, if not, build it.
    log(f'Building Attachments Index...', True, False)
    
    if os.path.exists(attachments_hash):
        if not compare_hashes(attachments_folder, attachments_hash, attachments_hash):
            storage_context.docstore.add_documents(get_nodes_from_documents_in_folder(attachments_folder))
            additions = True
    else:
        storage_context.docstore.add_documents(get_nodes_from_documents_in_folder(attachments_folder))
        hash_folder(attachments_folder, attachments_hash, True)
        additions = True
    # Base Knowledge Folder Index
    log(f'Building Base Knowledge Index...', True, False)
    if os.path.exists(base_knowledge_hash):
        if not compare_hashes(base_knowledge_folder, base_knowledge_hash, base_knowledge_hash):
            storage_context.docstore.add_documents(get_nodes_from_documents_in_folder(base_knowledge_folder))
            additions = True
    else:
        storage_context.docstore.add_documents(get_nodes_from_documents_in_folder(base_knowledge_folder))
        hash_folder(base_knowledge_folder, base_knowledge_hash, True)
        additions = True
    
    # build index from folders
    if additions:
        for d in docstore.docs:
            documents_array.append(storage_context.docstore.get_document(d))

        index = GPTVectorStoreIndex.from_documents(
            documents_array,
            storage_context=storage_context,
            service_context=service_context
        )
    else:
        index = load_index_from_storage(storage_context=storage_context,
                                        service_context=service_context)
    
    return index
Query:
Plain Text
def ask_gpt_custom(prompt):
    index = build_index(prompt)
    print(f'\n\nQuerying...\n\n')

    query_engine = index.as_query_engine(
        verbose=True,
        service_context=service_context
    )
    response = query_engine.query(prompt)

    
    print(f'\n\nQuery Complete...\n\n')

    print(f'{response}')

    return f'{response}'
LLM and Prompt Helper:
Plain Text
class CustomLLM(LLM):

    model_name = "EleutherAI/pythia-70m"
    pipeline = pipeline(model=model_name,
                        model_kwargs={'pad_token_id': 0},
                        # torch_dtype=torch.bfloat16,
                        trust_remote_code=True,
                        max_new_tokens=256,
                        device_map="auto")

    def _call(self, prompt, stop=None):
        prompt_length = len(prompt)
        response = self.pipeline(prompt)[0]['generated_text']
        return response

    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        return {"name_of_model": self.model_name}

    @property
    def _llm_type(self) -> str:
        return "custom"

# define prompt helper
# set maximum input size
max_input_size = 2048
# set number of output tokens
num_output = 256
# set maximum chunk overlap
max_chunk_overlap = 20
# Set the chunk size limit
chunk_size_limit = 512
prompt_helper = PromptHelper(max_input_size,
                             num_output,
                             max_chunk_overlap)

# define our LLM
llm_predictor = LLMPredictor(llm=CustomLLM())

# build service context
embed_model = LangchainEmbedding(HuggingFaceEmbeddings())
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor,
                                               prompt_helper=prompt_helper,
                                               embed_model=embed_model)
Sorry man, totally forgot to respond to this lol. Did you ever figure it out?
It's beyond my knowledge set... I'm completely stuck
oof I know. Seems super hard to debug too.

I just merged a new huggingface LLM predictor today

Maybe it's worth installing from source and trying it out? It should also come out in the next release too

https://github.com/jerryjliu/llama_index/blob/main/docs/examples/customization/llms/SimpleIndexDemo-Huggingface_camel.ipynb
oh it just got released
39 minutes ago lol so pip should pick it up
I'm on 0.6.4. Is that the right one?
that's the one
Plain Text
query_wrapper_prompt = SimpleInputPrompt("<|USER|>{query_str}<|ASSISTANT|>")

llm_predictor = HuggingFaceLLMPredictor(
    max_input_size=4096, 
    max_new_tokens=256,
    temperature=0.7,
    do_sample=False,
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name="EleutherAI/pythia-160m",
    model_name="EleutherAI/pythia-160m",
    device_map="auto",
    stopping_ids=[50278, 50279, 50277, 1, 0],
    tokenizer_kwargs={"max_length": 4096},
)

embed_model = LangchainEmbedding(HuggingFaceEmbeddings())
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor,
                                               embed_model=embed_model)

index = build_index(prompt)

query_engine = index.as_query_engine(
    retriever_mode="embedding",
    service_context=service_context
)
response = query_engine.query(prompt)

Seems like the prompt helper variables are wrapped into the LLM definition. Nice. Is the query_wrapper_prompt mandatory? Also, what are the stopping ids?
Query wrapper prompt is not mandatory (some LLMs just require a specific prompt format)

Stopping ids are extra token IDs that stop the model from generating. (By default there is a single special token that stops the generation early.) They are specific to each tokenizer

I just took those from the StableLM HuggingFace page
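If you want to check what the default stop token is for the pythia tokenizer, something like this should show it (just a sketch):
Plain Text
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("EleutherAI/pythia-160m")
print(tok.eos_token, tok.eos_token_id)  # the id generation stops on by default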
Rewrote using that LLM; it's giving me the old OpenAI authentication issues
Whaaat even with the embeddings set hey?

If you set your OpenAI key to something random, does it work or fail somewhere?
If it fails somewhere, we are back to figuring out why it's being used lol
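Something like this before anything else runs, just to see where it blows up (a deliberately invalid, throwaway value):
Plain Text
import os

os.environ["OPENAI_API_KEY"] = "sk-not-a-real-key"  # throwaway, just for debugging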
I set it to empty: fail
Set it to my actual key: worked, but of course it queried OpenAI
Did it fail during index construction or the query?
I think your build index function also needs the service context (not sure what's in there)
Index construction... looks like it's in the embeddings for the GPTVectorStoreIndex
And you passed the service context into the constructor too?
I think I figured it out. I had changed the index type to test something, and changed the service context to match the example, but forgot to add the embed_model.

Good news is, I'm actually getting a response now. Not a correct one, and it repeats like, 5 times, but still. Better than before. Progress!
Nice!! Glad it works πŸ’ͺ
(That model you are using is probably too small to be very good btw)
Hmm, what's the smallest you'd recommend? The server I'm hosting this on has a budget limitation as far as compute power is concerned.
From what I've seen, I wouldn't expect logical answers from anything less than 700M parameters.

Maybe that will improve in the future at some point, people are going wild trying to make these things smaller lol
Most open source ones seem to hover around 3-7 billion ish
I'm finally getting answers. Thank you so much for the help. Next tasks for me are:
-- Figure out how to speed it all up.
-- Figure out why it repeats its answers 4 times.
-- Add loaders to grab as much info as possible.
Niceee :dotsHARDSTYLE:
Need to add to the above list: figure out why loading the index from the MongoIndexStore is returning no documents...
Oof πŸ₯²