Yea, definitely nodes πŸ‘€
I see... so when I do
Plain Text
 docstore.get_document(d) 
it actually returns nodes?
yea, seems like it
you can also do docstore.docs to pull a dict of every id -> node
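Something like this, for example (just a rough sketch against your MongoDocumentStore, assuming the nodes expose get_text()):
Plain Text
# docstore.docs -> dict of node_id -> node
for node_id, node in docstore.docs.items():
    print(node_id, node.get_text()[:80])  # assuming the node exposes get_text()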
Alright, so I figured that piece out. Finally, after the changes with the new update and rewriting everything, I got the query to run, but now I'm getting some of the same errors I was getting before. Not so much the "Larger Chunk" error, but this one:
Plain Text
RuntimeError: The size of tensor a (2048) must match the size of tensor b (2049) at non-singleton dimension 3
Ooo closer!

What do your prompt helper settings/chunk size settings look like again?
so, for the new changes, I'm not using prompt_helper. My old prompt_helper was:
Plain Text
 # define prompt helper
# set maximum input size
max_input_size = 2048
# set number of output tokens
num_output = 1500
# set maximum chunk overlap
max_chunk_overlap = 20
# Set the chunk size limit
chunk_size_limit = 100
prompt_helper = PromptHelper(max_input_size,
                             num_output,
                             max_chunk_overlap,
                             chunk_size_limit=chunk_size_limit)
my custom LLM class is:
Plain Text
# Custom LLM Class
from typing import Any, Mapping

from langchain.llms.base import LLM
from transformers import pipeline


class CustomLLM(LLM):

    model_name = "EleutherAI/pythia-70m"
    pipeline = pipeline(model=model_name,
                        model_kwargs={'pad_token_id': 0},
                        # torch_dtype=torch.bfloat16,
                        trust_remote_code=True,
                        max_new_tokens=1026,
                        device_map="auto")

    def _call(self, prompt, stop=None):
        prompt_length = len(prompt)
        response = self.pipeline(prompt)[0]['generated_text']
        return response

    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        return {"name_of_model": self.model_name}

    @property
    def _llm_type(self) -> str:
        return "custom"
I changed the 1026 from the previous 1028, as a test. Seems promising.
welp, nevermind
Hmm, make sure your max_new_tokens is the same as num_output
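Something like this is what I mean (just a sketch, reusing the names from your snippet; num_output = 256 is only an example value):
Plain Text
num_output = 256  # example value; whatever you pass to PromptHelper

pipeline = pipeline(model=model_name,
                    model_kwargs={'pad_token_id': 0},
                    trust_remote_code=True,
                    max_new_tokens=num_output,  # keep in sync with num_output
                    device_map="auto")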
with the prompt_helper enabled, I got this error:
ValueError: Got a larger chunk overlap (20) than chunk size (-1489), should be smaller.
A classic lol

Might have to tweak the numbers a bit.

I'm not sure you actually need num_output that big. With num_output = 1500 against a 2048-token max input size, there is almost nothing left for context once the prompt template is accounted for, which is where the negative chunk size comes from. I would try maybe something like this:

Plain Text
# define prompt helper
# set maximum input size
max_input_size = 2048
# set number of output tokens
num_output = 256
# set maximum chunk overlap
max_chunk_overlap = 20

prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap, chunk_size_limit=512)
Maybe even try removing the chunk size limit... it's so finicky haha
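Without the limit it would just be something like this (same variables as above):
Plain Text
prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)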
Now I'm here again:

RuntimeError: The size of tensor a (2048) must match the size of tensor b (2049) at non-singleton dimension 3
OK, I'm going to post my build_index function, my query function, and my LLM class and prompt helper portions. The problems I am having:

When I use the "refresh_documents" argument, it clears the docstore and index store and rebuilds, but then I get the error above (tensor size).

When I don't use that argument, it is supposed to load the index from storage, but when I do THAT, I get this...
Plain Text
2023-05-07 18:38:47 Building Attachments Index...

                --- Hashing /home/gabri/AkoGPT/attachments


2023-05-07 18:38:47 Building Base Knowledge Index...

                --- Hashing /home/gabri/AkoGPT/base


INFO:llama_index.indices.loading:Loading all indices.


Querying...


Batches: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1/1 [00:00<00:00,  7.93it/s]
INFO:llama_index.token_counter.token_counter:> [retrieve] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [retrieve] Total embedding token usage: 13 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens


Query Complete...


None
I am clearly missing some kind of understanding here....
Build_index:
Plain Text
def build_index(prompt):
    additions = False
    documents_array = []
    parser = SimpleNodeParser()
    docstore = MongoDocumentStore.from_uri(uri=MONGO_URI)
    index_store = MongoIndexStore.from_uri(uri=MONGO_URI)

    storage_context = StorageContext.from_defaults(
        index_store = index_store,
        docstore = docstore
    )

    if arg_present('show_index'):
        for i, v in enumerate(index_store.index_structs()):
            print(f'\nIndex {i}:\n{v}\n')

    if arg_present('refresh_documents'):

        log(f'Refresh documents called', True, False)

        log(f'Clearing Document Store...', False, True)
        for d in docstore.docs:
            storage_context.docstore.delete_document(d)

        log(f'Clearing Index Store...', False, True)
        client = pymongo.MongoClient("mongodb://localhost:27017/")
        db = client["db_docstore"]
        col = db["index_store/data"]
        col.drop()
        client.close()

        log(f'Removing Hash files...', False, True)
        os.remove(attachments_hash)
        os.remove(base_knowledge_hash)

    
    # Check if attachments index file exists, if not, build it.
    log(f'Building Attachments Index...', True, False)
    
    if os.path.exists(attachments_hash):
        if not compare_hashes(attachments_folder, attachments_hash, attachments_hash):
            storage_context.docstore.add_documents(get_nodes_from_documents_in_folder(attachments_folder))
            additions = True
    else:
        storage_context.docstore.add_documents(get_nodes_from_documents_in_folder(attachments_folder))
        hash_folder(attachments_folder, attachments_hash, True)
        additions = True
    # Base Knowledge Folder Index
    log(f'Building Base Knowledge Index...', True, False)
    if os.path.exists(base_knowledge_hash):
        if not compare_hashes(base_knowledge_folder, base_knowledge_hash, base_knowledge_hash):
            storage_context.docstore.add_documents(get_nodes_from_documents_in_folder(base_knowledge_folder))
            additions = True
    else:
        storage_context.docstore.add_documents(get_nodes_from_documents_in_folder(base_knowledge_folder))
        hash_folder(base_knowledge_folder, base_knowledge_hash, True)
        additions = True
    
    # build index from folders
    if additions:
        for d in docstore.docs:
            documents_array.append(storage_context.docstore.get_document(d))

        index = GPTVectorStoreIndex.from_documents(
            documents_array,
            storage_context=storage_context,
            service_context=service_context
        )
    else:
        index = load_index_from_storage(storage_context=storage_context,
                                        service_context=service_context)
    
    return index
Query:
Plain Text
def ask_gpt_custom(prompt):
    index = build_index(prompt)
    print(f'\n\nQuerying...\n\n')

    query_engine = index.as_query_engine(
        verbose=True,
        service_context=service_context
    )
    response = query_engine.query(prompt)

    
    print(f'\n\nQuery Complete...\n\n')

    print(f'{response}')

    return f'{response}'
LLM and Prompt Helper:
Plain Text
class CustomLLM(LLM):

    model_name = "EleutherAI/pythia-70m"
    pipeline = pipeline(model=model_name,
                        model_kwargs={'pad_token_id': 0},
                        # torch_dtype=torch.bfloat16,
                        trust_remote_code=True,
                        max_new_tokens=256,
                        device_map="auto")

    def _call(self, prompt, stop=None):
        prompt_length = len(prompt)
        response = self.pipeline(prompt)[0]['generated_text']
        return response

    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        return {"name_of_model": self.model_name}

    @property
    def _llm_type(self) -> str:
        return "custom"

# define prompt helper
# set maximum input size
max_input_size = 2048
# set number of output tokens
num_output = 256
# set maximum chunk overlap
max_chunk_overlap = 20
# Set the chunk size limit
chunk_size_limit = 512
prompt_helper = PromptHelper(max_input_size,
                             num_output,
                             max_chunk_overlap)

# define our LLM
llm_predictor = LLMPredictor(llm=CustomLLM())

# build service context
embed_model = LangchainEmbedding(HuggingFaceEmbeddings())
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor,
                                               prompt_helper=prompt_helper,
                                               embed_model=embed_model)
Sorry man, totally forgot to respond to this lol. Did you ever figure it out?
It's beyond my knowledge set... I'm completely stuck
oof I know. Seems super hard to debug too.

I just merged a new huggingface LLM predictor today

Maybe it's worth installing from source and trying it out? It should also come out in the next release too

https://github.com/jerryjliu/llama_index/blob/main/docs/examples/customization/llms/SimpleIndexDemo-Huggingface_camel.ipynb
oh it just got released
39 minutes ago lol so pip should pick it up
I'm on 0.6.4. Is that the right one?
that's the one
Plain Text
query_wrapper_prompt = SimpleInputPrompt("<|USER|>{query_str}<|ASSISTANT|>")

llm_predictor = HuggingFaceLLMPredictor(
    max_input_size=4096, 
    max_new_tokens=256,
    temperature=0.7,
    do_sample=False,
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name="EleutherAI/pythia-160m",
    model_name="EleutherAI/pythia-160m",
    device_map="auto",
    stopping_ids=[50278, 50279, 50277, 1, 0],
    tokenizer_kwargs={"max_length": 4096},
)

embed_model = LangchainEmbedding(HuggingFaceEmbeddings())
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor,
                                               embed_model=embed_model)

index = build_index(prompt)

query_engine = index.as_query_engine(
    retriever_mode="embedding",
    service_context=service_context
)
response = query_engine.query(prompt)

Seems like the prompt helper variables are wrapped into the LLM definition. Nice. Is the query_wrapper_prompt mandatory? Also, what are the stopping ids?
Query wrapper prompt is not mandatory (some LLMs just require a specific prompt format)

Stopping ids are extra token IDs that stop the model from generating. (By default there is a single special token that stops the generation early.) They are specific to each tokenizer

I just took those from the StableLM HuggingFace page
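If you want to check what the default stop token is for the pythia tokenizer, something like this should show it (just a sketch):
Plain Text
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("EleutherAI/pythia-160m")
print(tok.eos_token, tok.eos_token_id)  # the id generation stops on by default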
Rewrote using that LLM; it's giving me the old OpenAI authentication issues
Whaaat even with the embeddings set hey?

If you set your OpenAI key to something random, does it work or fail somewhere?
If it fails somewhere, we are back to figuring out why it's being used lol
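Something like this before anything else runs, just to see where it blows up (a deliberately invalid, throwaway value):
Plain Text
import os

os.environ["OPENAI_API_KEY"] = "sk-not-a-real-key"  # throwaway, just for debugging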
I set it to empty: fail
Set it to my actual key: worked, but of course it queried OpenAI
Did it fail during index construction or the query?
I think your build index function also needs the service context (not sure what's in there)
Index construction... looks like it's in the embeddings for the GPTVectorStoreIndex
And you passed the service context into the constructor too?
I think I figured it out. I had changed the index type to test something, and changed the service context to match the example, but forgot to add the embed_model.

Good news is, I'm actually getting a response now. Not a correct one, and it repeats like, 5 times, but still. Better than before. Progress!
Nice!! Glad it works πŸ’ͺ
(That model you are using is probably too small to be very good btw)
Hmm, what's the smallest you'd recommend? The server I'm hosting this on has a budget limitation as far as compute power is concerned.
From what I've seen, I wouldn't expect logical answers from anything less than 700M parameters.

Maybe that will improve in the future at some point, people are going wild trying to make these things smaller lol
Most open source ones seem to hover around 3-7 billion ish
I'm finally getting answers. Thank you so much for the help. Next tasks for me are:
-- Figure out how to speed it all up.
-- Figure out why it repeats its answers 4 times.
-- Add loaders to grab as much info as possible.
Niceee :dotsHARDSTYLE:
Need to add to the above list: figure out why loading the index from the MongoIndexStore is returning no documents...
Oof πŸ₯²