paragoniq
Joined September 25, 2024
I'm trying to use either HuggingFaceLLM or LlamaCPP.

On both tries, it keeps hanging with the following:

Plain Text
llama_print_timings:        load time = 159918.60 ms
llama_print_timings:      sample time =   240.40 ms /   256 runs   (    0.94 ms per token,  1064.88 tokens per second)
llama_print_timings: prompt eval time = 550695.06 ms /  3328 tokens (  165.47 ms per token,     6.04 tokens per second)
llama_print_timings:        eval time = 63265.28 ms /   255 runs   (  248.10 ms per token,     4.03 tokens per second)
llama_print_timings:       total time = 614952.10 ms
Llama.generate: prefix-match hit


This is running on a fairly beefy EC2 instance.

It does work locally on my Mac M2 (with LlamaCPP).

Ideas or suggestions?
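One hedged lead from the timings above: roughly 6 tokens per second on prompt eval is in CPU-only territory, so it's worth checking whether llama-cpp-python was installed with GPU (CUDA/Metal) support and whether any layers are actually being offloaded. A minimal sketch, with an illustrative model path:

```python
def build_llamacpp_llm(model_path: str):
    """Construct LlamaCPP with GPU offload enabled (parameters illustrative).

    n_gpu_layers=-1 asks llama.cpp to offload every layer; this only helps
    if llama-cpp-python was built with GPU support, which verbose=True will
    report at load time.
    """
    from llama_index.llms import LlamaCPP

    return LlamaCPP(
        model_path=model_path,
        model_kwargs={"n_gpu_layers": -1},
        verbose=True,
    )
```

On an Apple Silicon Mac, Metal offload can kick in by default, which would explain why the same setup feels fine locally but crawls on a CPU-only EC2 build.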
5 comments
I want to persist an index to storage (directory). I also want to be able to load that index from storage when I rerun my script.

But what happens on the first run, when no index has been created yet? What will load_index_from_storage(storage_context) return?

In other words, I want to:

Plain Text
    index = load_index_from_storage(storage_context)
    if(#WHEN INDEX IS NOT TRUTHY / ON FIRST RUN#):
      index = ListIndex.from_defaults(...)
      index.persist()


so I can load the index, or create it on first run.

Suggestions?
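A sketch of the usual pattern, assuming directory-based persistence via StorageContext.from_defaults(persist_dir=...): load_index_from_storage raises an exception rather than returning a falsy value when nothing has been persisted, so the first-run branch is a try/except, not a truthiness check. The persist_dir name is illustrative.

```python
def load_or_build_index(documents, persist_dir: str = "./storage"):
    """Load a persisted index if one exists, else build and persist it.

    load_index_from_storage raises (rather than returning None) when
    persist_dir holds no index, so the first run falls through to the
    except branch, builds the index, and persists it for the next run.
    """
    from llama_index import ListIndex, StorageContext, load_index_from_storage

    try:
        storage_context = StorageContext.from_defaults(persist_dir=persist_dir)
        return load_index_from_storage(storage_context)
    except (FileNotFoundError, ValueError):
        index = ListIndex.from_documents(documents)
        index.storage_context.persist(persist_dir=persist_dir)
        return index
```

Note that persistence hangs off index.storage_context.persist(...) rather than index.persist() in this API.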
1 comment
Hi, newb here. What's the difference between ListIndex, VectorStoreIndex, and the other similar index creation classes?
2 comments
I'm on a MacBook Pro M2 Max, and when I run index.as_query_engine().query('question') I get this error:

Plain Text
RuntimeError: MPS does not support cumsum op with int64 input
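A commonly suggested workaround (hedged, not specific to LlamaIndex): PyTorch's MPS backend lacks an int64 cumsum kernel, so either let unsupported ops fall back to CPU via the environment variable below, or keep the model on CPU entirely. The variable must be set before torch is imported.

```python
import os

# Must run before `import torch`: tells PyTorch to execute ops the MPS
# backend doesn't implement (like int64 cumsum) on CPU instead of erroring.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"
```

The fallback keeps everything else on the GPU and only routes the unsupported ops through CPU, so it's usually preferable to forcing the whole model onto CPU.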
6 comments
Currently using Llama 2 via Hugging Face.

Encountering this issue:

Plain Text
ValueError: Tokenizer class LLaMATokenizer does not exist or is not currently imported.


Tracked the huggingface error to this: https://github.com/huggingface/transformers/issues/22222#issuecomment-1477171703

but I'm not sure how to fix that on the LlamaIndex side.
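Per the linked transformers issue, this is a capitalization mismatch: some model repos spell the class "LLaMATokenizer" in their tokenizer_config.json, while transformers registers it as "LlamaTokenizer". A hedged sketch of one way around it on the LlamaIndex side is to construct the tokenizer yourself and pass it in, sidestepping the lookup by name (the model id below is illustrative):

```python
def build_llm_with_explicit_tokenizer(model_name: str):
    """Work around the LLaMATokenizer naming mismatch (hedged sketch).

    Loading LlamaTokenizer directly avoids transformers resolving the
    misspelled "LLaMATokenizer" class name from the repo's config; the
    pre-built tokenizer is then handed to HuggingFaceLLM.
    """
    from transformers import LlamaTokenizer
    from llama_index.llms import HuggingFaceLLM

    tokenizer = LlamaTokenizer.from_pretrained(model_name)
    return HuggingFaceLLM(model_name=model_name, tokenizer=tokenizer)
```

Alternatively, some people patch tokenizer_config.json in their local model cache to read "LlamaTokenizer", which fixes the same lookup at the source.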
6 comments
I'm trying to use meta-llama/Llama-2-70b-chat-hf via the llama_index.llms.HuggingFaceLLM abstraction.

I need to authenticate with HuggingFace (via a user token) in order to use that model.

How can I authenticate / pass the auth token in to HuggingFaceLLM?
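A hedged sketch of two options, assuming HuggingFaceLLM forwards model_kwargs / tokenizer_kwargs to the underlying from_pretrained calls (use_auth_token was the transformers-era name of the token parameter): either export the token in the environment (or run `huggingface-cli login` beforehand), or pass it explicitly.

```python
def build_authenticated_llm(token: str):
    """Pass a Hugging Face Hub token through to from_pretrained (sketch).

    model_kwargs and tokenizer_kwargs are forwarded to the model's and
    tokenizer's from_pretrained calls, so the token reaches both downloads.
    Setting the HUGGING_FACE_HUB_TOKEN environment variable before any
    Hub access is an alternative that needs no code changes.
    """
    from llama_index.llms import HuggingFaceLLM

    return HuggingFaceLLM(
        model_name="meta-llama/Llama-2-70b-chat-hf",
        model_kwargs={"use_auth_token": token},
        tokenizer_kwargs={"use_auth_token": token},
    )
```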
7 comments
Hi all.

I have an API endpoint that does the following:

Plain Text
    ...
    parser = SimpleNodeParser()
    nodes = parser.get_nodes_from_documents(documents)

    # create (or load) docstore and add nodes
    docstore = MongoDocumentStore.from_uri(uri=URI, db_name='db', namespace='public')
    docstore.add_documents(nodes)

    # create storage context
    storage_context = StorageContext.from_defaults(
        docstore=docstore,
        index_store=MongoIndexStore.from_uri(uri=URI, db_name='db', namespace='public')
    )

    # build index
    index = GPTVectorStoreIndex(nodes, storage_context=storage_context)


On a separate endpoint, I want to load the index and query it:

Plain Text
    storage_context = StorageContext.from_defaults(
        docstore=MongoDocumentStore.from_uri(uri=URI, db_name='db', namespace='public'),
        index_store=MongoIndexStore.from_uri(uri=URI, db_name='db', namespace='public')
    )
    index = load_index_from_storage(storage_context)
    query_engine = index.as_query_engine()
    result = query_engine.query(data.prompt)


This errors out with KeyError('1').

I believe that stems from a method on the storage context that load_index_from_storage calls deep in the code: storage_context.index_store.index_structs().
That is what raises KeyError('1'); I take it that means it couldn't find an index/index_struct? But https://gpt-index.readthedocs.io/en/latest/how_to/storage/index_stores.html says that, when using MongoIndexStore, you don't have to persist storage.

Am I loading the index correctly? Am I fundamentally doing something wrong?
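A hedged guess at the KeyError: load_index_from_storage has to pick one index struct out of the store, and an id mismatch (or multiple structs accumulating across runs of the build endpoint) surfaces as a KeyError during that lookup. Recording index.index_id at build time (or pinning it with index.set_index_id) and passing it back on load removes the ambiguity. Sketch:

```python
def load_mongo_index(storage_context, index_id: str):
    """Load one specific index struct from a Mongo-backed index store.

    load_index_from_storage accepts an index_id, so the loader no longer
    has to guess which struct in the store belongs to this index; the id
    should be the index.index_id captured when the index was built (or a
    fixed value set via index.set_index_id on the build endpoint).
    """
    from llama_index import load_index_from_storage

    return load_index_from_storage(storage_context, index_id=index_id)
```

It's also worth confirming both endpoints use exactly the same db_name and namespace, since a mismatch there would make the loader look at an empty (or different) collection.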
5 comments