thoraxe
Joined September 25, 2024
is there a place to manipulate the cache settings to prevent llamaindex from checking "upstream" to find a newer embedding model version?
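As far as I know there isn't a LlamaIndex-specific cache knob for this; for Hugging Face-backed embedding models, the revision check against the Hub can be suppressed with the standard huggingface_hub/transformers environment variables. A minimal sketch, assuming the model has already been downloaded once:
Python
import os

# Set these before anything imports transformers/huggingface_hub: no revision
# checks against the Hub, local cache only.
os.environ["HF_HUB_OFFLINE"] = "1"
os.environ["TRANSFORMERS_OFFLINE"] = "1"

from llama_index.embeddings import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5")
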
5 comments
hm... with llama_index.set_global_handler("simple") I am not seeing any verbosity/debug messages for building the vector store index
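For what it's worth, the "simple" handler mostly prints LLM prompts/completions; the usual way to get verbose output while the vector store index is being built is plain DEBUG-level Python logging:
Python
import logging
import sys

# DEBUG level surfaces chunking/embedding activity during index construction.
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
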
16 comments
thoraxe · Gpu
it looks like it's been asked previously but no clear answer and the docs aren't super clear either -- how do I make the embedding/indexing use GPU? or, how do I know if it used/might use GPU?
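A hedged sketch, assuming a local Hugging Face embedding model: torch reports whether a GPU is visible at all, and HuggingFaceEmbedding takes a device argument so it can be pinned explicitly instead of guessed at:
Python
import torch
from llama_index.embeddings import HuggingFaceEmbedding

# If this prints False, embedding/indexing is running on CPU no matter what.
print(torch.cuda.is_available())

# Pin the embedder to the GPU explicitly.
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5", device="cuda")
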
6 comments
thoraxe · Huggingface
Plain Text
Traceback (most recent call last):
  File "/opt/app-root/src/llamaindex-rag-example/starter.py", line 8, in <module>
    embed_model = HuggingFaceEmbedding(model_name="Cohere/Cohere-embed-english-v3.0")
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/llama_index/embeddings/huggingface.py", line 82, in __init__
    model = AutoModel.from_pretrained(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 526, in from_pretrained
    config, kwargs = AutoConfig.from_pretrained(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/transformers/models/auto/configuration_auto.py", line 1132, in from_pretrained
    raise ValueError(
ValueError: Unrecognized model in Cohere/Cohere-embed-english-v3.0. Should have a `model_type` key in its config.json, or contain one of the following strings in its name: ...


looks like you don't support all embedding models from the MTEB leaderboard.
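Cohere/Cohere-embed-english-v3.0 appears to be served through Cohere's API and its Hub repo has no transformers-loadable config, which is why AutoModel rejects it; HuggingFaceEmbedding can only load checkpoints with standard transformers weights. A sketch with a leaderboard model that does load locally:
Python
from llama_index.embeddings import HuggingFaceEmbedding

# bge-large-en-v1.5 is a sentence-transformers-style checkpoint with a normal
# config.json, so AutoModel/AutoTokenizer can load it.
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-large-en-v1.5")
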
7 comments
stupid question - looking at https://docs.llamaindex.ai/en/stable/module_guides/observability/callbacks/token_counting_migration.html#token-counting-migration-guide and thinking about token counting.
the callback manager is explicitly using tiktoken, which counts tokens for OpenAI. But what if I'm not using OpenAI? Is it "close enough"?
also, how does the embedding model (e.g. BAAI/bge-base-en-v1.5) relate? Or does it maybe not relate?
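If the counts need to match a non-OpenAI model exactly, TokenCountingHandler accepts any tokenizer callable, so the embedding model's (or LLM's) own Hugging Face tokenizer can stand in for tiktoken; tiktoken numbers are usually close for other models, but not exact. A sketch:
Python
from transformers import AutoTokenizer
from llama_index.callbacks import CallbackManager, TokenCountingHandler

# Count with the tokenizer the model actually uses rather than tiktoken.
token_counter = TokenCountingHandler(
    tokenizer=AutoTokenizer.from_pretrained("BAAI/bge-base-en-v1.5").encode
)
callback_manager = CallbackManager([token_counter])
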
5 comments
probably user error. I am using the HF TEI server to embed, but then when I try to do a lookup, I get this error:
ValueError: shapes (1024,) and (768,) not aligned: 1024 (dim 0) != 768 (dim 0)
https://gist.github.com/thoraxe/583ee9f8d2a21a562f42535da47cee0d
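That shape error usually means the index was built with one embedding model (768-dimensional vectors) and queried with another (1024-dimensional), e.g. a different default at index time than the TEI model at query time. A sketch of keeping them consistent, where embed_model is the TEI-backed embedder:
Python
from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex

# Use the same embed_model for both indexing and querying so stored vectors
# and query vectors have matching dimensions.
service_context = ServiceContext.from_defaults(embed_model=embed_model)
documents = SimpleDirectoryReader("private-data").load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
query_engine = index.as_query_engine()
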
36 comments
I'm having a problem using TextEmbeddingsInference remotely:
https://gist.github.com/thoraxe/14d34e2aa85f602f7af89813a13ce010

When I index the same documents using local embedding with the same embedder even, I don't get this error.
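For reference, a sketch of pointing the embedder at a remote TEI server, with the knobs that most often differ from a local run (URL and batch size here are illustrative; remote TEI servers can reject batches above their configured limit, which a local embedder never hits):
Python
from llama_index.embeddings import TextEmbeddingsInference

embed_model = TextEmbeddingsInference(
    model_name="BAAI/bge-base-en-v1.5",      # must match the model TEI is serving
    base_url="http://tei.example.com:8080",  # placeholder URL
    embed_batch_size=10,                     # keep under the server's max batch size
)
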
100 comments
so I'm not sure what I'm doing wrong here with Milvus, but it seems like it just keeps throwing away everything I recently indexed. If I index some documents and then try to set up an index from that same vector store, I don't get any results
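One common cause, sketched below on the assumption that MilvusVectorStore is being constructed with overwrite=True each time: that flag drops the existing collection on connect, so the second script wipes what the first one indexed. Reconnect with overwrite=False and build the index from the existing store:
Python
from llama_index import VectorStoreIndex
from llama_index.vector_stores import MilvusVectorStore

# Reuse the collection that already holds the embeddings instead of dropping it.
vector_store = MilvusVectorStore(overwrite=False)  # plus your usual connection args
index = VectorStoreIndex.from_vector_store(vector_store)
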
6 comments
I'm trying to follow the Milvus tutorial, but I'm using Azure OpenAI rather than OpenAI, and it keeps trying to talk to OpenAI during the embedding step:
Plain Text
Retrying llama_index.embeddings.openai.base.get_embeddings in 0.275948059787665 seconds as it raised AuthenticationError: Error code: 401 - {'error': {'message': 'Incorrect API key provided: xxx. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}.
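Those retries against api.openai.com usually mean the default OpenAI embedding is still in play; the Azure embedder has to be set on the service context explicitly. A sketch with placeholder deployment values (pre-0.10 imports; newer releases moved the class to llama_index.embeddings.azure_openai), where llm is the Azure OpenAI LLM already configured:
Python
from llama_index import ServiceContext
from llama_index.embeddings import AzureOpenAIEmbedding

embed_model = AzureOpenAIEmbedding(
    model="text-embedding-ada-002",
    deployment_name="my-embedding-deployment",           # placeholder
    azure_endpoint="https://example.openai.azure.com/",   # placeholder
    api_key="...",
    api_version="2023-07-01-preview",
)
service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)
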
7 comments
are older versions of the LlamaIndex docs available on RTD, or how can I see them? e.g. 0.9.39
4 comments
thoraxe · React
have you come across any ReAct-specific datasets? I've not found any of the open models to be good at it. I'm about to try Zephyr
40 comments
interesting situation which is probably a weird edge case --
my index only has one document. I asked it a stupid question ("how do I do it?") and it returned that one document. It's not REALLY relevant, but I suppose it's not entirely irrelevant either.
Is it guaranteed that with only one indexed document, you'll always get it back? I used "chicken?" as the query and it still retrieved that document -- there is nothing to do with animals in it at all.
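As far as I know, yes: the retriever just returns the top-k nearest nodes however weak the similarity, and with one document that document is always nearest. A similarity cutoff is the usual way to drop unrelated hits; the threshold below is illustrative:
Python
from llama_index.indices.postprocessor import SimilarityPostprocessor

# Nodes scoring below the cutoff are discarded instead of being returned anyway.
query_engine = index.as_query_engine(
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.7)],
)
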
3 comments
rather dumb question -- if I simply want to query the LLM directly (service_context) without sending any index/reference information, how do I do that?
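A short sketch, assuming llm is whatever LLM object was handed to the service context; calling it directly skips retrieval entirely:
Python
# No index involved; this goes straight to the configured LLM.
response = llm.complete("How do I set up cluster autoscaling?")
print(response.text)
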
2 comments
thoraxe · Output
is there a convenient way to override the output formatting?
https://github.com/run-llama/llama_index/blob/v0.8.38/llama_index/callbacks/simple_llm_handler.py
Since I'm going to be post-processing the output in Python and then doing other things with it, I'd like to change the ** Prompt ** and ** Completion ** formatting
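One alternative to patching that handler's print format, sketched here: attach a LlamaDebugHandler and pull the raw prompt/completion pairs out in Python afterwards, then format them however the later steps need (for chat models the payload keys are MESSAGES/RESPONSE rather than PROMPT/COMPLETION):
Python
from llama_index import ServiceContext
from llama_index.callbacks import CallbackManager, LlamaDebugHandler
from llama_index.callbacks.schema import EventPayload

debug_handler = LlamaDebugHandler()
service_context = ServiceContext.from_defaults(
    callback_manager=CallbackManager([debug_handler]),
)

# ... run queries, then post-process the captured LLM calls:
for start_event, end_event in debug_handler.get_llm_inputs_outputs():
    prompt = end_event.payload.get(EventPayload.PROMPT)
    completion = end_event.payload.get(EventPayload.COMPLETION)
    # print/format these however you like instead of ** Prompt ** / ** Completion **
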
12 comments
so it's unclear where this part of the prompt is coming from. it's not the system prompt, and it's not the query wrapper. when passing context, I am seeing this in the total prompt:

Plain Text
Context information is below.
---------------------
file_name: summary-docs/cluster-autoscaling.md


now, I understand that I have metadata on the file, so that likely explains why that is passed. But I'm curious about the "Context information is below." part and how to alter it
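That wrapper text comes from the default text_qa_template on the response synthesizer (the file_name line comes from node metadata, as you guessed); passing your own template replaces it. A sketch with illustrative wording:
Python
from llama_index.prompts import PromptTemplate

# Same {context_str}/{query_str} slots as the default, different wrapper text.
qa_template = PromptTemplate(
    "Reference material is shown below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Using only the reference material, answer the question: {query_str}\n"
)
query_engine = index.as_query_engine(text_qa_template=qa_template)
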
8 comments
it looks like storing multiple indexes locally in the same folder is unsupported. Despite defining multiple index ids, only the last persisted index "wins"
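For what it's worth, multiple indexes can share one persist directory if they also share one StorageContext; persisting each index separately into the same folder overwrites the previous docstore/index_store, which matches the "last one wins" behavior. A sketch:
Python
from llama_index import (
    StorageContext,
    SummaryIndex,
    VectorStoreIndex,
    load_index_from_storage,
)

# Build both indexes against a single shared storage context.
storage_context = StorageContext.from_defaults()
vector_index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
summary_index = SummaryIndex.from_documents(documents, storage_context=storage_context)
vector_index.set_index_id("vector")
summary_index.set_index_id("summary")
storage_context.persist(persist_dir="./storage")

# Later: reload either one by its id from the same folder.
storage_context = StorageContext.from_defaults(persist_dir="./storage")
vector_index = load_index_from_storage(storage_context, index_id="vector")
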
6 comments
hmm... llamaindex appears to be looking at ALL indices in redis and not restricting itself to only the specified index
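In case it's relevant, a sketch of scoping a RedisVectorStore to one index by giving it its own index name and key prefix (values are illustrative), so lookups don't sweep in documents written under other prefixes:
Python
from llama_index.vector_stores import RedisVectorStore

vector_store = RedisVectorStore(
    index_name="cluster-docs",    # illustrative name
    index_prefix="cluster-docs",  # keys for this index live under this prefix
    redis_url="redis://localhost:6379",
)
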
59 comments
FWIW, llama2-13b-chat is terrible at being a ReAct agent
6 comments
UnboundLocalError: local variable 'default_template' referenced before assignment

hmm
24 comments
so i'm looking at the logs of the tgis server and the output of print(summary.response) and it looks like it's doing a double-summary
79 comments
I'm trying to use SummaryIndex via a TGIS server (and not run the LLM locally) but llamaindex seems like it's ignoring the TGIS predictor. Maybe I'm using this wrong?

Plain Text
service_context = ServiceContext.from_defaults(chunk_size=512,
                                               llm=tgis_predictor, 
                                               context_window=2048,
                                               prompt_helper=prompt_helper,
                                               embed_model=embed_model)

# Load data
documents = SimpleDirectoryReader('private-data').load_data()

index = SummaryIndex.from_documents(documents)
summary = index.as_query_engine(response_mode="tree_summarize").query("Summarize the text, describing what it might be most useful for")


but then it tries to download an HF model:
Plain Text
Downloading url https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML/resolve/main/llama-2-13b-chat.ggmlv3.q4_0.bin to path /tmp/llama_index/models/llama-2-13b-chat.ggmlv3.q4_0.bin
total size (MB): 7323.31


And ultimately blows up my machine trying to use this model via CPU
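The likely culprit, sketched below: SummaryIndex.from_documents falls back to a default service context (which is what ends up downloading that local llama-2 GGML model) because the one carrying tgis_predictor is never passed in. Either hand it over explicitly or set it globally:
Python
from llama_index import SummaryIndex, set_global_service_context

# Option 1: pass the TGIS-backed service context explicitly.
index = SummaryIndex.from_documents(documents, service_context=service_context)

# Option 2: make it the default for anything that doesn't get one.
set_global_service_context(service_context)
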
8 comments
looking at https://docs.llamaindex.ai/en/stable/examples/vector_stores/SimpleIndexDemoLlama-Local.html if I don't import torch or set the torch kwargs, does it default to using CPU, or will it automatically use GPU regardless?
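A hedged sketch based on that example: without device_map/dtype kwargs, transformers generally loads the weights on CPU in float32, so being explicit removes the guesswork:
Python
import torch
from llama_index.llms import HuggingFaceLLM

llm = HuggingFaceLLM(
    model_name="meta-llama/Llama-2-13b-chat-hf",
    tokenizer_name="meta-llama/Llama-2-13b-chat-hf",
    device_map="auto",                            # let accelerate place layers on available GPUs
    model_kwargs={"torch_dtype": torch.float16},  # half precision to fit in VRAM
)
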
9 comments
More of a pure langchain question but llamaindex may also have its own solution here.
trying to wrap my head around how to do something:
I'm submitting a question via RAG which gets turned into a list of tasks:
  1. do a thing
  2. do some other thing
for each step I am trying to:
  1. figure out if the original question has enough information to complete the task
  2. if it does, perform the task (via the LLM)
  3. pass the original question, the output of the task, and the next step along
  4. see if there is enough information to complete the next task
and kind of stay in that loop until everything is complete, then put it all together and send it back to the user.
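A rough sketch of that loop, assuming query_engine is the RAG engine that produces the task list and llm is the bare LLM; all prompts here are illustrative:
Python
# 1. Turn the question into a task list via RAG.
tasks = str(query_engine.query(question)).splitlines()

context = question
for task in tasks:
    # 2. Check whether there is enough information to attempt this task.
    check = llm.complete(
        f"Given the following so far:\n{context}\n\n"
        f"Is there enough information to complete this task: {task}\n"
        "Answer yes or no."
    )
    if "yes" not in check.text.lower():
        break  # or go back to the user for the missing details

    # 3. Perform the task and carry its output forward to the next step.
    result = llm.complete(
        f"Given the following so far:\n{context}\n\nComplete this task: {task}"
    )
    context = f"{context}\n\n{task}\n{result.text}"

# 4. Everything accumulated in context can be summarized and sent back to the user.
print(context)
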
33 comments
so I'm trying to think through how to effectively make this assistant. I think ReAct out of the gate is probably not quite going to work.
What I'm envisioning is a chain where the first step is "Task breakdown", then each task is separately processed, and then the final result is either summarized or just spit out at the end.
For example, trying to set up cluster autoscaling in OpenShift (Kubernetes) involves 2 steps - creating a clusterautoscaler object and then creating a machineautoscaler object.

  • I have various forms of docs that can be queried/indexed to spit out that task list
  • I have docs that can be queried/indexed that describe both cluster and machine autoscaler objects
I haven't tried setting up multiple "tools" (task breakdown tool, documentation search) yet, but was just curious about people's thoughts here
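A sketch of the two-tool shape, assuming task_docs_index and reference_docs_index are the two document sources described above (names and descriptions are illustrative); whatever sits on top (a router, an agent, or a manual chain) can then pick between them:
Python
from llama_index.tools import QueryEngineTool, ToolMetadata

task_tool = QueryEngineTool(
    query_engine=task_docs_index.as_query_engine(),
    metadata=ToolMetadata(
        name="task_breakdown",
        description="Returns the ordered list of steps for an OpenShift task, e.g. cluster autoscaling.",
    ),
)
docs_tool = QueryEngineTool(
    query_engine=reference_docs_index.as_query_engine(),
    metadata=ToolMetadata(
        name="documentation_search",
        description="Looks up details about clusterautoscaler and machineautoscaler objects.",
    ),
)
tools = [task_tool, docs_tool]
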
5 comments