Find answers from the community

davidp
Offline, last seen 3 months ago
Joined September 25, 2024
davidp

Delete

Hi, is there any way to delete documents from an index, but not by specifying the documentId? What I'd need instead is to delete all the documents that came from a given file, for instance to say: delete all documents with file_name = "Top Screwups...."
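
Something like this is what I have in mind, assuming the file name ends up in each document's metadata (I'm not sure ref_doc_info is the right place to look):

Plain Text
# 'index' is the loaded index; assumes every document was ingested
# with a "file_name" entry in its metadata
target_file = "Top Screwups...."  # the (truncated) name from my example

for ref_doc_id, info in index.ref_doc_info.items():
    if info.metadata.get("file_name") == target_file:
        index.delete_ref_doc(ref_doc_id, delete_from_docstore=True)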
1 comment
davidp

Bedrock

Hi, has anybody tried AWS Bedrock with LlamaIndex? I have tried it and it doesn't give any error, but it doesn't pick up the prompt template, nor does it interact with the results:

This is a piece of code showing how I set it up:

Plain Text
from llama_index import ServiceContext
from llama_index.embeddings import HuggingFaceEmbedding
from llama_index.llms.bedrock import Bedrock

# Bedrock LLM, using the AWS profile that has access to the model
llm = Bedrock(model="meta.llama2-13b-chat-v1", profile_name="machineuser1")

# local embedding model, so only generation goes through Bedrock
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model=embed_model,
    chunk_size=256,
)
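
For reference, this is roughly how I'm trying to pass the prompt template in; the prompt text and the index built here are simplified placeholders, not my real ones:

Plain Text
from llama_index import VectorStoreIndex
from llama_index.prompts import PromptTemplate

# placeholder prompt; my real template is longer
qa_prompt = PromptTemplate(
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Answer the query using only the context above.\n"
    "Query: {query_str}\n"
    "Answer: "
)

# 'documents' is whatever I've loaded; service_context carries the Bedrock llm
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
query_engine = index.as_query_engine(text_qa_template=qa_prompt)
print(query_engine.query("What is this collection about?"))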
15 comments
Hi, does somebody know if I can use LlamaIndex with Ollama together with the TruLens recorder? I'm trying to evaluate the RAG pipeline but I get an error:

Plain Text
import numpy as np
import litellm
from litellm import completion

from llama_index.llms import Ollama

from trulens_eval import Tru, Feedback, Select, TruLlama, FeedbackMode
from trulens_eval.feedback import Groundedness
from trulens_eval.feedback.provider.litellm import LiteLLM

tru = Tru()
tru.reset_database()

# LlamaIndex LLM pointing at the local Ollama server
llm = Ollama(model="wizard-vicuna-uncensored", base_url="http://192.168.1.232:11435")

# quick sanity check that LiteLLM can reach Ollama
response = completion(
    model="ollama/wizard-vicuna-uncensored",
    messages=[{"content": "respond in 20 words. who are you?", "role": "user"}],
    api_base="http://192.168.1.232:11435",
)

litellm.set_verbose = True  # enable LiteLLM debug logging

# TruLens feedback provider backed by the same Ollama model
litellm_provider = LiteLLM(model_engine="ollama/wizard-vicuna-uncensored", endpoint="http://192.168.1.232:11435")

grounded = Groundedness(groundedness_provider=litellm_provider)

f_groundedness = (
    Feedback(grounded.groundedness_measure_with_cot_reasons, name="Groundedness")
    .on(Select.RecordCalls.retrieve.rets.collect())
    .on_output()
    .aggregate(grounded.grounded_statements_aggregator)
)

f_qa_relevance = (
    Feedback(litellm_provider.relevance_with_cot_reasons, name="Answer Relevance")
    .on(Select.RecordCalls.retrieve.args.query)
    .on_output()
)

f_context_relevance = (
    Feedback(litellm_provider.qs_relevance_with_cot_reasons, name="Context Relevance")
    .on(Select.RecordCalls.retrieve.args.query)
    .on(Select.RecordCalls.retrieve.rets.collect())
    .aggregate(np.mean)
)

# query_engine3 is my LlamaIndex query engine, built elsewhere with the Ollama llm
tru_recorder = TruLlama(
    query_engine3,
    app_id="App_1",
    feedbacks=[
        f_qa_relevance,
        f_context_relevance,
        f_groundedness,
    ],
)

# eval_questions is my list of evaluation questions
for question in eval_questions:
    with tru_recorder as recording:
        print(question)
        query_engine3.query(question)
I'm adapting this from the example in the course:
https://learn.deeplearning.ai/building-evaluating-advanced-rag/lesson/3/rag-triad-of-metrics
where I want to use Ollama instead of the OpenAI API.
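
For context, query_engine3 is built roughly like this (the directory path and top_k are placeholders for my real setup):

Plain Text
from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings import HuggingFaceEmbedding

# reuse the Ollama llm defined above
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)

documents = SimpleDirectoryReader("./eval_docs").load_data()  # placeholder directory
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
query_engine3 = index.as_query_engine(similarity_top_k=3)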
1 comment
Hi, is there any way to make loading the index into RAM faster? I'm using this call:

index_finance = load_index_from_storage(storage_context)

but for a file of around 5 GB it takes too long. I think it only uses one core, and it changes core for each node loaded.
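
What I'm considering as an alternative is keeping the vectors in Chroma, so the embeddings stay in the vector store and I don't have to pull everything through load_index_from_storage (the path and collection name are placeholders):

Plain Text
import chromadb
from llama_index import VectorStoreIndex
from llama_index.vector_stores import ChromaVectorStore

# embeddings were persisted into this collection at build time
chroma_client = chromadb.PersistentClient(path="./chroma_finance")
collection = chroma_client.get_or_create_collection("finance")

vector_store = ChromaVectorStore(chroma_collection=collection)
# service_context is the same one used to build the index
index_finance = VectorStoreIndex.from_vector_store(vector_store, service_context=service_context)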
17 comments
davidp

Top_k

Hi, I'd like to know if it's possible to set a custom index for a chat_engine. I'd need to retrieve more than the default 2 documents for each interaction, but it seems it can't be done...
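
What I've been trying is something along these lines (the chat mode and top_k value are just what I'm experimenting with):

Plain Text
# 'index' is my existing VectorStoreIndex
chat_engine = index.as_chat_engine(
    chat_mode="context",   # retrieval-backed chat
    similarity_top_k=5,    # retrieve 5 nodes instead of the default 2
)
response = chat_engine.chat("What does the author say about risk?")
print(response)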
48 comments
Hi, I'm trying to use Chroma with llama-index. I'm loading some JSON documents into a documents object. The issue comes when I call:
index_finance = VectorStoreIndex.from_documents( documents, storage_context=storage_context, service_context=service_context )
Any idea what I'm missing for the JSONs? If I do the same with Chroma but with a SimpleDirectoryReader, it works:
documentsNassim = SimpleDirectoryReader("/mnt/nasmixprojects/books/nassimTalebDemo").load_data()
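
For reference, the Chroma side is set up roughly like this (the path and collection name are placeholders):

Plain Text
import chromadb
from llama_index import StorageContext, VectorStoreIndex
from llama_index.vector_stores import ChromaVectorStore

chroma_client = chromadb.PersistentClient(path="./chroma_db")
collection = chroma_client.get_or_create_collection("finance")

vector_store = ChromaVectorStore(chroma_collection=collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# 'documents' and 'service_context' are the same ones from above
index_finance = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    service_context=service_context,
)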
23 comments
Hi, I'd like to expose the chat that I use with the REPL (chat_engine.chat_repl()) to the Internet. The idea is to have a React app for the frontend and a Node.js app in the middle that would make the requests to the LlamaIndex Python code. What's the best way to do it? I'd say there has to be some socket to support the interaction with the chatbot, but I'm lost on how to serve the Python code to the Node.js app, or directly over the Internet.
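
What I'm picturing on the Python side is a small HTTP wrapper that the Node.js app could call; a minimal sketch with FastAPI (which is just my assumption for the framework) would be:

Plain Text
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ChatRequest(BaseModel):
    message: str

# 'chat_engine' is the same engine I use with chat_repl()
@app.post("/chat")
def chat(req: ChatRequest):
    response = chat_engine.chat(req.message)
    return {"answer": str(response)}

# run with: uvicorn server:app --host 0.0.0.0 --port 8000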
6 comments
Any idea how Bing makes its autogenerated next-question proposals? Is there any way to do it with LlamaIndex?
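
The only approach I can think of is to ask the LLM itself for follow-ups after each answer, something like this rough sketch (the prompt wording is just a guess):

Plain Text
# 'llm' is the LLM already configured for the index
def suggest_next_questions(question: str, answer: str, n: int = 3) -> list[str]:
    prompt = (
        f"A user asked: {question}\n"
        f"The assistant answered: {answer}\n"
        f"Suggest {n} short follow-up questions the user might ask next, one per line."
    )
    completion = llm.complete(prompt)
    return [line.strip("- ").strip() for line in str(completion).splitlines() if line.strip()]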
1 comment
Hi, is there any way to load JSON documents from a directory, for example with:
documents = SimpleDirectoryReader("./transcriptions_test_json/").load_data()
and then say that I only want to vectorize/create the index from just one of the fields of each JSON?

For example, a JSON entry is:

"c49c7a9b-6a12-5f1f-ba76-b81d986e5bc7": {
    "video_name": "videoplayback2.mp4",
    "video_path": "/mnt/nas/videos/0-ops/videoplayback2.mp4",
    "original_text": " Good evening and welcome to T...",
    "length_characters": 7585,
    "original_lang": "en",
    "video_section": "0-ops"
}

and I'd only like to vectorize the original_text field, but when I retrieve with the query, before generating the final answer, I'd like to use the rest of the fields, potentially for statistics.

The SimpleDirectoryReader can ingest the JSON, and I can access each of the ingested JSONs inside the documents that were read, but it's getting the whole JSON as a string...

print(documents)
print("\n")
for doc in documents:
    print(doc.text)
    print("\n")
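
What I'm leaning towards is skipping SimpleDirectoryReader for these files and building the Document objects myself, something like this sketch (the directory path is the one from my example):

Plain Text
import json
from pathlib import Path

from llama_index import Document

documents = []
for path in Path("./transcriptions_test_json/").glob("*.json"):
    data = json.loads(path.read_text())
    for doc_id, entry in data.items():
        metadata = {k: v for k, v in entry.items() if k != "original_text"}
        doc = Document(
            text=entry["original_text"],  # only this field gets embedded
            metadata=metadata,            # kept around for filtering/statistics at query time
            doc_id=doc_id,
        )
        # keep the metadata out of the text that is embedded
        doc.excluded_embed_metadata_keys = list(metadata.keys())
        documents.append(doc)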
78 comments
Hi, I'm following the LlamaCPP example in the documentation, but I get an error when trying to use a Hugging Face model. I'm running on an Intel CPU.

https://gpt-index.readthedocs.io/en/v0.9.2/examples/llm/llama_2_llama_cpp.html

model_url = "https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML/resolve/main/llama-2-13b-chat.ggmlv3.q4_0.bin"

llm = LlamaCPP(
    # You can pass in the URL to a GGML model to download it automatically
    model_url=model_url,
    # optionally, you can set the path to a pre-downloaded model instead of model_url
    model_path=None,
    temperature=0.1,
    max_new_tokens=256,
    # llama2 has a context window of 4096 tokens, but we set it lower to allow for some wiggle room
    context_window=3900,
    # kwargs to pass to __call__()
    generate_kwargs={},
    # kwargs to pass to __init__()
    # set to at least 1 to use GPU; I put this to 0 as I don't have a GPU
    model_kwargs={"n_gpu_layers": 0},
    # transform inputs into Llama2 format
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=True,
)

gguf_init_from_file: invalid magic characters tjgg.
error loading model: llama_model_loader: failed to load model from /tmp/llama_index/models/llama-2-13b-chat.ggmlv3.q4_0.bin
llama_load_model_from_file: failed to load model
AVX = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |

Does anybody know if I should change the version of the model or of the llama-cpp-python package?
I've tried with this version, for instance, but it also didn't work:
!pip install llama-cpp-python==0.1.78
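
My guess from the "invalid magic characters" line is that my llama-cpp-python build expects GGUF rather than GGML, so what I'd try next is pointing model_url at a GGUF quantization instead (assuming I have the file name right):

Plain Text
# GGUF build of the same model; the exact file name is my assumption
model_url = "https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF/resolve/main/llama-2-13b-chat.Q4_0.gguf"

llm = LlamaCPP(
    model_url=model_url,
    model_path=None,
    temperature=0.1,
    max_new_tokens=256,
    context_window=3900,
    model_kwargs={"n_gpu_layers": 0},  # CPU only
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=True,
)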
20 comments