llama index people don't update the documentation when updating anything?
70 comments
What's the issue, my guy?
It's already been 3-4 days and no one is helping.
This part, you see:
[Attachment: image.png]
One more issue: the docs are forcing us to use an embedding model even after the similarity has already been found.
Why do we need an embedding model for index = VectorStoreIndex.from_vector_store(vector_store)?
Yea, it's saying it's deprecated. Should be more like:

Plain Text
from llama_index.core.vector_stores import (
    MetadataFilter,
    MetadataFilters,
    FilterOperator,
)


filters = MetadataFilters(
    filters=[
        MetadataFilter(key="theme", operator=FilterOperator.EQ, value="Mafia"),
    ]
)


https://docs.llamaindex.ai/en/stable/examples/vector_stores/chroma_metadata_filter/?h=metadatafilter
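And roughly how you'd use it (just a sketch, assuming you already have an index built):

Plain Text
# the filters get applied during the vector search
retriever = index.as_retriever(filters=filters)
query_engine = index.as_query_engine(filters=filters)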
I'm not sure what you mean here. The index needs an embed model to
  • query
  • handle inserting additional data
I mean the embedding model's job is just to make an array which is used to check similarity.
yes... so you index all your data by embedding it.

Then at query time, you need to embed your query as well, to perform the search
Why do we need it at this stage?
[Attachments: image.png, image.png]
Yea, because .query() needs to embed the query, otherwise you can't perform a search.
A vector index compares the embedded query to all your embedded data.
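Roughly, under the hood it looks something like this (just an illustrative sketch; the model name is only an example):

Plain Text
from llama_index.embeddings.ollama import OllamaEmbedding

# example model name, swap in whatever you have pulled in ollama
embed_model = OllamaEmbedding(model_name="nomic-embed-text")

# index time: every text chunk becomes a vector
chunk_vectors = [embed_model.get_text_embedding(t) for t in ["chunk one", "chunk two"]]

# query time: the query is embedded too, then compared against every chunk
query_vector = embed_model.get_query_embedding("who is moshin")

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)

# highest score = most similar chunk
scores = [cosine(query_vector, v) for v in chunk_vectors]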
But in Ollama we use text directly in the prompt.
Isn't llama's query doing the same thing as this?
Yea, that's for generating a response once you have some data

But to get that data, you need to embed your query and perform a search

The overall process goes like:
  • Create an index by chunking and embedding your data (uses an embed model)
  • Retrieve from your index by embedding the query and performing a search (uses an embed model)
  • Using the retrieved nodes/text chunks and your query text, generate a response with the LLM (rough sketch below)
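Spelled out by hand, those three steps look roughly like this (just a sketch; it assumes an LLM and embed model are already configured in Settings, and "data" is only a placeholder folder):

Plain Text
from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    get_response_synthesizer,
)

# 1. Index: chunk and embed your data (uses the embed model)
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# 2. Retrieve: embed the query and search (uses the embed model again)
retriever = index.as_retriever(similarity_top_k=2)
nodes = retriever.retrieve("who is moshin")

# 3. Synthesize: feed the retrieved chunks + query text to the LLM
synthesizer = get_response_synthesizer()
response = synthesizer.synthesize("who is moshin", nodes=nodes)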
Yah, this is what it should be.
I have done the second step correctly. I embed the summary first, then embed the question to check against it, then I take the most similar text. Why does it ask for an embedding model again when its job is already done?
Does the 3rd stage still need an embedding model?
I just want to use the Ollama embedding model.
If you want to use ollama as the embedding model, then use it 🙂

Plain Text
pip install llama-index-llms-ollama llama-index-embeddings-ollama


Plain Text
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.ollama import OllamaEmbedding

Settings.llm = Ollama(model="<some model>", request_timeout=3600.0)
Settings.embed_model = OllamaEmbedding(model_name="<some model>")

# load your documents however you like ("data" is just a placeholder folder)
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine(similarity_top_k=2)

response = query_engine.query("hello world")
Tried it earlier, the performance was really bad.
It takes 6 sec while Ollama itself takes 0.2 sec.
The embedding performance on llama was pretty low.
that sounds about right to me
Ollama is not really that fast
Especially when you pack the prompt, or need to embed a lot of text
because it's running locally, and running locally requires powerful hardware
It's fine enough for local dev
I didn't get what you mean by local. Isn't yours local as well?
like, you are running the LLM and embedding model on your computer directly, when you use ollama
other APIs (openai, anthropic, etc.) are running models on huge serverfarms, not on your computer, so they are much faster
Ah, I have noticed that they work online.
Is there a way to make it completely offline?
I really didn't understand this one either. You mention it's faster, but it seems to be slower, over 5 sec compared to the local run.
I don't really know what you mean anymore 😆 A little lost.

In summary though
  • ollama runs locally on your computer
  • ollama will be slower than openai
  • BUT with ollama, it's free, and the data doesn't leave your computer
That's a fair point if it's used extensively.
If just 20 people use it, I don't think local would cause any harm.
Anyway bro,
Plain Text
index = VectorStoreIndex.from_vector_store(vector_store)
How can I avoid the embedding model on it?
Is it unavoidable?
Should I directly feed the similarity results to Ollama? It does work if I do it like that, but I don't know if I'm getting any benefit from LlamaIndex.
I am a bit curious what all the trouble of using LlamaIndex is for in the first place.
It's unavoidable. The embedding model is initialized because you can still insert more data with index.insert(), and that same embedding model is also used when index.as_query_engine() or index.as_retriever() is called

The sample above just sets global defaults for the LLM and embed model, so you don't really have to think about it
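If you don't want to rely on the global Settings, you can also pass the embed model in explicitly (a rough sketch, reusing your existing vector_store):

Plain Text
from llama_index.core import VectorStoreIndex
from llama_index.embeddings.ollama import OllamaEmbedding

embed_model = OllamaEmbedding(model_name="<some model>")

# this same embed model is then reused by index.insert(),
# index.as_retriever() and index.as_query_engine()
index = VectorStoreIndex.from_vector_store(vector_store, embed_model=embed_model)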
I don't know what this means
I think there is just some confusion here on how RAG works, and what happens at each step
First of all bro, thanks a lot for giving me such precious time of yours 🤍
Find the similarity, build the best result, and give it as context to Ollama.
If you are using the API at that low a level, then yea, you could just give the nodes to the LLM in a prompt, or use a response synthesizer
A response synthesizer, what's that?
Haven't heard about it.
Can you show me an example of using a response synthesizer with this?
Plain Text
from llama_index.core.data_structs import Node
from llama_index.core.response_synthesizers import ResponseMode
from llama_index.core import get_response_synthesizer


query_result = vector_store.query(query_obj)
response_synthesizer = get_response_synthesizer(
    response_mode=ResponseMode.COMPACT
)

response = response_synthesizer.synthesize(
    "who is moshin", nodes=[query_result]
)
Is this correct?
Yea, that's exactly it.
from llama_index.core.schema import NodeWithScore
Seeing this line makes me wonder a bit though: it returns an array, so how do we pair each node with its similarity score?
Plain Text
nodes = []
for node, score in zip(query_result.nodes, query_result.similarities):
  nodes.append(NodeWithScore(node=node, score=score))

response = response_synthesizer.synthesize(
    "who is moshin", nodes=nodes
)
Ah I see, the score is the similarity, knew it.
You are using some very low-level APIs right now, normally this is all handled for you lol
Thanks, this is amazing. I think the filter does a bit of the same work as well OwO. Again, thanks a million.
Yea, that's Python -- await has to be used in an async function.
async doesn't make it go faster or anything; it only makes sense when you are running in a server, or are executing multiple async things at the same time.
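For example, something like this (a minimal sketch, reusing the query_engine from earlier):

Plain Text
import asyncio

async def main():
    # aquery is the async counterpart of query; it does the same work,
    # it just lets other async tasks run while this one is waiting
    response = await query_engine.aquery("hello world")
    print(response)

asyncio.run(main())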