llama index people don't update the documentation when updating anything?
what's the issue, my guy?
it's already been 3-4 days and no one is helping
this part, you see
[Attachment: image.png]
one more issue: using it the way the docs show forces us to use an embedding model even after the similarity has already been found
why do we need an embedding model for index = VectorStoreIndex.from_vector_store(vector_store)?
Yea, it's saying it's deprecated. It should be more like:

Plain Text
from llama_index.core.vector_stores import (
    MetadataFilter,
    MetadataFilters,
    FilterOperator,
)


filters = MetadataFilters(
    filters=[
        MetadataFilter(key="theme", operator=FilterOperator.EQ, value="Mafia"),
    ]
)


https://docs.llamaindex.ai/en/stable/examples/vector_stores/chroma_metadata_filter/?h=metadatafilter
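Then you pass those filters into the retriever, roughly like this (just a sketch, assuming `index` is your vector store index and the query text is a placeholder):

Plain Text
# apply the metadata filters at retrieval time
retriever = index.as_retriever(filters=filters)
nodes = retriever.retrieve("tell me about mafia-themed stories")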
I'm not sure what you mean here. The index needs an embed model to
  • query
  • handle inserting additional data
i mean the embedding model's job is just to make an array which is used to check similarity
yes... so you index all your data by embedding it.

Then at query time, you need to embed your query as well, to perform the search
why do we need it at this stage?
[Attachments: image.png, image.png]
Yea, because .query() needs to embed the query, otherwise you can't perform a search
A vector index compares the embedded query to all your embedded data
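Conceptually it's something like this (just a sketch that skips the index entirely; the model name and chunks are placeholders):

Plain Text
from llama_index.embeddings.ollama import OllamaEmbedding

embed_model = OllamaEmbedding(model_name="<some model>")
chunks = ["some text chunk", "another text chunk"]

# index time: every chunk of your data becomes a vector
doc_vectors = [embed_model.get_text_embedding(chunk) for chunk in chunks]

# query time: the query becomes a vector too
query_vector = embed_model.get_query_embedding("who is moshin")

# the "search" is just similarity between the query vector and each doc vector
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)

scores = [cosine(query_vector, v) for v in doc_vectors]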
but in ollama we use the text directly in the prompt
isn't llama index's query doing the same thing as this?
Yea, that's for generating a response once you have some data

But to get that data, you need to embed your query and perform a search

The overall process goes like this (sketched in code after the list):
  • Create an index, by chunking and embedding your data (uses an embed model)
  • retrieve from your index, by embedding the query and performing a search (uses an embed model)
  • using the retrieved nodes/text chunks and your query text, generate a response with the LLM
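As a rough sketch in code (assuming an LLM and embed model are already set on Settings, and `documents` is your loaded data):

Plain Text
from llama_index.core import VectorStoreIndex, get_response_synthesizer

# 1. chunk + embed your data into an index (uses the embed model)
index = VectorStoreIndex.from_documents(documents)

# 2. embed the query and search for the most similar chunks (uses the embed model)
retriever = index.as_retriever(similarity_top_k=2)
nodes = retriever.retrieve("who is moshin")

# 3. give the retrieved chunks + the query text to the LLM to write the answer
response_synthesizer = get_response_synthesizer()
response = response_synthesizer.synthesize("who is moshin", nodes=nodes)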
yeah, this is what it should be
i have done the second step correctly: i embed my summaries first, then embed the question and check it against them, then i take the most similar text. why does it ask for an embedding model again when its job is already done?
does the 3rd stage still need an embedding model?
i just want to use the ollama embedding model
If you want to use ollama as the embedding model, then use it 🙂

Plain Text
pip install llama-index-llms-ollama llama-index-embeddings-ollama


Plain Text
from llama_index.core import Settings, VectorStoreIndex
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.ollama import OllamaEmbedding

Settings.llm = Ollama(model="<some model>", request_timeout=3600.0)
Settings.embed_model = OllamaEmbedding(model_name="<some model>")

# `documents` is your list of loaded Document objects
index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine(similarity_top_k=2)

response = query_engine.query("hello world")
tried it earlier, the performance was really bad
it takes 6 sec while ollama itself takes 0.2 sec
embedding performance through llama index was pretty low
that sounds about right to me
Ollama is not really that fast
Especially when you pack the prompt, or need to embed a lot of text
because its running locally, and running locally requires powerful hardware
its fine enough for local dev
i didn't get it, what do you mean by local, isn't yours local as well?
like, you are running the LLM and embedding model on your computer directly, when you use ollama
other APIs (openai, anthropic, etc.) are running models on huge serverfarms, not on your computer, so they are much faster
ah, i have noticed that they work online
is there a way to make it completely offline?
i also didn't really understand this part: you mention it's faster, but it seems to be more than 5 sec slower compared to a local run
I don't really know what you mean anymore 😆 A little lost.

In summary though
  • ollama runs locally on your computer
  • ollama will be slower than openai
  • BUT with ollama, it's free, and the data doesn't leave your computer
that's a correct point if it's used extensively
if just 20 people use it, i don't think running locally would cause any harm
anyway bro
Plain Text
index = VectorStoreIndex.from_vector_store(vector_store)
how can i avoid the embedding model on it?
is it unavoidable?
should i directly feed the similarity results to ollama? it does work if i do it like that, but i don't know if i'm getting any benefit from llama index
i am a bit curious what all the trouble of using llama index is for in the first place
it's unavoidable. The embedding model is initialized because you can still insert more data with index.insert(), and that same embedding model is also used when index.as_query_engine() or index.as_retriever() is called

The sample above just sets global defaults for the LLM and embed model, so you don't really have to think about it
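For example (just a sketch, assuming `vector_store` already exists and the Settings from above are configured):

Plain Text
from llama_index.core import Document, VectorStoreIndex

index = VectorStoreIndex.from_vector_store(vector_store)  # picks up Settings.embed_model

index.insert(Document(text="<some new text>"))  # embeds the new text

retriever = index.as_retriever(similarity_top_k=2)
nodes = retriever.retrieve("who is moshin")  # embeds the query, then searches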
I don't know what this means
I think there is just some confusion here on how RAG works, and what happens at each step
first of all bro, thanks a lot for giving me so much of your precious time 🤍
find the similarity, then put the best result into the context given to ollama
If you are using that low-level of the API, then yea, you could just give the nodes to the LLM in a prompt, or use a response synthesizer
response synthesizer? what's that?
haven't heard about it
can you show me an example of using a response synthesizer with this?
Plain Text
from llama_index.core.data_structs import Node
from llama_index.core.response_synthesizers import ResponseMode
from llama_index.core import get_response_synthesizer


query_result = vector_store.query(query_obj)
response_synthesizer = get_response_synthesizer(
    response_mode=ResponseMode.COMPACT
)

response = response_synthesizer.synthesize(
    "who is moshin", nodes=[query_result]
)
is this correct ?
yea thats exactly it
from llama_index.core.schema import NodeWithScore
but seeing this line makes me wonder: the query returns an array, so how do we assign each of these, setting the tuple of node and similarity score?
Plain Text
nodes = []
for node, score in zip(query_result.nodes, query_result.similarities):
  nodes.append(NodeWithScore(node=node, score=score))

response = response_synthesizer.synthesize(
    "who is moshin", nodes=nodes
)
ah i see, the score is the similarity, knew it
You are using some very low-level APIs right now, normally this is all handled for you lol
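For comparison, the "handled for you" version of all of the above is roughly just:

Plain Text
query_engine = index.as_query_engine(similarity_top_k=2)
response = query_engine.query("who is moshin")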
thanks, this is amazing, i think the filter does a bit of the same work as well OwO, again thanks a million
Yea, that's python -- await has to be used inside an async function
async doesn't make it go faster or anything, it only makes sense when you are running inside a server, or are executing multiple async things at the same time
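For example, something like this (a sketch using the async query method on the query engine from earlier):

Plain Text
import asyncio

async def main():
    # aquery is the async counterpart of query
    response = await query_engine.aquery("who is moshin")
    print(response)

asyncio.run(main())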