It can sometimes fail to find an answer depending on the query, right?
with the same documents in the index?
let me do some trial and error
is there any way we can force it to search the entire context for a keyword?
temperature expands the range of possible responses, as per my understanding
so what does temperature = 0.25 mean?
and getting this -
'VectorStoreIndex' object has no attribute 'insert_ref_doc'
I think it is insert not insert_ref_doc
ha yea you are right, it's just insert(document)
forgot we never added that renamed method
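e.g. just something like this (using the vector_index name from your later snippet; illustrative):

from llama_index import Document

# insert a single document into an already-built index
document = Document(text="some new content to index")
vector_index.insert(document)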
# define custom retriever combining vector + keyword retrieval
# import path for the 0.8/0.9-era llama_index package; adjust for your version
from llama_index.indices.postprocessor import SentenceTransformerRerank

vector_retriever = vector_index.as_retriever(similarity_top_k=10)
keyword_retriever = keyword_index.as_retriever()
custom_retriever = CustomRetriever(vector_retriever, keyword_retriever)

# re-rank retrieved nodes with a cross-encoder, keeping the top 5
rerank = SentenceTransformerRerank(
    model="cross-encoder/ms-marco-MiniLM-L-2-v2", top_n=5
)
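fwiw the CustomRetriever here is basically the hybrid AND/OR retriever from the LlamaIndex docs -- roughly this sketch (simplified):

from typing import List

from llama_index import QueryBundle
from llama_index.retrievers import BaseRetriever
from llama_index.schema import NodeWithScore


class CustomRetriever(BaseRetriever):
    """Hybrid retriever that combines vector and keyword retrieval."""

    def __init__(self, vector_retriever, keyword_retriever, mode="AND"):
        self._vector_retriever = vector_retriever
        self._keyword_retriever = keyword_retriever
        self._mode = mode  # "AND" = intersection of results, "OR" = union
        super().__init__()

    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        vector_nodes = self._vector_retriever.retrieve(query_bundle)
        keyword_nodes = self._keyword_retriever.retrieve(query_bundle)

        vector_ids = {n.node.node_id for n in vector_nodes}
        keyword_ids = {n.node.node_id for n in keyword_nodes}

        # merge nodes from both retrievers, keyed by node id
        combined = {n.node.node_id: n for n in vector_nodes}
        combined.update({n.node.node_id: n for n in keyword_nodes})

        if self._mode == "AND":
            keep_ids = vector_ids & keyword_ids
        else:
            keep_ids = vector_ids | keyword_ids

        return [combined[rid] for rid in keep_ids]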
I think we should add similarity_top_k to keyword_retriever also?
service_context is only used at the time of querying? Right?
there's no similarity for keywords -- it just fetches all nodes that contain the same keywords as the query
service context contains multiple things. The embedding model for example is used during vector index construction and querying
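e.g. something like this (the model choices are just placeholders):

from llama_index import ServiceContext
from llama_index.llms import OpenAI
from llama_index.embeddings import OpenAIEmbedding

# the embed_model is used both when building the vector index and when embedding queries;
# the llm is used at query time to synthesize the final answer
service_context = ServiceContext.from_defaults(
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0.25),
    embed_model=OpenAIEmbedding(),
)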
ok! so we pass service_context in both cases then?
and then I'll disturb you again
Is the service_context passed correctly -
vector_index = VectorStoreIndex([], storage_context=storage_context, service_context=service_context)
keyword_index = SimpleKeywordTableIndex([], storage_context=storage_context, service_context=service_context)
does it support streaming?
custom_query_engine = RetrieverQueryEngine(
    retriever=custom_retriever,
    response_synthesizer=response_synthesizer,
    streaming=True,
    node_postprocessors=[rerank],
)
I tried to pass it this way
but that didn't turn out to work
before, I was doing it this way
vectorIndex.as_query_engine(response_mode="tree_summarize", similarity_top_k=10, streaming=True, node_postprocessors=[rerank])
you can set streaming in the response synthesizer
response_synthesizer = get_response_synthesizer(..., streaming=True)
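putting it together, something like this (reusing your custom_retriever and rerank from above; illustrative, exact kwargs may vary a bit by version):

from llama_index import get_response_synthesizer
from llama_index.query_engine import RetrieverQueryEngine

# streaming is configured on the response synthesizer, not on the query engine itself
response_synthesizer = get_response_synthesizer(
    response_mode="tree_summarize",
    streaming=True,
)
custom_query_engine = RetrieverQueryEngine(
    retriever=custom_retriever,
    response_synthesizer=response_synthesizer,
    node_postprocessors=[rerank],
)

streaming_response = custom_query_engine.query("your question here")
streaming_response.print_response_stream()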
and found that streaming should be passed there
That seems to work but the response time has increased
Any suggestions for decreasing that?
maybe changing the re-ranker to top 2 instead of top 3? Response times do just vary depending on how busy openai is too
Yeah but we were using openai before also
But maybe we're doing a deeper search now
That's why it's increased
Hey! Doing good! hbu?
About passing chat history
Do you know how we can do that
In llama as well as in the openai chat completion api
Ok so when using a chat engine, is it necessary to send chat history?
Or does it remember by itself?
It will remember it by itself within a session -- every new chat engine starts with a fresh memory
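but if you do want to seed it with prior history, roughly like this (vector_index and the example messages are just placeholders; the raw openai call uses the pre-1.0 SDK):

import openai
from llama_index.llms import ChatMessage, MessageRole

# LlamaIndex: pass earlier turns explicitly into the chat engine
chat_engine = vector_index.as_chat_engine(chat_mode="condense_question")
history = [
    ChatMessage(role=MessageRole.USER, content="What is the refund policy?"),
    ChatMessage(role=MessageRole.ASSISTANT, content="Refunds are allowed within 30 days."),
]
response = chat_engine.chat("And what about exchanges?", chat_history=history)

# raw OpenAI chat completion: history is just the messages list you send each call
completion = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "What is the refund policy?"},
        {"role": "assistant", "content": "Refunds are allowed within 30 days."},
        {"role": "user", "content": "And what about exchanges?"},
    ],
)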
does llama support gpt-4?
from llama_index.llms import OpenAI
llm = OpenAI(model="gpt-4", temperature=0.1)
service_context = ServiceContext.from_defaults(llm=llm)
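and then any index built with that service_context will use gpt-4, e.g. (the ./data path is just a placeholder):

from llama_index import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
print(index.as_query_engine().query("Summarize the docs"))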
Which is the best vector store you'd suggest?
One that has great accuracy, speed, and results
I am using qdrant but the results are not so convincing
I don't think the results will change much between vector stores tbh -- they all do the same thing, which is vector similarity
Maybe weaviate is worth trying?
And are there any limitations in terms of document count
So that it generates good results?
Not really size limits, but more like things will work better if your collections are organized (i.e. per topic, etc.)
As an index gets bigger, you may also need to play with the top k
Ok that's good, let me try the one you suggested
can we host weaviate locally?
what is embedded installation?
Yea it's a weird way to name it haha
yeah! they should have called it app-integrated
'embedded' takes the mind to circuits
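fwiw embedded just means Weaviate spins up inside your Python process, no separate server to host -- roughly like this with the v3 weaviate-client (illustrative):

import weaviate
from weaviate.embedded import EmbeddedOptions

# starts a local Weaviate instance inside this Python process
client = weaviate.Client(embedded_options=EmbeddedOptions())
print(client.is_ready())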
I don't have that much data to index, around 5 GB of Markdown files for a RAG chatbot, but I will need the vector database to support 300k concurrent users. In that case which one would you recommend? Qdrant vs Milvus vs Weaviate vs Chroma vs Pinecone?