Speed

from langchain.llms import OpenAI
from llama_index import LLMPredictor, PromptHelper, ServiceContext, StorageContext, load_index_from_storage

max_input = 4096
tokens = 2048

prompt_helper = PromptHelper(max_input, tokens)

# legacy llama_index API: LLMPredictor wraps a langchain LLM
llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="text-davinci-003", max_tokens=tokens))

service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)

# reload a previously persisted index from disk
storage_context = StorageContext.from_defaults(persist_dir=self.persist_directory)
index = load_index_from_storage(storage_context=storage_context, index_id=self.vector_store_name, service_context=service_context)

query_engine = index.as_query_engine(response_mode="tree_summarize")
response = query_engine.query(query)
answer = response.response
It can sometimes fail to find an answer depending on the query right?
Sometimes it replies
and sometimes there are no replies
with the same documents in the index?
Yes, the same documents
and sometimes it doesn't answer
hmm, no idea lol
let me do some trial and error
is there any way we can force it to search the entire context for a keyword
like temperature does
temperature expands the possibility of responses as per my understanding
you are right 👍
so what does temperature = 0.25 mean?
and getting this -
Plain Text
'VectorStoreIndex' object has no attribute 'insert_ref_doc'
I think it is insert not insert_ref_doc
ha yea you are right, it's just insert(document), forgot we never added that renamed method
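For reference, a rough sketch of that call, assuming an existing VectorStoreIndex named index:
Plain Text
from llama_index import Document

# add one more document to an already-built index
doc = Document(text="some new content")
index.insert(doc)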
Plain Text
# define custom retriever
vector_retriever = vector_index.as_retriever(similarity_top_k=10)
keyword_retriever = keyword_index.as_retriever()

custom_retriever = CustomRetriever(vector_retriever, keyword_retriever)

rerank = SentenceTransformerRerank(
    model="cross-encoder/ms-marco-MiniLM-L-2-v2", top_n=5
)
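Note that CustomRetriever isn't defined anywhere in this thread; a rough sketch along the lines of the custom retriever example in the docs (import paths and node attributes may differ by version):
Plain Text
from llama_index import QueryBundle
from llama_index.retrievers import BaseRetriever

class CustomRetriever(BaseRetriever):
    """Combine vector and keyword retrieval results."""

    def __init__(self, vector_retriever, keyword_retriever, mode="AND"):
        self._vector_retriever = vector_retriever
        self._keyword_retriever = keyword_retriever
        self._mode = mode  # "AND" = intersection, "OR" = union

    def _retrieve(self, query_bundle: QueryBundle):
        vector_nodes = self._vector_retriever.retrieve(query_bundle)
        keyword_nodes = self._keyword_retriever.retrieve(query_bundle)

        vector_ids = {n.node.node_id for n in vector_nodes}
        keyword_ids = {n.node.node_id for n in keyword_nodes}
        combined = {n.node.node_id: n for n in vector_nodes + keyword_nodes}

        keep_ids = vector_ids & keyword_ids if self._mode == "AND" else vector_ids | keyword_ids
        return [combined[node_id] for node_id in keep_ids]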
I think we should add similarity_top_k to keyword_retriever also?
service_context is only used at the time of querying? Right?
there's no similarity for keywords -- it just fetches all nodes that contain the same keywords as the query
service context contains multiple things. The embedding model for example is used during vector index construction and querying
ok! so we pass service_context in both cases then?
yea! 👍
and then I'll disturb you again 😄
Is the service_context passed correctly -
vector_index = VectorStoreIndex([], storage_context=storage_context, service_context=service_context)
keyword_index = SimpleKeywordTableIndex([], storage_context=storage_context, service_context=service_context)
one last question
does it support streaming?
custom_query_engine = RetrieverQueryEngine(retriever=custom_retriever,response_synthesizer=response_synthesizer,streaming=True,node_postprocessors=[rerank])
I tried to pass it this way
but that didn't turn out to work
before, I was doing it this way
Plain Text
vectorIndex.as_query_engine(response_mode="tree_summarize", similarity_top_k=10, streaming=True, node_postprocessors=[rerank])
you can set streaming in the response synthesizer

response_synthesizer = get_response_synthesizer(..., streaming=True)
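Putting it together, roughly (a sketch reusing the custom_retriever and rerank from above):
Plain Text
from llama_index import get_response_synthesizer
from llama_index.query_engine import RetrieverQueryEngine

# streaming is enabled on the synthesizer, not on the query engine itself
response_synthesizer = get_response_synthesizer(response_mode="tree_summarize", streaming=True)

query_engine = RetrieverQueryEngine(
    retriever=custom_retriever,
    response_synthesizer=response_synthesizer,
    node_postprocessors=[rerank],
)

streaming_response = query_engine.query("your question here")
streaming_response.print_response_stream()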
done that 🙂
already 😄
did it work? lol
I debugged the library
and found that streaming should be passed there
Fingers crossed!
deployed the updates
That seems to work but response time is increased
Any suggestion for decreasing that
maybe changing the re-ranker to top 2 instead of top 3? Response times do just vary depending on how busy openai is too
Yeah but we were using openai before also
But maybe we are doing a deeper search now
that's why it's increased
yea that's true 🤔
Hey! Doing good! 👋 hbu?
I have a question
About passing chat history
Do you know how we can do that
In LlamaIndex as well as in the OpenAI chat completion API
Using the query engine, you'd have to use a chat engine/agent

Otherwise, you'd have to manually create the prompt template on each query to contain some chat history

Using the raw llm object, you can just do llm.chat() and give it a list of ChatMessage objects
https://gpt-index.readthedocs.io/en/stable/examples/llm/openai.html#call-chat-with-a-list-of-messages
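A minimal sketch based on that docs page:
Plain Text
from llama_index.llms import ChatMessage, OpenAI

llm = OpenAI(model="gpt-3.5-turbo")

# pass the prior conversation explicitly as a list of ChatMessage objects
messages = [
    ChatMessage(role="system", content="You are a helpful assistant."),
    ChatMessage(role="user", content="What did we discuss about indexing yesterday?"),
]
response = llm.chat(messages)
print(response)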
Ok, so when using a chat engine, is it necessary to send the chat history?
Or does it remember it by itself?
It will remember it by itself -- every chat engine starts with a fresh memory
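For example, something along these lines (a sketch, assuming the vector_index from earlier; chat_mode is just one option):
Plain Text
# each chat engine instance keeps its own conversation memory
chat_engine = vector_index.as_chat_engine(chat_mode="condense_question")

print(chat_engine.chat("What does the documentation say about streaming?"))
print(chat_engine.chat("Can you elaborate on that?"))  # follows up using the remembered history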
does llama support gpt-4?
It sure does!
How do we specify it?
Plain Text
from llama_index.llms import OpenAI

llm = OpenAI(model="gpt-4", temperature=0.1)
service_context = ServiceContext.from_defaults(llm=llm)
One more question
Which vector store would you suggest as the best?
One that has great accuracy, speed, and results
I am using qdrant but the results are not so convincing
I don't think the results will change much between vector stores tbh -- they all do the same thing, which is vector similarity 🤔

Maybe weaviate is worth trying?
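Swapping it in would look roughly like this (a sketch; assumes a Weaviate instance running on localhost and a documents list you've already loaded):
Plain Text
import weaviate
from llama_index import StorageContext, VectorStoreIndex
from llama_index.vector_stores import WeaviateVectorStore

# point LlamaIndex at a running Weaviate instance instead of Qdrant
client = weaviate.Client("http://localhost:8080")
vector_store = WeaviateVectorStore(weaviate_client=client, index_name="MyDocs")
storage_context = StorageContext.from_defaults(vector_store=vector_store)

index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)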
Ok I'll give it a try
And are there any limitations in terms of document count
per collection
so that it generates good results?
Not really size limits, but more like things will work better if your collections are organized (i.e. per topic, etc.)

The bigger an index gets, you may also need to play with the top k
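For example (a sketch reusing the rerank postprocessor from earlier):
Plain Text
# pull more candidates from a bigger index, then let the reranker trim them down
query_engine = vector_index.as_query_engine(
    similarity_top_k=15,
    node_postprocessors=[rerank],
    response_mode="tree_summarize",
)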
Ok that's good, let me try the one you suggested
can we host weaviate locally?
what is embedded installation?
okay understood
Yea it's a weird way to name it haha
yeah! they should have called it app-integrated
"embedded" makes you think of circuits
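For reference, embedded mode just means Weaviate starts inside your application process; roughly (a sketch with the v3 weaviate-client):
Plain Text
import weaviate
from weaviate.embedded import EmbeddedOptions

# no separate server to host -- Weaviate runs inside this Python process
client = weaviate.Client(embedded_options=EmbeddedOptions())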
I don't have that much data to Index, around 5Gb of Markdown files for a RAG chatbot but I will need the Vector database to support 300k concurrent users. In that case which one would you recommend? Qdrant vs Milvus vs Weaviate vs Chroma vs Pinecone?