I'm using `OpenAILike` to talk to a vLLM instance. I need to pass a custom stop token, and currently the only way I can figure out how to do this is: `llm.complete(prompt, True, extra_body={"stop_token_ids": [...]})`. This doesn't work with `llm.predict`, because it interprets all remaining kwargs as prompt-template expansion arguments. Is there any other way to get this key/value into the outgoing OpenAI-API request?
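
One possible workaround, sketched under the assumption that `OpenAILike` forwards `additional_kwargs` into every outgoing request (it subclasses the `OpenAI` LLM, which has an `additional_kwargs` field): set the extra body once at construction time, so `llm.predict` can keep treating its kwargs as template variables. The model name, endpoint, and token id below are placeholders.

```python
from llama_index.llms.openai_like import OpenAILike

# Sketch: attach the vLLM-specific stop token ids to every request via the
# constructor instead of per-call kwargs.
llm = OpenAILike(
    model="my-vllm-model",                  # placeholder model name
    api_base="http://localhost:8000/v1",    # placeholder vLLM endpoint
    api_key="not-needed",
    is_chat_model=True,
    additional_kwargs={"extra_body": {"stop_token_ids": [128009]}},
)

# llm.complete(prompt) and llm.predict(prompt_template, **template_vars)
# should now carry the stop ids without any per-call extra arguments.
```
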
I'm trying out a `VectorIndexAutoRetriever`: it figured out the metadata correctly, but it doesn't seem to make an embedding for the query, and I'm struggling to figure out why.

    INFO:llama_index.core.indices.vector_store.retrievers.auto_retriever.auto_retriever:Using query str: 
    INFO:llama_index.core.indices.vector_store.retrievers.auto_retriever.auto_retriever:Using filters: [('topic', '==', 'TOPIC0')]
    INFO:llama_index.core.indices.vector_store.retrievers.auto_retriever.auto_retriever:Using top_k: 10
    Traceback (most recent call last):
      File "ex1.py", line 79, in <module>
        print(retr.retrieve("What is topic TOPIC0?"))
      File "site-packages/llama_index/core/base/base_retriever.py", line 229, in retrieve
        nodes = self._retrieve(query_bundle)
      File "site-packages/llama_index/core/base/base_auto_retriever.py", line 37, in _retrieve
        return retriever.retrieve(new_query_bundle)
      File "site-packages/llama_index/core/base/base_retriever.py", line 229, in retrieve
        nodes = self._retrieve(query_bundle)
      File "site-packages/llama_index/core/indices/vector_store/retrievers/retriever.py", line 94, in _retrieve
        return self._get_nodes_with_embeddings(query_bundle)
      File "site-packages/llama_index/core/indices/vector_store/retrievers/retriever.py", line 170, in _get_nodes_with_embeddings
        query_result = self._vector_store.query(query, **self._kwargs)
      File "site-packages/llama_index/core/vector_stores/simple.py", line 273, in query
        top_similarities, top_ids = get_top_k_embeddings(
      File "site-packages/llama_index/core/indices/query/embedding_utils.py", line 30, in get_top_k_embeddings
        similarity = similarity_fn(query_embedding_np, emb)
      File "site-packages/llama_index/core/base/embeddings/base.py", line 47, in similarity
        product = np.dot(embedding1, embedding2)
    TypeError: unsupported operand type(s) for *: 'NoneType' and 'float'
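
The "Using query str:" line is empty, so the inner vector retriever never embeds anything and the query embedding stays `None`, which is what `np.dot` trips over. A sketch of one way around it, assuming the `empty_query_top_k` / `default_empty_query_vector` constructor arguments are applied whenever the inferred query string is empty; the fallback text is arbitrary.

```python
from llama_index.core import Settings
from llama_index.core.retrievers import VectorIndexAutoRetriever

# Sketch: supply a fallback query vector for the empty-query case so the
# similarity computation never sees a None embedding.
fallback_vector = Settings.embed_model.get_query_embedding("topic overview")
retr = VectorIndexAutoRetriever(
    index,                                # your existing VectorStoreIndex
    vector_store_info=vector_store_info,  # your existing metadata schema
    similarity_top_k=10,
    empty_query_top_k=10,
    default_empty_query_vector=fallback_vector,
)
nodes = retr.retrieve("What is topic TOPIC0?")
```
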
Does `llama_index.core.query_engine.ToolRetrieverRouterQueryEngine` still require a `ServiceContext` to operate?
`LinearLayer` is missing from llama_index/finetuning/embeddings/adapter_utils.py. It's imported from adapter.py in the same namespace (from `llama-index-finetuning`).
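
A minimal repro sketch of the reported gap, isolating just the import that appears to be involved:

```python
# Sketch: per the report above, adapter.py in llama-index-finetuning expects
# LinearLayer to live in adapter_utils.py, so this import should currently
# fail with an ImportError.
from llama_index.finetuning.embeddings.adapter_utils import LinearLayer  # noqa: F401
```
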
I'm using a `ReActAgent`, and when I do a `.chat` it works as expected, but if I do a `.stream_chat` and then `.print_response_stream`, it streams the text of the first chain-of-thought step instead. Did I miss a step somewhere?
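
A minimal sketch of the pattern being described, assuming the agent comes from `ReActAgent.from_tools` and that `tools` and `llm` already exist, mostly to pin down which calls are involved:

```python
from llama_index.core.agent import ReActAgent

agent = ReActAgent.from_tools(tools, llm=llm, verbose=True)

# Non-streaming: returns the final answer as expected.
print(agent.chat("Summarize what the tools can do."))

# Streaming: per the report above, this prints the first chain-of-thought
# step rather than the final response.
streaming_response = agent.stream_chat("Summarize what the tools can do.")
streaming_response.print_response_stream()
```
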
I can get `GuardrailsOutputParser` working when I pass it as an argument to my `LlamaCPP` constructor, but when I try to apply it separately via a `QueryPipeline` I can't quite figure out how to configure it. If I pass it the `llm` instance during parser construction I get:

    AttributeError: 'LlamaCPP' object has no attribute '__call__'. Did you mean: '__class__'?

and I also run into:

    ValueError: API must be provided.
… `extract`, but `aextract` is the abstract method in that base class.
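
For reference, a sketch of a custom extractor that implements only the async side, assuming the base class in question is `BaseExtractor` and that its synchronous `extract` wraps `aextract`:

```python
from typing import Any, Dict, List, Sequence

from llama_index.core.extractors import BaseExtractor
from llama_index.core.schema import BaseNode


class WordCountExtractor(BaseExtractor):
    """Toy extractor: only the abstract aextract() is implemented."""

    async def aextract(self, nodes: Sequence[BaseNode]) -> List[Dict[str, Any]]:
        # One metadata dict per node, in the same order as the input.
        return [{"word_count": len(node.get_content().split())} for node in nodes]


# extractor = WordCountExtractor()
# metadata_list = extractor.extract(nodes)  # the sync wrapper runs aextract()
```
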
I want to use the `output_cls` parameter of `TreeSummarize` to extract some common information from my documents. I have metadata in place that will allow me to capture exactly the document subset I want to summarize over. I tried building a `SummaryIndex` and then filtering on the metadata after retrieval. I also tried using my normal vector store with filters set during retriever creation, but it is hard to create a query that captures all the documents pre-filtering, because there is no default embedding for an empty query and I can't see a place to pass one in (like you can with the `VectorIndexAutoRetriever`).
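
A sketch of one possible direction, assuming it is acceptable to bypass retrieval entirely: filter the nodes by metadata straight from the docstore and hand their text to `TreeSummarize` with `output_cls` set. The Pydantic schema, the metadata key, and the docstore access are placeholders/assumptions.

```python
from typing import List

from pydantic import BaseModel

from llama_index.core.response_synthesizers import TreeSummarize


class CommonInfo(BaseModel):
    """Placeholder schema for the fields to extract."""
    title: str
    key_points: List[str]


# Filter the document subset by metadata directly (no embedding query needed),
# then summarize over the matching node text. Assumes a docstore-backed index.
nodes = [
    n for n in index.docstore.docs.values()
    if n.metadata.get("topic") == "TOPIC0"   # placeholder metadata filter
]

summarizer = TreeSummarize(output_cls=CommonInfo, verbose=True)
result = summarizer.get_response(
    "Extract the common information shared by these documents.",
    [n.get_content() for n in nodes],
)
print(result)  # an instance of CommonInfo
```
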