The `max_retries` parameter is passed to openai's `OpenAI` client (the "native" SDK) and is also used by `llm_retry_decorator`, which wraps methods such as `chat`. This leads to retrying 3x3 times when we think we only retry 3 times. Is that on purpose, or am I missing something?

`query_str`
that is used by both the retriever and the synthesizer, but also additional arguments, `foo` and `bar` for example, that would partially format the prompt from the synthesizer? I've been unsuccessful so far with `run_multi` because I don't see how to pass additional variables to the synthesizer to format its prompt template.

`filename = xxx`?

    import nest_asyncio

    nest_asyncio.apply()
    ...
    service_context = ServiceContext.from_defaults(llm=llm, chunk_size=1024)
    response_synthesizer = get_response_synthesizer(
        use_async=True,
        response_mode="tree_summarize",
        summary_template=PromptTemplate(custom_tmpl),
    )
    doc_summary_index = DocumentSummaryIndex.from_documents(
        tables,
        service_context=service_context,
        response_synthesizer=response_synthesizer,
        show_progress=True,
        use_async=True,
    )

`tables` is summarized one by one, each waiting for the previous one to complete, with no parallelism at all. Am I missing something? I didn't find any documentation on how to configure the level of parallelism (for example, how many API calls are made in parallel).

`index.docstore.get_node('xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx')`, however I noticed it does not work anymore (I carefully checked the ID, and it does exist in the DB). Did something change?

    server | File "/usr/local/lib/python3.11/site-packages/llama_index/core/storage/docstore/keyval_docstore.py", line 279, in get_document
    server | raise ValueError(f"doc_id {doc_id} not found.")
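
One thing that may be worth ruling out for the lookup error above is whether the ID is a node ID or a source-document ID. The helper below is only a sketch: it assumes the docstore exposes `document_exists` and `get_ref_doc_info` (as `BaseDocumentStore` in llama_index does, to my knowledge; please check against the installed version). `diagnose_lookup` and `StubDocstore` are hypothetical names used purely for illustration.

```python
def diagnose_lookup(docstore, some_id):
    """Report whether an ID is known as a node, as a source document, or not at all.

    Assumes `docstore` offers `document_exists(doc_id)` and
    `get_ref_doc_info(ref_doc_id)`; this is an assumption about the API.
    """
    if docstore.document_exists(some_id):
        return "node"
    if docstore.get_ref_doc_info(some_id) is not None:
        return "ref_doc"
    return "missing"


class StubDocstore:
    """Hypothetical in-memory stand-in, used only to exercise the helper."""

    def __init__(self, nodes, ref_docs):
        self._nodes = nodes
        self._ref_docs = ref_docs

    def document_exists(self, doc_id):
        return doc_id in self._nodes

    def get_ref_doc_info(self, ref_doc_id):
        # Returns None when the ID is not a known source document.
        return self._ref_docs.get(ref_doc_id)


stub = StubDocstore(nodes={"node-1": "text"}, ref_docs={"doc-1": ["node-1"]})
results = [diagnose_lookup(stub, i) for i in ("node-1", "doc-1", "nope")]
```

With a real index, the same helper could be called as `diagnose_lookup(index.docstore, some_id)` before calling `get_node`.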

`metadata_template` and `text_template` parameters of Document / TextNode in order to customize the generated prompt sent to the LLM. I noticed it is written to the `metadata_` field in the DB; does that mean it can't be dynamic? Ideally, I would like to be able to change the prompt `context_str` easily without re-indexing everything.
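
A minimal sketch of one possible approach, assuming the templates are only applied when a node's text is rendered for the LLM. If that holds, overriding them on already-retrieved nodes (for example in a post-processing step) would change the context string without re-indexing. `SketchNode` and `apply_templates` are hypothetical stand-ins, not LlamaIndex classes, and the sample text and metadata are made up; whether the real nodes behave this way should be checked against the installed version.

```python
from dataclasses import dataclass, field


@dataclass
class SketchNode:
    """Hypothetical stand-in for a stored node; not the LlamaIndex class."""

    text: str
    metadata: dict = field(default_factory=dict)
    # Templates are plain attributes, so they can be replaced after loading.
    metadata_template: str = "{key}: {value}"
    text_template: str = "{metadata_str}\n\n{content}"

    def get_content(self) -> str:
        # Rendering happens at read time, using whatever templates are set now.
        metadata_str = "\n".join(
            self.metadata_template.format(key=k, value=v)
            for k, v in self.metadata.items()
        )
        return self.text_template.format(
            metadata_str=metadata_str, content=self.text
        )


def apply_templates(nodes, metadata_template, text_template):
    """Override the stored templates on already-retrieved nodes."""
    for node in nodes:
        node.metadata_template = metadata_template
        node.text_template = text_template
    return nodes


stored = SketchNode(text="Revenue grew 12%.", metadata={"file": "q3.pdf"})
default_view = stored.get_content()

apply_templates([stored], "[{key}={value}]", "Source {metadata_str}\n{content}")
custom_view = stored.get_content()
```

The design choice here is to treat the stored templates as defaults and to override them per query, so the indexed data itself never has to be rewritten.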