

Batch

At a glance

The community members are discussing the possibility of using the LlamaIndex LLM class or other LlamaIndex abstractions to access the OpenAI/Anthropic Message Batching API. One community member suggests that the current abstractions are built around real-time interactions, but another community member believes they have a good use case for it.

The community members then discuss a proposed solution that involves submitting a batch during a pipeline process and then checking the batch status to update the nodes. The advantages mentioned are cost savings and the ability to handle offline/asynchronous scenarios where the computer can be shut off or the code can be interrupted without issue.

The proposed solution would be entirely stateless and use node IDs as batch job IDs to track the processing status of each node. This would allow the user to submit the batch and then check on it later, without needing to keep a Python script running for days.

There is no explicitly marked answer in the comments, but the community members seem to be collaborating on a potential solution to the original question.

Hey all. Is there a way to use the LlamaIndex LLM class (or other LlamaIndex abstractions) to access the OpenAI/Anthropic Message Batching API?
Not really. All the abstractions are built around real-time interactions 🤔
I think I have a pretty good use case for it.
I would just use the raw openai client. Or if you want to make a PR, I can review it 🙂
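For anyone who goes the raw-client route, a minimal sketch with the OpenAI Batch API might look roughly like this (the file name, model, and chunk texts are placeholders, not anything from the thread):
Python
import json
from openai import OpenAI

client = OpenAI()

# One JSONL line per request; custom_id lets you match results back to nodes later.
requests = [
    {
        "custom_id": f"node-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o-mini",
            "messages": [{"role": "user", "content": text}],
        },
    }
    for i, text in enumerate(["chunk one", "chunk two"])
]
with open("batch_input.jsonl", "w") as f:
    f.write("\n".join(json.dumps(r) for r in requests))

# Upload the file and submit the batch (24h completion window, ~50% cheaper).
batch_file = client.files.create(file=open("batch_input.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

# Later, possibly from a completely fresh process, poll and collect results.
batch = client.batches.retrieve(batch.id)
if batch.status == "completed":
    output = client.files.content(batch.output_file_id).text  # JSONL of responses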
Maybe!! But one PR at a time, eh? Here's my proposed use case/interface:
Python
# Part 1: Submit batch during pipeline.
# Remember: a DocumentContextExtractor can literally take hours to run,
# or days in the extreme case. Batch processing can cut costs by 50%!

extractor = DocumentContextExtractor(
    docstore=docstore,
    llm=llm,
    mode="submit_batch"
)

# Must be last transform in pipeline
index.update_nodes(transforms=[transform_A, transform_B, transform_C, extractor])
index.persist(...)


# Part 2: Check batch status and update nodes

# first load index
index = ...

extractor.set_mode("process_batch")
while not extractor.is_batch_complete():
    num_completed = index.update_nodes(transforms=[extractor])
    # Maybe return number of nodes updated for user feedback
    time.sleep(...)  # User controls polling frequency

# After this, user can do whatever they want with their context-enabled nodes
I guess the advantage here is just cost savings? Otherwise async calls would achieve something similar, right?

Yea. Would need to be added to the LLM interface for LLMs that support that; the concept doesn't quite exist yet in the codebase.
(The above code requires either raw API calls under the hood using the openai library, or an addition to the LLM interface to do batch processing.)
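For what it's worth, a rough sketch of the kind of LLM-interface addition being discussed might look like this. The class and method names here are hypothetical; nothing like this exists in LlamaIndex today:
Python
from typing import Dict, List, Optional, Sequence

from llama_index.core.llms import ChatMessage


class BatchCapableLLM:
    """Hypothetical mixin for LLMs whose providers offer a batch API."""

    def submit_chat_batch(
        self, conversations: Sequence[List[ChatMessage]], custom_ids: Sequence[str]
    ) -> str:
        """Submit many chat requests at once and return a provider batch ID."""
        raise NotImplementedError

    def get_batch_results(self, batch_id: str) -> Optional[Dict[str, str]]:
        """Return {custom_id: completion text} once the batch finishes, else None."""
        raise NotImplementedError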
Yep yep. Cost, and also offline/asynchronous.
The computer can be shut off or the code can crash/get interrupted and it's fine.
My proposed solution above would be entirely stateless too. It can use the node ID as the batch job ID, and just check which nodes are already processed (they have a 'context' key), which ones are currently waiting for processing, and which ones are ready.
With this approach you don't need to keep a Python script running for days. You just submit, then check again in a day or two.
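A minimal sketch of that stateless pass, assuming the batch requests were keyed by node ID and the extracted context is written to a 'context' metadata key (the helper below is hypothetical, not part of DocumentContextExtractor):
Python
from typing import Dict, Sequence

from llama_index.core.schema import BaseNode


def apply_batch_results(nodes: Sequence[BaseNode], results: Dict[str, str]) -> int:
    """Write finished batch results into node metadata; return how many nodes are still pending."""
    pending = 0
    for node in nodes:
        if "context" in node.metadata:
            continue  # already processed on an earlier pass
        if node.node_id in results:
            node.metadata["context"] = results[node.node_id]
        else:
            pending += 1  # still waiting on the batch
    return pending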