NVIDIA module issues with Nemotron models

At a glance

The community member is using the NVIDIA module from the llama-index-llms-nvidia package, but is encountering issues with the "nemotron" models, which work fine with the native OpenAI library. They are getting a 404 Not Found error when trying to use the "nvidia/nemotron-4-51b-instruct" model.

The comments suggest that the community member should try upgrading to the latest version of the llama-index-llms-nvidia package, which is 0.2.6, as it may fix the issue. Another community member mentions that the model name may be a typo and should be "nvidia/llama-3.1-nemotron-51b-instruct" instead.

The community member also tried using the "nvidia/nemotron-4-340b-instruct" model, but encountered a 400 Bad Request error. They are unable to upgrade to the latest version of the package due to dependency conflicts, but the head of open source for llama-index suggests that the latest version should fix the issue.

The community member also inquires about incorporating NeMo Guardrails into the llama-index workflow, and the head of open source confirms that any LLM can be used in any workflow, and that Llama-Deploy is just one way to host existing workflows as services.

Hi, I'm using the NVIDIA module. It works fine with every model except the Nemotron models (which work smoothly with the native OpenAI library). Any idea what's going on?

Plain Text
from llama_index.core import Settings
from llama_index.llms.nvidia import NVIDIA

Settings.llm = NVIDIA(model="nvidia/nemotron-4-51b-instruct", ...)


Plain Text
INFO:httpx:HTTP Request: POST https://integrate.api.nvidia.com/v1/chat/completions "HTTP/1.1 404 Not Found"

...

  File "/home/0/miniconda3/envs/gpu_rag/lib/python3.10/site-packages/openai/_base_client.py", line 1058, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: 404 page not found
8 comments
Not sure, but I can pass this along to the nvidia folks (they maintain this integration)
Do you have the latest version of the package? pip install -U llama-index-llms-nvidia ?
Is that maybe a typo?

nvidia/llama-3.1-nemotron-51b-instruct instead of nvidia/nemotron-4-51b-instruct ?

https://build.nvidia.com/nvidia/llama-3_1-nemotron-51b-instruct?snippet_tab=Shell
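
If it is the typo, a minimal sketch of the fix, assuming the same setup as in the question with only the model id swapped in:

Plain Text
from llama_index.core import Settings
from llama_index.llms.nvidia import NVIDIA

# The id in the build.nvidia.com catalog is llama-3.1-nemotron-51b-instruct,
# not nemotron-4-51b-instruct.
Settings.llm = NVIDIA(model="nvidia/llama-3.1-nemotron-51b-instruct")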
Thank you for your reply! The snippet I pasted was sort of amalgamated; I was actually testing "nvidia/nemotron-4-340b-instruct". The exact code and error message is:

Plain Text
from llama_index.core import Settings
from llama_index.llms.nvidia import NVIDIA

Settings.llm = NVIDIA(model="nvidia/nemotron-4-340b-instruct", ...)


Plain Text
INFO:httpx:HTTP Request: POST https://integrate.api.nvidia.com/v1/chat/completions "HTTP/1.1 400 Bad Request"
2024-10-30 13:47:25.979 Uncaught app exception
Traceback (most recent call last):

...

  File "/home/0/miniconda3/envs/gpu_rag/lib/python3.10/site-packages/openai/_base_client.py", line 1058, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'type': 'about:blank', 'status': 400, 'title': 'Bad Request', 'detail': 'Inference error'}


I'm using llama-index-llms-nvidia 0.1.4, but I can't upgrade to the latest version in my project due to a dependency conflict. I did test, though, with

Plain Text
pip install -U llama-index-llms-nvidia --no-deps

but it gave the same error.
I'm not sure an upgrade with dependencies would fix the error. Thank you again!!
the latest is 0.2.6 for the nvidia llm, so it will almost certainly fix this I think πŸ€” They've done a lot to maintain this class
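
A quick sanity check on what actually got installed; note that --no-deps skips upgrading transitive packages, so an old llama-index-core could still be pinned (an assumption about the conflict here):

Plain Text
# Print the installed versions to confirm the upgrade took effect.
from importlib.metadata import version

print(version("llama-index-llms-nvidia"))  # expect 0.2.6 after a full upgrade
print(version("llama-index-core"))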
You really seem to be keeping up with their progress. Then, do you happen to know whether a LlamaIndex workflow can incorporate NeMo Guardrails, either as one of multiple agents or via as_query_engine, in a workflow like this:

Plain Text
from llama_index.core.llms import ChatMessage
from llama_index.core.tools import ToolSelection, ToolOutput
from llama_index.core.workflow import Event


class InputEvent(Event):
    """Carries the chat history into the next step."""
    input: list[ChatMessage]


class ToolCallEvent(Event):
    """Emitted when the LLM requests one or more tool calls."""
    tool_calls: list[ToolSelection]


class FunctionOutputEvent(Event):
    """Wraps the result of executing a single tool."""
    output: ToolOutput

(is the workflow module also deprecated in favor of llama_deploy? please say no... 🥺)
I'm the head of open source for llama-index lol so I hope I keep up with it πŸ˜…

You can use any llm you want in any workflow

Llama-Deploy is just one way to host your existing workflows as services πŸ‘
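
To make "any LLM in any workflow" concrete, here's a minimal sketch, assuming llama-index-llms-nvidia 0.2.x and the current Workflow API; the model id and query are just placeholders:

Plain Text
from llama_index.core.workflow import StartEvent, StopEvent, Workflow, step
from llama_index.llms.nvidia import NVIDIA


class OneShotWorkflow(Workflow):
    """A single-step workflow that forwards the incoming query to an LLM."""

    @step
    async def generate(self, ev: StartEvent) -> StopEvent:
        # Any LLM works here; NVIDIA is used since it's the one in question.
        llm = NVIDIA(model="nvidia/llama-3.1-nemotron-51b-instruct")
        response = await llm.acomplete(ev.query)
        return StopEvent(result=str(response))


# Usage (inside an async context):
#     result = await OneShotWorkflow(timeout=60).run(query="Hello!")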
🫣 You're the Logan! I'm stoked 🀩

I just found this one, so I'm gonna give it a try: Building a multi-agent concierge system from scratch

Eager to see it work for me 🧐 Have a wonderful day or night!