Updated last year

Local API

Hi all, I am working on a project with a locally installed Llama 2 and the following simple API interface:

{
"input": "how is weather in new york",
"context":"new york is hot in these days"
}

Here "input" is the query, and "context" should come from the vector DB. How can I integrate this with the existing LlamaIndex library without changing too much of my code? @WhiteFang_Jr
4 comments
Your best bet is implementing the LLM class

There's a small example here https://gpt-index.readthedocs.io/en/latest/core_modules/model_modules/llms/usage_custom.html#example-using-a-custom-llm-model-advanced

Basically in the complete/stream_complete endpoints, you'll want to send requests to your api

There's also chat/stream_chat endpoints, if you want to handle how lists of chat messages get sent to the LLM as well
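
As a rough sketch of what "send requests to your api" could look like, here is a small helper that complete() might wrap. It assumes the local server accepts a POST with the JSON body from the question and replies with JSON containing a "text" field. The URL, the "text" key, and the helper names are all placeholders, not part of LlamaIndex. It also assumes the whole prompt LlamaIndex builds (which already has the retrieved context embedded) is sent as "input", with "context" left empty unless you split it yourself.

```python
import json
import urllib.request

# Hypothetical endpoint of the locally hosted Llama 2 server; adjust to your setup.
API_URL = "http://localhost:8000/generate"


def build_payload(prompt: str, context: str = "") -> bytes:
    """Encode the request body in the {"input", "context"} shape from the question."""
    return json.dumps({"input": prompt, "context": context}).encode("utf-8")


def extract_text(raw: bytes, key: str = "text") -> str:
    """Pull the generated text out of the JSON reply (the key name is an assumption)."""
    return json.loads(raw.decode("utf-8"))[key]


def call_local_llm(
    prompt: str, context: str = "", url: str = API_URL, timeout: float = 60.0
) -> str:
    """POST the prompt to the local API and return the generated text."""
    request = urllib.request.Request(
        url,
        data=build_payload(prompt, context),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_text(response.read())
```

Inside complete(), the body would then reduce to something like return CompletionResponse(text=call_local_llm(prompt)), with stream_complete() left unimplemented if the server does not stream.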
Thanks for the quick response. But how do I set up the API access point?
@Logan M a further question: how do I set up the API access point, etc. in the section below?

class OurLLM(CustomLLM):

    @property
    def metadata(self) -> LLMMetadata:
        """Get LLM metadata."""
        return LLMMetadata(
            context_window=context_window,
            num_output=num_output,
            model_name=model_name,
        )

    def complete(self, prompt: str, **kwargs: Any) -> CompletionResponse:
        prompt_length = len(prompt)
        response = pipeline(prompt, max_new_tokens=num_output)[0]["generated_text"]

        # only return newly generated tokens
        text = response[prompt_length:]
        return CompletionResponse(text=text)

    def stream_complete(self, prompt: str, **kwargs: Any) -> CompletionResponseGen:
        raise NotImplementedError()
Also, in normal API interaction with the LLM, I believe LlamaIndex will query the LLM a couple of times to refine the answer. How does LlamaIndex decide when to stop calling the LLM API and return the final answer to the user?