Sergio Casero
Joined September 25, 2024
Is as_chat_engine ignoring both parameters?
We've tried system_prompt too, with no success. It happens not only with OpenAI but also with Vertex and LLaMA, so we must be missing something.
1 comment
Hello all! I'm trying to use llama with Vertex AI, but I'm getting this error:

Plain Text
TypeError: BaseLLM.predict() got an unexpected keyword argument 'context_str'


Any help?

This is my code:
Plain Text
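from typing import Any, List, Mapping, Optional

from langchain.llms.base import LLM  # assumed import; the post does not show it


def query_google_llm(chat, query):
    # Send the prompt to the chat session and return the reply text
    response = chat.send_message(query)
    print(response.text)
    return response.text

# build_google_llm() is defined elsewhere (not shown in the post)
chat = build_google_llm()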

class PaLM(LLM):

    model_name = "Bard"
    total_tokens_used = 0
    last_token_usage = 0

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        print("prompt: ", prompt)
        response = query_google_llm(chat, prompt)

        print("response: ", response)
        return response

    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        return {"name_of_model": self.model_name}

    @property
    def _llm_type(self) -> str:
        return "custom"
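
The post does not show how `PaLM` is handed to llama_index, so the actual cause is unknown. As a minimal sketch of how this kind of message can arise in plain Python, assume a caller forwards prompt-template variables as keyword arguments to a `predict` method that does not accept them. `StandInLLM` and `call_with_template_vars` are hypothetical stand-ins, not the real library classes.

```python
class StandInLLM:
    """Hypothetical stand-in for an LLM class; not the real BaseLLM."""

    def predict(self, text, *, stop=None):
        # Accepts only `text` and `stop`; any other keyword is rejected
        return f"echo: {text}"


def call_with_template_vars(llm):
    # Hypothetical caller that forwards template variables as keywords
    return llm.predict("some prompt", context_str="retrieved context")


message = None
try:
    call_with_template_vars(StandInLLM())
except TypeError as exc:
    # Python raises TypeError because `context_str` is not in the signature
    message = str(exc)
    print(message)
```

If this is what is happening, the object receiving `context_str` is probably not the one that was meant to receive it, so it is worth checking which object is passed where when the index is built.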
3 comments
Hello again. I'm trying to refine the behavior between Vicuna and llama_index. I can get the responses from the model, but it looks like they're lost because of the "second question".

I have this prompt template

Plain Text
QA_PROMPT_TMPL = (
    "### Human: Considering the following code:\n"
    "{context_str}\n"
    "{query_str}\n ### Assistant: \n"
)
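
To check exactly what text the model receives, the template can be filled by hand. This is only a plain-Python sketch using `str.format`; llama_index does its own formatting internally, and the `fill_prompt` helper and the sample context string are assumptions made for illustration.

```python
QA_PROMPT_TMPL = (
    "### Human: Considering the following code:\n"
    "{context_str}\n"
    "{query_str}\n ### Assistant: \n"
)


def fill_prompt(context_str: str, query_str: str) -> str:
    # Substitute both placeholders, as a template engine would
    return QA_PROMPT_TMPL.format(context_str=context_str, query_str=query_str)


# Hypothetical context; in practice this would be the retrieved chunk
prompt = fill_prompt(
    context_str="class ProductionMachineDataSource: ...",
    query_str="Who creates the code?",
)
print(prompt)
```

Comparing this output with the prompt printed inside `_call` shows whether the model is being asked what was intended.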


If I print the response inside the CustomLLM._call method, I see this response:

  1. Production Machine Data Source is a data source class for vending machines that provides a set of APIs to interact with the machine. The creator of this class is "Sergio Casero" and it was created on 18/04/2023.
For this question:

Who creates the code?

This is so nice, but if I print the response from response = index.query("Who creates the code?", text_qa_template=QA_PROMPT, similarity_top_k=1), I get an empty response. Any ideas?
6 comments
Hello again. I'm still trying to estimate the costs, and I see some strange things.

This is my code:

(Only called the first time):
Plain Text
import datetime

from llama_index import GPTSimpleVectorIndex, SimpleDirectoryReader


def train(path):
    tokens = 0  # never updated below, so this function always returns 0
    name = path.split("/")[-1]

    # get the documents inside the folder
    documents = SimpleDirectoryReader(path).load_data()
    print("Starting Vector construction at ", datetime.datetime.now())
    index = GPTSimpleVectorIndex.from_documents(documents)

    index.save_to_disk("indexes/" + name + ".json")

    return tokens


Now I just call this other method:
Plain Text
def query(query, toIndex):
    index = GPTSimpleVectorIndex.load_from_disk("indexes/" + toIndex + ".json")
    response = index.query(query)
    return response

response = query("question", "data")


This is what the console output says:
Plain Text
INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 5002 tokens
INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 23 tokens


But this is what OpenAI billing console says:
Plain Text
11:35
Local time: 30 mar 2023, 13:35
text-davinci, 2 requests
4,483 prompt + 512 completion = 4,995 tokens
11:35
Local time: 30 mar 2023, 13:35
text-embedding-ada-002-v2, 2 requests
56,906 prompt + 0 completion = 56,906 tokens


Is that right? 🤔
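
To make the comparison concrete, here is the arithmetic on the numbers quoted above, as a small sketch. All figures are copied from the console log and the billing entries in this post; only the variable names are made up.

```python
# Figures from the llama_index console log
logged_llm_tokens = 5002
logged_embedding_tokens = 23

# Figures from the OpenAI billing console
billed_llm_tokens = 4483 + 512  # prompt + completion
billed_embedding_tokens = 56906

# Difference between what was logged and what was billed
llm_gap = logged_llm_tokens - billed_llm_tokens
embedding_gap = billed_embedding_tokens - logged_embedding_tokens

print(f"LLM: logged {logged_llm_tokens}, billed {billed_llm_tokens}, gap {llm_gap}")
print(
    f"Embeddings: logged {logged_embedding_tokens}, "
    f"billed {billed_embedding_tokens}, gap {embedding_gap}"
)
```

The LLM figures nearly agree, while the embedding figures are far apart. The billing entry covers 2 embedding requests and the log line is labelled [query] only, so the two may not be measuring the same set of calls.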
15 comments
Hi folks, first of all, thanks for this awesome work.

I'm trying to estimate the costs of the "training". The use case is the following: I have a lot of PDFs and I want to integrate them with an LLM. By using the MockLLMPredictor, I get the info attached (all of it based on SimpleDirectoryReader, same dir). The question is: do these values make sense? The "per query" figure is obviously an estimate based on five queries made with mocks.
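
For turning mock token counts into money, a rough sketch could look like the following. The attachment is not reproduced here, so every token count and price below is a hypothetical placeholder rather than real data or real OpenAI pricing, and `estimate_cost` is a made-up helper.

```python
def estimate_cost(tokens: int, price_per_1k_tokens: float) -> float:
    # Cost scales linearly with token count at a fixed per-1K price
    return tokens / 1000 * price_per_1k_tokens


# Hypothetical token counts, e.g. as reported by a mock run
build_embedding_tokens = 50_000
per_query_llm_tokens = 5_000
per_query_embedding_tokens = 25

# Placeholder prices per 1K tokens; replace with the current price list
EMBEDDING_PRICE_PER_1K = 0.001
LLM_PRICE_PER_1K = 0.02

# One-off cost of building the index
build_cost = estimate_cost(build_embedding_tokens, EMBEDDING_PRICE_PER_1K)

# Recurring cost of one query: LLM call plus query embedding
per_query_cost = estimate_cost(
    per_query_llm_tokens, LLM_PRICE_PER_1K
) + estimate_cost(per_query_embedding_tokens, EMBEDDING_PRICE_PER_1K)

print(f"index build: {build_cost:.6f}")
print(f"per query:   {per_query_cost:.6f}")
```

Keeping the one-off build cost separate from the per-query cost makes it easier to see which one dominates as the number of queries grows.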
19 comments