Find answers from the community

jaykob
Joined September 25, 2024
I have a somewhat targeted use case I'd like to discuss, in case someone has tried it.
Essentially I'm trying to provide a list of questions for an LLM to answer from a document (who was involved, what day did this occur?) and return the answers in a more structured format (CSV, JSON, whatever) to begin building a structured table that could then be queried with SQL (probably using an NL-to-SQL model).
Has anyone tried to generate structured data out of the summarised responses from an LLM, and is there a good LlamaIndex/LangChain way to go about this, short of writing entirely custom prompts?
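For reference, the kind of loop I'm imagining is below. This is only a rough sketch: `llm` is assumed to be any LlamaIndex LLM exposing .complete(), `documents` comes from something like SimpleDirectoryReader, and the question list and field names are placeholders.
Plain Text
import csv
import json

# Sketch: ask each question of each document and collect the answers as rows.
# Field names and questions are placeholders for illustration.
QUESTIONS = {
    "people_involved": "Who was involved?",
    "event_date": "What day did this occur?",
}

rows = []
for doc in documents:
    prompt = (
        "Answer the following questions using only the document below. "
        "Respond with a JSON object using these keys: "
        + ", ".join(QUESTIONS)
        + "\n\n"
        + "\n".join(f"{key}: {question}" for key, question in QUESTIONS.items())
        + "\n\nDocument:\n"
        + doc.text
    )
    answer = llm.complete(prompt).text
    rows.append(json.loads(answer))  # assumes the model returned valid JSON

# Write the structured answers out so they can be loaded into a SQL table later.
with open("answers.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(QUESTIONS))
    writer.writeheader()
    writer.writerows(rows)
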
5 comments
Hey everyone, I'm just starting my LLM journey, trying to get my own chatbot running with llama-2-chat, and I had a few general and LlamaIndex-specific questions I was hoping someone could help me with:
1) This probably seems like a naive question, but it's never made explicit anywhere: does the system prompt itself reduce the number of tokens available? I.e., if I instantiate my model with a 2048-token context window and my system prompt is 48 tokens, I only have 2000 tokens left to work with, right?
2) What is the best way to pass a custom prompt to Llama 2 via llama-cpp-python (and the new associated LlamaIndex wrapper)? I'm unclear whether system tokens should or should not be appearing in my output with the new default prompts (see the sketch at the end of this post).
3) What exactly is a good way of dealing with historical memory / large knowledge bases? A conversational chatbot should ideally keep track of things previously said; is this just as simple as adding a running history of inputs and outputs to the prompt? Won't this eventually blow out my token limit? Similarly, when summarising or aggregating across large knowledge bases, how can I get my model to summarise a 20-page PDF if providing that much text goes beyond the available token limit?

Huge thanks in advance
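On question 2, this is roughly the pattern I've been experimenting with (a sketch, assuming the llama_index LlamaCPP wrapper and its llama_utils helpers; the model path and token counts are placeholders):
Plain Text
from llama_index.llms import LlamaCPP
from llama_index.llms.llama_utils import completion_to_prompt, messages_to_prompt

# Sketch: let the wrapper apply the Llama 2 chat formatting ([INST]/<<SYS>>)
# for you. Model path and token counts below are placeholders.
llm = LlamaCPP(
    model_path="./models/llama-2-7b-chat.Q4_K_M.gguf",
    temperature=0.1,
    max_new_tokens=256,
    context_window=2048,  # this budget includes the system prompt (question 1)
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=True,
)

print(llm.complete("Who designed the Sydney Opera House?").text)
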
1 comment
Hi, I just had a question about using the LlamaCPP bindings to .complete() a variable number of new tokens per request.
llama_cpp.py/LlamaCPP
Plain Text
    @llm_completion_callback()
    def complete(self, prompt: str, **kwargs: Any) -> CompletionResponse:
        self.generate_kwargs.update({"stream": False})

        is_formatted = kwargs.pop("formatted", False)
        if not is_formatted:
            prompt = self.completion_to_prompt(prompt)

        response = self._model(prompt=prompt, **self.generate_kwargs)

        return CompletionResponse(text=response["choices"][0]["text"], raw=response)


Am I missing something, or does this function not actually use any of the provided kwargs to update generate_kwargs?
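The workaround I'm using for now is to set values on generate_kwargs before each call, since the per-call kwargs appear to be dropped (a sketch, not necessarily the intended API):
Plain Text
# Workaround sketch: complete() reads self.generate_kwargs, so mutate that on
# the wrapper between calls instead of passing kwargs into complete().
llm.generate_kwargs.update({"max_tokens": 64})
short_response = llm.complete("Summarise the article in one sentence.")

llm.generate_kwargs.update({"max_tokens": 512})
long_response = llm.complete("Summarise the article in detail.")
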
5 comments
jaykob · Parsing

Hi everyone, hoping to have a bit more discussion, as I don't think I fully understand the tools LlamaIndex provides or how to use them for my problem.
Essentially I'm trying to work with news articles about building construction. The way I see it, users could ask two kinds of questions:
  • Targeted questions about a specific building
  • Aggregated questions about a company or area or set of buildings
Targeted questions seem straightforward enough. Assuming the information I'm looking for is within a single article and I've embedded that entire article, it should be returned as context and the LLM can parse out an answer.
Aggregated questions are where my knowledge starts to fall down. I'm having trouble understanding how I'd be able to process and parse potentially hundreds of articles to answer a question like "Which buildings did XYZ build?"
Does the key to this lie in how I'm processing my documents into nodes? In using stacked indices? I'd appreciate any thoughts or feedback from people who may have encountered similar issues.
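One direction I've been considering for the aggregated case, to help frame the discussion: tag each article with structured metadata (company, building) at ingestion, then filter on that metadata at query time rather than relying purely on similarity. A rough sketch, assuming the vector store supports exact-match metadata filters and that `raw_articles` and `service_context` already exist:
Plain Text
from llama_index import Document, VectorStoreIndex
from llama_index.vector_stores.types import ExactMatchFilter, MetadataFilters

# Sketch: attach structured metadata to each article at ingestion time.
documents = [
    Document(text=text, metadata={"company": company, "building": building})
    for text, company, building in raw_articles  # placeholder data source
]
index = VectorStoreIndex.from_documents(documents, service_context=service_context)

# Narrow retrieval to one company before asking the aggregate question.
retriever = index.as_retriever(
    similarity_top_k=50,
    filters=MetadataFilters(filters=[ExactMatchFilter(key="company", value="XYZ")]),
)
nodes = retriever.retrieve("Which buildings did XYZ build?")
print({n.node.metadata.get("building") for n in nodes})
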
6 comments
I am running into a weird issue with PGVector that I was hoping someone might be able to assist with.
I have a separate file to pre-process some data into my Postgres database:
Plain Text
documents = SimpleDirectoryReader(DATA_FOLDER + COLLECTION_NAME).load_data()

url = make_url(f"postgresql://...")
vector_store = PGVectorStore.from_params(
    #database conn details...
    table_name=TABLE_NAME,
    #embed_dim=1536,  # openai embedding dimension
    embed_dim=768, # bge-base-en 
)

service_context = ServiceContext.from_defaults(embed_model=load_embedding_model(), llm=load_llm_model())

storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, show_progress=True, service_context=service_context
)

query_engine = index.as_query_engine()
response = query_engine.query("What buildings did UBS management buy?")
print(response)

A response is actually generated in this file, meaning the embeddings in my Postgres database are working and can be queried through the index.

When I then use a separate file to try and query this table (which I have confirmed exists and has the correct number of rows of data in it), I get 0 nodes back:
Plain Text
vector_store = PGVectorStore.from_params(
    #database_conn_details...
    table_name=TABLE_NAME,
    #embed_dim=1536,  # openai embedding dimension
    embed_dim=768
)

service_context = ServiceContext.from_defaults(embed_model=load_embedding_model(), llm=load_llm_model())

index = VectorStoreIndex.from_vector_store(
    vector_store=vector_store, service_context=service_context
)

query_engine = index.as_query_engine()
response = query_engine.query("What building did UBS management buy?")
print(response)


Just wondering if I've done anything wrong? As far as I can tell, when the index is created directly there's no problem, but trying to create an index from an existing vector store doesn't seem to work.
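For anyone who wants to dig in, here is the quick diagnostic I've been running in the second file (a sketch, reusing the setup above) to check whether any nodes come back before the response-synthesis step:
Plain Text
# Diagnostic sketch: retrieve directly from the reconnected index to see the
# raw node count and similarity scores, bypassing the query engine's LLM step.
retriever = index.as_retriever(similarity_top_k=5)
nodes = retriever.retrieve("What building did UBS management buy?")
print(f"retrieved {len(nodes)} nodes")
for n in nodes:
    print(n.score, n.node.get_content()[:80])
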
1 comment
Is it possible to use the structured output parser on a per-document basis rather than against an index?
As an example, I would like to define some number of response schemas to ask as individual questions against each document in my dataset, generating summaries for me to store in a structured table. Structured output parsers seemed like the best way to go about this, but it looks like they still require an index to be created rather than allowing me to supply the context myself.
Is there an easy way of decoupling these parsers from an index, or should I just wrap everything in my own loop?
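To illustrate what I mean by "my own loop": something along these lines, running LangChain's response schemas directly against each document instead of an index (a sketch; the schema names are placeholders and `llm`/`documents` come from my existing setup):
Plain Text
from langchain.output_parsers import ResponseSchema, StructuredOutputParser

# Sketch: apply the structured output parser per document, with no index involved.
schemas = [
    ResponseSchema(name="who", description="Who was involved?"),
    ResponseSchema(name="when", description="What day did this occur?"),
]
parser = StructuredOutputParser.from_response_schemas(schemas)

rows = []
for doc in documents:
    prompt = (
        "Answer the questions below using only this document.\n\n"
        + doc.text
        + "\n\n"
        + parser.get_format_instructions()
    )
    rows.append(parser.parse(llm.complete(prompt).text))  # one dict per document
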
3 comments
Has anyone been able to extract the individual pieces of a prompt, and the final combined string being sent to LlamaCPP(), using the debug handler? I'm ideally trying to understand how llama_utils.messages_to_prompt/completion_to_prompt are being applied, but it doesn't seem like you can extract the (system/user/context) pieces from the debug output since it's all contained within CompletionResponse.text.
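For context, this is as far as I've got with the debug handler (a sketch only; I'm assuming get_llm_inputs_outputs() is the right place to look, and that `llm` and `documents` are set up as in my earlier posts):
Plain Text
from llama_index import ServiceContext, VectorStoreIndex
from llama_index.callbacks import CallbackManager, LlamaDebugHandler

# Sketch: attach the debug handler via the service context, run a query, then
# dump the recorded LLM event payloads to inspect the prompt string that was sent.
llama_debug = LlamaDebugHandler(print_trace_on_end=True)
service_context = ServiceContext.from_defaults(
    llm=llm, callback_manager=CallbackManager([llama_debug])
)

index = VectorStoreIndex.from_documents(documents, service_context=service_context)
response = index.as_query_engine().query("Which buildings did XYZ build?")

for start_event, end_event in llama_debug.get_llm_inputs_outputs():
    print(start_event.payload)  # what went into the LLM call
    print(end_event.payload)    # what came back
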
12 comments
I'm trying to use the DatasetGenerator.from_documents() function, but because of my local resource limitations (and not using OpenAI) I don't have enough tokens to generate the full list of questions that gets returned in the docs.
Is there any way to force further generation so it keeps producing questions from the document_summary_index?
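In case it makes the question clearer, this is the kind of batching workaround I've been considering (a sketch; num_questions_per_chunk and the batch size are just illustrative):
Plain Text
from llama_index.evaluation import DatasetGenerator

# Sketch: generate questions over small batches of documents so each call stays
# within the local model's context budget, then pool the results.
all_questions = []
batch_size = 2  # placeholder; tune to the context window you have available
for i in range(0, len(documents), batch_size):
    generator = DatasetGenerator.from_documents(
        documents[i : i + batch_size],
        service_context=service_context,
        num_questions_per_chunk=3,
    )
    all_questions.extend(generator.generate_questions_from_nodes())

print(f"generated {len(all_questions)} questions")
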
1 comment