@Logan M


Plain Text

from llama_index.readers.schema.base import Document

from llmsherpa.readers import LayoutPDFReader

llmsherpa_api_url = "https://readers.llmsherpa.com/api/document/developer/parseDocument?renderFormat=all"
pdf_path = "2023190_riteaid_complaint_filed.pdf" # a URL or a local file path, e.g. /home/downloads/xyz.pdf
pdf_reader = LayoutPDFReader(llmsherpa_api_url)
doc = pdf_reader.read_pdf(pdf_path)

documents = []
for chunk in doc.chunks():
    # Create a Document object for each chunk and collect it.
    documents.append(Document(text=chunk.to_context_text(), extra_info={}))

is this how you would use the library to chunk it? i'm a bit confused lol


Logan M

uhhhh I've never used llmsherpa, but that looks like you are creating document objects correctly yes

DangFutures

do i convert to nodes after?

DangFutures

yeah...AttributeError: 'tuple' object has no attribute 'id_'

Logan M

what are you doing with the document objects next?

Logan M

Generally you would give them to an index, an ingestion pipeline, or parse them into nodes with a node parser/text splitter
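To make "parse them into nodes" a bit more concrete, here is a toy, library-free sketch of what a text splitter does: cut each document's text into overlapping chunks and carry the metadata along. `ToyNode` and `split_into_nodes` are hypothetical names for illustration only, and this is not llama-index's implementation; the real node parsers split on sentences or tokens rather than raw characters.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToyNode:
    """Hypothetical stand-in for a node: one chunk of text plus its metadata."""
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


def split_into_nodes(
    text: str,
    chunk_size: int = 200,
    overlap: int = 20,
    metadata: Optional[Dict[str, str]] = None,
) -> List[ToyNode]:
    """Split text into character chunks that overlap by `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    nodes: List[ToyNode] = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        # Each node gets its own copy of the document metadata.
        nodes.append(ToyNode(text=chunk, metadata=dict(metadata or {})))
        if start + chunk_size >= len(text):
            break  # the last chunk already reached the end of the text
    return nodes
```

With the library itself you would not write this by hand; the point is only that documents go in and smaller, metadata-carrying pieces come out.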

DangFutures

i gave up lol but can you help fix this prompt template for me @Logan M

Plain Text

qa_prompt_tmpl_str = (
    "<|im_start|>Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the query\n<|im_end|>"
    "<|im_start|>user: {query_str}\n"
    "<|im_start|>assistant Answer: "
)

my life would be easier if i just used closed source models

DangFutures

here's the format

Attachment

Logan M

Rather than modifying the prompt template, it might be easier to set messages_to_prompt/completion_to_prompt on the llm?

But anyways,

Plain Text

qa_prompt_tmpl_str = (
    "<|im_start|>user\n"
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the query:\n"
    "{query_str}<|im_end|>\n"
    "<|im_start|>assistant\n"
)

Might be more accurate?
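For reference, the `messages_to_prompt`/`completion_to_prompt` idea can be sketched as two small functions that wrap text in the same `<|im_start|>`/`<|im_end|>` markers used in the template above. This is a minimal sketch under assumptions: the format attachment is not visible in the thread, so the ChatML-style layout is inferred from those tokens, and plain dicts are used here as a stand-in for the library's chat message objects.

```python
from typing import Dict, List

IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def messages_to_prompt(messages: List[Dict[str, str]]) -> str:
    """Render chat messages as one prompt string ending in an open assistant turn."""
    parts = [
        f"{IM_START}{m['role']}\n{m['content']}{IM_END}\n" for m in messages
    ]
    # Leave the assistant turn open so the model writes the answer.
    parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


def completion_to_prompt(completion: str) -> str:
    """Treat a plain completion prompt as a single user message."""
    return messages_to_prompt([{"role": "user", "content": completion}])
```

The upside of doing it this way is that the QA template can stay plain text and the special tokens live in one place.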

DangFutures

@Logan M can't do that for vllm right

Logan M

Oh right -- vllm should be automatically applying the prompt templates

Logan M

at least from my understanding

Logan M

you should just need to pass plain old text

Logan M

(maybe I read that wrong somewhere)

Logan M

ugh that vllm class is so messy :PSadge: I need to clean that up

DangFutures

yeah or vllm needs to clean it up sad boy

DangFutures

my model lies

Attachment

DangFutures

also you should know i'm not getting any lies when using the vllm langchain wrapper

Logan M

Are you doing any other prompt setup for langchain?

Logan M

But also, if it works better, definitely use it in llama-index lol

Logan M

their LLM code isn't doing anything differently

Logan M

Seems pretty equivalent in terms of invoking the LLM

Attachment

DangFutures

not really, same setup as llama except the wrapper

DangFutures

oh interesting

DangFutures

i didn't set temperature using llama, but for langchain it's set at .3
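A temperature difference like this can plausibly change how often a model drifts from the context. As a rough illustration (a generic sketch of temperature scaling, not code from either wrapper, and the thread does not say what default the llama-index class uses), dividing the logits by a lower temperature concentrates probability on the top token:

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Turn logits into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the max before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]
low = softmax_with_temperature(logits, 0.3)   # sharper distribution
high = softmax_with_temperature(logits, 1.0)  # flatter distribution
```

Setting the same temperature on both sides is the fair way to compare the two wrappers.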

DangFutures

also is there a way to test an llm's output like the one in the finetune retrieval test

Logan M

Like, test the accuracy of the output? We have some eval stuff, but it mostly relies on using another llm (like gpt-4) to act as a judge for various aspects
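The "another llm as judge" idea can be sketched without any library: build a grading prompt, send it to a judge model, and parse a score out of the reply. Everything here is hypothetical for illustration (`JUDGE_TEMPLATE`, `judge_answer`, and the `judge_llm` callable, which stands in for whatever function sends a prompt to a model like gpt-4 and returns its text); it is not llama-index's eval API.

```python
import re
from typing import Callable, Optional

JUDGE_TEMPLATE = (
    "You are grading an answer.\n"
    "Question: {query}\n"
    "Context: {context}\n"
    "Answer: {answer}\n"
    "Is the answer supported by the context? Reply with 'Score: N' "
    "where N is an integer from 1 (unsupported) to 5 (fully supported)."
)


def judge_answer(
    query: str,
    context: str,
    answer: str,
    judge_llm: Callable[[str], str],
) -> Optional[int]:
    """Ask the judge model for a 1-5 score; return None if the reply has no score."""
    prompt = JUDGE_TEMPLATE.format(query=query, context=context, answer=answer)
    reply = judge_llm(prompt)
    match = re.search(r"Score:\s*([1-5])\b", reply)
    return int(match.group(1)) if match else None
```

Running this over a fixed set of question/context pairs gives a rough faithfulness number you can compare across models or wrappers.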
