Find answers from the community

CHY4E
I have two html files: one is "Christmas Party 2023" and one is "Christmas Party 2019".
My prompt template includes this string:
"Given the context information and not prior knowledge, " "answer the query. The current date is 30.10.2023, please select only relevant data when creating the answer.\n" "Query: {query_str}\n" "Answer: "
and it correctly identifies that one of them is outdated and selects only the date provided in the 2023 version.
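For reference, the template is wired into the query engine roughly like this (just a sketch against the legacy llama_index API, assuming it is passed as text_qa_template; the context block at the top is the library's usual placeholder):
Plain Text
from llama_index.prompts import PromptTemplate

qa_template = PromptTemplate(
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the query. The current date is 30.10.2023, please select only "
    "relevant data when creating the answer.\n"
    "Query: {query_str}\n"
    "Answer: "
)

# the template is handed to the query engine built from the index
query_engine = index.as_query_engine(text_qa_template=qa_template)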
However, it mixes the activities from 2019 with those from 2023.
the html files look like this:
Plain Text
<h1 id="title-text" class="with-breadcrumbs">
                                                <a href="/display/IN/Weihnachtsfeier+2023">Weihnachtsfeier 2023</a>
                                    </h1>
Erstellt von Peter Lustig, zuletzt geändert von Peterson Findus am Okt 30, 2023
<p><strong>Die Weihnachtsfeier findet am Freitag, den 15.12.2023 statt.
<description of activities here>

the context the llm receives looks like this:
Plain Text
Weihnachtsfeier 2023 Erstellt von Peter Lustig, zuletzt geändert von Peterson Findus am Okt 30, 2023

Die Weihnachtsfeier findet am Freitag, den 15.12.2023 statt.
<description of activities here>

The only problem I can see (from the logs) is that the llm receives both contexts right after one another, without any separator.

how could I tackle this challenge?
6 comments
I just can't reliably get the right context for the llm; this is my current code:
Plain Text
# llm is created elsewhere (llama2 via Ollama); this uses the legacy (pre-0.10) llama_index API
from langchain.embeddings import HuggingFaceBgeEmbeddings
from llama_index import (ServiceContext, SimpleDirectoryReader, VectorStoreIndex,
                         download_loader, set_global_service_context)

embed_model = HuggingFaceBgeEmbeddings(model_name="sentence-transformers/gtr-t5-xxl")
service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)
set_global_service_context(service_context)

# parse the exported Confluence HTML with UnstructuredReader
UnstructuredReader = download_loader('UnstructuredReader')
dir_reader = SimpleDirectoryReader('./data', file_extractor={
    ".html": UnstructuredReader(),
})
documents = dir_reader.load_data()
index = VectorStoreIndex.from_documents(documents, show_progress=True)
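
Querying the index then looks roughly like this (sketch only; the German query is just an example against my data, and top_k/refine are the settings I mention further down):
Plain Text
query_engine = index.as_query_engine(similarity_top_k=3, response_mode="refine")
response = query_engine.query("Wann findet die Weihnachtsfeier statt?")
print(response)

# inspect which chunks were actually retrieved as context
for source in response.source_nodes:
    print(source.score, source.node.get_content()[:200])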

Sometimes the context is good, sometimes it's completely off-topic.
My entire dataset is HTML exported from a company Confluence.
I went through https://huggingface.co/spaces/mteb/leaderboard and tried most of the top multilingual models; some are better, some are worse, but they all fail to find the relevant context in 70% of cases.
what else can I try to improve accuracy?

edit:
just found a benchmark specifically for German, will test the top model there:
https://github.com/ClimSocAna/tecb-de
but the question still stands, are there any ways to improve?
9 comments
CHY4E

Llama2

I'm using LlamaIndex to answer questions (with llama2 70b) with data from a Confluence. I first tested pdf exports and saw "okay" results. Trying to improve this, I switched to html pages, using UnstructuredReader as parser/extractor.
However, now my responses are really bad; questions that were easy at first now get garbage answers or "I don't know".
How could I fix this?
5 comments
CHY4E

Indexing

What's the best format for indexing?
I have html files and I convert them to markdown before indexing, is that a good idea?
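By "convert to markdown" I mean something along these lines (sketch using html2text as an example converter; the file name is just illustrative):
Plain Text
import html2text

converter = html2text.HTML2Text()
converter.ignore_links = True    # drop Confluence navigation links
converter.ignore_images = True

# example file name from the Confluence export
with open("Weihnachtsfeier_2023.html", encoding="utf-8") as f:
    markdown_text = converter.handle(f.read())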
1 comment
Right now I'm using the default configuration with VectorIndexRetriever, with the index being on the filesystem.
Would it bring notable performance improvements to use something dedicated like Chroma or Pinecone?
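"On the filesystem" currently just means the default persist/load flow, roughly like this (legacy llama_index API sketch; persist_dir is an example path):
Plain Text
from llama_index import StorageContext, load_index_from_storage

# persist the freshly built index to disk
index.storage_context.persist(persist_dir="./storage")

# later: reload it instead of re-indexing
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)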
1 comment
How could I benchmark multiple embedding models on my data?
It seems like the embedding model is the most important part. If the context isn't precise, the llm has to go through more data, which increases the time needed. And if the correct context isn't found, there obviously is no answer.
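What I have in mind is a rough retrieval hit-rate check along these lines (sketch only; the eval_set entries and model names are hypothetical examples, and llm is just the model from my setup, it isn't actually called for retrieval-only evaluation):
Plain Text
from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex

# hand-labelled queries and the file that should answer them (hypothetical examples)
eval_set = [
    ("Wann findet die Weihnachtsfeier statt?", "weihnachtsfeier-2023.html"),
    # ...
]

for model_name in ["intfloat/multilingual-e5-large", "intfloat/multilingual-e5-base"]:
    service_context = ServiceContext.from_defaults(llm=llm, embed_model=f"local:{model_name}")
    documents = SimpleDirectoryReader("./data").load_data()
    index = VectorStoreIndex.from_documents(documents, service_context=service_context)
    retriever = index.as_retriever(similarity_top_k=3)

    hits = 0
    for query, expected_file in eval_set:
        nodes = retriever.retrieve(query)
        # count it as a hit if the expected source file shows up in the top 3
        if any(n.node.metadata.get("file_name") == expected_file for n in nodes):
            hits += 1
    print(model_name, hits / len(eval_set))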
13 comments
Hi, I'm using a fine-tuned version of llama2 with 13b parameters; the llm is run by Ollama.
For response mode I'm using "refine", as the responses are much better than with "compact".
My top_k is 3.
Responses take about 30 seconds, and most of that time is spent in the llm.
Is there any way to improve that speed without impacting the quality of the responses? I know it has to make many requests because of refine, but it just provides the best answers (from my testing).
I have two RTX 4090s, but I think Ollama already uses both.
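For reference, this is roughly how that is wired up (sketch; the Ollama model tag stands in for my fine-tuned 13b, and index is the vector index from my earlier post):
Plain Text
from llama_index.llms import Ollama

llm = Ollama(model="llama2:13b")  # placeholder tag for the fine-tuned model
query_engine = index.as_query_engine(response_mode="refine", similarity_top_k=3)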
4 comments
I'm using llama2 70b as llm and "local:BAAI/bge-large-en-v1.5" as embedding model; however, all of my content is in German.
I'm not sure the embedding model is really the best choice then, as it mentions "en", but I can't find anything that supports German or anything in that direction.
Also, llama2 only responds in English, even though it can respond in German when asked to, but somehow doesn't when it's run through LlamaIndex.

Any advice on improving the whole pipeline when using exclusively German data to index and query?
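The two knobs I'm asking about would look roughly like this (sketch only; the multilingual embedding model is just an example alternative I haven't validated, and the template simply asks for German answers):
Plain Text
from llama_index import ServiceContext, VectorStoreIndex
from llama_index.prompts import PromptTemplate

service_context = ServiceContext.from_defaults(
    llm=llm,  # llama2 70b, as before
    embed_model="local:intfloat/multilingual-e5-large",  # example multilingual model
)

# "Answer the following question only in German, using only the context."
qa_template = PromptTemplate(
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Beantworte die folgende Frage nur auf Deutsch und nur anhand des Kontexts.\n"
    "Frage: {query_str}\n"
    "Antwort: "
)

# the index has to be rebuilt with this service_context for the new embeddings to take effect
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
query_engine = index.as_query_engine(text_qa_template=qa_template)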
3 comments