this is what i'm using:
# imports for the legacy (pre-0.10) llama_index API used here
from llama_index import ServiceContext, StorageContext, VectorStoreIndex
from llama_index.embeddings import HuggingFaceEmbedding
from llama_index.llms import OpenAI
from llama_index.node_parser import SentenceWindowNodeParser, SimpleNodeParser
from llama_index.vector_stores import WeaviateVectorStore

# one node per sentence, with 25 sentences of context on each side in metadata
node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=25,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)
simple_node_parser = SimpleNodeParser.from_defaults()
llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)
embed_model = HuggingFaceEmbedding(
    model_name="sentence-transformers/all-mpnet-base-v2", max_length=512
)
ctx = ServiceContext.from_defaults(
    llm=llm,
    embed_model=embed_model,
)
nodes = node_parser.get_nodes_from_documents(all_docs)
vector_store = WeaviateVectorStore(weaviate_client=client, index_name="SentenceWindow1")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
vector_index = VectorStoreIndex(nodes, storage_context=storage_context, service_context=ctx)
that is a huge window size
but other than that it looks fine at first glance. Let me try the original notebook with weaviate
true, but playing around is the only thing that works well for us. a smaller window size gives truncated content, which unfortunately creates some unwanted hallucinations.
besides that, I'm also using top_k=10, because with a smaller value it mainly retrieves content from a single document, and while that's sometimes good, for our use case it's better to get more docs together
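to make concrete what window_size and the metadata keys above are doing, here's a tiny pure-Python sketch of the sentence-window idea (this is not llama_index code, just an illustration of what SentenceWindowNodeParser stores per node and what the metadata-replacement step swaps in at query time; larger window_size means proportionally more text sent to the LLM per retrieved hit):

```python
def build_window_nodes(sentences, window_size):
    """Mimics the sentence-window parser: one node per sentence, with
    window_size sentences of surrounding context kept as metadata."""
    nodes = []
    for i, sent in enumerate(sentences):
        lo = max(0, i - window_size)
        hi = min(len(sentences), i + window_size + 1)
        nodes.append({
            "original_text": sent,          # what gets embedded/retrieved
            "window": " ".join(sentences[lo:hi]),  # what the LLM sees
        })
    return nodes

def replace_with_window(retrieved_nodes):
    """Mimics metadata replacement at query time: the retrieved sentence
    is swapped for the wider window stored in its metadata."""
    return [n["window"] for n in retrieved_nodes]

sentences = [f"Sentence {i}." for i in range(10)]
nodes = build_window_nodes(sentences, window_size=2)
# the node for sentence 5 carries sentences 3..7 as its window
print(nodes[5]["window"])  # Sentence 3. Sentence 4. Sentence 5. Sentence 6. Sentence 7.
```

with window_size=25 and top_k=10, the same mechanics hand the LLM up to 10 windows of ~51 sentences each, which is where the latency/cost concern comes from.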
Are you not concerned about latency or costs? With top_k=10 and window_size=25, every query must be making several LLM calls. Or maybe something about your data means it's not hitting this issue
While I wait for my test to run
in your testing with weaviate
do the source nodes make sense?
you mean the retrieved nodes?
if you mean the retrieved nodes: to give you an example, if I ask a query about the city "budapest", the retrieved nodes have absolutely nothing to do with it.
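one cheap sanity check while investigating (independent of weaviate and the embedder) is term overlap between the query and the retrieved texts: zero overlap on an obvious keyword like "budapest" across every hit usually points at querying the wrong index/class or stale vectors rather than a bad model. a minimal sketch, with hypothetical stand-in node texts:

```python
def keyword_overlap(query, node_texts):
    """For each retrieved text, return which query terms literally appear in it."""
    terms = set(query.lower().split())
    return [terms & set(text.lower().split()) for text in node_texts]

# hypothetical stand-ins for whatever the retriever actually returned
retrieved = [
    "budapest is the capital of hungary.",
    "the danube flows through vienna as well.",
]
overlaps = keyword_overlap("hotels in budapest", retrieved)
print(overlaps)  # first hit overlaps on {'budapest'}, second on nothing
```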
if it's supposed to work as-is, then I'll investigate further and see what I find
right now the latency / cost is not a concern - we want to demo it internally and pass all these non-believers' judgement
hmm I've been trying to test this, struggling to even get weaviate to work
constant connection errors
got it working, kind of. weaviate kind of sucks when you are uploading a ton of data at once lol
but retrieval seemed to work fine for me
although tbh mpnet is a terrible model -- I would probably use BAAI/bge-base-en-v1.5
gotcha, thanks a ton, will give it a try
ok so did some testing with the bge one, it's way way way worse than mpnet. apparently for our documentation mpnet seems to work magic. I still haven't been able to get weaviate + sentence window nodes working. I guess I'll try another vector db, I'm not currently bound to any of them. as always, thanks for the quick help