Hey Logan M got some weird situation

Hey @Logan M - got some weird situation when switching from the typical VectorStoreIndex to a Vector DB (tried with Chroma & FAISS for the moment). When writing the embeddings to the vectorstore, after about 1000 embeddings being calculated, I get this:

Generating embeddings: 2%
1020/44049 [00:14<07:35, 94.40it/s]
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-15-b41c66c4891b> in <cell line: 3>()
1 vector_store = FaissVectorStore(faiss_index=faiss_index)
2 storage_context = StorageContext.from_defaults(vector_store=vector_store)
----> 3 index = VectorStoreIndex(nodes, storage_context=storage_context, service_context=ctx, show_progress=True)

14 frames
/usr/local/lib/python3.10/dist-packages/transformers/models/mpnet/modeling_mpnet.py in compute_position_bias(self, x, position_ids, num_buckets)
376
377 rp_bucket = self.relative_position_bucket(relative_position, num_buckets=num_buckets)
--> 378 rp_bucket = rp_bucket.to(x.device)
379 values = self.relative_attention_bias(rp_bucket)
380 values = values.permute([2, 0, 1]).unsqueeze(0)

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
This is the code that I have for FAISS:

d = 768
faiss_index = faiss.IndexFlatL2(d)
vector_store = FaissVectorStore(faiss_index=faiss_index)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex(nodes, storage_context=storage_context, service_context=ctx, show_progress=True)
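For comparison, the Chroma attempt mentioned in the first post would have looked roughly like this (a minimal sketch assuming the stock ChromaVectorStore integration and an in-memory chromadb client; the collection name is just illustrative):

import chromadb
from llama_index.vector_stores import ChromaVectorStore

# in-memory client; a persistent client could be used instead to keep the collection on disk
chroma_client = chromadb.Client()
chroma_collection = chroma_client.create_collection("quickstart")

vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex(nodes, storage_context=storage_context, service_context=ctx, show_progress=True)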
The nodes are extracted this way:

node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=25,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)
simple_node_parser = SimpleNodeParser.from_defaults()

llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)
embed_model = HuggingFaceEmbedding(
    model_name="sentence-transformers/all-mpnet-base-v2"
)
ctx = ServiceContext.from_defaults(
    llm=llm,
    embed_model=embed_model,
)

nodes = node_parser.get_nodes_from_documents(all_docs)
I am running all of this in Google Colab (free version) with T4 GPU.
It happened with both CPU and GPU runtimes at exactly the same number
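The traceback itself points at the usual first debugging step: with asynchronous CUDA launches the reported frame can be misleading, so forcing synchronous launches (or running on CPU, as was already tried here) usually surfaces the real failing op, which in cases like this tends to be an index-out-of-range in the embedding lookups. A minimal sketch for a Colab cell (the variable has to be set before anything touches the GPU):

import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # must be set before the first CUDA call in the process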
for mpnet, manually set the max length
embed_model = HuggingFaceEmbedding(
    model_name="sentence-transformers/all-mpnet-base-v2",
    max_length=512
)
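A quick way to confirm the truncation takes effect before re-running the full index build (a hypothetical check; the filler string just needs to be well over 512 tokens):

long_text = "lorem ipsum dolor sit amet " * 500  # deliberately longer than 512 tokens
vec = embed_model.get_text_embedding(long_text)
print(len(vec))  # 768-dimensional vector instead of a device-side assert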
oh 😮
mpnet is the only model I've noticed where this is needed lol very strange
fantastic, i'll give it a try and let you know. as always, thank you for the quick answer : )
yup, works like a charm, thanks @Logan M