
Updated last year

Hey Logan M, got a weird situation

Hey @Logan M, I've hit a weird situation when switching from the default in-memory VectorStoreIndex to a vector DB (tried Chroma and FAISS so far). While writing the embeddings to the vector store, after about 1,000 embeddings have been calculated, I get this:

Generating embeddings: 2%
1020/44049 [00:14<07:35, 94.40it/s]
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-15-b41c66c4891b> in <cell line: 3>()
1 vector_store = FaissVectorStore(faiss_index=faiss_index)
2 storage_context = StorageContext.from_defaults(vector_store=vector_store)
----> 3 index = VectorStoreIndex(nodes, storage_context=storage_context, service_context=ctx, show_progress=True)

14 frames
/usr/local/lib/python3.10/dist-packages/transformers/models/mpnet/modeling_mpnet.py in compute_position_bias(self, x, position_ids, num_buckets)
376
377 rp_bucket = self.relative_position_bucket(relative_position, num_buckets=num_buckets)
--> 378 rp_bucket = rp_bucket.to(x.device)
379 values = self.relative_attention_bias(rp_bucket)
380 values = values.permute([2, 0, 1]).unsqueeze(0)

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
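Following the hint in the error message: because CUDA kernels run asynchronously, the frame shown above may not be the op that actually failed. A minimal sketch, assuming a fresh Colab runtime (restart first, since the variable has to be in place before CUDA is initialised), is to set `CUDA_LAUNCH_BLOCKING` at the very top of the notebook:

```python
import os

# Set this before importing torch (or anything that initialises CUDA, such as
# the HuggingFace embedding model). Kernels then launch synchronously, so the
# traceback points at the operation that actually triggered the assert.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```

This slows the GPU down considerably, so it is only worth keeping while debugging.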
This is the code that I have for FAISS:

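# legacy (pre-0.10) llama_index import paths, matching the ServiceContext usage
import faiss
from llama_index import StorageContext, VectorStoreIndex
from llama_index.vector_stores import FaissVectorStore

d = 768  # must match the embedding size of all-mpnet-base-v2
faiss_index = faiss.IndexFlatL2(d)
vector_store = FaissVectorStore(faiss_index=faiss_index)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex(nodes, storage_context=storage_context, service_context=ctx, show_progress=True)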
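Since `IndexFlatL2(d)` fixes the vector width up front, a cheap sanity check before embedding ~44k nodes is to confirm that `d` matches what the model actually returns. A minimal sketch; the helper name `check_embedding_dim` is hypothetical, and the stand-in vector replaces a real call such as `embed_model.get_text_embedding("hello")`:

```python
from typing import Sequence

# Assumption: this equals the `d` passed to faiss.IndexFlatL2(d) above;
# all-mpnet-base-v2 produces 768-dimensional embeddings.
FAISS_DIM = 768


def check_embedding_dim(embedding: Sequence[float], expected: int = FAISS_DIM) -> int:
    """Raise if an embedding's length differs from the FAISS index dimension."""
    dim = len(embedding)
    if dim != expected:
        raise ValueError(
            f"embedding has {dim} dims but the FAISS index expects {expected}"
        )
    return dim


# Stand-in vector; in the notebook this would come from the embedding model.
print(check_embedding_dim([0.0] * 768))  # 768
```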
The nodes are extracted this way:

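from llama_index import ServiceContext
from llama_index.embeddings import HuggingFaceEmbedding
from llama_index.llms import OpenAI
from llama_index.node_parser import SentenceWindowNodeParser, SimpleNodeParser

node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=25,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)
simple_node_parser = SimpleNodeParser.from_defaults()  # defined but not used below

llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)
embed_model = HuggingFaceEmbedding(
    model_name="sentence-transformers/all-mpnet-base-v2"
)
ctx = ServiceContext.from_defaults(
    llm=llm,
    embed_model=embed_model,
)

# all_docs: the list of loaded Documents (loading code not shown)
nodes = node_parser.get_nodes_from_documents(all_docs)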
I'm running all of this in Google Colab (free tier) with a T4 GPU.
It fails at exactly the same number of embeddings on both the CPU and GPU runtimes.
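Failing at the same count on both runtimes suggests a particular input rather than the hardware. One way to check is to scan the node texts for any that exceed the model's token budget. A minimal sketch, assuming you pass in the node texts (something like `[n.get_content(metadata_mode=MetadataMode.EMBED) for n in nodes]`) and a token counter built from the model's HuggingFace tokenizer; the whitespace counter below is a hypothetical stand-in so the example runs on its own:

```python
from typing import Callable, List, Sequence, Tuple


def find_long_texts(
    texts: Sequence[str],
    count_tokens: Callable[[str], int],
    limit: int,
) -> List[Tuple[int, int]]:
    """Return (index, token_count) for every text whose count exceeds `limit`."""
    flagged = []
    for i, text in enumerate(texts):
        n = count_tokens(text)
        if n > limit:
            flagged.append((i, n))
    return flagged


def whitespace_count(text: str) -> int:
    # Hypothetical stand-in: real subword tokenizers usually produce MORE
    # tokens than whitespace splitting, so this undercounts.
    return len(text.split())


sample = ["short sentence", "word " * 600, "another short one"]
print(find_long_texts(sample, whitespace_count, limit=512))  # [(1, 600)]
```

With a real tokenizer the counter could be something like `lambda t: len(tokenizer(t)["input_ids"])`, which also counts the special tokens the model will see.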
For mpnet, set the max length manually:
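from llama_index.embeddings import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(
    model_name="sentence-transformers/all-mpnet-base-v2",
    max_length=512,  # cap the number of tokens mpnet sees per input
)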
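Why 512 rather than mpnet's `max_position_embeddings` of 514? A plausible explanation, assuming (not confirmed in this thread) that `HuggingFaceEmbedding` falls back to the config's `max_position_embeddings` when `max_length` is not set: MPNet, like RoBERTa, numbers real tokens starting at `padding_idx + 1`, so a 514-token input would need position 515 in a 514-row table. On GPU that kind of out-of-range lookup can surface as a device-side assert instead of a clean `IndexError`. The arithmetic, as a sketch:

```python
def usable_seq_len(max_position_embeddings: int, padding_idx: int) -> int:
    """Longest input (special tokens included) whose position ids fit the table.

    Real tokens get positions padding_idx + 1 .. padding_idx + L, and every
    position must be < max_position_embeddings.
    """
    return max_position_embeddings - (padding_idx + 1)


# Values from the mpnet-base config: max_position_embeddings=514, pad_token_id=1.
print(usable_seq_len(514, 1))  # 512
```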
oh 😮
mpnet is the only model I've noticed where this is needed, lol. Very strange.
Fantastic, I'll give it a try and let you know. As always, thank you for the quick answer :)
Yup, works like a charm. Thanks @Logan M!