
Updated 2 months ago


Hi, I'm seeing a very strange problem when parsing a text of about 60,000 characters: this code (the from_documents call) never finishes. I can't figure out why; it worked very well until today, even with much bigger documents:
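```python
# Import paths assume a pre-0.10 llama_index release; text, data_source_id,
# chunk_size, project_id and get_qrant_client() come from the surrounding app.
from llama_index import Document, ServiceContext, StorageContext, VectorStoreIndex
from llama_index.vector_stores import QdrantVectorStore

document = Document(text=text)
document.doc_id = data_source_id
service_context = ServiceContext.from_defaults(chunk_size=chunk_size)  # e.g. 1024
vector_store = QdrantVectorStore(client=get_qrant_client(), collection_name=project_id)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    [document], storage_context=storage_context, service_context=service_context
)
```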

If I step into from_documents, this is the call that hangs:
```python
# Excerpt from inside from_documents (indentation trimmed)
return cls(
    nodes=nodes,
    storage_context=storage_context,
    service_context=service_context,
    **kwargs,
)
```
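When a call like this never returns, one way to see exactly where it is stuck without attaching a debugger is Python's standard faulthandler module: dump_traceback_later() prints the stack of every thread once a timeout expires. A minimal sketch; the helper name watch_for_hang and its 60-second default are hypothetical choices, not part of llama_index.

```python
import faulthandler
import sys


def watch_for_hang(fn, *args, timeout=60.0, out=None, **kwargs):
    """Call fn(*args, **kwargs). While it is still running, dump the stack of
    every thread to `out` (stderr by default) every `timeout` seconds."""
    if out is None:
        out = sys.stderr  # must be a real file with a fileno()
    faulthandler.dump_traceback_later(timeout, repeat=True, file=out)
    try:
        return fn(*args, **kwargs)
    finally:
        # Disarm the watchdog as soon as fn returns or raises.
        faulthandler.cancel_dump_traceback_later()
```

For example, watch_for_hang(VectorStoreIndex.from_documents, [document], storage_context=storage_context, service_context=service_context, timeout=120) would print the stacks every two minutes while indexing is stuck, showing the innermost frame rather than just the outer cls(...) call. Because timeout and out are keyword-only parameters of the helper, they are not forwarded to fn.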
18 comments
How long did you wait? Just curious.
10 to 20 minutes or more. For texts of this length it usually takes no more than a minute, often much less.
I see you looked at where it's hanging. Do you know which file that line of code is from?
That's definitely a long time for it to be stuck.
It's from base.py (the BaseIndex class), line 97.
Actually, I suspect the text itself, but I can't figure out what's wrong with it. Maybe it contains some disallowed characters or something like that?
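One way to test the "disallowed characters" theory is to scan the text for characters that rarely belong in scraped English prose: control characters, invisible format characters such as zero-width spaces, private-use or unassigned code points, and the U+FFFD replacement character left behind by decoding errors. A minimal sketch using only the standard unicodedata module; which categories count as suspicious is an assumption, and finding such characters would not by itself prove they cause the hang.

```python
import unicodedata
from collections import Counter

# General categories that rarely belong in scraped English prose (assumption):
# Cc control, Cf format (e.g. zero-width space), Co private use,
# Cn unassigned, Cs lone surrogate.
SUSPICIOUS_CATEGORIES = {"Cc", "Cf", "Co", "Cn", "Cs"}
ORDINARY_WHITESPACE = {"\n", "\r", "\t"}  # control characters that are fine
REPLACEMENT_CHAR = "\ufffd"  # left behind by failed decoding


def find_suspicious_chars(text):
    """Count each suspicious character in `text`."""
    counts = Counter()
    for ch in text:
        if ch in ORDINARY_WHITESPACE:
            continue
        if ch == REPLACEMENT_CHAR or unicodedata.category(ch) in SUSPICIOUS_CATEGORIES:
            counts[ch] += 1
    return counts


def describe_suspicious_chars(text):
    """One human-readable line per suspicious character, most frequent first."""
    lines = []
    for ch, n in find_suspicious_chars(text).most_common():
        name = unicodedata.name(ch, "<no name>")  # control chars have no name
        lines.append(f"U+{ord(ch):04X} {name}: {n}x, first at index {text.index(ch)}")
    return lines
```

Running for line in describe_suspicious_chars(text): print(line) on the problematic document lists each odd character with its code point and first position, which makes it easy to inspect the surrounding text.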
Hmmm maybe? Is it in a different language or from a problematic data source?
It's in English, scraped from a single website; nothing unusual at first glance.
As a sanity check, does this code run for you?

```python
from llama_index.embeddings import OpenAIEmbedding

embed_model = OpenAIEmbedding()  # assumes OPENAI_API_KEY is set in the environment

embeddings = embed_model.get_text_embedding("It is raining cats and dogs here!")
print(len(embeddings))
```
Interesting, let me try. Do you want me to test with that text or with this one?
Hmm, I get the exception "cannot import name 'OpenAIEmbedding' from 'llama_index.embeddings'".
I guess it should be llama_index.embeddings.openai 😉
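Since the module that exports OpenAIEmbedding differs between llama_index releases, code meant to run on more than one release can try the candidate module paths in order, mimicking a series of "from X import name" attempts. A minimal sketch using the standard importlib module; the helper name import_first is hypothetical.

```python
import importlib


def import_first(name, *module_paths):
    """Return attribute `name` from the first module in `module_paths` that
    can be imported and defines it."""
    errors = []
    for path in module_paths:
        try:
            module = importlib.import_module(path)
        except ImportError as exc:  # also covers ModuleNotFoundError
            errors.append(f"{path}: {exc}")
            continue
        if hasattr(module, name):
            return getattr(module, name)
        errors.append(f"{path}: has no attribute {name!r}")
    raise ImportError(f"cannot import {name!r} from any of: " + "; ".join(errors))
```

With the two paths from this thread, that would be OpenAIEmbedding = import_first("OpenAIEmbedding", "llama_index.embeddings.openai", "llama_index.embeddings"). One caveat of this design: an ImportError raised inside a module (for example, a missing dependency) is also treated as "try the next path", and the final error message lists every failure.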
Okay, it worked. The output: 1536
Nice! That means at least the embeddings can be generated 😅
And it's not some other problem related to servers or keys
Which servers? It works with every other text, just not with that "problematic" one.
Keys can't be involved either.
I meant the OpenAI servers, i.e. that there's no network issue with them.
Ahhh okay... right, sorry, I didn't see your "not" 😆 Let me check the text; maybe it really does have some suspicious characters.
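Another way to narrow down the problematic region is to embed the text in pieces, logging each piece before the call, so that if one call hangs the last log line points at the offending slice. A minimal sketch; embed_fn would be something like embed_model.get_text_embedding from the sanity test above, and the fixed 2,000-character slices are an arbitrary assumption, not how llama_index itself splits documents into nodes.

```python
import time


def embed_in_slices(text, embed_fn, slice_size=2000):
    """Call embed_fn on consecutive fixed-size character slices of `text`.
    Each slice is logged *before* it is embedded, so if a call hangs, the last
    log line identifies the slice responsible. Returns (start, seconds) pairs."""
    timings = []
    for start in range(0, len(text), slice_size):
        piece = text[start:start + slice_size]
        print(f"embedding chars {start}-{start + len(piece)}", flush=True)
        t0 = time.perf_counter()
        embed_fn(piece)
        timings.append((start, time.perf_counter() - t0))
    return timings
```

If every slice finishes, max(timings, key=lambda t: t[1]) shows which region was slowest; if one never finishes, the last printed range is the part of the text to inspect.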