Hi, I am new to LlamaIndex and trying to build a basic document retrieval system. I am using Azure OpenAI embeddings. I have two problems and I am not sure how to resolve them:
  1. I have a long document with multiple paragraphs. I want to treat each paragraph as a separate document. How do I do this?
  2. I have many text files. I can create a Document object for each, but how do I generate embeddings? Every time I use more than 1 document, I get an error: “Too many inputs…..” It seems like a limitation of Azure embeddings. How do I resolve this?
Thanks in advance!!
Hi,

  1. If you want to create a document for each paragraph separately, you can either use the TextNode class directly or a wrapper over the TextNode class in the form of Document (see the paragraph-splitting sketch after this list)
Plain Text
from llama_index import VectorStoreIndex
from llama_index.schema import TextNode

para_1 = TextNode(text="Your text here")

# To insert this into an index
index = VectorStoreIndex([para_1])


  2. How are you generating embeddings here?
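As a follow-up to point 1, a minimal sketch of per-paragraph nodes, assuming paragraphs are separated by blank lines (the file path is just an example):
Plain Text
from llama_index import VectorStoreIndex
from llama_index.schema import TextNode

# Hypothetical input file; any long plain-text document works
long_text = open("./data/my_long_document.txt").read()

# Split on blank lines and build one TextNode per paragraph
nodes = [
    TextNode(text=para.strip())
    for para in long_text.split("\n\n")
    if para.strip()
]

index = VectorStoreIndex(nodes)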
I am using AzureOpenAIEmbedding. Now whenever I call VectorStoreIndex with multiple documents, it gives me an error.
Can you share your code for the same, if possible?
Plain Text
from llama_index import (
    ServiceContext,
    SimpleDirectoryReader,
    VectorStoreIndex,
    set_global_service_context,
)
from llama_index.embeddings import AzureOpenAIEmbedding

embed_model = AzureOpenAIEmbedding(
    model="text-embedding-ada-002",
    deployment_name="textembeddingada002",
    api_key=api_key,
    azure_endpoint=azure_endpoint,
    api_version=api_version,
)

# llm is configured elsewhere
service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model=embed_model,
)

set_global_service_context(service_context)

documents = SimpleDirectoryReader(
    input_files=["./data/paul_graham_essay.txt"]
).load_data()

index = VectorStoreIndex.from_documents(documents)
On the last line, I get

"Too many inputs. The max number of inputs is 1. We hope to increase the number of inputs per request soon...."
The code looks fine. Can you check what you get in the documents variable?
Plain Text
[Document(id_='84d29648-c597-407c-968a-443924ebf956', embedding=None, metadata={'file_path': 'data/paul_graham_essay.txt', 'file_name': 'paul_graham_essay.txt', 'file_type': 'text/plain', 'file_size': 75042, 'creation_date': '2023-12-11', 'last_modified_date': '2023-12-11', 'last_accessed_date': '2023-12-11'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={}, hash='5dfe27179663d7ae4c02bbb134d50a62143a55545a90f194a20454deb5df5901', text='\n\nWhat I Worked On\n\nFebruary 2021\n\nBefore college the two main things I worked on, outside of school, ...................Jessica Livingston, Robert Morris, and Harj Taggar for reading drafts of this.\n', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n')]
I omitted the text here. It is the sample text from the LlamaIndex website, with multiple paragraphs.
It is a single Document element here. ^
When I keep just a single paragraph here, it works fine.
Do I have to define chunk_size or a similar value somewhere?
No, the default is fine. Can you share your full error? The code looks alright to me.

Also, what is the version that you are trying with?
This is classic Azure: set embed_batch_size=1 in the embedding model.
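For reference, a minimal sketch of that fix, reusing the deployment settings from the code above:
Plain Text
from llama_index.embeddings import AzureOpenAIEmbedding

# embed_batch_size=1 sends one text per request, matching the
# deployment's "max number of inputs is 1" limit
embed_model = AzureOpenAIEmbedding(
    model="text-embedding-ada-002",
    deployment_name="textembeddingada002",
    api_key=api_key,
    azure_endpoint=azure_endpoint,
    api_version=api_version,
    embed_batch_size=1,
)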
Oh, is that so? I will try this out, thanks!
Anyway, is there any documentation available for the AzureOpenAIEmbedding class?
Not really documentation, but an example!
Yes, I saw this. But I am looking for documentation with all the parameters and explanations.
But tbh the API docs suck, reading the source code is more informative lol
Is it the case that when a document/text is large enough, the AzureOpenAIEmbedding object will break the text into multiple batches and create embeddings then?
Context: while using embed_batch_size=1 with a small document, it works fine. When I put in a large text document, embedding generation fails again with a "Too many requests" error.
Is creating nodes from the documents the right approach in this case?
Right, from_documents() will chunk documents into nodes, or you can do it manually.
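A minimal sketch of the manual route, using SentenceSplitter with example chunk settings (the sizes are illustrative, not required values):
Plain Text
from llama_index import SimpleDirectoryReader, VectorStoreIndex
from llama_index.node_parser import SentenceSplitter

documents = SimpleDirectoryReader(
    input_files=["./data/paul_graham_essay.txt"]
).load_data()

# Chunk documents into nodes manually; chunk_size is in tokens
parser = SentenceSplitter(chunk_size=512, chunk_overlap=20)
nodes = parser.get_nodes_from_documents(documents)

index = VectorStoreIndex(nodes)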