For the moment, when creating an index

For the moment, when I create an index with specified fields and insert documents that are parsed and indexed with the embedding model, the LLM, and llama_index, it works, but the fields other than "content" are not filled. How can I make these fields fill in automatically based on the given document?
I'm not sure what you mean here πŸ€”

The Cognitive Search vector store has a pre-set number of fields that are always populated. This can't be modified unless you change the vector store implementation

Plain Text
fields = [
    SimpleField(name=self._field_mapping["id"], type="Edm.String", key=True),
    SearchableField(
        name=self._field_mapping["chunk"],
        type="Edm.String",
        analyzer_name="en.microsoft",
    ),
    SearchField(
        name=self._field_mapping["embedding"],
        type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
        hidden=False,
        searchable=True,
        filterable=False,
        sortable=False,
        facetable=False,
        vector_search_dimensions=self.embedding_dimensionality,
        vector_search_configuration="default",
    ),
    SimpleField(name=self._field_mapping["metadata"], type="Edm.String"),
    SimpleField(
        name=self._field_mapping["doc_id"], type="Edm.String", filterable=True
    ),
]


https://github.com/run-llama/llama_index/blob/29ef306ae0536de44840ca5acfdf93d84b9a560c/llama_index/vector_stores/cogsearch.py#L134
Sorry for the misunderstanding
I mean that the vector store created has this form
so the first 4 fields are filled, but the other ones I created are not
same when I use the example from your docs
"paul_graham"
the author etc. fields are empty too
the llama-index implementation will only fill in the 4 fields I listed above. No idea where these other fields are coming from
I thought they were auto-filled by the LLM or whatever
I added them
the same way you added yours in your docs
Plain Text
# Example of a complex mapping: the metadata field 'theme' is mapped to a differently named index field 'topic' with its type explicitly set
metadata_fields = {
    "author": "author",
    "theme": ("topic", MetadataIndexFieldType.STRING),
    "director": "director",
}

# A simplified metadata specification is available if all metadata and index fields are similarly named
# metadata_fields = {"author", "theme", "director"}
but the director and author fields are not filled automatically, so how do I proceed if I want to do something similar without having to do it manually?
This is from our docs or something you've done?
So if you want them to be filled in, you need to modify the CognitiveSearchVectorStore to populate them

There is a metadata field that will contain the metadata for your document (which likely has author, theme, director?)
my documents will not contain these elements directly (like a JSON), but I was hoping the tool would find the elements specified in the metadata variable in the docs and put them into Azure CS
if it does not work like this, then do you know how to proceed?
something seems to be linked here in your doc
Yeah idk what's going on, the code for this is extremely complex for some reason LOL
it seems like it should be working
how to add this index mapping if it corresponds to what I want
no it doesn't do all the fields, idk why
only the default ones specified here
I know it doesn't, I'm saying I don't understand why not
the index mapping seems to not be used in the doc
but in the class it seems to be necessary for this task, isn't it?
Looked at the code a little closer

My best guess is the example notebook will work.

Your example is likely not working because your input documents do not have fields like name_corp in the metadata

You can confirm this by printing document.metadata before adding a document to the index
ok, but the document given in your docs also seems to not work
The author fields, etc. are null in my CS site
It will be null when you insert the pg essay nodes

But when you run this section, it should work
https://docs.llamaindex.ai/en/stable/examples/vector_stores/CognitiveSearchIndexDemo.html#filtering
since it inserts new nodes
with the proper metadata
Ok, sorry, I'm very new to Azure CS and this field system
could you explain how to proceed if I have 6 raw documents of a transcription of a phone conversation, for example
and I want to have a field for the name of the person in addition to the others, how should I proceed following what you said? (I don't need the field to be filled for each document, only for the batch)
cause here idk how to associate my fields with my documents (and their contents)
So each input document, you need to be setting the metadata with the same fields that you passed into filterable_metadata_field_keys

So from what I can see, your document metadata should look something like

Plain Text
>>> print(document.metadata)
{"name_corp": "..", "name_assistant": "..", "topic": "..", "name_ins": "..", "carplate": ".."}


How you set those values is up to you. Those above fields should be set for every input document you want to insert
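A minimal sketch of the check suggested above: before inserting, verify that every document's metadata carries the same keys that were declared as filterable metadata fields on the vector store. Plain dicts stand in for llama_index Document objects here, and the field names and helper are illustrative, not part of any library API:

```python
# Sketch: ensure every document's metadata has the keys that were
# declared as filterable metadata fields on the vector store.
# Field names below are just the example keys from this thread.

REQUIRED_KEYS = {"name_corp", "name_assistant", "topic", "name_ins", "carplate"}

def missing_metadata_keys(metadata: dict) -> set:
    """Return the declared keys that this document's metadata lacks."""
    return REQUIRED_KEYS - metadata.keys()

docs = [
    {"text": "Call transcript ...",
     "metadata": {"name_corp": "Acme", "name_assistant": "Bob",
                  "topic": "claim", "name_ins": "Alice", "carplate": "AB-123-CD"}},
    {"text": "Another transcript ...", "metadata": {"topic": "renewal"}},
]

for doc in docs:
    gaps = missing_metadata_keys(doc["metadata"])
    if gaps:
        # These index fields would stay null in Cognitive Search.
        print(f"missing: {sorted(gaps)}")
```

Running a check like this before insertion makes it obvious which index fields would end up null.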
but I don't have these values
that's why I want the llm to fill them in itself according to the content
So you need to get them. There's no automatic method built in for this

If you want the LLM to fill those in, you'll need to code that πŸ€”
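A rough sketch of what "code that" could look like: run an extraction prompt per document and merge the model's answer into its metadata before insertion. `call_llm` is a placeholder for whatever completion function you use (it is not a llama_index API), and the field list is just an example:

```python
import json

# Example metadata fields to extract; adjust to your index schema.
FIELDS = ["author", "theme", "director"]

def build_extraction_prompt(text: str) -> str:
    """Ask the model to return the wanted metadata fields as JSON."""
    return (
        "Extract the following fields from the document as a JSON object "
        f"with exactly these keys: {FIELDS}. Use null if a field is absent.\n\n"
        f"Document:\n{text}"
    )

def extract_metadata(text: str, call_llm) -> dict:
    """call_llm: placeholder for your completion function (prompt -> str)."""
    raw = call_llm(build_extraction_prompt(text))
    data = json.loads(raw)
    # Keep only the declared keys so stray model output never reaches the index.
    return {k: data.get(k) for k in FIELDS}
```

You would then set the returned dict as each document's metadata before inserting it into the vector store. In practice you'd also want to handle malformed JSON from the model.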
if it's not possible, np, it's just that it does not work as I was thinking
ok, thanks for the advice!
I will search further then and maybe code a new thing lol
also, when using the SummaryIndex
(instead of the VectorStoreIndex, for summary purposes)
When using an existing index, it shows "empty response"
whereas using a new document (for a new index / adding a new document to an existing index) it works
index1 = SummaryIndex(
    [],
    service_context=service_context,
    storage_context=storage_context,
    summary_text=summary_text,
    response_mode="tree_summarize",
)
it does not work
but like this it works, though it's bad for summaries:
index2 = VectorStoreIndex(
    [],
    service_context=service_context,
    storage_context=storage_context,
    qa_text=qa_text,
)
You need to give it documents or it won't have any data to summarize πŸ‘€
Only the vector index pulls data from a vector db
it's in your doc
I can basically retrieve an existing database from CS and query it
the proof is that it works with the VectorStoreIndex
as specified here
oh, so it will not work with the summary index?
but I really need this one, why won't it work?
Because all the nodes are stored in the vectordb. So you would need a way to get everything from the db and pass it to the summary index (which isn't built in)

The summary index works best when the nodes are in order, so even if you did pull all the data, it would likely not be in the correct order πŸ€”
so it's not possible to combine Cognitive Search (and persistent DBs in general) with the summary index, just for the goal of summarizing an index/a file?
Nope, since the storage backends are not compatible

If you need a summary, you can create a summary index on the fly and generate the summary at the same time you are inserting into your vector db
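A sketch of that suggestion: produce the summary at ingestion time, while the same in-order batch of documents is inserted into the vector store. `summarize` and `insert_into_vector_store` are placeholders for your own summarizer (e.g. a tree_summarize call) and your vector-store insert; neither is a library API:

```python
# Sketch: summarize at insert time, since the summary index can't pull
# ordered nodes back out of the vector DB later.

def ingest_batch(docs, summarize, insert_into_vector_store):
    """Insert docs into the vector DB and return a batch summary."""
    summary = summarize(docs)           # summarize while docs are still in order
    insert_into_vector_store(docs)      # then persist them for retrieval
    return summary
```

The summary returned here can then be stored wherever you like, e.g. alongside the batch in a separate "summary" index.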
so for each document sent, a summary is put in another index named "summary"?
and for a classic persistent (maybe local) DB, will it also not work?