For the moment, when creating an index

For the moment, when I create an index with specified fields and insert documents that are parsed and indexed with the embedding model, the LLM, and llama_index, it works, but the fields other than "content" are not filled. How can I make these fields fill in automatically based on the given document?
I'm not sure what you mean here πŸ€”

The Cognitive Search vector store has a pre-set number of fields that are always populated. This can't be modified unless you change the vector store implementation

Plain Text
fields = [
    SimpleField(name=self._field_mapping["id"], type="Edm.String", key=True),
    SearchableField(
        name=self._field_mapping["chunk"],
        type="Edm.String",
        analyzer_name="en.microsoft",
    ),
    SearchField(
        name=self._field_mapping["embedding"],
        type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
        hidden=False,
        searchable=True,
        filterable=False,
        sortable=False,
        facetable=False,
        vector_search_dimensions=self.embedding_dimensionality,
        vector_search_configuration="default",
    ),
    SimpleField(name=self._field_mapping["metadata"], type="Edm.String"),
    SimpleField(
        name=self._field_mapping["doc_id"], type="Edm.String", filterable=True
    ),
]


https://github.com/run-llama/llama_index/blob/29ef306ae0536de44840ca5acfdf93d84b9a560c/llama_index/vector_stores/cogsearch.py#L134
Sorry for the misunderstanding
I mean that the vector store created has this form
so the first 4 fields are filled, but the other ones I created are not
same when I use the example from your docs
"paul_graham"
the author etc. fields are empty too
the llama-index implementation will only fill in the 4 fields I listed above. No idea where these other fields are coming from
I thought they were auto-filled by the LLM or whatever
I added them
the same way you added yours in your docs
Plain Text
# Example of a complex mapping: the metadata field 'theme' is mapped to a differently named index field 'topic' with its type explicitly set
metadata_fields = {
    "author": "author",
    "theme": ("topic", MetadataIndexFieldType.STRING),
    "director": "director",
}

# A simplified metadata specification is available if all metadata and index fields are similarly named
# metadata_fields = {"author", "theme", "director"}
but the director and author fields are not filled automatically, so how do I proceed if I want to do something similar without having to do it manually?
This is from our docs or something you've done?
So if you want them to be filled in, you need to modify the CognitiveSearchVectorStore to populate them

There is a metadata field that will contain the metadata for your document (which likely has author, theme, director?)
my documents will not contain these elements directly (like a JSON), but I was hoping the tool would find the elements specified in the metadata variable in the docs and put them into Azure CS
if it does not work like this, then do you know how to proceed?
something seems to be linked here in your doc
Yeah idk what's going on, the code for this is extremely complex for some reason LOL
it seems like it should be working
how to add this index mapping if it corresponds to what I want
no it doesn't do all the fields, idk why
only the default ones specified here
I know it doesn't, I'm saying I don't understand why not
the index mapping seems to not be used in the doc
but in the class it seems to be necessary for this task, isn't it?
Looked at the code a little closer

My best guess is the example notebook will work.

Your example is likely not working because your input documents do not have fields like name_corp in the metadata

You can confirm this by printing document.metadata before adding a document to the index
ok, but the document given in your docs also seems to not work
The author fields, etc. are null in my CS site
It will be null when you insert the pg essay nodes

But when you run this section, it should work
https://docs.llamaindex.ai/en/stable/examples/vector_stores/CognitiveSearchIndexDemo.html#filtering
since it inserts new nodes
with the proper metadata
Ok, sorry, I'm very new to Azure CS and this field system
could you explain how to proceed if I have 6 raw documents of a transcription of a phone conversation, for example
and I want to have a field for the name of the person in addition to the others, how should I proceed following what you said? (I don't need the field to be filled for each document, only for the batch)
cause here idk how to associate my fields with my documents (and their contents)
So each input document, you need to be setting the metadata with the same fields that you passed into filterable_metadata_field_keys

So from what I can see, your document metadata should look something like

Plain Text
>>> print(document.metadata)
{"name_corp": "..", "name_assistant": "..", "topic": "..", "name_ins": "..", "carplate": ".."}


How you set those values is up to you. Those above fields should be set for every input document you want to insert
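A minimal sketch of the check suggested above: before inserting, verify that every document's metadata carries the same keys that were declared as filterable metadata fields on the vector store. Plain dicts stand in for llama_index Document objects here, and the field names and helper are illustrative, not part of any library API:

```python
# Sketch: ensure every document's metadata has the keys that were
# declared as filterable metadata fields on the vector store.
# Field names below are just the example keys from this thread.

REQUIRED_KEYS = {"name_corp", "name_assistant", "topic", "name_ins", "carplate"}

def missing_metadata_keys(metadata: dict) -> set:
    """Return the declared keys that this document's metadata lacks."""
    return REQUIRED_KEYS - metadata.keys()

docs = [
    {"text": "Call transcript ...",
     "metadata": {"name_corp": "Acme", "name_assistant": "Bob",
                  "topic": "claim", "name_ins": "Alice", "carplate": "AB-123-CD"}},
    {"text": "Another transcript ...", "metadata": {"topic": "renewal"}},
]

for doc in docs:
    gaps = missing_metadata_keys(doc["metadata"])
    if gaps:
        # These index fields would stay null in Cognitive Search.
        print(f"missing: {sorted(gaps)}")
```

Running a check like this before insertion makes it obvious which index fields would end up null.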
but I don't have these values
that's why I want the llm to fill them in itself according to the content
So you need to get them. There's no automatic method built in for this

If you want the LLM to fill those in, you'll need to code that πŸ€”
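A rough sketch of what "code that" could look like: run an extraction prompt per document and merge the model's answer into its metadata before insertion. `call_llm` is a placeholder for whatever completion function you use (it is not a llama_index API), and the field list is just an example:

```python
import json

# Example metadata fields to extract; adjust to your index schema.
FIELDS = ["author", "theme", "director"]

def build_extraction_prompt(text: str) -> str:
    """Ask the model to return the wanted metadata fields as JSON."""
    return (
        "Extract the following fields from the document as a JSON object "
        f"with exactly these keys: {FIELDS}. Use null if a field is absent.\n\n"
        f"Document:\n{text}"
    )

def extract_metadata(text: str, call_llm) -> dict:
    """call_llm: placeholder for your completion function (prompt -> str)."""
    raw = call_llm(build_extraction_prompt(text))
    data = json.loads(raw)
    # Keep only the declared keys so stray model output never reaches the index.
    return {k: data.get(k) for k in FIELDS}
```

You would then set the returned dict as each document's metadata before inserting it into the vector store. In practice you'd also want to handle malformed JSON from the model.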
if it's not possible, np, it's just that it does not work as I was thinking
ok, thanks for the advice!
I will search further then and maybe code a new thing lol
also, when using the SummaryIndex
(instead of the VectorStoreIndex, for summary purposes)
When using an existing index, it shows "empty response"
whereas using a new document (for a new index / adding a new document to an existing index) it works
index1 = SummaryIndex(
    [],
    service_context=service_context,
    storage_context=storage_context,
    summary_text=summary_text,
    response_mode="tree_summarize",
)
it does not work
but like this it works, though it's bad for summaries:
index2 = VectorStoreIndex(
    [],
    service_context=service_context,
    storage_context=storage_context,
    qa_text=qa_text,
)
You need to give it documents or it won't have any data to summarize πŸ‘€
Only the vector index pulls data from a vector db
it's in your doc
I can basically retrieve an existing database from CS and query it
the proof is that it works with the VectorStoreIndex
as specified here
oh, so it will not work with the summary index?
but I really need this one, why won't it work?
Because all the nodes are stored in the vectordb. So you would need a way to get everything from the db and pass it to the summary index (which isn't built in)

The summary index works best when the nodes are in order, so even if you did pull all the data, it would likely not be in the correct order πŸ€”
so it's not possible to combine Cognitive Search (and persistent DBs in general) with the summary index, just for the goal of summarizing an index/a file?
Nope, since the storage backends are not compatible

If you need a summary, you can create a summary index on the fly and generate the summary at the same time you are inserting into your vector db
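A sketch of that suggestion: produce the summary at ingestion time, while the same in-order batch of documents is inserted into the vector store. `summarize` and `insert_into_vector_store` are placeholders for your own summarizer (e.g. a tree_summarize call) and your vector-store insert; neither is a library API:

```python
# Sketch: summarize at insert time, since the summary index can't pull
# ordered nodes back out of the vector DB later.

def ingest_batch(docs, summarize, insert_into_vector_store):
    """Insert docs into the vector DB and return a batch summary."""
    summary = summarize(docs)           # summarize while docs are still in order
    insert_into_vector_store(docs)      # then persist them for retrieval
    return summary
```

The summary returned here can then be stored wherever you like, e.g. alongside the batch in a separate "summary" index.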
so for each document sent, a summary is put in another index named "summary"?
and for a classic persistent (maybe local) DB, will it also not work?