Currently I write the code like this because I also want to filter specific data in the .csv file, but I am not sure whether it is practical?

pdf_nodes = parser.get_nodes_from_documents(list_of_documents)
index = VectorStoreIndex(list_of_documents)
@Leonardo Oliva This is my full code, but I am not sure whether your method can be applied to it?

df_Perf = pd.read_csv('/content/gdrive/My Drive/llama_index/VW_T_LEAVE_DETAIL.csv')
list_of_documents = []
for i in range(len(df_Perf)):
    list_of_documents.append(Document().encode)
pdf_nodes = parser.get_nodes_from_documents(list_of_documents)
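To make it clearer what I am trying to do, here is a rough, runnable sketch of the whole pipeline (my own attempt, not your exact method; turning each CSV row into one Document by joining its columns as text is only my assumption, and the imports assume the llama_index.core package layout):

import pandas as pd
from llama_index.core import Document, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

df_Perf = pd.read_csv('/content/gdrive/My Drive/llama_index/VW_T_LEAVE_DETAIL.csv')

# Assumption: one Document per CSV row, with every column written as "name: value" text
list_of_documents = [
    Document(text="\n".join(f"{col}: {row[col]}" for col in df_Perf.columns))
    for _, row in df_Perf.iterrows()
]

parser = SentenceSplitter(chunk_size=512)
pdf_nodes = parser.get_nodes_from_documents(list_of_documents)

# Build the index from the parsed nodes rather than from the raw documents
index = VectorStoreIndex(pdf_nodes)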
@Leonardo Oliva Thank you. However, the result is an incorrect answer when I ask about some information in the .csv file. Can you give me advice on what I should check?
@Leonardo Oliva How are you splitting your text? => SentenceSplitter(chunk_size=512). What's your retrieval strategy? (I don't understand this question.) Which LLM are you using? => gpt-3.5-turbo, embedding model = 'text-embedding-3-small', max_tokens=512.
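In case it helps, this is roughly how my settings are wired up and where I think the retrieval part happens (a sketch only; the Settings-based configuration, the similarity_top_k value, and the example query are my assumptions, and index is the one built in the sketch above):

from llama_index.core import Settings
from llama_index.core.node_parser import SentenceSplitter
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

Settings.llm = OpenAI(model="gpt-3.5-turbo", max_tokens=512)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
Settings.node_parser = SentenceSplitter(chunk_size=512)

# The retrieval step: fetch nodes from the index at query time,
# here with plain top-k similarity search (similarity_top_k is my guess)
query_engine = index.as_query_engine(similarity_top_k=3)  # index from the sketch above
response = query_engine.query("How many leave days did employee X take?")  # example query only
print(response)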