
Updated 2 months ago

Text classification?

Hi!

I have a list of topics, e.g: [expenses, profit, risks] etc.

I have a big pdf (around 500 pages).

I want to classify each page in that pdf; e.g.:

page-1: [expenses, risks]
page-2: [profit]
page-3: []

etc.

Is it possible to achieve this with Llamaindex?
7 comments
@Logan M is there anything in Llamaindex to help me with this?
Eh, unless you want the LLM to classify each page (slow af, but it could work)
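
A minimal sketch of the "let the LLM classify each page" idea, not a built-in Llamaindex feature. It assumes the pages are already extracted as strings and that `complete` is any function that takes a prompt and returns the model's reply as text; `build_prompt`, `parse_labels`, and `classify_pages` are hypothetical names made up for this example.

```python
import json
from typing import Callable, Dict, List

TOPICS = ["expenses", "profit", "risks"]


def build_prompt(page_text: str, topics: List[str]) -> str:
    """Ask for a JSON list drawn only from the allowed topics."""
    return (
        "Classify the page below. Reply with a JSON list containing zero or "
        f"more of these topics and nothing else: {json.dumps(topics)}\n\n"
        f"Page:\n{page_text}"
    )


def parse_labels(reply: str, topics: List[str]) -> List[str]:
    """Keep only known topics; treat an unparseable reply as no labels."""
    try:
        parsed = json.loads(reply)
    except json.JSONDecodeError:
        return []
    if not isinstance(parsed, list):
        return []
    return [t for t in topics if t in parsed]


def classify_pages(
    pages: List[str],
    complete: Callable[[str], str],  # assumption: prompt in, reply text out
    topics: List[str] = TOPICS,
) -> Dict[str, List[str]]:
    """One LLM call per page, so 500 pages means 500 calls."""
    return {
        f"page-{i}": parse_labels(complete(build_prompt(text, topics)), topics)
        for i, text in enumerate(pages, start=1)
    }
```

With a Llamaindex LLM object, `complete` could probably be something like `lambda p: llm.complete(p).text`, but check that against the version you have installed. Filtering the reply against the topic list keeps the model from inventing labels.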

I might also look into using something like GLiNER to do zero-shot classification
Thanks! Never heard of gliner before. Will give it a shot.

So you define entities yourself, right?
Yea, basically you define the entities/labels, and it pulls them out

Actually, on second thought, GLiNER is more for token classification than text classification (I think)
So it might not work, unless they have an example of text classification
@Logan M True. I was thinking of mapping each page to labels if gliner finds any label in a page 🙂
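
The page-to-labels mapping described above could look roughly like this. It is a sketch under assumptions: `extract` stands in for the entity extractor and is expected to return dicts with a `"label"` key and an optional `"score"` key, which is the shape GLiNER's `predict_entities` returns as far as I know. `labels_per_page` and the `min_score` cutoff of 0.5 are made up for illustration.

```python
from typing import Callable, Dict, List


def labels_per_page(
    pages: List[str],
    extract: Callable[[str, List[str]], List[dict]],  # assumption: GLiNER-like extractor
    labels: List[str],
    min_score: float = 0.5,  # arbitrary cutoff, tune on real pages
) -> Dict[str, List[str]]:
    """Tag a page with a label if at least one confident span carries it."""
    result: Dict[str, List[str]] = {}
    for i, text in enumerate(pages, start=1):
        found = {
            entity["label"]
            for entity in extract(text, labels)
            if entity.get("score", 1.0) >= min_score
        }
        # Preserve the order of the original label list.
        result[f"page-{i}"] = [label for label in labels if label in found]
    return result
```

With GLiNER itself, `extract` would presumably wrap `model.predict_entities(text, labels)`. A full PDF page may be longer than the model's input limit, so pages might need to be split into chunks first.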
Not sure if some smart summarize + vectorize step and then a similarity_search with k_means=3 can pull this off 🙈
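
One way to read the similarity idea, as a rough sketch: embed each topic and each page (or page summary), then keep the closest topics. This reads `k_means=3` as "top 3 matches", which is an assumption. The vectors are assumed to come from whatever embedding model you use; `top_topics` and the `min_sim` threshold of 0.3 are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_topics(
    page_vec: Vector,
    topic_vecs: Dict[str, Vector],
    k: int = 3,
    min_sim: float = 0.3,  # arbitrary; without a cutoff no page could get []
) -> List[str]:
    """Return up to k topics whose similarity to the page clears min_sim."""
    scored = sorted(
        ((cosine(page_vec, vec), topic) for topic, vec in topic_vecs.items()),
        reverse=True,
    )
    return [topic for sim, topic in scored[:k] if sim >= min_sim]
```

The threshold matters more than `k` here: plain top-k always returns something, so a page like `page-3: []` only comes out empty if weak matches are dropped.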