Updated 5 months ago

Text classification?

At a glance

The community member has a list of topics (expenses, profit, risks) and a large 500-page PDF. They want to classify each page of the PDF based on these topics, and are wondering if LlamaIndex can help with this task. The comments suggest that using an LLM to classify each page may be slow, and the community members discuss alternatives like GLiNER for zero-shot classification. However, there does not appear to be a clear answer on the best approach to solve this problem.

pikachu8887867

Hi!

I have a list of topics, e.g: [expenses, profit, risks] etc.

I have a big pdf (around 500 pages).

I want to classify each page in that pdf; e.g.:

page-1: [expenses, risks]
page-2: [profit]
page-3: []

etc.

Is it possible to achieve this with LlamaIndex?
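Whatever classifier ends up doing the work, the per-page output format in the question can be produced with a simple loop. This is a minimal sketch, assuming `pypdf` for text extraction; `load_pages`, `classify_pages`, and the keyword-matching `keyword_classify` are hypothetical helpers, and the keyword classifier is only a stand-in for a real model.

```python
from typing import Callable, Dict, List

TOPICS = ["expenses", "profit", "risks"]


def load_pages(pdf_path: str) -> List[str]:
    """Return the extracted text of each page (assumes pypdf is installed)."""
    from pypdf import PdfReader  # lazy import: only needed for real PDFs

    reader = PdfReader(pdf_path)
    # extract_text() may return an empty string for scanned/image-only pages
    return [page.extract_text() or "" for page in reader.pages]


def classify_pages(
    pages: List[str], classify: Callable[[str], List[str]]
) -> Dict[str, List[str]]:
    """Map 'page-N' (1-based) to whatever topic list `classify` returns for that page."""
    return {f"page-{i}": classify(text) for i, text in enumerate(pages, start=1)}


def keyword_classify(text: str) -> List[str]:
    """Hypothetical stand-in classifier: a topic matches if its name occurs in the text."""
    lowered = text.lower()
    return [topic for topic in TOPICS if topic in lowered]


if __name__ == "__main__":
    demo_pages = ["Expenses rose while risks grew.", "Profit was flat.", "Table of contents"]
    print(classify_pages(demo_pages, keyword_classify))
    # {'page-1': ['expenses', 'risks'], 'page-2': ['profit'], 'page-3': []}
```

On a real file this would be something like `classify_pages(load_pages("report.pdf"), keyword_classify)` (the path is hypothetical); swapping in an LLM, GLiNER, or embedding-based classifier only changes the `classify` argument.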

7 comments

pikachu8887867

@Logan M is there anything in LlamaIndex to help me with this?

Logan M

Eh, not really, unless you want the LLM to classify each page (very slow), but it could work.

I might also look into using something like GLiNER to do zero-shot classification.
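The first option, having an LLM classify each page, boils down to one prompt per page plus a parser for the reply. This is a minimal sketch under stated assumptions: `complete` is any function that sends a prompt and returns the model's text (with a LlamaIndex LLM it could be something like `lambda p: llm.complete(p).text`, but treat that wiring as an assumption), and the prompt wording and helper names are hypothetical.

```python
import json
from typing import Callable, List

TOPICS = ["expenses", "profit", "risks"]


def build_prompt(page_text: str, topics: List[str]) -> str:
    """Ask for a JSON list so the reply is easy to parse (prompt wording is illustrative)."""
    return (
        "Classify the page below into zero or more of these topics: "
        + ", ".join(topics)
        + '.\nAnswer with a JSON list of topic names only, e.g. ["profit"], '
        + "or [] if none apply.\n\nPAGE:\n"
        + page_text
    )


def parse_labels(reply: str, topics: List[str]) -> List[str]:
    """Extract the JSON list from the reply and keep only known topics, in `topics` order."""
    start, end = reply.find("["), reply.rfind("]")
    if start == -1 or end < start:
        return []
    try:
        raw = json.loads(reply[start : end + 1])
    except json.JSONDecodeError:
        return []
    # Normalize case/whitespace and drop anything that is not one of our topics
    found = {str(item).strip().lower() for item in raw}
    return [topic for topic in topics if topic in found]


def llm_classify(page_text: str, topics: List[str], complete: Callable[[str], str]) -> List[str]:
    """One LLM call per page, which is why this gets slow for a 500-page PDF."""
    return parse_labels(complete(build_prompt(page_text, topics)), topics)


if __name__ == "__main__":
    def fake_llm(prompt: str) -> str:  # stand-in for a real model call
        return 'Sure: ["Risks", "expenses", "weather"]'

    print(llm_classify("Costs rose; litigation pending.", TOPICS, fake_llm))
    # ['expenses', 'risks']
```

Filtering the reply against the known topic list guards against the model inventing labels (like `"weather"` above); the cost is still one call per page.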

pikachu8887867

Thanks! Never heard of GLiNER before. Will give it a shot.

So you define entities yourself, right?

Logan M

Yeah, basically you define the entities/labels, and it pulls them out of the text.

Actually, on second thought, GLiNER is more for token classification than text classification (I think).

Logan M

So it might not work, unless they have an example of text classification.
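For page-level (text) classification specifically, a different zero-shot technique than GLiNER is NLI-based classification, available in Hugging Face Transformers as the `"zero-shot-classification"` pipeline. This is a sketch under assumptions: `transformers` is installed, `facebook/bart-large-mnli` is just one example checkpoint, the 0.5 threshold is arbitrary, and long pages may need to be split into chunks first. The helper names are hypothetical.

```python
from typing import Dict, List


def labels_from_zero_shot(output: Dict, threshold: float = 0.5) -> List[str]:
    """Keep every candidate label whose independent score clears `threshold`."""
    return [
        label
        for label, score in zip(output["labels"], output["scores"])
        if score >= threshold
    ]


def classify_with_nli(
    pages: List[str],
    topics: List[str],
    threshold: float = 0.5,
    model_name: str = "facebook/bart-large-mnli",
) -> Dict[str, List[str]]:
    """Zero-shot text classification via an NLI model (assumes transformers is installed)."""
    from transformers import pipeline  # lazy import: downloads the model on first use

    classifier = pipeline("zero-shot-classification", model=model_name)
    results = {}
    for i, text in enumerate(pages, start=1):
        # multi_label=True scores each topic on its own, so a page can get several topics or none
        output = classifier(text, candidate_labels=topics, multi_label=True)
        results[f"page-{i}"] = labels_from_zero_shot(output, threshold)
    return results


if __name__ == "__main__":
    # Shape of one pipeline result (scores made up for illustration)
    sample = {"sequence": "...", "labels": ["risks", "expenses", "profit"], "scores": [0.91, 0.77, 0.12]}
    print(labels_from_zero_shot(sample))  # ['risks', 'expenses']
```

Unlike an LLM prompt, this judges the whole page against each label directly, which fits the "page-1: [expenses, risks]" format; it is still one model pass per page and label set.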

pikachu8887867

@Logan M True. I was thinking of mapping each page to a label if GLiNER finds that label anywhere in the page 🙂
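The mapping idea above (a page gets a topic if GLiNER extracts at least one span with that label) could look roughly like this. It is a sketch assuming the `gliner` package's `GLiNER.from_pretrained` / `predict_entities` interface and its list-of-dicts output with `"label"` and `"score"` keys; `urchade/gliner_base` is one example checkpoint and the helper names are hypothetical. Note the caveat raised above: GLiNER tags spans that look like a "risks" entity, which is not the same as judging whether the page is about risks.

```python
from typing import Dict, Iterable, List


def labels_from_entities(
    entities: Iterable[Dict], topics: List[str], threshold: float = 0.5
) -> List[str]:
    """A topic applies to the page if at least one entity with that label scores >= threshold."""
    hits = {e["label"] for e in entities if e.get("score", 0.0) >= threshold}
    return [topic for topic in topics if topic in hits]


def classify_with_gliner(
    pages: List[str],
    topics: List[str],
    threshold: float = 0.5,
    model_name: str = "urchade/gliner_base",
) -> Dict[str, List[str]]:
    """Run GLiNER on each page and collapse its entity hits into page labels."""
    from gliner import GLiNER  # lazy import; assumes the gliner package is installed

    model = GLiNER.from_pretrained(model_name)
    results = {}
    for i, text in enumerate(pages, start=1):
        entities = model.predict_entities(text, topics, threshold=threshold)
        results[f"page-{i}"] = labels_from_entities(entities, topics, threshold)
    return results


if __name__ == "__main__":
    # Entities in the dict shape GLiNER returns (values made up for illustration)
    sample = [
        {"text": "operating costs", "label": "expenses", "score": 0.82},
        {"text": "pending lawsuit", "label": "risks", "score": 0.31},
    ]
    print(labels_from_entities(sample, ["expenses", "profit", "risks"]))  # ['expenses']
```

Since GLiNER models may be limited in input length, a full page might need to be chunked and the per-chunk labels merged.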

AshwinMS

Not sure, but maybe some smart summarize + vectorize and a similarity_search with k_means=3 could pull this off 🙈
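One reading of the similarity-search idea: embed each page (or its summary) and each topic, then keep the topics whose cosine similarity to the page is high enough. This is a hedged sketch: the `k_means=3` parameter is ambiguous, so here it is treated as "keep at most the top 3 topics"; the 0.3 threshold is arbitrary; and `toy_embed` is a hypothetical word-count embedding standing in for a real embedding model (with LlamaIndex, something like an embedding model's `get_text_embedding`, treated here as an assumption).

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_labels(
    page_text: str,
    topic_vecs: Dict[str, Vector],
    embed: Callable[[str], Vector],
    threshold: float = 0.3,
    top_k: int = 3,
) -> List[str]:
    """Rank topics by similarity to the page and keep at most top_k above threshold."""
    page_vec = embed(page_text)
    scored = sorted(
        ((cosine(page_vec, vec), topic) for topic, vec in topic_vecs.items()),
        reverse=True,
    )
    return [topic for score, topic in scored[:top_k] if score >= threshold]


# Hypothetical toy embedding: word counts over a tiny fixed vocabulary.
VOCAB = ["expenses", "costs", "profit", "margin", "risks", "lawsuit"]


def toy_embed(text: str) -> List[float]:
    words = text.lower().replace(".", " ").replace(",", " ").split()
    return [float(words.count(w)) for w in VOCAB]


if __name__ == "__main__":
    # Each topic is embedded once, from a short hypothetical description
    topic_vecs = {
        "expenses": toy_embed("expenses costs"),
        "profit": toy_embed("profit margin"),
        "risks": toy_embed("risks lawsuit"),
    }
    print(similarity_labels("Costs rose and a lawsuit is pending.", topic_vecs, toy_embed))
```

Embedding the topics once and reusing `topic_vecs` keeps the per-page cost to a single embedding call; summarizing pages first, as suggested, would only change the `page_text` passed in.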
