Updated last year

If you have research papers, what's the best way to extract and chunk the data? I want the LLM to reference the journal article when providing answers.

4 comments

Teemu

I usually add metadata and increase similarity_top_k, but keep most other parameters, such as chunk size, at their defaults. For completions I use CitationQueryEngine, which provides in-text citations that map back to the retrieved source nodes.
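A rough sketch of that setup, in case it helps. The import paths assume llama-index 0.10 or later (older releases imported from `llama_index` directly), and running the engine needs an LLM and embedding backend configured. The helper names `paper_metadata`, `build_citation_engine`, and `print_sources`, the metadata fields, and `top_k=5` are illustrative choices, not anything from the library or this thread.

```python
def paper_metadata(title, journal, year, doi):
    """Build the metadata dict attached to every chunk of one paper."""
    return {"title": title, "journal": journal, "year": year, "doi": doi}


def build_citation_engine(papers, top_k=5):
    """Build a citation-aware query engine.

    papers: list of (full_text, metadata_dict) pairs.
    Requires llama-index and a configured LLM/embedding backend.
    """
    # Imported lazily so paper_metadata() works without llama-index installed.
    # Assumption: llama-index >= 0.10 import layout.
    from llama_index.core import Document, VectorStoreIndex
    from llama_index.core.query_engine import CitationQueryEngine

    docs = [Document(text=text, metadata=meta) for text, meta in papers]
    # Chunking is left at the library defaults, as described above.
    index = VectorStoreIndex.from_documents(docs)
    # similarity_top_k is raised so more candidate passages reach the LLM.
    return CitationQueryEngine.from_args(index, similarity_top_k=top_k)


def print_sources(response):
    """List the retrieved source nodes behind the numbered in-text citations."""
    for i, src in enumerate(response.source_nodes, start=1):
        meta = src.node.metadata
        print(f"[{i}] {meta.get('title')} ({meta.get('journal')}, {meta.get('year')})")
```

Usage would look something like `engine = build_citation_engine([(text, paper_metadata(...))])`, then `response = engine.query("...")` followed by `print_sources(response)` to see which paper each citation points to.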

valu

Do you have any repos?

valu

I'm just at the article retrieval stage, wondering whether to extract as Markdown, as JSON with metadata, or as something else.
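For what it's worth, here is a minimal standard-library sketch of the "JSON with metadata" option: split the extracted text on blank lines, pack paragraphs into chunks, and store each chunk next to a copy of the paper's metadata. The function names, the record fields (`chunk_id`, `text`, `metadata`), and the `max_chars=1000` default are all assumptions made for illustration.

```python
import json
import re


def chunk_paper(text, metadata, max_chars=1000):
    """Pack blank-line-separated paragraphs into chunks of at most max_chars.

    A single paragraph longer than max_chars becomes its own chunk.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            # Adding this paragraph would overflow, so close the current chunk.
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    # Every chunk carries its own copy of the paper-level metadata.
    return [
        {"chunk_id": i, "text": chunk, "metadata": dict(metadata)}
        for i, chunk in enumerate(chunks)
    ]


def to_jsonl(records):
    """Serialize chunk records as JSON Lines, one record per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)
```

Keeping the metadata on each record means whatever index is built later can cite the journal article directly from a retrieved chunk.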

Teemu

I don't think I have any public ones that show that. Have you tried the quickstart guide? I'd start with that: https://gpt-index.readthedocs.io/en/stable/getting_started/starter_example.html
