Research Buddy || Streamlit LLM Hackathon

Hey everyone, I built an AI Research Buddy using Nougat, a much-needed piece of tech to help me stay updated with the fast pace of AI research these days.
AI Research Buddy is your discussion buddy for research papers on arXiv or the ACL Anthology, powered by Meta's recently released Nougat transformer.
Video:- https://www.youtube.com/watch?v=c3Ae7DBuE-U
Github:- https://github.com/bp-high/research_buddy
Streamlit Link:- https://gpt-research-buddy.streamlit.app/
HF Spaces Link:- https://huggingface.co/spaces/bpHigh/AI-Research-Buddy
Open to any feedback. This was built for the Streamlit LLM Hackathon (https://streamlit.io/community/llm-hackathon-2023)
Tech Stack:-
a) Platforms:- ClarifAI, Modal
b) Frameworks:- Streamlit, LlamaIndex, LangChain (for integration with ClarifAI)
c) Models:- GPT-3.5-turbo, OpenAI Embeddings, Nougat Transformer (https://facebookresearch.github.io/nougat/), and the ClarifAI multilingual moderation model

Just another day building apps leveraging LlamaIndex 🙂. CC:- would love your feedback on this.
3 comments
amazing! did you want feedback or want me to help promote on socials?
Well, kind of both, but right now feedback would help a lot. I'm currently writing a blog on how I built it, so I might need social promotion after that's done.

Currently it has decent performance on questions, but I don't think I'm getting the most out of the data Nougat extracts when it comes to the Insights and Q&A features. Both are built on the vector index, and the retriever currently fails to pick up the table chunks: they are in LaTeX format, so they aren't selected within the respective top-k values.

For Insights, I query for the paper's key contributions and key results with a larger top-k (earlier I asked for the result tables directly, but since the tables are in LaTeX those chunks were never selected within the top-k). For Q&A, the user enters a query and it is processed by a chat engine built on top of the query engine, with a lower top-k.

Before I move to other embedding models or fine-tuning, I'd like to understand simpler ways to make the retriever select and send the LaTeX table chunks for the Insights query while keeping the same top-k values.
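One simple option, sketched below without any library dependencies: run a cheap structural check for LaTeX table markers and merge a couple of guaranteed table chunks into whatever the embedding retriever returns, leaving the top-k budget for the embedding results untouched. The names (`is_table_chunk`, `ensure_table_chunks`) and the marker list are illustrative assumptions, not LlamaIndex APIs; in LlamaIndex this logic could live in a custom retriever or node postprocessor.

```python
# Sketch: force LaTeX table chunks into the retrieved context without
# raising top-k. All names here are hypothetical, not LlamaIndex APIs.

TABLE_MARKERS = ("\\begin{table}", "\\begin{tabular}", "\\hline")

def is_table_chunk(text: str) -> bool:
    """Cheap structural check for LaTeX table content."""
    return any(marker in text for marker in TABLE_MARKERS)

def ensure_table_chunks(retrieved: list[str], all_chunks: list[str],
                        max_tables: int = 2) -> list[str]:
    """Append up to max_tables table chunks the embedding retriever
    missed, keeping the original top-k results first."""
    merged = list(retrieved)
    for chunk in all_chunks:
        if len(merged) >= len(retrieved) + max_tables:
            break  # stay within the extra-table budget
        if is_table_chunk(chunk) and chunk not in merged:
            merged.append(chunk)
    return merged
```

The trade-off is that tables are always sent for the Insights query regardless of relevance, which costs a little context length but avoids touching embeddings or top-k at all; tagging table nodes with metadata at indexing time and filtering on it would be the more idiomatic LlamaIndex route.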
Hey @jerryjliu0, we won the Streamlit LLM Hackathon! Really thankful to the developers at LlamaIndex for building and maintaining this library.
Was a little busy last week; I'll soon post my blog on how I built the app and how the different libraries and frameworks helped me build faster.

https://www.linkedin.com/posts/bhavishpahwa_llm-hackathon-streamlit-activity-7115524495343255552-z82h?utm_source=share&utm_medium=member_desktop