The community member is seeking recommendations for the best way to index a website. They are using BeautifulSoup (bs4) to crawl pages and load the resulting documents into a SimpleVectorStore, with ChatGPT as the llm_predictor, but the results are suboptimal for larger websites: the answer is often not present in the context that ChatGPT receives.
The comments suggest that the community member use the latest versions of llama_index and langchain, as they include some ChatGPT-specific improvements. They also recommend increasing the similarity_top_k parameter in the query call so that more of the context retrieved from the embeddings is passed to the model.
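A minimal sketch of that suggestion, assuming the index.query API and its similarity_top_k keyword from the llama_index versions discussed in this thread; the index file name and the question are hypothetical:

```python
# Minimal sketch: raise similarity_top_k so more retrieved chunks are passed
# to ChatGPT. Assumes a GPTSimpleVectorIndex previously saved to disk; the
# file name and the question below are hypothetical.
from llama_index import GPTSimpleVectorIndex

index = GPTSimpleVectorIndex.load_from_disk("site_index.json")

# By default only the single closest chunk is retrieved; similarity_top_k=3
# loads the three closest chunks into the prompt instead.
response = index.query(
    "How do I configure the crawler?",
    similarity_top_k=3,
)
print(response)
```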
The community members discuss experimenting with smaller chunk sizes, which has been the most impactful parameter so far. They also mention trying out the knowledge graph index and the playground code, which is said to be useful for testing different combinations.
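A sketch of what a smaller chunk size looks like at index-construction time, assuming the chunk_size_limit keyword accepted by the index constructors of that era; the input folder and output file name are hypothetical:

```python
# Sketch of building the vector index with a smaller chunk size. Assumes the
# chunk_size_limit keyword from the llama_index versions in this thread.
from llama_index import GPTSimpleVectorIndex, SimpleDirectoryReader

# Any documents will do for the illustration; here they are loaded from a
# local folder (hypothetical path) rather than from a live crawl.
documents = SimpleDirectoryReader("crawled_pages").load_data()

index = GPTSimpleVectorIndex(
    documents,
    chunk_size_limit=600,  # the 600-token setting mentioned in the thread
)
index.save_to_disk("site_index_600.json")
```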
The community members caution that smaller chunk sizes can be more expensive: each embedding has a fixed-size output, so smaller chunks mean more embeddings per quantity of input data and therefore more vectors to store.
Hey! What would you recommend as the best way to index a website? I am using bs4 to crawl and format documents into a SimpleVectorStore and ChatGPT for llm_predictor, but the results are suboptimal for bigger websites -> often the answer is not found in the context that ChatGPT receives.
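For reference, a rough sketch of the kind of setup described above, not a verified recipe; the URLs, file names, and keyword arguments are assumptions based on the llama_index and langchain APIs of this era:

```python
# Rough sketch: fetch pages, strip them to text with BeautifulSoup, wrap each
# page in a Document, and build a simple vector index with ChatGPT as the LLM
# predictor. URLs and file names below are hypothetical.
import requests
from bs4 import BeautifulSoup
from langchain.chat_models import ChatOpenAI
from llama_index import Document, GPTSimpleVectorIndex, LLMPredictor

urls = [
    "https://example.com/docs/getting-started",
    "https://example.com/docs/configuration",
]

documents = []
for url in urls:
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    # get_text() flattens the page to plain text; extra_info keeps the source URL
    documents.append(Document(soup.get_text(separator="\n"), extra_info={"url": url}))

# ChatGPT (gpt-3.5-turbo) as the llm_predictor, as described in the question
llm_predictor = LLMPredictor(llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0))

index = GPTSimpleVectorIndex(documents, llm_predictor=llm_predictor)
index.save_to_disk("site_index.json")
```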
I am experimenting with something similar now. One thing I didn't expect: @dagthomas shared a post yesterday with the chunk size limit set to 600, and while I need to test a lot more, the smaller chunk size has been the most impactful parameter so far. I haven't tried the knowledge graph yet, although I am about to. The playground code looks especially useful for testing different combinations: https://github.com/jerryjliu/gpt_index/blob/main/examples/playground/PlaygroundDemo.ipynb
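For anyone who hasn't opened the linked notebook, a hedged sketch of what the playground comparison looks like; the Playground import path and the from_docs/compare helpers are assumptions based on that demo, and the folder and query are made up:

```python
# Hedged sketch of the playground comparison from the linked notebook; the
# import path and the from_docs / compare helpers are assumed from that demo.
from llama_index import SimpleDirectoryReader
from llama_index.playground import Playground

documents = SimpleDirectoryReader("crawled_pages").load_data()

# from_docs builds a default set of index types over the same documents
playground = Playground.from_docs(documents)

# compare runs the same query against every index so the answers (and costs)
# can be compared side by side
results = playground.compare("What does the pricing page say about the free tier?")
print(results)
```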
Keep in mind, a smaller chunk size can be more expensive: embeddings have a fixed-size output, so if you put 600 tokens into each embedding instead of 4k, you need more embeddings per quantity of input data and accordingly more vectors to store.
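Some back-of-the-envelope arithmetic on that point, using an assumed corpus size and the 1536-dimensional output of text-embedding-ada-002:

```python
# Back-of-the-envelope cost of smaller chunks: each embedding is a fixed-size
# vector (1536 floats for text-embedding-ada-002), so smaller chunks mean more
# vectors for the same corpus. The corpus size here is a made-up example.
corpus_tokens = 1_000_000
embedding_dim = 1536
bytes_per_float = 4

for chunk_tokens in (4000, 600):
    n_vectors = -(-corpus_tokens // chunk_tokens)  # ceiling division
    storage_mb = n_vectors * embedding_dim * bytes_per_float / 1e6
    print(f"{chunk_tokens}-token chunks -> {n_vectors} vectors, ~{storage_mb:.1f} MB")
```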
@adrianlee2220 I have been eager to try out the playground myself but haven't had a chance yet, and it may be a few days before I can. If you get the chance to share your experience or any pointers, I would be grateful!