Hybrid search is a good approach (i.e. combining vector retrieval with keyword search), followed by re-ranking the results. I actually just merged a PR that added BM25; there's a notebook here with a custom hybrid retriever:
https://gpt-index.readthedocs.io/en/stable/examples/retrievers/bm25_retriever.html#advanced-hybrid-retriever-re-ranking

Node size tends to depend on the embedding model being used. OpenAI's embeddings have 1536 dimensions. I find chunks smaller than 512-ish may not be great? Your mileage may vary
Doesn't matter if some nodes are larger/smaller imo
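
If it helps to see the hybrid idea without opening the notebook, here's a minimal sketch in plain Python. It is not the LlamaIndex API or the notebook's code: every function name here (`bm25_scores`, `hybrid_search`, etc.) is made up for illustration, the vectors are hand-written stand-ins for real embeddings, and the merge step uses reciprocal rank fusion as a simple substitute for a model-based re-ranker.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real setup would use something better.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every doc against the query with the Okapi BM25 formula."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n
    # Document frequency: in how many docs each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def ranking(scores):
    """Doc indices sorted best-first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several best-first rankings; each doc earns 1 / (k + rank)."""
    fused = Counter()
    for one_ranking in rankings:
        for position, doc_id in enumerate(one_ranking):
            fused[doc_id] += 1.0 / (k + position + 1)
    return sorted(fused, key=lambda doc_id: fused[doc_id], reverse=True)


def hybrid_search(query, query_vec, docs, doc_vecs, top_k=2):
    keyword_rank = ranking(bm25_scores(query, docs))
    vector_rank = ranking([cosine(query_vec, v) for v in doc_vecs])
    return reciprocal_rank_fusion([keyword_rank, vector_rank])[:top_k]


# Toy data: the vectors are invented, not real embeddings.
docs = [
    "bm25 is a keyword ranking function",
    "embeddings capture semantic similarity",
    "hybrid search combines keyword and vector retrieval",
]
doc_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.6, 0.6, 0.5]]

for doc_id in hybrid_search("keyword and vector search", [0.4, 0.6, 0.7], docs, doc_vecs):
    print(doc_id, docs[doc_id])
```

Rank fusion only looks at positions, so the BM25 and cosine scores never need to be put on the same scale. If you want a proper re-ranking step, you'd run the fused candidates through a re-ranker model afterwards.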