This community probably has the experts on this: what methods are there for information retrieval over highly noisy sources like chat channels (Slack, Discord, etc.)? Because the messages are short, lack context, and are noisy, embedding models don't seem to do well at all with these sources.
Also, most embedding models tend to score short passages as closer matches than longer passages, so retrieval returns Slack messages more often than content from other sources.
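To make the length bias concrete, here is a toy sketch. It uses plain word-count vectors and cosine similarity rather than a real embedding model (that substitution is an assumption to keep the example self-contained), and the query and texts are made up for illustration. The effect is similar in spirit: the extra words in a longer passage dilute its similarity to a short query.

```python
import math
from collections import Counter


def cosine(a: str, b: str) -> float:
    """Cosine similarity between word-count vectors of two texts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    # Counter returns 0 for missing words, so this sums only shared terms.
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical query and documents, made up for illustration only.
query = "deploy failed"
chat_message = "deploy failed again"
doc_passage = (
    "the deploy failed because the migration step timed out "
    "and the rollback job was never triggered by the pipeline"
)

short_score = cosine(query, chat_message)
long_score = cosine(query, doc_passage)

# The short chat message scores higher even though the longer passage
# is the one that actually explains what happened.
print(f"chat message: {short_score:.3f}")
print(f"doc passage:  {long_score:.3f}")
```

Real embedding models are not bag-of-words, so treat this only as an intuition for why short messages can crowd out longer, more useful passages.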
We considered generating summaries, but passing every message through an LLM is expensive.