Sounds cool, maybe with llama.cpp it could be cheaper
And integration tests help users understand what's going on and how to replicate it.
They're less tolerant to failures, and help make sure that everything is working together.
Starting with the main engines can be very interesting
llama.cpp isn't a dependency of LlamaIndexTS yet, so I'll just start with OpenAI; token usage should be minimal. Thanks!
@kkang2097 I agree 100% -- it's been something I've wanted to improve on the python side for a long time. Even things like vector stores are basically untested at the moment
The issue is that these types of tests (a) make outbound API calls and (b) get run on every PR, and I don't think we want to be calling OpenAI on every CI run on GitHub
For LLM calls, I wonder if it's possible to set up tests, run them once locally, and cache the LLM responses
Then we have realistic LLM outputs to test with, low costs, and we can refresh the cache every so often
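One possible shape for the "run once locally, cache the responses" idea: a JSON fixture keyed by prompt, checked into the repo and only refreshed deliberately. The path and helper names below are made up, just to sketch it:

```ts
// Hypothetical cache file checked into the repo; tests read from it by default
// and only rewrite it when someone re-runs against the real API.
import * as fs from "fs";

const CACHE_PATH = "tests/fixtures/llm-cache.json"; // assumed location

export function loadLLMCache(): Record<string, string> {
  // {prompt: completion} pairs recorded on a previous local run
  return fs.existsSync(CACHE_PATH)
    ? JSON.parse(fs.readFileSync(CACHE_PATH, "utf8"))
    : {};
}

export function saveLLMCache(cache: Record<string, string>): void {
  fs.writeFileSync(CACHE_PATH, JSON.stringify(cache, null, 2));
}
```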
Amazing. I was thinking about running the CI tests that call the external services only on releases, so before publishing you'd have real test results.
But your idea sounds better, because you'll probably want to test the vector databases etc. as well.
I'll come up with a prototype integration test based on your suggestions; the inaugural integration test on the TS side could be for the /Selector module (and subsequently RouterQueryEngine).
The code is all ready to go, just needs the right tests in case we refactor later.
A good starting point is making a Jest integration test and going from there. (integration tests and unit tests can be run separately in Jest)
There's a "SimpleVectorStore"/"SimpleVectorStoreIndex" on the TS side that's a bit hacky, but we can do that locally every time
Yea, vector dbs are harder in this case. We can test the base vector store easily because it's in memory. But for testing something like Weaviate or Pinecone, it's a little more complicated
This makes sense. I think ideally there'd be some way of running these tests where, if your API key is present or some flag is set, the tests create a response cache as they run; otherwise, they pull from the response cache. Or something like that anyways. The most work will be figuring out how to maintain the cache properly
If you have an approach for this on the TS side, I can copy it for Python
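A minimal sketch of that record-or-replay switch; the `LLM` shape just mirrors the snippet further down in this thread, and the env-var check is an assumption:

```ts
// Record when a key is present, replay from the cache otherwise.
interface CompletionResponse { message: { content: string } }
interface LLM { complete(prompt: string): Promise<CompletionResponse> }

const recording = Boolean(process.env.OPENAI_API_KEY); // key present => refresh the cache

export async function completeWithCache(
  llm: LLM,
  cache: Map<string, string>,
  prompt: string,
): Promise<string> {
  const hit = cache.get(prompt);
  if (hit !== undefined && !recording) return hit; // replay mode: no API call
  const fresh = await llm.complete(prompt);        // record mode: real call, then store
  cache.set(prompt, fresh.message.content);
  return fresh.message.content;
}
```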
On the non-local vector store/reader side, I think we'll be okay. A contributor can
-> create a reader/VectorStore
-> write a test initializing a dummy collection (with that library's NodeJS/TS driver)
-> do an integration test
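Rough shape of such a test; the `VectorDriver` interface and the in-memory stub below are stand-ins for whatever NodeJS/TS client the vector DB actually ships, not a real library API:

```ts
// A real integration test would swap makeInMemoryDriver() for the DB's client,
// pointed at a dummy collection.
interface VectorDriver {
  upsert(id: string, vector: number[]): Promise<void>;
  query(vector: number[], topK: number): Promise<string[]>;
}

function makeInMemoryDriver(): VectorDriver {
  const rows = new Map<string, number[]>();
  const dist = (a: number[], b: number[]) =>
    Math.sqrt(a.reduce((s, x, i) => s + (x - b[i]) ** 2, 0));
  return {
    async upsert(id, vector) { rows.set(id, vector); },
    async query(vector, topK) {
      return [...rows.entries()]
        .sort((x, y) => dist(x[1], vector) - dist(y[1], vector))
        .slice(0, topK)
        .map(([id]) => id);
    },
  };
}

describe("vector store integration (shape)", () => {
  it("returns an inserted vector as its own nearest neighbour", async () => {
    const driver = makeInMemoryDriver();
    await driver.upsert("a", [1, 0, 0]);
    await driver.upsert("b", [0, 1, 0]);
    const ids = await driver.query([1, 0, 0], 1);
    expect(ids[0]).toBe("a");
  });
});
```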
The LLM cache part would be tricky; I'll try a few different things and let you guys know. On the bright side, the TS side is minimal enough right now that we can make sweeping changes without too many people complaining
edit: This mocking stuff is a mess in Jest, re-looking at this issue
edit 2: forget mocks completely, working on cache idea
Lol yeaaa that was kind of my thought too
For caching, I had the idea of a `CacheLLM` object that either reads from a cache or creates a cache as it runs
Yeah I'm working on an implementation in Redis.
// Sketch only — assumes the LLM / ChatMessage / ChatResponse / CompletionResponse
// types exported by LlamaIndexTS and the node-redis v4 client.
import { createClient } from "redis";
import type { LLM, ChatMessage, ChatResponse, CompletionResponse } from "llamaindex";

export class RedisCacheLLM implements LLM {
  llm: LLM;
  to_cache: boolean;
  redis = createClient(); // defaults to redis://localhost:6379; call connect() before use

  constructor(llm: LLM, to_cache: boolean) {
    this.llm = llm;
    this.to_cache = to_cache;
  }

  async chat(messages: ChatMessage[]): Promise<ChatResponse> {
    // Same logic as complete(), keyed on the serialized messages (omitted here)
    return this.llm.chat(messages);
  }

  async complete(prompt: string): Promise<CompletionResponse> {
    // Check the flag
    if (this.to_cache) {
      // Try a cache lookup; if we don't have it, query the wrapped LLM and store
      const cached = await this.redis.get(prompt);
      if (cached !== null) return JSON.parse(cached) as CompletionResponse;
      const fresh = await this.llm.complete(prompt);
      await this.redis.set(prompt, JSON.stringify(fresh));
      return fresh;
    }
    // If we aren't using the cache, behave like the wrapped LLM
    return this.llm.complete(prompt);
  }
}
- Can do a similar thing for embedding models too. I realized we only need to cache `LLM.complete()` and `embedModel.getEmbedding`
- Then our Hash store would have `{TextChunk: Embedding}` and `{Prompt: Completion.message.content}` instances
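A sketch of what the embedding-side wrapper could look like; the interface and class names are made up, and `getEmbedding` just mirrors the call mentioned above:

```ts
// In-memory {TextChunk: Embedding} cache in front of a real embedding model.
interface EmbedModel {
  getEmbedding(text: string): Promise<number[]>;
}

export class CachedEmbedModel implements EmbedModel {
  private cache = new Map<string, number[]>();

  constructor(private inner: EmbedModel) {}

  async getEmbedding(text: string): Promise<number[]> {
    const hit = this.cache.get(text);
    if (hit) return hit;                               // replay a cached embedding
    const fresh = await this.inner.getEmbedding(text); // otherwise hit the real model
    this.cache.set(text, fresh);
    return fresh;
  }
}
```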
Can a Redis cache be checked into GitHub? Or is this a server we would have to host outside of llama-index?
I'll get the rest of the details together soon, but it should be local most likely. Also need to separate various tests (with some TSconfig black magic):
- unit tests: run every commit
- integration tests: not sure how often to run these, but a main Redis store should let everyone run these tests freely
For the individual contributor this shouldn't matter, since they're only running single integration tests. It's Mr. Ding's MacBook we should worry about. We can worry about scale later; integration tests are relatively small.
Scale of the Redis store: I'll profile this and get back