Sounds cool, maybe with llama.cpp it could be cheaper
And integration tests help users understand what's going on and how to replicate it.
They're less tolerant to failures, and help make sure that everything is working together.
Starting with the main engines can be very interesting
llama.cpp isn't a dependency of LlamaIndexTS yet, so I'll just start with OpenAI; token usage should be minimal. Thanks!
@kkang2097 I agree 100% -- it's been something I've wanted to improve on the python side for a long time. Even things like vector stores are basically untested at the moment
The issue is that these types of tests (a) make outbound API calls and (b) get run on every PR, and I don't think we want to be calling OpenAI on every CI run on GitHub
For LLM calls, I wonder if it's possible to set up tests, run them once locally, and cache the LLM responses
Then we have realistic LLM outputs to test with, low costs, and we can refresh the cache every so often
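One possible shape for the "run once locally, cache the responses" idea: a JSON fixture keyed by prompt, checked into the repo and only refreshed deliberately. The path and helper names below are made up, just to sketch it:

```ts
// Hypothetical cache file checked into the repo; tests read from it by default
// and only rewrite it when someone re-runs against the real API.
import * as fs from "fs";

const CACHE_PATH = "tests/fixtures/llm-cache.json"; // assumed location

export function loadLLMCache(): Record<string, string> {
  // {prompt: completion} pairs recorded on a previous local run
  return fs.existsSync(CACHE_PATH)
    ? JSON.parse(fs.readFileSync(CACHE_PATH, "utf8"))
    : {};
}

export function saveLLMCache(cache: Record<string, string>): void {
  fs.writeFileSync(CACHE_PATH, JSON.stringify(cache, null, 2));
}
```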
Amazing. I was thinking about running the CI tests that call the external services only on releases, so before publishing you'd have real test results.
But your idea sounds better, because you'll probably want to test the vector databases etc. as well.
I'll come up with a prototype integration test based on your suggestions; the inaugural integration test on the TS side could be for the /Selector module (and subsequently RouterQueryEngine).
The code is all ready to go, just needs the right tests in case we refactor later.
A good starting point is making a Jest integration test and going from there. (integration tests and unit tests can be run separately in Jest)
There's a "SimpleVectorStore"/"SimpleVectorStoreIndex" on the TS side that's a bit hacky, but we can do that locally every time
Yea, vector dbs are harder in this case. We can test the base vector store easily because it's in memory. But for testing something like Weaviate or Pinecone, it's a little more complicated
This makes sense. I think ideally there'd be some way of running these tests where, if your API key is present or some flag is set, the tests create a response cache as they run; otherwise, they pull from the response cache. Or something like that anyways. The most work will be figuring out how to maintain the cache properly
If you have an approach for this on the TS side, I can copy it for Python
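A minimal sketch of that record-or-replay switch; the `LLM` shape just mirrors the snippet further down in this thread, and the env-var check is an assumption:

```ts
// Record when a key is present, replay from the cache otherwise.
interface CompletionResponse { message: { content: string } }
interface LLM { complete(prompt: string): Promise<CompletionResponse> }

const recording = Boolean(process.env.OPENAI_API_KEY); // key present => refresh the cache

export async function completeWithCache(
  llm: LLM,
  cache: Map<string, string>,
  prompt: string,
): Promise<string> {
  const hit = cache.get(prompt);
  if (hit !== undefined && !recording) return hit; // replay mode: no API call
  const fresh = await llm.complete(prompt);        // record mode: real call, then store
  cache.set(prompt, fresh.message.content);
  return fresh.message.content;
}
```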
On the non-local vector store/reader side, I think we'll be okay. A contributor can
-> create a reader/VectorStore
-> write a test initializing a dummy collection (with that library's NodeJS/TS driver)
-> do an integration test
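Rough shape of such a test; the `VectorDriver` interface and the in-memory stub below are stand-ins for whatever NodeJS/TS client the vector DB actually ships, not a real library API:

```ts
// A real integration test would swap makeInMemoryDriver() for the DB's client,
// pointed at a dummy collection.
interface VectorDriver {
  upsert(id: string, vector: number[]): Promise<void>;
  query(vector: number[], topK: number): Promise<string[]>;
}

function makeInMemoryDriver(): VectorDriver {
  const rows = new Map<string, number[]>();
  const dist = (a: number[], b: number[]) =>
    Math.sqrt(a.reduce((s, x, i) => s + (x - b[i]) ** 2, 0));
  return {
    async upsert(id, vector) { rows.set(id, vector); },
    async query(vector, topK) {
      return [...rows.entries()]
        .sort((x, y) => dist(x[1], vector) - dist(y[1], vector))
        .slice(0, topK)
        .map(([id]) => id);
    },
  };
}

describe("vector store integration (shape)", () => {
  it("returns an inserted vector as its own nearest neighbour", async () => {
    const driver = makeInMemoryDriver();
    await driver.upsert("a", [1, 0, 0]);
    await driver.upsert("b", [0, 1, 0]);
    const ids = await driver.query([1, 0, 0], 1);
    expect(ids[0]).toBe("a");
  });
});
```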
The LLM cache part would be tricky; I'll try a few different things and let you guys know. On the bright side, the TS side is minimal enough right now that we can make sweeping changes without too many people complaining
edit: This mocking stuff is a mess in Jest, re-looking at this issue
edit 2: forget mocks completely, working on cache idea
Lol yeaaa that was kind of my thought too
For caching, I had the idea of a `CacheLLM` object that either reads from a cache or creates a cache as it runs
Yeah I'm working on an implementation in Redis.
// Sketch only — assumes the LLM / ChatMessage / ChatResponse / CompletionResponse
// types exported by LlamaIndexTS and the node-redis v4 client.
import { createClient } from "redis";
import type { LLM, ChatMessage, ChatResponse, CompletionResponse } from "llamaindex";

export class RedisCacheLLM implements LLM {
  llm: LLM;
  to_cache: boolean;
  redis = createClient(); // defaults to redis://localhost:6379; call connect() before use

  constructor(llm: LLM, to_cache: boolean) {
    this.llm = llm;
    this.to_cache = to_cache;
  }

  async chat(messages: ChatMessage[]): Promise<ChatResponse> {
    // Same logic as complete(), keyed on the serialized messages (omitted here)
    return this.llm.chat(messages);
  }

  async complete(prompt: string): Promise<CompletionResponse> {
    // Check the flag
    if (this.to_cache) {
      // Try a cache lookup; if we don't have it, query the wrapped LLM and store
      const cached = await this.redis.get(prompt);
      if (cached !== null) return JSON.parse(cached) as CompletionResponse;
      const fresh = await this.llm.complete(prompt);
      await this.redis.set(prompt, JSON.stringify(fresh));
      return fresh;
    }
    // If we aren't using the cache, behave like the wrapped LLM
    return this.llm.complete(prompt);
  }
}
- Can do a similar thing for embedding models too. I realized we only need to cache `LLM.complete()` and `embedModel.getEmbedding`
- Then our Hash store would have `{TextChunk: Embedding}` and `{Prompt: Completion.message.content}` instances
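A sketch of what the embedding-side wrapper could look like; the interface and class names are made up, and `getEmbedding` just mirrors the call mentioned above:

```ts
// In-memory {TextChunk: Embedding} cache in front of a real embedding model.
interface EmbedModel {
  getEmbedding(text: string): Promise<number[]>;
}

export class CachedEmbedModel implements EmbedModel {
  private cache = new Map<string, number[]>();

  constructor(private inner: EmbedModel) {}

  async getEmbedding(text: string): Promise<number[]> {
    const hit = this.cache.get(text);
    if (hit) return hit;                               // replay a cached embedding
    const fresh = await this.inner.getEmbedding(text); // otherwise hit the real model
    this.cache.set(text, fresh);
    return fresh;
  }
}
```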
Can a Redis cache be checked into GitHub? Or is this a server we would have to host outside of llama-index?
I'll get the rest of the details together soon, but it should be local most likely. Also need to separate various tests (with some TSconfig black magic):
- unit tests: run every commit
- integration tests: not sure how often to run these, but a main Redis store should let everyone run these tests freely
For the individual contributor this shouldn't matter, since they're only running single integration tests. It's Mr. Ding's MacBook we should worry about. We can worry about scale later; integration tests are relatively small.
Scale of the Redis store: I'll profile this and get back