@kapa.ai in ElasticsearchStore#async_add, why do we call async_bulk twice?

```python
async with self.client as client:
    await async_bulk(client, requests, chunk_size=self.batch_size, refresh=True)
    try:
        success, failed = await async_bulk(
            client, requests, stats_only=True, refresh=True
        )
        logger.debug(
            f"Added {success} and failed to add {failed} texts to index"
        )

        logger.debug(f"added texts {ids} to index")
        return return_ids
    except BulkIndexError as e:
        logger.error(f"Error adding texts: {e}")
        firstError = e.errors[0].get("index", {}).get("error", {})
        logger.error(f"First error reason: {firstError.get('reason')}")
        raise
```
Please stand by @bcwong as I search the knowledge base. I normally take about 20 seconds or so. :wave:
----------------------
The async_bulk function is called twice in the async_add method of the ElasticsearchStore class in LlamaIndex. The first call performs the bulk indexing operation with the refresh=True option, so the indexed documents are immediately visible to search.

The second call passes stats_only=True. Note, however, that in the elasticsearch-py helpers stats_only=True does not turn the call into a dry run: the bulk operation is still executed, and the flag only changes the return value from a list of per-item results to a simple (success, failed) tuple of counts. As written, this code therefore sends the same requests to Elasticsearch twice.

The apparent intent of the second call is to obtain those counts for logging: how many texts were successfully added and how many failed. If a BulkIndexError occurs during the bulk operation, it is caught and logged, and the reason for the first failed item is extracted and logged for debugging purposes.
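To make the stats_only semantics concrete, here is a minimal, self-contained sketch. The fake_async_bulk function below is a hypothetical stand-in for elasticsearch-py's async_bulk helper (no real Elasticsearch cluster is involved); it illustrates that with stats_only=True the actions are still applied, and only the shape of the return value changes.

```python
import asyncio

async def fake_async_bulk(client, actions, stats_only=False, **kwargs):
    """Hypothetical stand-in for elasticsearch.helpers.async_bulk.

    "Indexes" each action into a plain dict so we can observe that the
    work happens regardless of stats_only; only the return value differs.
    """
    success, errors = 0, []
    for action in actions:
        client[action["_id"]] = action  # the indexing happens either way
        success += 1
    if stats_only:
        return success, len(errors)  # (success_count, failed_count)
    return success, errors           # (success_count, list_of_error_items)

async def main():
    client = {}  # toy "index": _id -> document
    requests = [{"_id": "1", "text": "a"}, {"_id": "2", "text": "b"}]

    # Mirrors the pattern in async_add: the same requests are sent twice.
    await fake_async_bulk(client, requests)
    success, failed = await fake_async_bulk(client, requests, stats_only=True)
    return success, failed, len(client)

success, failed, indexed = asyncio.run(main())
print(success, failed, indexed)  # 2 0 2 — both calls performed the work
```

With the real helper, a single call with stats_only=True would return the same counts while sending the requests only once.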

Here is the relevant code snippet from the knowledge source:

```python
async with self.client as client:
    await async_bulk(client, requests, chunk_size=self.batch_size, refresh=True)
    try:
        success, failed = await async_bulk(
            client, requests, stats_only=True, refresh=True
        )
        logger.debug(
            f"Added {success} and failed to add {failed} texts to index"
        )
        logger.debug(f"added texts {ids} to index")
        return return_ids
    except BulkIndexError as e:
        logger.error(f"Error adding texts: {e}")
        firstError = e.errors[0].get("index", {}).get("error", {})
        logger.error(f"First error reason: {firstError.get('reason')}")
        raise
```
```
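As a side note on the except branch: elasticsearch-py's BulkIndexError carries a list of per-item error dicts in e.errors. The payload below is a fabricated example of that shape, showing how the firstError lookup in the snippet extracts the failure reason for logging.

```python
# Fabricated example of a bulk error item, matching the general shape
# elasticsearch-py produces for a failed "index" action.
errors = [
    {
        "index": {
            "_id": "1",
            "status": 400,
            "error": {
                "type": "mapper_parsing_exception",
                "reason": "failed to parse field [embedding]",
            },
        }
    }
]

# Same lookup as in the snippet above: grab the first item's error details.
first_error = errors[0].get("index", {}).get("error", {})
print(first_error.get("reason"))  # -> failed to parse field [embedding]
```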
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
Sorry @bcwong, I seem to be having a bit of trouble. Don't worry though, I've pinged my support and they will try to fix me up! :technologist:

Perhaps try again in 15mins?
looks like a typo/bug to me