
Full reindexing

Proposal

When performing a full reindex, we remove all existing records and replace them with a fresh set. This is ideal when the data changes a lot, or when there’s no way to track which records have changed.

We should perform a full reindex whenever we suspect the index is out of sync with the source data, or when it is critical that the index is fully up to date.

To fully reindex our data, we need to use the replaceAllObjects method. This ensures there’s no downtime during the reindex, and our search is always available.
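As we understand Algolia's documentation, replaceAllObjects avoids downtime by copying the index settings (along with synonyms and rules) to a temporary index, saving the new records there, and then moving the temporary index over the original in a single operation. The toy model below illustrates that pattern with an in-memory Map. It is a hypothetical illustration, not the client's actual code: replaceAllInMemory is a made-up name, and it only copies settings.

```javascript
// Toy, in-memory illustration of the temporary-index-then-move pattern.
// Hypothetical helper: this is NOT the Algolia client's implementation.
function replaceAllInMemory(indices, indexName, records) {
  const tmpName = `${indexName}_tmp`;
  const live = indices.get(indexName) || { settings: {}, records: [] };

  // 1. Create a temporary index with a copy of the live index's settings.
  indices.set(tmpName, { settings: { ...live.settings }, records: [] });

  // 2. Write the fresh records to the temporary index only. Searches still
  //    hit indexName, which keeps serving the old records in the meantime.
  indices.get(tmpName).records = records.slice();

  // 3. Move the temporary index over the live one in a single step, so
  //    indexName is never empty or half-written.
  indices.set(indexName, indices.get(tmpName));
  indices.delete(tmpName);
}
```

The key design point is that readers only ever see one complete version of the index: either the old one, or the new one after the move.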

Warning: A full reindex can significantly increase our indexing operations count. It’s also slower than incremental updates, because we're reindexing every single record. This is a powerful way of keeping our data up to date, but we must be mindful of its impact on both our indexing pipeline and our operations count.
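To get a feel for the scale, here is a back-of-the-envelope estimate. It assumes, hypothetically, that each record written counts as one indexing operation, and uses made-up catalogue numbers (50,000 products, 4 languages, one run per day, 2% of products changing daily). The real figures depend on our catalogue and on Algolia's pricing rules.

```javascript
// Back-of-the-envelope estimate of indexing operations per month.
// ASSUMPTION: one operation per record written; all numbers are hypothetical.
function estimateMonthlyOperations({ recordsPerRun, languages, runsPerMonth }) {
  return recordsPerRun * languages * runsPerMonth;
}

// Full reindex: every product is rewritten in every language on each daily run.
const fullReindex = estimateMonthlyOperations({ recordsPerRun: 50000, languages: 4, runsPerMonth: 30 });

// Incremental updates: only changed products are written (assume 2% change per day).
const incremental = estimateMonthlyOperations({ recordsPerRun: 1000, languages: 4, runsPerMonth: 30 });

console.log(fullReindex, incremental); // 6000000 120000
```

Under these assumptions, a daily full reindex costs 50 times as many operations as writing only the changed records, which is why it should be scheduled deliberately rather than run on every change.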

replaceAllObjects

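```javascript
// Algolia JavaScript API client (initIndex-style API)
const algoliasearch = require('algoliasearch');

const client = algoliasearch('YourApplicationID', 'YourWriteAPIKey');
const index = client.initIndex('YourIndexName');

const objects = []; // Fetch your objects

// safe: true waits for each indexing operation to complete before resolving.
index
  .replaceAllObjects(objects, { safe: true })
  .then(({ objectIDs }) => {
    console.log(objectIDs);
  })
  .catch((error) => console.error('Full reindex failed', error));
```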

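Note: In the JavaScript client, the promise resolves to an object containing the objectIDs of the indexed records (as logged above) rather than a regular API response body.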

Architecture

Below is a proposed flow:

[Figure: Full Reindexing Flow]

Flow

  1. GitLab triggers a pipeline on a schedule. The pipeline can also be triggered manually (see Scheduled pipelines below). Each run reindexes all languages.
  2. The pipeline sends an HTTP request to a REST endpoint in the Hybris API Gateway: POST product/reindex.
  3. This in turn triggers a Step Function in the Hybris domain, which is responsible for:
       - Getting all the products from Hybris
       - Storing temporary product files in the Algolia Temp Product S3 bucket
       - Reading the products for each language from the temporary store (this will be a Map state in the Step Function)
       - Saving the complete indexes in the Algolia Product Index S3 bucket
  4. Once complete, we send a hybrisIndexComplete event to EventBridge.
  5. The event is picked up by the Product Event Bus in the Platform domain.
  6. This then emits a subsequent platformIndexComplete event.
  7. A rule on that event triggers a Step Function in the Algolia domain, which:
       - Gets the JSON from the S3 bucket for each language
       - Passes the body to a Lambda that calls the replaceAllObjects method of the JavaScript SDK
       - Sends the objects to Algolia
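The Hybris Step Function's per-language Map state needs one input item per language. Below is a minimal sketch of building those items. It assumes a hypothetical product shape ({ code, translations: { [language]: fields } }) and a hypothetical product-index/<language>.json key layout in the Algolia Product Index S3 bucket; the real Hybris export format may differ.

```javascript
// Hypothetical sketch: turn the raw product list into one Map-state item per language.
function splitProductsByLanguage(products, languages) {
  const byLanguage = Object.fromEntries(languages.map((lang) => [lang, []]));

  for (const product of products) {
    for (const lang of languages) {
      const localized = product.translations && product.translations[lang];
      if (!localized) continue; // product not available in this language

      // Algolia records need a stable objectID; reuse the product code (set last
      // so a field in the localized data cannot overwrite it).
      byLanguage[lang].push({ ...localized, objectID: product.code });
    }
  }

  // Each item says where the per-language index file goes and what it contains.
  return languages.map((lang) => ({
    language: lang,
    key: `product-index/${lang}.json`, // hypothetical S3 key layout
    records: byLanguage[lang],
  }));
}
```

Keying records by product code means the same product keeps the same objectID across reindexes, which keeps analytics and any incremental updates consistent.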
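The hand-offs through EventBridge can be pictured as follows. Source, DetailType, Detail (a JSON string) and EventBusName are the fields of an EventBridge PutEvents entry, and the rule pattern uses EventBridge's event-pattern syntax; the source name and the detail payload are hypothetical placeholders, not an agreed contract.

```javascript
// Hypothetical hybrisIndexComplete event, shaped as an EventBridge PutEvents entry.
function buildHybrisIndexCompleteEntry({ eventBusName, bucket, keys }) {
  return {
    EventBusName: eventBusName,
    Source: 'hybris.products', // hypothetical source name
    DetailType: 'hybrisIndexComplete',
    // Detail must be a JSON string; here it lists the per-language index files.
    Detail: JSON.stringify({ bucket, keys }),
  };
}

// Event pattern a rule could use to start the Algolia-domain Step Function
// when the Platform domain emits platformIndexComplete.
const platformIndexCompletePattern = {
  'detail-type': ['platformIndexComplete'],
};
```

With the AWS SDK for JavaScript v3, such an entry would be sent with new PutEventsCommand({ Entries: [entry] }) from @aws-sdk/client-eventbridge.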
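The Lambda in the Algolia domain that calls replaceAllObjects could look roughly like this. It is a sketch: the event shape ({ language, records }) and the <base>_<language> index naming are assumptions, and the index factory is injected so the handler can be unit-tested without calling Algolia.

```javascript
// Minimal sketch of the Algolia-domain Lambda. createReindexHandler, the event
// shape and the index naming scheme are hypothetical.
function createReindexHandler(initIndex, baseIndexName) {
  return async function handler(event) {
    const { language, records } = event;
    if (!language || !Array.isArray(records)) {
      throw new Error('Expected an event with a language and an array of records');
    }

    const indexName = `${baseIndexName}_${language}`;
    const index = initIndex(indexName);

    // safe: true waits for the indexing operations to finish before resolving,
    // so the Step Function only moves on once the new records are live.
    const { objectIDs } = await index.replaceAllObjects(records, { safe: true });

    return { language, indexName, count: objectIDs.length };
  };
}
```

In production, initIndex would wrap the real client, for example (name) => algoliasearch(appId, apiKey).initIndex(name). If the per-language JSON is large, it may be safer to pass only the S3 key and have the Lambda read the file itself, since Step Functions limits state payloads to 256 KB.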

Scheduled pipelines

Use scheduled pipelines to run GitLab CI/CD pipelines at regular intervals.

To trigger a pipeline schedule manually, so that it runs immediately instead of the next scheduled time:

  1. On the top bar, select Main menu > Projects and find your project.
  2. On the left sidebar, select CI/CD > Schedules.
  3. On the right of the list, for the pipeline you want to run, select Play (the play icon).
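The same manual run can be scripted through GitLab's REST API, which exposes POST /projects/:id/pipeline_schedules/:pipeline_schedule_id/play. A minimal sketch follows; the host, project path, schedule ID and token are placeholders, and fetchImpl defaults to the global fetch available in Node 18+.

```javascript
// Sketch: run a pipeline schedule immediately via GitLab's REST API.
// host, projectId, scheduleId and token are placeholders supplied by the caller.
async function playPipelineSchedule({ host, projectId, scheduleId, token, fetchImpl = fetch }) {
  // Project paths such as "group/project" must be URL-encoded in the :id segment.
  const url = `${host}/api/v4/projects/${encodeURIComponent(projectId)}` +
    `/pipeline_schedules/${scheduleId}/play`;

  const res = await fetchImpl(url, {
    method: 'POST',
    headers: { 'PRIVATE-TOKEN': token },
  });
  if (!res.ok) {
    throw new Error(`GitLab returned ${res.status} when playing schedule ${scheduleId}`);
  }
  return res.status;
}
```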

You can manually run scheduled pipelines once per minute.

Scheduled pipelines execute with the permissions of the user who owns the schedule. The pipeline has access to the same resources as the pipeline owner, including protected environments and the CI/CD job token. This means every full reindex is attributed to a specific user (the schedule owner), so we can track who each run executed as.
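The CI job that calls POST product/reindex on the Hybris API Gateway can pass that attribution along. Below is a sketch under stated assumptions: REINDEX_API_URL and REINDEX_API_KEY are hypothetical CI/CD variables, the x-api-key header assumes the gateway uses API Gateway API keys, and the request body fields are made up. CI_PIPELINE_SOURCE, CI_PIPELINE_ID and GITLAB_USER_LOGIN are GitLab predefined variables; for a scheduled pipeline we expect GITLAB_USER_LOGIN to be the schedule owner.

```javascript
// Hypothetical CI job script: ask the Hybris API Gateway to start a full reindex.
function buildReindexRequest(env) {
  return {
    url: `${env.REINDEX_API_URL}/product/reindex`, // hypothetical variable
    init: {
      method: 'POST',
      headers: {
        'content-type': 'application/json',
        'x-api-key': env.REINDEX_API_KEY, // hypothetical variable and auth scheme
      },
      // Body fields are hypothetical; the GitLab variables identify the run.
      body: JSON.stringify({
        languages: 'all',
        triggeredBy: env.GITLAB_USER_LOGIN,
        pipelineSource: env.CI_PIPELINE_SOURCE, // "schedule" for scheduled runs
        pipelineId: env.CI_PIPELINE_ID,
      }),
    },
  };
}

async function triggerReindex(env = process.env, fetchImpl = fetch) {
  const { url, init } = buildReindexRequest(env);
  const res = await fetchImpl(url, init);
  if (!res.ok) throw new Error(`Reindex request failed with HTTP ${res.status}`);
  return res.status;
}
```

In .gitlab-ci.yml, a job restricted with a rule such as if: $CI_PIPELINE_SOURCE == "schedule" could run this script with node, so the reindex only fires from the schedule (including manual plays of it).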

Resources