
Cleanup of unapproved assets in Cloudinary

Overview

When we move to Cloudinary we want to implement an efficient DAM maintenance process and keep our DAM lean. This involves either deleting unused assets or moving them to cold storage, on the assumption that they will not be accessed frequently but can be retrieved if they are ever needed again. Currently on the Digital Merch side, when a product is unapproved in Hybris, there is no cleanup process in Amplience. This document outlines how we would like to approach the cleanup process in Hybris/Cloudinary for products that are unapproved (not OOS) and are not planned to come back in stock or be sold again. On the Digital Brand side there is also no cleanup process; however, the implementation of the process in Hybris for product imagery will also impact their workflow.

Workflow for unapproval

There is an "unapproved" status in Hybris that is currently used by merchandising for a number of reasons - a product may be out of stock, we may be awaiting stock for a product that is due to launch, or we may no longer be selling a product and so want to remove it from the site and all indexing. In the case where a product is no longer going to be sold, we want to remove the assets associated with that product from Cloudinary, and move them to "cold storage". Cold storage lets you store any amount of infrequently accessed or historical data at a lower cost than other storage tiers.

To separate the unapproved status from out of stock or awaiting stock, we will introduce the use of another existing Hybris status, "checked". For temporarily unapproved products (those that are out of stock, awaiting stock, or unavailable for other reasons) we will use the checked status. If a product is no longer going to be sold, the unapproved status should be used, and we will implement infrastructure to handle this process.
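The status split above can be sketched as a small routing function. This is only an illustration: the status strings and the `AssetAction` names are assumptions, not the values Hybris actually sends.

```python
from enum import Enum


class AssetAction(Enum):
    KEEP = "keep"        # leave the assets in Cloudinary
    ARCHIVE = "archive"  # move the assets to cold storage, then delete from Cloudinary


def action_for_status(status: str) -> AssetAction:
    """Map a Hybris approval status to what should happen to the product's assets.

    The status strings are assumed for illustration.
    """
    normalized = status.strip().lower()
    if normalized == "unapproved":
        return AssetAction.ARCHIVE
    # "checked" (temporarily off the site) and "approved" both keep their assets.
    return AssetAction.KEEP
```

Only the unapproved status triggers the archive flow; every other status, including checked, leaves the assets where they are.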

Unapproved asset storage

There are multiple storage options with AWS. The two main groups of suitable classes are:

  • S3 Standard-IA and S3 One Zone-IA
    • IA stands for Infrequently accessed
    • Amazon charges a retrieval fee
    • For storing backups or older data that is accessed infrequently but still needs millisecond access
  • S3 Glacier Instant Retrieval, S3 Glacier Flexible Retrieval, and S3 Glacier Deep Archive
    • Low cost data archiving - much cheaper than S3 Standard-IA and S3 One Zone-IA
    • Slower data retrieval, with multiple options across the three classes
    • S3 Glacier Instant Retrieval - For long-lived archive data accessed once a quarter with instant retrieval in milliseconds
    • S3 Glacier Flexible Retrieval (formerly S3 Glacier) - For long-term backups and archives, with retrieval options ranging from 1 minute to 12 hours
    • S3 Glacier Deep Archive - For long-term data archiving that is accessed once or twice in a year and can be restored within 12 hours
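For reference, each class in the list above is selected through the `StorageClass` parameter of the S3 API, and the identifiers are not always obvious from the marketing names (Flexible Retrieval is `GLACIER`). The sketch below only builds the arguments for a PutObject call. The helper name is illustrative, and the bucket and key are supplied by the caller.

```python
# Identifiers accepted by the S3 API's StorageClass parameter.
STORAGE_CLASS_IDS = {
    "S3 Standard-IA": "STANDARD_IA",
    "S3 One Zone-IA": "ONEZONE_IA",
    "S3 Glacier Instant Retrieval": "GLACIER_IR",
    "S3 Glacier Flexible Retrieval": "GLACIER",
    "S3 Glacier Deep Archive": "DEEP_ARCHIVE",
}


def put_object_kwargs(bucket: str, key: str, body: bytes, class_name: str) -> dict:
    """Build keyword arguments for an S3 PutObject call.

    The result can be passed to a boto3 S3 client as s3.put_object(**kwargs).
    """
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        "StorageClass": STORAGE_CLASS_IDS[class_name],
    }
```

Writing the object directly into the target class avoids a separate lifecycle transition, at the cost of fixing the class at upload time.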

We aren't certain that we will ever need to access some of the assets in cold storage again, and it would be an edge case if we did. However, we need to keep the option of retrieving assets in place, at as low a cost as possible. Considering we can change the S3 class as and when necessary (bearing in mind minimum storage duration requirements), we suggest moving forward with S3 Glacier Flexible Retrieval. This seems to be the best solution for the settling-in period, while teams are still adjusting to the new workflow.

Things to bear in mind:

  • Objects that are archived to S3 Glacier Instant Retrieval and S3 Glacier Flexible Retrieval are charged for a minimum storage duration of 90 days, and S3 Glacier Deep Archive has a minimum storage duration of 180 days.
  • Objects deleted prior to the minimum storage duration incur a pro-rated charge equal to the storage charge for the remaining days.
  • Objects that are deleted, overwritten, or transitioned to a different storage class before the minimum storage duration will incur the normal storage usage charge plus a pro-rated charge for the remainder of the minimum storage duration.
  • Objects stored longer than the minimum storage duration will not incur a minimum charge.
  • S3 Glacier Flexible Retrieval Bulk data retrievals and requests are free of charge.
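The minimum storage duration rules above amount to simple arithmetic, sketched below. The price is a caller-supplied parameter because no rates are given in this document, and the function names are illustrative.

```python
# Minimum storage duration in days, keyed by S3 StorageClass identifier.
MIN_STORAGE_DAYS = {
    "GLACIER_IR": 90,     # S3 Glacier Instant Retrieval
    "GLACIER": 90,        # S3 Glacier Flexible Retrieval
    "DEEP_ARCHIVE": 180,  # S3 Glacier Deep Archive
}


def billable_days(storage_class: str, days_stored: int) -> int:
    """Days of storage charged: actual days, but never less than the minimum."""
    return max(days_stored, MIN_STORAGE_DAYS[storage_class])


def early_removal_days(storage_class: str, days_stored: int) -> int:
    """Days charged pro rata when an object is removed before the minimum."""
    return max(MIN_STORAGE_DAYS[storage_class] - days_stored, 0)


def storage_charge(storage_class, days_stored, size_gb, price_per_gb_day):
    """Total storage charge; price_per_gb_day must come from current AWS pricing."""
    return billable_days(storage_class, days_stored) * size_gb * price_per_gb_day
```

For example, an asset restored to Cloudinary and deleted from S3 Glacier Flexible Retrieval after 30 days is still billed for 90 days of storage, 60 of them as the pro-rated early deletion charge.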

Suggested architecture:

Design for when products are unapproved in Hybris

  1. A product SKU is unapproved in Hybris and a notification is sent to the Hybris API Gateway including the SKU(s) that are unapproved
  2. The API Gateway has a direct integration with Amazon EventBridge and sends the appropriate event to EventBridge
  3. An archive image event is registered in the Image Event Bus which calls the Cloudinary Archive Step Function
  4. The Step Function first calls the Platform Notification Lambda to send a message to Slack to say that assets are going to be archived
  5. Next the Step Function calls the Cold Storage Lambda, which moves assets from Cloudinary to the S3 Glacier Flexible Retrieval storage
  6. After the assets have been successfully moved to storage, the Step Function calls the Cloudinary Delete Lambda which uses the Delete method of the Cloudinary Admin API
  7. The Step Function calls the Platform Notification Lambda to send a message to Slack to say that assets have been successfully moved
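The ordering in steps 4 to 7 matters: nothing should be deleted from Cloudinary until every asset has been copied to cold storage. In the proposed design the Step Function enforces this. The plain Python below is only a sketch of that ordering, with the three Lambdas represented by hypothetical callables passed in by the caller.

```python
from typing import Callable, List


def archive_sku_assets(
    sku: str,
    asset_ids: List[str],
    notify: Callable[[str], None],
    move_to_cold_storage: Callable[[str], None],
    delete_from_cloudinary: Callable[[str], None],
) -> None:
    """Run the archive flow for one SKU in the order described above."""
    # Step 4: announce that archiving is about to start.
    notify(f"Archiving {len(asset_ids)} asset(s) for SKU {sku}")

    # Step 5: copy every asset first. An exception here stops the flow
    # before anything is deleted from Cloudinary.
    for asset_id in asset_ids:
        move_to_cold_storage(asset_id)

    # Step 6: delete from Cloudinary only after all copies succeeded.
    for asset_id in asset_ids:
        delete_from_cloudinary(asset_id)

    # Step 7: confirm completion.
    notify(f"Archived {len(asset_ids)} asset(s) for SKU {sku}")
```

If the copy step raises, the delete step is never reached, which mirrors a Step Function state failing before the delete state runs.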

Recovering assets

Amazon S3 objects that are stored in the S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive storage classes are not immediately accessible. To access our assets in these storage classes, we will be required to restore a temporary copy of them to their S3 bucket for a specified duration (number of days).

S3 Standard, RRS, S3 Standard-IA, S3 One Zone-IA, S3 Glacier Instant Retrieval, and S3 Intelligent-Tiering objects are available for anytime access.

Restored assets from S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive are stored only for the number of days that we specify. If we require a permanent copy, we need to create a copy in our Amazon S3 bucket. If we don't make a copy, the assets would still be stored in the S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive storage classes.

When we restore an archived object, we will pay for both the archive and the copy that we restored temporarily.
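A temporary restore is requested with the S3 RestoreObject operation, which takes the number of days to keep the restored copy and a retrieval tier. The sketch below assumes a boto3 S3 client is passed in as `s3_client`. The helper names and the default tier are illustrative choices, not decisions made in this document.

```python
VALID_TIERS = {"Expedited", "Standard", "Bulk"}


def build_restore_request(days: int, tier: str = "Bulk") -> dict:
    """Build the RestoreRequest payload for S3 RestoreObject."""
    if days < 1:
        raise ValueError("days must be at least 1")
    if tier not in VALID_TIERS:
        raise ValueError(f"unknown retrieval tier: {tier}")
    return {"Days": days, "GlacierJobParameters": {"Tier": tier}}


def start_restore(s3_client, bucket: str, key: str, days: int, tier: str = "Bulk"):
    """Ask S3 to restore a temporary copy of an archived object.

    s3_client is assumed to be a boto3 S3 client; the call returns
    immediately and the restore completes asynchronously.
    """
    return s3_client.restore_object(
        Bucket=bucket,
        Key=key,
        RestoreRequest=build_restore_request(days, tier),
    )
```

The restored copy expires after `days`. Keeping a permanent copy would need a separate copy operation once the restore has completed.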

Suggested architecture:

Design for restoring previously unapproved assets

  1. A product SKU is marked for re-approval in Hybris and a notification is sent to the Hybris API Gateway including the SKU(s) for products that are to be recovered
  2. The API Gateway has a direct integration with Amazon EventBridge and sends the appropriate event to EventBridge
  3. A restore image event is registered in the Image Event Bus which calls the Cloudinary Restore Step Function
  4. The Step Function calls the Platform Notification Lambda to send a message to Slack to say that the assets are going to be recovered
  5. The Step Function initiates RestoreObject from S3 Glacier Flexible Retrieval. Note that with S3 Glacier Flexible Retrieval, assets cannot be restored in real time. Bulk retrievals typically take 5 - 12 hours and Standard retrievals 3 - 5 hours; the Expedited option is faster but has much higher costs.
  6. The Step Function stores the metadata of this retrieval in an Amazon DynamoDB table.
     6.1. Data from the DynamoDB table is synced using Amazon Athena Federated Queries to generate a reports dashboard in Amazon QuickSight.
  7. Upon completion, S3 sends the RestoreComplete event to Amazon EventBridge.
  8. EventBridge triggers the Cloudinary Restore Step Function. This time it will take a different branch from step 3.
  9. The Step Function calls the Platform Notification Lambda to send a message to Slack to confirm that assets are in the process of being restored
  10. The Step Function then calls the Cloudinary Get S3 Restore Object Lambda. This will use a pre-signed url to download and unpack images for the next step
  11. The Step Function then calls the Cloudinary Restore Lambda, which calls the Cloudinary Upload API and uploads the restored assets to the Media Library
      11.1. On successful upload, the assets are deleted from cold storage
  12. Finally, the Step Function calls the Platform Notification Lambda to send a message to Slack to confirm that assets have been restored
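Steps 3 and 8 both enter the same Step Function, so it needs to tell the initial request apart from the S3 completion event. The sketch below shows one way to do that by inspecting the EventBridge envelope. `RestoreImage` is a hypothetical name for the custom event sent from Hybris, and `Object Restore Completed` is, to our understanding, the detail-type S3 uses when EventBridge notifications are enabled on the bucket; both should be verified before use.

```python
RESTORE_REQUESTED = "RestoreImage"              # hypothetical custom event name
RESTORE_COMPLETED = "Object Restore Completed"  # assumed S3 detail-type


def restore_completed_pattern(bucket: str) -> dict:
    """EventBridge event pattern matching completed restores in one bucket."""
    return {
        "source": ["aws.s3"],
        "detail-type": [RESTORE_COMPLETED],
        "detail": {"bucket": {"name": [bucket]}},
    }


def choose_branch(event: dict) -> str:
    """Pick the Step Function branch for an incoming EventBridge event."""
    detail_type = event.get("detail-type")
    if detail_type == RESTORE_REQUESTED:
        # Steps 4 to 6: notify, call RestoreObject, record metadata.
        return "initiate_restore"
    if detail_type == RESTORE_COMPLETED and event.get("source") == "aws.s3":
        # Steps 9 to 12: notify, download, upload to Cloudinary, confirm.
        return "upload_to_cloudinary"
    raise ValueError(f"unexpected event type: {detail_type!r}")
```

Scoping the pattern to the archive bucket keeps restores in unrelated buckets from triggering the workflow.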

Resources