Microservice Production Ready Checklist

Stability

Step | Description | Level C | Level B | Level A
Unit tests | It has unit tests, and they run and pass in a CI system.
Development Cycle | We have a stable and reliable development cycle, including code reviews, build systems, and deployment pipelines.
Threshold test coverage | Its test coverage is over 60%.
High test coverage | Its test coverage is over 80%.
Config in env-var | Its config can be overridden via environment variables.
Deprecation Procedures | We have a deprecation run-book. It must include appropriate alerting and a guide for updating clients.
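
One way to read the "Config in env-var" item is sketched below: defaults live in code and each value can be overridden by an environment variable. This is a minimal illustration, not a prescribed implementation. The `ServiceConfig` fields, the variable names (`PORT`, `LOG_LEVEL`, `REQUEST_TIMEOUT_MS`) and the default values are all assumptions made for the example.

```typescript
// Hypothetical config shape; field names and defaults are illustrative.
interface ServiceConfig {
  port: number;
  logLevel: string;
  requestTimeoutMs: number;
}

const defaults: ServiceConfig = {
  port: 8080,
  logLevel: "info",
  requestTimeoutMs: 3000,
};

// The environment is passed in explicitly so the loader is easy to test.
type Env = Record<string, string | undefined>;

// Read an integer variable. Unset or blank values fall back to the default;
// values that are set but not integers fail fast instead of being ignored.
function intFromEnv(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  if (!Number.isInteger(parsed)) {
    throw new Error(`${name} must be an integer, got "${raw}"`);
  }
  return parsed;
}

// Build the effective config: defaults first, environment overrides on top.
function loadConfig(env: Env): ServiceConfig {
  return {
    port: intFromEnv(env, "PORT", defaults.port),
    logLevel: env["LOG_LEVEL"] ?? defaults.logLevel,
    requestTimeoutMs: intFromEnv(env, "REQUEST_TIMEOUT_MS", defaults.requestTimeoutMs),
  };
}
```

In a Node service this would typically be called once at startup as `loadConfig(process.env)`, so a misconfigured value stops the process before it serves traffic.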

Reliability

Step | Description | Level C | Level B | Level A
Automated Build | Its automated build process runs in a CI/CD system.
Automatic Deploy | Its automated deploy process runs in a CI/CD system.
Dependencies | Its dependencies are automatically and continuously updated and fixed when they are out of date or vulnerable.

Scalability

Step | Description | Level C | Level B | Level A
Manual Scale | It can be manually scaled horizontally to handle changes in workload.
Auto Scale | It automatically scales horizontally to handle fluctuating workloads.
CPU Req/Limit | Its CPU request and limit are set as described in the Resource Requests and Limits documentation.
Memory Req/Limit | Its memory resource request value is the same as its limit value.
Capacity Planning | It can handle the expected load: either a load test has been performed, or the expected traffic is under control.
Deployment Downtime | Its deploy process does not cause service degradation or downtime (e.g. the error rate does not increase during a deploy).
Graceful Degradation | It keeps working, at least partially, while dependencies (e.g. other services or a database) are partially or completely unavailable.
Retries | It performs smart retries when interacting with dependencies (e.g. other services or a database).
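
As one possible reading of the "Retries" item, the sketch below retries a failing call with capped exponential backoff and full jitter, and stops immediately on errors marked as non-retryable. It is illustrative only: `withRetry`, `backoffDelayMs`, `RetryOptions` and any limits passed in are hypothetical names and values, not something this checklist prescribes.

```typescript
// Hypothetical options; names and semantics are assumptions for this sketch.
interface RetryOptions {
  maxAttempts: number; // total tries, including the first one
  baseDelayMs: number; // delay ceiling before the first retry
  maxDelayMs: number; // upper bound on the ceiling of any single delay
  isRetryable: (err: unknown) => boolean; // e.g. timeouts yes, validation errors no
}

// Exponential backoff with full jitter: pick a random delay below
// min(maxDelayMs, baseDelayMs * 2^retryIndex) so clients do not retry in lockstep.
function backoffDelayMs(
  retryIndex: number,
  opts: RetryOptions,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(opts.maxDelayMs, opts.baseDelayMs * 2 ** retryIndex);
  return Math.floor(random() * ceiling);
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Run the operation, retrying only retryable failures until attempts run out.
async function withRetry<T>(
  operation: () => Promise<T>,
  opts: RetryOptions,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      if (attempt >= opts.maxAttempts || !opts.isRetryable(err)) throw err;
      await sleep(backoffDelayMs(attempt - 1, opts));
    }
  }
}
```

Bounding both the attempt count and the delay keeps a struggling dependency from being hit harder by its own clients, which is also relevant to the "Graceful Degradation" item.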

Fault Tolerance and Catastrophe-Preparedness

Step | Description | Level C | Level B | Level A
Identify failure scenarios and plan mitigation | Potential catastrophes and failure scenarios are identified and planned for.
Single points of failure are identified and resolved | Single points of failure are identified and resolved.
Failure detection and remediation strategies are in place | Failure detection and remediation strategies are in place.
Load testing | Load tests are automated or occur on a regular cadence. We should document the results.
Stress testing | Stress tests are automated or occur on a regular cadence. We should document the results.
Chaos testing | Once the application has proven it can stand up to load and stress, chaos testing is integrated to identify weak points and opportunities to reduce failures.
Incident | Incidents and outages are handled appropriately and productively.

Performance

Step | Description | Level C | Level B | Level A
Appropriate service-level agreements (SLAs) for availability | Make sure these are actually achievable with the current size of our team.
Task handling and processing | How the microservice processes tasks, how efficiently it processes them, and how it will perform as the number of requests scales. Common issues include using async/await inside a synchronous loop.
Application size | Reduce application size. Make sure AWS-SDK is a dev dependency. Remove unnecessary packages with depcheck. Reuse packages already available in the runtime (see the list of Node packages pre-installed on the AWS Lambda runtime).
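
The async/await loop issue mentioned under "Task handling and processing" can be illustrated with the sketch below, which contrasts awaiting each call inside a loop with starting all calls together. The helper names (`loadSequentially`, `loadConcurrently`, `Fetcher`) are hypothetical, and `fetchItem` stands in for any I/O call such as a database query or HTTP request.

```typescript
// Hypothetical signature for an I/O call keyed by id.
type Fetcher<T> = (id: number) => Promise<T>;

// Sequential: each await blocks the next iteration, so total latency is
// roughly the sum of all individual call durations.
async function loadSequentially<T>(
  ids: number[],
  fetchItem: Fetcher<T>,
): Promise<T[]> {
  const results: T[] = [];
  for (const id of ids) {
    results.push(await fetchItem(id));
  }
  return results;
}

// Concurrent: every call starts before any is awaited, so total latency is
// roughly that of the slowest call. Results keep the order of `ids`.
async function loadConcurrently<T>(
  ids: number[],
  fetchItem: Fetcher<T>,
): Promise<T[]> {
  return Promise.all(ids.map((id) => fetchItem(id)));
}
```

The sequential form is still the right choice when calls depend on each other or when the dependency cannot absorb a burst. For large inputs, the concurrent form usually needs a concurrency limit, since `Promise.all` starts every call at once.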

Monitoring

Step | Description | Level C | Level B | Level A
Logging | Proper logging and tracing throughout the stack.
Grafana | Well-designed dashboards that are easy to understand and accurately reflect the health of the service.
Grafana / CloudWatch Alarms and run-books | Effective, actionable alerting accompanied by run-books.
On-call | Implementing and maintaining an on-call rotation.
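
For the "Logging" item, one common approach (assumed here, not stated by this checklist) is to emit structured JSON lines that carry a request identifier, so that entries from different services can be correlated. The sketch below is a minimal illustration. `formatLog`, `createLogger`, `LogContext` and the field names are hypothetical.

```typescript
type LogLevel = "debug" | "info" | "warn" | "error";

// Hypothetical context attached to every line emitted for one request.
interface LogContext {
  service: string;
  requestId: string;
}

// Build a single JSON log line. Kept pure (no I/O) so it is easy to test.
// Extra fields are spread last, so they can override context keys.
function formatLog(
  level: LogLevel,
  message: string,
  ctx: LogContext,
  fields: Record<string, unknown> = {},
  now: Date = new Date(),
): string {
  return JSON.stringify({
    timestamp: now.toISOString(),
    level,
    message,
    ...ctx,
    ...fields,
  });
}

// Create a logger bound to one request context. `write` defaults to stdout
// via console.log and can be swapped for a test sink.
function createLogger(
  ctx: LogContext,
  write: (line: string) => void = console.log,
) {
  const log =
    (level: LogLevel) =>
    (message: string, fields?: Record<string, unknown>): void =>
      write(formatLog(level, message, ctx, fields));
  return {
    debug: log("debug"),
    info: log("info"),
    warn: log("warn"),
    error: log("error"),
  };
}
```

A handler could create one logger per incoming request and pass the same `requestId` on to downstream calls, which makes dashboards and alerts easier to trace back to individual requests.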

Documentation

Step | Description | Level C | Level B | Level A
General | Thorough, updated, and centralised documentation containing all of the relevant and essential information about the microservice. Organisational understanding at the developer, team, and ecosystem levels.
Description | A description that is short, sweet, and to the point.
Architecture diagram | Architecture diagram showing the full microservice.
Links | Links to the repository, a link to the dashboard that is used for monitoring, a link to the original RFC for the microservice, and a link to the most recent architecture review. Plus any extra information that may be useful to the developer.
Onboarding and Development Guide | Step-by-step setup guide, including environments and running offline. It should also cover the development cycle and pipelines.
Request Flows, Endpoints, and Dependencies | Request flow diagram to support the architecture diagram. Endpoints or invocation method(s). Documented critical dependencies, including packages and layers.
FAQ | Common questions