Microservice Production Ready Checklist

Stability

Step	Description	Level C	Level B	Level A
Unit tests	It has unit tests, and they run and pass in a CI system.	✅	✅	✅
Development Cycle	We have a stable and reliable development cycle including code reviews, build systems and deployment pipelines.	✅	✅	✅
Threshold test coverage	Its test coverage is over 60%.		✅	✅
High test coverage	Its test coverage is over 80%.		✅	✅
Config in env-var	Its config can be overridden via environment variable.		✅	✅
Deprecation Procedures	We have a deprecation run-book. It must include appropriate alerting and a guide for updating clients.		✅	✅
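
As an illustration of the "Config in env-var" step, here is a minimal sketch of defaults that can be overridden through environment variables. The setting names, the `APP_` prefix, and the `load_config` helper are assumptions made for this example, not part of the checklist.

```python
import os
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ServiceConfig:
    # Defaults used when no environment variable is set (hypothetical settings).
    port: int = 8080
    log_level: str = "INFO"
    request_timeout_seconds: float = 2.5


def load_config(env=None, prefix="APP_"):
    """Build a ServiceConfig, letting PREFIX + FIELD_NAME variables override defaults."""
    env = os.environ if env is None else env
    overrides = {}
    for field in fields(ServiceConfig):
        raw = env.get(prefix + field.name.upper())
        if raw is not None:
            # Environment values are strings: convert using the default's type.
            overrides[field.name] = type(field.default)(raw)
    return ServiceConfig(**overrides)


if __name__ == "__main__":
    # Example: APP_PORT=9090 python config.py
    print(load_config())
```

Passing `env` explicitly keeps the loader easy to unit test without touching the real process environment.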

Reliability

Step	Description	Level C	Level B	Level A
Automated Build	Its automated build process is running in CI/CD system.		✅	✅
Automatic Deploy	Its automated deploy process is running in CI/CD system.		✅	✅
Dependencies	Its dependencies are automatically/continuously updated and fixed when they are out of date or vulnerable.		✅	✅

Scalability

Step	Description	Level C	Level B	Level A
Manual Scale	It can be manually scaled horizontally to handle changes in workload.	✅
Auto Scale	It automatically scales horizontally to handle fluctuating workloads.		✅	✅
CPU Req/Limit	Its CPU limit and request are set as described in the Resource Requests and Limits documentation.	✅	✅	✅
Memory Req/Limit	Its memory request value is the same as its limit value.	✅	✅	✅
Capacity Planning	It can handle the expected load: either load test has been performed, or the expected traffic is under control.		✅	✅
Deployment Downtime	Its deploy process does not cause service degradation or downtime (e.g. error rate does not increase during deploy).		✅	✅
Graceful Degradation	It keeps working, at least partially, while dependencies (e.g. other services or databases) are partially or completely unavailable.		✅	✅
Retries	It performs smart retries when interacting with dependencies (e.g. other services or database).			✅
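
The checklist does not define "smart retries". One common reading is to retry only transient errors, cap the number of attempts, and back off exponentially with random jitter. The sketch below follows that assumption; `call_with_retries` and its parameters are hypothetical names chosen for illustration.

```python
import random
import time


def call_with_retries(operation, max_attempts=4, base_delay=0.1, max_delay=2.0,
                      retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call operation(), retrying transient failures with capped exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error to the caller.
            # Full jitter: wait a random time up to the capped exponential delay,
            # so many clients do not retry in lockstep.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
```

Errors outside `retry_on` (for example validation failures) propagate immediately, since retrying them would only add load to the dependency.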

Fault Tolerance and Catastrophe-Preparedness

Step	Description	Level C	Level B	Level A
Identify failure scenarios and plan mitigation	Potential catastrophes and failure scenarios are identified and planned for.			✅
Single points of failure are identified and resolved	Single points of failure are identified and resolved.		✅	✅
Failure detection and remediation strategies are in place	Failure detection and remediation strategies are in place.		✅	✅
Load testing	Load tests are automated or occur on a regular cadence. We should document the results.			✅
Stress testing	Stress tests are automated or occur on a regular cadence. We should document the results.			✅
Chaos testing	Once the applications have proven the ability to stand up to load and stress, chaos testing is integrated to identify weak points and opportunities to reduce failures.			✅
Incident	Incidents and outages are handled appropriately and productively.		✅	✅

Performance

Step	Description	Level C	Level B	Level A
Appropriate service-level agreements (SLAs) for availability	Make sure these are actually achievable with the current size of our team.		✅	✅
Task handling and processing	How the microservice processes tasks, how efficiently it processes them, and how it will perform as the number of requests scales. Common issues include using async/await inside a sequential loop, so that independent calls run one after another.		✅	✅
Application size	Reduce application size. Make sure AWS-SDK is a dev dependency. Remove unnecessary packages with depcheck. Reuse packages already available in the runtime (see the list of Node packages pre-installed on the AWS Lambda runtime).	✅	✅	✅
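
The async/await issue named under "Task handling and processing" can be sketched as follows. The example uses Python's asyncio rather than Node purely for illustration; `fetch_item` is a hypothetical stand-in for any I/O-bound call, and the concurrency limit is an assumed value.

```python
import asyncio


async def fetch_item(item_id, delay=0.05):
    # Stand-in for an I/O-bound call (HTTP request, database query, ...).
    await asyncio.sleep(delay)
    return item_id * 2


async def process_sequentially(item_ids):
    # Anti-pattern: each await finishes before the next call starts,
    # so total time grows linearly with the number of items.
    results = []
    for item_id in item_ids:
        results.append(await fetch_item(item_id))
    return results


async def process_concurrently(item_ids, max_in_flight=10):
    # Run the calls concurrently, but bound how many are in flight
    # so downstream dependencies are not flooded.
    semaphore = asyncio.Semaphore(max_in_flight)

    async def bounded(item_id):
        async with semaphore:
            return await fetch_item(item_id)

    # gather returns results in the same order as its arguments.
    return await asyncio.gather(*(bounded(i) for i in item_ids))
```

Both functions return the same results. The concurrent version only helps when the calls are independent of each other.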

Monitoring

Step	Description	Level C	Level B	Level A
Logging	Proper logging and tracing throughout the stack.		✅	✅
Grafana	Well-designed dashboards that are easy to understand and accurately reflect the health of the service.		✅	✅
Grafana / CloudWatch alarms and run-books	Effective, actionable alerting accompanied by run-books.		✅	✅
On-call	Implementing and maintaining an on-call rotation.		✅	✅
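
The checklist does not say what "proper logging" looks like. One possible approach is to emit structured (JSON) log lines that carry a request identifier, so entries can be correlated across services. The sketch below assumes that approach; `JsonFormatter`, `build_logger`, and the `request_id` field are hypothetical names for illustration.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # request_id is a hypothetical field supplied via `extra=`.
            "request_id": getattr(record, "request_id", None),
        }
        return json.dumps(payload)


def build_logger(name, stream=None):
    # StreamHandler writes to stderr when stream is None.
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False  # Avoid duplicate output through the root logger.
    return logger
```

A call such as `build_logger("orders").info("order %s created", 42, extra={"request_id": "req-1"})` then produces a single JSON line that log aggregation tools can index by field.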

Documentation

Step	Description	Level C	Level B	Level A
General	Thorough, updated, and centralised documentation containing all of the relevant and essential information about the microservice. Organisational understanding at the developer, team, and ecosystem levels.		✅	✅
Description	A description that is short, sweet, and to the point.		✅	✅
Architecture diagram	Architecture diagram showing the full microservice.		✅	✅
Links	Links to the repository, a link to the dashboard that is used for monitoring, a link to the original RFC for the microservice, and a link to the most recent architecture review. Plus any extra information that may be useful to the developer.		✅	✅
Onboarding and Development Guide	Step-by-step setup instructions, including environments and running offline. It should also cover the development cycle and pipelines.		✅	✅
Request Flows, Endpoints, and Dependencies	Request flow diagram to support architecture diagram. Endpoints or invocation method(s). Documented critical dependencies including packages and layers.		✅	✅
FAQ	Common questions		✅	✅
