Various components in the system need to report their status. This enum
standardizes on a set of conventions for reporting it. Each Status has an
integer value that can be compared; higher values represent higher levels
of functionality. In addition, each Status has a String signal, which is
one of GREEN, YELLOW, or RED.

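
A minimal sketch of such an enum follows. The constant names UP, DEGRADED,
and DOWN and their signals are the ones this description uses; the integer
values 0 to 2 and the accessor names getValue, getSignal, and isAtLeast are
assumptions for illustration, not necessarily the project's actual API.

```java
// Hypothetical sketch: constant names and signals follow the text
// (UP = GREEN, DEGRADED = YELLOW, DOWN = RED); the integer values are assumed.
enum Status {
    DOWN(0, "RED"),
    DEGRADED(1, "YELLOW"),
    UP(2, "GREEN");

    private final int value;
    private final String signal;

    Status(int value, String signal) {
        this.value = value;
        this.signal = signal;
    }

    /** Comparable level; a higher value means more functionality is available. */
    int getValue() {
        return value;
    }

    /** One of "GREEN", "YELLOW", or "RED". */
    String getSignal() {
        return signal;
    }

    /** True if this status represents at least as much functionality as other. */
    boolean isAtLeast(Status other) {
        return this.value >= other.value;
    }
}
```

Comparing through the explicit integer value, rather than relying on the
enum's declaration order, keeps the ordering visible in the code and matches
the idea that each Status carries its own comparable value.
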
The general notion is that the whole system (or a subsystem) should
report GREEN status (UP) only if everything is working as designed.
When subsystems start to go down, or the current system stops working,
the status should drop to either YELLOW (DEGRADED) or RED (DOWN),
depending on whether service is still available in some form. For
example, if a cache subsystem goes down, we might report YELLOW because
we can still attempt to serve requests without the cache. However, if a
required database goes down, we probably need to report RED, because we
can no longer serve requests.

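
The cache and database example can be sketched as a rule for rolling
subsystem statuses up into one overall status: a required subsystem passes
its own status through, while an optional one, such as a cache, can at
worst pull the overall status down to DEGRADED. This is one possible rule,
not a documented one; the Subsystem record, its required flag, and
StatusAggregator are hypothetical names, and the enum is repeated in
minimal form so the sketch compiles on its own.

```java
import java.util.List;

// Minimal copy of the status levels so this sketch compiles on its own.
enum Status {
    DOWN(0), DEGRADED(1), UP(2);

    final int value;

    Status(int value) {
        this.value = value;
    }
}

// Hypothetical: a subsystem's reported status, plus whether we can serve without it.
record Subsystem(String name, Status status, boolean required) {}

final class StatusAggregator {
    private StatusAggregator() {}

    /** Overall status is the lowest status any subsystem contributes. */
    static Status overall(List<Subsystem> subsystems) {
        Status result = Status.UP;
        for (Subsystem s : subsystems) {
            Status contribution;
            if (s.status() == Status.UP) {
                contribution = Status.UP;
            } else if (s.required()) {
                // A required subsystem (e.g. the database) passes its status through.
                contribution = s.status();
            } else {
                // An optional subsystem (e.g. a cache): we can still serve, degraded.
                contribution = Status.DEGRADED;
            }
            if (contribution.value < result.value) {
                result = contribution;
            }
        }
        return result;
    }
}
```

Taking the lowest contribution means a single failed required dependency
makes the whole service report RED, however many optional pieces are
still healthy.
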
A "rugged" system should be able to accurately (and responsively)
report on its status even if it is unable to perform its main functions.
This will assist operators in diagnosing the problem; a hung process
tells no tales.
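
One way to keep status reporting responsive when a dependency hangs is to
run each health probe on a separate thread and treat a probe that misses
its deadline as DOWN. The sketch below assumes that approach; StatusMonitor,
its timeout parameter, and the convention of writing probes as
Callable<Status> are hypothetical, and it relies only on
java.util.concurrent from the JDK.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Minimal copy of the status levels so this sketch compiles on its own.
enum Status { DOWN, DEGRADED, UP }

// Hypothetical helper: runs health probes off the caller's thread and
// bounds how long the caller waits for each one.
final class StatusMonitor implements AutoCloseable {
    private final ExecutorService executor = Executors.newCachedThreadPool();
    private final long timeoutMillis;

    StatusMonitor(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Returns the probe's result, or DOWN if it throws or does not answer in time. */
    Status check(Callable<Status> probe) {
        Future<Status> future = executor.submit(probe);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the hung probe and report instead of waiting
            return Status.DOWN;
        } catch (ExecutionException e) {
            return Status.DOWN; // the probe itself threw, e.g. the database refused a connection
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return Status.DOWN;
        }
    }

    @Override
    public void close() {
        executor.shutdownNow();
    }
}
```

Because the caller waits at most the timeout for each probe, a hung
dependency shows up as RED instead of hanging the status check itself. A
cached pool keeps the sketch short; a probe that ignores interruption would
keep its thread alive, so a production version would likely bound the pool.
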