Health Monitoring
How to monitor the health of Ghostwriter services
Introducing Health Monitoring
Ghostwriter monitors the health of its services in two ways: Docker health checks and internal monitoring and testing.
Docker Health Checks
Docker automatically monitors the containers via HEALTHCHECK
commands (see Docker documentation for technical information). These commands check to make sure the service is responding and basic functionality is working.
The results of these commands can be checked with this command:
Each container runs a service-specific command on a schedule. If the command returns successfully (exit code 0
), the Status
column will show healthy
. Any other exit code will flip the status to unhealthy
, indicating the service is likely not functioning properly.
By default, the commands run with these attributes that can be adjusted via Ghostwriter CLI's config set
command:
Start running after 30s (
HEALTHCHECK_START
, default is30s
)Run every 120s (
HEALTHCHECK_INTERVAL
, default is120s
)Timeout after 10s (
HEALTHCHECK_TIMEOUT
, default is10s
)Will be retried once (
HEALTHCHECK_RETRIES
, default is1
)
Internal Testing and Monitoring
Ghostwriter also tests each service more thoroughly with two API endpoints:
/status/
/status/simple/
The first endpoint, /status, tests critical services and displays a table of results:
These tests are more thorough than the Docker health checks. For example, Docker will verify the database back end is listening and accepting connections, but Ghostwriter runs tests to ensure the database is accepting connections and reading and writing are working as expected. Ghostwriter also checks disk usage is below 90% and at least 100MB of memory is available. These checks are can be adjusted via Ghostwriter CLI's config set
command:
HEALTHCHECK_DISK_USAGE_MAX
(Default is90
)HEALTHCHECK_MEM_MIN
(Default is100
)
This endpoint can also return a JSON version of the test results if you set the Accept: application/json
header.
Running these tests constantly could be a strain on the server, so the tests run on-demand when you visit this page.
You can visit the simplified endpoint, /status/simple/, to run lightweight checks. This endpoint checks the web server, database status, and cache status and returns one of the following responses:
Response | Response Code | Description |
---|---|---|
OK | 200 | System is healthy |
WARNING | 200 | One or more tests did not pass and you should check the detailed status endpoint |
ERROR | 500 | There was an unexpected error indicating a critical issue |
The home dashboard displays a basic system health status. The status is based on the results from the simplified endpoint.
Automated Monitoring
You can automate monitoring with something like Uptime Kuma or a similar tool. The recommended logic is:
Request the /status/simple/ endpoint
Check for the response code and content
If
OK
is not in the response or the response code is not200
, send an alert with a link to /status/ for details
If the tool is capable of triggering a secondary action, you could have the tool pull the JSON data from /status/:
Working services will have a working
status. A service experiencing a problem will have a descriptive warning or error message that will tell you why it failed the test. You can use this information to customize your monitoring alert.
Last updated