System monitoring / Health Check
Introduction
This article describes how the CoreOne services can be monitored.
Usage
This information can be used for regular system monitoring as well as for load balancing and high availability functions, and it allows customers to monitor the application themselves. If the application is deployed in a high availability scenario, remember to set up the monitoring on all nodes. These recommendations do not cover basic server monitoring such as CPU, RAM or disk usage, which is of course also recommended.
Components you should monitor
Windows Services
We recommend monitoring the state of the following Windows Services:
World Wide Web Publishing Service (IIS)
CoreOne Suite
CoreOne Suite System Connector
MySQL
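A minimal sketch of such a service check, assuming Python is available on the server and PowerShell's Get-Service cmdlet can be called from it. The display names are taken from the list above and are an assumption about the local installation (MySQL in particular is often registered under a versioned name); the function names are illustrative only.

```python
import subprocess

# Display names from the list above. The exact names on a given server are an
# assumption (MySQL in particular is often registered as e.g. "MySQL80").
SERVICES = [
    "World Wide Web Publishing Service",
    "CoreOne Suite",
    "CoreOne Suite System Connector",
    "MySQL",
]


def query_status(display_name):
    """Return the service status reported by PowerShell's Get-Service."""
    command = f"(Get-Service -DisplayName '{display_name}').Status"
    result = subprocess.run(
        ["powershell", "-NoProfile", "-Command", command],
        capture_output=True,
        text=True,
    )
    # Empty output means the service was not found or the call failed.
    return result.stdout.strip() or "NotFound"


def find_problems(names, query=query_status):
    """Return {display_name: status} for every service that is not Running."""
    statuses = {name: query(name) for name in names}
    return {name: status for name, status in statuses.items() if status != "Running"}
```

Calling find_problems(SERVICES) returns an empty dictionary when every service is running. The query parameter exists only so that the lookup can be replaced, for example in a test.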
App Pools and Sites
Please monitor the state of all App Pools and Sites starting with CoreOne*
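One possible way to evaluate this is to parse the output of the IIS command line tool (appcmd list apppool and appcmd list site). The sketch below assumes the usual one-object-per-line output format shown in the comment; the pool and site names in that comment are hypothetical.

```python
import re

# Assumed line format of `appcmd list apppool` / `appcmd list site`, e.g.
#   APPPOOL "CoreOneWeb" (MgdVersion:v4.0,MgdMode:Integrated,state:Started)
#   SITE "CoreOne Portal" (id:2,bindings:https/*:443:,state:Started)
LINE = re.compile(r'^(APPPOOL|SITE) "([^"]+)" \(.*state:(\w+)\)$')


def not_started(appcmd_output, prefix="CoreOne"):
    """Return (kind, name, state) for matching objects that are not Started."""
    problems = []
    for line in appcmd_output.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue  # ignore lines that do not describe an app pool or site
        kind, name, state = match.groups()
        if name.startswith(prefix) and state != "Started":
            problems.append((kind, name, state))
    return problems
```

An empty result means that every App Pool and Site whose name starts with CoreOne is started. Objects with other names are ignored.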
Health Check functions
The following functions work like normal HTTP GET calls:
Service | URL | Answer OK | Answer ERROR | What is being tested? | Description |
---|---|---|---|---|---|
CoreOne Authentication Services | https://${authenticationUrl}/health | HTTP 200 | HTTP 500, HTTP 404 | | |
CoreOne Authentication Services | https://${authenticationUrl}/health/details | HTTP 200 | HTTP 200, HTTP 404, HTTP 500 | | Gives a 200 answer with a detailed list of which subsystems work and which do not. The 200 also appears when subsystems are not available. The content is intended for graphical assessment by system administrators. The 404 and 500 message appears if, for example, .NET Core hosting is missing on the system. |
CoreOne Authentication Services | https://${authenticationUrl}/health/authenticationservice | HTTP 200 | HTTP 500, HTTP 404 | | This checks the same connections as the /health/details but without the API-Connection. This is used to check the connection from the Backend- to the Auth-Server. |
CoreOne Web Service | https://${webUrl}/health | HTTP 200 | HTTP 500, HTTP 404 | | |
CoreOne Web Service | https://${webUrl}/health/details | HTTP 200 | HTTP 200, HTTP 404, HTTP 500 | | Gives a 200 answer with a detailed list of which subsystems work and which do not. The 200 also appears when subsystems are not available. The content is intended for graphical assessment by system administrators. The 404 and 500 message appears if, for example, .NET Core hosting is missing on the system. |
CoreOne Self-Service Portal | https://${portalUrl}/health | HTTP 200 | HTTP 500, HTTP 404 | | |
CoreOne Self-Service Portal | https://${portalUrl}/health/details | HTTP 200 | HTTP 200, HTTP 404, HTTP 500 | | |
CoreOne Application Services | http://${applicationUrl}:7000/health | HTTP 200 | HTTP 500, HTTP 404 | | |
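The /health endpoints in the table can be polled with any HTTP client. The following is a minimal sketch using only the Python standard library; the function names and the timeout are assumptions, and the URL has to be built from the placeholders in the table.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=5, opener=urllib.request.urlopen):
    """Return the HTTP status code for a GET on url, or None if unreachable."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx/5xx answers (e.g. 404, 500) arrive as HTTPError.
        return error.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, TLS error and similar.
        return None


def is_healthy(url, **kwargs):
    """Treat only HTTP 200 as OK, as in the 'Answer OK' column above."""
    return http_status(url, **kwargs) == 200
```

This status-code check suits the /health and /health/authenticationservice endpoints. It is not sufficient for /health/details, because that endpoint answers with 200 even when subsystems are unavailable.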
CoreOne Database Services
We recommend monitoring the database instance you are using for the CoreOne Database Services. The parameters to monitor may vary depending on the database you use. Here is an example list of metrics to track for a MySQL database:
Status
Version
Uptime
Aborted clients per second
Aborted connections per second
Connection errors accept per second
Connection errors internal per second
Connection errors max connections per second
Connection errors peer address per second
Connection errors select per second
Connection errors tcpwrap per second
Connections per second
Max used connections
Threads cached
Threads connected
Threads created per second
Threads running
Buffer pool efficiency
Buffer pool utilization
Created tmp files on disk per second
Created tmp tables on disk per second
Created tmp tables on memory per second
InnoDB buffer pool pages free
InnoDB buffer pool pages total
InnoDB buffer pool read requests per second
InnoDB buffer pool reads per second
InnoDB row lock time
InnoDB row lock time max
InnoDB row lock waits
Slow queries per second
Bytes received
Bytes sent
Command Delete per second
Command Insert per second
Command Select per second
Command Update per second
Queries per second
Questions per second
Binlog cache disk use
InnoDB buffer pool wait free
InnoDB number open files
Open table definitions
Open tables
InnoDB log written
Calculated value of innodb_log_file_size
Size of database {#DBNAME}
Binlog commits
Binlog group commits
Master GTID wait count
Master GTID wait time
Master GTID wait timeouts
Get status variables
InnoDB buffer pool read requests
InnoDB buffer pool reads
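Many of the metrics above are per-second rates, whereas MySQL reports most of them as cumulative counters (for example via SHOW GLOBAL STATUS). A sketch of how two samples could be turned into rates is shown below. The function names are illustrative, and the hit-ratio formula is one common reading of "buffer pool efficiency", not necessarily the definition your monitoring template uses.

```python
def per_second(previous, current, interval_seconds):
    """Turn two samples of cumulative counters into per-second rates."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    rates = {}
    for name, value in current.items():
        if name in previous:
            # A negative delta means the server restarted; report 0 instead.
            rates[name] = max(value - previous[name], 0) / interval_seconds
    return rates


def buffer_pool_hit_ratio(status):
    """Share of InnoDB read requests served from memory, in percent."""
    requests = status["Innodb_buffer_pool_read_requests"]
    disk_reads = status["Innodb_buffer_pool_reads"]
    if requests == 0:
        return 100.0
    return 100.0 * (1 - disk_reads / requests)
```

Both functions expect dictionaries that map status variable names such as Questions or Slow_queries to integer values. How those samples are collected is left to the monitoring system.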
If you have a Galera Cluster in place, you should also monitor the health of the cluster and each node.
We also recommend monitoring the MySQL backup process, for example by checking the file size and the timestamp of the latest backup.
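The backup check could look roughly like the sketch below. The thresholds (26 hours, 1 byte) are placeholder assumptions that need to be adapted to the actual backup schedule and the expected dump size; the function name is illustrative.

```python
import os
import time


def backup_problems(path, max_age_hours=26, min_size_bytes=1, now=None):
    """Return a list of problems found with a backup file (empty list = OK)."""
    if not os.path.isfile(path):
        return [f"backup file missing: {path}"]
    now = time.time() if now is None else now
    info = os.stat(path)
    problems = []
    age_hours = (now - info.st_mtime) / 3600
    if age_hours > max_age_hours:
        problems.append(f"backup is {age_hours:.1f} h old (limit {max_age_hours} h)")
    if info.st_size < min_size_bytes:
        problems.append(f"backup is only {info.st_size} bytes (minimum {min_size_bytes})")
    return problems
```

A minimum size close to that of a known good backup catches dumps that were written but are truncated or empty.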
Errors in Logs & Tasks
We recommend watching the logs for errors. A corrupt import, or the deletion of objects in a target system, can generate a large number of errors and affect performance. An overview of the log files can be found here: Logs
You can and should also monitor tasks by checking the lastrun_message column. For further details see: Task configuration
Please get in touch with us if you want to observe the tasks and log_log tables with a read-only user.
Appendix
Some monitoring systems are not able to check all these components directly. As a workaround, you can use PowerShell and the Windows Task Scheduler to write the results to a monitoring text file and then let the monitoring system check the content of that file. Make sure the monitoring also checks the age of the text file, so that you know the values are up to date. If you need assistance with that, we are happy to help you.
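The text-file workaround is described above in terms of PowerShell; the same idea is sketched here in Python for illustration. The JSON layout, the function names and the ten-minute limit are assumptions, not a format defined by CoreOne.

```python
import json
import os
import time


def write_status_file(path, results):
    """Write check results plus a timestamp; run this from a scheduled task."""
    payload = {"written_at": time.time(), "results": results}
    temp_path = path + ".tmp"
    with open(temp_path, "w", encoding="utf-8") as handle:
        json.dump(payload, handle)
    # Swap the file in one step so a reader never sees a half-written file.
    os.replace(temp_path, path)


def read_status_file(path, max_age_seconds=600, now=None):
    """Return the results, or None if the file is missing, broken or too old."""
    now = time.time() if now is None else now
    try:
        with open(path, encoding="utf-8") as handle:
            payload = json.load(handle)
    except (OSError, ValueError):
        return None
    if now - payload["written_at"] > max_age_seconds:
        return None
    return payload["results"]
```

The reader treats a stale file like a missing one, so a scheduled task that has silently stopped running shows up as a failed check rather than as outdated "OK" values.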
Please also make sure to document what to do when a monitoring check is not in an accepted state.
Detailed monitoring
The health checks listed above only indicate whether the relevant services are available, started and operational. They do not provide any information about internal processes, pending processes or the like.
This detailed information can also be monitored using managed services. Please get in touch with your contact person.
© ITSENSE AG. All rights reserved. ITSENSE and CoreOne are registered trademarks of ITSENSE AG.