A server health audit catches degradation before it becomes an outage. Whether you run Linux or Windows servers on-premises or in the cloud, this checklist covers the critical metrics - CPU, memory, disk, services, logs, and security posture - that determine server reliability.

On this page

Linux Server Checklist
Windows Server Checklist
Resource Utilisation Thresholds
Security Hardening Checklist
Automation & Monitoring Tools

Linux Server Checklist

0/12 complete

Windows Server Checklist

0/10 complete

Resource Utilisation Thresholds

Metric	Normal	Warning	Critical	Immediate Action
CPU Usage (sustained)	< 60%	60–80%	> 80%	Investigate top processes, scale compute, check for runaway jobs
Memory Usage	< 70%	70–85%	> 85%	Check for memory leaks, add swap, scale instance, restart leaking service
Disk Usage	< 70%	70–85%	> 90%	Run disk cleanup, archive logs, expand volume, investigate top consumers
Disk I/O Await (Linux)	< 10ms	10–20ms	> 20ms	Check SMART, inspect disk-heavy processes, consider SSD/IOPS upgrade
Load Average / CPU Cores	< 0.7×	0.7–1.0×	> 1.0×	Profile CPU-heavy processes, optimise, or scale horizontally
Network Packet Loss	0%	< 0.1%	> 0.5%	Check NIC, switch port, ISP, or investigate potential attack traffic
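The thresholds in the table can be encoded as a small lookup so that a script or alert rule applies them consistently. The following is a minimal sketch, not a reference implementation: the metric keys (`cpu_percent`, `disk_percent`, and so on) and the `classify` helper are illustrative names chosen here, not part of any monitoring tool. Two assumptions are made. The table leaves disk usage between 85% and 90% unassigned, and this sketch treats that gap as Warning. Packet loss is left out because its bands do not follow the same warning-start/critical-start pattern.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    warning: float   # value at which the Warning band starts
    critical: float  # value above which the metric is Critical


# Values copied from the table above; the keys are illustrative names.
THRESHOLDS = {
    "cpu_percent": Threshold(warning=60, critical=80),
    "memory_percent": Threshold(warning=70, critical=85),
    # Assumption: the unassigned 85-90% range is treated as Warning.
    "disk_percent": Threshold(warning=70, critical=90),
    "io_await_ms": Threshold(warning=10, critical=20),
    "load_per_core": Threshold(warning=0.7, critical=1.0),
}


def classify(metric: str, value: float) -> str:
    """Return 'normal', 'warning', or 'critical' for a measured value."""
    t = THRESHOLDS[metric]
    if value > t.critical:
        return "critical"
    if value >= t.warning:
        return "warning"
    return "normal"


if __name__ == "__main__":
    print(classify("cpu_percent", 45))     # below the 60% warning start
    print(classify("disk_percent", 88))    # inside the assumed warning gap
    print(classify("load_per_core", 1.2))  # above 1.0x per core
```

Keeping the numbers in one table-like structure means a change to a threshold is made in one place rather than in every check.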


Security Hardening Checklist

0/8 complete

Automation & Monitoring Tools

Prometheus + Grafana - open-source metric collection and dashboarding. Excellent for Linux/containers, free, widely adopted.
Zabbix - open-source monitoring with agent-based and agentless monitoring, network discovery, and alerting.
Nagios Core - the classic server monitoring tool. Still widely used, extensive plugin ecosystem.
Datadog - commercial, excellent for cloud + container environments, strong APM and anomaly detection.
New Relic - commercial, strong application performance monitoring, free tier available.
AWS CloudWatch / Azure Monitor / GCP Cloud Monitoring - native cloud monitoring, ideal for cloud-hosted servers.
Wazuh - open-source SIEM + EDR. Excellent for security log analysis and compliance; agents are available for Linux and Windows.

Automate This Checklist

Schedule this audit to run monthly at minimum. Most monitoring tools (Nagios, Zabbix, Prometheus) can run these checks continuously and alert on threshold breaches - moving you from reactive to proactive operations.
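For servers that are not yet covered by a monitoring tool, a scheduled script can approximate two of the checks. The sketch below uses only the Python standard library to measure disk usage and load average per CPU core, then compares them with this page's thresholds. The function names (`disk_percent`, `load_per_core`, `level`, `run_checks`) are hypothetical, the 5-minute load average is an assumed stand-in for "sustained" load, and `os.getloadavg()` is only available on Unix-like systems, so this does not run on Windows as written.

```python
import os
import shutil


def disk_percent(path: str = "/") -> float:
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100


def load_per_core() -> float:
    """5-minute load average divided by the number of CPU cores (Unix only)."""
    _, five_min, _ = os.getloadavg()
    return five_min / (os.cpu_count() or 1)


def level(value: float, warning: float, critical: float) -> str:
    """Map a value to 'normal', 'warning', or 'critical'."""
    if value > critical:
        return "critical"
    if value >= warning:
        return "warning"
    return "normal"


def run_checks(path: str = "/") -> list[tuple[str, float, str]]:
    """Collect both metrics and attach a status to each."""
    disk = disk_percent(path)
    load = load_per_core()
    return [
        # Disk: warning from 70%, critical above 90% (thresholds from this page).
        ("disk_percent", disk, level(disk, 70, 90)),
        # Load per core: warning from 0.7x, critical above 1.0x.
        ("load_per_core", load, level(load, 0.7, 1.0)),
    ]


if __name__ == "__main__":
    for name, value, status in run_checks():
        print(f"{name}: {value:.2f} -> {status}")
```

A script like this could be run from cron or a systemd timer. It is a stopgap: the monitoring tools listed above evaluate the same thresholds continuously and handle alert routing.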


