Scrutiny¶
Proactive Disk Health Monitoring
Scrutiny is a S.M.A.R.T. monitoring tool for hard drives, with a web dashboard. It provides proactive disk-failure prediction, historical tracking, and alerting to help prevent data loss before it happens.
Why Choose Scrutiny?¶
- Proactive monitoring - predict disk failures before they happen
- Historical tracking - S.M.A.R.T. metrics over time
- Multi-host - centralized dashboard with collector agents
- Alerting - webhooks and email notifications
Install Hub¶
Infrastructure as Code
This Scrutiny hub instance is deployed using OpenTofu with custom Incus images. The infrastructure configuration manages container provisioning and persistent storage.
Custom Scrutiny Image
The Scrutiny Incus image is built and maintained at forgejo.benoit.jp.net/Benoit/Laminar.
Infrastructure Configuration¶
The OpenTofu configuration provisions:
- Incus instance running the custom Scrutiny image
- One persistent storage volume:
  - /var/backups/scrutiny - Backup storage
- HTTP proxy device - Exposes container port 8080 on host port 8097
resource "incus_storage_volume" "scrutiny_var_backups_scrutiny" {
name = "scrutiny_var_backups_scrutiny"
pool = incus_storage_pool.default.name
}
resource "incus_instance" "scrutiny" {
name = "scrutiny"
image = "laminar.incus:scrutiny-0.8.1-1benoitjpnet"
device {
name = "var_backups_scrutiny"
type = "disk"
properties = {
path = "/var/backups/scrutiny"
source = incus_storage_volume.scrutiny_var_backups_scrutiny.name
pool = incus_storage_pool.default.name
}
}
device {
name = "http"
type = "proxy"
properties = {
listen = "tcp:192.168.1.2:8097"
connect = "tcp:127.0.0.1:8080"
}
}
}
Deploy Infrastructure¶
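Apply the configuration with the standard OpenTofu workflow (a sketch; run it from the directory that holds the configuration above):

# Initialize providers and state on first use
tofu init

# Review the planned changes
tofu plan

# Create the volume, instance, and proxy device
tofu apply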
The Scrutiny web dashboard will be accessible at http://192.168.1.2:8097.
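Once the apply finishes, a quick reachability check against the health endpoint (the same endpoint used in the troubleshooting section below) confirms the hub is up:

# Expect an HTTP 200 response with a small JSON body
curl -s http://192.168.1.2:8097/api/health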
Upgrade Hub¶
Incus Image Upgrade
When upgrading to a newer Incus image version, follow these steps to migrate your data.
Backup current instance data before upgrading:
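A minimal backup sketch using Incus's built-in snapshot and export commands (the instance name matches the OpenTofu resource above; adjust the tarball path as needed):

# Cheap local rollback point
incus snapshot create scrutiny pre-upgrade

# Portable full copy of the instance
incus export scrutiny scrutiny-pre-upgrade.tar.gz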
Deploy to new Incus image and restore data:
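Re-applying the configuration after the image edit in the next step recreates the container, while the named storage volume (and the data on it) persists. A sketch, with the rollback path left commented out:

# Recreate the instance from the updated configuration
tofu apply

# Rollback path if the upgrade goes wrong: remove the broken
# instance, then re-import the pre-upgrade export
# incus import scrutiny-pre-upgrade.tar.gz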
Update OpenTofu configuration
Edit your OpenTofu configuration file to update the image version:
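For example, bumping the image tag (the 0.8.2 tag below is hypothetical; use whichever tag the Laminar build has actually published):

resource "incus_instance" "scrutiny" {
  name  = "scrutiny"
  # was: laminar.incus:scrutiny-0.8.1-1benoitjpnet
  image = "laminar.incus:scrutiny-0.8.2-1benoitjpnet"
  # ... devices unchanged ...
}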
Upgrade Complete
Your Scrutiny hub is now running on the new Incus image with all monitoring data intact!
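To confirm the replacement instance came up on the new image, a quick sketch (volatile.base_image holds the fingerprint of the image the instance was created from):

# Instance should show as RUNNING
incus list scrutiny

# Fingerprint should differ from the pre-upgrade value
incus config get scrutiny volatile.base_image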
Install Collector¶
Collectors can be deployed using various methods. Choose the appropriate method for your environment.
For Incus host servers with direct disk access:
resource "incus_instance" "scrutiny-collector" {
name = "scrutiny-collector"
image = "ghcr:analogj/scrutiny:master-collector"
config = {
"environment.COLLECTOR_API_ENDPOINT" = "https://scrutiny.benoit.jp.net"
"environment.COLLECTOR_CRON_SCHEDULE" = "*/15 * * * *"
"environment.COLLECTOR_HOST_ID" = "incus.home.arpa"
"environment.COLLECTOR_RUN_STARTUP" = "True"
"security.privileged" = "true"
}
# NVMe device passthrough
device {
name = "nvme0"
type = "unix-char"
properties = {
source = "/dev/nvme0"
path = "/dev/nvme0"
mode = "0440"
}
}
}
Privileged Container
This collector runs in privileged mode to access disk devices. Only use on trusted systems.
For Linux systems (Arch Linux, Ubuntu, etc.) using systemd:
This guide installs the Scrutiny collector agent as a native binary and schedules it with a systemd timer for automated S.M.A.R.T. data collection:
Download & Install Binary¶
#!/bin/bash
# Download the latest scrutiny collector binary
LATEST_VERSION="v0.8.1" # Check GitHub for latest version
echo "Downloading Scrutiny collector ${LATEST_VERSION}..."
sudo wget -O /usr/local/bin/scrutiny-collector \
"https://github.com/AnalogJ/scrutiny/releases/download/${LATEST_VERSION}/scrutiny-collector-metrics-linux-amd64" # (1)!
# Make binary executable
sudo chmod +x /usr/local/bin/scrutiny-collector # (2)!
# Verify installation
/usr/local/bin/scrutiny-collector --help >/dev/null && \
echo "ā Scrutiny collector installed successfully" || \
echo "ā Installation failed" # (3)!
- Download latest collector binary for Linux AMD64
- Set executable permissions for the binary
- Verify installation by testing help command
Version Management
Stay up to date: Check the Scrutiny releases page for the latest version and update LATEST_VERSION accordingly.
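If you prefer not to hard-code the version, the GitHub releases API can resolve the latest tag automatically (a sketch assuming curl and jq are installed):

# Resolve the most recent release tag, e.g. "v0.8.1"
LATEST_VERSION=$(curl -s https://api.github.com/repos/AnalogJ/scrutiny/releases/latest | jq -r .tag_name)
echo "Latest release: ${LATEST_VERSION}"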
Systemd Service Configuration¶
Create the systemd service unit for the collector:
# Create systemd service file
sudo tee /etc/systemd/system/scrutiny-collector.service > /dev/null << 'EOF'
[Unit]
Description=Scrutiny Disk Metrics Collector # (1)!
Documentation=https://github.com/AnalogJ/scrutiny
After=network.target
Wants=network.target
[Service]
Type=oneshot # (2)!
User=root # (3)!
Group=root
Environment=COLLECTOR_API_ENDPOINT=https://scrutiny.benoit.jp.net # (4)!
Environment=COLLECTOR_HOST_ID=lavie.home.arpa # (5)!
Environment=COLLECTOR_RUN_STARTUP=True # (6)!
ExecStart=/usr/local/bin/scrutiny-collector run # (7)!
StandardOutput=journal
StandardError=journal
TimeoutStartSec=300 # (8)!
# Security hardening (optional)
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ReadWritePaths=/tmp # (9)!
[Install]
WantedBy=multi-user.target
EOF
echo "ā Systemd service file created"
- Descriptive service name and documentation
- One-shot service type for scheduled execution
- Root user required for disk access
- API endpoint of your Scrutiny hub server
- Unique host identifier for this collector
- Force a metrics collection run at service startup
- Command to execute the collector
- 5-minute timeout for disk scanning operations
- Security hardening with minimal permissions
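Before moving on to the timer, systemd can lint the unit file for obvious mistakes; a quick optional check:

# Reports unknown directives and other unit-file problems
systemd-analyze verify /etc/systemd/system/scrutiny-collector.service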
Systemd Timer Configuration¶
Create a systemd timer for scheduled S.M.A.R.T. data collection:
# Create systemd timer file
sudo tee /etc/systemd/system/scrutiny-collector.timer > /dev/null << 'EOF'
[Unit]
Description=Run Scrutiny Disk Metrics Collector every 15 minutes # (1)!
Documentation=https://github.com/AnalogJ/scrutiny
Requires=scrutiny-collector.service # (2)!
[Timer]
OnCalendar=*:0/15 # (3)!
Persistent=true # (4)!
RandomizedDelaySec=60 # (5)!
[Install]
WantedBy=timers.target # (6)!
EOF
echo "ā Systemd timer file created"
- Descriptive timer name and frequency
- Dependency on the collector service
- Run every 15 minutes (:00, :15, :30, :45)
- Execute missed runs after system boot
- Random 0-60 second delay to prevent resource conflicts
- Enable under the system timers target so it starts at boot
Timer Schedule Options
Alternative schedules for different monitoring needs:
- Frequent: OnCalendar=*:0/5 (every 5 minutes)
- Standard: OnCalendar=*:0/15 (every 15 minutes)
- Conservative: OnCalendar=hourly (every hour)
- Minimal: OnCalendar=*-*-* 06:00:00 (daily at 6 AM)
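systemd can also show exactly when a given OnCalendar expression will fire, which helps when choosing between the alternatives above:

# Prints the next elapse times for the expression
systemd-analyze calendar "*:0/15"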
Enable and start the Scrutiny collector timer:
#!/bin/bash
# Step 1: Reload systemd configuration
echo "Reloading systemd configuration..."
sudo systemctl daemon-reload # (1)!
# Step 2: Enable timer to start on boot
echo "Enabling scrutiny-collector timer..."
sudo systemctl enable scrutiny-collector.timer # (2)!
# Step 3: Start the timer immediately
echo "Starting scrutiny-collector timer..."
sudo systemctl start scrutiny-collector.timer # (3)!
# Step 4: Verify timer status
echo "Checking timer status..."
sudo systemctl status scrutiny-collector.timer --no-pager # (4)!
# Step 5: View timer schedule
echo "Timer schedule:"
sudo systemctl list-timers scrutiny-collector.timer --no-pager # (5)!
# Step 6: Test manual execution
echo "Testing manual service execution..."
sudo systemctl start scrutiny-collector.service # (6)!
echo "ā Scrutiny collector setup completed"
- Load new service and timer configurations
- Enable automatic startup on system boot
- Start timer for immediate scheduling
- Check current timer status and health
- Display next scheduled execution times
- Manual test run to verify functionality
Verification & Monitoring¶
Verify your Scrutiny collector is working correctly:
#!/bin/bash
echo "=== Scrutiny Collector Status Verification ==="
# Check timer status
echo "1. Timer Status:"
sudo systemctl is-active scrutiny-collector.timer && \
echo "ā Timer is active" || \
echo "ā Timer is inactive" # (1)!
# Check service execution history
echo -e "\n2. Service Execution History:"
sudo journalctl -u scrutiny-collector.service \
--since "24 hours ago" --no-pager # (2)!
# View next scheduled runs
echo -e "\n3. Upcoming Schedule:"
sudo systemctl list-timers scrutiny-collector.timer --no-pager # (3)!
# Test manual execution
echo -e "\n4. Manual Test Run:"
sudo systemctl start scrutiny-collector.service
sleep 5
sudo journalctl -u scrutiny-collector.service -n 10 --no-pager # (4)!
echo -e "\nā Verification completed"
- Verify timer is running and scheduled
- Review last 24 hours of collector executions
- Show next scheduled execution times
- Manual test with immediate log review
Monitor collector performance and troubleshoot issues:
# Real-time timer logs
sudo journalctl -u scrutiny-collector.timer -f # (1)!
# Service execution logs (last 50 entries)
sudo journalctl -u scrutiny-collector.service -n 50 # (2)!
# Filter for errors only (journalctl --grep takes a PCRE pattern)
sudo journalctl -u scrutiny-collector.service \
--since "1 week ago" --grep "ERROR|WARN|FAIL" # (3)!
# Performance metrics
sudo journalctl -u scrutiny-collector.service \
--since "today" --grep "duration|ms|seconds" # (4)!
- Real-time monitoring of timer events
- Recent service execution details
- Filter for error and warning messages
- Performance timing information
Configuration Reference¶
Key Configuration Details
Current setup summary:
- Timer Schedule: Every 15 minutes (OnCalendar=*:0/15)
- Persistent Execution: Missed runs execute on system startup
- Binary Location: /usr/local/bin/scrutiny-collector
- Service Type: One-shot execution with systemd timer
- Environment Variables: Configured in the service unit file
- Security: Hardened with minimal filesystem access
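To change any of these environment variables later without rewriting the unit file, a systemd drop-in keeps the override separate (a sketch; the newname.home.arpa value is only a placeholder, and systemctl edit reloads the daemon automatically on save):

sudo systemctl edit scrutiny-collector.service

# In the editor that opens, add for example:
#   [Service]
#   Environment=COLLECTOR_HOST_ID=newname.home.arpa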
Further configuration areas:

- Performance Settings
- Notification Config
- Network Settings
- Security Options
Troubleshooting Guide¶
Common Issues & Solutions
Issue: Collector can't access NVMe devices
# Check NVMe device availability
ls -la /dev/nvme* # (1)!
sudo nvme list # (2)!
# Verify SMART support
sudo smartctl -i /dev/nvme0n1 # (3)!
sudo smartctl -a /dev/nvme0n1 | head -20 # (4)!
# Check device permissions
ls -la /dev/sd* /dev/nvme* # (5)!
- List all NVMe devices
- Display NVMe device information
- Check SMART capability
- Display SMART attributes
- Verify device permissions
Issue: Cannot connect to Scrutiny hub
# Test API endpoint
curl -v https://scrutiny.benoit.jp.net/api/health # (1)!
# Test with manual collector run
sudo COLLECTOR_API_ENDPOINT=https://scrutiny.benoit.jp.net \
COLLECTOR_HOST_ID=test.local \
COLLECTOR_LOG_LEVEL=DEBUG \
/usr/local/bin/scrutiny-collector run # (2)!
# Check DNS resolution
nslookup scrutiny.benoit.jp.net # (3)!
- Test hub API endpoint availability
- Run collector with debug logging
- Verify DNS resolution
Issue: Timer not executing as expected
# Check timer status
sudo systemctl status scrutiny-collector.timer # (1)!
# List all timers
sudo systemctl list-timers --all | grep scrutiny # (2)!
# Check service dependencies
sudo systemctl show scrutiny-collector.timer # (3)!
# Reset timer
sudo systemctl stop scrutiny-collector.timer
sudo systemctl start scrutiny-collector.timer # (4)!
- Check timer service status
- List timer states
- Verify service dependencies
- Reset timer execution
Related Documentation:

- Infrastructure Overview - Complete self-hosting architecture
- System Monitoring Guide - Additional monitoring tools and practices