Scrutiny

Proactive Disk Health Monitoring

Scrutiny is a S.M.A.R.T monitoring tool for hard drives with a web dashboard. It provides proactive disk failure prediction, historical tracking, and alerting to help prevent data loss before it happens.

Why Choose Scrutiny?

  • Proactive monitoring - predict disk failures before they happen
  • Historical tracking - S.M.A.R.T metrics over time
  • Multi-host - centralized dashboard with collector agents
  • Alerting - webhooks and email notifications

Install Hub

Infrastructure as Code

This Scrutiny hub instance is deployed using OpenTofu with custom Incus images. The infrastructure configuration manages container provisioning and persistent storage.

Custom Scrutiny Image

The Scrutiny Incus image is built and maintained at forgejo.benoit.jp.net/Benoit/Laminar.

Infrastructure Configuration

The OpenTofu configuration provisions:

  • Incus instance running the custom Scrutiny Incus image
  • One persistent storage volume:
    • /var/backups/scrutiny - Backup storage
  • HTTP proxy device - Exposes container port 8080 on host port 8097

OpenTofu Configuration
resource "incus_storage_volume" "scrutiny_var_backups_scrutiny" {
  name = "scrutiny_var_backups_scrutiny"
  pool = incus_storage_pool.default.name
}

resource "incus_instance" "scrutiny" {
  name  = "scrutiny"
  image = "laminar.incus:scrutiny-0.8.1-1benoitjpnet"

  device {
    name = "var_backups_scrutiny"
    type = "disk"
    properties = {
      path   = "/var/backups/scrutiny"
      source = incus_storage_volume.scrutiny_var_backups_scrutiny.name
      pool   = incus_storage_pool.default.name
    }
  }

  device {
    name = "http"
    type = "proxy"
    properties = {
      listen  = "tcp:192.168.1.2:8097"
      connect = "tcp:127.0.0.1:8080"
    }
  }
}

Deploy Infrastructure

Apply OpenTofu configuration
tofu apply

The Scrutiny web dashboard will be accessible at http://192.168.1.2:8097.
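
To confirm the deployment before moving on, a quick request against the hub's health endpoint (the same one used in the troubleshooting section below) should return successfully:

Check hub availability
curl -fsS http://192.168.1.2:8097/api/health && echo "āœ“ Scrutiny hub is reachable"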

Upgrade Hub

Incus Image Upgrade

When upgrading to a newer Incus image version, follow these steps to migrate your data.

Backup current instance data before upgrading:

Enter Incus container
incus shell scrutiny

Backup configuration files

cp -r /opt/scrutiny/config/ /var/backups/scrutiny/

Backup InfluxDB database

cp -r /var/lib/influxdb/ /var/backups/scrutiny/
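
Before replacing the instance, it is worth confirming that both directories actually made it into the backup volume:

Verify backup contents
ls -la /var/backups/scrutiny/
du -sh /var/backups/scrutiny/config /var/backups/scrutiny/influxdb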

Deploy the new Incus image and restore the data:

Update OpenTofu configuration

Edit your OpenTofu configuration file to update the image version:

resource "incus_instance" "scrutiny" {
  name  = "scrutiny"
  image = "laminar.incus:scrutiny-NEW-VERSION"  # Update this line
  # ...
}

Apply infrastructure changes

Provision new Incus container
tofu apply

Enter Incus container
incus shell scrutiny

Restore configuration files

cp -r /var/backups/scrutiny/config/ /opt/scrutiny/

Restore InfluxDB database

rsync -av --delete /var/backups/scrutiny/influxdb/ /var/lib/influxdb/

Restart InfluxDB and Scrutiny services

systemctl restart influxdb scrutiny-webapp

Upgrade Complete

Your Scrutiny hub is now running on the new Incus image with all monitoring data intact!
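
A short check from the Incus host confirms the services came back up and the dashboard is answering again; the service names follow the upgrade steps above, and the health endpoint is the same one used in the troubleshooting section below:

Verify the upgraded hub
incus exec scrutiny -- systemctl is-active influxdb scrutiny-webapp
curl -fsS http://192.168.1.2:8097/api/health && echo "āœ“ Hub is healthy after upgrade"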

Install Collector

Collectors can be deployed in several ways; choose the one that fits your environment.

For Incus host servers with direct disk access:

OpenTofu Configuration
resource "incus_instance" "scrutiny-collector" {
  name  = "scrutiny-collector"
  image = "ghcr:analogj/scrutiny:master-collector"

  config = {
    "environment.COLLECTOR_API_ENDPOINT"  = "https://scrutiny.benoit.jp.net"
    "environment.COLLECTOR_CRON_SCHEDULE" = "*/15 * * * *"
    "environment.COLLECTOR_HOST_ID"       = "incus.home.arpa"
    "environment.COLLECTOR_RUN_STARTUP"   = "True"
    "security.privileged"                 = "true"
  }

  # NVMe device passthrough
  device {
    name = "nvme0"
    type = "unix-char"
    properties = {
      source = "/dev/nvme0"
      path   = "/dev/nvme0"
      mode   = "0440"
    }
  }
}

Privileged Container

This collector runs in privileged mode to access disk devices. Only use on trusted systems.
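
The configuration above only passes through an NVMe controller. SATA or SAS drives can be exposed the same way with a unix-block device per disk; /dev/sda below is a placeholder for whatever block devices your host actually has:

Additional disk passthrough (example)
  device {
    name = "sda"
    type = "unix-block"
    properties = {
      source = "/dev/sda"
      path   = "/dev/sda"
      mode   = "0440"
    }
  }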

For Linux systems (Arch Linux, Ubuntu, etc.) using systemd:

This guide sets up the Scrutiny collector agent as a native binary with systemd timer scheduling for automated S.M.A.R.T data collection.

First, download and install the collector binary:

Download & Install Binary

install_scrutiny_collector.sh
#!/bin/bash
# Download the latest scrutiny collector binary
LATEST_VERSION="v0.8.1"  # Check GitHub for latest version

echo "Downloading Scrutiny collector ${LATEST_VERSION}..."
sudo wget -O /usr/local/bin/scrutiny-collector \
  "https://github.com/AnalogJ/scrutiny/releases/download/${LATEST_VERSION}/scrutiny-collector-metrics-linux-amd64" # (1)!

# Make binary executable
sudo chmod +x /usr/local/bin/scrutiny-collector        # (2)!

# Verify installation
/usr/local/bin/scrutiny-collector --help >/dev/null && \
    echo "āœ“ Scrutiny collector installed successfully" || \
    echo "āœ— Installation failed"                       # (3)!
  1. Download latest collector binary for Linux AMD64
  2. Set executable permissions for the binary
  3. Verify installation by testing help command
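
Note that the collector does not read S.M.A.R.T data itself; it shells out to smartctl, so smartmontools must be present on the host before the first run:

Install smartmontools
# Arch Linux
sudo pacman -S --needed smartmontools

# Ubuntu / Debian
sudo apt install -y smartmontools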

Version Management

Stay up to date: Check the Scrutiny releases page for the latest version and update LATEST_VERSION accordingly.
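
If you prefer scripting the check, the latest tag can be read from the GitHub releases API (this only prints the tag; updating LATEST_VERSION in the install script is still a manual edit):

Query the latest release tag
curl -s https://api.github.com/repos/AnalogJ/scrutiny/releases/latest \
  | grep -oP '"tag_name":\s*"\K[^"]+'   # requires GNU grep (-P)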

Systemd Service Configuration

Create the systemd service unit for the collector:

create_systemd_service.sh
# Create systemd service file
sudo tee /etc/systemd/system/scrutiny-collector.service > /dev/null << 'EOF'
[Unit]
Description=Scrutiny Disk Metrics Collector           # (1)!
Documentation=https://github.com/AnalogJ/scrutiny
After=network.target
Wants=network.target

[Service]
Type=oneshot                                          # (2)!
User=root                                             # (3)!
Group=root
Environment=COLLECTOR_API_ENDPOINT=https://scrutiny.benoit.jp.net  # (4)!
Environment=COLLECTOR_HOST_ID=lavie.home.arpa         # (5)!
Environment=COLLECTOR_RUN_STARTUP=True                # (6)!
ExecStart=/usr/local/bin/scrutiny-collector run       # (7)!
StandardOutput=journal
StandardError=journal
TimeoutStartSec=300                                   # (8)!

# Security hardening (optional)
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ReadWritePaths=/tmp                                   # (9)!

[Install]
WantedBy=multi-user.target
EOF

echo "āœ“ Systemd service file created"
  1. Descriptive service name and documentation
  2. One-shot service type for scheduled execution
  3. Root user required for disk access
  4. API endpoint of your Scrutiny hub server
  5. Unique host identifier for this collector
  6. Force a metrics collection run at service startup
  7. Command to execute the collector
  8. 5-minute timeout for disk scanning operations
  9. Security hardening with minimal permissions

Systemd Timer Configuration

Create a systemd timer for scheduled S.M.A.R.T data collection:

create_systemd_timer.sh
# Create systemd timer file
sudo tee /etc/systemd/system/scrutiny-collector.timer > /dev/null << 'EOF'
[Unit]
Description=Run Scrutiny Disk Metrics Collector every 15 minutes  # (1)!
Documentation=https://github.com/AnalogJ/scrutiny
Requires=scrutiny-collector.service                              # (2)!

[Timer]
OnCalendar=*:0/15                                                # (3)!
Persistent=true                                                  # (4)!
RandomizedDelaySec=60                                           # (5)!

[Install]
WantedBy=timers.target                                          # (6)!
EOF

echo "āœ“ Systemd timer file created"
  1. Descriptive timer name and frequency
  2. Dependency on the collector service
  3. Run every 15 minutes (:00, :15, :30, :45)
  4. Execute missed runs after system boot
  5. Random 0-60 second delay to prevent resource conflicts
  6. Enable with system timers

Timer Schedule Options

Alternative schedules for different monitoring needs (each can be previewed with the command shown after this list):

  • Frequent: OnCalendar=*:0/5 (every 5 minutes)
  • Standard: OnCalendar=*:0/15 (every 15 minutes)
  • Conservative: OnCalendar=hourly (every hour)
  • Minimal: OnCalendar=*-*-* 06:00:00 (daily at 6 AM)
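
Any of these expressions can be previewed with systemd-analyze before committing to a schedule; it prints the normalized form and the next elapse time:

Preview an OnCalendar expression
systemd-analyze calendar "*:0/15"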

Enable and start the Scrutiny collector timer:

activate_scrutiny_service.sh
#!/bin/bash

# Step 1: Reload systemd configuration
echo "Reloading systemd configuration..."
sudo systemctl daemon-reload                        # (1)!

# Step 2: Enable timer to start on boot
echo "Enabling scrutiny-collector timer..."
sudo systemctl enable scrutiny-collector.timer     # (2)!

# Step 3: Start the timer immediately
echo "Starting scrutiny-collector timer..."
sudo systemctl start scrutiny-collector.timer      # (3)!

# Step 4: Verify timer status
echo "Checking timer status..."
sudo systemctl status scrutiny-collector.timer --no-pager  # (4)!

# Step 5: View timer schedule
echo "Timer schedule:"
sudo systemctl list-timers scrutiny-collector.timer --no-pager  # (5)!

# Step 6: Test manual execution
echo "Testing manual service execution..."
sudo systemctl start scrutiny-collector.service    # (6)!

echo "āœ“ Scrutiny collector setup completed"
  1. Load new service and timer configurations
  2. Enable automatic startup on system boot
  3. Start timer for immediate scheduling
  4. Check current timer status and health
  5. Display next scheduled execution times
  6. Manual test run to verify functionality

Verification & Monitoring

Verify your Scrutiny collector is working correctly:

verify_scrutiny_operation.sh
#!/bin/bash

echo "=== Scrutiny Collector Status Verification ==="

# Check timer status
echo "1. Timer Status:"
sudo systemctl is-active scrutiny-collector.timer && \
    echo "āœ“ Timer is active" || \
    echo "āœ— Timer is inactive"                        # (1)!

# Check service execution history
echo -e "\n2. Service Execution History:"
sudo journalctl -u scrutiny-collector.service \
    --since "24 hours ago" --no-pager               # (2)!

# View next scheduled runs
echo -e "\n3. Upcoming Schedule:"
sudo systemctl list-timers scrutiny-collector.timer --no-pager  # (3)!

# Test manual execution
echo -e "\n4. Manual Test Run:"
sudo systemctl start scrutiny-collector.service
sleep 5
sudo journalctl -u scrutiny-collector.service -n 10 --no-pager  # (4)!

echo -e "\nāœ“ Verification completed"
  1. Verify timer is running and scheduled
  2. Review last 24 hours of collector executions
  3. Show next scheduled execution times
  4. Manual test with immediate log review

Monitor collector performance and troubleshoot issues:

scrutiny_log_monitoring.sh
# Real-time timer logs
sudo journalctl -u scrutiny-collector.timer -f      # (1)!

# Service execution logs (last 50 entries)
sudo journalctl -u scrutiny-collector.service -n 50 # (2)!

# Filter for errors only
sudo journalctl -u scrutiny-collector.service \
    --since "1 week ago" --grep "ERROR|WARN|FAIL"   # (3)!

# Performance metrics
sudo journalctl -u scrutiny-collector.service \
    --since "today" --grep "duration|ms|seconds"    # (4)!
  1. Real-time monitoring of timer events
  2. Recent service execution details
  3. Filter for error and warning messages
  4. Performance timing information

Configuration Reference

Key Configuration Details

Current setup summary:

  • Timer Schedule: Every 15 minutes (OnCalendar=*:0/15)
  • Persistent Execution: Missed runs execute on system startup
  • Binary Location: /usr/local/bin/scrutiny-collector
  • Service Type: One-shot execution with systemd timer
  • Environment Variables: Configured in service unit file
  • Security: Hardened with minimal filesystem access
Performance Settings

Performance optimization
# Adjust timeout for slow disks
TimeoutStartSec=600

# Reduce system load
Nice=19
IOSchedulingClass=2
IOSchedulingPriority=7

Notification Config

Environment variables
# Webhook notifications
COLLECTOR_NOTIFY_WEBHOOK=https://hooks.slack.com/...

# Custom metadata
COLLECTOR_CUSTOM_TAGS=production,nvme,raid

Network Settings

Connection configuration
# Custom API endpoint
COLLECTOR_API_ENDPOINT=https://scrutiny.domain.com

# Timeout settings
COLLECTOR_TIMEOUT=30
COLLECTOR_RETRY_COUNT=3

Security Options

Additional hardening
# Restrict capabilities
CapabilityBoundingSet=CAP_SYS_ADMIN
AmbientCapabilities=CAP_SYS_ADMIN

# Network isolation
PrivateNetwork=false  # Required for API calls
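
Rather than editing the installed unit file directly, these overrides (including extra Environment= lines) can be placed in a systemd drop-in, which survives later reinstalls of the service file:

Create a drop-in override
sudo systemctl edit scrutiny-collector.service
# In the editor that opens, add for example:
#   [Service]
#   TimeoutStartSec=600
#   Nice=19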
    

Troubleshooting Guide

Common Issues & Solutions

Issue: Collector can't access NVMe devices

Diagnose disk access issues
# Check NVMe device availability
ls -la /dev/nvme*                               # (1)!
sudo nvme list                                  # (2)!

# Verify SMART support
sudo smartctl -i /dev/nvme0n1                   # (3)!
sudo smartctl -a /dev/nvme0n1 | head -20       # (4)!

# Check device permissions
ls -la /dev/sd* /dev/nvme*                      # (5)!
  1. List all NVMe devices
  2. Display NVMe device information
  3. Check SMART capability
  4. Display SMART attributes
  5. Verify device permissions

Issue: Cannot connect to Scrutiny hub

Test network connectivity
# Test API endpoint
curl -v https://scrutiny.benoit.jp.net/api/health  # (1)!

# Test with manual collector run
sudo COLLECTOR_API_ENDPOINT=https://scrutiny.benoit.jp.net \
     COLLECTOR_HOST_ID=test.local \
     COLLECTOR_LOG_LEVEL=DEBUG \
     /usr/local/bin/scrutiny-collector run          # (2)!

# Check DNS resolution
nslookup scrutiny.benoit.jp.net                    # (3)!
  1. Test hub API endpoint availability
  2. Run collector with debug logging
  3. Verify DNS resolution
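
If the endpoint itself is unreachable, also confirm on the Incus host that the hub's proxy device is still in place and listening on the expected address:

Inspect the hub's proxy device
incus config device show scrutiny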

Issue: Timer not executing as expected

Debug timer problems
# Check timer status
sudo systemctl status scrutiny-collector.timer    # (1)!

# List all timers
sudo systemctl list-timers --all | grep scrutiny  # (2)!

# Check service dependencies
sudo systemctl show scrutiny-collector.timer      # (3)!

# Reset timer
sudo systemctl stop scrutiny-collector.timer
sudo systemctl start scrutiny-collector.timer     # (4)!
  1. Check timer service status
  2. List timer states
  3. Verify service dependencies
  4. Reset timer execution
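
If the timer still refuses to fire, checking the unit files for syntax errors often points at the cause:

Verify unit file syntax
sudo systemd-analyze verify /etc/systemd/system/scrutiny-collector.{service,timer}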

Related Documentation

  • Infrastructure Overview - Complete self-hosting architecture
  • System Monitoring Guide - Additional monitoring tools and practices