Scrutiny

Proactive Disk Health Monitoring

Scrutiny is a S.M.A.R.T monitoring tool for hard drives with a web dashboard. It provides proactive disk failure prediction, historical tracking, and alerting to help prevent data loss before it happens.

Why Choose Scrutiny?

  • Proactive monitoring - predict disk failures before they happen
  • Historical tracking - S.M.A.R.T metrics over time
  • Multi-host - centralized dashboard with collector agents
  • Alerting - webhooks and email notifications

Install Hub

Infrastructure as Code

This Scrutiny hub instance is deployed using OpenTofu with custom Incus images. The infrastructure configuration manages container provisioning and persistent storage.

Custom Scrutiny Image

The Scrutiny Incus image is built and maintained at forgejo.benoit.jp.net/Benoit/Laminar.

Infrastructure Configuration

The OpenTofu configuration provisions:

  • Incus instance running the custom Scrutiny Incus image
  • One persistent storage volume:
    • /var/backups/scrutiny - Backup storage
  • HTTP proxy device - Exposes container port 8080 on host port 8097

OpenTofu Configuration
resource "incus_storage_volume" "scrutiny_var_backups_scrutiny" {
  name = "scrutiny_var_backups_scrutiny"
  pool = incus_storage_pool.default.name
}

resource "incus_instance" "scrutiny" {
  name  = "scrutiny"
  image = "laminar.incus:scrutiny-0.8.1-1benoitjpnet"

  device {
    name = "var_backups_scrutiny"
    type = "disk"
    properties = {
      path   = "/var/backups/scrutiny"
      source = incus_storage_volume.scrutiny_var_backups_scrutiny.name
      pool   = incus_storage_pool.default.name
    }
  }

  device {
    name = "http"
    type = "proxy"
    properties = {
      listen  = "tcp:192.168.1.2:8097"
      connect = "tcp:127.0.0.1:8080"
    }
  }
}

Deploy Infrastructure

Apply OpenTofu configuration
tofu apply

The Scrutiny web dashboard will be accessible at http://192.168.1.2:8097.
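
To confirm the deployment before moving on, a quick request against the hub's health endpoint (the same one used in the troubleshooting section below) should return successfully:

Check hub availability
curl -fsS http://192.168.1.2:8097/api/health && echo "āœ“ Scrutiny hub is reachable"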

Upgrade Hub

Incus Image Upgrade

When upgrading to a newer Incus image version, follow these steps to migrate your data.

Backup current instance data before upgrading:

Enter Incus container
incus shell scrutiny

Backup configuration files

cp -r /opt/scrutiny/config/ /var/backups/scrutiny/

Backup InfluxDB database

cp -r /var/lib/influxdb/ /var/backups/scrutiny/
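
Before replacing the instance, it is worth confirming that both directories actually made it into the backup volume:

Verify backup contents
ls -la /var/backups/scrutiny/
du -sh /var/backups/scrutiny/config /var/backups/scrutiny/influxdb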

Deploy the new Incus image and restore the data:

Update OpenTofu configuration

Edit your OpenTofu configuration file to update the image version:

resource "incus_instance" "scrutiny" {
  name  = "scrutiny"
  image = "laminar.incus:scrutiny-NEW-VERSION"  # Update this line
  # ...
}

Apply infrastructure changes

Provision new Incus container
tofu apply

Enter Incus container
incus shell scrutiny

Restore configuration files

cp -r /var/backups/scrutiny/config/ /opt/scrutiny/

Restore InfluxDB database

rsync -av --delete /var/backups/scrutiny/influxdb/ /var/lib/influxdb/

Restart InfluxDB and Scrutiny services

systemctl restart influxdb scrutiny-webapp

Upgrade Complete

Your Scrutiny hub is now running on the new Incus image with all monitoring data intact!
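
A short check from the Incus host confirms the services came back up and the dashboard is answering again; the service names follow the upgrade steps above, and the health endpoint is the same one used in the troubleshooting section below:

Verify the upgraded hub
incus exec scrutiny -- systemctl is-active influxdb scrutiny-webapp
curl -fsS http://192.168.1.2:8097/api/health && echo "āœ“ Hub is healthy after upgrade"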

Install Collector

Collectors can be deployed in several ways; choose the one that fits your environment.

For Incus host servers with direct disk access:

OpenTofu Configuration
resource "incus_instance" "scrutiny-collector" {
  name  = "scrutiny-collector"
  image = "ghcr:analogj/scrutiny:master-collector"

  config = {
    "environment.COLLECTOR_API_ENDPOINT"  = "https://scrutiny.benoit.jp.net"
    "environment.COLLECTOR_CRON_SCHEDULE" = "*/15 * * * *"
    "environment.COLLECTOR_HOST_ID"       = "incus.home.arpa"
    "environment.COLLECTOR_RUN_STARTUP"   = "True"
    "security.privileged"                 = "true"
  }

  # NVMe device passthrough
  device {
    name = "nvme0"
    type = "unix-char"
    properties = {
      source = "/dev/nvme0"
      path   = "/dev/nvme0"
      mode   = "0440"
    }
  }
}

Privileged Container

This collector runs in privileged mode to access disk devices. Only use on trusted systems.
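
The configuration above only passes through an NVMe controller. SATA or SAS drives can be exposed the same way with a unix-block device per disk; /dev/sda below is a placeholder for whatever block devices your host actually has:

Additional disk passthrough (example)
  device {
    name = "sda"
    type = "unix-block"
    properties = {
      source = "/dev/sda"
      path   = "/dev/sda"
      mode   = "0440"
    }
  }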

For Linux systems (Arch Linux, Ubuntu, etc.) using systemd:

This guide sets up the Scrutiny collector agent as a native binary with systemd timer scheduling for automated S.M.A.R.T data collection.

First, download and install the collector binary:

Download & Install Binary

install_scrutiny_collector.sh
#!/bin/bash
# Download the latest scrutiny collector binary
LATEST_VERSION="v0.8.1"  # Check GitHub for latest version

echo "Downloading Scrutiny collector ${LATEST_VERSION}..."
sudo wget -O /usr/local/bin/scrutiny-collector \
  "https://github.com/AnalogJ/scrutiny/releases/download/${LATEST_VERSION}/scrutiny-collector-metrics-linux-amd64" # (1)!

# Make binary executable
sudo chmod +x /usr/local/bin/scrutiny-collector        # (2)!

# Verify installation
/usr/local/bin/scrutiny-collector --help >/dev/null && \
    echo "āœ“ Scrutiny collector installed successfully" || \
    echo "āœ— Installation failed"                       # (3)!
  1. Download latest collector binary for Linux AMD64
  2. Set executable permissions for the binary
  3. Verify installation by testing help command
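
Note that the collector does not read S.M.A.R.T data itself; it shells out to smartctl, so smartmontools must be present on the host before the first run:

Install smartmontools
# Arch Linux
sudo pacman -S --needed smartmontools

# Ubuntu / Debian
sudo apt install -y smartmontools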

Version Management

Stay up to date: Check the Scrutiny releases page for the latest version and update LATEST_VERSION accordingly.
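
If you prefer scripting the check, the latest tag can be read from the GitHub releases API (this only prints the tag; updating LATEST_VERSION in the install script is still a manual edit):

Query the latest release tag
curl -s https://api.github.com/repos/AnalogJ/scrutiny/releases/latest \
  | grep -oP '"tag_name":\s*"\K[^"]+'   # requires GNU grep (-P)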

Systemd Service Configuration

Create the systemd service unit for the collector:

create_systemd_service.sh
# Create systemd service file
sudo tee /etc/systemd/system/scrutiny-collector.service > /dev/null << 'EOF'
[Unit]
Description=Scrutiny Disk Metrics Collector           # (1)!
Documentation=https://github.com/AnalogJ/scrutiny
After=network.target
Wants=network.target

[Service]
Type=oneshot                                          # (2)!
User=root                                             # (3)!
Group=root
Environment=COLLECTOR_API_ENDPOINT=https://scrutiny.benoit.jp.net  # (4)!
Environment=COLLECTOR_HOST_ID=lavie.home.arpa         # (5)!
Environment=COLLECTOR_RUN_STARTUP=True                # (6)!
ExecStart=/usr/local/bin/scrutiny-collector run       # (7)!
StandardOutput=journal
StandardError=journal
TimeoutStartSec=300                                   # (8)!

# Security hardening (optional)
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ReadWritePaths=/tmp                                   # (9)!

[Install]
WantedBy=multi-user.target
EOF

echo "āœ“ Systemd service file created"
  1. Descriptive service name and documentation
  2. One-shot service type for scheduled execution
  3. Root user required for disk access
  4. API endpoint of your Scrutiny hub server
  5. Unique host identifier for this collector
  6. Force a metrics collection run at service startup
  7. Command to execute the collector
  8. 5-minute timeout for disk scanning operations
  9. Security hardening with minimal permissions

Systemd Timer Configuration

Create a systemd timer for scheduled S.M.A.R.T data collection:

create_systemd_timer.sh
# Create systemd timer file
sudo tee /etc/systemd/system/scrutiny-collector.timer > /dev/null << 'EOF'
[Unit]
Description=Run Scrutiny Disk Metrics Collector every 15 minutes  # (1)!
Documentation=https://github.com/AnalogJ/scrutiny
Requires=scrutiny-collector.service                              # (2)!

[Timer]
OnCalendar=*:0/15                                                # (3)!
Persistent=true                                                  # (4)!
RandomizedDelaySec=60                                           # (5)!

[Install]
WantedBy=timers.target                                          # (6)!
EOF

echo "āœ“ Systemd timer file created"
  1. Descriptive timer name and frequency
  2. Dependency on the collector service
  3. Run every 15 minutes (:00, :15, :30, :45)
  4. Execute missed runs after system boot
  5. Random 0-60 second delay to prevent resource conflicts
  6. Enable with system timers

Timer Schedule Options

Alternative schedules for different monitoring needs (each can be previewed with the command shown after this list):

  • Frequent: OnCalendar=*:0/5 (every 5 minutes)
  • Standard: OnCalendar=*:0/15 (every 15 minutes)
  • Conservative: OnCalendar=hourly (every hour)
  • Minimal: OnCalendar=*-*-* 06:00:00 (daily at 6 AM)
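
Any of these expressions can be previewed with systemd-analyze before committing to a schedule; it prints the normalized form and the next elapse time:

Preview an OnCalendar expression
systemd-analyze calendar "*:0/15"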

Enable and start the Scrutiny collector timer:

activate_scrutiny_service.sh
#!/bin/bash

# Step 1: Reload systemd configuration
echo "Reloading systemd configuration..."
sudo systemctl daemon-reload                        # (1)!

# Step 2: Enable timer to start on boot
echo "Enabling scrutiny-collector timer..."
sudo systemctl enable scrutiny-collector.timer     # (2)!

# Step 3: Start the timer immediately
echo "Starting scrutiny-collector timer..."
sudo systemctl start scrutiny-collector.timer      # (3)!

# Step 4: Verify timer status
echo "Checking timer status..."
sudo systemctl status scrutiny-collector.timer --no-pager  # (4)!

# Step 5: View timer schedule
echo "Timer schedule:"
sudo systemctl list-timers scrutiny-collector.timer --no-pager  # (5)!

# Step 6: Test manual execution
echo "Testing manual service execution..."
sudo systemctl start scrutiny-collector.service    # (6)!

echo "āœ“ Scrutiny collector setup completed"
  1. Load new service and timer configurations
  2. Enable automatic startup on system boot
  3. Start timer for immediate scheduling
  4. Check current timer status and health
  5. Display next scheduled execution times
  6. Manual test run to verify functionality

Verification & Monitoring

Verify your Scrutiny collector is working correctly:

verify_scrutiny_operation.sh
#!/bin/bash

echo "=== Scrutiny Collector Status Verification ==="

# Check timer status
echo "1. Timer Status:"
sudo systemctl is-active scrutiny-collector.timer && \
    echo "āœ“ Timer is active" || \
    echo "āœ— Timer is inactive"                        # (1)!

# Check service execution history
echo -e "\n2. Service Execution History:"
sudo journalctl -u scrutiny-collector.service \
    --since "24 hours ago" --no-pager               # (2)!

# View next scheduled runs
echo -e "\n3. Upcoming Schedule:"
sudo systemctl list-timers scrutiny-collector.timer --no-pager  # (3)!

# Test manual execution
echo -e "\n4. Manual Test Run:"
sudo systemctl start scrutiny-collector.service
sleep 5
sudo journalctl -u scrutiny-collector.service -n 10 --no-pager  # (4)!

echo -e "\nāœ“ Verification completed"
  1. Verify timer is running and scheduled
  2. Review last 24 hours of collector executions
  3. Show next scheduled execution times
  4. Manual test with immediate log review

Monitor collector performance and troubleshoot issues:

scrutiny_log_monitoring.sh
# Real-time timer logs
sudo journalctl -u scrutiny-collector.timer -f      # (1)!

# Service execution logs (last 50 entries)
sudo journalctl -u scrutiny-collector.service -n 50 # (2)!

# Filter for errors only
sudo journalctl -u scrutiny-collector.service \
    --since "1 week ago" --grep "ERROR|WARN|FAIL"   # (3)!

# Performance metrics
sudo journalctl -u scrutiny-collector.service \
    --since "today" --grep "duration|ms|seconds"    # (4)!
  1. Real-time monitoring of timer events
  2. Recent service execution details
  3. Filter for error and warning messages
  4. Performance timing information

Configuration Reference

Key Configuration Details

Current setup summary:

  • Timer Schedule: Every 15 minutes (OnCalendar=*:0/15)
  • Persistent Execution: Missed runs execute on system startup
  • Binary Location: /usr/local/bin/scrutiny-collector
  • Service Type: One-shot execution with systemd timer
  • Environment Variables: Configured in service unit file
  • Security: Hardened with minimal filesystem access
Performance Settings

Performance optimization
# Adjust timeout for slow disks
TimeoutStartSec=600

# Reduce system load
Nice=19
IOSchedulingClass=2
IOSchedulingPriority=7

Notification Config

Environment variables
# Webhook notifications
COLLECTOR_NOTIFY_WEBHOOK=https://hooks.slack.com/...

# Custom metadata
COLLECTOR_CUSTOM_TAGS=production,nvme,raid

Network Settings

Connection configuration
# Custom API endpoint
COLLECTOR_API_ENDPOINT=https://scrutiny.domain.com

# Timeout settings
COLLECTOR_TIMEOUT=30
COLLECTOR_RETRY_COUNT=3

Security Options

Additional hardening
# Restrict capabilities
CapabilityBoundingSet=CAP_SYS_ADMIN
AmbientCapabilities=CAP_SYS_ADMIN

# Network isolation
PrivateNetwork=false  # Required for API calls
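
Rather than editing the installed unit file directly, these overrides (including extra Environment= lines) can be placed in a systemd drop-in, which survives later reinstalls of the service file:

Create a drop-in override
sudo systemctl edit scrutiny-collector.service
# In the editor that opens, add for example:
#   [Service]
#   TimeoutStartSec=600
#   Nice=19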
    

Troubleshooting Guide

Common Issues & Solutions

Issue: Collector can't access NVMe devices

Diagnose disk access issues
# Check NVMe device availability
ls -la /dev/nvme*                               # (1)!
sudo nvme list                                  # (2)!

# Verify SMART support
sudo smartctl -i /dev/nvme0n1                   # (3)!
sudo smartctl -a /dev/nvme0n1 | head -20       # (4)!

# Check device permissions
ls -la /dev/sd* /dev/nvme*                      # (5)!
  1. List all NVMe devices
  2. Display NVMe device information
  3. Check SMART capability
  4. Display SMART attributes
  5. Verify device permissions

Issue: Cannot connect to Scrutiny hub

Test network connectivity
# Test API endpoint
curl -v https://scrutiny.benoit.jp.net/api/health  # (1)!

# Test with manual collector run
sudo COLLECTOR_API_ENDPOINT=https://scrutiny.benoit.jp.net \
     COLLECTOR_HOST_ID=test.local \
     COLLECTOR_LOG_LEVEL=DEBUG \
     /usr/local/bin/scrutiny-collector run          # (2)!

# Check DNS resolution
nslookup scrutiny.benoit.jp.net                    # (3)!
  1. Test hub API endpoint availability
  2. Run collector with debug logging
  3. Verify DNS resolution
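
If the endpoint itself is unreachable, also confirm on the Incus host that the hub's proxy device is still in place and listening on the expected address:

Inspect the hub's proxy device
incus config device show scrutiny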

Issue: Timer not executing as expected

Debug timer problems
# Check timer status
sudo systemctl status scrutiny-collector.timer    # (1)!

# List all timers
sudo systemctl list-timers --all | grep scrutiny  # (2)!

# Check service dependencies
sudo systemctl show scrutiny-collector.timer      # (3)!

# Reset timer
sudo systemctl stop scrutiny-collector.timer
sudo systemctl start scrutiny-collector.timer     # (4)!
  1. Check timer service status
  2. List timer states
  3. Verify service dependencies
  4. Reset timer execution
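
If the timer still refuses to fire, checking the unit files for syntax errors often points at the cause:

Verify unit file syntax
sudo systemd-analyze verify /etc/systemd/system/scrutiny-collector.{service,timer}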

Related Documentation

  • Infrastructure Overview - Complete self-hosting architecture
  • System Monitoring Guide - Additional monitoring tools and practices