{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "title: \"Upgrading Fedora's Monitoring\"\n", "subtitle: \"A real Tech Debt story\"\n", "author: \"Greg 'gwmngilfen' Sutcliffe\"\n", "format:\n", " revealjs: \n", " slide-number: true\n", " chalkboard: \n", " buttons: false\n", " preview-links: auto\n", " logo: images/fedora.png\n", " css: styles.css\n", " footer: '[https://fedoraproject.org](https://fedoraproject.org)'\n", "resources:\n", " - demo.pdf\n", "---\n", "\n", "## Abstract\n", "\n", "::: {.incremental}\n", "\n", "- Nagios, radio-carbon-dated to about 10,000 BC\n", "- Zabbix, and why this counts as progress\n", "- Monitoring via configuration management\n", " - Dedicated roles vs application snippets\n", " - Code density (hi Jinja!)\n", "- What are we monitoring again?\n", "\n", ":::\n", "\n", "## `cat me.yaml` {.smaller}\n", "\n", ":::: {.columns}\n", "::: {.column width=\"60%\"}\n", "```{yaml}\n", "- name: Greg 'Gwmngilfen' Sutcliffe\n", "- matrix: @gwmngilfen:fedora.im\n", "- history:\n", " - Senior Sysadmin, Fedora & CentOS\n", " - Community Architect / Data Scientist, Ansible\n", " - Community Architect, Foreman\n", " - 13+ years at Red Hat\n", "- notes:\n", " - likes solving problems\n", " - dislikes taking averages\n", " - plays too many automation games^1^\n", "```\n", ":::\n", "\n", "::: {.column width=\"40%\"}\n", "{width=\"100%\"}\n", ":::\n", "::::\n", "\n", "::: footer\n", "1: You already missed our [automation games talk](https://cfp.cfgmgmtcamp.org/ghent2026/talk/CBHVKV/) - check the recording :P\n", ":::\n", "\n", "## Fedora monitoring, 2024-era\n", "\n", "We're using Nagios, deployed via a monolithic Ansible role\n", "\n", "\n", "\n", "## Fedora monitoring, 2024-era\n", "\n", "We're using Nagios, deployed via a monolithic Ansible role\n", "\n", "``` {.yaml code-line-numbers=\"2|4-13\"}\n", "- name: Copy /etc/nagios/services (RDU3 specific files)\n", " ansible.builtin.copy: src=nagios/services/rdu3_internal/{{ 
item }} dest=/etc/nagios/services/{{ item }}\n", " with_items:\n", " - certgetter.cfg\n", " - db_backups.cfg\n", " - disk.cfg\n", " - fedora_messaging.cfg\n", " - file_age.cfg\n", " - koji.cfg\n", " - locking.cfg\n", " - mailman.cfg\n", " - nrpe.cfg\n", " - pgsql.cfg\n", "```\n", "\n", "::: footer\n", "[roles/nagios_server/tasks/main.yml](https://pagure.io/fedora-infra/ansible)\n", ":::\n", "\n", "## Concerns & Constraints\n", "\n", "::: {.incremental}\n", "- Can only handle OK/WARN/CRIT\n", "- Checks aren't on a given schedule\n", "- Separate collectd instance for trend data\n", "- Monolithic Ansible role\n", "