DNS Analytics
from Zero to 2M
Complete step-by-step guide to a DNS query analytics server on Ubuntu 24.04. BIND9 captures every LAN query. Python streams them into PostgreSQL. Daily, weekly, and monthly rollups give you the traffic intelligence to design caching and peering for an ISP at scale.
How Everything Connects
Teaching Lens
This lesson teaches pipeline thinking instead of command memorization: resolve DNS, capture events, store durable records, then aggregate for decisions.
- You verify resolver behavior before trusting downstream analytics.
- You verify ingest transforms raw logs into structured rows.
- You verify rollups convert raw volume into low-cost summaries.
- You verify faults by debugging left to right across the pipeline (see the sweep sketch below).
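The left-to-right sweep can be scripted. This is a minimal sketch only, assuming the paths and names this guide sets up later (queries.log, the dns-ingestor unit, the dns_analytics database):
#!/usr/bin/env bash
# Pipeline sweep: check each stage left to right and report the first failure.
dig @127.0.0.1 example.com A +short                  || echo "FAIL: resolver"
sudo tail -1 /var/log/named/queries.log              || echo "FAIL: query logging"
systemctl is-active --quiet dns-ingestor             || echo "FAIL: ingestor"
psql -U dns_user -d dns_analytics -h localhost \
     -c "SELECT count(*) FROM dns_queries;"          || echo "FAIL: database"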
Installing & Configuring BIND9
BIND9 acts as a recursive caching resolver for your entire LAN. Every query is logged to the file that feeds our analytics pipeline.
Teaching Lens
This lesson teaches that resolver stability is a data-quality requirement, not just a DNS setup task.
- You verify ACL policy to prevent open-recursion abuse.
- You verify forwarder behavior and upstream response reliability.
- You verify log timestamp quality for parser correctness.
- You verify in order: syntax, service state, query path, and log emission.
1.1 Install BIND9
Update package index
sudo apt-get update
This lesson teaches package metadata refresh as a prerequisite for reliable installation results.
You verify there are no repository errors and the command exits successfully before continuing.
Install BIND9 and tools
sudo apt-get install -y bind9 bind9utils bind9-doc dnsutils
This lesson teaches installing both the resolver service and its diagnostic toolchain in the same step.
You verify named, dig, and named-checkconf are available after installation.
Verify and enable on boot
sudo systemctl status bind9 && sudo systemctl enable bind9
1.2 Configure named.conf.options
sudo cp /etc/bind/named.conf.options /etc/bind/named.conf.options.bak
Before you paste this block, understand its job: this is the resolver policy engine for your network. The acl "trusted" section defines who is allowed to use your resolver recursively, which prevents open-resolver abuse. The forwarders list defines where unresolved queries are sent upstream. Cache and rate-limit settings protect performance during spikes. Read this as "who may ask, where answers come from, and how safely the resolver behaves under load."
// DNS Analytics Server -- 192.168.234.128
acl "trusted" {
    127.0.0.1;
    192.168.0.0/16;
    10.0.0.0/8;
    172.16.0.0/12;
};

options {
    directory "/var/cache/bind";
    listen-on { any; };
    listen-on-v6 { any; };
    allow-query { trusted; };
    allow-recursion { trusted; };
    forwarders { 8.8.8.8; 8.8.4.4; 1.1.1.1; 9.9.9.9; };
    forward only;
    dnssec-validation auto;
    allow-transfer { none; };
    notify no;
    recursion yes;
    max-cache-size 256m;
    min-cache-ttl 60;
    max-cache-ttl 86400;
    rate-limit { responses-per-second 50; window 5; };
    pid-file "/run/named/named.pid";
};
This lesson teaches resolver policy design using ACL, recursion limits, forwarding strategy, and cache controls.
You verify only trusted networks can recurse and external clients are refused.
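A minimal acceptance test for the ACL, sketched here; it assumes a client inside 192.168.0.0/16 for the positive case and some host outside the trusted ranges for the negative one:
# From a trusted LAN client: expect status NOERROR plus an answer section
dig @192.168.234.128 example.com A | grep -E 'status:|ANSWER:'
# From a host outside the trusted ranges: expect status REFUSED
dig @192.168.234.128 example.com A | grep 'status:'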
1.3 Configure Query Logging
This command block prepares the filesystem for reliable telemetry capture. BIND writes logs as the bind service account, so directory ownership is not optional. If ownership is wrong, DNS might still resolve while analytics silently fail because no query lines are written. Treat this as data-pipeline readiness, not just Linux housekeeping.
sudo mkdir -p /var/log/named
sudo chown bind:bind /var/log/named && sudo chmod 755 /var/log/named
BIND9 runs as the bind user. A root-owned log directory fails silently: resolution keeps working, but no query line is ever written and no error is raised.
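A quick ownership audit catches this before it costs you data; a sketch using the path configured above:
stat -c '%U:%G %a %n' /var/log/named     # expect bind:bind 755
sudo -u bind touch /var/log/named/.writetest && \
  sudo rm /var/log/named/.writetest && echo "bind can write"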
This logging block inside /etc/bind/named.conf.local decides which DNS events become analyzable records. The key line is print-time yes, because your parser and time-based rollups depend on accurate timestamps. The category mapping intentionally sends query events to queries.log and suppresses noisy internals with null sinks. You are defining signal versus noise for every lesson that follows.
logging {
    channel queries_log {
        file "/var/log/named/queries.log" versions 7 size 500m;
        severity dynamic;
        print-time yes;        // CRITICAL: timestamp on every line
        print-severity no;
        print-category no;
    };
    channel named_log {
        file "/var/log/named/named.log" versions 4 size 50m;
        severity info;
        print-time yes;
        print-severity yes;
        print-category yes;
    };
    category queries       { queries_log; };
    category query-errors  { queries_log; };
    category default       { named_log; };
    category general       { named_log; };
    category config        { named_log; };
    category network       { named_log; };
    category security      { named_log; };
    category lame-servers  { null; };
    category dnssec        { null; };
    category resolver      { null; };
    category cname         { null; };
    category xfer-in       { null; };
    category xfer-out      { null; };
    category notify        { null; };
    category client        { null; };
    category unmatched     { null; };
    category dispatch      { null; };
    category edns-disabled { null; };
    category rpz           { null; };
    category rate-limit    { null; };
};
1.4 Validate, Restart, Test
This sequence is ordered as a troubleshooting decision tree. First, named-checkconf validates syntax before risking a restart. Second, restarting applies your new policy and logging rules. Third, dig tests real query flow. Last, tail verifies telemetry emission. If any step fails, you know exactly where the break occurred: config parsing, service runtime, DNS path, or logging path.
sudo named-checkconf   # silence = success
sudo systemctl restart bind9
dig @192.168.234.128 google.com A
dig @192.168.234.128 youtube.com AAAA
sudo tail -20 /var/log/named/queries.log
This lesson teaches deterministic validation flow from syntax check to live telemetry confirmation.
You verify dig returns NOERROR and query lines appear in queries.log with timestamps.
BIND9 installed and logging every query to /var/log/named/queries.log.
PostgreSQL — Installation & Schema
Teaching Lens
This lesson teaches event modeling for scale: append-heavy raw storage plus pre-aggregated reporting tables.
- You verify raw tables preserve full fidelity for reprocessing.
- You verify daily, weekly, and monthly rollups reduce query cost.
- You verify indexes match real analytics access patterns.
- You verify category mapping translates domains into business meaning.
2.1 Install
This installation block creates the database runtime used by every later lesson. The package command installs the server and the common-extensions bundle; enabling PostgreSQL at boot ensures persistence after reboots; and the version query proves the server is reachable before any schema work begins. Treat this as a platform readiness gate, not just package setup.
sudo apt-get install -y postgresql postgresql-contrib
sudo systemctl enable postgresql
sudo -u postgres psql -c "SELECT version();"
This lesson teaches database runtime setup before schema creation and ingestion onboarding.
You verify version output is returned and PostgreSQL is enabled at boot.
2.2 Create Database and User
This SQL block establishes least-privilege access. You create a dedicated login role for the ingestor, then bind the analytics database ownership to that role so application writes stay scoped to the project. Avoid using the postgres superuser for daily ingestion because privilege boundaries are part of operational safety and incident containment.
CREATE USER dns_user WITH PASSWORD 'ChangeThisPassword!'
    NOSUPERUSER NOCREATEDB NOCREATEROLE LOGIN;
CREATE DATABASE dns_analytics OWNER dns_user
    ENCODING 'UTF8' LC_COLLATE 'en_US.UTF-8' LC_CTYPE 'en_US.UTF-8'
    TEMPLATE template0;
GRANT ALL PRIVILEGES ON DATABASE dns_analytics TO dns_user;
\q
2.3 Schema
This lesson teaches connection and ownership validation before schema creation. You verify that the application role can access the target database directly before any table definitions are applied.
psql -U dns_user -d dns_analytics -h localhost
This lesson teaches the full analytics data model from raw events to reporting summaries. You verify that partitioned ingestion, rollup tables, and category metadata work together as one coherent decision-support schema.
-- Raw query log, partitioned by day
CREATE TABLE dns_queries (
    id             BIGSERIAL,
    queried_at     TIMESTAMPTZ NOT NULL,
    client_ip      INET NOT NULL,
    client_port    INTEGER,
    domain         TEXT NOT NULL,
    apex_domain    TEXT NOT NULL,
    qtype          VARCHAR(10) NOT NULL,
    flags          TEXT,
    response_rcode TEXT,
    cached         BOOLEAN DEFAULT FALSE,
    inserted_at    TIMESTAMPTZ DEFAULT now()
) PARTITION BY RANGE (queried_at);

CREATE TABLE dns_queries_today PARTITION OF dns_queries
    FOR VALUES FROM (CURRENT_DATE) TO (CURRENT_DATE + INTERVAL '1 day');

-- Per-domain count per day
CREATE TABLE dns_daily_stats (
    stat_date      DATE NOT NULL,
    domain         TEXT NOT NULL,
    apex_domain    TEXT NOT NULL,
    qtype          VARCHAR(10) NOT NULL,
    hit_count      BIGINT NOT NULL DEFAULT 0,
    unique_clients INTEGER NOT NULL DEFAULT 0,
    nxdomain_count INTEGER NOT NULL DEFAULT 0,
    servfail_count INTEGER NOT NULL DEFAULT 0,
    first_seen     TIMESTAMPTZ,
    last_seen      TIMESTAMPTZ,
    PRIMARY KEY (stat_date, domain, qtype)
);

CREATE TABLE dns_weekly_stats (
    week_start     DATE NOT NULL,
    week_end       DATE NOT NULL,
    apex_domain    TEXT NOT NULL,
    total_queries  BIGINT NOT NULL DEFAULT 0,
    unique_clients INTEGER NOT NULL DEFAULT 0,
    peak_day       DATE,
    peak_day_count BIGINT,
    category       TEXT,
    PRIMARY KEY (week_start, apex_domain)
);

CREATE TABLE dns_monthly_stats (
    year_month     CHAR(7) NOT NULL,
    apex_domain    TEXT NOT NULL,
    total_queries  BIGINT NOT NULL DEFAULT 0,
    unique_clients INTEGER NOT NULL DEFAULT 0,
    avg_daily      NUMERIC(12,2),
    rank_in_month  INTEGER,
    category       TEXT,
    PRIMARY KEY (year_month, apex_domain)
);

CREATE TABLE dns_hourly_heatmap (
    stat_date      DATE NOT NULL,
    hour_of_day    SMALLINT NOT NULL CHECK (hour_of_day BETWEEN 0 AND 23),
    apex_domain    TEXT NOT NULL,
    query_count    BIGINT NOT NULL DEFAULT 0,
    PRIMARY KEY (stat_date, hour_of_day, apex_domain)
);

CREATE TABLE domain_categories (
    apex_domain    TEXT PRIMARY KEY,
    category       TEXT NOT NULL,
    provider       TEXT,
    cache_priority SMALLINT DEFAULT 5,
    tagged_at      TIMESTAMPTZ DEFAULT now()
);

INSERT INTO domain_categories VALUES
('google.com','search','Google',3,now()),('googleapis.com','cdn','Google',2,now()),
('youtube.com','streaming','Google',1,now()),('googlevideo.com','streaming','Google',1,now()),
('netflix.com','streaming','Netflix',1,now()),('nflxvideo.net','streaming','Netflix',1,now()),
('nflximg.net','cdn','Netflix',2,now()),('akamai.net','cdn','Akamai',1,now()),
('akamaiedge.net','cdn','Akamai',1,now()),('cloudflare.com','cdn','Cloudflare',2,now()),
('facebook.com','social','Meta',4,now()),('instagram.com','social','Meta',4,now()),
('whatsapp.net','messaging','Meta',3,now()),('whatsapp.com','messaging','Meta',3,now()),
('twitter.com','social','X',4,now()),('twimg.com','cdn','X',2,now()),
('tiktok.com','social','TikTok',4,now()),('microsoft.com','work','Microsoft',5,now()),
('office.com','work','Microsoft',5,now()),('windowsupdate.com','updates','Microsoft',8,now()),
('ubuntu.com','updates','Canonical',8,now()),('amazonaws.com','cloud','AWS',5,now()),
('fastly.net','cdn','Fastly',2,now()),('cloudfront.net','cdn','AWS',2,now()),
('doubleclick.net','ads','Google',9,now()),('googlesyndication.com','ads','Google',9,now());
COMMIT;
This lesson teaches separation of raw event capture and summary analytics for scale and clarity.
You verify tables, constraints, and seed category inserts all succeed.
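One way to confirm the DDL landed as intended, sketched with psql one-liners (object names from the schema above):
# Expect the raw table, its partition, four summary tables, and domain_categories
psql -U dns_user -d dns_analytics -h localhost -c "\dt"
# Expect 26 seeded category rows
psql -U dns_user -d dns_analytics -h localhost -c "SELECT count(*) FROM domain_categories;"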
2.4 Indexes
This indexing block encodes your expected access paths. Time-descending indexes support recent-activity views, apex-domain indexes support ranking and filtering, and trigram search enables flexible domain text matching at scale. Indexes are not optional decoration; they are what keeps reporting latency predictable as data volume grows.
CREATE INDEX idx_q_time   ON dns_queries (queried_at DESC);
CREATE INDEX idx_q_apex   ON dns_queries (apex_domain, queried_at DESC);
CREATE INDEX idx_q_client ON dns_queries (client_ip, queried_at DESC);
CREATE INDEX idx_d_apex   ON dns_daily_stats (apex_domain, stat_date DESC);
CREATE INDEX idx_d_hits   ON dns_daily_stats (stat_date, hit_count DESC);
CREATE INDEX idx_w_domain ON dns_weekly_stats (apex_domain, week_start DESC);
CREATE INDEX idx_m_domain ON dns_monthly_stats (apex_domain, year_month DESC);
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_d_trgm   ON dns_daily_stats USING gin (apex_domain gin_trgm_ops);
COMMIT;
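To confirm the planner can actually use these indexes for the intended access path, an EXPLAIN spot-check helps; a sketch (plans on near-empty tables may still choose sequential scans):
psql -U dns_user -d dns_analytics -h localhost -c \
  "EXPLAIN SELECT * FROM dns_daily_stats
    WHERE apex_domain = 'youtube.com'
    ORDER BY stat_date DESC LIMIT 7;"
# expect an Index Scan using idx_d_apex once rows exist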
Five-table schema built, indexed, 26 domain categories seeded.
The Log Ingestor — Python Daemon
Teaching Lens
This lesson teaches ingestion as a resilient ETL loop: read, parse, normalize, and write.
- You verify parser tolerance against noisy log variants.
- You verify batch settings balance latency and throughput.
- You verify systemd restart behavior after failures and reboots.
- You verify failures in order: file access, parse rate, then database auth.
3.1 Python Setup
This setup block creates a reproducible runtime for ingestion code. System packages install Python tooling, the dedicated directory isolates operational files, the virtual environment freezes dependency scope, and psycopg2-binary provides PostgreSQL connectivity. Reproducible environments reduce drift and make debugging repeatable across hosts.
sudo apt-get install -y python3 python3-pip python3-venv
sudo mkdir -p /opt/dns-ingestor && sudo chown $USER:$USER /opt/dns-ingestor
cd /opt/dns-ingestor && python3 -m venv venv
source venv/bin/activate && pip install psycopg2-binary
This lesson teaches dependency isolation through a dedicated Python virtual environment.
You verify the virtual environment works and psycopg2-binary is installed inside it.
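A one-line sanity check that the venv interpreter sees the driver:
/opt/dns-ingestor/venv/bin/python3 -c "import psycopg2; print(psycopg2.__version__)"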
3.2 The Script
This edit command opens the ingestor source file location used by systemd later. Keeping the script in a stable operational path means your service unit, logs, and maintenance procedures all point to one canonical implementation instead of ad-hoc copies.
nano /opt/dns-ingestor/ingestor.py
This Python block implements a resilient ETL loop. It parses resolver logs into structured fields, reconnects on transient database failures, batches inserts for throughput efficiency, and continuously tails rotating log files. Read it as a production loop: input normalization, controlled writes, observability, and graceful restart behavior.
#!/usr/bin/env python3
import re, time, logging, os, signal, sys
from datetime import datetime
import psycopg2
from psycopg2.extras import execute_values
DB = {'dbname':'dns_analytics','user':'dns_user',
'password':'ChangeThisPassword!','host':'127.0.0.1','port':5432}
LOG_FILE = '/var/log/named/queries.log'
BATCH_SIZE = 100
FLUSH_INTERVAL = 5
QUERY_RE = re.compile(
r'^(?P<ts>\d{2}-\w{3}-\d{4} \d{2}:\d{2}:\d{2})\.\d+'
r'\s+client\s+(?:@\S+\s+)?(?P<ip>[\d\.a-fA-F:]+)#(?P<port>\d+)'
r'\s+\([^)]+\):\s+query:\s+(?P<domain>[\w.\-]+)\s+IN\s+(?P<qtype>\w+)'
r'\s+(?P<flags>[+\-\w\s]*?)(?:\s+\([\d.]+\))?$'
)
def apex(d):
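    # NOTE: naive two-label apex; multi-label public suffixes (e.g. bbc.co.uk -> co.uk) collapse incorrectly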
p = d.rstrip('.').split('.')
return '.'.join(p[-2:]) if len(p)>=2 else d
def parse_ts(s):
try: return datetime.strptime(s, '%d-%b-%Y %H:%M:%S')
    except ValueError: return datetime.utcnow()
def parse(line):
line = line.strip()
if not line: return None
m = QUERY_RE.match(line)
if not m: return None
d = m.groupdict()
dom = d.get('domain','').lower().rstrip('.')
if not dom: return None
flags = d.get('flags','').strip()
return {'queried_at':parse_ts(d['ts']),'client_ip':d.get('ip','0.0.0.0'),
'client_port':int(d.get('port',0)),'domain':dom,'apex_domain':apex(dom),
'qtype':d.get('qtype','A').upper(),'flags':flags,
'response_rcode':'NOERROR','cached':'CL' in flags}
def connect():
while True:
try:
c=psycopg2.connect(**DB); c.autocommit=False
logging.info('DB connected'); return c
except psycopg2.OperationalError as e:
logging.error(f'DB: {e}'); time.sleep(10)
def flush(conn, batch):
if not batch: return 0
sql = ('INSERT INTO dns_queries '
'(queried_at,client_ip,client_port,domain,apex_domain,'
'qtype,flags,response_rcode,cached) VALUES %s ON CONFLICT DO NOTHING')
rows = [(r['queried_at'],r['client_ip'],r['client_port'],r['domain'],
r['apex_domain'],r['qtype'],r['flags'],r['response_rcode'],r['cached'])
for r in batch]
try:
with conn.cursor() as cur: execute_values(cur,sql,rows,page_size=500)
conn.commit(); return len(batch)
except psycopg2.Error as e:
logging.error(f'Insert: {e}'); conn.rollback(); return 0
def run(conn):
batch,last_flush,total,inode,f = [],time.time(),0,0,None
logging.info(f'Watching {LOG_FILE}')
while True:
try: cur_inode=os.stat(LOG_FILE).st_ino
except FileNotFoundError: time.sleep(5); continue
        if cur_inode != inode:
            try:
                nf = open(LOG_FILE,'r')
                if f: f.close()      # rotated: read the new file from the start
                else: nf.seek(0,2)   # first attach: tail from the end
                f,inode = nf,cur_inode
            except IOError: time.sleep(5); continue
while True:
line = f.readline()
if not line: break
rec = parse(line)
if rec: batch.append(rec)
now = time.time()
if len(batch)>=BATCH_SIZE or (batch and now-last_flush>=FLUSH_INTERVAL):
n=flush(conn,batch); total+=n
logging.info(f'Flushed {n} | total {total:,}')
batch,last_flush=[],now
time.sleep(0.5)
if __name__=='__main__':
os.makedirs('/var/log/dns-ingestor',exist_ok=True)
logging.basicConfig(level=logging.INFO,
format='%(asctime)s [%(levelname)s] %(message)s',
handlers=[logging.StreamHandler(sys.stdout),
logging.FileHandler('/var/log/dns-ingestor/ingestor.log')])
def bye(s,f): sys.exit(0)
signal.signal(signal.SIGTERM,bye); signal.signal(signal.SIGINT,bye)
run(connect())
This lesson teaches resilient tail-and-batch ingestion with reconnect behavior for transient failures.
You verify repeated flush logs and increasing database row counts during query activity.
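A hedged growth check: run this twice, a minute apart, while clients are generating queries; the count must increase between runs.
psql -U dns_user -d dns_analytics -h localhost \
  -c "SELECT count(*) AS total_rows, max(queried_at) AS newest FROM dns_queries;"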
3.3 Systemd Service
This unit file turns the script into a managed service; save it as /etc/systemd/system/dns-ingestor.service. Dependency ordering waits for network and database availability, restart policy recovers from failures, and fixed working/executable paths make startup deterministic. Systemd is the reliability contract that keeps ingestion alive without manual supervision.
[Unit]
Description=DNS Query Log Ingestor -- code::core
After=network.target postgresql.service bind9.service
Requires=postgresql.service

[Service]
Type=simple
User=root
WorkingDirectory=/opt/dns-ingestor
ExecStart=/opt/dns-ingestor/venv/bin/python3 /opt/dns-ingestor/ingestor.py
Restart=always
RestartSec=10s
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
These control commands register the new unit, start it immediately, and open a live log stream for first-run validation. Running them in order ensures the service definition is reloaded before activation, which prevents stale unit metadata from hiding configuration mistakes.
sudo systemctl daemon-reload
sudo systemctl enable --now dns-ingestor
sudo journalctl -u dns-ingestor -f
This lesson teaches service activation workflow from unit reload to live log observation.
You verify dns-ingestor is active and not entering a crash loop.
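Three hedged liveness probes; the NRestarts property assumes systemd v235 or later (Ubuntu 24.04 qualifies):
systemctl is-active dns-ingestor                 # expect: active
systemctl show dns-ingestor -p NRestarts         # expect: NRestarts=0, stable across checks
sudo journalctl -u dns-ingestor --since "-5 min" | grep -ic traceback   # expect: 0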
3.4 End-to-End Test
This test block deliberately generates DNS traffic, waits for the batch flush window, then inspects the newest stored rows. It validates that resolver output, parser logic, and database writes are functioning as one pipeline. When this check passes, your telemetry path is proven, not assumed.
for d in google.com youtube.com netflix.com instagram.com microsoft.com; do
dig @192.168.234.128 $d A +short
done
sleep 6
psql -U dns_user -d dns_analytics -h localhost \
-c "SELECT domain,qtype,queried_at FROM dns_queries ORDER BY queried_at DESC LIMIT 10;"BIND9 → Python → PostgreSQL. Every LAN DNS query is now captured and stored.
Rollup Jobs — Daily, Weekly & Monthly
All scripts use ON CONFLICT DO UPDATE — idempotent, safe to rerun at any time.
Teaching Lens
This lesson teaches rollups as a performance strategy: pre-compute stable summaries instead of repeatedly scanning raw volume.
- You verify daily rollups support operational reporting.
- You verify weekly rollups support trend and planning analysis.
- You verify monthly rollups support forecasting and executive views.
- You verify idempotent SQL enables safe rerun-based recovery.
This command creates a dedicated directory for rollup SQL artifacts. Keeping scheduled SQL in a single controlled location improves auditability and makes cron references explicit and maintainable.
sudo mkdir -p /opt/dns-ingestor/sql
4.1 Daily Rollup
This daily rollup computes yesterday's operational summary and hourly distribution from raw events. The time-zone conversion anchors reporting to local business time, and conflict updates make reruns safe by replacing prior values for the same key window. This is your primary daily telemetry compression stage.
-- Run at 01:00 every morning
INSERT INTO dns_daily_stats
(stat_date,domain,apex_domain,qtype,hit_count,unique_clients,
nxdomain_count,servfail_count,first_seen,last_seen)
SELECT
DATE(queried_at AT TIME ZONE 'Africa/Kampala') AS stat_date,
domain, apex_domain, qtype,
COUNT(*), COUNT(DISTINCT client_ip),
COUNT(*) FILTER (WHERE response_rcode='NXDOMAIN'),
COUNT(*) FILTER (WHERE response_rcode='SERVFAIL'),
MIN(queried_at), MAX(queried_at)
FROM dns_queries
WHERE queried_at >= (CURRENT_DATE-INTERVAL '1 day') AT TIME ZONE 'Africa/Kampala'
AND queried_at < CURRENT_DATE AT TIME ZONE 'Africa/Kampala'
GROUP BY 1,domain,apex_domain,qtype
ON CONFLICT (stat_date,domain,qtype) DO UPDATE SET
hit_count=EXCLUDED.hit_count, unique_clients=EXCLUDED.unique_clients,
nxdomain_count=EXCLUDED.nxdomain_count, servfail_count=EXCLUDED.servfail_count,
first_seen=EXCLUDED.first_seen, last_seen=EXCLUDED.last_seen;
INSERT INTO dns_hourly_heatmap (stat_date,hour_of_day,apex_domain,query_count)
SELECT DATE(queried_at AT TIME ZONE 'Africa/Kampala'),
EXTRACT(HOUR FROM queried_at AT TIME ZONE 'Africa/Kampala')::SMALLINT,
apex_domain, COUNT(*)
FROM dns_queries
WHERE queried_at >= (CURRENT_DATE-INTERVAL '1 day') AT TIME ZONE 'Africa/Kampala'
AND queried_at < CURRENT_DATE AT TIME ZONE 'Africa/Kampala'
GROUP BY 1,2,3
ON CONFLICT (stat_date,hour_of_day,apex_domain) DO UPDATE SET query_count=EXCLUDED.query_count;
COMMIT;
This lesson teaches idempotent daily rollups for stable metrics and hourly demand summaries.
You verify rerunning the job updates the same window without duplicate counts.
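A sketch of the rerun proof: snapshot the window total, rerun the job, snapshot again; the two numbers must match. The file path assumes you saved the SQL above as rollup_daily.sql.
psql -U dns_user -d dns_analytics -h localhost \
  -c "SELECT sum(hit_count) FROM dns_daily_stats WHERE stat_date = CURRENT_DATE - 1;"
psql -U dns_user -d dns_analytics -h localhost -f /opt/dns-ingestor/sql/rollup_daily.sql
psql -U dns_user -d dns_analytics -h localhost \
  -c "SELECT sum(hit_count) FROM dns_daily_stats WHERE stat_date = CURRENT_DATE - 1;"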
4.2 Weekly Rollup
This weekly rollup aggregates daily records into planning-level metrics. It computes total demand, peak day, and optional category context per apex domain, which is useful for capacity and peering decisions. The CTE structure separates total calculations from peak detection so each step remains understandable and testable.
-- Run Monday 02:00
INSERT INTO dns_weekly_stats
(week_start,week_end,apex_domain,total_queries,unique_clients,peak_day,peak_day_count,category)
WITH totals AS (
SELECT DATE_TRUNC('week',stat_date)::DATE AS week_start,
(DATE_TRUNC('week',stat_date)+INTERVAL '6 days')::DATE AS week_end,
apex_domain, SUM(hit_count) AS total_queries, MAX(unique_clients) AS unique_clients
FROM dns_daily_stats
WHERE stat_date >= DATE_TRUNC('week',CURRENT_DATE-INTERVAL '7 days')::DATE
AND stat_date < DATE_TRUNC('week',CURRENT_DATE)::DATE
GROUP BY 1,2,3
), peaks AS (
SELECT DATE_TRUNC('week',stat_date)::DATE AS week_start, apex_domain,
stat_date AS peak_day, SUM(hit_count) AS day_total,
RANK() OVER (PARTITION BY DATE_TRUNC('week',stat_date),apex_domain
ORDER BY SUM(hit_count) DESC) AS rnk
FROM dns_daily_stats
WHERE stat_date >= DATE_TRUNC('week',CURRENT_DATE-INTERVAL '7 days')::DATE
AND stat_date < DATE_TRUNC('week',CURRENT_DATE)::DATE
GROUP BY 1,2,3
)
SELECT t.week_start,t.week_end,t.apex_domain,t.total_queries,t.unique_clients,
p.peak_day,p.day_total,dc.category
FROM totals t
LEFT JOIN peaks p ON p.week_start=t.week_start AND p.apex_domain=t.apex_domain AND p.rnk=1
LEFT JOIN domain_categories dc ON dc.apex_domain=t.apex_domain
ON CONFLICT (week_start,apex_domain) DO UPDATE SET
total_queries=EXCLUDED.total_queries,unique_clients=EXCLUDED.unique_clients,
peak_day=EXCLUDED.peak_day,peak_day_count=EXCLUDED.peak_day_count;
COMMIT;
4.3 Monthly Rollup
This monthly rollup produces executive-scale trends with totals, average daily volume, and rank ordering. Because it reads from daily summaries instead of raw logs, it stays efficient while preserving consistent semantics. Ranking within month helps highlight dominant domains and shifting traffic priorities over time.
-- Run 1st of month 03:00
INSERT INTO dns_monthly_stats (year_month,apex_domain,total_queries,unique_clients,avg_daily,rank_in_month,category)
WITH m AS (
SELECT TO_CHAR(stat_date,'YYYY-MM') AS year_month, apex_domain,
SUM(hit_count) AS total_queries, MAX(unique_clients) AS unique_clients,
ROUND(AVG(hit_count),2) AS avg_daily
FROM dns_daily_stats
WHERE TO_CHAR(stat_date,'YYYY-MM')=TO_CHAR(CURRENT_DATE-INTERVAL '1 month','YYYY-MM')
GROUP BY 1,2
)
SELECT m.year_month,m.apex_domain,m.total_queries,m.unique_clients,m.avg_daily,
RANK() OVER (PARTITION BY m.year_month ORDER BY m.total_queries DESC), dc.category
FROM m LEFT JOIN domain_categories dc ON dc.apex_domain=m.apex_domain
ON CONFLICT (year_month,apex_domain) DO UPDATE SET
total_queries=EXCLUDED.total_queries,avg_daily=EXCLUDED.avg_daily,rank_in_month=EXCLUDED.rank_in_month;
COMMIT;
4.4 Cron Schedule
This cron block operationalizes the rollup lifecycle. It schedules daily, weekly, and monthly jobs with log capture for post-run diagnostics, and pre-creates tomorrow's partition to avoid insert failures at day boundaries. Scheduling converts analytics from manual scripts into dependable operations.
# DNS Analytics Rollups
0 1 * * * psql -U dns_user -d dns_analytics -h localhost -f /opt/dns-ingestor/sql/rollup_daily.sql >> /var/log/dns-ingestor/daily.log 2>&1
0 2 * * 1 psql -U dns_user -d dns_analytics -h localhost -f /opt/dns-ingestor/sql/rollup_weekly.sql >> /var/log/dns-ingestor/weekly.log 2>&1
0 3 1 * * psql -U dns_user -d dns_analytics -h localhost -f /opt/dns-ingestor/sql/rollup_monthly.sql >> /var/log/dns-ingestor/monthly.log 2>&1
# Create tomorrow's partition at 23:30
30 23 * * * psql -U dns_user -d dns_analytics -h localhost -c "CREATE TABLE IF NOT EXISTS dns_queries_$(date -d tomorrow +\%Y\%m\%d) PARTITION OF dns_queries FOR VALUES FROM (CURRENT_DATE+INTERVAL '1 day') TO (CURRENT_DATE+INTERVAL '2 days');" >> /var/log/dns-ingestor/partition.log 2>&1
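Before trusting the 23:30 job, you can list the partitions and their bounds straight from the catalog; a sketch:
psql -U dns_user -d dns_analytics -h localhost -c \
  "SELECT c.relname, pg_get_expr(c.relpartbound, c.oid) AS bounds
     FROM pg_inherits i
     JOIN pg_class c ON c.oid = i.inhrelid
    WHERE i.inhparent = 'dns_queries'::regclass;"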
SQL Analytics — Reading Your Traffic
Teaching Lens
This lesson teaches SQL as decision logic for peering, caching, abuse detection, and customer experience.
- You verify top domains to identify concentration risk.
- You verify category mix to understand bandwidth destinations.
- You verify hourly heatmaps to plan cache warm-up windows.
- You verify NXDOMAIN rates to detect misconfigurations or bot noise.
5.1 Top 50 Domains (7-Day)
This query measures concentration: which apex domains consume most resolver capacity over the last week. Percentage and average-per-day columns turn raw counts into comparative signals you can use for cache policy, upstream negotiation, and anomaly triage.
SELECT ds.apex_domain, dc.category, dc.provider, dc.cache_priority,
       SUM(ds.hit_count) AS total_queries,
       ROUND(SUM(ds.hit_count)*100.0/SUM(SUM(ds.hit_count)) OVER(),2) AS pct,
       ROUND(AVG(ds.hit_count),0) AS avg_per_day
FROM dns_daily_stats ds
LEFT JOIN domain_categories dc ON dc.apex_domain = ds.apex_domain
WHERE ds.stat_date >= CURRENT_DATE - INTERVAL '7 days'
GROUP BY ds.apex_domain, dc.category, dc.provider, dc.cache_priority
ORDER BY total_queries DESC
LIMIT 50;
This lesson teaches concentration analysis to identify domains that dominate resolver workload.
You verify percentages are coherent and leaders match expected user behavior.
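A coherence sketch: computed over all domains rather than just the top 50, the pct column should sum to roughly 100.
psql -U dns_user -d dns_analytics -h localhost -c \
  "SELECT ROUND(SUM(pct),1) AS pct_total FROM (
     SELECT SUM(hit_count)*100.0/SUM(SUM(hit_count)) OVER () AS pct
       FROM dns_daily_stats
      WHERE stat_date >= CURRENT_DATE - INTERVAL '7 days'
      GROUP BY apex_domain) t;"
# expect ~100.0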
5.2 By Category
This category query collapses domain-level noise into service classes such as streaming, social, and work. It helps learners see how traffic mix reflects subscriber behavior and where optimization effort should be focused at a service-class level rather than individual hostnames.
SELECT COALESCE(dc.category,'uncategorised') AS category,
       COUNT(DISTINCT ds.apex_domain) AS domains,
       SUM(ds.hit_count) AS total_queries,
       ROUND(SUM(ds.hit_count)*100.0/SUM(SUM(ds.hit_count)) OVER(),2) AS pct
FROM dns_daily_stats ds
LEFT JOIN domain_categories dc ON dc.apex_domain = ds.apex_domain
WHERE ds.stat_date >= CURRENT_DATE - INTERVAL '7 days'
GROUP BY 1
ORDER BY total_queries DESC;
5.3 24-Hour Heatmap
This heatmap query reveals diurnal demand patterns by hour. The generated bar column is a quick visual proxy for load intensity, useful for identifying cache warm-up windows, maintenance windows, and periods where resolver stress is highest.
SELECT LPAD(h.hour_of_day::TEXT,2,'0')||':00' AS hour,
       SUM(h.query_count) AS queries,
       REPEAT('█',(SUM(h.query_count)/NULLIF(MAX(SUM(h.query_count)) OVER()/40,0))::INT) AS bar
FROM dns_hourly_heatmap h
WHERE h.stat_date >= CURRENT_DATE - INTERVAL '7 days'
GROUP BY h.hour_of_day
ORDER BY h.hour_of_day;
5.4 NXDOMAIN Analysis
This failure-focused query identifies domains with significant NXDOMAIN rates, a common signal for typos, stale configs, blocked telemetry, or bot behavior. Ranking by failure percentage helps prioritize remediation where user impact or wasted resolver effort is highest.
SELECT apex_domain,
       SUM(hit_count) AS total,
       SUM(nxdomain_count) AS failures,
       ROUND(SUM(nxdomain_count)*100.0/NULLIF(SUM(hit_count),0),1) AS fail_pct
FROM dns_daily_stats
WHERE stat_date >= CURRENT_DATE - INTERVAL '7 days'
  AND nxdomain_count > 0
GROUP BY apex_domain
HAVING SUM(nxdomain_count) > 10
ORDER BY fail_pct DESC
LIMIT 30;
Scaling to 2 Million Subscribers
2M subs = 600M queries/day = 7,000 qps sustained. Text log files break at this scale. The data model stays identical; the transport layer changes.
Teaching Lens
This lesson teaches scale migration without changing metric semantics, so reports stay comparable from lab to production.
- You verify transport migration from text logs to binary event streams.
- You verify ingest migration from single process to distributed workers.
- You verify database migration to time-series optimization patterns.
- You verify privacy controls before volume growth increases risk.
6.1 Production Architecture
| Layer | VM (this guide) | Production (2M subs) |
|---|---|---|
| DNS daemon | BIND9 text log | Unbound x8 + dnstap |
| Transport | File on disk | Kafka 3-broker cluster |
| Ingestor | 1 Python process | Faust workers x8 |
| Database | PostgreSQL 16 | TimescaleDB on NVMe RAID |
| Volume | ~50K/day | 600M/day (7K qps) |
6.2 TimescaleDB
This command block installs time-series database capabilities for high-volume retention and query efficiency. Tuning adjusts PostgreSQL settings for Timescale workloads, and restarting applies those changes. This is the foundation for scaling from lab-sized daily volumes to sustained production ingestion.
sudo apt-get install -y timescaledb-2-postgresql-16
sudo timescaledb-tune --quiet --yes && sudo systemctl restart postgresql
This SQL block enables hypertable behavior, compression, and continuous aggregates while keeping your logical schema intact. It changes storage and refresh mechanics, not metric meaning, so dashboards and lessons remain comparable before and after scale transition.
CREATE EXTENSION IF NOT EXISTS timescaledb;

SELECT create_hypertable('dns_queries','queried_at',
    chunk_time_interval => INTERVAL '1 day', if_not_exists => TRUE);

ALTER TABLE dns_queries SET (timescaledb.compress,
    timescaledb.compress_orderby = 'queried_at DESC',
    timescaledb.compress_segmentby = 'apex_domain');

SELECT add_compression_policy('dns_queries', INTERVAL '7 days');

CREATE MATERIALIZED VIEW dns_daily_live WITH (timescaledb.continuous) AS
SELECT time_bucket('1 day', queried_at) AS bucket,
       apex_domain, qtype,
       COUNT(*) AS hit_count,
       COUNT(DISTINCT client_ip) AS unique_clients
FROM dns_queries
GROUP BY bucket, apex_domain, qtype;

SELECT add_continuous_aggregate_policy('dns_daily_live',
    start_offset => INTERVAL '3 days', end_offset => INTERVAL '1 hour',
    schedule_interval => INTERVAL '1 hour');
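Two hedged verification queries against the TimescaleDB informational views available in the 2.x series:
# Chunks created for the hypertable
psql -U dns_user -d dns_analytics -h localhost -c \
  "SELECT hypertable_name, count(*) AS chunks
     FROM timescaledb_information.chunks GROUP BY 1;"
# Background jobs: compression policy and continuous-aggregate refresh
psql -U dns_user -d dns_analytics -h localhost -c \
  "SELECT job_id, application_name FROM timescaledb_information.jobs;"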
6.3 dnstap
This configuration switches resolver event export from text logs to structured binary messages. At large scale, dnstap reduces parsing overhead and improves event fidelity for downstream stream processors. Enabling both query and response messages preserves context needed for richer analytics and troubleshooting.
dnstap:
dnstap-enable: yes
dnstap-socket-path: "/var/run/unbound/dnstap.sock"
dnstap-log-client-query-messages: yes
dnstap-log-client-response-messages: yes
Store only the /24 subnet, not the full /32 client IP. Aggregated domain counts contain no PII.
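A sketch of the /24 rule inside PostgreSQL using the built-in inet functions; in production you would anonymize at ingest time rather than rewriting a large table in place:
# Truncate stored IPv4 client addresses to their /24 network (irreversible)
psql -U dns_user -d dns_analytics -h localhost -c \
  "UPDATE dns_queries
      SET client_ip = network(set_masklen(client_ip, 24))::inet
    WHERE family(client_ip) = 4;"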
MikroTik Home Network Lesson (10.10.10.0/24)
This lesson adapts the platform to a MikroTik home lab where the router IP is 10.10.10.10 and local DNS namespace is codeandcore.home.
Teaching Lens
This lesson teaches hybrid DNS operation: authoritative local naming plus recursive analytics in one pipeline.
- You verify MikroTik distributes DNS and search domain via DHCP.
- You verify BIND serves codeandcore.home authoritatively.
- You verify zone records are mirrored into PostgreSQL for governance.
- You verify local queries are analyzed alongside internet traffic.
7.1 Configure MikroTik DHCP + DNS Domain
This RouterOS block is the client-distribution layer of your naming system. DHCP options push DNS server and search domain to every device, so clients learn to ask 10.10.10.10 and append codeandcore.home automatically. Without this, your local zone can exist correctly on BIND but appear "broken" to users because clients never query it correctly.
/ip dhcp-server network set [find address="10.10.10.0/24"] \
gateway=10.10.10.10 dns-server=10.10.10.10 domain=codeandcore.home
/ip dns set allow-remote-requests=yes servers=10.10.10.10
/ip dns static add name=router.codeandcore.home address=10.10.10.10 ttl=1d
This lesson teaches client-side DNS distribution through DHCP domain and resolver settings.
You verify renewed clients receive DNS 10.10.10.10 and suffix codeandcore.home.
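From a Linux client on the LAN after a lease renew, a hedged spot-check (assumes systemd-resolved; other resolver stacks differ):
resolvectl status | grep -E 'DNS Servers|DNS Domain'   # expect 10.10.10.10 and codeandcore.home
ping -c1 nas    # short name should expand to nas.codeandcore.home -> 10.10.10.20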
7.2 Create Authoritative Zone: codeandcore.home
This block in /etc/bind/named.conf.local is the zone declaration, which tells BIND "I am authoritative for this namespace." type master means this server is the source of truth, and the file path points to the zone database containing records. allow-update { none; } disables dynamic updates so records only change through deliberate admin edits.
zone "codeandcore.home" {
type master;
file "/etc/bind/db.codeandcore.home";
allow-update { none; };
};
This zone file block is the authoritative DNS database itself. The SOA record sets lifecycle rules for secondaries and cache behavior, while NS and A records map stable names to device IPs in your 10.10.10.0/24 lab. Think of this as your local naming contract: every hostname your users depend on should be explicitly represented here.
$TTL 86400
@ IN SOA ns1.codeandcore.home. admin.codeandcore.home. (
2026051201 ; serial
3600 ; refresh
1800 ; retry
1209600 ; expire
86400 ) ; minimum
@ IN NS ns1.codeandcore.home.
ns1 IN A 10.10.10.10
router IN A 10.10.10.10
dns IN A 10.10.10.10
nas IN A 10.10.10.20
cam1 IN A 10.10.10.30
printer IN A 10.10.10.40
This lesson teaches authoritative local namespace design with deterministic host mappings.
You verify local records resolve with NOERROR responses from 10.10.10.10.
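A loop over the zone's hosts makes the NOERROR check systematic; a sketch:
for h in ns1 router dns nas cam1 printer; do
  printf '%-8s ' "$h"
  dig @10.10.10.10 +short "$h.codeandcore.home" A
done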
7.3 Store Zone Records in PostgreSQL
This SQL block mirrors DNS inventory into the analytics database so operations can audit and report on intended state versus observed query behavior. The unique constraint prevents duplicate logical records, and indexes support fast lookups by zone and record type. This turns static DNS files into queryable governance data for lessons and reviews.
CREATE TABLE IF NOT EXISTS dns_zone_records (
id BIGSERIAL PRIMARY KEY,
zone_name TEXT NOT NULL,
fqdn TEXT NOT NULL,
record_type VARCHAR(10) NOT NULL,
record_value TEXT NOT NULL,
ttl INTEGER NOT NULL DEFAULT 86400,
source TEXT NOT NULL DEFAULT 'bind',
active BOOLEAN NOT NULL DEFAULT TRUE,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
UNIQUE (zone_name, fqdn, record_type, record_value)
);
CREATE INDEX IF NOT EXISTS idx_zone_fqdn ON dns_zone_records (zone_name, fqdn);
CREATE INDEX IF NOT EXISTS idx_zone_type ON dns_zone_records (zone_name, record_type);
INSERT INTO dns_zone_records (zone_name, fqdn, record_type, record_value, ttl) VALUES
('codeandcore.home','ns1.codeandcore.home','A','10.10.10.10',86400),
('codeandcore.home','router.codeandcore.home','A','10.10.10.10',86400),
('codeandcore.home','dns.codeandcore.home','A','10.10.10.10',86400),
('codeandcore.home','nas.codeandcore.home','A','10.10.10.20',86400),
('codeandcore.home','cam1.codeandcore.home','A','10.10.10.30',86400),
('codeandcore.home','printer.codeandcore.home','A','10.10.10.40',86400)
ON CONFLICT DO NOTHING;
COMMIT;
This lesson teaches treating DNS inventory as queryable governance data.
You verify seeded zone records are returned from dns_zone_records for codeandcore.home.
7.4 Analytics for Local Zone + Full Traffic
These queries teach two different analytical questions. The first measures demand for local names inside codeandcore.home, which helps validate adoption of your local namespace. The second compares local traffic against internet-bound lookups, giving you an immediate ratio of internal service usage versus external dependency.
-- Local zone traffic (last 7 days)
SELECT domain, qtype, COUNT(*) AS hits,
       COUNT(DISTINCT client_ip) AS unique_clients
FROM dns_queries
WHERE domain LIKE '%.codeandcore.home'
  AND queried_at >= NOW() - INTERVAL '7 days'
GROUP BY domain, qtype
ORDER BY hits DESC;

-- Compare local-zone vs internet queries
SELECT CASE WHEN domain LIKE '%.codeandcore.home'
            THEN 'local_zone' ELSE 'internet' END AS traffic_class,
       COUNT(*) AS total_queries,
       ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 2) AS pct
FROM dns_queries
WHERE queried_at >= NOW() - INTERVAL '7 days'
GROUP BY 1
ORDER BY total_queries DESC;
This lesson teaches local-versus-internet query mix analysis for the home lab zone.
You verify local hosts such as nas.codeandcore.home appear in recent query outputs.
MikroTik clients use 10.10.10.10, local zone codeandcore.home resolves authoritatively, and zone records plus analytics are stored in PostgreSQL.
Firewall (UFW)
Teaching Lens
This lesson teaches security-by-segmentation: expose only what each service requires.
- You verify DNS is reachable only from trusted private ranges.
- You verify PostgreSQL remains local unless intentionally tunneled.
- You verify default-deny inbound posture is preserved.
- You verify firewall policy matches resolver and analytics design.
This firewall rule set enforces minimum-exposure networking. It allows DNS only from private address spaces used by your clients, keeps PostgreSQL loopback-only, and denies unsolicited inbound traffic by default. The goal is a resolver that serves your network without becoming an internet-facing attack surface.
sudo ufw enable && sudo ufw allow ssh
sudo ufw allow from 192.168.0.0/16 to any port 53 proto udp
sudo ufw allow from 192.168.0.0/16 to any port 53 proto tcp
sudo ufw allow from 10.0.0.0/8 to any port 53 proto udp
sudo ufw allow from 10.0.0.0/8 to any port 53 proto tcp
sudo ufw allow from 172.16.0.0/12 to any port 53 proto udp
sudo ufw allow from 172.16.0.0/12 to any port 53 proto tcp
sudo ufw allow from 127.0.0.1 to any port 5432 proto tcp
sudo ufw default deny incoming && sudo ufw default allow outgoing
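After applying the rules, one status read confirms the posture:
sudo ufw status verbose   # expect: deny (incoming), 53 limited to private ranges, 5432 loopback-only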
Troubleshooting
Teaching Lens
This lesson teaches path-based troubleshooting from resolver output to final reports.
- You verify resolver output before checking ingestion.
- You verify parser and service health before database assumptions.
- You verify database writes before rollup and report checks.
- You verify each layer in order to reduce diagnosis time.
| Symptom | Check | Command |
|---|---|---|
| BIND9 won't start | Config syntax | sudo named-checkconf |
| No queries.log | Log dir ownership | ls -la /var/log/named/ |
| dig returns REFUSED | Client not in ACL | sudo journalctl -u bind9 -n 20 |
| Ingestor not inserting | Wrong DB password | sudo journalctl -u dns-ingestor -n 40 |
| Partition error | Partition absent | Create partition manually |
| psql auth fail | pg_hba.conf peer | Add md5 line below |
| Empty daily_stats | Rollup not run yet | Run rollup_daily.sql manually |
This repair snippet addresses a common local-auth mismatch where PostgreSQL expects peer authentication instead of password-based login for your application role. Adding the loopback md5 rule and reloading PostgreSQL aligns server auth with your ingestor connection method without a full database restart.
sudo nano /etc/postgresql/16/main/pg_hba.conf
# Add ABOVE existing local lines:
# host dns_analytics dns_user 127.0.0.1/32 md5
sudo systemctl reload postgresql
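The proof that auth is aligned is a password login over loopback; a sketch:
psql -U dns_user -h 127.0.0.1 -d dns_analytics -c "SELECT 1;"   # prompts for the password, then returns 1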
BIND9 capturing. Python ingesting. PostgreSQL storing. Cron aggregating. Run the Chapter 5 queries after 7 days and you have everything to design caching, CDN peering, and BGP strategy.