# DNS Analytics Site — Improvement Backlog
`dns-analytics.codeandcore.dev` · Review date: May 2026

---

## Priority 1 — Bugs (fix before promoting the guide)

### BUG-01 · NXDOMAIN / response_rcode always NOERROR

**Problem**
The Python ingestor hardcodes `response_rcode='NOERROR'` for every inserted row.
Standard BIND9 query logging does not include the response code in the `queries` log
line — that information only appears in `query-errors` category log lines, which are
written separately. As a result, `nxdomain_count` and `servfail_count` in
`dns_daily_stats` will always be zero, making Chapter 5.4 (NXDOMAIN Analysis) silently
useless.

**Fix options (pick one)**

Option A — Parse `query-errors` log separately
- Keep `category queries { queries_log; }` as-is
- Change `category query-errors { queries_log; }` to write to a separate channel
- Extend the ingestor to tail both files and UPDATE existing rows on a NXDOMAIN match
  using `(queried_at, client_ip, domain)` as the correlation key
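The Option A correlation step can be sketched as follows. The `query-errors` line shape in the regex is an assumption — verify it against your BIND9 version's actual output — and the column names `queried_at`, `client_ip`, `domain`, and `response_rcode` are taken from the correlation key above:

```python
import re
from typing import Optional, Tuple

# Hypothetical query-errors line shape -- confirm against real log output
# before relying on this pattern.
ERROR_RE = re.compile(
    r"client (?:@0x[0-9a-f]+ )?(?P<ip>[\d\.a-fA-F:]+)#\d+ "
    r"\((?P<domain>[^)]+)\): query failed \((?P<rcode>[A-Z]+)\)"
)

# Parameterized UPDATE keyed on the correlation tuple from the fix above
UPDATE_SQL = """
    UPDATE dns_queries
       SET response_rcode = %(rcode)s
     WHERE queried_at = %(queried_at)s
       AND client_ip  = %(ip)s
       AND domain     = %(domain)s
"""

def parse_error_line(line: str) -> Optional[Tuple[str, str, str]]:
    """Extract (client_ip, domain, rcode) from a query-errors line, or None."""
    m = ERROR_RE.search(line)
    if not m:
        return None
    return m.group("ip"), m.group("domain"), m.group("rcode")
```

The ingestor's second tail loop would feed each matched tuple, plus the line's timestamp, into `UPDATE_SQL`.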

Option B — Log query and response together in BIND9
- Configure a logging channel that records the response code alongside the query
  in a single line; note that stock BIND9 query logging cannot do this (see the
  Problem statement above), so confirm feasibility before picking this option
- Adjust the `QUERY_RE` regex to capture the new format

Option C — Use `dnstap` even for small deployments
- dnstap carries the full response code in the protobuf message
- Brings the guide into alignment with Chapter 6 (scale path) earlier

**Suggested approach:** Option A is the least invasive for the existing guide structure.
Add a `BUG NOTE` callout in Chapter 5.4 explaining the limitation until the fix
chapter is written.

---

### BUG-02 · Partition naming inconsistency

**Problem**
The schema in Chapter 2 creates an initial partition named `dns_queries_today`:
```sql
CREATE TABLE dns_queries_today PARTITION OF dns_queries
    FOR VALUES FROM (CURRENT_DATE) TO (CURRENT_DATE + INTERVAL '1 day');
```
The cron job in Chapter 4.4 creates partitions named `dns_queries_YYYYMMDD`:
```bash
CREATE TABLE IF NOT EXISTS dns_queries_$(date -d tomorrow +\%Y\%m\%d) ...
```
These two naming conventions are inconsistent. Worse, `dns_queries_today` covers
`CURRENT_DATE` to `CURRENT_DATE + 1` as evaluated at schema-creation time, while
the cron job creates partitions starting at `CURRENT_DATE + 1`. If the cron
environment's notion of "tomorrow" and the static partition's upper bound
disagree (for example, because the psql session and cron run in different time
zones), the ranges either overlap, which makes the CREATE fail, or leave a gap,
which makes inserts for the uncovered day fail.

**Fix**
Replace the static partition in Chapter 2 with a named, date-stamped version:
```sql
-- Replace dns_queries_today with a consistent format
CREATE TABLE dns_queries_20260101 PARTITION OF dns_queries
    FOR VALUES FROM ('2026-01-01') TO ('2026-01-02');
```
Then update the teaching note to explain that the reader should substitute today's
date and that the cron job takes over from that point forward.

Add a `fact_check` step verifying that `\d+ dns_queries` in psql shows only
consistently named child tables.
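To keep the Chapter 2 replacement and the cron job in lockstep, the DDL both should emit can come from one function. A minimal sketch (table name from the guide; the half-open range guarantees consecutive partitions can never overlap):

```python
from datetime import date, timedelta

def partition_ddl(day: date) -> str:
    """Build the CREATE TABLE statement for one daily partition.

    Naming matches the cron convention dns_queries_YYYYMMDD; the range is
    half-open [day, day + 1), so adjacent partitions share a boundary but
    never overlap.
    """
    nxt = day + timedelta(days=1)
    return (
        f"CREATE TABLE IF NOT EXISTS dns_queries_{day:%Y%m%d} "
        f"PARTITION OF dns_queries "
        f"FOR VALUES FROM ('{day:%Y-%m-%d}') TO ('{nxt:%Y-%m-%d}');"
    )
```

The cron script can call this with `date.today() + timedelta(days=1)`; the Chapter 2 bootstrap calls it with today's date.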

---

### BUG-03 · Ingestor running as root

**Problem**
`/etc/systemd/system/dns-ingestor.service` sets `User=root`. The script only needs:
- Read access to `/var/log/named/queries.log`
- TCP connection to PostgreSQL on localhost

Running it as root violates least-privilege and is unnecessary.

**Fix**
```bash
# Create dedicated system user
sudo useradd --system --no-create-home --shell /usr/sbin/nologin dns-ingestor

# Give read access to named log dir
sudo setfacl -m u:dns-ingestor:rx /var/log/named
sudo setfacl -m u:dns-ingestor:r /var/log/named/queries.log
```

Update the unit file:
```ini
[Service]
User=dns-ingestor
Group=dns-ingestor
```

Add a `fact_check` verifying `ps aux | grep ingestor` shows the correct user.
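Optionally, the same unit file can be hardened further. These are all standard systemd sandboxing directives; the paths assume the guide's layout, and the result can be checked with `systemd-analyze security dns-ingestor.service`:

```ini
[Service]
User=dns-ingestor
Group=dns-ingestor
# Optional hardening -- standard systemd options
NoNewPrivileges=true
ProtectSystem=strict
ProtectHome=true
PrivateTmp=true
ReadOnlyPaths=/var/log/named
ReadWritePaths=/var/log/dns-ingestor
```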

---

### BUG-04 · Hardcoded database password in ingestor.py

**Problem**
`ChangeThisPassword!` is hardcoded in the Python script. Anyone with read access to
the file or the repository gets the credential.

**Fix**
Use a systemd `EnvironmentFile`:

```ini
# /etc/systemd/system/dns-ingestor.service
[Service]
EnvironmentFile=/etc/dns-ingestor/db.env
```

```bash
# /etc/dns-ingestor/db.env  (chmod 600, owned by root; systemd reads it before dropping privileges)
DB_PASSWORD=YourActualPassword
```

Update the Python `DB` dict:
```python
import os
DB = {
    'dbname':   'dns_analytics',
    'user':     'dns_user',
    'password': os.environ['DB_PASSWORD'],
    'host':     '127.0.0.1',
    'port':     5432
}
```

---

### BUG-05 · IPv6 client address regex failure

**Problem**
`QUERY_RE` captures the client IP with `[\d\.a-fA-F:]+`, which admits IPv6
characters, but dual-stack BIND9 writes IPv6 clients as bare, bracket-free
addresses in the `address#port` client field (e.g. `::1#53`). If the
surrounding pattern assumes an IPv4 dotted quad, or treats `:` rather than `#`
as the port delimiter, the capture can absorb stray characters, producing a
malformed `INET` value that fails on INSERT.

**Fix**
Test with a known IPv6 client:
```bash
dig @::1 google.com A
sudo tail -5 /var/log/named/queries.log
```
If the log line format differs, update `QUERY_RE` with a non-greedy IP group and
add a `parse()` test case for an IPv6 log line.
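A sketch of a client-field pattern that anchors on the `#` port separator and handles both address families; the surrounding log-line shape is an assumption to verify against your own logs:

```python
import re

# The client field in BIND9 query logs is "address#port"; the address may be an
# IPv4 dotted quad or a bare (bracket-free) IPv6 address.  Anchoring on '#'
# keeps the address capture from absorbing the port.
CLIENT_RE = re.compile(
    r"client (?:@0x[0-9a-f]+ )?(?P<ip>[0-9a-fA-F:.]+)#(?P<port>\d+)"
)

def client_addr(line: str):
    """Return (ip, port) from a query log line, or None if no match."""
    m = CLIENT_RE.search(line)
    return (m.group("ip"), int(m.group("port"))) if m else None
```

Both an IPv4 and an IPv6 line belong in the `parse()` test suite so a regression in either family fails loudly.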

---

## Priority 2 — New Chapters

### CH-08 · Grafana Dashboard

**Why it belongs here**
The guide builds a complete analytics schema and never shows how to see it. Grafana
connecting to PostgreSQL is the natural payoff of everything in Chapters 1–5.

**Suggested panels**
| Panel | Query source | Viz type |
|---|---|---|
| Top 20 domains (7-day) | `dns_daily_stats` | Bar chart |
| Traffic by category | `dns_daily_stats + domain_categories` | Pie chart |
| Hourly heatmap | `dns_hourly_heatmap` | Heatmap |
| NXDOMAIN rate trend | `dns_daily_stats` | Time series |
| Unique clients per day | `dns_daily_stats` | Stat + sparkline |
| Live query rate | `dns_queries` (last 60s) | Gauge |

**Teaching Lens angle**
Grafana panels are SQL in disguise. Show the raw SQL first, then show how it maps
to the panel data source query editor. Teach readers to read the dashboard and the
database as the same thing.
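To make the "SQL in disguise" point concrete, the panel queries can live as plain SQL strings first and only then be pasted into Grafana's query editor. The column names here (`stat_date`, `domain`, `query_count`, `nxdomain_count`) are assumptions; match them to the actual `dns_daily_stats` schema:

```python
# Illustrative panel queries -- adjust column names to the real schema.
PANEL_SQL = {
    "top_domains_7d": """
        SELECT domain, SUM(query_count) AS queries
        FROM dns_daily_stats
        WHERE stat_date >= CURRENT_DATE - INTERVAL '7 days'
        GROUP BY domain
        ORDER BY queries DESC
        LIMIT 20
    """,
    "nxdomain_rate": """
        SELECT stat_date AS "time",
               SUM(nxdomain_count)::float / NULLIF(SUM(query_count), 0) AS rate
        FROM dns_daily_stats
        GROUP BY stat_date
        ORDER BY stat_date
    """,
}
```

Grafana's time-series panels conventionally expect a column aliased `"time"`, which is why the NXDOMAIN query renames `stat_date`.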

**Install block**
```bash
# Grafana is not in the stock Ubuntu repos; add Grafana's apt repo first
sudo mkdir -p /etc/apt/keyrings && wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt-get update && sudo apt-get install -y grafana
sudo systemctl enable --now grafana-server
# Add PostgreSQL data source: localhost:5432 / dns_analytics / dns_user
```

---

### CH-09 · RPZ — Response Policy Zone (Blocking with Your Own Data)

**Why it belongs here**
Chapter 1 null-routes `category rpz` in the BIND logging config — the guide already
signals RPZ awareness. The `domain_categories` table has `cache_priority 9` entries
for ads (`doubleclick.net`, `googlesyndication.com`). This chapter closes the loop:
use the data you collected to make blocking decisions.

**Structure**

1. What RPZ is and how BIND9 loads it
2. Create a local RPZ zone file seeded from `domain_categories WHERE cache_priority = 9`
3. Write a cron script that regenerates the zone file from PostgreSQL and reloads BIND
4. Verify blocked domains return NXDOMAIN
5. Monitor RPZ hit counts via the `query-errors` log channel
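Step 2 can be sketched as a pure function that renders the zone text from a list of blocked domains (pulled from `domain_categories WHERE cache_priority = 9`). The SOA/NS names are placeholders; in RPZ, a `CNAME .` rule is what answers NXDOMAIN:

```python
from datetime import datetime, timezone

def rpz_zone(domains, serial=None):
    """Render a minimal RPZ zone that NXDOMAINs every listed domain.

    SOA/NS names are placeholders -- adjust to your zone naming.  The serial
    defaults to an hourly timestamp so reloads pick up regenerated files.
    """
    serial = serial or datetime.now(timezone.utc).strftime("%Y%m%d%H")
    lines = [
        "$TTL 300",
        f"@ IN SOA localhost. root.localhost. {serial} 3600 600 86400 300",
        "@ IN NS localhost.",
    ]
    for d in sorted(set(domains)):
        lines.append(f"{d} IN CNAME .")    # block the domain itself
        lines.append(f"*.{d} IN CNAME .")  # and all its subdomains
    return "\n".join(lines) + "\n"
```

The step-3 cron script would write this output to the zone file and then run `rndc reload` on the RPZ zone.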

**Teaching Lens angle**
RPZ turns passive observation into active policy. The data you collected in Chapter 5
now drives Chapter 9 decisions — the pipeline feeds itself.

---

### APP-C · Partition Retention & Cleanup

**Why it belongs here**
At 50K queries/day, the raw table grows indefinitely. Old `dns_queries_YYYYMMDD`
child tables accumulate on disk with no drop strategy documented.

**Suggested content**

Retention policy decision table:
| Retention window | Use case | Disk estimate |
|---|---|---|
| 30 days raw | Home lab | ~150 MB |
| 90 days raw | Small ISP | ~450 MB |
| 7 days raw + permanent rollups | Production | Minimal |

Cleanup script:
```sql
-- Drop partitions older than 30 days
DO $$
DECLARE
    tbl TEXT;
BEGIN
    FOR tbl IN
        SELECT tablename FROM pg_tables
        WHERE schemaname = 'public'
          AND tablename LIKE 'dns\_queries\_2%'  -- backslash escapes LIKE's _ wildcard
          AND TO_DATE(SUBSTRING(tablename FROM 13), 'YYYYMMDD')
              < CURRENT_DATE - INTERVAL '30 days'
    LOOP
        EXECUTE format('DROP TABLE IF EXISTS %I', tbl);  -- %I safely quotes the identifier
        RAISE NOTICE 'Dropped %', tbl;
    END LOOP;
END $$;
```

Add to cron (run weekly):
```cron
0 4 * * 0  psql -U dns_user -d dns_analytics -h localhost -f /opt/dns-ingestor/sql/cleanup_partitions.sql >> /var/log/dns-ingestor/cleanup.log 2>&1
```

---

### APP-D · Backup & Restore

**Why it belongs here**
The guide builds a dataset with real operational value and provides no recovery path.

**Suggested content**

Daily pg_dump of rollup tables only (small, fast):
```bash
pg_dump -U dns_user -h localhost dns_analytics \
  --table=dns_daily_stats \
  --table=dns_weekly_stats \
  --table=dns_monthly_stats \
  --table=domain_categories \
  --table=dns_zone_records \
  -Fc -f /var/backups/dns_analytics_$(date +%Y%m%d).dump
```

Restore test procedure (run monthly):
```bash
createdb -U postgres dns_analytics_test
pg_restore -U postgres -d dns_analytics_test /var/backups/dns_analytics_YYYYMMDD.dump
psql -U postgres -d dns_analytics_test -c "SELECT COUNT(*) FROM dns_daily_stats;"
dropdb -U postgres dns_analytics_test
```

Cron entry:
```cron
0 2 * * *  /opt/dns-ingestor/scripts/backup.sh >> /var/log/dns-ingestor/backup.log 2>&1
```

---

## Priority 3 — Content Gaps

### CONTENT-01 · domain_categories seed data — add African/regional domains

The 26 seeded domains are all Western CDN/social. For a guide written from Uganda
and aimed at ISP operators in East Africa, the seed data should reflect actual
subscriber traffic patterns.

**Suggested additions**
```sql
-- East African and regional additions
INSERT INTO domain_categories VALUES
('safaricom.com','telco','Safaricom',5,now()),
('mtn.com','telco','MTN',5,now()),
('airtel.com','telco','Airtel',5,now()),
('dstv.com','streaming','MultiChoice',1,now()),
('showmax.com','streaming','MultiChoice',1,now()),
('gotv.com','streaming','MultiChoice',1,now()),
('jumia.com','ecommerce','Jumia',6,now()),
('pesapal.com','fintech','Pesapal',5,now()),
('mpesa.co.ke','fintech','Safaricom',5,now()),
('equity.co.ke','fintech','Equity',5,now()),
('1.1.1.1.in-addr.arpa','infrastructure','Cloudflare',1,now()),
('akamaitechnologies.com','cdn','Akamai',1,now()),
('ggpht.com','cdn','Google',2,now()),
('gstatic.com','cdn','Google',2,now()),
('ytimg.com','cdn','Google',2,now()),
('fbcdn.net','cdn','Meta',2,now()),
('cdninstagram.com','cdn','Meta',2,now()),
('telegram.org','messaging','Telegram',3,now()),
('web.telegram.org','messaging','Telegram',3,now()),
('zoom.us','work','Zoom',5,now()),
('zoomgov.com','work','Zoom',5,now())
ON CONFLICT DO NOTHING;
```

---

### CONTENT-02 · Logrotate for ingestor log

BIND9 handles its own rotation (the `versions 7` clause in the logging block), but
`/var/log/dns-ingestor/ingestor.log` is unmanaged.

**Add to Appendix A or Chapter 3.3:**
```
/var/log/dns-ingestor/*.log {
    daily
    rotate 14
    compress
    delaycompress
    missingok
    notifempty
    postrotate
        systemctl kill --signal=USR1 dns-ingestor.service 2>/dev/null || true
    endscript
}
```
```bash
sudo nano /etc/logrotate.d/dns-ingestor
# paste above, then test:
sudo logrotate --debug /etc/logrotate.d/dns-ingestor
```

---

### CONTENT-03 · Encrypted upstream DNS (stubby / DoT)

The forwarders currently use plain UDP to `8.8.8.8` and `1.1.1.1`. A short note or
optional block showing `stubby` as a local DoT proxy gives the reader a privacy
upgrade path without changing the BIND9 architecture.

```bash
sudo apt-get install -y stubby
# stubby listens on 127.0.0.1:8053 by default
```

Then update `named.conf.options` forwarders:
```
forwarders { 127.0.0.1 port 8053; };
```

Scope: 1 page, optional callout box. Mark as `[Optional — Privacy Upgrade]`.

---

### CONTENT-04 · Alerting (NXDOMAIN spike / resolver silence)

A simple Python script on cron that queries `dns_daily_stats` and posts to a
Telegram bot or webhook if NXDOMAIN rate exceeds a threshold, or if the resolver
has produced zero inserts in the last 10 minutes.
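The decision logic is small enough to keep as a pure function, with the cron wrapper doing only the SQL query and the webhook POST. The thresholds and argument names here are illustrative assumptions:

```python
from datetime import datetime, timedelta, timezone

# Illustrative thresholds -- tune against your own baseline
NXDOMAIN_RATE_THRESHOLD = 0.15   # alert above 15% failing queries
SILENCE_WINDOW = timedelta(minutes=10)

def check_alerts(total_queries, nxdomain_queries, last_insert_at, now=None):
    """Return a list of alert strings; an empty list means healthy."""
    now = now or datetime.now(timezone.utc)
    alerts = []
    if total_queries and nxdomain_queries / total_queries > NXDOMAIN_RATE_THRESHOLD:
        alerts.append(f"NXDOMAIN spike: {nxdomain_queries}/{total_queries} queries")
    if last_insert_at is None or now - last_insert_at > SILENCE_WINDOW:
        alerts.append("resolver silence: no inserts in the last 10 minutes")
    return alerts
```

Keeping the check pure makes it trivially unit-testable, independent of the database and the notification channel.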

**Suggested placement:** End of Chapter 5, after the SQL queries. Turns passive
reporting into active operations.

---

## Summary Checklist

| Ref | Type | Priority | Status |
|---|---|---|---|
| BUG-01 | NXDOMAIN always NOERROR | 🔴 Critical | Open |
| BUG-02 | Partition naming mismatch | 🔴 Critical | Open |
| BUG-03 | Ingestor runs as root | 🟠 High | Open |
| BUG-04 | Hardcoded DB password | 🟠 High | Open |
| BUG-05 | IPv6 regex | 🟡 Medium | Open |
| CH-08 | Grafana chapter | 🟠 High | Open |
| CH-09 | RPZ chapter | 🟡 Medium | Open |
| APP-C | Partition retention | 🟠 High | Open |
| APP-D | Backup & restore | 🟠 High | Open |
| CONTENT-01 | African domain seed data | 🟡 Medium | Open |
| CONTENT-02 | Logrotate for ingestor | 🟡 Medium | Open |
| CONTENT-03 | Encrypted upstream DNS | 🟢 Low | Open |
| CONTENT-04 | Alerting | 🟡 Medium | Open |

---
