# SQL Lessons
## Guided Analytics Labs for code::core DNS Server

This file is now organized as lessons, not a raw query dump.

Each lesson follows the same pattern:
1. Objective: What decision this lesson supports.
2. Run: The exact SQL to execute.
3. Interpret: How to read the result.
4. Checkpoint: What confirms success.

All lessons assume you're connected to the `dns_analytics` database as `dns_user`.

Before you run this command, understand why it exists: this opens an interactive PostgreSQL session as the analytics application user, so every lesson query is executed with the same permissions and context used by your platform services.

```bash
psql -U dns_user -d dns_analytics -h localhost
```

---

## Lesson 1: Traffic Baseline

Objective: identify the domains and categories driving most resolver traffic.

### 1.1 Run: Top 50 Domains (Last 7 Days)
Before you run this query, understand the question it answers: "Which domains dominate resolver workload right now?" This is your concentration view, useful for cache strategy and upstream planning.

```sql
SELECT
    ds.apex_domain,
    dc.category,
    dc.provider,
    dc.cache_priority,
    SUM(ds.hit_count) AS total_queries,
    ROUND(SUM(ds.hit_count)*100.0/SUM(SUM(ds.hit_count)) OVER(),2) AS pct,
    ROUND(AVG(ds.hit_count),0) AS avg_per_day,
    MAX(ds.last_seen) AS last_query
FROM dns_daily_stats ds
LEFT JOIN domain_categories dc ON dc.apex_domain=ds.apex_domain
WHERE ds.stat_date >= CURRENT_DATE - INTERVAL '7 days'
GROUP BY ds.apex_domain, dc.category, dc.provider, dc.cache_priority
ORDER BY total_queries DESC
LIMIT 50;
```

Interpret:
- Use total_queries and pct to find concentration.
- A few domains usually dominate traffic in home and ISP networks.

Checkpoint:
- Top rows should reflect real user behavior (search, streaming, updates).
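
If you want a single concentration number, the sketch below (assuming the same `dns_daily_stats` columns used in 1.1; the 0.80 threshold is arbitrary) counts how many domains cover roughly 80% of the last week's traffic. A small count confirms cache effort belongs at the head of the distribution.

```sql
-- Sketch, assuming the Lesson 1.1 schema: how many domains cover ~80%
-- of the last 7 days of traffic (the 0.80 threshold is arbitrary).
WITH ranked AS (
    SELECT
        apex_domain,
        SUM(hit_count) AS total_queries,
        SUM(SUM(hit_count)) OVER (ORDER BY SUM(hit_count) DESC) AS running_total,
        SUM(SUM(hit_count)) OVER () AS grand_total
    FROM dns_daily_stats
    WHERE stat_date >= CURRENT_DATE - INTERVAL '7 days'
    GROUP BY apex_domain
)
SELECT COUNT(*) AS domains_for_80_pct
FROM ranked
WHERE running_total <= grand_total * 0.80;
```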

### 1.2 Run: Traffic by Category (Last 7 Days)
Before you run this query, understand the perspective shift: it groups traffic by service type instead of individual domains, helping you reason about user behavior patterns rather than hostname noise.

```sql
SELECT
    COALESCE(dc.category, 'uncategorised') AS category,
    COUNT(DISTINCT ds.apex_domain) AS unique_domains,
    SUM(ds.hit_count) AS total_queries,
    ROUND(SUM(ds.hit_count)*100.0/SUM(SUM(ds.hit_count)) OVER(),2) AS pct_of_total,
    ROUND(AVG(ds.hit_count),0) AS avg_queries_per_domain
FROM dns_daily_stats ds
LEFT JOIN domain_categories dc ON dc.apex_domain=ds.apex_domain
WHERE ds.stat_date >= CURRENT_DATE - INTERVAL '7 days'
GROUP BY 1
ORDER BY total_queries DESC;
```

### 1.3 Run: Top 20 Streaming & CDN Domains
Before you run this query, understand the operational purpose: streaming and CDN domains are usually high-impact for bandwidth and latency, so this cut helps prioritize peering and caching decisions.

```sql
SELECT
    ds.apex_domain,
    dc.provider,
    dc.cache_priority,
    SUM(ds.hit_count) AS queries,
    ROUND(SUM(ds.hit_count)*100.0/SUM(SUM(ds.hit_count)) OVER(),2) AS pct
FROM dns_daily_stats ds
LEFT JOIN domain_categories dc ON dc.apex_domain=ds.apex_domain
WHERE ds.stat_date >= CURRENT_DATE - INTERVAL '7 days'
  AND dc.category IN ('streaming', 'cdn')
GROUP BY ds.apex_domain, dc.provider, dc.cache_priority
ORDER BY queries DESC
LIMIT 20;
```

Interpret:
- Category totals help justify cache and peering priorities.

Checkpoint:
- Category percentages should roughly align with your user profile.

## Lesson 2: Peak Hours and Usage Rhythm

Objective: determine when demand peaks so you can tune cache warmup and maintenance windows.

### 2.1 Run: 24-Hour Heatmap (Last 7 Days)
Before you run this query, understand the objective: this transforms hourly counts into a demand-shape view so you can identify predictable load windows.

```sql
SELECT
    LPAD(h.hour_of_day::TEXT, 2, '0') || ':00' AS hour,
    SUM(h.query_count) AS total_queries,
    REPEAT('█', (SUM(h.query_count)/NULLIF(MAX(SUM(h.query_count)) OVER()/40, 0))::INT) AS bar_chart,
    ROUND(SUM(h.query_count)*100.0/SUM(SUM(h.query_count)) OVER(), 2) AS pct
FROM dns_hourly_heatmap h
WHERE h.stat_date >= CURRENT_DATE - INTERVAL '7 days'
GROUP BY h.hour_of_day
ORDER BY h.hour_of_day;
```

### 2.2 Run: Peak Hour (When do users wake up?)
Before you run this query, understand the focus: it extracts the single busiest hour over a broader window, which is useful for maintenance scheduling and cache warm-up timing.

```sql
SELECT
    LPAD(hour_of_day::TEXT, 2, '0') || ':00' AS peak_hour,
    SUM(query_count) AS queries,
    COUNT(DISTINCT stat_date) AS days_recorded
FROM dns_hourly_heatmap
WHERE stat_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY hour_of_day
ORDER BY queries DESC
LIMIT 1;
```

### 2.3 Run: Daily Trend (Last 30 Days)
Before you run this query, understand the value of smoothing: the 7-day rolling average helps separate real trend changes from normal daily variance.

```sql
SELECT
    stat_date,
    SUM(hit_count) AS daily_queries,
    COUNT(DISTINCT apex_domain) AS unique_domains,
    ROUND(AVG(SUM(hit_count)) OVER (ORDER BY stat_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW), 0) AS avg_7day
FROM dns_daily_stats
WHERE stat_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY stat_date
ORDER BY stat_date DESC;
```

Interpret:
- Heatmap and peak-hour queries should point to the same high-demand window.

Checkpoint:
- Peak hour must be plausible for your household/office routine.
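
Because this lesson is ultimately about picking windows, a useful counterpart to 2.2 is the quietest hour; the sketch below assumes the same `dns_hourly_heatmap` table and simply inverts the sort to suggest a maintenance window.

```sql
-- Sketch, assuming the Lesson 2.1 schema: the quietest hour over 30 days,
-- as a candidate maintenance window.
SELECT
    LPAD(hour_of_day::TEXT, 2, '0') || ':00' AS quietest_hour,
    SUM(query_count) AS queries
FROM dns_hourly_heatmap
WHERE stat_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY hour_of_day
ORDER BY queries ASC
LIMIT 1;
```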

## Lesson 3: Failure Analysis

Objective: detect domains that repeatedly fail and separate bad domains from resolver problems.

### 3.1 Run: NXDOMAIN Domains (Non-existent - Top 30)
Before you run this query, understand what NXDOMAIN indicates: repeated non-existent lookups often reveal typos, stale configs, or noisy client behavior.

```sql
SELECT
    apex_domain,
    SUM(hit_count) AS total_queries,
    SUM(nxdomain_count) AS nxdomain_count,
    ROUND(SUM(nxdomain_count)*100.0/NULLIF(SUM(hit_count), 0), 2) AS nxdomain_pct,
    MAX(stat_date) AS most_recent
FROM dns_daily_stats
WHERE stat_date >= CURRENT_DATE - INTERVAL '7 days'
  AND nxdomain_count > 0
GROUP BY apex_domain
HAVING SUM(nxdomain_count) > 10
ORDER BY nxdomain_pct DESC
LIMIT 30;
```

### 3.2 Run: Service Failures (SERVFAIL - Top 20)
Before you run this query, understand its diagnostic angle: SERVFAIL is usually resolver-path or upstream health related, not just bad domain naming.

```sql
SELECT
    apex_domain,
    SUM(servfail_count) AS failures,
    SUM(hit_count) AS total_attempts,
    ROUND(SUM(servfail_count)*100.0/NULLIF(SUM(hit_count), 0), 2) AS fail_rate_pct
FROM dns_daily_stats
WHERE stat_date >= CURRENT_DATE - INTERVAL '7 days'
  AND servfail_count > 0
GROUP BY apex_domain
ORDER BY failures DESC
LIMIT 20;
```

### 3.3 Run: Total Failure Rate (Daily)
Before you run this query, understand the monitoring purpose: it gives a day-by-day reliability score so you can detect drift before users report problems.

```sql
SELECT
    stat_date,
    SUM(hit_count) AS total_queries,
    SUM(nxdomain_count) AS nxdomain_errors,
    SUM(servfail_count) AS servfail_errors,
    ROUND((SUM(nxdomain_count) + SUM(servfail_count))*100.0/NULLIF(SUM(hit_count), 0), 2) AS overall_error_pct
FROM dns_daily_stats
WHERE stat_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY stat_date
ORDER BY stat_date DESC;
```

Interpret:
- NXDOMAIN usually indicates typos, dead domains, or noisy devices.
- SERVFAIL can indicate upstream or resolver path problems.

Checkpoint:
- overall_error_pct should stay low and stable over time.
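
To make "stable" measurable, the sketch below (same `dns_daily_stats` columns as above; the 2x multiplier is an arbitrary starting threshold) flags days whose error rate is more than twice the 30-day average.

```sql
-- Sketch, assuming the Lesson 3 schema: days whose combined error rate
-- exceeds twice the 30-day average (the 2x threshold is a starting point).
WITH daily AS (
    SELECT
        stat_date,
        (SUM(nxdomain_count) + SUM(servfail_count)) * 100.0
            / NULLIF(SUM(hit_count), 0) AS error_pct
    FROM dns_daily_stats
    WHERE stat_date >= CURRENT_DATE - INTERVAL '30 days'
    GROUP BY stat_date
)
SELECT stat_date, ROUND(error_pct, 2) AS error_pct
FROM daily
WHERE error_pct > 2 * (SELECT AVG(error_pct) FROM daily)
ORDER BY stat_date DESC;
```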

## Lesson 4: Client Behavior

Objective: identify heavy clients and unusual request patterns.

### 4.1 Run: Most Active Client IPs (Last 7 Days)
Before you run this query, understand the behavior lens: it ranks clients by request volume and diversity to highlight heavy or unusual generators.

```sql
SELECT
    client_ip,
    COUNT(*) AS total_queries,
    COUNT(DISTINCT domain) AS unique_domains_queried,
    COUNT(DISTINCT apex_domain) AS unique_apex_domains,
    MAX(queried_at) AS last_seen
FROM dns_queries
WHERE queried_at >= NOW() - INTERVAL '7 days'
GROUP BY client_ip
ORDER BY total_queries DESC
LIMIT 20;
```

### 4.2 Run: Client Subnet Analysis (aggregate by /24 when PII is a concern)
Before you run this query, understand the privacy trade-off: /24 aggregation reduces identifiability while preserving enough signal for network-level diagnostics.

```sql
SELECT
    network(set_masklen(client_ip, 24))::text AS client_subnet,
    COUNT(*) AS queries,
    COUNT(DISTINCT domain) AS unique_domains
FROM dns_queries
WHERE queried_at >= NOW() - INTERVAL '7 days'
GROUP BY client_subnet
ORDER BY queries DESC
LIMIT 20;
```

### 4.3 Run: Per-Client Domain Distribution
Before you run this query, understand the investigative use: this zooms into one client to see whether its request mix looks normal for that device role.

```sql
SELECT
    client_ip,
    apex_domain,
    COUNT(*) AS queries,
    MAX(queried_at) AS last_query
FROM dns_queries
WHERE client_ip = '192.168.1.100'  -- Change to target client
  AND queried_at >= NOW() - INTERVAL '7 days'
GROUP BY client_ip, apex_domain
ORDER BY queries DESC
LIMIT 30;
```

Interpret:
- High-query clients with low domain diversity can indicate automation, updates, or IoT chatter.

Checkpoint:
- Top clients list should map to known devices on your network.
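
To act on the interpretation above, the sketch below (same `dns_queries` table as 4.1; the 1,000-query floor is an arbitrary cut-off) ranks clients by queries per distinct apex domain, which surfaces high-volume, low-diversity devices such as chatty IoT gear.

```sql
-- Sketch, assuming the Lesson 4.1 schema: high-volume, low-diversity clients
-- (the 1000-query floor is arbitrary).
SELECT
    client_ip,
    COUNT(*) AS total_queries,
    COUNT(DISTINCT apex_domain) AS unique_apex_domains,
    ROUND(COUNT(*)::numeric / NULLIF(COUNT(DISTINCT apex_domain), 0), 1) AS queries_per_domain
FROM dns_queries
WHERE queried_at >= NOW() - INTERVAL '7 days'
GROUP BY client_ip
HAVING COUNT(*) > 1000
ORDER BY queries_per_domain DESC
LIMIT 20;
```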

## Lesson 5: Decision Support

Objective: turn DNS analytics into cache, peering, and routing priorities.

### 5.1 Run: Cache Prioritization (What should we prefetch?)
Before you run this query, understand the decision output: it converts traffic plus cache priority metadata into tiered action groups for rollout planning.

```sql
SELECT
    ds.apex_domain,
    dc.provider,
    dc.cache_priority,
    SUM(ds.hit_count) AS total_queries,
    ROUND(SUM(ds.hit_count)*100.0/SUM(SUM(ds.hit_count)) OVER(), 2) AS pct,
    CASE
        WHEN dc.cache_priority <= 2 THEN 'TIER 1: CRITICAL'
        WHEN dc.cache_priority <= 4 THEN 'TIER 2: HIGH'
        WHEN dc.cache_priority <= 7 THEN 'TIER 3: MEDIUM'
        ELSE 'TIER 4: LOW'
    END AS cache_tier
FROM dns_daily_stats ds
LEFT JOIN domain_categories dc ON dc.apex_domain=ds.apex_domain
WHERE ds.stat_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY ds.apex_domain, dc.provider, dc.cache_priority
ORDER BY total_queries DESC
LIMIT 50;
```

### 5.2 Run: CDN Peering Candidates (Top CDN domains)
Before you run this query, understand the strategy: this highlights CDN-heavy domains that may justify direct peering due to sustained query demand.

```sql
SELECT
    ds.apex_domain,
    dc.provider,
    SUM(ds.hit_count) AS monthly_queries,
    ROUND(SUM(ds.hit_count)*100.0/SUM(SUM(ds.hit_count)) OVER(), 2) AS pct_of_traffic,
    ROUND(AVG(ds.unique_clients), 0) AS avg_unique_clients,
    CASE
        WHEN SUM(ds.hit_count) > 10000000 THEN 'PRIORITY 1'
        WHEN SUM(ds.hit_count) > 1000000 THEN 'PRIORITY 2'
        ELSE 'PRIORITY 3'
    END AS peering_priority
FROM dns_daily_stats ds
LEFT JOIN domain_categories dc ON dc.apex_domain=ds.apex_domain
WHERE ds.stat_date >= CURRENT_DATE - INTERVAL '30 days'
  AND dc.category = 'cdn'
GROUP BY ds.apex_domain, dc.provider
ORDER BY monthly_queries DESC
LIMIT 15;
```

Interpret:
- Use cache tiers for rollout planning.
- Use peering_priority to build provider negotiation lists.

Checkpoint:
- Priority outputs should be stable week-over-week unless workload shifts.
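
For rollout planning it can also help to collapse 5.1 into per-tier totals; this sketch reuses the same join and CASE logic, so uncategorised domains (NULL `cache_priority`) land in TIER 4 via the ELSE branch.

```sql
-- Sketch, assuming the Lesson 5.1 schema: traffic rolled up per cache tier;
-- NULL cache_priority (uncategorised) falls into TIER 4 via the ELSE branch.
SELECT
    CASE
        WHEN dc.cache_priority <= 2 THEN 'TIER 1: CRITICAL'
        WHEN dc.cache_priority <= 4 THEN 'TIER 2: HIGH'
        WHEN dc.cache_priority <= 7 THEN 'TIER 3: MEDIUM'
        ELSE 'TIER 4: LOW'
    END AS cache_tier,
    COUNT(DISTINCT ds.apex_domain) AS domains,
    SUM(ds.hit_count) AS total_queries
FROM dns_daily_stats ds
LEFT JOIN domain_categories dc ON dc.apex_domain = ds.apex_domain
WHERE ds.stat_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY 1
ORDER BY 1;
```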

## Lesson 6: Operational Diagnostics

Objective: verify pipeline health from ingestion to rollups.

### 6.1 Run: Raw Query Count (Test if ingestor is working)
Before you run this query, understand the health signal: it proves data is being written and shows earliest/latest timestamps to validate continuity.

```sql
SELECT
    COUNT(*) AS total_raw_queries,
    COUNT(DISTINCT date_trunc('day', queried_at)) AS days_with_data,
    MIN(queried_at) AS first_query,
    MAX(queried_at) AS last_query
FROM dns_queries;
```

### 6.2 Run: Ingest Rate (Queries per second)
Before you run this query, understand why rate matters: hourly QPS gives an operational baseline for capacity checks and anomaly comparison.

```sql
SELECT
    DATE_TRUNC('hour', queried_at) AS hour,
    COUNT(*) AS queries_in_hour,
    ROUND(COUNT(*)::numeric / 3600, 2) AS queries_per_second
FROM dns_queries
WHERE queried_at >= NOW() - INTERVAL '24 hours'
GROUP BY DATE_TRUNC('hour', queried_at)
ORDER BY hour DESC
LIMIT 24;
```

### 6.3 Run: Database Size
Before you run this query, understand the planning value: table and database size trends tell you when to optimize storage, retention, or indexing.

```sql
SELECT
    pg_size_pretty(pg_total_relation_size('dns_queries')) AS raw_queries_size,
    pg_size_pretty(pg_total_relation_size('dns_daily_stats')) AS daily_stats_size,
    pg_size_pretty(pg_database_size('dns_analytics')) AS total_db_size;
```

### 6.4 Run: Daily Rollup Status (Did rollups run?)
Before you run this query, understand the scheduler check: it validates that rollup jobs are actually materializing daily summary rows.

```sql
SELECT
    stat_date,
    COUNT(DISTINCT apex_domain) AS unique_domains,
    SUM(hit_count) AS total_hit_count,
    MIN(first_seen) AS rollup_earliest,
    MAX(last_seen) AS rollup_latest
FROM dns_daily_stats
WHERE stat_date >= CURRENT_DATE - INTERVAL '7 days'
GROUP BY stat_date
ORDER BY stat_date DESC;
```

### 6.5 Run: Partition Check (Are new partitions being created?)
Before you run this query, understand the ingestion risk: missing future partitions can break inserts at date boundaries, so this is a preventive control.

```sql
SELECT
    schemaname,
    tablename,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
WHERE tablename LIKE 'dns_queries_%'
ORDER BY tablename DESC
LIMIT 10;
```

Interpret:
- If raw count grows but rollups do not, cron/SQL scheduling is the issue.

Checkpoint:
- Rows in daily stats should appear each day without manual reruns.
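
To test the interpret rule directly, the sketch below (assuming both tables behave as documented in this lesson) lines up raw ingest counts against rollup totals per day; a day with raw rows but no rollup points at the scheduler.

```sql
-- Sketch, assuming the tables behave as described in Lesson 6: raw ingest
-- versus rollup totals per day; zero rolled_up_queries on a day with raw
-- rows suggests the rollup job did not run.
SELECT
    date_trunc('day', q.queried_at)::date AS day,
    COUNT(*) AS raw_queries,
    COALESCE(r.rolled_up_queries, 0) AS rolled_up_queries
FROM dns_queries q
LEFT JOIN (
    SELECT stat_date, SUM(hit_count) AS rolled_up_queries
    FROM dns_daily_stats
    GROUP BY stat_date
) r ON r.stat_date = date_trunc('day', q.queried_at)::date
WHERE q.queried_at >= NOW() - INTERVAL '7 days'
GROUP BY 1, r.rolled_up_queries
ORDER BY 1 DESC;
```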

## Lesson 7: Custom Reporting Lessons

Objective: build periodic reports and anomaly detection from existing aggregates.

### 7.1 Run: Weekly Traffic Report
Before you run this query, understand the reporting level: weekly aggregates provide planning context without the volatility of day-level noise.

```sql
SELECT
    week_start,
    apex_domain,
    category,
    total_queries,
    unique_clients,
    peak_day,
    ROUND(total_queries::numeric / 7, 0) AS avg_daily
FROM dns_weekly_stats
WHERE week_start >= DATE_TRUNC('week', CURRENT_DATE - INTERVAL '4 weeks')::DATE
ORDER BY week_start DESC, total_queries DESC
LIMIT 50;
```

### 7.2 Run: Month-over-Month Comparison
Before you run this query, understand the trend objective: MoM ranking highlights sustained shifts in demand and service mix.

```sql
SELECT
    year_month,
    apex_domain,
    category,
    total_queries,
    rank_in_month,
    ROUND(avg_daily, 0) AS daily_average
FROM dns_monthly_stats
WHERE year_month >= TO_CHAR(CURRENT_DATE - INTERVAL '3 months', 'YYYY-MM')
ORDER BY year_month DESC, rank_in_month ASC
LIMIT 100;
```

### 7.3 Run: Anomaly Detection (Traffic spikes)
Before you run this query, understand the method: z-score compares recent values to historical mean and variance, so high scores are flags for investigation.

```sql
WITH daily_avg AS (
    SELECT
        apex_domain,
        AVG(hit_count) AS avg_hits,
        STDDEV_POP(hit_count) AS stddev_hits
    FROM dns_daily_stats
    WHERE stat_date >= CURRENT_DATE - INTERVAL '30 days'
    GROUP BY apex_domain
),
recent_data AS (
    SELECT
        ds.stat_date,
        ds.apex_domain,
        ds.hit_count,
        da.avg_hits,
        da.stddev_hits,
        ROUND((ds.hit_count - da.avg_hits) / NULLIF(da.stddev_hits, 0), 2) AS z_score
    FROM dns_daily_stats ds
    JOIN daily_avg da ON ds.apex_domain = da.apex_domain
    WHERE ds.stat_date >= CURRENT_DATE - INTERVAL '7 days'
)
SELECT
    stat_date,
    apex_domain,
    hit_count,
    avg_hits,
    z_score,
    CASE
        WHEN z_score > 3 THEN 'EXTREME SPIKE'
        WHEN z_score > 2 THEN 'SIGNIFICANT SPIKE'
        WHEN z_score < -3 THEN 'EXTREME DROP'
        ELSE 'SIGNIFICANT DROP'
    END AS status
FROM recent_data
WHERE ABS(z_score) > 2
ORDER BY stat_date DESC, z_score DESC;
```

Interpret:
- Anomalies with high z-score are investigation candidates, not automatic incidents.

Checkpoint:
- You can produce weekly and monthly outputs without touching raw tables.
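
A common follow-on report is week-over-week movement; the sketch below assumes the `dns_weekly_stats` columns used in 7.1 and ranks the largest absolute changes between consecutive recent weeks.

```sql
-- Sketch, assuming the Lesson 7.1 schema: largest week-over-week swings
-- among recent weeks.
WITH recent AS (
    SELECT
        apex_domain,
        week_start,
        total_queries,
        LAG(total_queries) OVER (PARTITION BY apex_domain ORDER BY week_start) AS prev_queries
    FROM dns_weekly_stats
    WHERE week_start >= DATE_TRUNC('week', CURRENT_DATE - INTERVAL '3 weeks')::DATE
)
SELECT
    apex_domain,
    week_start,
    total_queries,
    prev_queries,
    ROUND((total_queries - prev_queries) * 100.0 / NULLIF(prev_queries, 0), 1) AS pct_change
FROM recent
WHERE prev_queries IS NOT NULL
ORDER BY ABS(total_queries - prev_queries) DESC
LIMIT 20;
```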

## Lesson 8: Export and Backup

Objective: extract reporting data safely for archive, sharing, and recovery.

### 8.1 Run: Export Top Domains to CSV
Before you run this command, understand data scope: export from aggregated tables first to reduce sensitivity and keep file size manageable.

```bash
psql -U dns_user -d dns_analytics -h localhost \
  --csv \
  -c "SELECT apex_domain, SUM(hit_count) as queries FROM dns_daily_stats
      WHERE stat_date >= CURRENT_DATE - 7
      GROUP BY apex_domain ORDER BY queries DESC LIMIT 100" \
  > top_100_domains.csv
```

### 8.2 Run: Daily Stats Export (for archive)
Before you run this command, understand backup intent: this creates a compressed, data-only archive of summary data suitable for restore or handoff.

```bash
pg_dump -U dns_user -d dns_analytics \
  --table=dns_daily_stats \
  --data-only \
  | gzip > daily_stats_$(date +%Y%m%d).sql.gz
```

### 8.3 Run: Backup Single Table
Before you run this command, understand the use case: this generates a portable CSV slice for quick analysis or lightweight recovery testing.

```bash
psql -U dns_user -d dns_analytics -h localhost -c \
  "COPY (SELECT * FROM dns_daily_stats WHERE stat_date >= CURRENT_DATE - 30)
   TO STDOUT WITH CSV HEADER" > dns_daily_stats_30day.csv
```

Interpret:
- Export aggregated tables for reporting; keep raw query exports intentional and scoped.

Checkpoint:
- CSV and compressed SQL outputs are restorable and timestamped.

## Lesson 9: MikroTik Local Zone Lesson (10.10.10.0/24)

Objective: verify that local domain codeandcore.home is being resolved and analyzed alongside internet traffic.

Lab assumptions:
- Network: 10.10.10.0/24
- Router and DNS: 10.10.10.10
- Local zone: codeandcore.home

### 9.1 Run: Zone Records in Database
Before you run this query, understand the inventory check: it confirms your authoritative local zone records are mirrored in PostgreSQL as expected state.

```sql
SELECT zone_name, fqdn, record_type, record_value, ttl, active
FROM dns_zone_records
WHERE zone_name = 'codeandcore.home'
ORDER BY fqdn, record_type;
```

### 9.2 Run: Local Host Query Volume
Before you run this query, understand the usage check: it measures whether clients are actively resolving local names inside codeandcore.home.

```sql
SELECT
    domain,
    qtype,
    COUNT(*) AS hits,
    COUNT(DISTINCT client_ip) AS unique_clients,
    MAX(queried_at) AS last_seen
FROM dns_queries
WHERE domain LIKE '%.codeandcore.home'
    AND queried_at >= NOW() - INTERVAL '7 days'
GROUP BY domain, qtype
ORDER BY hits DESC;
```

### 9.3 Run: Local vs Internet Split
Before you run this query, understand the ratio insight: it quantifies internal versus external DNS dependency over the same time window.

```sql
SELECT
    CASE WHEN domain LIKE '%.codeandcore.home' THEN 'local_zone' ELSE 'internet' END AS traffic_class,
    COUNT(*) AS total_queries,
    ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 2) AS pct
FROM dns_queries
WHERE queried_at >= NOW() - INTERVAL '7 days'
GROUP BY 1
ORDER BY total_queries DESC;
```

### 9.4 Run: Zone Records Never Queried (cleanup candidates)
Before you run this query, understand the hygiene goal: records never queried for long periods are potential cleanup candidates if still marked active.

```sql
SELECT zr.fqdn, zr.record_type, zr.record_value
FROM dns_zone_records zr
LEFT JOIN dns_queries dq
    ON dq.domain = zr.fqdn
    AND dq.queried_at >= NOW() - INTERVAL '30 days'
WHERE zr.zone_name = 'codeandcore.home'
    AND zr.active = TRUE
GROUP BY zr.fqdn, zr.record_type, zr.record_value
HAVING COUNT(dq.id) = 0
ORDER BY zr.fqdn;
```

Interpret:
- Local zone hits confirm internal naming is in active use.
- Never-queried records indicate stale DNS entries to review.

Checkpoint:
- You can prove both zone inventory and zone usage from PostgreSQL.
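
If you also want to know which devices rely on local names, this sketch (same `dns_queries` assumptions as 9.2) breaks local-zone usage down per client.

```sql
-- Sketch, assuming the Lesson 9.2 schema: which clients actually resolve
-- local zone names.
SELECT
    client_ip,
    COUNT(*) AS local_queries,
    COUNT(DISTINCT domain) AS local_names_used,
    MAX(queried_at) AS last_local_query
FROM dns_queries
WHERE domain LIKE '%.codeandcore.home'
    AND queried_at >= NOW() - INTERVAL '7 days'
GROUP BY client_ip
ORDER BY local_queries DESC;
```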

## Lesson 10: Querying Discipline

1. **Save queries as views** for reuse:
    Before you run this example, understand why views help: they standardize repeated logic so teams stop copying and drifting query text. A complete view sketch based on Lesson 1.1 appears after this list.
   ```sql
   CREATE VIEW top_50_domains_7day AS
   SELECT ... (your query here)
   ```

2. **Auto-refresh dashboard** with cron:
    Before you run this example, understand the operational pattern: scheduled execution turns analysis into repeatable reporting instead of manual ad hoc pulls.
   ```bash
   0 */6 * * * psql -U dns_user -d dns_analytics -H -f /path/to/query.sql > /var/www/html/report.html
   ```

3. **Get help on any table:**
    Before you run these commands, understand the purpose: schema introspection is the fastest way to verify field names and data types before writing new queries.
   ```sql
   \d dns_daily_stats   -- Shows schema
   \d+ dns_queries      -- Shows with size info
   ```

4. **Measure query performance:**
    Before you run this command, understand the tuning loop: analyze execution plans before adding indexes so changes are evidence-based.
   ```sql
   EXPLAIN ANALYZE SELECT ... your query ...
   ```

5. **Check index usage:**
    Before you run this query, understand the maintenance value: low-scan indexes may be dead weight and high-scan indexes show where performance depends.
   ```sql
   SELECT schemaname, tablename, indexname, idx_scan
   FROM pg_stat_user_indexes
   WHERE schemaname = 'public'
   ORDER BY idx_scan DESC;
   ```
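
As referenced in item 1, here is a complete view sketch built from a trimmed version of the Lesson 1.1 query; the view name and column choice are illustrative, so adapt them to whatever you actually repeat.

```sql
-- Sketch: a reusable 7-day top-domains view, trimmed from Lesson 1.1.
-- The view name is illustrative; results shift with CURRENT_DATE at query time.
CREATE OR REPLACE VIEW top_50_domains_7day AS
SELECT
    ds.apex_domain,
    dc.category,
    SUM(ds.hit_count) AS total_queries
FROM dns_daily_stats ds
LEFT JOIN domain_categories dc ON dc.apex_domain = ds.apex_domain
WHERE ds.stat_date >= CURRENT_DATE - INTERVAL '7 days'
GROUP BY ds.apex_domain, dc.category
ORDER BY total_queries DESC
LIMIT 50;

-- Usage:
-- SELECT * FROM top_50_domains_7day;
```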

---

## Recap

1. Save stable queries as SQL files by lesson.
2. Promote repeated analysis into views.
3. Use EXPLAIN ANALYZE before adding new indexes.
4. Keep one business question per query.
5. Add checkpoints to every report runbook.

**For more help:** See README.md or index.html





