SSH Tarpit on a NixOS VPS: endlessh-go, Prometheus, and a quieter auth log

Moving SSH off port 22 and running endlessh-go with metrics


Why I set this up

For no particular reason, I looked at recent failed SSH logins on my Linux VPS, which has a public IPv4 address.

λ sudo lastb -F

I was astonished by the volume and the variety of usernames, e.g.:

ubnt     ssh:notty    80.94.95.116     Tue Feb 24 21:29:36 2026 - Tue Feb 24 21:29:36 2026
(00:00)
operator ssh:notty    80.94.95.115     Tue Feb 24 21:19:42 2026 - Tue Feb 24 21:19:42 2026
(00:00)
...
oracle   ssh:notty    104.248.200.5    Sun Feb  1 00:00:21 2026 - Sun Feb  1 00:00:21 2026  (00:00)

btmp begins Sun Feb  1 00:00:21 2026

Who tried?

To be clear: only a single user is allowed to log in, and authentication is key-only (password authentication is disabled), so none of these failed attempts was me mistyping my password while clumsily sshing into my server.

Since I only ever log in with a single user, I briefly considered adding a Fail2Ban jail that bans any SSH login attempt using a different username. I decided not to go with it (for now), but here’s the configuration I drafted:

    environment.etc = {
      ...
      # SSH invalid-user jail (ban any user != me).
      "fail2ban/filter.d/sshd-invalid-user.conf".text = ''
        [Definition]
        failregex = ^%(__prefix_line)sInvalid user (?!me\b).*$
                    ^%(__prefix_line)sFailed password for invalid user (?!me\b).*$
                    ^%(__prefix_line)sFailed publickey for invalid user (?!me\b).*$
                    ^%(__prefix_line)sUser not known to the underlying authentication module for user (?!me\b).*$
        ignoreregex =
      '';
    };
    services.fail2ban = {
      enable = true;
      ...
      jails = {
        # SSH invalid-user jail (ban any user != me).
        "sshd-invalid-user" = ''
          enabled = true
          filter = sshd-invalid-user
          backend = systemd
          journalmatch = _SYSTEMD_UNIT=sshd.service
          maxretry = 1
          findtime = 600
        '';
      };
    };

If I ever want to enforce "single-user only" bans, this is the exact snippet I'd enable; for now I just kept my existing nginx-probing jail and filter.
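
Since fail2ban evaluates failregex with Python's regex engine, the negative lookahead can be sanity-checked in a few lines. This is only a sketch: `%(__prefix_line)s` is substituted by fail2ban at load time and is simplified to `.*` here, and the log lines are made up:

```python
import re

# Simplified version of the first failregex line; %(__prefix_line)s
# is replaced by ".*" just to exercise the (?!me\b) lookahead.
pattern = re.compile(r"^.*Invalid user (?!me\b)\S+")

assert pattern.search("sshd[123]: Invalid user admin from 1.2.3.4")
assert not pattern.search("sshd[123]: Invalid user me from 1.2.3.4")
# \b matters: it blocks exactly "me", not usernames that merely start with it.
assert pattern.search("sshd[123]: Invalid user meredith from 1.2.3.4")
```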

Next, I wanted to figure out who had actually given it a try, so I counted the most common usernames:

λ sudo lastb -F | awk '{print $1}' | sort | uniq -c | sort -nr | head -n 20

5165 admin
3387 user
3154 ubuntu
1400 debian
1337 thym
1333 oracle
1200 xnberwa*
...

This pulls the username field from lastb, counts occurrences, and sorts them by frequency. It’s a fast way to see which default usernames bots try first (admin, ubuntu, user, etc.).

I also turned the output into JSON so it was easy to post‑process with jq:

λ python3 - <<'PY' | jq
import subprocess, json

cmd = "sudo lastb -F | awk '{print $1}' | sort | uniq -c | sort -nr"
out = subprocess.check_output(cmd, shell=True, text=True)
data = []
for line in out.splitlines():
    parts = line.strip().split(None, 1)
    if len(parts) != 2:
        continue
    count, user = parts
    data.append({"user": user, "count": int(count)})
print(json.dumps(data))
PY

Next I wanted to see the most active source IPs to figure out how often they had tried:

λ sudo lastb -F | awk '{print $3}' | sort | uniq -c | sort -nr | head -n 30
   1776 185.246.128.171
    636 193.32.162.151
    627 213.176.0.19
    549 80.94.92.186
    429 165.227.230.243
    374 218.23.120.71
    374 181.177.142.103
    374 178.21.164.139
    374 103.88.112.66
    367 64.227.79.57
    350 14.139.239.130
    346 89.187.7.147
    346 51.79.104.129
    346 34.124.164.19
    346 220.130.131.117
    346 185.79.158.198
    346 156.236.75.178
    346 123.58.212.100
    346 1.165.130.37
    346 112.203.69.83
    346 103.90.67.3
    334 2.57.122.238
    334 213.209.159.158
    332 27.155.92.28
    330 103.239.252.132
    325 220.194.79.36
    282 189.89.20.45
    275 134.209.182.177
    274 167.99.214.186
    273 193.32.162.145

This surfaced an interesting detail: multiple IPs showed up with identical attempt counts (e.g. 346, 374), which likely hints at automated scanners or botnets:

  • Rate‑limited scanners that stop after a fixed number of attempts
  • Distributed scan batches where each node runs the same attempt list
  • Cloud scanning fleets using uniform limits
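
A quick way to surface those batches is to invert the mapping and group IPs by their attempt count. The data below is a hand-picked subset of the `uniq -c` output above:

```python
from collections import defaultdict

# (ip, attempts) pairs taken from the lastb output above.
attempts = {
    "185.246.128.171": 1776,
    "218.23.120.71": 374,
    "181.177.142.103": 374,
    "178.21.164.139": 374,
    "34.124.164.19": 346,
    "1.165.130.37": 346,
}

by_count = defaultdict(list)
for ip, n in attempts.items():
    by_count[n].append(ip)

# A count shared by several IPs hints at a coordinated batch.
batches = {n: ips for n, ips in by_count.items() if len(ips) > 1}
for n, ips in sorted(batches.items(), reverse=True):
    print(n, "->", len(ips), "IPs")
```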

At that point I had seen enough. I decided to move real SSH off port 22 and place a tarpit there instead to detect these silly attempts and waste their time and compute.

What I did (high level)

  • Moved real SSH away from port 22 to some obscure port
  • Put endlessh‑go on port 22 as a tarpit
  • Enabled Prometheus metrics for endlessh‑go
  • Visualized the results in Grafana

Why endlessh‑go?

Classic endlessh is great, but endlessh‑go exposes Prometheus metrics. That means I can track how often port 22 is hit, how long connections stay open, and how many concurrent connections are active. Most importantly, I get a nice‑looking Grafana dashboard visualizing those metrics, plus some sweet validation that the decision was correct: the internet has become a burning pile of trash, and no open port is safe from unscrupulous beings trying to find a way onto your machine with its sweet public IP.
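
For intuition, the tarpit trick itself is tiny: RFC 4253 allows an SSH server to send arbitrary banner lines before its version string, as long as they don't start with "SSH-", so a tarpit just drips random lines forever while the client waits. A minimal sketch of the idea in Python (not endlessh‑go's actual code; port and delay are arbitrary):

```python
import random
import socket
import threading
import time

def tarpit_client(conn, delay=10.0):
    """Drip random banner lines forever; a client waiting for the
    SSH version string stays stuck until it gives up."""
    try:
        while True:
            # Random hex line; must not start with "SSH-" (RFC 4253).
            conn.sendall(b"%x\r\n" % random.getrandbits(32))
            time.sleep(delay)
    except OSError:
        conn.close()

def serve(port, delay=10.0):
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", port))
    srv.listen()
    while True:
        conn, _ = srv.accept()
        threading.Thread(target=tarpit_client, args=(conn, delay),
                         daemon=True).start()
```

The first banner line is sent immediately on accept, then the connection is kept busy with one short line every `delay` seconds, which is cheap for the server and expensive for the scanner.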

NixOS setup

Move real SSH off port 22

I moved SSH to a non‑default port, and opened that port in the firewall:

services.openssh = {
  enable = true;
  ports = [ <OBSCURE_PORT> ];
  openFirewall = true;
};
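
On the client side, a `Host` alias saves retyping the non-default port. The alias, hostname, and port below are placeholders for whatever you picked:

```
# ~/.ssh/config (all values are placeholders)
Host myvps
    HostName vps.example.com
    Port 50022
    User me
```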

Run endlessh‑go on port 22 on this host

services.endlessh-go = {
  enable = true;
  listenAddress = "0.0.0.0";
  port = 22;
  openFirewall = true;
  prometheus = {
    enable = true;
  };
  extraOptions = [
    "-geo_ip_supplier=ip-api"
  ];
};

Prometheus scrape config

The endlessh‑go module exposes Prometheus metrics, so I just added a scrape job (conditional on it being enabled):

(mkIf (config.services.endlessh-go.enable && config.services.endlessh-go.prometheus.enable) {
  job_name = "endlessh-go";
  static_configs = [{
    targets = mkTargetsFor config.services.endlessh-go.prometheus.port
      (if cfg.scrapeHosts ? endlesshGo then cfg.scrapeHosts.endlesshGo else null);
  }];
})

And for my VPS’s scrape hosts config:

scrapeHosts = {
  ...
  endlesshGo = [ "localhost" ];
  ...
}

Grafana dashboard

I used the official dashboard from the endlessh-go project:

https://grafana.com/grafana/dashboards/15156-endlessh/

It worked without any adaptations.

(I’ll add a screenshot here later.)

Closing thoughts

This was a small change with a noticeable impact:

  • The auth log noise dropped
  • Scanners now waste time and compute on a tarpit
  • I get metrics to see the scale of the problem

If you run any public server, I’d recommend doing the same.
