No Open Ports To the Data Center
Our metal-stack partitions typically have open ports for metal-stack native services. These are:
- SSH port on the firewalls
- bmc-reverse-proxy for serial console access through the metal-console
These open ports are potential security risks. For example, while SSH access is possible only with a private key, it is still vulnerable to DoS attacks.
Therefore, we want to get rid of these open ports to reduce the attack surface of the data center.
- Access to firewall SSH only via VPN
- Easy to update VPN components
As a next step, we can also consider joining the management servers to the VPN mesh, which would replace the typical WireGuard setups that operators use to access resources inside the partition.
High Level Design
Simplified drawing showing old vs. new architecture.
There are a few concerns about using WireGuard to implement the VPN:
- WireGuard doesn't implement dynamic cipher substitution. This matters if one of the cryptographic methods used by WireGuard is broken: the only possible solution is then to update WireGuard to a fixed version.
- The coordination server (Headscale) is a single point of failure. If it fails, it can potentially disconnect existing members of the network, because WireGuard can't manage dynamic IPs by itself.
- Headscale already lags behind the Tailscale coordination server implementation, which can complicate an emergency upgrade to a newer version of the Tailscale client.
Solutions to concerns
- The Tailscale node software uses a userspace implementation of WireGuard, wireguard-go. One option is to embed the Tailscale client into metalctl and make it available as metalctl vpn or a similar command. This should be possible because the tailscale node is already available as an open-source Go package. It would allow us to control which version of Tailscale users are running and, in case of critical changes, force them to update metalctl in order to use the VPN functionality.
- Would it be a considerable risk? We could look into the wg-dynamic project to cover this problem.
- At the moment, the Headscale repository looks well maintained, and the metal-stack team already contributes to it.
metal-roles will be responsible for deploying the headscale server (via a new headscale role). It should also provide sufficient configuration to metal-api so that it can establish a connection to the headscale gRPC server.
metalctl will be responsible for the client-side implementation of this MEP. Specifically, users are expected to connect to firewalls by using metalctl.
- metalctl vpn – section for VPN related commands:
  - metalctl vpn get key [vpn name] --namespace [namespace name] – returns an auth key to be used with the tailscale client for establishing a connection.
- metalctl firewall ssh [ID] – connect to a firewall via SSH.
- metalctl machine ssh [ID] – connect to a machine via SSH.
metalctl will be able to connect to firewalls and machines by running tailscale in a container.
metal-api should be extended so that it is able to add firewalls to VPNs. There should be one Tailscale namespace per project, so if multiple firewalls are created in a single project, they will join the same namespace.
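The one-namespace-per-project rule can be illustrated with a minimal in-memory sketch. The `vpnRegistry` type and the choice of using the project ID as the namespace name are assumptions made for illustration only; the real state would live in Headscale, not in metal-api memory.

```go
package main

import "fmt"

// vpnRegistry is a hypothetical in-memory stand-in for the namespace state
// that Headscale would hold: namespace name -> IDs of joined firewalls.
type vpnRegistry struct {
	namespaces map[string][]string
}

func newVPNRegistry() *vpnRegistry {
	return &vpnRegistry{namespaces: make(map[string][]string)}
}

// joinFirewall adds a firewall to the namespace of its project and returns
// the namespace name. The namespace is created implicitly on first use.
// Assumption: the namespace is named after the project ID.
func (r *vpnRegistry) joinFirewall(projectID, firewallID string) string {
	r.namespaces[projectID] = append(r.namespaces[projectID], firewallID)
	return projectID
}

func main() {
	r := newVPNRegistry()
	r.joinFirewall("project-a", "fw-1")
	r.joinFirewall("project-a", "fw-2") // same project, same namespace
	r.joinFirewall("project-b", "fw-3")

	for ns, firewalls := range r.namespaces {
		fmt.Printf("namespace %s: %v\n", ns, firewalls)
	}
}
```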
Two new flags should be introduced to connect to the headscale gRPC server:
- headscale-addr – specifies the address of the Headscale gRPC API.
- headscale-api-key – specifies a temporary API key to connect to Headscale. It should be replaced and then rotated by metal-api.
If metal-api is initialized with a headscale connection, it should automatically join all created firewalls to the VPN.
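As a rough sketch of how the two flags could be parsed and validated, the following uses only Go's standard `flag` package. The function name `parseHeadscaleFlags`, the `headscaleConfig` type, and the rule that VPN support stays disabled when no address is given are assumptions, not part of this proposal; metal-api may wire its flags through a different library.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
)

// headscaleConfig holds the values of the two proposed flags.
type headscaleConfig struct {
	Addr   string
	APIKey string
}

// parseHeadscaleFlags parses the proposed flags from args.
// Assumption: an empty address means VPN support is disabled, and an
// address without an API key is treated as a configuration error.
func parseHeadscaleFlags(args []string) (headscaleConfig, error) {
	var cfg headscaleConfig
	fs := flag.NewFlagSet("metal-api", flag.ContinueOnError)
	fs.StringVar(&cfg.Addr, "headscale-addr", "", "address of the Headscale gRPC API")
	fs.StringVar(&cfg.APIKey, "headscale-api-key", "", "temporary API key used to connect to Headscale")
	if err := fs.Parse(args); err != nil {
		return cfg, err
	}
	if cfg.Addr != "" && cfg.APIKey == "" {
		return cfg, errors.New("headscale-api-key is required when headscale-addr is set")
	}
	return cfg, nil
}

func main() {
	cfg, err := parseHeadscaleFlags([]string{
		"--headscale-addr", "headscale.example.org:50443", // placeholder address
		"--headscale-api-key", "temporary-key", // placeholder key
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("VPN enabled:", cfg.Addr != "")
}
```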
Add a new endpoint that will be used by metalctl to connect to the VPN:
- /v1/vpn GET – requests an auth key from Headscale.
metal-hammer acts as an intermediary for machine configuration between metal-api and the machine's image. Specifically, it writes the /etc/metal/install.yaml file, whose data will later be used by the image's install.sh script.
To implement VPN support, we have to add an authentication key and the VPN server address to the install.yaml file. This key will be used to join the machine to a VPN.
The install.sh script has to be updated to work with the authentication key and VPN server address provided in the install.yaml file. If this key is present, the machine should connect to the VPN.
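The decision described above (join the VPN only if a key is present) is sketched here in Go for consistency with the rest of the stack, although install.sh itself is a shell script. The `vpnSettings` type and its field names are hypothetical. The `--login-server` and `--auth-key` options of `tailscale up` should be verified against the client version that ships in the image.

```go
package main

import "fmt"

// vpnSettings mirrors the two values proposed for install.yaml.
// The field names are hypothetical.
type vpnSettings struct {
	AuthKey string
	Address string
}

// tailscaleUpArgs returns the arguments for a "tailscale up" call that joins
// the machine to the VPN. It returns nil when no auth key is present, in
// which case the machine should not connect to any VPN.
func tailscaleUpArgs(v vpnSettings) []string {
	if v.AuthKey == "" {
		return nil
	}
	return []string{"up", "--login-server", v.Address, "--auth-key", v.AuthKey}
}

func main() {
	withKey := vpnSettings{AuthKey: "example-key", Address: "https://headscale.example.org"}
	fmt.Println(tailscaleUpArgs(withKey))

	// No key in install.yaml: nothing to run.
	fmt.Println(tailscaleUpArgs(vpnSettings{}) == nil)
}
```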
metal-networker also has to know whether a VPN was configured. In that case, we need to disable public access to SSH and allow all(?) traffic from the WireGuard interface.
firewall-controller has to monitor changes in the Firewall resource and keep the tailscaled version up to date.
The Firewall resource should be updated to include the desired and actual tailscale version:

    Firewall:
      Spec:
        tailscale:
          Version: Minimal version
        ...
      Status:
        ...
        VPN:
          Status: Boolean field
        tailscale:
          Version: Actual version
        ...
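The comparison that firewall-controller would perform between the minimal version in the spec and the actual version in the status could be sketched as follows. This is a simplified numeric comparison of dot-separated versions and an assumption about the version format. Pre-release suffixes are not handled, and a production implementation would more likely use an established semver library.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion turns "1.22.2" or "v1.22.2" into its numeric components.
func parseVersion(s string) ([]int, error) {
	parts := strings.Split(strings.TrimPrefix(s, "v"), ".")
	nums := make([]int, len(parts))
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return nil, fmt.Errorf("invalid version %q: %w", s, err)
		}
		nums[i] = n
	}
	return nums, nil
}

// needsUpdate reports whether the actual version is older than the minimal
// version. Missing components are treated as zero, so "1.22" equals "1.22.0".
func needsUpdate(minimal, actual string) (bool, error) {
	want, err := parseVersion(minimal)
	if err != nil {
		return false, err
	}
	have, err := parseVersion(actual)
	if err != nil {
		return false, err
	}
	for i := 0; i < len(want) || i < len(have); i++ {
		var w, h int
		if i < len(want) {
			w = want[i]
		}
		if i < len(have) {
			h = have[i]
		}
		if h != w {
			return h < w, nil
		}
	}
	return false, nil
}

func main() {
	update, err := needsUpdate("1.22.2", "1.20.4")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("tailscaled update required:", update)
}
```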