No Open Ports To the Data Center
Our metal-stack partitions typically have open ports for metal-stack native services:
- SSH port on the firewalls
- bmc-reverse-proxy for serial console access through the metal-console
These open ports are potential security risks. For example, although SSH access is possible only with a private key, the port is still vulnerable to DoS attacks.
Therefore, we want to get rid of these open ports to reduce the attack surface of the data center.
Requirements
- Access to firewall SSH only via VPN
- Easy to update VPN components
As a next step, we can also consider joining the management servers to the VPN mesh, which would replace the typical WireGuard setups that operators use to access resources inside the partition.
High Level Design
Simplified drawing showing old vs. new architecture.
Concerns
There are a few concerns when using WireGuard to implement the VPN:
- WireGuard doesn't implement dynamic cipher substitution. This matters if one of the cryptographic methods used by WireGuard is broken: the only possible solution would be to update WireGuard to a fixed version.
- The coordination server (Headscale) is a single point of failure. If it fails, it can potentially disconnect existing members of the network, as WireGuard can't manage dynamic IPs by itself.
- Headscale already falls behind the Tailscale coordination server implementation, which can complicate an upgrade to a newer version of the Tailscale client in case of emergency.
Solutions to concerns
- The Tailscale node software uses a userspace implementation of WireGuard, wireguard-go. One option is to embed the Tailscale client into metalctl and make it available as metalctl vpn or a similar command. This should be possible, as the tailscale node is already available as an open-source Go package. It would allow us to control which version of Tailscale users are running and, in case of critical changes, to force them to update metalctl in order to use the VPN functionality.
- Would it be a considerable risk? We could look into the wg-dynamic project to cover this problem.
- At the moment, the Headscale repository looks well maintained and the metal-stack team already contributes to it.
Implementation Details
metal-roles
metal-roles will be responsible for deploying the headscale server (via a new headscale role). It should also provide sufficient configuration to metal-api so that it can establish a connection with the headscale gRPC server.
New metalctl commands
metalctl will be responsible for the client-side implementation of this MEP. Specifically, users are expected to connect to firewalls by using metalctl.
metalctl vpn – section for VPN-related commands:
- metalctl vpn get key [vpn name] --namespace [namespace name] – returns an auth key to be used with the tailscale client for establishing a connection.
Extend metalctl firewall:
- metalctl firewall ssh [ID] – connect to a firewall via SSH.
Extend metalctl machine:
- metalctl machine ssh [ID] – connect to a machine via SSH.
metalctl will be able to connect to firewalls and machines by running tailscale in a container.
metal-api
metal-api should be updated so that it is able to add firewalls to VPNs. There should be one Tailscale namespace per project, so if multiple firewalls are created in a single project, they will join the same namespace.
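The one-namespace-per-project rule can be illustrated with a minimal sketch. It uses an in-memory registry as a stand-in for Headscale; the type vpnRegistry, its methods, and the choice to use the project ID verbatim as the namespace name are all assumptions made for illustration, not part of the proposal.

```go
package main

import (
	"fmt"
	"sync"
)

// vpnRegistry is a hypothetical in-memory stand-in for the Headscale
// coordination server; a real integration would talk to its gRPC API.
type vpnRegistry struct {
	mu         sync.Mutex
	namespaces map[string][]string // namespace name -> firewall IDs
}

func newVPNRegistry() *vpnRegistry {
	return &vpnRegistry{namespaces: make(map[string][]string)}
}

// namespaceFor derives the namespace name from the project ID.
// Using the project ID verbatim is an assumption of this sketch.
func namespaceFor(projectID string) string {
	return projectID
}

// joinFirewall adds the firewall to its project's namespace, creating the
// namespace entry on first use, and returns the namespace name.
func (r *vpnRegistry) joinFirewall(projectID, firewallID string) string {
	r.mu.Lock()
	defer r.mu.Unlock()
	ns := namespaceFor(projectID)
	r.namespaces[ns] = append(r.namespaces[ns], firewallID)
	return ns
}

// members returns a copy of the firewall IDs registered in a namespace.
func (r *vpnRegistry) members(ns string) []string {
	r.mu.Lock()
	defer r.mu.Unlock()
	return append([]string(nil), r.namespaces[ns]...)
}

func main() {
	r := newVPNRegistry()
	r.joinFirewall("project-a", "fw-1")
	r.joinFirewall("project-a", "fw-2")
	r.joinFirewall("project-b", "fw-3")
	// Both project-a firewalls end up in the same namespace.
	fmt.Println(r.members("project-a"))
}
```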
Two new flags should be introduced to connect metal-api to the headscale gRPC server:
- headscale-addr – specifies the address of the Headscale gRPC API.
- headscale-api-key – specifies a temporary API key to connect to Headscale. It should be replaced and then rotated by metal-api.
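As a rough sketch, the two flags could be parsed as follows. It uses the standard library flag package to stay self-contained, which may differ from how metal-api actually wires its configuration. The headscaleConfig type, the parseHeadscaleFlags helper, and the rule that an API key is required whenever an address is set are assumptions of this example.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// headscaleConfig holds the values of the two proposed flags.
type headscaleConfig struct {
	Addr   string
	APIKey string
}

// enabled reports whether a Headscale address was configured at all.
func (c headscaleConfig) enabled() bool { return c.Addr != "" }

// parseHeadscaleFlags parses the proposed flags from args.
// Requiring the API key when an address is given is an assumption.
func parseHeadscaleFlags(args []string) (headscaleConfig, error) {
	var cfg headscaleConfig
	fs := flag.NewFlagSet("metal-api", flag.ContinueOnError)
	fs.StringVar(&cfg.Addr, "headscale-addr", "", "address of the Headscale gRPC API")
	fs.StringVar(&cfg.APIKey, "headscale-api-key", "", "temporary API key to connect to Headscale")
	if err := fs.Parse(args); err != nil {
		return headscaleConfig{}, err
	}
	if cfg.enabled() && cfg.APIKey == "" {
		return headscaleConfig{}, fmt.Errorf("headscale-api-key is required when headscale-addr is set")
	}
	return cfg, nil
}

func main() {
	cfg, err := parseHeadscaleFlags(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("headscale enabled:", cfg.enabled())
}
```

With no flags given, the configuration is simply disabled, which matches the idea that VPN support is only active when metal-api is started with a Headscale connection.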
If metal-api is initialized with a headscale connection, it should automatically join all created firewalls to the VPN.
Add a new endpoint that will be used by metalctl to connect to the VPN:
- /v1/vpn GET – requests an auth key from the headscale server.
metal-hammer
metal-hammer acts as an intermediary for machine configuration between metal-api and the machine's image. Specifically, it writes the /etc/metal/install.yaml file, whose data is later used by the image's install.sh file.
To implement VPN support, we have to add an authentication key and the VPN server address to the install.yaml file. This key will be used to join the machine to a VPN.
metal-images
The install.sh script of the images has to be updated to work with the authentication key and VPN server address provided in the install.yaml file. If this key is present, the machine should connect to the VPN.
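The decision logic can be sketched as follows. install.sh itself is a shell script, so this Go snippet only illustrates the rule "join the VPN if and only if an auth key is present". The vpnSettings type and its YAML key names are hypothetical, since the proposal does not fix the install.yaml field names. The --login-server and --authkey options are existing options of tailscale up.

```go
package main

import "fmt"

// vpnSettings mirrors the two values this MEP adds to install.yaml.
// The field and key names are hypothetical.
type vpnSettings struct {
	AuthKey string `yaml:"auth_key"`
	Address string `yaml:"address"`
}

// tailscaleUpArgs returns the arguments for a `tailscale up` call, or nil
// when no auth key is present and the machine should stay off the VPN.
func tailscaleUpArgs(v *vpnSettings) []string {
	if v == nil || v.AuthKey == "" {
		return nil
	}
	return []string{"up", "--login-server", v.Address, "--authkey", v.AuthKey}
}

func main() {
	withVPN := &vpnSettings{AuthKey: "example-key", Address: "https://headscale.example"}
	fmt.Println(tailscaleUpArgs(withVPN))
	fmt.Println(tailscaleUpArgs(nil) == nil)
}
```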
metal-networker
metal-networker also has to know whether a VPN was configured. In that case, we need to disable public access to SSH and allow all(?) traffic from the WireGuard interface.
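To make the intended switch concrete, here is a small sketch that renders nftables-style input rules depending on whether a VPN is configured. The function name sshInputRules, the exact rule text, and the interface name passed in are assumptions. Whether all traffic from the VPN interface should really be accepted is still an open question in this proposal, so the first rule in the VPN case is only one possible reading.

```go
package main

import (
	"fmt"
	"strings"
)

// sshInputRules returns nftables-style rule lines for an input chain.
// Without VPN, SSH stays publicly reachable. With VPN, traffic arriving on
// the VPN interface is accepted first and remaining SSH traffic is dropped.
func sshInputRules(vpnEnabled bool, vpnIface string) []string {
	if !vpnEnabled {
		return []string{"tcp dport 22 counter accept"}
	}
	return []string{
		// Rule order matters: the VPN accept must come before the SSH drop.
		fmt.Sprintf("iifname %q counter accept", vpnIface),
		"tcp dport 22 counter drop",
	}
}

func main() {
	// "tailscale0" is used here as an example interface name.
	fmt.Println(strings.Join(sshInputRules(true, "tailscale0"), "\n"))
}
```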
firewall-controller
firewall-controller has to monitor changes in the Firewall resource and keep the tailscaled version up to date.
Resources
Update the Firewall resource to include the desired/actual tailscale version:
Firewall:
  Spec:
    tailscale:
      Version: Minimal version
    ...
  Status:
    ...
    VPN:
      Status: Boolean field
      tailscale:
        Version: Actual version
    ...
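A hedged sketch of how firewall-controller might compare the minimal version from the spec with the actual version from the status. The Go type names, the nesting of the tailscale status under VPN, and the assumption that versions follow a plain MAJOR.MINOR.PATCH scheme are illustrative and not taken from the proposal.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Hypothetical Go shapes for the resource fields described above.
type TailscaleVersion struct{ Version string }

type VPNStatus struct {
	Connected bool // the boolean status field
	Tailscale TailscaleVersion
}

type Firewall struct {
	Spec   struct{ Tailscale TailscaleVersion }
	Status struct{ VPN VPNStatus }
}

// parseVersion splits "MAJOR.MINOR.PATCH" (optionally prefixed with "v")
// into its numeric components.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("version %q: want MAJOR.MINOR.PATCH", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return out, fmt.Errorf("version %q: bad component %q", v, p)
		}
		out[i] = n
	}
	return out, nil
}

// needsUpgrade reports whether the actual version is older than the minimal one.
func needsUpgrade(minimal, actual string) (bool, error) {
	want, err := parseVersion(minimal)
	if err != nil {
		return false, err
	}
	have, err := parseVersion(actual)
	if err != nil {
		return false, err
	}
	for i := range want {
		if have[i] != want[i] {
			return have[i] < want[i], nil
		}
	}
	return false, nil
}

func main() {
	var fw Firewall
	fw.Spec.Tailscale.Version = "1.32.0"
	fw.Status.VPN.Tailscale.Version = "1.30.2"
	up, err := needsUpgrade(fw.Spec.Tailscale.Version, fw.Status.VPN.Tailscale.Version)
	fmt.Println(up, err)
}
```

Comparing components numerically rather than as strings avoids treating 1.9.0 as newer than 1.10.0.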
bmc-reverse-proxy
TODO