Reducing your attack surface with systemd

Running Linux services through systemd can greatly reduce their attack surface, and its authors add a lot of new functionality with each new version. What can you achieve in terms of security using systemd?

The biggest advantage of the systemd service interface is that it offers a common interface to many distinct security features of the Linux kernel in a single service definition, where they can be applied along with other service-specific flags. The relevant flags are documented in systemd.service, systemd.exec and systemd.resource-control, but in practice they are all specified in one file.

Below you can see my service definition for an IPFS node, which I also reuse in a very similar form for most of my daemons. When is this especially useful?

  • Any network server daemons, as they are especially exposed to malicious traffic from the Internet trying to exploit known or unknown vulnerabilities in their code. Systemd flags let you confine the daemon so that an attacker has very few options to access filesystem resources the daemon is not meant to touch, plant a web shell or escalate privileges.
  • Any daemons running as root, whose exploitation, without confinement, is practically equivalent to a full system takeover. With systemd you can have a daemon run as root and still give it very limited access to system resources.
  • Any legacy or third-party daemons where you aren't really sure what they do or what terrible vulnerabilities may be buried in them, and even if you were, you couldn't do anything about it... except confine them to reduce the attack surface and mitigate the consequences of a possible exploit.

IPFS, as a network server daemon that connects to and accepts connections from hundreds of peers, is a very attractive target for such confinement, and I have explained the purpose of the security-related directives in the comments. An important question is how you actually build such a service definition. Some hints:

  • Use an iterative approach: enable as many flags as possible in their most restrictive form, check whether the daemon works, and if it doesn't, relax them one at a time until it does.
  • Build an AppArmor or SELinux profile for the daemon using their native monitor-only policy-building tools (aa-genprof and audit2allow, respectively). This will tell you exactly which resources (files, network etc.) and capabilities the daemon accesses, and is especially useful for daemons that access some resources only occasionally, for example once per day, which you won't capture immediately.
  • Use systemd-analyze security (see below).
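
As a rough illustration of the iterative approach, the fragment below sketches one round of tightening and relaxing, using directives that also appear in the IPFS unit in this article. The failure described in the comments is an assumption about how such a daemon would typically behave, not a captured log.

```ini
[Service]
# Round 1: most restrictive form. The whole file system hierarchy is mounted
# read-only for the service (except /dev, /proc and /sys).
ProtectSystem=strict

# Round 2: assume the daemon failed to start because it could not write to its
# data directory. Relax only that one path instead of weakening ProtectSystem=.
ReadWritePaths=/home/ipfs/.ipfs
```

After each change, restart the service and check its status and logs before touching the next flag.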

For a new or custom daemon I would create a new service definition with systemctl edit --full --force ipfs.service, paste the whole service definition from this article in there, and run systemctl start ipfs. But you can also harden an existing daemon like Nginx by running systemctl edit nginx.service, which will create an empty override file where you just create a [Service] section and add only the security-relevant flags.
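
A minimal override for the Nginx case could look like the sketch below. Which flags Nginx tolerates depends on its configuration (log paths, ports, modules), so treat the selection as an assumed starting point to test rather than a verified profile.

```ini
# Drop-in created by: systemctl edit nginx.service
# (stored as /etc/systemd/system/nginx.service.d/override.conf)
[Service]
# no privilege escalation through SUID binaries
NoNewPrivileges=yes
# private /tmp and /var/tmp
PrivateTmp=yes
# /usr, the boot loader directories and /etc become read-only; /var stays writable
ProtectSystem=full
ProtectKernelModules=yes
ProtectKernelTunables=yes
ProtectControlGroups=yes
# only Internet and local sockets
RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX
```

systemctl edit reloads the unit configuration when the editor exits, so restarting the service is enough to apply the override.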

[Unit]
Description=IPFS Daemon Service
Wants=network-online.target
After=network-online.target

[Install]
WantedBy=multi-user.target
Alias=ipfs.service

[Service]
Type=simple
WorkingDirectory=/home/ipfs
ExecStart=/usr/bin/ipfs daemon --enable-namesys-pubsub
Restart=on-failure
SystemCallErrorNumber=EPERM

# run the process as unprivileged user and group
User=ipfs
Group=ipfs

# reduce CPU priority of the process - see nice(1) for details
Nice=5

# disallow any escalation of privileges (e.g. through calling SUID binaries)
NoNewPrivileges=yes
RestrictSUIDSGID=yes

# two of the most important confinement flags: the first whitelists the capabilities that will be allowed for this daemon,
# the second disables all system calls except for a small pre-defined whitelist (in this case one suitable for most system services)
CapabilityBoundingSet=CAP_NET_BIND_SERVICE CAP_DAC_OVERRIDE CAP_DAC_READ_SEARCH
SystemCallFilter=@system-service

# this further blocks system calls made through non-native ABIs (e.g. 32-bit x86 calls on an x86-64 host), which could otherwise be used to sidestep the filter above
SystemCallArchitectures=native

# these (and other similar options) confine the process to a read-only file system
# with only the listed directories writable
ProtectSystem=strict
ReadWritePaths=/home/ipfs/.ipfs

# allocate a private writable /tmp mount (in addition to the above writable dir)
PrivateTmp=yes

# allocate a private /dev mount
PrivateDevices=yes

# only allow the process to open sockets of the specified address families (e.g. so it does not suddenly open a pcap socket)
RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX AF_NETLINK


# for processes that don't care about storage (e.g. haveged) this is a very useful option that runs the service under a
# dynamically allocated, transient UID; unfortunately IPFS needs access to its home directory for storage,
# so I can't use it here
# DynamicUser=yes

# disallow real-time scheduling for the process
RestrictRealtime=yes
CPUSchedulingPolicy=batch

MountAPIVFS=yes
LockPersonality=yes
ProtectClock=yes
MemoryDenyWriteExecute=yes
ProtectKernelLogs=yes
ProtectControlGroups=yes
ProtectKernelModules=yes
ProtectKernelTunables=yes
ProtectHostname=yes
RestrictNamespaces=yes
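
The DynamicUser= comment above applies to daemons without storage needs. For a daemon that needs only a small amount of private state, systemd.exec also documents StateDirectory=, which can be combined with DynamicUser=. The sketch below uses a hypothetical daemon named mydaemon; the name and binary path are illustrative, and whether IPFS would work in such a layout is not something this article tests.

```ini
[Service]
# hypothetical binary, for illustration only
ExecStart=/usr/bin/mydaemon
# allocate a transient UID/GID for the lifetime of the service
DynamicUser=yes
# have systemd create /var/lib/mydaemon and hand it to the dynamic user
StateDirectory=mydaemon
```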

Now, after installing such a service, we can use another cool utility: systemd-analyze security. When run without any further parameters, it displays a security summary of all installed services, where a lower exposure score means tighter confinement (I trimmed the output a bit to show just the relevant ones):

# systemd-analyze security
UNIT                                 EXPOSURE PREDICATE HAPPY
dbus.service                              9.6 UNSAFE    :-{  
emergency.service                         9.5 UNSAFE    :-{  
getty@tty1.service                        9.6 UNSAFE    :-{  
haveged.service                           3.1 OK        :-)  
ipfs.service                              2.0 OK        :-)  
shadow.service                            9.6 UNSAFE    :-{  
sshd.service                              9.6 UNSAFE    :-{  
systemd-journald.service                  4.4 OK        :-)  
systemd-resolved.service                  2.2 OK        :-)  
systemd-timesyncd.service                 2.1 OK        :-)  
systemd-udevd.service                     6.9 MEDIUM    :-|  

The reasons why so many of them are "unsafe" (read: have few confinement options applied) vary. For some, the package authors may simply not have applied them yet. Others, such as getty (text console) or sshd, can't really be confined without preventing the administrators who use them from running basic commands such as sudo.

To harden a specific service further, you can run systemd-analyze security with the unit name, as shown below (output trimmed for brevity). Based on the results, specifically the lines with - in front, you can tighten the service further, making sure it still works each time you enable a new control.

# systemd-analyze security ipfs.service
  NAME                                                DESCRIPTION                                                   EXPOSURE
- PrivateNetwork=                                     Service has access to the host's network                           0.5
+ User=/DynamicUser=                                  Service runs under a static non-root user identity                    
+ CapabilityBoundingSet=~CAP_SET(UID|GID|PCAP)        Service cannot change UID/GID identities/capabilities                 
+ CapabilityBoundingSet=~CAP_SYS_ADMIN                Service has no administrator privileges                               
+ CapabilityBoundingSet=~CAP_SYS_PTRACE               Service has no ptrace() debugging abilities                           
- RestrictAddressFamilies=~AF_(INET|INET6)            Service may allocate Internet sockets                              0.3
+ RestrictNamespaces=~CLONE_NEWUSER                   Service cannot create user namespaces                                 
+ CapabilityBoundingSet=~CAP_(CHOWN|FSETID|SETFCAP)   Service cannot change file ownership/access mode/capabilities         
- CapabilityBoundingSet=~CAP_(DAC_*|FOWNER|IPC_OWNER) Service may override UNIX file/IPC permission checks               0.2
+ CapabilityBoundingSet=~CAP_NET_ADMIN                Service has no network configuration privileges                       
+ CapabilityBoundingSet=~CAP_RAWIO                    Service has no raw I/O access                                         
+ CapabilityBoundingSet=~CAP_SYS_MODULE               Service cannot load kernel modules                                    
+ CapabilityBoundingSet=~CAP_SYS_TIME                 Service processes cannot change the system clock                      
- DeviceAllow=                                        Service has a device ACL with some special devices                 0.1
- IPAddressDeny=                                      Service does not define an IP address whitelist                    0.2
+ KeyringMode=                                        Service doesn't share key material with other services                
+ NoNewPrivileges=                                    Service processes cannot acquire new privileges                       
…
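
Each line marked with - maps to a directive that can be tried next. As one example, the IPAddressDeny= finding could be addressed for a daemon that only needs to talk to the local host and a single private network. The sketch below assumes such a hypothetical daemon (the 10.0.0.0/8 range is illustrative); it would not suit IPFS, which talks to arbitrary peers. These directives are documented in systemd.resource-control and depend on kernel eBPF support for control groups.

```ini
[Service]
# deny all IP traffic by default...
IPAddressDeny=any
# ...then allow loopback and one illustrative private range
IPAddressAllow=localhost
IPAddressAllow=10.0.0.0/8
```

Rerunning systemd-analyze security on the unit afterwards shows whether the exposure score moved.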
