Checking for critical infrastructure failures with Wazuh

One of my favourite features of Wazuh is command monitoring, which, combined with rules, makes it possible to build sophisticated sanity checks for critical infrastructure services.

Wazuh is a feature-rich open-source host log analyser and intrusion detection system. In short, it reads log files and applies rules to detect events of interest. Instead of files, however, Wazuh can also read the output of scripts and commands. For example:

<localfile>
    <log_format>command</log_format>
    <command>curl -s6I --compressed --tcp-fastopen https://krvtz.net/ | head -1</command>
    <alias>curl krvtz.net</alias>
    <frequency>120</frequency>
</localfile>

This command is expected to return just one output under normal circumstances: HTTP/2 200. In Wazuh logs it will appear as ossec: output: 'curl krvtz.net' ... which is what we're matching against. Usually in Wazuh rules we specify patterns of the attacks, errors and failures we want to detect. In this case, however, the number of bad outputs is huge (TCP timeouts, DNS resolution errors, HTTP errors) but there's just one good output. This is why we reverse the logic:

<group name="curl,">

    <rule id="100080" level="12">
        <if_sid>530</if_sid>
        <match>^ossec: output: 'curl</match>
        <description>Curl connection check</description>
    </rule>

    <rule id="100082" level="0">
        <if_sid>100080</if_sid>
        <match>HTTP/2 200</match>
        <description>Curl check: OK</description>
    </rule>

</group>

With these rules, any output from curl first matches the level 12 critical rule, but then the child rule matches the expected good output and masks the alert at level 0, keeping silent as long as everything is OK.
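One caveat worth sketching here: the rules can only fire when the command prints at least one line. With -s alone, curl stays silent when the connection itself fails, so a timeout or DNS error may leave nothing for rule 100080 to match. The variant below is a sketch under that assumption (that an empty command output produces no event, and that only standard output is collected), not a tested configuration. It uses curl's -S option to bring error messages back and its stderr option with "-" as the target to send them to standard output. The 30 second limit set with the max-time option is an arbitrary value picked for illustration.

```xml
<localfile>
    <log_format>command</log_format>
    <!-- -S shows error messages despite -s; the stderr option with "-" writes them to stdout -->
    <!-- the max-time option bounds the whole request so a hung connection cannot stall the check -->
    <command>curl -sS --stderr - -6I --compressed --tcp-fastopen --max-time 30 https://krvtz.net/ | head -1</command>
    <alias>curl krvtz.net</alias>
    <frequency>120</frequency>
</localfile>
```

The alias is unchanged, so rules 100080 and 100082 apply as they are: a curl error line is just another output that is not HTTP/2 200.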

I'm applying the same logic to DNSSEC validation of my domains using BIND's delv utility:

<localfile>
    <log_format>command</log_format>
    <command>delv krvtz.net +cd | head -1</command>
    <alias>delv</alias>
    <frequency>3600</frequency>
</localfile>

Same as above, we expect just one good output (fully validated) and anything else will result in a critical alert:

<group name="dnssec,">

    <rule id="100090" level="12">
        <if_sid>530</if_sid>
        <match>^ossec: output: 'delv'</match>
        <description>DNSSEC validation check</description>
    </rule>

    <rule id="100091" level="0">
        <if_sid>100090</if_sid>
        <match>fully validated</match>
        <description>DNSSEC validated</description>
    </rule>

</group>
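The same parent and child structure can also give a known failure mode its own description, which makes the alert easier to read than the generic parent. As a sketch only: delv is assumed here to print "unsigned answer" on its first line when the zone is treated as insecure, so check that string against your delv version. The rule ID 100092 is a hypothetical free ID, and the rule would be placed inside the dnssec group above.

```xml
<!-- Hypothetical sibling of rule 100091: same severity as the parent, clearer description -->
<rule id="100092" level="12">
    <if_sid>100090</if_sid>
    <match>unsigned answer</match>
    <description>DNSSEC check: answer is unsigned</description>
</rule>
```

Any other output still falls through to the parent rule 100090, so the check stays fail-closed.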

Wazuh is a feature-rich but complicated beast and its rule syntax is far from intuitive, to put it lightly. The Wazuh documentation is your friend, but feel free to contact me if you're looking for someone with experience in planning, configuring and maintaining very large Wazuh deployments on hundreds of servers.

One more thing: there are many systems designed for service monitoring that allow writing arbitrary tests (e.g. Sensu), so why even consider something that was initially written for intrusion detection only? Wazuh has one significant advantage: it is very lightweight and blazingly fast.

I'm on Mastodon and Twitter, feel free to comment!