Systemd kills Ossec service immediately after start

http://unix.stackexchange.com/questions/200280/systemd-kills-service-immediately-after-start

Question:

I’m writing a systemd unit file for OSSEC HIDS. The problem is that when systemd starts the service, it immediately stops it.

When I use this ExecStart directive, everything works fine.

ExecStart=/var/ossec/bin/ossec-control start

But when I make a small change, I find in the OSSEC logs that it receives SIG 15 (SIGTERM) right after start.

ExecStart=/bin/sh -c '${DIRECTORY}/bin/ossec-control start'

If I make another small change, the service receives SIG 15 after 20 seconds.

ExecStart=/bin/sh -c '${DIRECTORY}/bin/ossec-control start && sleep 20'

So I guess that systemd kills the /bin/sh process after the service starts, and /bin/sh then kills OSSEC.

How can I solve this problem?

Answer:

readiness protocol mismatch

As Wieland implied, the Type of the service is important. That setting denotes what readiness protocol systemd expects the service to speak. A simple service is assumed to be immediately ready. A forking service is taken to be ready after its initial process forks a child and then exits. A dbus service is taken to be ready when a server appears on the Desktop Bus. And so forth.

If you don’t get the readiness protocol declared in the service unit to match what the service does, then things go awry. Readiness protocol mismatches cause services not to start correctly, or (more usually) to be (mis-)diagnosed by systemd as failing. When a service is seen as failing to start, systemd ensures that every orphaned additional process of the service that might have been left running as part of the failure (from its point of view) is killed, in order to bring the service properly back to the inactive state.

You’re doing exactly this.

First of all, the simple stuff: sh -c doesn’t match Type=simple or Type=forking.

In the simple protocol, the initial process is taken to be the service process. But in fact a sh -c wrapper runs the actual service program as a child process. So MAINPID goes wrong and ExecReload stops working, for starters. When using Type=simple, one must either use sh -c 'exec …' or not use sh -c in the first place. The latter is more often the correct course than some people think.
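The difference that exec makes can be checked outside systemd. The following is a minimal sketch, not part of the original answer: it prints the PID of a wrapper shell and then the PID of the program the wrapper runs, once without exec and once with it. The inner sh -c stands in for the real service program, and the trailing ":" is only there because some shells implicitly exec the last command of a sh -c string, which would hide the effect.

```shell
#!/bin/sh
# Wrapper WITHOUT exec: the inner shell runs as a child of the wrapper,
# so the two PIDs printed are different. The trailing ":" keeps the
# wrapper from implicitly exec'ing its last command.
without_exec=$(sh -c 'echo $$; sh -c "echo \$\$"; :')

# Wrapper WITH exec: the inner shell replaces the wrapper process,
# so it keeps the PID that systemd would have recorded as the main PID.
with_exec=$(sh -c 'echo $$; exec sh -c "echo \$\$"')

printf 'without exec (wrapper PID, then program PID):\n%s\n' "$without_exec"
printf 'with exec (wrapper PID, then program PID):\n%s\n' "$with_exec"
```

With exec, both lines show the same PID, which is the property Type=simple relies on.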

sh -c doesn’t match Type=forking either. The readiness protocol for a forking service is quite specific. The initial process has to fork a child, and then exit. systemd applies a timeout to this protocol. If the initial process doesn’t fork within the allotted time, it’s a failure to become ready. If the initial process doesn’t exit within the allotted time, that too is a failure.

the unnecessary horror that is ossec-control

Which brings us to the complex stuff: that ossec-control script.

It turns out that it’s a System 5 rc script that forks off between 4 and 10 processes, which themselves in their turn fork and exit too. It’s one of those System 5 rc scripts that attempts to manage a whole set of server processes in one single script, with for loops, race conditions, arbitrary sleeps to try to avoid them, failure modes that can choke the system in a half-started state, and all of the other horrors that got people inventing things like the AIX System Resource Controller and daemontools two decades ago. And let’s not forget the hidden shell script in a binary directory that it rewrites on the fly, to implement idiosyncratic enable and disable verbs.

So when you run /bin/sh -c '/var/ossec/bin/ossec-control start', what happens is that:

  1. systemd forks what it expects to be the service process.
  2. That’s the shell, which forks ossec-control.
  3. That in turn forks between 4 and 10 grandchildren.
  4. The grandchildren all fork and exit in turn.
  5. The great-grandchildren all fork and exit in parallel.
  6. ossec-control exits.
  7. The first shell exits.
  8. The service processes were the great-great-grandchildren, but because this way of working matches neither the forking nor the simple readiness protocol, systemd considers the service as a whole to have failed and shuts it back down.
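The effect of steps 6 to 8 can be imitated in a plain shell. This is a rough sketch under stated assumptions: sleep 30 is a hypothetical stand-in for one of the OSSEC daemons, and the final kill only imitates the cleanup that systemd performs; the real behaviour depends on the unit's settings.

```shell
#!/bin/sh
# Stand-in for "sh -c 'ossec-control start'": launch a background
# process, print its PID, and return immediately.
daemon_pid=$(sh -c 'sleep 30 >/dev/null 2>&1 & echo $!')

# The wrapper shell has already exited here, yet the background
# process is still running. kill -0 only tests for existence.
if kill -0 "$daemon_pid" 2>/dev/null; then
    survivor=yes
else
    survivor=no
fi
echo "wrapper exited, background process still alive: $survivor"

# Imitation of the cleanup: the leftover process receives SIGTERM,
# which is the signal 15 seen in the OSSEC logs.
kill -TERM "$daemon_pid"
```

From systemd's point of view the process it started is gone, so whatever is left behind is treated as debris from a service that is no longer running.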

None of this horror is actually necessary under systemd at all. None of it.

a systemd template service unit

Instead, one writes a very simple template unit:

[Unit]
Description=The OSSEC HIDS %i server
After=network.target 

[Service]
Type=simple
ExecStartPre=/usr/bin/env /var/ossec/bin/%p-%i -t
ExecStart=/usr/bin/env /var/ossec/bin/%p-%i -f

[Install]
WantedBy=multi-user.target

Save this as /etc/systemd/system/ossec@.service.

The various actual services are instantiations of this template, named:

  • ossec@dbd.service
  • ossec@agentlessd.service
  • ossec@csyslogd.service
  • ossec@execd.service
  • ossec@agentd.service
  • ossec@logcollector.service
  • ossec@syscheckd.service
  • ossec@maild.service
  • ossec@analysisd.service
  • ossec@remoted.service
  • ossec@monitord.service
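To see which program each instance runs, recall that %p expands to the part of the unit name before the "@" and %i to the instance name after it. The following sketch mimics that expansion with shell parameter expansion for one instance; the variable names are illustrative only, and systemd of course does this itself.

```shell
#!/bin/sh
unit="ossec@dbd.service"

name=${unit%.service}   # strip the unit type suffix -> ossec@dbd
prefix=${name%%@*}      # what %p expands to         -> ossec
instance=${name#*@}     # what %i expands to         -> dbd

binary="/var/ossec/bin/$prefix-$instance"

# The two command lines from the template, as they would look for this instance.
echo "ExecStartPre=/usr/bin/env $binary -t"
echo "ExecStart=/usr/bin/env $binary -f"
```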

The enable and disable functionality then comes straight from the service management system (with RedHat bug 752774 fixed), with no need for hidden shell scripts.

 systemctl enable ossec@dbd ossec@agentlessd ossec@csyslogd ossec@maild ossec@execd ossec@analysisd ossec@logcollector ossec@remoted ossec@syscheckd ossec@monitord
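If the set of daemons varies between machines, the same command line can be built from a list instead of being typed out. A small sketch, assuming the ten daemon names used in the command above; it only prints the command as a dry run, and dropping the echo would actually run it (as root).

```shell
#!/bin/sh
# Daemon names taken from the enable command above.
daemons="dbd agentlessd csyslogd maild execd analysisd logcollector remoted syscheckd monitord"

# Turn each daemon name into an instance of the ossec@.service template.
units=""
for d in $daemons; do
    units="$units ossec@$d.service"
done
units=${units# }   # drop the leading space

# Dry run: show the command rather than executing it.
echo systemctl enable $units
```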

Moreover, systemd gets to know about, and to track, each actual service directly. It can filter their logs with journalctl -u. It can know when an individual service has failed. It knows what services are supposed to be enabled and running.

By the way: Type=simple and the -f option are as right here as they are in many other cases. Very few services in the wild actually signal their readiness by dint of the exit, and these here are not such cases either. But that’s what the forking type means. Services in the wild in the main just fork and exit because of some mistaken received wisdom notion that that’s what dæmons are supposed to do. In fact, it’s not. It hasn’t been since the 1990s. It’s time to catch up.