
Remove PID files

Description

If we rely on systemd to manage daemons, we no longer need PID files.

This should be done for every daemon that is not a "forking" systemd service (i.e. all except asterisk, wazo-confgend and wazo-provd).

To remove PID files, we also need to move the restart logic from monit to systemd.

An example of a service file

Some questions remain:

  • Should we increase the delay between restarts? (default: RestartSec=100ms)

    • Yes, if we have this use case: when a service fails to restart because it is waiting for another service, we should add a reasonable delay between retries

    • The old behavior (with monit) was to retry the restart every 2 minutes

    • A nice delay would be 4-5 seconds

  • Should we decrease or change StartLimitIntervalSec or StartLimitBurst? Manual restarts also count toward the limit, which can be painful during development (i.e. we have to use systemctl reset-failed to unlock the service)

    • We can multiply these values by a number of cycles, so that restart loops are still detected but development is not hindered

      • e.g. instead of StartLimitIntervalSec=3m and StartLimitBurst=5,
        we can allow 3 cycles with StartLimitIntervalSec=9m and StartLimitBurst=15

To calculate StartLimitIntervalSec:

  • StartLimitBurst * (RestartSec + <service init time>) = 5 * (5s + 5s) = 50s

If we multiply by 3 cycles to avoid the manual restart issue, we get the following values:

  • StartLimitBurst=15

  • StartLimitIntervalSec=150

  • RestartSec=5
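
The calculation above can be sketched as a small Python script that also renders the resulting settings as a systemd drop-in fragment. This is only an illustration: the function names are hypothetical, the 5s service init time is the estimate used above, and Restart=on-failure is an assumed restart policy that this issue does not specify.

```python
def start_limit_interval(burst: int, restart_sec: int, init_sec: int) -> int:
    """Seconds needed for `burst` consecutive failed starts.

    Implements: StartLimitBurst * (RestartSec + <service init time>).
    """
    return burst * (restart_sec + init_sec)


def render_drop_in(burst: int, restart_sec: int, init_sec: int, cycles: int = 1) -> str:
    """Render the restart settings as systemd unit-file text.

    `cycles` multiplies the burst (and therefore the interval) so that a few
    manual restarts during development do not trip the start limit.
    """
    total_burst = burst * cycles
    interval = start_limit_interval(total_burst, restart_sec, init_sec)
    return "\n".join([
        "[Unit]",
        f"StartLimitIntervalSec={interval}",
        f"StartLimitBurst={total_burst}",
        "",
        "[Service]",
        # Assumption: restart only on failure; the issue does not name a policy.
        "Restart=on-failure",
        f"RestartSec={restart_sec}",
    ])


if __name__ == "__main__":
    # One cycle gives 5 * (5 + 5) = 50 seconds; three cycles give 15 starts in 150 seconds.
    print(render_drop_in(burst=5, restart_sec=5, init_sec=5, cycles=3))
```

Note that StartLimitIntervalSec and StartLimitBurst belong to the [Unit] section, while Restart and RestartSec belong to [Service].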

NOTE:

  • services using twistd still use a PID file (i.e. wazo-provd and wazo-confgend)

  • services using celery still use a PID file (i.e. wazo-webhookd)

    • IMO (fblackburn): it should not be up to the daemon to start subprocesses; it should be up to the admin to scale processes as desired. The PID file would then be removed in this scenario

  • services using a PID file for scripts still use a PID file (i.e. wazo-call-logs, xivo-stat, wazo-purge-db)

    • needed to avoid running the same command twice

  • services using custom PID logic still use a PID file (i.e. xivo-dxtora)

Environment

None

Activity

Sébastien Duthil
June 26, 2020, 6:28 PM

We could also configure monit to monitor the systemd service status instead of the PID file. I see no easy way to achieve the same behavior as monit (restart at most 5 times) with systemd.

François Blackburn
June 30, 2020, 2:11 PM

I confirm that once StartLimitBurst is hit and the service enters the failed state, systemd will not try to restart it after StartLimitIntervalSec has elapsed.

Assignee

François Blackburn

Reporter

Sébastien Duthil

Labels

Approvers

None

Pair

None

Sprint

None

Fix versions

Epic Link

Priority

Medium