Customized watchdog for Linux daemons

I have a Linux production server running a number of background (daemon) processes that I want to monitor. Unfortunately, some of these daemons are third-party processes, so I have no access to their source. Many of them listen on their own FIFO file for control events, and they require graceful shutdowns to ensure overall system integrity, so kill -9 <pid> is not an option.
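For readers unfamiliar with the FIFO control technique, here is a minimal sketch of how a watchdog might ask such a daemon to shut down gracefully instead of killing it. The function name, the FIFO path and the command text are all assumptions for illustration; each third-party daemon defines its own control protocol, and this is not taken from any of them.

```cpp
#include <string>
#include <fcntl.h>
#include <unistd.h>

// Writes one control command to a daemon's FIFO without blocking.
// Returns true only if the whole command line was written.
// NOTE: hypothetical helper; the command text is whatever the daemon expects.
bool send_fifo_command(const std::string& fifo_path, const std::string& command)
{
    // With O_NONBLOCK, open() on a FIFO fails (ENXIO) instead of hanging
    // when no daemon currently has the FIFO open for reading.
    int fd = ::open(fifo_path.c_str(), O_WRONLY | O_NONBLOCK);
    if (fd < 0)
        return false;

    const std::string line = command + "\n";
    const ssize_t written = ::write(fd, line.data(), line.size());
    ::close(fd);
    return written == static_cast<ssize_t>(line.size());
}
```

A caller would typically send the command, wait for the process to exit, and only escalate if the daemon does not respond within a timeout it considers safe.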

Although this environment is very stable, I've had issues when adding new functionality or processes (or when the other environments it interacts with change) that caused one or two of these daemons to crash, which then cascaded into issues with other related daemons. I therefore needed a way to monitor these processes with a watchdog, but I had my fingers burnt quite badly after trying daemon tools. Had it not been for the hard drive clone I made, my fingers would have been burnt right off :)

After searching the web for issues relating to daemon tools, I came across the following link and think I may have fallen victim to this in a similar way:

https://www.reddit.com/r/software/comments/3zveee/stay_away_from_daemon_tools/

Have any of you experienced this and, if so, what are the best options to take? So far I have started writing my own watchdog utility, which seems to work fine on a dev machine, but I would like some feedback/input from this community before I dare put it live. Is there a preferred place to upload the source code for this tiny watchdog utility so that other members of this site can scrutinize/assess it?

This utility also seems to overcome the double-forking issues that other daemon handlers don't seem to deal with, but I could just be naive while testing my own utility.
For anyone interested, here is the link to the source of the wdog utility: https://github.com/zepher999/wdog

or at:
https://sourceforge.net/p/wdog/code/ci/master/tree/

Although more work can be done on this utility, such as incorporating a retry interval and retry limit per process and the ability to start a specific process individually, the utility currently allows an instance of itself to be started to watch the watchdog, as well as stopping/killing specific processes.

The current usage is:
usage: wdog  <-000_setup>
       wdog  <-000_run>  [-cold]=def or [-hot]
       wdog  <watch_list_file>  [watch_list_file_othr]  [-cold] or [-hot]=def

       wdog  <-stop_svc>  [!] or [*] or [process_name] or [pid=x]
       wdog  <-kill>      [!] or [*] or [process_name] or [pid=x]

       where  <watch_list_file> is path of file with list of processes to be watched
          and  [watch_list_file_othr] is path to other wdog's <watch_list_file>
          and  <-000_setup> is switch for creating 000 setup
          and  <-000_run> is switch for running wdog as 000 instance
          and  [-cold] is switch for starting wdog in cold mode
          and  [-hot] is switch for starting wdog in hot mode
               cold mode => remove tmp data so as NOT to use it.
               hot mode => keep tmp data so as to use it.

          and where  [x] == process id
                and  [*] == all processes
                and  [!] == all processes including wdog processes.
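To make the target syntax in the usage text concrete, here is a small sketch of how the argument after -stop_svc or -kill could be classified into the four forms listed ([!], [*], [pid=x], [process_name]). The type and function names are hypothetical and not taken from the wdog source; treating a malformed pid=... value as a process name is also just an assumption.

```cpp
#include <cctype>
#include <string>

// The kinds of target accepted by -stop_svc and -kill in the usage text.
enum class TargetKind { AllIncludingWdog, All, Pid, Name };

struct Target {
    TargetKind kind;
    long pid;          // meaningful only when kind == Pid
    std::string name;  // meaningful only when kind == Name
};

// Hypothetical helper: classifies one command-line target argument.
Target parse_target(const std::string& arg)
{
    if (arg == "!") return {TargetKind::AllIncludingWdog, 0, ""};
    if (arg == "*") return {TargetKind::All, 0, ""};

    const std::string prefix = "pid=";
    if (arg.size() > prefix.size() && arg.compare(0, prefix.size(), prefix) == 0) {
        const std::string digits = arg.substr(prefix.size());
        bool all_digits = true;
        for (unsigned char c : digits) {
            if (!std::isdigit(c)) { all_digits = false; break; }
        }
        // Limit the length so std::stol cannot overflow.
        if (all_digits && digits.size() <= 9)
            return {TargetKind::Pid, std::stol(digits), ""};
    }
    // Anything else is treated as a process name (an assumption of this sketch).
    return {TargetKind::Name, 0, arg};
}
```

Note that in most shells `*` and `!` would need quoting on the command line, otherwise the shell may expand them before the program sees them.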
 

I made a few changes to this utility to incorporate a start_svc option, which can be used to start a specific process if it has been stopped/killed. I also added fields to the [csv] file format: a retry interval (the length of time in milliseconds between attempts to start the process), a retry limit (the maximum number of times to attempt a restart of the process) and a run flag (1 ==> process should be run, 0 ==> don't run process).
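As a rough illustration of how the retry interval, retry limit and run flag could interact, here is a minimal sketch of the decision a watchdog loop might make for one dead process. All names are hypothetical, and details such as allowing the first restart immediately (or when the attempt counter is reset) are assumptions of this sketch rather than a description of what wdog actually does.

```cpp
#include <chrono>

using Clock = std::chrono::steady_clock;

struct RetryPolicy {
    std::chrono::milliseconds interval; // wait between restart attempts
    int limit;                          // maximum number of restart attempts
    bool run;                           // false => never start this process
};

struct RetryState {
    int attempts = 0;                   // restart attempts made so far
    Clock::time_point last_attempt{};   // time of the most recent attempt
};

// Returns true if a dead process may be restarted at time `now`.
bool should_restart_now(const RetryPolicy& p, const RetryState& s, Clock::time_point now)
{
    if (!p.run) return false;                 // run flag is 0
    if (s.attempts >= p.limit) return false;  // retry limit exhausted
    if (s.attempts == 0) return true;         // first attempt: no waiting (assumption)
    return now - s.last_attempt >= p.interval;
}

// Call after each restart attempt, whether or not it succeeded.
void record_attempt(RetryState& s, Clock::time_point now)
{
    ++s.attempts;
    s.last_attempt = now;
}
```

Using a steady clock rather than wall-clock time keeps the interval check unaffected by system time adjustments.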

Here is my [wdog_process.csv] file as an example:


--Retry Intrvl, Retry Lmt, Run, Image Path, Arguments
------------------------------------------------------------------
3000, 10, 1, /home/zephram/Projects/g++/Systems/LabProcesing/lab_processing_wsvc
4000, 3, 1, /home/zephrami/Projects/g++/Games/hman_wsvc_daemon/hman_wsvc
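For anyone wanting to read a file in this format from their own code, here is a minimal sketch of a line parser. It assumes the layout shown in the example: lines starting with "--" are headers/separators, the first four comma-separated fields are required, and everything after the fourth comma is the optional arguments field. The struct and function names are hypothetical and not taken from the wdog source.

```cpp
#include <exception>
#include <string>
#include <vector>

struct WatchEntry {
    long retry_interval_ms;
    int retry_limit;
    bool run;
    std::string image_path;
    std::string arguments;  // empty when the line has no fifth field
};

static std::string trim(const std::string& s)
{
    const char* ws = " \t\r\n";
    const auto b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    const auto e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Parses one line; returns false for blank lines, "--" lines and malformed lines.
bool parse_watch_line(const std::string& line, WatchEntry& out)
{
    const std::string t = trim(line);
    if (t.empty() || t.compare(0, 2, "--") == 0) return false;

    // Split on the first four commas only, so arguments may contain commas.
    std::vector<std::string> fields;
    std::string::size_type pos = 0;
    while (fields.size() < 4) {
        const auto comma = t.find(',', pos);
        if (comma == std::string::npos) break;
        fields.push_back(trim(t.substr(pos, comma - pos)));
        pos = comma + 1;
    }
    fields.push_back(trim(t.substr(pos)));  // image path or arguments
    if (fields.size() < 4 || fields[3].empty()) return false;

    try {
        out.retry_interval_ms = std::stol(fields[0]);
        out.retry_limit = std::stoi(fields[1]);
        out.run = std::stoi(fields[2]) != 0;
    } catch (const std::exception&) {
        return false;  // a numeric field was not a number
    }
    out.image_path = fields[3];
    out.arguments = fields.size() > 4 ? fields[4] : std::string();
    return true;
}
```

A caller would read the file line by line with std::getline and keep only the entries for which the function returns true.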


The usage of the utility now looks like:

usage: wdog  <-000_setup>
       wdog  <-000_run>  [-cold]=def or [-hot]
       wdog  <watch_list_file>  [watch_list_file_othr]  [-cold] or [-hot]=def

       wdog  <-start_svc> [!] or [*] or [process_name]
       wdog  <-stop_svc>  [!] or [*] or [process_name] or [pid=x]
       wdog  <-kill>      [!] or [*] or [process_name] or [pid=x]

       where  <watch_list_file> is path of file with list of processes to be watched
          and  [watch_list_file_othr] is path to other wdog's <watch_list_file>
          and  <-000_setup> is switch for creating 000 setup
          and  <-000_run> is switch for running wdog as 000 instance
          and  [-cold] is switch for starting wdog in cold mode
          and  [-hot] is switch for starting wdog in hot mode
               cold mode => remove tmp data so as NOT to use it.
               hot mode => keep tmp data so as to use it.

          and where  [x] == process id
                and  [*] == all processes
                and  [!] == all processes including wdog processes. 