Zabbix Issues
Solving the alert: Zabbix unreachable poller processes more than 75% busy https://www.zabbix.com/forum/zabbix-troubleshooting-and-problems/400962-solving-the-alert-zabbix-unreachable-poller-processes-more-than-75-busy
11-05-2020, 14:30

Hello,

For a long time we were plagued by the alert "Zabbix unreachable poller processes more than 75% busy". There is a lot of information about this message on the net, but none of it really helped me. My main problem was finding out what exactly the unreachable pollers were doing. So I thought I'd share what I've discovered, even if it might not be 100% correct. I am still a Zabbix newbie, so feel free to correct me where necessary or to suggest a better methodology.

STEP 1: Cleaning up unreachable items
Go to Configuration > Hosts and click on any host's 'Items' link. Open the filter and reset every field to empty, 'all' or its default. IMPORTANT: this includes the 'Host' field, which clicking the link has just filled in. Change State from "all" to "Not supported"; this causes Status to change to "Enabled". Searching now produces a report of all items that cannot be polled. Unfortunately, it also includes items from disabled hosts. I disabled every item that had no chance of becoming available.

STEP 2: Cleaning up unreachable hosts
Go to Configuration > Hosts again and look at the 'Availability' column, with its red/green indicators for ZBX|SNMP|JMX|IPMI. Every red indicator takes up capacity from an unreachable poller. Again, I disabled every host that would never come up again.

STEP 3: Finding out what the unreachable pollers are doing
This is what led me to discover step 2. Open a Linux terminal and run something like:
ps axu | grep -i unreachable
Note the unreachable pollers that are slow; e.g. some of mine reported 1 item in 60 seconds. Note the PID of each slow poller (the individual poller process, not the main zabbix_server process). Use strace to find out what that process is doing, e.g.:
strace -p 1234
I got some I/O on an IP address (bingo) and a select on fd 0 with a timeout of 30 seconds. To see what an fd number refers to, do something like the following (here for PID 1234 and fd 0):
ls -hal /proc/1234/fd/0
You can now see which file/socket/... is causing the slowdown. This also yielded an interesting fact:
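For step 1, the same list of "Not supported" items can also be pulled from the Zabbix API with the `item.get` method instead of the frontend filter. This is a minimal sketch, not the poster's method. It assumes two hypothetical environment variables: ZABBIX_URL, pointing at the frontend's api_jsonrpc.php, and ZABBIX_AUTH, holding a valid session ID or API token. It also assumes a server version that still accepts the `auth` field in the request body; newer versions expect an `Authorization: Bearer` header instead. The `monitored: true` parameter limits the result to enabled items on monitored hosts, which should filter out the disabled-host items mentioned in step 1.

```shell
#!/usr/bin/env bash
# Sketch: list enabled, "Not supported" items on monitored hosts via item.get.
# Assumptions (not from the original post): ZABBIX_URL and ZABBIX_AUTH are
# hypothetical variables set by the caller, and the server accepts "auth".

# Print the JSON-RPC request body; $1 is the session ID / API token.
unsupported_items_request() {
  local auth="$1"
  # state "1" = not supported; monitored=true = enabled items on monitored hosts
  printf '{"jsonrpc":"2.0","method":"item.get","params":{"output":["itemid","name","key_","error"],"selectHosts":["host"],"monitored":true,"filter":{"state":"1"}},"auth":"%s","id":1}\n' "$auth"
}

# POST the request to the API endpoint and print the raw JSON response.
query_unsupported_items() {
  : "${ZABBIX_URL:?set ZABBIX_URL to .../api_jsonrpc.php}"
  : "${ZABBIX_AUTH:?set ZABBIX_AUTH to a session ID or API token}"
  unsupported_items_request "$ZABBIX_AUTH" |
    curl -s -X POST -H 'Content-Type: application/json-rpc' --data @- "$ZABBIX_URL"
}
```

If jq is installed, `query_unsupported_items | jq -r '.result[] | [.hosts[0].host, .key_, .error] | @tsv'` prints one host, key and error message per line. The error column usually shows why an item cannot be polled.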
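The first part of step 3, spotting the slow unreachable pollers in the process list, can be automated with a small filter. This sketch assumes the process title format of Zabbix 4.x/5.0 pollers, e.g. `unreachable poller #1 [got 1 values in 60.012345 sec, idle 5 sec]`. It is an assumption that other versions use the same wording. The function name and the default threshold of 5 seconds per value are illustrative choices.

```shell
#!/usr/bin/env bash
# Sketch: read "PID ARGS" lines on stdin and print unreachable pollers whose
# last cycle took at least $1 seconds per collected value (default 5).
# Assumes titles like "unreachable poller #1 [got N values in S sec, ...]".
slow_unreachable_pollers() {
  local threshold="${1:-5}"
  awk -v t="$threshold" '
    /unreachable poller/ && match($0, /got [0-9]+ values in [0-9.]+ sec/) {
      # Split "got N values in S sec": f[2] = N, f[5] = S
      split(substr($0, RSTART, RLENGTH), f, " ")
      n = f[2] + 0
      s = f[5] + 0
      per = (n > 0) ? s / n : s
      if (per >= t + 0)
        printf "%s\t%d values in %.2f sec\n", $1, n, s
    }'
}
```

Usage: `ps -eo pid=,args= | slow_unreachable_pollers 10`. Each PID it prints can then be passed to `strace -p` as described in step 3. Reading the plain `ps` output gives the poller's own PID, not the PID of the main zabbix_server process.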
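The `ls -hal /proc/1234/fd/0` check in step 3 looks at a single descriptor. A small loop can show every descriptor of a poller at once. This is a Linux-only sketch that reads /proc directly, and the function name is hypothetical. Inspecting another user's process normally requires running it as root or as the zabbix user.

```shell
#!/usr/bin/env bash
# Sketch: print each open fd of process $1 with its target, e.g.
# "0 -> /dev/null" or "7 -> socket:[123456]". Linux-only (reads /proc).
describe_fds() {
  local pid="$1" fd
  for fd in /proc/"$pid"/fd/*; do
    # Entries in /proc/PID/fd are symlinks; skip the literal glob when the
    # directory is unreadable or the process does not exist.
    [ -L "$fd" ] || continue
    printf '%s -> %s\n' "${fd##*/}" "$(readlink "$fd")"
  done
}
```

Usage: `describe_fds 1234`. For entries shown as `socket:[...]`, `ss -tnp | grep 'pid=1234,'` (run as root) lists that process's TCP connections with their remote addresses. This is one way to find the IP address the poller is waiting on.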