About 1/3 of agents report "error (timeout)" on Detect schedules since upgrading server and agents to 10.1
Since upgrading our K1000 server and agents to 10.1, about a third of our clients fail their usual Detect schedules with "error (timeout)". It isn't exactly the same set of clients each time, but the overlap is about 95%. I've seen a few of the "bad clients" fail, succeed on the next run, and then fail again. I'm not great at reading the KAgent logs, but it looks like the client starts detecting when the Detect kicks off and then never finishes.
I've tried these steps:
Uninstalling AV
Forcing Server Retrust via KAT
Uninstalling and reinstalling 10.1 agent
Manually deleting Patches folder
Running a Detect on just 10 of the bad clients, timeout set at 8 hours
They all still fail, usually with "timeout".
I talked with Support, who looked over our K1000 settings. The last advice I got from them was "upgrade Win10 version and retry" (most clients, including the hundreds that succeed, are on 1803 and will soon move to 1909). However, this had no effect: one of the bad clients is my own PC, which has been on 1903 for some time.
The one thing I can think of that this group of "bad" clients has in common is that all of them have been in place for several years, and have lived through several Kace server and agent updates. However, there are many clients that fit this description that Detect just fine.
I really am at a loss and running out of things to try while I wait for Support to get back to me.
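To put a number on how consistent the "bad" group is from one run to the next, here is a minimal sketch. It assumes the failed device names from two Detect runs have been copied out by hand into lists; the hostnames and the `failure_overlap` helper are made up for illustration and are not part of any KACE tooling.

```python
def failure_overlap(previous_failed, current_failed):
    """Fraction of the current run's failures that also failed in the previous run."""
    previous, current = set(previous_failed), set(current_failed)
    if not current:
        return 0.0
    return len(previous & current) / len(current)


# Hypothetical hostnames, typed in by hand from two Detect schedule results.
run_monday = ["PC-001", "PC-002", "PC-003", "PC-004"]
run_tuesday = ["PC-001", "PC-002", "PC-003", "PC-005"]

overlap = failure_overlap(run_monday, run_tuesday)

# Devices that failed in exactly one of the two runs (the ones that flip-flop).
flapping = sorted(set(run_monday) ^ set(run_tuesday))

print(f"{overlap:.0%} of Tuesday's failures also failed on Monday")
print("Failed in only one run:", flapping)
```

The devices that land in the "failed in only one run" list are probably the most useful ones to pull logs from, since they show both a good and a bad Detect on the same machine.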
Answers (2)
Please use the Kace Agent Toolkit to collect the logs from a few of the devices that are having the issue, and attach them to your support ticket.
This will help support with the troubleshooting process. Thanks
Top Answer
I think the answer to this question was that the SmartLabel I was detecting against had too many patches. After some editing of the label, the Detect success rate went way up. I still don't quite understand why some clients take far longer to detect than others (similar hardware, same location, etc.), but at least it's much improved.
One thing Support had me try today, which we're now retesting, is altering the SmartLabels for the patches being detected -- I hadn't limited them to Active patches only. So far it doesn't appear to be making a difference -- the "good" clients finish in an hour and the "bad" ones are still going, but maybe they will succeed before the 6 hours run out. Will update later today or tomorrow. Thank you! - agibbons 4 years ago
Are the devices local to the SMA or in a remote location? If remote, are you using replication? - KevinG 4 years ago