K1000 as Virtual Appliance - Tasks not running as expected
We recently migrated our K1000 (6.2) from a hardware K1100 appliance to a virtual appliance running in VMware ESXi/vSphere on a Dell PowerEdge 2950.
All "appears" to be functioning correctly and we've been running with this setup for a few months without any real issues. However, after we started looking closer at some things we've noticed that all is not quite 100%.
At least once a week the K1000 clock suddenly loses sync. We’ll arrive in the office at 8am only to find the K1000 thinks it’s 4am, which really messes up a lot of things. From what I’ve seen in the KB, this is due to ESXi not being man enough for the job, but at the moment I don’t have another option.
Also, under Communication Settings we have our Inventory interval set to 2 hours (7.66 connections per minute). If I look at the Inventory, though, there are machines in the list that have active AMP connections but a Last Inventory time of “3 hours, 25 minutes”. I don’t recall this being the case when we were on hardware: the machines would inventory every 2 hours as instructed, and no machine with an active AMP connection would go over 2 hours without performing an Inventory. If I check Agent Tasks under Troubleshooting, I can see that as far as the K1000 is concerned these machines were due to check in after 2 hours as planned, but haven’t. They will eventually check in.
We have the same problem with Patching. If I set a Detect task to run at 9am, it won’t actually start at 9am. I’ll see machines that have been on all day running their Detect job at 2pm! The same happens with Deploy tasks. Save and Run Now doesn’t work at all with Patching.
As you can imagine, this makes scheduling downtime and maintenance impossible.
Under Communication Settings I can see that our Load Average bounces between 0.8-1.8. I haven’t seen it go much higher.
All clients are also using the latest KACE Agent, however this problem existed on the older Agent too.
I plan to raise a support ticket for these issues, but I thought I’d reach out for ideas here too.
Answers (1)
Comments:
-
Hi, thanks for the reply.
I've taken a look at your advice and most of it looks fine.
We have 900 clients.
- Over the last month, our Apache processes have been no higher than 25. However, if I look at the "Number of Processes" graphs I can see that our K1 has 175 processes running on average.
- We only have two scripts enabled, and both of those are Offline Scripts.
- I've changed our agent settings to mirror yours just out of interest. I can't say this has made any difference.
Here is a quick snippet of our Konductor Log. The first thing that jumps out at me is that our lv values are often well over the lt values. I don't honestly know what that means though.
[2014-12-22 12:24:30 +0000] Konductor[1522] [main] stats [s:2352 t/s:0 t/tc:8 t:1268 tc:148 c:1226 cc:175 sl:30 sc:2271 tpl:14 apa:11 lt:1 lv:7.32]
[2014-12-22 12:25:03 +0000] Konductor[1522] [main] stats [s:2385 t/s:0 t/tc:8 t:1268 tc:148 c:1242 cc:176 sl:30 sc:2301 tpl:14 apa:13 lt:1 lv:9.35]
[2014-12-22 12:25:33 +0000] Konductor[1522] [main] stats [s:2415 t/s:0 t/tc:8 t:1268 tc:148 c:1256 cc:177 sl:30 sc:2331 tpl:14 apa:12 lt:1 lv:8.8] - Arcolite 9 years ago-
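For anyone wanting to track this over a longer stretch of the log, here is a minimal Python sketch that pulls the key:value pairs out of each Konductor stats line and flags the ones where lv is above lt. It assumes the line format matches the three samples above, and it only compares the two numbers; what lt and lv actually represent is not confirmed here. The names parse_stats and lv_exceeds_lt are made up for this example and are not part of any KACE tool.

```python
import re

# Matches the bracketed payload after the word "stats", e.g. "stats [s:2352 ... lv:7.32]".
STATS_RE = re.compile(r"stats \[(?P<body>[^\]]*)\]")


def parse_stats(line):
    """Return the key:value pairs of one stats line as a dict, or None if absent."""
    match = STATS_RE.search(line)
    if match is None:
        return None
    stats = {}
    for token in match.group("body").split():
        key, _, value = token.partition(":")
        try:
            stats[key] = float(value)
        except ValueError:
            stats[key] = value  # keep anything non-numeric as text
    return stats


def lv_exceeds_lt(stats):
    """True when the lv figure is higher than the lt figure (meaning is an assumption)."""
    return stats["lv"] > stats["lt"]


# The three lines quoted in the comment above.
SAMPLE = [
    "[2014-12-22 12:24:30 +0000] Konductor[1522] [main] stats [s:2352 t/s:0 t/tc:8 t:1268 tc:148 c:1226 cc:175 sl:30 sc:2271 tpl:14 apa:11 lt:1 lv:7.32]",
    "[2014-12-22 12:25:03 +0000] Konductor[1522] [main] stats [s:2385 t/s:0 t/tc:8 t:1268 tc:148 c:1242 cc:176 sl:30 sc:2301 tpl:14 apa:13 lt:1 lv:9.35]",
    "[2014-12-22 12:25:33 +0000] Konductor[1522] [main] stats [s:2415 t/s:0 t/tc:8 t:1268 tc:148 c:1256 cc:177 sl:30 sc:2331 tpl:14 apa:12 lt:1 lv:8.8]",
]

for line in SAMPLE:
    stats = parse_stats(line)
    if stats is not None and lv_exceeds_lt(stats):
        print(f"lv {stats['lv']} is above lt {stats['lt']} (cc={stats['cc']:.0f})")
```

Feeding it a full log file line by line instead of SAMPLE would show whether lv stays above lt all day or only at certain times.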
Your LV is really high. How many processors do you have allocated on your VM and how much memory?
How do your long tasks, task throughput, and mySQL long queries graphs look? - htomlinson 9 years ago
-
Also, under scripts, I can see K1000 Scripting Updater (ID=3) in the list of enabled scripts. I'm not sure I'm supposed to be able to see/modify that. - Arcolite 9 years ago
-
Ours is running at 171 processes, so I would assume that is probably normal. Regarding script 3, that is indeed an online script used by the kbox; do not change it.
Your LV is higher than ours normally is. We typically see a .0x load level or sometimes 5 or 6.
Have you rebooted? Might be worth putting in a support ticket to take a closer look. - Jbr32 9 years ago-
The appliance has had a few reboots the last few days. Sometimes it's OK, other times not.
I've raised a support ticket but with Christmas I think it'll be the new year before any answers (totally fine with that!). - Arcolite 9 years ago -
Now the New year madness is out of the way I've been able to sit down and look at this again.
As a test, I set up a small batch of machines to start a patching run on.
These machines have already been doing a Detect daily, so I created a Detect and Deploy task and clicked "Save and Run Now", expecting it would happen within the next few minutes.
Nothing happened...this was at about 10am this morning.
If I looked under Agent Tasks I could see that the tasks were ready, but nothing was happening. All the machines had a valid AMP connection.
An hour later, the machines started patching!
I managed to grab a screenshot of this from the Agent Tasks - http://imgur.com/A1v8Ptw
The same thing happens with Scheduled Patch runs. They never start when you tell them to. - Arcolite 9 years ago