Upgrade from 13.0 to 13.1 breaks the box

Question

We recently attempted to upgrade from KACE SMS 13.0 to 13.1, and the upgrade process completed successfully according to the upgrade webpage. However, upon restart, the box complained about missing files as soon as it attempted to initialize mysql. All subsequent services failed to initialize as well, and we ended up with a box that we can no longer access via the web interface. KACE support suggested downloading a 13.0 image from their website and restoring the database from our backups. That got the box running again, and we only lost 24 hours' worth of work, which was annoying but tolerable. We attempted to upgrade to 13.1 again with the exact same result. KACE support so far has been less than helpful, with canned answers; we reply only to be asked the same questions over and over again. Has anyone else experienced issues upgrading to 13.1 from 13.0? We are running the appliance in a Server 2016 Hyper-V environment.

Accepted Answer

SMA Principal Developer here. The dev team is following this issue closely, but we cannot reproduce it in-house and are unable to diagnose it without investigating an affected system pre- and post-upgrade. There are only 2 customers in this thread with the same upgrade issue/symptoms (described in the original post) so far, and both are in touch with Support to continue investigating. If you are affected and have not yet reached out to our Support team, we encourage you to do so. We'll need your cooperation to get to the root cause.

This appears to be an isolated issue, as hundreds of customers have upgraded to 13.1 without issue. Unfortunately, until we understand what is going on here we cannot offer any sort of solution.

Rest assured we are taking this issue very seriously, and we apologize for the inconvenience caused to our affected customers.

EDIT 5/11/23 @ 12:19PM CDT:

Root cause discovered with assistance from dstarrisom (thanks so much!). This issue is being tracked as K1-34089 and can be referenced when discussing with support. The issue is caused by the secure remote database access certificate (configurable under Control Panel -> Security Settings) being either expired or too weak for the upgraded versions of OpenSSL and MySQL in SMA 13.1.74. Customers who have bumped into this and reverted snapshots or restored from backups can preemptively regenerate these certificates before upgrading: on the Control Panel -> Security Settings page, under the "Enable Secure database access (SSL)" option, choose "Override Default Certificates", check the box labeled "Reset to Default Certificate Files", and click "Save and Restart Services". This regenerates the certificates, and everything should come up fine on your next upgrade attempt. If you are unable to (or would rather not) revert or restore, the issue can also be resolved post-upgrade by Support staff, but this requires backend access and is not something you can do yourself.

Thank you to everyone for your patience and assistance in finding a swift solution for this issue. We will take steps to prevent this from occurring with future upgrades.

Answer

Thanks for posting your findings, team. @dstarrisom, these screenshots point to the mysqld daemon being down/not running. Without mysql, many other services will not start.

We could start by checking the upgrade.log file to see how far the upgrade went. If it failed midway, it might be a good idea to restore from backups and perform a mysqlcheck. Contact support for this.

Also, to everyone here: snapshots are not supported: https://support.quest.com/kb/4368176/regarding-third-party-virtual-machine-backup-and-kace-appliances

See this post: https://communities.vmware.com/t5/Virtual-Machine-Guest-OS-and-VM/Snapshots-of-servers-with-databases-how-stable-is-it/td-p/1051928

Snapshotting a VM while mysqld is writing transactions into the DB is a VIP ticket to filesystem and DB issues. Ideally you should power off the VM, take the snapshot, and then power the VM back on... but this scenario does not happen in real life, where the business runs 24/7 and downtime is limited. Back to the first link.

See this post for a related topic: https://communities.vmware.com/t5/VMware-vSphere-Discussions/VM-snapshot-problems-with-databases-circumvent-by-shutting/td-p/2890115

You might want to share your case numbers here, since part of the support team is monitoring ITNinja. (The outcome might be the same, i.e. the upgrade failed/crashed the SMA, but the reason behind it might be different for some of you.)

Answer

This is really unusual, but support is the only one who can help.
Is this a physical or virtual appliance?
Usually restoring is the fastest way, so verify with support what happened.
It is helpful to open a tether before the update (since, according to what you wrote, it seems not to be possible afterwards) and let

Answer

After the upgrade to 13.1, this flickers across the screen: [screenshot not preserved]

Then the system boots to this: [screenshot not preserved]

Answer

Absolutely identical! The errors start appearing when mysql is getting initialized. The box retains its network connection and can be ssh'd into, but when I initially contacted support hoping they'd want to ssh in and see what's happening, they instead suggested starting with a new 13.0 image and a DB restore. The restore worked as expected, but another update to 13.1 failed with the same symptoms. What was your experience with Quest support?

Answer

Glad to hear it's not just us!  We're a VMware environment, so I'm guessing we can rule that out (not that I really thought it had anything to do with the problem).

Answer

In corresponding with support this morning, they are interested in me running a file system check.

Support has also reviewed this question/thread and been asked to consider that maybe there is a systemic issue. Response: "As for now no defect has being identify with the upgrade."

Answer

I am surprised they are not interested in ssh-ing in and examining the logs.

Answer

I have a late morning appointment tomorrow for a tech to do a remote meeting and execute a filesystem check via Putty using root credentials.  More to follow after that meeting and a subsequent update attempt.

Answer

I created another SMA VM on a different machine and restored our database to it, and then attempted to upgrade to 13.1. Same result with files missing on reboot. It's definitely not a host machine issue.

Answer

This is a virtual appliance running in Server 2016 Hyper-V. There were no error messages during the upgrade; see the screenshot. [screenshot not preserved]

Answer

We had the same problem over the weekend (VMware).

[Sun May 7 11:59:56 CEST 2023] [notice] applying Infrastructure Upgrades...
[Sun May 7 11:59:50 CEST 2023] [notice] Starting software updates ...
[Sun May 7 11:59:50 CEST 2023] [notice] DB update completed.
[Sun May 7 11:59:50 CEST 2023] [notice] restore_report_schedules done

After this, nothing more happened, so I contacted Support. After they looked at the appliance, they said that we needed to reinstall the appliance and restore the last backup. I hope the next try runs fine, but this time we'll have a snapshot of the VM.
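If you can get a copy of upgrade.log (for example via SSH or from Support, as suggested earlier in the thread), one quick way to see how far the upgrade got is to pull out the newest entry. The following is a minimal Python sketch, assuming every relevant line follows the "[Sun May 7 11:59:56 CEST 2023] [notice] message" format shown in the excerpt above. The function name last_upgrade_step is hypothetical, and the sketch does not assume anything about where the log lives on the appliance. Because the excerpt's timestamps are not in ascending line order, the sketch orders entries by timestamp and only uses line position to break ties.

```python
import re
from datetime import datetime

# Assumed line format (taken from the excerpt in this thread):
# [Sun May 7 11:59:56 CEST 2023] [notice] applying Infrastructure Upgrades...
# The timezone abbreviation (e.g. CEST) is matched but ignored, because
# datetime.strptime cannot reliably parse arbitrary abbreviations.
LINE_RE = re.compile(
    r"^\[\w{3}\s+(?P<mon>\w{3})\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2})"
    r"\s+\w+\s+(?P<year>\d{4})\]\s+\[(?P<level>\w+)\]\s+(?P<msg>.*)$"
)


def last_upgrade_step(lines):
    """Return (timestamp, level, message) for the newest entry, or None."""
    entries = []
    for index, line in enumerate(lines):
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that do not follow the assumed format
        ts = datetime.strptime(
            f"{m['mon']} {m['day']} {m['time']} {m['year']}", "%b %d %H:%M:%S %Y"
        )
        # Keep the line index so entries sharing a timestamp are ordered
        # by their position in the file (later line wins).
        entries.append((ts, index, m["level"], m["msg"]))
    if not entries:
        return None
    ts, _, level, msg = max(entries)
    return ts, level, msg


if __name__ == "__main__":
    sample = [
        "[Sun May 7 11:59:56 CEST 2023] [notice] applying Infrastructure Upgrades...",
        "[Sun May 7 11:59:50 CEST 2023] [notice] Starting software updates ...",
        "[Sun May 7 11:59:50 CEST 2023] [notice] DB update completed.",
        "[Sun May 7 11:59:50 CEST 2023] [notice] restore_report_schedules done",
    ]
    print(last_upgrade_step(sample))
```

Run against the four excerpt lines, it reports the 11:59:56 "applying Infrastructure Upgrades..." entry as the last step reached, which matches the poster's observation that nothing happened after that point.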

Answer

I believe we have the same issue. Case#02104841

Our system is virtual, so I did a snapshot pre-upgrade and just kept rolling back whenever it'd fail.

Support was given tether access to the system. They did something to the DB and asked us to try upgrading again. Still broken. Support gave up and asked us to start with a new OVF deployment and restore the backup.

I originally asked support if multiple customers were complaining about this and was told "no, it's just you." I'll be replying with a link to this thread in my case, suggesting they have a systemic issue.

Answer

I enabled tether, and support just emailed saying they can't find anything wrong and that I should back up the DB and attempt to upgrade again. This has got to be something in the upgrade scripts, since I am not the only one reporting this issue.

Answer

I cannot leave it broken for any length of time and coordinating with support could take hours.  After a failed attempt, I usually just roll back.  Might have to consider some additional coordination or just rolling out the 13.1 OVF and trying to reload the 13.0 backups.

Answer

Yes, their support response time and level of expertise seem to have gone downhill.

Answer

Per support, we updated to 13.0 to see if an issue with shellscript dependencies NOT pulling from replication shares would be fixed... (NOT)

Then we updated to 13.1 (had NO ISSUES with the update) shellscript dependencies issue still BROKEN (and support confirmed the test lab also has same issues)

However, after the update to 13.0 (and 13.1) if we try to email in a process starting ticket to our employment queue the process does NOT launch properly!!

And support confirmed that they have the same issue on their end... it has been over a week... many, many logs, tether enabled, etc...

Just FYI to anyone else who might trigger major processes through email...

Luckily the 1st ticket (parent ticket) is created, and we can start a manual process with the info from the parent ticket, but what a pain in the arse, considering all the custom work (and paid professional services work) that is now not 100% functional because of their crappy update(s)

Also, if you have reports that are scheduled to e-mail the results (in html or txt format) into a ticket queue, the .html or .txt file is somehow removed during the process, ugh! And you get a ticket without the attachment, ugh!

Hope they get this fixed soon!

J
