
Node 22

Posted: Thu Aug 09, 2012 8:43 am
by Admin
Node 22 rebooted last night.
This is the node hosting the currently ongoing VZ3 offer on LEB.
Preliminary research into the cause indicates possible abuse.
Salvatore took the opportunity to upgrade the kernel and the distro to CentOS 6.3.
We apologise for the inconvenience.
Will keep you informed.
Admin

Re: Node 22

Posted: Sun Aug 12, 2012 7:25 pm
by Admin
10 minutes ago, node 22 rebooted again.
Salvatore is on it, we'll keep you posted.

Admin

Re: Node 22

Posted: Thu Aug 23, 2012 3:31 pm
by Admin
While there have been no more reboots, we still have no idea why it happened. We are waiting for an OVZ kernel patch that will hopefully solve this problem, since it does fix some lockups.
Once that is in place, there will again be serious stock on low-end OVZ plans.

Admin

Re: Node 22

Posted: Fri Aug 24, 2012 7:35 am
by Admin
Yes, it happened again.
Three times in a month (totalling a bit over an hour of downtime) is no coincidence, and Uncle Sal is taking the steps needed to move all containers off node 22. Some will end up on node 15 and some on node 24, which is the same E5-24 HT cores model and was prepared after the last crash as an emergency replacement in case it happened again.
This time there was a kernel panic and, even though the machine is brand new, in my view there might be a hardware fault somewhere (probably the CPU; munin was showing some weird stuff before the last crash).
Even though Uncle Sal doesn't think it is a hw problem and his faith in Supermicro is unshaken, he is moving everyone off node 22 and will test it thoroughly afterwards. We will keep you posted, as usual.

Admin

Re: Node 22

Posted: Fri Aug 24, 2012 3:37 pm
by Admin
The migration has already started, and there has already been an incident in which a PPTP module caused node 15 to crash.
The migration now proceeds offline, so the downtime will be longer, but at least the nodes should no longer crash.

Admin

Re: Node 22

Posted: Fri Aug 24, 2012 10:09 pm
by unclesal
Migration is still proceeding. I expect it to continue all through the (Italian) night and beyond :)

UPDATE

Posted: Sat Aug 25, 2012 9:58 am
by unclesal
All VPSes with IP 192.X.X.X are now on PM24. During the migration we noticed that the same CPU oddity observed on the munin graphs also showed up on PM24, a sign that it is more likely a software problem (take that @admin!). So I decided to install an older kernel on PM22 and keep some VPSes there for a few days.

S.

Re: UPDATE

Posted: Sat Aug 25, 2012 8:13 pm
by Admin
unclesal wrote: sign that it is a software problem more likely (take that @admin!).
OK, you win :)
Hopefully node 24 won't crash and you will find the right hw/kernel combination for those Supermicros.
Good luck, because this is no longer science; it is more like trial and error, man against machine, etc...

Admin

Re: Node 22

Posted: Sun Aug 26, 2012 8:06 am
by unclesal
Last night (Italian time) PM24 rebooted, which confirms that the latest kernel is the worst of the lot.

First of all, I'm sorry about this.

I'm now going to downgrade the kernel to an older version.

Re: Node 22

Posted: Sun Aug 26, 2012 10:21 am
by unclesal
The node has just been rebooted with the older kernel release. Fingers crossed :)