Node 22

Admin
Site Admin
Posts: 490
Joined: Wed Jul 25, 2012 10:54 pm

Node 22

Post by Admin » Thu Aug 09, 2012 8:43 am

Node 22 rebooted last night.
This is the node hosting the VZ3 offer currently running on LEB.
Preliminary research into the cause indicates possible abuse.
Salvatore took the opportunity to upgrade the kernel and the distro to CentOS 6.3.
We apologise for the inconvenience.
We will keep you informed.
Admin

Admin
Site Admin
Posts: 490
Joined: Wed Jul 25, 2012 10:54 pm

Re: Node 22

Post by Admin » Sun Aug 12, 2012 7:25 pm

Ten minutes ago, node 22 rebooted again.
Salvatore is on it, we'll keep you posted.

Admin

Admin
Site Admin
Posts: 490
Joined: Wed Jul 25, 2012 10:54 pm

Re: Node 22

Post by Admin » Thu Aug 23, 2012 3:31 pm

While there have been no more reboots, we still have no idea why it happened. We are waiting for an OVZ kernel patch that will hopefully solve this problem, since it does fix some lockups.
Once that is in place, there will again be serious stock on low-end OVZ plans.

Admin

Admin
Site Admin
Posts: 490
Joined: Wed Jul 25, 2012 10:54 pm

Re: Node 22

Post by Admin » Fri Aug 24, 2012 7:35 am

Yes, it happened again.
Three times in a month (totalling a bit over an hour of downtime) is no coincidence, and Uncle Sal is taking the steps needed to move all containers off node 22. Some will end up on node 15 and some on node 24, which is the same E5-24 HT cores model and was prepared after the last crash as an emergency replacement in case it happened again.
This time there was a kernel panic and, even though the machine is brand new, in my view there might be a hardware fault somewhere (probably the CPU; munin was showing some weird stuff before the last crash).
Even though Uncle Sal doesn't think it is a hw problem and his faith in Supermicro is unshaken, he is moving everyone off node 22 and will test it thoroughly afterwards. We will keep you posted, as usual.

Admin

Admin
Site Admin
Posts: 490
Joined: Wed Jul 25, 2012 10:54 pm

Re: Node 22

Post by Admin » Fri Aug 24, 2012 3:37 pm

Migration has already started, and there has already been an incident in which a PPTP module caused node 15 to crash.
Migration now proceeds offline, so the downtime will be longer, but at least the nodes should no longer crash.

Admin

unclesal
Posts: 44
Joined: Thu Aug 02, 2012 3:31 pm

Re: Node 22

Post by unclesal » Fri Aug 24, 2012 10:09 pm

Migration is still proceeding. I expect it will continue through the whole (Italian) night and beyond :)

unclesal
Posts: 44
Joined: Thu Aug 02, 2012 3:31 pm

UPDATE

Post by unclesal » Sat Aug 25, 2012 9:58 am

All VPSes with IP 192.X.X.X are now on PM24. During the migration we noticed that the same CPU oddity observed on the munin graphs also showed up on PM24, a sign that it is more likely a software problem (take that @admin!). So I decided to install an older kernel on PM22 and keep some VPSes there for a few days.

S.

Admin
Site Admin
Posts: 490
Joined: Wed Jul 25, 2012 10:54 pm

Re: UPDATE

Post by Admin » Sat Aug 25, 2012 8:13 pm

unclesal wrote: sign that it is a software problem more likely (take that @admin!).
OK, you win :)
Hopefully node 24 won't crash and you will find the right hw/kernel combination for those Supermicros.
Good luck, because this is no longer science, more like trial and error, man against machine, etc.

Admin

unclesal
Posts: 44
Joined: Thu Aug 02, 2012 3:31 pm

Re: Node 22

Post by unclesal » Sun Aug 26, 2012 8:06 am

Last night (Italian time) PM24 rebooted. This confirms that the latest kernel is the worst of the lot.

First of all, I'm sorry for this.

I'm now going to downgrade the kernel to an older version.

unclesal
Posts: 44
Joined: Thu Aug 02, 2012 3:31 pm

Re: Node 22

Post by unclesal » Sun Aug 26, 2012 10:21 am

The node has just been rebooted with the older kernel release. Fingers crossed :)
