Issues with IWStack Milano DC1

Admin
Site Admin
Posts: 490
Joined: Wed Jul 25, 2012 10:54 pm

Post by Admin » Sat Jan 30, 2016 1:21 pm

We are experiencing some reboots in the Milano IWStack clusters.
While the orchestrator tries to cope by restarting VMs on other nodes, the large number of events means the restarts take a long time, up to 30-50 minutes.
We are investigating the issue.

Re: Issues with IWStack Milano DC1

Post by Admin » Sun Jan 31, 2016 12:21 pm

After a lot of investigation, we believe there is a problem with the Fibre Channel links: one of them is probably causing issues that affect the others or the switches.
When the nodes lose connectivity, they are automatically rebooted to protect the data.
Many such events at the same time create a backlog, so the VMs restart slowly.

Re: Issues with IWStack Milano DC1

Post by Admin » Mon Feb 01, 2016 12:01 am

We believe we have found the issue: the FC cards were overwhelmed by the number of commands. This caused them to stop accepting new commands while they cleared the backlog, which increased the iowait to the point where the orchestrator considered the node dead and started its VMs on another node. That node then hit the same issue in turn, bringing the whole cluster down in a cascade failure, like the ones that cause power grids to fail across countries or regions.
As a result, we will rebalance the clusters to take this issue into account, which should hopefully stop the reboots.
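
For anyone curious why this kind of failure cascades, here is a toy model in Python. It is only a sketch and not the orchestrator's actual logic: the node names, the load and capacity numbers, and the rule that a failed node's load is spread evenly over the survivors are all assumptions made up for illustration.

```python
def simulate_cascade(loads, capacity, first_failure):
    """Toy cascade model (illustrative assumptions only).

    loads:         dict of node name -> current I/O load (arbitrary units)
    capacity:      load above which a node is treated as dead
    first_failure: name of the node that fails first

    Returns (failed_nodes_in_order, loads_of_surviving_nodes).
    """
    loads = dict(loads)          # work on a copy, leave the caller's dict alone
    failed = []
    pending = [first_failure]    # nodes waiting to be declared dead

    while pending:
        node = pending.pop(0)
        if node not in loads:
            continue
        orphaned = loads.pop(node)   # load of the VMs that must be restarted
        failed.append(node)
        if not loads:
            break                    # no survivors left: whole cluster is down

        # Assumption: restarted VMs are spread evenly over the survivors.
        share = orphaned / len(loads)
        for name in loads:
            loads[name] += share

        # Any survivor pushed past capacity is the next to be declared dead.
        for name in sorted(loads):
            if loads[name] > capacity and name not in pending:
                pending.append(name)

    return failed, loads


# Hypothetical four-node cluster, capacity 100 units per node.
tight = {"n1": 80, "n2": 80, "n3": 80, "n4": 80}
roomy = {"n1": 70, "n2": 70, "n3": 70, "n4": 70}

tight_failed, tight_left = simulate_cascade(tight, 100, "n1")
roomy_failed, roomy_left = simulate_cascade(roomy, 100, "n1")

print("tight cluster, failed nodes:", tight_failed)
print("roomy cluster, failed nodes:", roomy_failed)
```

In this model the cluster running at 80 units per node loses every node after a single failure, while the one running at 70 survives with only the first node down. That is the intuition behind rebalancing: leave enough headroom that the survivors can absorb a failed node's load.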
