Server unavailable yesterday (2021-06-21)

Server unavailable yesterday (2021-06-21) 22/06/2021 at 16:23 #140135
GeoffM
As many of you are aware, a prolonged Internet outage yesterday affected every customer in the data centre that houses the SimSig server. Our hosts, Memset, and BT Openreach took around 11 hours to trace and fix the fault, which turned out to be a double failure in the London area, some 40 miles from the data centre itself. This morning's update from them mentions a cable re-route to fix one of the issues, while they are still working on the other. Beyond that I don't know any more (including what the other issue is) until they have completed their investigations and updated us.

As we mentioned in our Facebook post, we were already working on a backup system for this eventuality but unfortunately it wasn't ready in time. It's not a trivial task. We will now escalate this to top priority.

Our apologies for the downtime.

SimSig Boss
The following users said thank you: peterb, Hap, postal, JamesN, northroad, TUT, haydenrobertson, Soton_Speed, geswedey, andyb0607, tynie123, UKTrainMan, rodney30, officer dibble, y10g9, bugsy, zachpratt, mldaureol, 9pN1SEAp, JWNoctis, Silverstar, Lyn-Greenwood, WesternChampion, Jay_G, Gwasanaethau, simple68, phil1044, Chrisrail
Server unavailable yesterday (2021-06-21) 13/07/2021 at 22:44 #140561
GeoffM
Memset have provided a further update. As is often the case, it was not a single failure but several factors coming together that caused both the primary and the backup diverse routes to fail.

From what I understand, and in an attempt to summarise:
1. Someone severed the primary cable in Clapham
2. The backup cable had eroded
3. The fault on the backup cable went undetected because nothing was routed over it
4. The backup cable had no monitoring because it was previously used as a primary carrying constant traffic

They are now monitoring the backup cable by routing some traffic over it, which should lower the risk of this happening again.
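
For anyone curious how that kind of check works in principle, here is a minimal sketch of probing every link, including an idle standby, so that a dead backup is noticed before it is needed. This is not Memset's actual setup: `LinkMonitor`, `check_links`, the `probe` callback and the three-failure threshold are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class LinkMonitor:
    """Tracks probe results for one link (hypothetical helper, not a real API)."""
    name: str
    max_failures: int = 3          # assumed threshold before raising an alarm
    consecutive_failures: int = 0

    def record_probe(self, succeeded: bool) -> None:
        # A success resets the counter; a failure increments it.
        if succeeded:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1

    @property
    def healthy(self) -> bool:
        return self.consecutive_failures < self.max_failures


def check_links(monitors, probe):
    """Probe every link, active or standby, and return the names of unhealthy ones.

    `probe` is any callable taking a link name and returning True on success;
    in practice it would send a small amount of test traffic over that link.
    """
    unhealthy = []
    for monitor in monitors:
        monitor.record_probe(probe(monitor.name))
        if not monitor.healthy:
            unhealthy.append(monitor.name)
    return unhealthy
```

The point of the design is that the standby link is probed on the same schedule as the primary, so its health does not depend on real traffic happening to flow over it.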

SimSig Boss
The following users said thank you: postal, JamesN, UKTrainMan, kaiwhara, Hap, Dick, Soton_Speed, ajax103, pedroathome, bugsy, sunocske, JWNoctis