July 26, 2024

Redesigned Hot Redundancy

CCS5

After more than 10 years, we have decided to refactor the HOTRED (hot redundancy) capability of CCS5.

The new approach exploits features that were created to support constellations and were not available at the time of the original design.

We expect these changes to improve both the actions taken on failover and the robustness of the system during a failure.

In the new approach, the nominal and redundant servers are part of the same NATS cluster and appear as different “benches”. Both servers are directly visible in the dedicated HOTRED launchpad, and a CCS client console can be attached to either server at any time. If one server fails, the HOTRED launchpad makes it possible to simply attach to the other server and replicate the last known SCOE and GS connection status to the redundant server.
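To make the failover flow concrete, here is a minimal sketch in Python. It is not the CCS5 implementation and does not use any CCS5 or NATS API: the `Bench` and `Launchpad` classes, their methods, and the endpoint names are all hypothetical. It also assumes that "replicate" means copying the last recorded connection status onto whichever bench is still alive.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Bench:
    """One server ("bench") in the shared cluster. Hypothetical model."""
    name: str
    alive: bool = True
    # Last known link state per endpoint, e.g. {"SCOE-1": "connected"}.
    connections: Dict[str, str] = field(default_factory=dict)


class Launchpad:
    """Hypothetical stand-in for the HOTRED launchpad."""

    def __init__(self, nominal: Bench, redundant: Bench) -> None:
        self.benches = {nominal.name: nominal, redundant.name: redundant}
        self.attached: Optional[str] = None
        self.last_known: Dict[str, str] = {}

    def attach(self, name: str) -> None:
        """Attach the console to a bench, refusing benches that are down."""
        bench = self.benches[name]
        if not bench.alive:
            raise RuntimeError(f"bench {name!r} is not reachable")
        self.attached = name

    def record_status(self) -> None:
        """Snapshot the connection status of the attached bench."""
        if self.attached is None:
            raise RuntimeError("no bench attached")
        self.last_known = dict(self.benches[self.attached].connections)

    def fail_over(self) -> str:
        """Attach to the surviving bench and replay the last known status."""
        survivors = [
            b for b in self.benches.values()
            if b.alive and b.name != self.attached
        ]
        if not survivors:
            raise RuntimeError("no surviving bench to attach to")
        target = survivors[0]
        # Assumption: replication is a plain copy of the recorded status.
        target.connections.update(self.last_known)
        self.attach(target.name)
        return target.name


nominal = Bench("nominal", connections={"SCOE-1": "connected", "GS-1": "connected"})
redundant = Bench("redundant")
pad = Launchpad(nominal, redundant)
pad.attach("nominal")
pad.record_status()

nominal.alive = False          # simulate a failure of the nominal server
new_bench = pad.fail_over()    # console now attached to the other bench
```

In this sketch both benches are ordinary, symmetric objects, so the failover path runs the same attach logic as everyday use.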

The new design eliminates code paths that were not exercised by our daily testing, which improves test coverage.

The overall implementation is simpler, requires much less code to maintain, and is easier to test.