HSM Synchronization: DINAMO Replication #1

September 30, 2013

The last time I talked about DINAMO (not to be confused with Dynamo data stores), I mentioned an important project milestone: the Replication Layer (RL).

Now that a major HSM firmware version is available, it’s a good moment to review the design of this complex system component (as noted before, since I’m under an NDA, I can only discuss some details).

Going back to some internal Git logs, two commits strike me as important time references:

commit e1438469d26f56b46b1d055a982ad22753513a36
Date: Tue Mar 13 16:29:20 2012 -0300

commit 68efd6c58ff24f551fab4a62f03da2996cce577d
Date: Fri Sep 20 18:40:23 2013 -0300

There’s a gap of roughly 1.5 years between them! Why? I can’t stress this enough: high-quality distributed systems (DS) are really hard to develop (those who say otherwise have never worked on one). Our RL had to be implemented on top of a complex hardware/software architecture, nearly 10 years old, that has survived 3 different companies (and many models).

When I first approached the RL, one thing was clear: an HSM is in the business of security/cryptography. My goal was to bring in as much reliability, transparency, and performance as possible, with no security compromises.

I believe the DINAMO RL accomplished its goal without violating FIPS 140-2/ITI-MCT7 constraints. Although these standards don’t specify how HSM pools/clusters have to work, they’re very strict regarding security features (ITI-MCT7 is a Brazilian superset of FIPS).

Next post: RL overall architecture.