Through Andrew’s work on a TLA+ specification for force-move, we discovered an issue with force-move. It is still possible to run the protocol safely, but doing so requires storing > nParticipants
states in some circumstances, while we were aiming for the property that a client should only ever have to store the last nParticipants
states.
The problem manifests as follows: suppose that we have a force-move channel with participants = [Alice, Eve, Bob, Charlie]
(so that Alice signs states numbered 0, 4, 8, ...
, etc). Suppose that the largest turn number Alice has seen so far is 11, and she holds states s8, s9, s10, s11
, signed by Alice/Eve/Bob/Charlie respectively. Now suppose that Eve launches a force-move with states s6, s7, s8, s9'
, where s9' != s9
. How can Alice ensure that the channel concludes in state s11
or later? She has to cancel the force-move in some way, for which there are three options: (1) respond
, (2) refute
, (3) respondFromAlternative
. She can’t respond
, in general, because s9' -> s10
might not be a validTransition
. She can’t refute
, as she doesn’t hold a state with turnNum > 9
that is signed by Eve. In order to respondFromAlternative
, she would need to supply states s7, s8, s9, s10
, which is fine and works, but requires her to store s7
, at least temporarily.
This isn’t a massive deal - after all, Eve has just provided s7
in order to launch the force-move. In thinking through this, we realised that there might be a neater solution though.
The proposal is to replace the respondFromAlternative
method with a checkpoint
method. This method would take a supportedState
* with turnNum = n
and:
- if there’s an ongoing challenge
- for a state with
turnNum < n
- cancel the challenge
- increase the
turnNumRecord
ton
- for a state with
turnNum >= n
- do nothing
- for a state with
- otherwise
- increase the
turnNumRecord
ton
, unless it is already larger
- increase the
In the example above, this would mean Alice could call checkpoint(s8, s9, s10, 11)
, which would cancel Eve’s challenge.
Checkpointing has the additional feature that it can be used pre-emptively by a participant before going offline, as by increasing the turnNumRecord
it prevents a force-move being called with an older state.
* In the simplest case, a supportedState
is a sequence of signed states, of length nParticipants
, with each consecutive pair a validTransition
. It also allows for an optimization, where some participants sign the same state. See the force-move implementation for more details.