The full source code can be seen at examples/boost-asio/ping-pong-timer.cpp.
We would like to build a ping-pong system. There are plenty of examples of simple ping-pong messaging; here, however, we'd like to use the ping-pong to simulate I/O unreliability and show the toolset available in rotor to overcome it. The sources of the unreliability are: a) the pong response message does not arrive in time; b) an error happens in the I/O layer. The additional requirement is c) to allow the entire system to perform a clean shutdown at any time, e.g. when the user presses CTRL+C in the terminal.
Let's enumerate the rules of the simulator. To simulate the unreliability, let's assume that the ponger actor will answer a ping request not immediately, but after some time and with some probability. "After some time" means that sometimes it will respond in time and sometimes too late; the "some probability" simulates I/O errors. As soon as the pinger receives a successful pong response, it shuts down the entire simulator. However, if the pinger actor does not receive any successful pong response within some time, despite multiple attempts, it should shut itself down too. The rules are reified in constants like:
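A minimal sketch of what such constants could look like; the names follow the prose, but the concrete values and types here are illustrative assumptions rather than the ones from the shipped example:

```cpp
// illustrative only; see examples/boost-asio/ping-pong-timer.cpp for the real values
namespace pt = boost::posix_time;

static const auto check_interval         = pt::milliseconds{100}; // how long the pinger keeps trying
static const double failure_probability  = 0.70;                  // chance that a pong gets "lost"
// the ponger answers after 50 + rand(70) ms, i.e. sometimes later than the pinger is willing to wait
```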
The pinger pings the ponger during check_interval, or shuts itself down once it expires; the ponger generates a response within 50 + rand(70) milliseconds with probability 1 - failure_probability.
OK, let's move on to the implementation. To make it reliable, we are going to use several patterns; first of all, the request-response one:
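A sketch of the request/response message pair, built with rotor's request_traits_t helper (the payload type names are mine, not necessarily those of the example):

```cpp
#include <rotor.hpp>

namespace payload {
struct pong_t {};                              // empty response payload
struct ping_t { using response_t = pong_t; };  // request payload, bound to its response type
} // namespace payload

namespace message {
using ping_request_t  = rotor::request_traits_t<payload::ping_t>::request::message_t;
using pong_response_t = rotor::request_traits_t<payload::ping_t>::response::message_t;
} // namespace message
```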
Since v0.10 it has been possible to cancel pending requests in rotor. The second pattern is the discovery (registry) pattern: the ponger_t actor acts as a server (i.e. it announces itself in a registry), and the pinger_t actor acts as a client (i.e. it locates ponger in the registry by name):
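Roughly, both sides can be wired via the registry_plugin_t like this (a sketch; ponger_addr is assumed to be an address member of the pinger, and the on_ping/on_pong handlers are shown later):

```cpp
void pinger_t::configure(rotor::plugin::plugin_base_t &plugin) noexcept {
    rotor::actor_base_t::configure(plugin);
    plugin.with_casted<rotor::plugin::registry_plugin_t>([&](auto &p) {
        p.discover_name("ponger", ponger_addr, true).link(); // client: resolve ponger by name, then link
    });
    plugin.with_casted<rotor::plugin::starter_plugin_t>([&](auto &p) {
        p.subscribe_actor(&pinger_t::on_pong);               // receive pong responses
    });
}

void ponger_t::configure(rotor::plugin::plugin_base_t &plugin) noexcept {
    rotor::actor_base_t::configure(plugin);
    plugin.with_casted<rotor::plugin::registry_plugin_t>([&](auto &p) {
        p.register_name("ponger", get_address());            // server: announce self in the registry
    });
    plugin.with_casted<rotor::plugin::starter_plugin_t>([&](auto &p) {
        p.subscribe_actor(&ponger_t::on_ping);               // receive ping requests
    });
}
```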
The mediator (aka registry) actor has to be created; we'll instruct the supervisor to instantiate it for us:
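A sketch of the supervisor construction, assuming system_context, strand, timeout and the custom supervisor type (custom_sup_t, shown at the end) are set up nearby; the exact builder options may differ:

```cpp
auto sup = system_context->create_supervisor<custom_sup_t>()
               .strand(strand)
               .create_registry()   // ask the supervisor to instantiate the registry actor for us
               .timeout(timeout)
               .finish();
```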
What should the pinger do upon start? It should ping the ponger and spawn a timer to shut itself down upon timeout.
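A hedged sketch of that logic; timer_id is assumed to be a std::optional<rotor::request_id_t> member, resource::timer is a custom resource id, and do_ping() is shown a bit later:

```cpp
namespace resource {
static const constexpr rotor::plugin::resource_id_t timer = 0; // custom resource id for the timer
} // namespace resource

void pinger_t::on_start() noexcept {
    rotor::actor_base_t::on_start();
    do_ping();                                                 // first ping attempt
    timer_id = start_timer(check_interval, *this, &pinger_t::on_custom_timeout);
    resources->acquire(resource::timer);                       // delay shutdown until the timer is done
}

void pinger_t::on_custom_timeout(rotor::request_id_t, bool cancelled) noexcept {
    resources->release(resource::timer); // lets a suspended shutdown continue
    timer_id.reset();                    // prevents a second cancellation during shutdown
    if (!cancelled) do_shutdown();       // no successful pong in time: give up
}
```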
The on_start method is rather trivial, except for two nuances. First, it must record the timer_id, which may be needed to cancel the timer when shutdown is initiated. Second, it acquires the timer resource, whose sole purpose is to delay the shutdown (and, in general, the initialization) phase. Without the resource acquisition the timer might trigger after the actor has shut down, which is usually a bad idea. The timer handler (on_custom_timeout) performs the reverse actions: the timer resource is released (hence, shutdown continues if it has already started), and timer_id is reset to prevent cancellation during shutdown (shown below). There is a guarantee that the timer handler will be invoked exactly once, whether the timer was cancelled or triggered. According to our rules, if the timer triggers, the pinger actor should shut itself down (i.e. the do_shutdown() method is invoked).
Let's demonstrate the pinger shutdown procedure:
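A rough sketch of that step; cancel_ping_request() is a hypothetical helper standing in for rotor's request-cancellation machinery (available since v0.10), not a rotor API:

```cpp
void pinger_t::shutdown_start() noexcept {
    rotor::actor_base_t::shutdown_start(); // safe: the acquired resources keep the shutdown suspended
    if (request_id) cancel_ping_request(); // hypothetical helper: cancel the still-pending ping request
    if (timer_id) cancel_timer(*timer_id); // the timer callback will release its resource
}
```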
It's quite trivial: if there is a pending ping request, it is cancelled; if there is an active timer, it is cancelled too; otherwise shutdown simply continues. It should be noted that the acquired resources are not released here; instead, the corresponding async operations are cancelled, and the resources will be released upon cancellation. The rotor internals know about the resources, so it is safe (and required) to invoke rotor::actor_base_t::shutdown_start() here. Every further resource release will resume the suspended shutdown or initialization (see rotor::plugin::resources_plugin_t for details).
Let's see the ping request and the reaction to the ping response (pong) in the pinger:
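A hedged sketch; request_id is assumed to be a std::optional<rotor::request_id_t> member, resource::ping another custom resource id, request_timeout an assumed constant, and the response error is assumed to be reachable via payload.ee:

```cpp
void pinger_t::do_ping() noexcept {
    request_id = request<payload::ping_t>(ponger_addr).send(request_timeout);
    resources->acquire(resource::ping);  // keep shutdown suspended while a ping is in flight
}

void pinger_t::on_pong(message::pong_response_t &reply) noexcept {
    resources->release(resource::ping);
    request_id.reset();                  // the request is finished, nothing to cancel anymore
    if (!reply.payload.ee) {             // success: shut the whole simulator down
        return do_shutdown();
    }
    if (timer_id) do_ping();             // still operational: try again
}
```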
Again, it follows the same pattern: initiate the async operation (the request), acquire the resource, and record the request id; upon response, release the resource and forget the request. Upon shutdown (as shown above), the request is cancelled if it still exists. As for the request processing flow: according to our rules, the pinger shuts itself down upon a successful pong response; otherwise, if the actor is still operational (i.e. timer_id still exists), it performs another ping attempt.
Let's move on to the ponger overview. As the actor plays the server role, it usually does not have an on_start() method. Since the ponger does not reply to ping requests immediately, it has to store them internally for later responses.
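The storage could be sketched like this (the concrete map type is an assumption consistent with the description below):

```cpp
struct ponger_t : rotor::actor_base_t {
    // ...
    // maps each response-timer id to a smart pointer to the original ping request
    using requests_map_t = std::unordered_map<rotor::request_id_t,
                                              rotor::intrusive_ptr_t<message::ping_request_t>>;
    requests_map_t requests;
};
```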
The key point about the requests_map_t is the rotor::request_id_t key, which identifies the timer for each delayed ping response. So, when a ping request arrives, a timer is spawned and stored together with a smart pointer to the original request:
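A hedged sketch of the handler; the shutting-down check and the error-construction helpers approximate rotor's API, resource::timer mirrors the pinger's custom resource id, and on_ping_timer is shown next:

```cpp
void ponger_t::on_ping(message::ping_request_t &req) noexcept {
    // don't start a new timer while shutting down: fail the request right away
    if (state >= rotor::state_t::SHUTTING_DOWN) {
        auto ec = rotor::make_error_code(rotor::error_code_t::cancelled); // assumed error code choice
        return reply_with_error(req, make_error(ec));
    }
    auto delay = boost::posix_time::milliseconds{50 + std::rand() % 70};
    auto timer_id = start_timer(delay, *this, &ponger_t::on_ping_timer);
    resources->acquire(resource::timer);
    requests.emplace(timer_id, &req); // keep the original request alive until we reply
}
```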
As usual with async operations, the timer resource is acquired. However, there is an additional check of the actor state, because we don't want to start an async operation (the timer) at all when the actor is shutting down; in that case the actor replies immediately with an error.
The timer handler implementation isn't difficult:
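A sketch under the same assumptions; dice_roll_success() is a hypothetical helper implementing the failure_probability dice:

```cpp
void ponger_t::on_ping_timer(rotor::request_id_t timer_id, bool cancelled) noexcept {
    resources->release(resource::timer);
    auto it = requests.find(timer_id);
    auto &req = it->second;
    if (!cancelled) {
        // simulate the I/O failure: on an unlucky dice roll simply don't reply at all;
        // the request timeout on the pinger side covers that case
        if (dice_roll_success()) reply_to(*req);
    } else {
        // cancellation: reply with an error so that the shutdown finishes quickly
        auto ec = rotor::make_error_code(rotor::error_code_t::cancelled); // assumed error code choice
        reply_with_error(*req, make_error(ec));
    }
    requests.erase(it);
}
```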
In other words, if the timer was not cancelled, the request may be answered with success; if it was cancelled, the reply carries the corresponding error code. Again, the timer resource is released and the request is erased from the requests map. This could be implemented in a slightly more verbose way, i.e. responding with an error upon an unsuccessful dice roll; however, that is not necessary, because thanks to the request-response pattern the request is already protected by a timer on the requesting side (i.e. in the pinger). Nonetheless, the ponger does respond with an error in the case of cancellation, because cancellation usually happens during the shutdown procedure, which should finish as soon as possible; otherwise the shutdown timer will trigger and, by default, std::terminate will be called. That can be worked around by tuning the shutdown timeouts (i.e. making the shutdown timeout greater than the request timeout), but that is rather shaky ground and is not recommended.
The cancellation implementation is rather straightforward: it finds the timer/request pair by the original request id and origin (actor address), and then cancels the found timer. It should be noted that the timer might have already been triggered and, hence, may not be found in the request map.
The ponger shutdown procedure is trivial: it just cancels all pending ping responses; the stored requests are erased during the cancellation callback invocations.
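A possible sketch; the timer ids are collected first, because the cancellation callbacks erase entries from the requests map:

```cpp
void ponger_t::shutdown_start() noexcept {
    rotor::actor_base_t::shutdown_start();
    std::vector<rotor::request_id_t> ids;
    for (auto &it : requests) ids.push_back(it.first);
    for (auto id : ids) cancel_timer(id); // each callback replies with an error and erases the request
}
```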
The final piece in the example is a custom supervisor:
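A hedged sketch of such a supervisor for the boost::asio backend; the on_child_shutdown signature and the strand accessor are assumptions that may differ between rotor versions:

```cpp
struct custom_sup_t : rotor::asio::supervisor_asio_t {
    using rotor::asio::supervisor_asio_t::supervisor_asio_t;

    void on_child_shutdown(actor_base_t *actor) noexcept override {
        rotor::asio::supervisor_asio_t::on_child_shutdown(actor);
        do_shutdown(); // per our rules: a finished pinger brings the whole system down
    }

    void shutdown_finish() noexcept override {
        rotor::asio::supervisor_asio_t::shutdown_finish();
        get_strand().context().stop(); // stop the event loop so main() can return (accessor name assumed)
    }
};
```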
Why is it needed? First, because by our rules, when the pinger shuts down, the entire system should shut down too (out of the box, a rotor supervisor automatically shuts itself down after a child actor shutdown only while the supervisor is still in the initialization phase). Second, it has to stop the boost::asio event loop so that the main() function can exit.
That's it. In my opinion the example has moderate complexity; however, clean shutdown scales well as long as every actor shuts down cleanly. And here is a demonstration of that thesis: you can add many ping clients, and the example still performs both the main logic and the clean shutdown correctly. That can be verified with tools like valgrind or memory/UB sanitizers.
Sample output:
The full source code can be seen at examples/boost-asio/ping-pong-timer.cpp. There is also a more advanced example, examples/boost-asio/beast-scrapper.cpp, though without detailed explanations.
Supervising actors simply means defining a reaction to the termination of a managed child actor. Some reactions are already built into rotor: with the default supervisor_policy_t::shutdown_self policy, a supervisor shuts itself down if (a) it is in the initializing state and (b) one of its child actors shuts itself down. This is a simple form of failure escalation: a failure to initialize an actor causes a failure to initialize its supervisor.
However, once a supervisor has started, a failure in a child actor has no effect. To change that, the special method escalate_failure(true) should be called on the actor builder. Another possibility is to invoke autoshutdown_supervisor(true); the difference is the following: autoshutdown_supervisor unconditionally shuts the supervisor down when a child goes down, while escalate_failure analyzes the shutdown reason code: if the code is normal (i.e. no error or failure caused the actor shutdown, and it shut itself down because it successfully accomplished its job), then there is nothing to escalate. A sketch of both builder flags is shown below.
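A minimal sketch of both builder flags (sup and timeout are assumed to exist):

```cpp
// pick one of the two flags depending on the desired escalation behaviour
auto actor = sup->create_actor<pinger_t>()
                 .timeout(timeout)
                 .escalate_failure()           // escalate only abnormal (failure) shutdowns
                 //.autoshutdown_supervisor()  // or: shut the supervisor down unconditionally
                 .finish();
```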
A common reaction to actor shutdown is to give the actor another chance, i.e. restart it (spawn a new instance in place of the terminated one). Here a spawner comes into play, with its various policies: restart only on failure, restart only on successful termination, always restart, never restart, a maximum number of restart attempts, whether a spawn failure should be escalated, etc. To prevent the underlying system from choking on frequent actor restarts, a restart period (i.e. the minimum amount of time that has to pass before the next restart attempt) is introduced.
Here is an example which demonstrates the features described above. Let's introduce the rules for a ping-pong: the ponger might reply with a failure to a ping request; in that case we'd like to shut the pinger down and spawn a new instance, up to 15 times. If there is no luck even after 15 attempts, the supervisor should fail and exit. Here is the relevant code:
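A hedged sketch of the spawner setup; the factory signature, the builder method names and the restart_policy_t enumerator are my reading of rotor's spawner API and may differ in details from the shipped example:

```cpp
auto factory = [&](rotor::supervisor_t &sup, const rotor::address_ptr_t &spawner) -> rotor::actor_ptr_t {
    return sup.create_actor<pinger_t>()
        .timeout(timeout)
        .spawner_address(spawner) // lets the spawner track this particular instance
        .finish();
};
sup->spawn(factory)
    .max_attempts(15)                                    // give up after 15 pinger instances
    .restart_period(boost::posix_time::seconds{1})       // don't respawn more often than this
    .restart_policy(rotor::restart_policy_t::fail_only)  // respawn only after a failed shutdown
    .escalate_failure()                                  // if it finally fails, fail the supervisor too
    .spawn();
```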
The successful branch in the pinger can look like this:
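A hedged sketch of both branches, assuming the pong reply carries a (possibly empty) error in payload.ee and that do_shutdown() accepts an optional reason:

```cpp
void pinger_t::on_pong(message::pong_response_t &reply) noexcept {
    auto &ee = reply.payload.ee;
    if (!ee) {
        supervisor->do_shutdown(); // success: ask the supervisor to bring the whole program down
    } else {
        do_shutdown(ee);           // failure: shut self down with the error; the spawner respawns us
    }
}
```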
The sample output (successfully terminated, eventually):
The whole code is available at examples/thread/ping-pong-spawner.cpp.
There is another open-source project of mine, syncspirit, which uses rotor under the hood. I recommend taking a look at it if the shipped examples are too trivial and don't give you enough architectural insight into using rotor.