rotor v0.09 release

Tags:

The original article was published at habr.com in English and Russian. Due to the substantial changes, I decided to write a dedicated article explaining rotor and the key points of the new release.

rotor is a non-intrusive event loop friendly C++ actor micro framework, similar to its elder brothers like caf and sobjectizer. The new release came out under the flag of pluginization, which affects the entire lifetime of an actor.

Actor Linking

The actor system is all about interactions between actors, i.e. sending messages to each other (and producing side effects for the outer world or listening to messages it produces). However, to let a message be delivered to the final actor, the actor should be alive (1); in other words, if actor A is going to send message M to actor B, A should somehow be sure that actor B is online and will not go offline while M is routing.

Before rotor v0.09, that kind of guarantee was only available due to child-parent relations, i.e. between a supervisor and its child actor. In this case, an actor was guaranteed that a message would be delivered to its supervisor, because the supervisor owned the actor and the supervisor's lifetime covered the respective actor's lifetime. Now, with the release of v0.09, it is possible to link actor A with actor B that is not in a parent or child relation to it, and to make sure that all messages will be delivered after successful linking.

So, linking actors is performed somewhat along these lines:

namespace r = rotor;

void some_actor_t::on_start() noexcept override {
    request<payload::link_request_t>(b_address).send(timeout);
}

void some_actor_t::on_link_response(r::message::link_response_t &response) noexcept {
    auto& ec = response.payload.ec;
    if (!ec) {
        // successful linking
    }
}

However, code like this should not be used directly as is... because it is inconvenient. This becomes more obvious if you try linking actor A with 2 or more actors (B1, B2, etc.), since some_actor_t would have to keep an internal count of how many (successful) link responses it is still waiting for. And here the pluginization system featured in the v0.09 release comes to the rescue:

namespace r = rotor;

void some_actor_t::configure(r::plugin::plugin_base_t &plugin) noexcept override {
    plugin.with_casted<r::plugin::link_client_plugin_t>(
        [&](auto &p) {
            p.link(B1_address);
            p.link(B2_address);
        }
    );
}

Now, this is much more convenient, since link_client_plugin_t is included out of the box with rotor::actor_base_t. Nevertheless, it's still not enough, because it does not answer a few important questions:

  1. When is actor linking performed (and, as a related question, when is actor unlinking performed)?
  2. What happens if the target actor (aka "server") does not exist or rejects linking?
  3. What happens if the target actor decides to self-shutdown when there are "clients" still linked to it?

To provide answers to these questions, the concept of actor lifetime should be revisited.

Async Actor Initialization And Shutdown

Here, represented in a simplified manner, is how an actor's state usually changes: new (constructor) -> initializing -> initialized -> operational -> shutting down -> shut down
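
For orientation, the chain above can be sketched as a plain enumeration (a simplified illustration; rotor's actual state type may differ in naming and details):

enum class state_t {
    NEW,            // the constructor has run
    INITIALIZING,   // the I-phase is in progress (asynchronous)
    INITIALIZED,    // ready to become operational
    OPERATIONAL,    // the actor is doing its main job
    SHUTTING_DOWN,  // the S-phase is in progress (asynchronous)
    SHUT_DOWN,      // the final state
};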

The main job is performed in the operational state, and it is up to the user to define what an actor is to do in its up-and-running mode.

In the I-phase (i.e. initializing -> initialized), the actor should prepare itself for further functioning: locate and link with other actors, establish a connection to the database, acquire whatever resources it needs to be operational. The key point of rotor is that the I-phase is asynchronous, so an actor should notify its supervisor when it is ready (2).

The S-phase (i.e. shutting down -> shut down) is complementary to the I-phase, i.e. the actor is being asked to shut down, and, when it is done, it should notify its supervisor.

While it sounds easy, the tricky bit lies in the composability of actors, when they form Erlang-like hierarchies of responsibilities (see my article on trees of Supervisors). In other words, any actor can fail during its I-phase or S-phase, and that can lead to asynchronous collapse of the entire hierarchy, regardless of the failed actor's location within it. Essentially, the entire hierarchy of actors becomes operational, or, if something happens, the entire hierarchy becomes shut down.

rotor seems unique in its init/shutdown approach. There is nothing similar in caf; in sobjectizer, there is a shutdown helper, which serves a function similar to the S-phase above; however, it is limited to one actor only and offers no I-phase, because sobjectizer has no concept of hierarchies (see the update below).

While using rotor, it was discovered that the progress of the I-phase (S-phase) may potentially require many resources to be acquired (or released) asynchronously, which means that no single component or actor is able, by its own will, to answer the question of whether it has completed the current phase or not. Instead, the answer comes as a result of collaborative effort, handled in the right order. And this is where plugins come into play; they are like pieces, each one responsible for a particular job of initialization/shutdown.

So, here are the promised answers related to link_client_plugin_t:

  • Q: When is actor linking or unlinking performed? A: When the actor state is initializing or shutting down, respectively.
  • Q: What happens if the target actor (aka "server") does not exist or rejects linking? A: Since this happens while the actor state is initializing, the plugin will detect the failure condition and trigger the client-actor's shutdown. That may cause a cascade effect, i.e. its supervisor may be triggered to shut down, too.
  • Q: What happens if the target actor decides to self-shutdown when there are "clients" still linked to it? A: The "server-actor" will ask its clients to unlink, and once all "clients" have confirmed unlinking, the "server-actor" will continue the shutdown procedure (3).

A Simplified Example

Let's assume that there is a database driver with an async interface for one of the event loops available for rotor, and that TCP clients will be connecting to our service. The database will be served by db_actor_t, and the service accepting clients will be named acceptor_actor_t. The database actor is going to look like this:

namespace r = rotor;

struct db_actor_t: r::actor_base_t {

    struct resource {
        static const constexpr r::plugin::resource_id_t db_connection = 0;
    };

    void configure(r::plugin::plugin_base_t &plugin) noexcept override {
        plugin.with_casted<r::plugin::registry_plugin_t>([this](auto &p) {
            p.register_name("service::database", this->get_address());
        });
        plugin.with_casted<r::plugin::resources_plugin_t>([this](auto &) {
            resources->acquire(resource::db_connection);
            // initiate async connection to database
        });
    }

    void on_db_connection_success() {
        resources->release(resource::db_connection);
        ...
    }

    void on_db_disconnected() {
        resources->release(resource::db_connection);
    }

    void shutdown_start() noexcept override {
        r::actor_base_t::shutdown_start();
        resources->acquire(resource::db_connection);
        // initiate async disconnection from database, e.g. flush data
    }
};

The inner namespace resource is used to identify the database connection as a resource. It is good practice, better than hard-coding magic numbers like 0. During the actor configuration stage (which is part of initialization), when registry_plugin_t is ready, it will asynchronously register the actor address under the symbolic name service::database in the registry (shown further below). Then, with resources_plugin_t, it acquires the database connection resource, blocking any further initialization, and launches a connection to the database. When the connection is established, the resource is released, and db_actor_t becomes operational. The S-phase is symmetrical, i.e. it blocks shutdown until all data is flushed to the DB and the connection is closed; once this step is complete, the actor continues its shutdown (4).

The client acceptor code should look like this:

namespace r = rotor;
struct acceptor_actor_t: r::actor_base_t {
    r::address_ptr_t db_addr;

    void configure(r::plugin::plugin_base_t &plugin) noexcept override {
        plugin.with_casted<r::plugin::registry_plugin_t>([this](auto &p) {
            p.discover_name("service::database", db_addr, true).link();
        });
    }

    void on_start() noexcept override {
        r::actor_base_t::on_start();
        // start accepting clients, e.g.
        // asio::ip::tcp::acceptor.async_accept(...);
    }

    void on_new_client(client_t& client) {
        // send<message::log_client_t>(db_addr, client)
    }
};

The key point here is the configure method. When registry_plugin_t is ready, it is configured to discover the name service::database and, when found, store it in the db_addr field; it then links the actor to the db_actor_t. If service::database is not found, the acceptor shuts down (i.e. on_start is not invoked); if the linking is not confirmed, the acceptor shuts down, too. When everything is fine, the acceptor starts accepting new clients.

The operational part itself is omitted for the sake of brevity, because it hasn't changed in the new rotor version: there is a need to define payloads and messages (including request and response types), to define methods which will accept the messages, and finally to subscribe to them.

Let's bundle everything together in a main.cpp. Let's assume that the boost::asio event loop is used.

namespace asio = boost::asio;
namespace r = rotor;

...
asio::io_context io_context;
auto system_context = rotor::asio::system_context_asio_t::ptr_t{new rotor::asio::system_context_asio_t(io_context)};
auto strand = std::make_shared<asio::io_context::strand>(io_context);
auto timeout = r::pt::milliseconds(100);
auto sup = system_context->create_supervisor<r::asio::supervisor_asio_t>()
               .timeout(timeout)
               .strand(strand)
               .create_registry()
               .finish();

sup->create_actor<db_actor_t>().timeout(timeout).finish();
sup->create_actor<acceptor_actor_t>().timeout(timeout).finish();

sup->start();
io_context.run();

The builder pattern is actively used in rotor v0.09. Here, the root supervisor sup was created with 3 actors instantiated on it: the user-defined db_actor_t and acceptor_actor_t, and an implicitly created registry actor. As is typical for an actor system, all actors are decoupled from one another, sharing only message types (skipped here).

All actors are simply created here; the supervisor does not know the relations between them, because actors are loosely coupled and have become more autonomous since v0.09.

The runtime configuration can be completely different: actors can be created on different threads, on different supervisors, and even using different event loops, while the actor implementation remains the same (5). In that case, there will be more than one root supervisor; however, to let them find each other, the registry actor's address should be shared between them. This is also supported via the get_registry_address() method of supervisor_t.
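
For illustration, sharing the registry between two root supervisors might look like the following sketch. It assumes the supervisor builder accepts a foreign registry address via registry_address(); ctx_main, ctx_aux, and strand2 are hypothetical names for system contexts and a second strand, possibly living on different threads or event loops:

auto sup_main = ctx_main->create_supervisor<r::asio::supervisor_asio_t>()
                    .timeout(timeout)
                    .strand(strand)
                    .create_registry()   // this supervisor owns the registry
                    .finish();

auto sup_aux = ctx_aux->create_supervisor<r::asio::supervisor_asio_t>()
                   .timeout(timeout)
                   .strand(strand2)
                   // reuse the registry created by the other supervisor
                   .registry_address(sup_main->get_registry_address())
                   .finish();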

Summary

The most important feature of rotor v0.09 is the pluginization of its core. Among the plugins, the most important are: link_client_plugin_t, which maintains a kind of "virtual connection" between actors; registry_plugin_t, which allows registering and discovering actor addresses by their symbolic names; and resources_plugin_t, which suspends actor init/shutdown until external asynchronous events occur.

There are a few less prominent changes in the release, such as the new access to non-public properties and the builder pattern for actor construction.

Any feedback on rotor is welcome!

PS. I'd like to say thanks to Crazy Panda for supporting me in my actor model research.

Notes

(1) Currently, it will lead to a segfault upon an attempt to deliver a message to an actor whose supervisor has already been destroyed.

(2) If it does not notify, an init-request timeout will occur, and the actor will be asked by its supervisor to shut down, i.e. to bypass the operational state.

(3) You might ask: what happens if a client-actor does not confirm unlinking in time? Well, this is somewhat of a contract violation, and the system_context_t::on_error(const std::error_code&) method will be invoked, which, by default, will print the error to std::cerr and invoke std::terminate(). To avoid contract violations, shutdown timeouts should be tuned to allow client-actors to unlink in time.

(4) During shutdown, the registry_plugin_t will unregister all registered names in the registry.

(5) The exception is when different event loops are used: where actors use the event loop API directly, they will, obviously, have to change following an event loop change, but that's beyond rotor's scope.

Update

During the discussion with the sobjectizer author below, it was clarified that sobjectizer's shutdowner and stop guard offer "long lasting" shutdown actions; however, their main purpose is to give some actors additional time to shut down, even if stop was invoked on the Environment. Asynchronous shutdown (and initialization) similar to rotor's I-phase and S-phase can be modeled via actor states, if needed. This is, however, the framework user's responsibility, contrary to rotor, where it is the framework's responsibility.

C++ permission model

Tags:

Abstract

The problems of public, protected and private access are considered. A generic templated access approach with on-demand specialization for a consumer is proposed; its advantages and drawbacks are discussed. A satisfactory synthetic solution is proposed in the conclusion.

The problem

Out of the box, C++ offers the "classical" class access model: public properties (fields and methods) are accessible from anywhere, protected properties are accessible to descendant classes only, and finally, private properties permit access to the class itself only.

Additionally, it is possible to declare a friend class (which might be templated) to provide maximum access to all properties (i.e. the same as private). This allows a related class to access the internals of the class.

For example, if there is an HTTP library with Request and Connection classes, and the Request class would like to access the Connection internals, this can be done as:

class Request;  /* forward declare */

enum class Shutdown { read, write, both };
class Connection {
    public:
        virtual void handle() { ... }
    private:
        void shutdown(Shutdown how) { ...;  }
        int skt;
        friend class Request;
};

class Request {
    public:
        virtual void process() {
            ...;
            /* I know what I'm doing */
            conn->shutdown(Shutdown::both);
        }
    protected:
      Connection* conn;
};

Now let's assume that there is a descendant:

class HttpRequest: public Request {
    public:
        virtual void process() override {
            conn->shutdown(Shutdown::both); // Oops!
        }
};

Alas, there is no way in C++ to access Connection::shutdown from it. To overcome this within the current access model, there are a few possibilities. First, it is possible to declare HttpRequest as a friend in Connection. Whilst this will certainly work, the solution has a strict limitation: it is applicable only within a single library (project) whose code you have access to. Otherwise, it does not scale at all.

The second possibility is to "expose" the private connection from the Request class to all its descendants, like:

class Request {
    protected:
        void connection_shutdown(Shutdown how) {  conn->shutdown(how); }
        int& connection_socket() { return conn->skt; }
        const int& connection_socket() const { return conn->skt; }
        Connection* conn;
};

This approach is better, because it scales to all descendant classes, which can be located in different libraries. However, the price is quite high, as there is a need to provide access to all properties a priori, even if some of them will never be needed. The more serious drawback is that the approach is limited to class inheritance; in other words, it does not help if there is a need to access private properties of Connection from outside Request's descendants, e.g. for tests.

Somebody might become disappointed altogether and try to make everything public by default. This scales well and covers all the issues described above, though it brings new ones: the boundary between the stable public API and the private implementation details of a class is blurred, and completion suggestions in an IDE can be overloaded with too many variants. In other words, the proposed approach is too permissive.

Semantically identical would be to write simple accessors for all private properties; that just creates an illusion of interface/implementation separation, since the class author has already exposed all the class internals.

Let's summarize the requirements for private properties (aka implementation details):

  • they should scale outside of a library

  • they should be accessible outside of class hierarchy

  • they should not "pollute" the class public API, i.e. somehow not be available by default, yet still be accessible

The possible solutions

The solution consists of two pieces. The first one is to declare the possibility of accessing the private fields of a class, e.g.:

// my_library.h
namespace my::library {
    class Connection {
        public:
            virtual void handle() { ... }
            template<typename T> auto& access() noexcept;
        private:
            void shutdown(Shutdown how) { ...;  }
            int skt;
    };
}

The second piece is to actually provide the full access specialization in the target place, e.g.:

// my_app.cpp
namespace to {
    struct skt{}; // matches private field
}

namespace my::library {
    template<> auto& Connection::access<to::skt>() noexcept { return skt; }
}

// namespace does not matter
class HttpRequest: public Request {
    public:
        virtual void process() override {
            auto& s = conn->access<to::skt>();  // voila!
            shutdown(s, SHUT_RDWR);
        }
};

In other words, in the source class the generic templated accessor is defined, and in the place, where the access is needed, the specialization is provided as the actual access to the required fields.

The solution meets all the requirements; however, it still has its own drawbacks. First, there is a need to duplicate const and non-const access, i.e.

class Connection {
    public:
        virtual void handle() { ... }
        template<typename T> auto& access() noexcept;
        template<typename T> auto& access() const noexcept;
};

...

namespace my::library {
    template<> auto& Connection::access<to::skt>() noexcept       { return skt; }
    template<> auto& Connection::access<to::skt>() const noexcept { return skt; }
}

However, you don't have to provide both const and non-const access if you need only one of them.

The second drawback is making the approach work for methods, especially those whose return type can't be auto& (e.g. void or int). To overcome this, the accessor should be rewritten as:

class Connection {
    public:
        template<typename T, typename... Args> T access(Args...);
};

namespace my::library {
    template<> void Connection::access<void, Shutdown>(Shutdown how) {
        return shutdown(how);
    }
}

class HttpRequest: public Request {
    public:
        virtual void process() override {
            conn->access<void, Shutdown>(Shutdown::both);
        }
};

Another problem arises: if there are two or more private methods with identical signatures (return and argument types), an artificial tag should be introduced again, i.e.

class Connection {
    template<typename T, typename Tag, typename... Args> T access(Args...);
};

namespace to {
    struct skt{};
    struct shutdown{};
}

namespace my::library {
    template<> int& Connection::access<int&, to::skt>() { return skt; }
    template<> void Connection::access<void, to::shutdown, Shutdown>(Shutdown how) {
        shutdown(how);
    }
}

...
conn->access<void, to::shutdown>(Shutdown::both); // voila!

The variadic Args... template parameter does not force you to duplicate the original arguments; you can even add unrelated types to "inject" new methods with additional logic into the Connection class. For example:

namespace to {
    struct fun{};
}

namespace my::library {
    template<> void Connection::access<void, to::fun>() {
        Shutdown how = std::rand() > 1000 ? Shutdown::read : Shutdown::write;
        shutdown(how);
    }
}

It is known that methods might have an optional noexcept specification in addition to const. So, for the sake of generality, all four access cases should be provided, i.e.:

class Connection {
    public:
        template<typename T, typename Tag, typename... Args> T access(Args...);
        template<typename T, typename Tag, typename... Args> T access(Args...) const;
        template<typename T, typename Tag, typename... Args> T access(Args...) noexcept;
        template<typename T, typename Tag, typename... Args> T access(Args...) const noexcept;
};

Alas, it was not the last problem with the approach: there is a problem with inheritance, e.g.:

class Connection {
public:
    template<typename T> auto& access();
private:
    int skt;
};

enum class SslState { /* whatever */};

class SslConnection : public Connection {
public:
    template<typename T> auto& access();
private:
    SslState state;
};

namespace to {
    struct skt{};
    struct state{};
}

namespace my::library {
    template<> auto& Connection::access<to::skt>() { return skt; }
    template<> auto& SslConnection::access<to::state>() { return state; }
}

However, as soon as we try to access a parent property via the child class, i.e.:

SslConnection* conn = ...;
auto& skt = conn->access<to::skt>(); // oops!

The compiler cannot resolve access to the socket via SslConnection, because there is no to::skt specialization for SslConnection; there is one in its parent class, but, in accordance with C++ rules, the compiler does not see it. The solution is to cast to the base class:

SslConnection* conn = ...;
auto& skt = static_cast<Connection*>(conn)->access<to::skt>();

This becomes even more unhandy when the object is stored behind a smart pointer.
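
For illustration:

auto conn = std::make_shared<SslConnection>();
auto& skt = static_cast<Connection&>(*conn).access<to::skt>(); // the upcast is still required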

Let's enumerate the key points:

  • accessor multiplication due to const and noexcept variants

  • not-so-handy access to private methods (too verbose due to multiple template params), although the "injection" of one's own accessor methods seems an advantage

  • too clumsy syntax for accessing private properties in a class hierarchy

Conclusion

The proposed solution is far from perfect. I found the following golden mean for my projects on the implementation-details access topic:

  • if a property is stable enough, or it is part of the class interface, then a public accessor should be written for it, preferably a read-only one, i.e. the accessor should be just a getter. For example, the address property in actor_base in rotor.

  • otherwise, if implementation details might be usable in descendants, make them private

  • provide a generic templated accessor (template<typename T> auto& access()), but for properties only; no access to private methods, as I don't see possible use cases for that now. This point might differ from project to project. (A sketch combining these points is shown below.)
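
A sketch of the combination (Endpoint and address_ are hypothetical names, used for illustration only):

class Connection {
    public:
        // stable part of the interface: a plain read-only getter
        const Endpoint& address() const noexcept { return address_; }

        // escape hatch for unstable implementation details (properties only)
        template<typename T> auto& access() noexcept;

    private:
        Endpoint address_;  // stable property
        int skt;            // unstable detail, reachable via access<to::skt>() on demand
};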

The described approach is applied in the soon-to-be-released rotor v0.09.

Request Response Message Exchange Pattern

Tags:

Introduction

The plan is to examine the request/response pattern in an "abstract" actor framework, and to find out why it is not as trivial to implement as it might appear at first. Later, we'll see how various C++ actor frameworks (CAF, sobjectizer, rotor) support the pattern.

The Request-Response Message Exchange Pattern sounds quite simple: a client asks a server to process a request and return a response to the client once it is done.

Synchronous analogy

This is a completely artificial analogy; however, it seems useful for further explanation.

It can be said that synchronous request-response can simply be presented as a regular function call, where the request is the input parameter to a function, and the response is the return value, i.e.

struct request_t { ... };
struct response_t { ... };

response_t function(const request_t&) {  ... }

The "server" here is the function inself, and the "client" is the call-side side.

The semantics of this function say that it always successfully processes a request and returns a result.

Probably, the most modern way to express that a function might fail is to wrap the response into a monad-like wrapper, like std::optional, std::expected, boost::outcome etc.
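
For example, with std::optional the possibility of failure becomes part of the signature, and the caller is forced to check it (a minimal sketch, reusing the request_t and response_t structures above):

#include <optional>

// an empty optional signals that the request processing failed
std::optional<response_t> function(const request_t& req);

void caller(const request_t& req) {
    if (auto res = function(req)) {
        // success: use *res
    } else {
        // failure: react accordingly
    }
}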

Probably, the most widely used way to let the caller know that request processing failed is to throw an exception. However, this is not expressible in modern C++, so it should be mentioned somewhere in the documentation or just assumed; in other words, from the signature response_t function(const request_t&) the caller never knows whether the function always successfully processes the request or whether it might sometimes fail and throw an exception.

Problems of naive approach in actor framework

Let's realize the simple request-response approach within an imaginary actor framework: the on_request method of the server-actor is called with the request payload, and it returns the response_t:

response_t server_t::on_request(request_t &)

The naive implementation of the request-response approach follows the same pattern as the synchronous analogy, i.e. it "forgets" to express that the request processing might fail. The actor framework's responsibility is to wrap the payloads (response_t, request_t) into messages (message<request_t>) and deliver them to the related actors (client-actor, server-actor), where the payloads will be unpacked.

The corresponding receiver interface of the client-actor will look like:

void client_t::on_response(response_t&)

While it looks OK, how will the client-actor be able to distinguish responses to different requests?

The first solution would be to use some convention in the request/response protocol, i.e. inject a synthetic request_id into request_t and into response_t, generate and set it on the request side (client-actor), and presume that the server-actor will send it back together with the response. There is a variation of this solution, where the request_t is completely embedded into response_t.

While this will definitely work, as will be shown below with the sobjectizer framework, it also means that there is no help from the actor framework, and the burden of the implementation lies completely on the developer.

The second solution is to let the situation be managed somehow by the actor framework: let it suspend the client-actor's execution until the response is received. "Suspending" in this context means that all messages for the client-actor will be queued until the response is received.

This will work as long as everything goes fine. However, as soon as something goes wrong (i.e. the server-actor cannot process the request), the whole client-actor will get stuck: it will not be able to process any message, because it is waiting infinitely for the particular one (the response).

In terms of actor design, the client-actor becomes non-reactive at the moment it stops processing messages, which was a "feature" of the imaginary naive actor framework: to "wait" for the response.

As you might guess, a per-request timeout timer can resolve the problem. Yes, indeed; however, within the interface

void client_t::on_response(response_t&)

how is it possible to tell the client-actor about the timeout trigger? I don't see an acceptable way to do that. It is a paradoxical situation: a failure can be detected, but there is no way to react to it.

Anyway, even if there were a way to notify the client-actor about the failure, it still would not completely solve the non-reactivity problem: the actor becomes reactive only after the timeout triggers, and until then it is still "suspended", waiting for response_t or the timeout trigger.

The root of the problem is the "forgetfulness" of the server-actor interface, which fails to specify that it might fail. It's time to review our interfaces, then.

No framework support for req/res pattern

Let's summarize the pieces necessary to implement the request/response pattern in an actor-like manner.

First, the server-actor might fail to process the request, and it needs to tell the framework and the client-actor about the failure.

Second, the response should be enriched to contain the original request_id, to make it possible for the client-actor to map the response to the request. (In other sources it might be named correlation_id, which serves the same purpose.)

Third, the original request from the client-actor should also contain the request_id.

Fourth, as the original response payload might be missing entirely, it should be wrapped in a monad-like container (std::optional, std::unique_ptr, etc.).

So, our basic structures should look like:

struct request_t { ... };
struct response_t { ... };
using request_id_t = std::uint32_t; /* as an example */

struct wrapped_request_t {
    request_id_t request_id;
    request_t req;
};

enum class request_error_t { SUCCESS, TIMEOUT, FAIL_REASON_1, ... };

struct wrapped_response_t {
    request_error_t request_error;
    request_id_t request_id;
    std::optional<response_t> res;  /* maybe it'll contain the payload */
};

And the corresponding actor interfaces will be:

wrapped_response_t server_t::on_request(wrapped_request_t& ) { ... }

void client_t::on_response(wrapped_response_t& ) { ... }

The FAIL_REASON_1 and other error codes are desirable if the server wants to fail early and notify the client about it. Otherwise, if the server cannot process a request, it silently ignores it; the client will then be notified only via timeout and can only guess what exactly went wrong. In other words, silently ignoring wrong requests is generally not a good practice; reacting to them is much better.

So, sending a request from client to server looks like:

struct client_t {
    ...
    request_id_t last_request_id = 1;
};

void client_t::some_method() {
    auto req_id = ++last_request_id;
    auto request = request_t{ ... };
    framework.send(server_address, wrapped_request_t{ req_id, std::move(request) } );
}

However, the story does not end here, as the timeout-timer part is missing (i.e. for the case when the server-actor does not answer at all). The needed pieces are: 1) a per-request timeout timer; 2) when the response arrives in time, the timer should be cancelled; 3) otherwise, a message with an empty payload and a timeout fail reason should be delivered; 4) if the response still arrives after the timeout trigger, it should be silently discarded. It makes sense to have these things in dedicated methods.

/* whatever that is able to identify particular timer instance */
using timer_id_t = ...;

struct client_t {
    using timer_map_t = std::unordered_map<timer_id_t, request_id_t>;
    /* reverse mapping */
    using request_map_t = std::unordered_map<request_id_t, timer_id_t>;
    ...
    request_id_t last_request_id = 1;
    timer_map_t timer_map;
    request_map_t request_map;
};

void client_t::some_method() {
    auto req_id = ++last_request_id;
    auto request = request_t{ ... };
    framework.send(server_address, wrapped_request_t{ req_id, std::move(request) } );
    /* start timer */
    auto timer_id = timers_framework.start_timer(timeout);
    timer_map.emplace(timer_id, req_id);
    request_map.emplace(req_id, timer_id);
}

void client_t::on_timer_trigger(timer_id_t timer_id) {
    auto request_id = timer_map[timer_id];
    auto response = wrapped_response_t{ request_error_t::TIMEOUT, request_id };
    this->on_response(response);
    timer_map.erase(timer_id);
    request_map.erase(request_id);
}

void client_t::on_response_guard(wrapped_response_t& r) {
    if (request_map.count(r.request_id) == 0) {
        /* no timer, means timer already triggered and timeout-response was
        delivered, just discard the response */
        return;
    }
    auto timer_id = request_map[r.request_id];
    timers_framework.cancel(timer_id);
    this->on_response(r); /* actually deliver the response */
    timer_map.erase(timer_id);
    request_map.erase(r.request_id);
}

Now the example is complete. It should be able to handle request-responses in a robust way. However, there is no actual request-processing code here, and yet there is a lot of auxiliary code needed to make it responsive and robust.

The worst thing is that this boilerplate code has to be repeated for every request-response pair type. It is a discouraging and error-prone way of development; a developer might end up frustrated with actor design altogether.

req/res approach with sobjectizer

The sobjectizer actor framework (version 5.6 at the moment) does not help with request-response pattern usage. So, basically, it is like the request-response sample above without framework support, with sobjectizer's specifics, of course.

In the example below, a "probabilistic pong" (server role) is used: it randomly either successfully answers ping requests or just ignores them. The pinger (client role) should be able to detect both cases.

#include <so_5/all.hpp>
#include <optional>
#include <random>
#include <unordered_map>

using request_id_t = std::uint32_t;
using namespace std::literals;

struct ping {};

struct pong {};

struct timeout_signal {
    request_id_t request_id;
};

enum class request_error_t { SUCCESS, TIMEOUT };

struct wrapped_ping {
    request_id_t request_id;
    ping payload;
};

struct wrapped_pong {
    request_id_t request_id;
    request_error_t error_code;
    std::optional<pong> payload;
};

class pinger final : public so_5::agent_t {
    using timer_t = std::unique_ptr<so_5::timer_id_t>;
    using request_map_t = std::unordered_map<request_id_t, timer_t>;

    so_5::mbox_t ponger_;
    request_map_t request_map;
    request_id_t last_request = 0;

    void on_pong(mhood_t<wrapped_pong> cmd) {
        auto &timer = request_map.at(cmd->request_id);
        timer->release();
        request_map.erase(cmd->request_id);
        on_pong_delivery(*cmd);
    }

    void on_pong_delivery(const wrapped_pong &cmd) {
        bool success = cmd.error_code == request_error_t::SUCCESS;
        auto request_id = cmd.request_id;
        std::cout << "pinger::on_pong " << request_id << ", success: " << success << "\n";
        so_deregister_agent_coop_normally();
    }

    void on_timeout(mhood_t<timeout_signal> msg) {
        std::cout << "pinger::on_timeout\n";
        auto request_id = msg->request_id;
        request_map.erase(request_id);
        wrapped_pong cmd{request_id, request_error_t::TIMEOUT};
        on_pong_delivery(cmd);
    }

  public:
    pinger(context_t ctx) : so_5::agent_t{std::move(ctx)} {}

    void set_ponger(const so_5::mbox_t mbox) { ponger_ = mbox; }

    void so_define_agent() override { so_subscribe_self().event(&pinger::on_pong).event(&pinger::on_timeout); }

    void so_evt_start() override {
        auto request_id = ++last_request;
        so_5::send<wrapped_ping>(ponger_, request_id);
        auto timer = so_5::send_periodic<timeout_signal>(*this, so_direct_mbox(), 200ms,
                                                         std::chrono::milliseconds::zero(), request_id);
        auto timer_ptr = std::make_unique<so_5::timer_id_t>(std::move(timer));
        request_map.emplace(request_id, std::move(timer_ptr));
    }
};

class ponger final : public so_5::agent_t {
    const so_5::mbox_t pinger_;
    std::random_device rd;
    std::mt19937 gen;
    std::uniform_real_distribution<> distr;

  public:
    ponger(context_t ctx, so_5::mbox_t pinger) : so_5::agent_t{std::move(ctx)}, pinger_{std::move(pinger)}, gen(rd()) {}

    void so_define_agent() override {
        so_subscribe_self().event([this](mhood_t<wrapped_ping> msg) {
            auto dice_roll = distr(gen);
            std::cout << "ponger::on_ping " << msg->request_id << ", " << dice_roll << "\n";
            if (dice_roll > 0.5) {
                std::cout << "ponger::on_ping (sending pong back)" << std::endl;
                so_5::send<wrapped_pong>(pinger_, msg->request_id, request_error_t::SUCCESS, pong{});
            }
        });
    }
};

int main() {
    so_5::launch([](so_5::environment_t &env) {
        env.introduce_coop([](so_5::coop_t &coop) {
            auto pinger_actor = coop.make_agent<pinger>();
            auto ponger_actor = coop.make_agent<ponger>(pinger_actor->so_direct_mbox());

            pinger_actor->set_ponger(ponger_actor->so_direct_mbox());
        });
    });

    return 0;
}

Output sample:

ponger::on_ping 1, 0.475312
pinger::on_timeout
pinger::on_pong 1, success: 0

Other output sample:

ponger::on_ping 1, 0.815891
ponger::on_ping (sending pong back)
pinger::on_pong 1, success: 1

It should be noted that the request/response pattern was supported in sobjectizer before version 5.6; however, it was dropped (well, moved to sobjectizer-extra, which has different licensing terms). The request/response was as easy as the following:

auto r = so_5::request_value<Result,Request>(mbox, timeout, params);

It is convenient; nevertheless, in the "How does it work?" explanation, the following sample is available:

// Waiting and handling the result.
auto wait_result__ = f__.wait_for(timeout);
if(std::future_status::ready != wait_result__)
   throw exception_t(...);
auto r = f__.get();

it suffers from the same non-reactivity taint as described above, i.e. the lack of possibility to answer other messages while waiting for a response. Hence, you can see the "Deadlocks" section in the documentation; it is the developer's responsibility to handle the situation.

req/res approach with CAF

The C++ Actor Framework (aka CAF) does support the request/response approach.

#include <chrono>
#include <iostream>
#include <random>
#include <string>

#include "caf/all.hpp"
#include "caf/optional.hpp"
#include "caf/sec.hpp"

using std::endl;
using std::string;
using namespace std::literals;

using namespace caf;

using ping_atom = atom_constant<atom("ping")>;
using pong_atom = atom_constant<atom("pong")>;

void ping(event_based_actor *self, actor pong_actor) {
  aout(self) << "ping" << endl;

  self->request(pong_actor, 1s, ping_atom::value)
      .then([=](pong_atom ok) { aout(self) << "pong received" << endl; },
            [=](error err) {
              aout(self) << "pong was NOT received (timed out?), error code = "
                         << err.code() << endl;
            });
}

behavior pong(event_based_actor *self) {
  using generator_t = std::shared_ptr<std::mt19937>;
  using distrbution_t = std::shared_ptr<std::uniform_real_distribution<double>>;

  std::random_device rd;
  auto gen = std::make_shared<typename generator_t::element_type>(rd());
  auto distr = std::make_shared<typename distrbution_t::element_type>();
  return {[=](ping_atom) {
    auto dice = (*distr)(*gen);
    aout(self) << "pong, dice = " << dice << endl;
    if (dice > 0.5) {
      return optional<pong_atom>(pong_atom::value);
    }
    return optional<pong_atom>();
  }};
}

void caf_main(actor_system &system) {
  auto pong_actor = system.spawn(pong);
  auto ping_actor = system.spawn(ping, pong_actor);
}

CAF_MAIN()

Output sample:

ping
pong, dice = 0.571207
pong received

Another output sample:

ping
pong, dice = 0.270214
pong was NOT received (timed out?), error code = 2

The call client_actor->request(server_actor, timeout, args...) returns an intermediate future-like object, on which the then method can be invoked with a forwarded one-shot actor behaviour. And, yes, there is an await method too, with non-reactive behaviour, where you can easily shoot yourself in the foot with a deadlock. So, according to the documentation, the then method is what we need, as it "multiplexes the one-shot handler with the regular actor behaviour and handles requests as they arrive".

req/res approach with rotor

rotor supports the request/response approach since v0.04:

#include <rotor/ev.hpp>
#include <iostream>
#include <random>

namespace payload {
struct pong_t {};
struct ping_t {
    using response_t = pong_t;
};
} // namespace payload

namespace message {
using ping_t = rotor::request_traits_t<payload::ping_t>::request::message_t;
using pong_t = rotor::request_traits_t<payload::ping_t>::response::message_t;
} // namespace message

struct pinger_t : public rotor::actor_base_t {

    using rotor::actor_base_t::actor_base_t;

    void set_ponger_addr(const rotor::address_ptr_t &addr) { ponger_addr = addr; }

    void on_initialize(rotor::message::init_request_t &msg) noexcept override {
        rotor::actor_base_t::on_initialize(msg);
        subscribe(&pinger_t::on_pong);
    }

    void on_start(rotor::message_t<rotor::payload::start_actor_t> &) noexcept override {
        request<payload::ping_t>(ponger_addr).send(rotor::pt::seconds(1));
    }

    void on_pong(message::pong_t &msg) noexcept {
        auto &ec = msg.payload.ec;
        if (!ec) {
            std::cout << "pong received\n";
        } else {
            std::cout << "pong was NOT received: " << ec.message() << "\n";
        }
        supervisor.do_shutdown();
    }

    rotor::address_ptr_t ponger_addr;
};

struct ponger_t : public rotor::actor_base_t {
    using generator_t = std::mt19937;
    using distrbution_t = std::uniform_real_distribution<double>;

    std::random_device rd;
    generator_t gen;
    distrbution_t dist;

    ponger_t(rotor::supervisor_t &sup) : rotor::actor_base_t{sup}, gen(rd()) {}

    void on_initialize(rotor::message::init_request_t &msg) noexcept override {
        rotor::actor_base_t::on_initialize(msg);
        subscribe(&ponger_t::on_ping);
    }

    void on_ping(message::ping_t &req) noexcept {
        auto dice = dist(gen);
        std::cout << "pong, dice = " << dice << std::endl;
        if (dice > 0.5) {
            reply_to(req);
        }
    }
};

int main() {
    try {
        auto *loop = ev_loop_new(0);
        auto system_context = rotor::ev::system_context_ev_t::ptr_t{new rotor::ev::system_context_ev_t()};
        auto timeout = boost::posix_time::milliseconds{10};
        auto conf = rotor::ev::supervisor_config_ev_t{
            timeout, loop, true, /* let the supervisor take ownership of the loop */
        };
        auto sup = system_context->create_supervisor<rotor::ev::supervisor_ev_t>(conf);

        auto pinger = sup->create_actor<pinger_t>(timeout);
        auto ponger = sup->create_actor<ponger_t>(timeout);
        pinger->set_ponger_addr(ponger->get_address());

        sup->start();
        ev_run(loop);
    } catch (const std::exception &ex) {
        std::cout << "exception : " << ex.what();
    }

    std::cout << "exiting...\n";
    return 0;
}

Output sample:

pong, dice = 0.90477
pong received

Another output sample:

pong, dice = 0.24427
pong was NOT received: request timeout

Compared to CAF, rotor's version is more verbose in terms of LOC (lines of code). Partly this is caused by the omitted main in CAF, while in rotor main cannot be shortened, because rotor is assumed to work with different loop backends, as well as in cooperation with them and with other non-actor loop components; partly because in CAF the message is hidden from the user, while in rotor it is exposed for performance reasons (i.e. to allow the payload to be a smart pointer and achieve zero-copy); and finally, because of CAF's intensive usage of lambdas, which leads to more compact code.

However, it is still what is needed: a reactive request-response.

Request/Response composability

On top of the request-response pattern, the ask pattern can be developed. In short, a client-actor makes several requests and then, depending on the results, takes an appropriate action. See the akka docs as an example.

However, the ask pattern is a little bit more general: it should be possible to access the initial context (message) as well as all the responses (some of which might fail).

sobjectizer does not offer request-response support, so it is out of the comparison. sobjectizer-extra offers a std::future-based solution; however, as we have seen, it is not reactive (while(!future.is_ready()){ ... }), and as std::futures are not composable, the ask pattern cannot be implemented with them.

As we've seen with the CAF lambda approach, there are 2 lambdas per request (one for the fail response and another for the success response); each one captures the outer request context and has access to its own response. Nonetheless, neither of the lambdas has access to the contexts of the other requests; in other words, a common context (which can include the original message) should be shared between them, and the code compactness seems to be lost.

Here is an example of how to compose two ping-pong requests, where either of them might fail.

#include <chrono>
#include <iostream>
#include <random>
#include <string>

#include "caf/all.hpp"
#include "caf/optional.hpp"
#include "caf/sec.hpp"

using std::endl;
using std::string;
using namespace std::literals;

using namespace caf;

using ping_atom = atom_constant<atom("ping")>;
using pong_atom = atom_constant<atom("pong")>;

struct shared_context_t {
  std::size_t pings_left;
  std::size_t pings_success = 0;
  std::size_t pings_error = 0;

  shared_context_t(std::size_t pings_left_) : pings_left{pings_left_} {}

  void output_results() {
    if (pings_left == 0) {
      // unsafe, aout should be used, but how to capture it?
      std::cout << "success: " << pings_success << ", errors: " << pings_error
                << "\n";
    }
  }
  void record_success() {
    ++pings_success;
    --pings_left;
    output_results();
  }
  void record_fail() {
    ++pings_error;
    --pings_left;
    output_results();
  }
};

void ping(event_based_actor *self, actor pong_actor1, actor pong_actor2) {
  aout(self) << "ping" << endl;

  auto context = std::make_shared<shared_context_t>(2);
  self->request(pong_actor1, 1s, ping_atom::value)
      .then([=](pong_atom ok) { context->record_success(); },
            [=](error err) { context->record_fail(); });
  self->request(pong_actor2, 1s, ping_atom::value)
      .then([=](pong_atom ok) { context->record_success(); },
            [=](error err) { context->record_fail(); });
}

behavior pong(event_based_actor *self) {
  using generator_t = std::shared_ptr<std::mt19937>;
  using distrbution_t = std::shared_ptr<std::uniform_real_distribution<double>>;

  std::random_device rd;
  auto gen = std::make_shared<typename generator_t::element_type>(rd());
  auto distr = std::make_shared<typename distrbution_t::element_type>();
  return {[=](ping_atom) {
    auto dice = (*distr)(*gen);
    aout(self) << "pong, dice = " << dice << endl;
    if (dice > 0.5) {
      return optional<pong_atom>(pong_atom::value);
    }
    return optional<pong_atom>();
  }};
}

void caf_main(actor_system &system) {
  auto pong_actor1 = system.spawn(pong);
  auto pong_actor2 = system.spawn(pong);
  auto ping_actor = system.spawn(ping, pong_actor1, pong_actor2);
}

CAF_MAIN()

Output sample:

ping
pong, dice = 0.818207
pong, dice = 0.140753
success: 1, errors: 1

Another output sample:

ping
pong, dice = 0.832334
pong, dice = 0.744168
success: 2, errors: 0

I'm not a CAF expert, but it seems that the original behaviour needs to be captured in the shared context to access aout, and there is a need for two methods per request type (or a single composed one which takes a composite monad-like result).

Let's see how it works with rotor; however, actors' addressing should be explained first. The akka and CAF actor frameworks use the ActorRef notion to (globally) identify an actor. It seems that there is a one-to-one matching between an ActorRef and the actor. In rotor, an address is completely decoupled from an actor, and an actor can process messages on any address it is subscribed to. There is a "main" (or default) actor address which is used for the core rotor mechanics, but an actor can still be subscribed to any other address and process messages on it.

That technique is shown below: an ephemeral address is created, and a unique association between that address and the context is established. Here is the full code:

#include <rotor/ev.hpp>
#include <iostream>
#include <random>
#include <unordered_map>

namespace payload {
struct pong_t {};
struct ping_t {
    using response_t = pong_t;
};
} // namespace payload

namespace message {
using ping_t = rotor::request_traits_t<payload::ping_t>::request::message_t;
using pong_t = rotor::request_traits_t<payload::ping_t>::response::message_t;
} // namespace message

struct shared_context_t {
    std::size_t pings_left;
    std::size_t pings_success = 0;
    std::size_t pings_error = 0;
};

struct pinger_t : public rotor::actor_base_t {
    using map_t = std::unordered_map<rotor::address_ptr_t, shared_context_t>;

    using rotor::actor_base_t::actor_base_t;

    void set_ponger_addr1(const rotor::address_ptr_t &addr) { ponger_addr1 = addr; }
    void set_ponger_addr2(const rotor::address_ptr_t &addr) { ponger_addr2 = addr; }

    void on_start(rotor::message_t<rotor::payload::start_actor_t> &) noexcept override {
        reply_addr = create_address();
        subscribe(&pinger_t::on_pong, reply_addr);
        request_via<payload::ping_t>(ponger_addr1, reply_addr).send(rotor::pt::seconds(1));
        request_via<payload::ping_t>(ponger_addr2, reply_addr).send(rotor::pt::seconds(1));
        request_map.emplace(reply_addr, shared_context_t{2});
    }

    void on_pong(message::pong_t &msg) noexcept {
        auto &ctx = request_map[msg.address];
        --ctx.pings_left;
        auto &ec = msg.payload.ec;
        if (ec) {
            ++ctx.pings_error;
        } else {
            ++ctx.pings_success;
        }
        if (!ctx.pings_left) {
            std::cout << "success: " << ctx.pings_success << ", errors: " << ctx.pings_error << "\n";
            // optional cleanup
            unsubscribe(&pinger_t::on_pong, reply_addr);
            request_map.erase(msg.address);
            supervisor.do_shutdown();
        }
    }

    map_t request_map;
    rotor::address_ptr_t ponger_addr1;
    rotor::address_ptr_t ponger_addr2;
    rotor::address_ptr_t reply_addr;
};

struct ponger_t : public rotor::actor_base_t {
    using generator_t = std::mt19937;
    using distrbution_t = std::uniform_real_distribution<double>;

    std::random_device rd;
    generator_t gen;
    distrbution_t dist;

    ponger_t(rotor::supervisor_t &sup) : rotor::actor_base_t{sup}, gen(rd()) {}

    void on_initialize(rotor::message::init_request_t &msg) noexcept override {
        rotor::actor_base_t::on_initialize(msg);
        subscribe(&ponger_t::on_ping);
    }

    void on_ping(message::ping_t &req) noexcept {
        auto dice = dist(gen);
        std::cout << "pong, dice = " << dice << std::endl;
        if (dice > 0.5) {
            reply_to(req);
        }
    }
};

int main() {
    try {
        auto *loop = ev_loop_new(0);
        auto system_context = rotor::ev::system_context_ev_t::ptr_t{new rotor::ev::system_context_ev_t()};
        auto timeout = boost::posix_time::milliseconds{10};
        auto conf = rotor::ev::supervisor_config_ev_t{
            timeout, loop, true, /* let the supervisor take ownership of the loop */
        };
        auto sup = system_context->create_supervisor<rotor::ev::supervisor_ev_t>(conf);

        auto pinger = sup->create_actor<pinger_t>(timeout);
        auto ponger1 = sup->create_actor<ponger_t>(timeout);
        auto ponger2 = sup->create_actor<ponger_t>(timeout);
        pinger->set_ponger_addr1(ponger1->get_address());
        pinger->set_ponger_addr2(ponger2->get_address());

        sup->start();
        ev_run(loop);
    } catch (const std::exception &ex) {
        std::cout << "exception : " << ex.what();
    }

    std::cout << "exiting...\n";
    return 0;
}

Output sample:

pong, dice = 0.472509
pong, dice = 0.305997
success: 0, errors: 2

Another output sample:

pong, dice = 0.103796
pong, dice = 0.8862
success: 1, errors: 1

rotor has special support for requests to be replied to at custom addresses (i.e. the request_via method). The main difference from CAF is that instead of multiple lambdas with additional methods (record_success and record_fail) and a "gather-them-all" method (output_results), with rotor there is just a single gather-them-all method (on_pong), which actually has exactly the same signature as in the previous rotor example.

Conclusion

When you start thinking about possible failures, the initially simple request-response schema abruptly becomes non-trivial. Timeouts and other errors should be handled, and without framework support the code quite quickly becomes cumbersome.

There is an additional requirement: the framework-provided support for the request/response pattern should not come at the cost of losing an actor's reactivity; for simplicity, you may treat it as deadlock avoidance. Another nice-to-have feature would be the composability of requests.

At the moment, sobjectizer does not provide the request/response pattern; it did in the past, but that implementation was non-reactive.

Both CAF and rotor provide the request/response pattern while keeping actors reactive. CAF has more compact code; rotor's code is more verbose. It seems that in CAF you should roll your own composability of requests, i.e. develop a context class and share it between different request handlers. In rotor, the composability of requests seems more natural via ephemeral reply addresses, which can associate a linked group of requests in a single place.

Update

The sobjectizer author replied with a separate article, which I recommend reading.

So, it should be noted that sobjectizer-extra does provide support for the request-response pattern, but under a slightly different name (async_op, in this case). It is completely asynchronous and free of deadlocks, i.e. reactive.

It is also composable, with approximately the same number of lines of code as the rotor example. The composability is done via lambdas (as in CAF), but the responses are redirected to different mboxes (like the ephemeral addresses in rotor).

So, it is possible to get the same result with all considered frameworks.

Trees of Supervisors in C++ (caf, sobjectizer, rotor)

Tags:

The notion of supervisor & trees of supervisors

Erlang is famous for its erlang-supervisor approach, i.e. an actor which is responsible for managing its child actors' lifetime (start, restart, shutdown).

In regular C++ code, an error (exception) propagates towards the caller side. In the actor-based approach, an error is supervised by the actor's owner (supervisor), and it's up to the supervisor to decide how to react, e.g. restart the actor, but only if it crashed no more than 3 times per minute. The caller side is completely decoupled from these details.

In the case of a child crash, the supervisor might decide either to restart it or, if the local situation is considered too bad, to shut itself down (together with all its children) and delegate the problem upwards, i.e. to its own supervisor. The parent supervisor might, again, decide either to restart or to delegate the problem upwards once more.

That form of delegating problems up to the root supervisor naturally forms a tree of responsibilities or, if you like, a tree of supervision. This allows an application to degrade slowly and still be available (provide service) when non-fatal errors occur.

C++ supervising specifics

Erlang supervisors tolerate even programmer errors; this is known as the let-it-crash principle, as a supervisor will just respawn a crashed actor.

In C++ we cannot tolerate developer mistakes as in Erlang: in general, there is no reasonable reaction to leaked memory or a std::abort() call, right? Instead, in the C++ world we assume that an actor is able to detect that it is in an invalid state and just terminate, possibly notifying its exit reason upwards. In the upper layer, the actor's exit will be observed, and a decision will be made on how to handle the situation.
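
In rotor-like pseudo-code, that reaction might look like the following sketch (message::data_t and is_valid() are hypothetical, and do_shutdown() stands for whatever shutdown trigger the framework offers):

void some_actor_t::on_data(message::data_t& msg) noexcept {
    if (!is_valid(msg.payload)) {
        // the invariant is broken: instead of crashing, terminate
        // and let the upper layer (supervisor) decide what to do next
        do_shutdown();
        return;
    }
    // normal processing ...
}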

Supervisor-like capabilities of caf & sobjectizer

caf (c++ actor framework)

There is no explicit supervisor role in caf. However, according to the caf-monitor documentation, it is possible to subscribe to a special system down message and provide an appropriate reaction. This can be done in a recursive manner, i.e. a tree of responsibilities can be built.

However, if I read the documentation correctly, any actor can monitor any other actor's death. The responsibility for the reaction is somewhat blurred from the API perspective. Let's name this approach supervising-via-monitoring.

As the actors are spawned (and owned) by the system, the application does not create a hierarchy at runtime, i.e. it is flat:

caf.png

There are actor grouping capabilities in caf, which allow multicasting messages to all actors inside a group, e.g. a shutdown message.

sobjectizer

In sobjectizer, actors (agents, in sobjectizer's terminology) are spawned by a cooperation; the cooperation itself is created by the environment (a role similar to system in caf). Cooperations can create child cooperations, thus making a hierarchy.

However, it should be noted that a cooperation in sobjectizer is not an actor; it is a type provided by sobjectizer, and there is no possibility to roll your own implementation.

A cooperation in sobjectizer is some kind of supervisable group of actors, as it requires all its actors to be started and successfully registered; if any of them fails, the cooperation will shut down and unregister all (already registered) actors.

Actors are managed by cooperations; but how can actors get information back from cooperations? A notificator object should be created by the cooperation with the destination mbox for messages; then the notificator object should be added to the cooperation. Notificators are of two kinds, reg and dereg, i.e. they observe registration and deregistration events. Lastly, an actor should subscribe to the appropriate event on the message box. Note, however, that this is not an individual actor shutdown event; it is a whole-cooperation event (registration or deregistration).
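
A hypothetical sketch of that wiring, based on my reading of the sobjectizer documentation (exact names and signatures may differ between versions):

// the observer agent will receive so_5::msg_coop_registered /
// so_5::msg_coop_deregistered messages on its mbox
auto notify_mbox = observer->so_direct_mbox();
coop.add_reg_notificator(so_5::make_coop_reg_notificator(notify_mbox));
coop.add_dereg_notificator(so_5::make_coop_dereg_notificator(notify_mbox));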

What follows now is my own vision of how to get fine-grained supervising.

Thus, there is no direct actor shutdown observation, only an indirect one (i.e. via the cooperation death notification). If there is a need to observe an individual actor's shutdown, it should be the only actor in a cooperation, and the cooperation should be monitored; in the case of an actor crash, a new cooperation should be spawned, the new actor should be spawned on it, and the monitoring subscription routine above should be repeated.

This can be visualized as the following:

sobjectizer.png

However, this looks a little bit overcomplicated. Indeed, the sobjectizer way of supervising is coarse-grained: not individual actors are supervised, but groups of related actors (cooperations).

sobjectizer-coarse.png

For more details, please refer to sobjectizer-underhood.

Supervising in Rotor

In rotor, we'd like to achieve fine-grained Erlang-like supervising:

rotor.png

Please note the absence of dotted lines: they coincide with the solid lines, because the ownership of an actor coincides with supervising it. A supervisor in rotor is also an actor. The supervising hierarchy is naturally formed (unlike in caf), and still there is no doubling of the hierarchy as in sobjectizer, where framework-supplied managing units (cooperations) are intermixed with user-supplied actors.

However, up to version v0.02, the parent-child relation between supervisors was available only for boost-asio based supervisors... because each supervisor incorporated its own strand, which was generated by the root io_context. That way, it was a mixture of execution control and supervising; so the real picture was like:

rotor-v0.02.png

Nonetheless, it imposed a sequential execution context (strand) on every supervisor, while that was not necessarily needed. Also, the supervising hierarchy was not available for other event loops, because strand seems to be a boost-asio unique feature.

To solve the situation, the locality notion was introduced. Under the hood, it is a plain const void* marker. By default, a locality is just a pointer to the root supervisor; if two supervisors have the same locality, they are executed in the same (thread-safe) context. This makes it possible to have supervising trees with all supported event loops in rotor.

How about boost-asio? The locality here becomes a pointer to the strand, making it possible to introduce an execution context (which implies supervising) on demand, i.e. something like this:

rotor-v0.03.png

Here supervisor_root and supervisor_1 are executed in the context of one strand, while supervisor_2 has its own strand.

This is available in rotor v0.03. (This led to API-breaking changes, as now the boost::asio supervisor takes a strand shared pointer in its config.)

Rotor v0.03 messaging internals

Messaging in rotor is done without private per-actor message boxes, unlike in sobjectizer and caf. Every address (message delivery endpoint) is generated by a supervisor, and a reference to the supervisor is embedded into the address. So, during the message processing phase, each supervisor compares the reference in the message's destination address with itself (this); if they match, the message is routed locally; otherwise, the message is delivered to the other supervisor for further (local) routing.

Since rotor v0.03, the address object also embeds the locality (const void*). If it matches the locality of the supervisor, then the destination supervisor is taken from the address, and the message is immediately routed locally, in the context of the destination supervisor.
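
A simplified sketch of that dispatch decision (not the actual rotor source; all names are for illustration):

void supervisor_t::put(message_ptr_t msg) {
    auto& dest = msg->address;
    if (dest->locality == locality) {
        // same execution context: route immediately, in the context
        // of the destination supervisor embedded into the address
        dest->supervisor.route_local(std::move(msg));
    } else {
        // different context: hand the message over to the other
        // supervisor's queue; it will be routed locally there
        dest->supervisor.enqueue(std::move(msg));
    }
}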

To make this possible, all supervisors with the same locality have to share the same queue of messages. The obvious way to achieve that is for child supervisors with the same locality to use the root supervisor's message queue.

In other words, a message is sent to a locality, and all the locality's supervisors are peer entry points into it.