rotor-light announcement

Tags:

I was asked by a few friends of mine to develop something like rotor, but usable on embedded platforms, with the usual embedded restrictions like no dynamic allocations, no RTTI, etc. Please meet rotor-light, which is capable of running even on an 8-bit AVR microcontroller (Arduino Uno R3)!

rotor v0.22 and thread unsafety

Tags:

The BUILD_THREAD_UNSAFE rotor build option is not so well explained. Technically it means that boost's intrusive pointer is used in a thread-unsafe manner, i.e. the underlying reference counter is not atomic. Accordingly, all objects (in our case, rotor's messages and actors) cannot be accessed from different threads concurrently.

It should be explicitly mentioned that rotor's cross-thread messaging facility cannot be used either; otherwise there is the notorious UB (undefined behavior).

Practically, that usually means that you are building a single-threaded application, most likely a service with the asio or ev backend.

Why might you need that feature? Performance is the answer, i.e. when you need rotor's supervising/messaging facilities in a single-threaded app. According to my measurements, with thread unsafety you'll get ~30.8 million messages per second instead of ~23.5 million with the feature disabled, i.e. a ~30% performance boost for free.

The question arises, then: how to stop such a single-threaded application? With thread safety it can be done by launching an additional thread, which monitors some atomic flag and, once it detects that the flag is set, sends a shutdown signal to the root supervisor. The atomic flag is set externally, e.g. in a signal handler (NB: you cannot send a shutdown message from within a signal handler, as all memory allocations are prohibited there).

...
std::atomic_bool shutdown_flag{false}; /* global, set from the signal handler */

struct sigaction action;
memset(&action, 0, sizeof(action));
action.sa_handler = [](int) { shutdown_flag = true; };
auto r = sigaction(SIGINT, &action, nullptr);
...
auto console_thread = std::thread([&] {
    while (!shutdown_flag) {
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }
    supervisor->shutdown();
});

supervisor->start();
io_context.run();
console_thread.join();

However, this is no longer possible with the thread-unsafety option. What can be done instead is periodic flag checking from the rotor thread itself, i.e. using timers from the root supervisor. I found that I use that feature frequently, so I decided to include it in rotor: in the supervisor builder you specify a reference to the shutdown_flag and the frequency of checking it, i.e.:

rth::system_context_thread_t ctx;
auto timeout = r::pt::milliseconds{100};
auto sup = ctx.create_supervisor<rth::supervisor_thread_t>()
               .timeout(timeout)
               .shutdown_flag(shutdown_flag, timeout / 2)
               .finish();

(You still need to set the flag externally, as in the example above with the sigaction call.) When the supervisor detects that the flag is set to true, it shuts itself down.

The full example of usage can be seen at examples/thread/sha512.cpp

BUILD_THREAD_UNSAFE is turned off by default. You should explicitly turn it on only if you know what you are doing.
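For example, assuming a usual CMake-based build of rotor, it can be enabled the standard way:

cmake -DBUILD_THREAD_UNSAFE=ON /path/to/rotor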

Supervising in C++: how to make your programs reliable

Tags:

Supervising in real world

When some extraordinary situation occurs, it can be handled at the level of the problem, or its handling can be delegated to some upper level. Usually, when it is really extraordinary, it is delegated; or ... it becomes exception handling.

Imagine you are in a supermarket, and suddenly smoke and fire appear, and for some reason there is no fire alarm. What would you do? You can try to extinguish the fire by yourself, or notify a supermarket employee about the problem and let him handle the situation. It is likely the employee has codified instructions to notify his direct manager or the fire service.

The key point here is that the extraordinary situation is not handled by you, but by a person who knows how to deal with it. Of course, you can try to handle it on your own, but there might be consequences if you are not the person responsible for the situation.

Supervising in backend and end-user services

All non-trivial programs have bugs; however, most well-known cloud services run smoothly, and we rarely notice their failures. This happens because our programs are externally supervised by devops tools like systemd or runit. Simplified, their job can be described as follows: if a program "hangs", kill it and start it again; if it exited, just restart it. In any case it leads to a program restart.

There is probably a hardware supervising team too, and conceptually its job is similar: if a router or a server rack does not operate properly, turn it off and then turn it on again, i.e. restart it.

For the regular end user of a desktop application the situation is similar: when a program misbehaves, it is terminated by the user or by the operating system, and then the program will probably be started again by the user.

Despite different domains, the universal pattern is the same: give the buggy application another chance by restarting it.

Why there is no supervising in common C/C++/C#/Java/Perl/... programs

How often have you seen a desktop program which works with the network and, when you suddenly unplug the network cable (or turn off the wifi router), continues to operate with some functions disabled, and, when you plug the cable back in, becomes fully operational again as if there was no emergency at all?

How often have you seen a backend app which can easily survive the loss of its connection to a database, a message queue, or another critical resource? My experience tells me that this level of error handling is very rare and usually not even discussed.

"We are not in the military/healthcare/aerospace/nuclear domain". That is true, in short. A little bit more verbose and technical answer is: handling all that exceptional cases requires writing additional special code, which is extremely difficult to test (manually or automatically), it will significantly increase code development and maintenance costs without any significant benefits...

Actors as a solution

An actor is an independent entity with its own lifetime and state; all interaction with an actor happens only via messaging, including the actor start and stop signals. Thus, if something bad happens, e.g. an actor enters an error state, it shuts itself down and sends the appropriate message to its supervisor actor.

Let's emphasize the point: all communication with an actor is performed via messaging, and if something goes wrong, there will be an appropriate message too, i.e. messaging is universal.

The normal flow looks like the following (from the consumer's point of view): the client-actor sends a request message to the service-actor, and when the request processing by the service-actor is complete, the response message is sent back to the client. The supervisor of the service-actor does not participate in the communication; the same as in real life.

The error flow looks like the following from the consumer's point of view: the client-actor sends a request message to the service-actor, and it receives a response with an error from the service-actor or, if something really terrible has happened, the request timeout triggers, which is conceptually the same as receiving a message with an error. The service-actor, however, has a few possibilities: if there is a problem with the request, it can just reply with an error code; if there is an unrecoverable problem during request processing, it can reply with an error to the client-actor and shut itself down, i.e. send a down message to its supervisor.
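For illustration, the client-side handling of both outcomes could look approximately like this (a sketch in rotor-like C++; the message and payload names are invented, and the error-code field follows the v0.09 examples shown later in this document):

void client_actor_t::on_response(message::sample_response_t &response) noexcept {
    auto& ec = response.payload.ec;  // a request timeout also arrives here, as an error
    if (ec) {
        // (1) the client's own decision: retry, degrade, give up...
    } else {
        // normal flow: consume the response payload
    }
}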

Thus, the error flow is "doubled": (1) the client receives the error and makes its own decision what to do with it, and (2) the supervisor decides how to deal with the actor's shutdown. The supervisor's decision is usually either to restart the problematic actor (which originally triggered the error) or, if (possibly several) restarts do not fix the problem (i.e. the service-actor still shuts itself down), to escalate the problem, which means shutting the supervisor down, shutting down all its child actors, and then sending a down message to the upstream supervisor for further decision making.

This approach is very different from the widely used exception handling, where there is a context for handling the immediate error (1), but there is no context for supervising (2). This is not accidental, because using a service (the client role) is different from owning the service (the supervising role).

A few questions might arise.

How to cope with unexpected or fatal errors? In theory, it is possible not to handle these errors manually (with code) at all; just specifying a restart policy for the chosen framework/platform should be enough. It will simply keep trying with restarts until some reasonable limit is reached, then escalate the problem, restarting the hierarchy of actors, and so on... until either the problem is solved or, after all possible attempts have been tried and further trying makes no sense, the problem is escalated outside of the program, e.g. to a human or to the operating system.

OK, the service-actors keep restarting (the supervisor side); does that affect the client side? If it is OK for the client side to receive timeout responses while service-actors are already down and have not been started yet, then the answer is "no". The technical explanation is that the message destination endpoint is not bound to a concrete actor: in rotor any actor can subscribe to any address; in sobjectizer any agent can subscribe to any message box.
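As a sketch of that decoupling in rotor terms (using the plugin API described in the rotor v0.09 article later in this document; service_actor_t and service_address are assumed names, with the address owned outside the actor, so a respawned instance can subscribe to the very same address):

namespace r = rotor;

void service_actor_t::configure(r::plugin::plugin_base_t &plugin) noexcept {
    plugin.with_casted<r::plugin::starter_plugin_t>([&](auto &p) {
        // subscribe to the externally owned address instead of the actor's
        // own one; clients keep sending requests to the same endpoint even
        // while the serving actor is being respawned
        p.subscribe_actor(&service_actor_t::on_request, service_address);
    });
}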

Does supervising tolerate developer errors? It depends on the chosen platform. In the Erlang case, with its let-it-crash principle, developer errors lead to an actor crash, and the supervisor can make a further decision. In C++, errors like use-after-free, null pointer dereference, or memory leaks cannot be "caught", so they are not recoverable, and program crashes or memory abuse should be supervised externally by the operating system or launchers like systemd.

Can you give more practical examples? Sure. Consider a backend application which has a fast distributed cache and a slow network connection to a database. Can the app continue to serve if the connection is suddenly gone? Maybe, if it is OK to serve read-only requests via the cache while trying to reconnect to the DB "in the background"; this might be better than cold restarts of the whole app via the system manager. Even if it can't, the time for the app to become operational again is shorter than with a cold restart, because there is no need to reload the cache. Can the app continue to serve if the connection to the network cache is lost? Surely it can; serving a bit slower is still better than a cold restart. If handling the backend and cache connections does not cost a lot in terms of development and maintenance, the approach is definitely worth it.

The price

Supervising is not free of charge. If you choose Erlang as a platform, you get maximum flexibility in supervising, including tolerance up to let-it-crash and the possibility to transparently send messages to actors located on a non-local machine. However, the price is quite high: you have to use the rather specific Erlang language, the platform itself is slow compared to the native binaries you get with C++/Go etc., and, if you want to speed up a hot code path by writing native extensions, you immediately lose all the benefits of the platform. The somewhat specific syntax can be mitigated by using the Elixir language.

In any case, messaging has to be used in an actor environment, and it is not as fast as a native method call: the memory for a message has to be allocated, the message fields have to be filled, the message has to be dispatched, etc. In summary, a message delivery can be a hundred or more times more expensive than a native call.

Another indirect cost of using messaging is that a framework has to be used, because sending messages, and especially receiving them, cannot be performed without a context. For C++ it can be rotor, sobjectizer, or the C++ Actor Framework, while Erlang is itself a platform and a framework (OTP).

So, what is the total cost of ownership of supervising? In theory it is nearly zero cost in terms of writing special code (it should be done for you), but you will be bound to the platform/framework, and the usage of messaging also has its own performance price.

Technical details of supervising in C++

The C++ Actor Framework (aka CAF) is considered the one most influenced by Erlang; however, supervising itself is missing in it. CAF is capable of running a cluster of nodes, each of which can run an arbitrary number of actors. The strong point of CAF is transparent messaging (actor addressing), i.e. a message can be sent from one actor to another independently of their locations: they can be located on different machines, on the same machine, or in the same process.

The situation with supervising is slightly better in sobjectizer, as it provides an entity named cooperation, which has elementary supervising capabilities, such as synchronized actor startup and shutdown: either all actors belonging to the same cooperation start, or none of them does; similarly, if one actor from a cooperation stops, all actors in the same cooperation stop. It should be noted that the cooperation class belongs completely to the sobjectizer framework; it is not possible to override anything in it or customize it in any way. It is possible to hook the actor shutdown event and send a shutdown notification message somewhere, but that is a somewhat wrong way of building supervising, as it requires handling a lot of things in your actors, which violates the Single Responsibility Principle. With sobjectizer you can construct hierarchical finite state machines, which are tightly integrated with messaging, or you can use go-like channels for messaging. So, it is still a good framework if you need those features.

Supervising has been one of the key features of rotor since the beginning. There is the supervisor_t class, which manages its child-actors; it is fully customizable, i.e. child-actor start and stop events can be hooked, etc. However, real Erlang-like supervising was not part of the microframework until v0.20. In short, since v0.20 it is possible to:

1) declaratively specify failure escalation for each actor upon its construction:

supervisor->create_actor<actor_type>()
    .timeout(timeout)
    .escalate_failure()        /* shut the supervisor down in error case */
    .finish();

2) declaratively respawn a stopped actor until some condition is met, and otherwise escalate the failure:

namespace r = rotor;
auto actor_factory = [&](r::supervisor_t &supervisor, const r::address_ptr_t &spawner) -> r::actor_ptr_t {
    return supervisor
        .create_actor<actor_type>()
        .timeout(timeout)
        // other actor properties, probably taken from supervisor
        .spawner_address(spawner)
        .finish();
};

supervisor->spawn(actor_factory)
    .max_attempts(15)                               /* don't do that endlessly */
    .restart_period(boost::posix_time::seconds{10})
    .restart_policy(r::restart_policy_t::fail_only) /* respawn only on failure */
    .escalate_failure()                             /* we did our best, shutdown supervisor */
    .spawn();

The full example of the spawner pattern for ping-pong (where the pinger-actor shuts itself down upon an unsuccessful pong reply) can be seen here.

Conclusion

If something went wrong in your program, give that piece of the program another chance: restart it. Maybe it was a temporary network issue, and it can disappear with the next attempt; just wait a little bit and try again, but don't be too assertive. One of the possible ways of organizing your program into such self-contained pieces, with their own resources and lifetimes, is to model them as actors which communicate with each other via messaging. Shape the individual actors into manageable hierarchies with supervisors, which provide fine-grained control of actors at a low level and at a high level. Make your program reliable.


rotor v0.09 release

Tags:

The original article was published at habr.com in English and Russian. Due to the outstanding amount of changes, I decided to write a dedicated article explaining rotor and the key points of the new release.

rotor is a non-intrusive event loop friendly C++ actor micro framework, similar to its elder brothers like caf and sobjectizer. The new release came out under the flag of pluginization, which affects the entire lifetime of an actor.

Actor Linking

The actor system is all about interaction between actors, i.e. sending messages to each other (and producing side effects in the outer world or listening to the messages it produces). However, for a message to be delivered to the final actor, that actor should be alive (1); in other words, if actor A is going to send message M to actor B, A should somehow be sure that actor B is online and will not go offline while M is being routed.

Before rotor v0.09, that kind of guarantee was only available due to child-parent relations, i.e. between a supervisor and its child-actor. In this case, an actor was guaranteed that a message would be delivered to its supervisor, because the supervisor owned the actor and the supervisor's lifetime covered the respective actor's lifetime. Now, with the release of v0.09, it is possible to link actor A with actor B that are not parent- or child-related to one another and to make sure that all messages will be delivered after successful linking.

So, linking actors is performed somewhat along these lines:

namespace r = rotor;

void some_actor_t::on_start() noexcept {
    request<payload::link_request_t>(b_address).send(timeout);
}

void some_actor_t::on_link_response(r::message::link_response_t &response) noexcept {
    auto& ec = response.payload.ec;
    if (!ec) {
        // successful linking
    }
}

However, code like this should not be used directly as is... because it is inconvenient. This becomes more obvious if you try linking actor A with two or more actors (B1, B2, etc.), since some_actor_t would have to keep an internal count of how many target actors it is still waiting for (successful) link responses from. And here the pluginization system featured in the v0.09 release comes to the rescue:

namespace r = rotor;

void some_actor_t::configure(r::plugin::plugin_base_t &plugin) noexcept override {
    plugin.with_casted<r::plugin::link_client_plugin_t>(
        [&](auto &p) {
            p.link(B1_address);
            p.link(B2_address);
        }
    );
}

Now, this is much more convenient, since link_client_plugin_t is included out of the box with rotor::actor_base_t. Nevertheless, it's still not enough, because it does not answer a few important questions, such as:

1. When is actor linking performed (and, as a by-question, when is actor unlinking performed)?
2. What happens if the target actor (aka "server") does not exist or rejects linking?
3. What happens if the target actor decides to self-shutdown when there are "clients" still linked to it?

To provide answers to these questions, the concept of actor lifetime should be revisited.

Async Actor Initialization And Shutdown

Represented in a simplified manner, here is how an actor's state usually changes: new (constructor) -> initializing -> initialized -> operational -> shutting down -> shut down.

The main job is performed in the operational state, and it is up to the user to define what an actor is to do in its up-and-running mode.

In the I-phase (i.e. initializing -> initialized), the actor should prepare itself for further functioning: locate and link with other actors, establish a connection to the database, acquire whatever resources it needs to be operational. The key point of rotor is that the I-phase is asynchronous, so an actor should notify its supervisor when it is ready (2).

The S-phase (i.e. shutting down -> shut down) is complementary to the I-phase: the actor is asked to shut down, and, when it is done, it should notify its supervisor.

While it sounds easy, the tricky bit lies in the composability of actors, when they form Erlang-like hierarchies of responsibilities (see my article on trees of Supervisors). In other words, any actor can fail during its I-phase or S-phase, and that can lead to an asynchronous collapse of the entire hierarchy, regardless of the failed actor's location within it. Essentially, either the entire hierarchy of actors becomes operational, or, if something happens, the entire hierarchy shuts down.

rotor seems unique with its init/shutdown approach. There is nothing similar in caf; in sobjectizer, there is a shutdown helper, which carries a function similar to the S-phase above; however, it is limited to one actor only, and there is no I-phase, because sobjectizer has no concept of hierarchies (see the update below).

While using rotor, it was discovered that the progress of the I-phase (S-phase) may potentially require many resources to be acquired (or released) asynchronously, which means that no single component, or actor, is able, by its own will, to answer the question of whether it has or has not completed the current phase. Instead, the answer comes as a result of collaborative efforts, handled in the right order. And this is where plugins come into play; they are like pieces, with each one responsible for a particular job of initialization/shutdown.

So, here are the promised answers related to link_client_plugin_t:

  • Q: When is the actor linking or unlinking performed? A: When the actor state is initializing or shutting down respectively.
  • Q: What happens if the target actor (aka "server") does not exist or rejects linking? A: Since this happens when the actor state is initializing, the plugin will detect the fail condition and will trigger client-actor shutdown. That may trigger a cascade effect, i.e. its supervisor will be triggered to shut down, too.
  • Q: What happens if the target actor decides to self-shutdown when there are "clients" still linked to it? A: The "server-actor" will ask its clients to unlink, and once all "clients" have confirmed unlinking, the "server-actor" will continue the shutdown procedure (3).

A Simplified Example

Let's assume that there is a database driver with an async interface (using one of the event loops available for rotor), and that there will be TCP clients connecting to our service. The database will be served by db_actor_t, and the service accepting clients will be named acceptor_actor_t. The database actor is going to look like this:

namespace r = rotor;

struct db_actor_t: r::actor_base_t {

    struct resource {
        static const constexpr r::plugin::resource_id_t db_connection = 0;
    };

    void configure(r::plugin::plugin_base_t &plugin) noexcept override {
        plugin.with_casted<r::plugin::registry_plugin_t>([this](auto &p) {
            p.register_name("service::database", this->get_address());
        });
        plugin.with_casted<r::plugin::resources_plugin_t>([this](auto &) {
            resources->acquire(resource::db_connection);
            // initiate async connection to database
        });
    }

    void on_db_connection_success() {
        resources->release(resource::db_connection);
        ...
    }

    void on_db_disconnected() {
        resources->release(resource::db_connection);
    }

    void shutdown_start() noexcept override {
        r::actor_base_t::shutdown_start();
        resources->acquire(resource::db_connection);
        // initiate async disconnection from database, e.g. flush data
    }
};

The inner namespace resource is used to identify the database connection as a resource. It is good practice, better than hard-coding magic numbers like 0. During the actor's configuration stage (which is part of initialization), when registry_plugin_t is ready, it will asynchronously register the actor's address under the symbolic name service::database in the registry (shown further below). Then, with resources_plugin_t, it acquires the database connection resource, blocking any further initialization, and launches the connection to the database. When the connection is established, the resource is released, and db_actor_t becomes operational. The S-phase is symmetrical: it blocks shutdown until all data is flushed to the DB and the connection is closed; once this step is complete, the actor continues its shutdown (4).

The client acceptor code should look like this:

namespace r = rotor;
struct acceptor_actor_t: r::actor_base_t {
    r::address_ptr_t db_addr;

    void configure(r::plugin::plugin_base_t &plugin) noexcept override {
        plugin.with_casted<r::plugin::registry_plugin_t>([this](auto &p) {
            p.discover_name("service::database", db_addr, true).link();
        });
    }

    void on_start() noexcept override {
        r::actor_base_t::on_start();
        // start accepting clients, e.g.
        // asio::ip::tcp::acceptor.async_accept(...);
    }

    void on_new_client(client_t& client) {
        // send<message::log_client_t>(db_addr, client)
    }
};

The key point here is the configure method. When registry_plugin_t is ready, it is configured to discover the name service::database and, when found, store it in the db_addr field; it then links the actor to the db_actor_t. If service::database is not found, the acceptor shuts down (i.e. on_start is not invoked); if the linking is not confirmed, the acceptor shuts down, too. When everything is fine, the acceptor starts accepting new clients.

The operational part itself is missing for the sake of brevity, because it hasn't changed in the new rotor version: there is a need to define a payload and a message (including request and response types), as well as to define the methods which will accept the messages, and finally to subscribe to them.
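For a rough idea, a payload/message pair and a subscription could be sketched like this (the payload fields and the handler are invented for illustration; the subscription goes through the starter plugin of the same v0.09 plugin system):

namespace payload {
    struct log_client_t { /* client details to be persisted */ };
}
namespace message {
    using log_client_t = r::message_t<payload::log_client_t>;
}

/* inside db_actor_t::configure(), next to the other plugins */
plugin.with_casted<r::plugin::starter_plugin_t>([](auto &p) {
    p.subscribe_actor(&db_actor_t::on_log_client);
});

/* the accepting method */
void db_actor_t::on_log_client(message::log_client_t &msg) noexcept {
    // write msg.payload into the database
}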

Let's bundle everything together in a main.cpp. Let's assume that the boost::asio event loop is used.

namespace asio = boost::asio;
namespace r = rotor;

...
asio::io_context io_context;
auto system_context = rotor::asio::system_context_asio_t(io_context);
auto strand = std::make_shared<asio::io_context::strand>(io_context);
auto timeout = r::pt::milliseconds(100);
auto sup = system_context.create_supervisor<r::asio::supervisor_asio_t>()
               .timeout(timeout)
               .strand(strand)
               .create_registry()
               .finish();

sup->create_actor<db_actor_t>().timeout(timeout).finish();
sup->create_actor<acceptor_actor_t>().timeout(timeout).finish();

sup->start();
io_context.run();

The builder pattern is actively used in rotor v0.09. Here, the root supervisor sup was created with 3 actors instantiated on it: the user-defined db_actor_t and acceptor_actor_t, and an implicitly created registry actor. As is typical for an actor system, all actors are decoupled from one another, sharing only message types (skipped here).

All actors are simply created here, and the supervisor does not know the relations between them because actors are loosely coupled and have become more autonomous since v0.09.

The runtime configuration can be completely different: actors can be created on different threads, on different supervisors, and even using different event loops, while the actor implementations remain the same (5). In that case, there will be more than one root supervisor; however, to let the actors find each other, the registry actor's address should be shared between them. This is also supported, via the get_registry_address() method of supervisor_t.
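A sketch of sharing the registry between two root supervisors (a second system context with its own strand is assumed, and the registry_address() builder property is assumed to be the counterpart of create_registry()):

// first root supervisor: created with .create_registry(), as in main.cpp above
auto registry_addr = sup->get_registry_address();

// second root supervisor: reuse the existing registry instead of creating one
auto sup2 = system_context_2.create_supervisor<r::asio::supervisor_asio_t>()
                .timeout(timeout)
                .strand(strand_2)
                .registry_address(registry_addr)
                .finish();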

Summary

The most important feature of rotor v0.09 is the pluginization of its core. Among the plugins, the most important are: link_client_plugin_t, which maintains a kind of "virtual connection" between actors; registry_plugin_t, which allows registering and discovering actor addresses by their symbolic names; and resources_plugin_t, which suspends actor init/shutdown until external asynchronous events occur.

There are a few less prominent changes in the release, such as the new non-public properties access and the builder pattern for actor construction.

Any feedback on rotor is welcome!

PS. I'd like to say thanks to Crazy Panda for supporting me in my actor model research.

Notes

(1) Currently, it will lead to a segfault upon an attempt to deliver a message to an actor whose supervisor has already been destroyed.

(2) If it does not notify, an init-request timeout will occur, and the actor will be asked by its supervisor to shut down, i.e. it will bypass the operational state.

(3) You might ask: what happens if a client-actor does not confirm unlinking in time? Well, this is somewhat of a violation of the contract, and the system_context_t::on_error(const std::error_code&) method will be invoked, which, by default, will print the error to std::cerr and invoke std::terminate(). To avoid contract violation, shutdown timeouts should be tuned to allow client-actors to unlink in time.

(4) During shutdown, the registry_plugin_t will unregister all registered names in the registry.

(5) The exception is when different event loops are used: if actors use the event loop API directly, they will, obviously, have to change when the event loop changes, but that's beyond rotor's scope.

Update

During the discussion with the sobjectizer author below, it was clarified that sobjectizer's shutdowner and stop guard offer "long-lasting" shutdown actions; however, their main purpose is to give some actors additional time to shut down, even if stop was invoked on the Environment. Asynchronous shutdown (and initialization) similar to rotor's I-phase and S-phase can be modeled via actors' states, if needed. This is, however, the framework user's responsibility, contrary to rotor, where it is the framework's responsibility.

C++ permission model

Tags:

Abstract

The problems of public, protected and private access are considered. A generic templated access approach with on-demand specialization for a consumer is proposed; its advantages and drawbacks are discussed. A synthetic solution, satisfactory enough, is proposed in the conclusion.

The problem

Out of the box, C++ offers the "classical" class access model: public properties (fields and methods) are accessible from anywhere, protected properties are accessible to descendant classes only, and finally private properties permit access from the class itself only.

Additionally, it is possible to declare a friend class (which might be templated) to provide maximum access to all properties (i.e. the same as private). This allows a related class to access the internals of the class.

For example, if there is an HTTP library with Request and Connection classes, and the Request class would like to access Connection internals, this can be done as:

class Request;  /* forward declare */

enum class Shutdown { read, write, both };
class Connection {
    public:
        virtual void handle() { ... }
    private:
        void shutdown(Shutdown how) { ...;  }
        int skt;
        friend class Request;
};

class Request {
    public:
        virtual void process() {
            ...;
            /* I know what I'm doing */
            conn->shutdown(Shutdown::both);
        }
    protected:
      Connection* conn;
};

Now let's assume that there is a descendant:

class HttpRequest: public Request {
    public:
        virtual void process() override {
            conn->shutdown(Shutdown::both); // Oops!
        }
};

Alas, there is no way in C++ to access Connection::shutdown from it. To overcome this, within the current access model, there are a few possibilities. First, it is possible to declare HttpRequest as a friend in Connection, as shown just below. Whilst this will certainly work, the solution has a strict limitation: it is applicable only within a single library (project) to whose code you have access. Otherwise, it does not scale at all.
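The extra friend declaration has to live in the library's own header:

class Connection {
    ...
    friend class Request;
    friend class HttpRequest; /* only possible if you own this code */
};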

The second possibility is to "expose" the private connection from the Request class to all its descendants, like:

class Request {
    protected:
        void connection_shutdown(Shutdown how) {  conn->shutdown(how); }
        int& connection_socket() { return conn->skt; }
        const int& connection_socket() const { return conn->skt; }
        Connection* conn;
};

This approach is better, because it scales to all descendant classes, which can be located in different libraries. However, the price is quite high, as there is a need to provide access to all properties a priori, even if some of them will never be needed. The more serious drawback is that the approach is limited to class inheritance; in other words, if there is a need to access private properties of Connection not from Request's descendants, e.g. from tests, it does not help.

Somebody might become disappointed altogether and try to make everything public by default. This scales well and covers all the issues described above, though it brings new ones: the boundary between the stable public API and the private implementation details of a class is blurred, and completion suggestions in an IDE can be overloaded with too many variants. In other words, the proposed approach is too permissive.

Semantically identical would be to write simple accessors for all private properties; this just brings an illusion of interface/implementation separation, since the class author has already exposed all the class internals outside.
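For example, a pair of trivial accessors like the following exposes exactly as much as making the field public would:

class Connection {
    public:
        int& socket() { return skt; }             /* "getter" */
        const int& socket() const { return skt; } /* const "getter" */
    private:
        int skt;
};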

Let's summarize the requirements for the private properties (aka implementation details):

  • they should scale outside of a library

  • they should be accessible outside of class hierarchy

  • they should not "pollute" the class public API, i.e. somehow be not available by default, still be accessible

The possible solutions

The solution consists of two pieces. The first one is to declare the possibility of accessing the private fields of a class, e.g.:

// my_library.h
namespace my::library {
    class Connection {
        public:
            virtual void handle() { ... }
            template<typename T> auto& access() noexcept;
        private:
            void shutdown(Shutdown how) { ...;  }
            int skt;
    };
}

The second piece is to actually provide the full access specialization in the target place, e.g.:

// my_app.cpp
namespace to {
    struct skt{}; // matches private field
}

namespace my::library {
    template<> auto& Connection::access<to::skt>() noexcept { return skt; }
}

// namespace does not matter
class HttpRequest: public Request {
    public:
        virtual void process() override {
            auto& s = conn->access<to::skt>();  // voila!
            shutdown(s, SHUT_RDWR);
        }
};

In other words, a generic templated accessor is declared in the source class, and in the place where the access is needed, a specialization is provided which performs the actual access to the required fields.

The solution meets all the requirements; however, it still has its own drawbacks. First, there is a need to duplicate const and non-const access, i.e.:

class Connection {
    public:
        virtual void handle() { ... }
        template<typename T> auto& access() noexcept;
        template<typename T> auto& access() const noexcept;
};

...

namespace my::library {
    template<> auto& Connection::access<to::skt>() noexcept       { return skt; }
    template<> auto& Connection::access<to::skt>() const noexcept { return skt; }
}

Although, you don't have to provide both const and non-const access if you need only one of them.

The second drawback is making the approach work for methods, especially those whose return type can't be auto& (e.g. void or int). To overcome it, the accessor should be rewritten as:

class Connection {
    public:
        template<typename T, typename... Args> T access(Args...);
};

namespace my::library {
    template<> void Connection::access<void, Shutdown>(Shutdown how) {
        return shutdown(how);
    }
}

class HttpRequest: public Request {
    public:
        virtual void process() override {
            conn->access<void, Shutdown>(Shutdown::both);
        }
};

Another problem arises: if there are two or more private methods with identical signatures (return and argument types), an artificial tag has to be introduced again, i.e.:

class Connection {
    public:
        template<typename T, typename Tag, typename... Args> T access(Args...);
};

namespace to {
    struct skt{};
    struct shutdown{};
}

namespace my::library {
    template<> int& Connection::access<int&, to::skt>() { return skt; }
    template<> void Connection::access<void, to::shutdown, Shutdown>(Shutdown how) {
        shutdown(how);
    }
}

...
conn->access<void, to::shutdown>(Shutdown::both); // voila!

The variadic Args... template parameter does not force you to duplicate the original arguments; it even makes it possible to add unrelated types, to "inject" new methods with additional logic into the Connection class. For example:

namespace to {
    struct fun{};
}

namespace my::library {
    template<> void Connection::access<void, to::fun>() {
        Shutdown how = std::rand() > 1000 ? Shutdown::read : Shutdown::write;
        shutdown(how);
    }
}

It is known that methods might have an optional noexcept specification in addition to const. So, for the sake of generality, all four access cases should be provided, i.e.:

class Connection {
    public:
        template<typename T, typename Tag, typename... Args> T access(Args...);
        template<typename T, typename Tag, typename... Args> T access(Args...) const;
        template<typename T, typename Tag, typename... Args> T access(Args...) noexcept;
        template<typename T, typename Tag, typename... Args> T access(Args...) const noexcept;
};

Alas, that was not the last problem with the approach: there is also a problem with inheritance, e.g.:

class Connection {
public:
    template<typename T> auto& access();
private:
    int skt;
};

enum class SslState { /* whatever */};

class SslConnection:public Connection {
public:
    template<typename T> auto& access();
private:
    SslState state;
};

namespace to {
    struct skt{};
    struct state{};
}

namespace my::library {
    template<> auto& Connection::access<to::skt>() { return skt; }
    template<> auto& SslConnection::access<to::state>() { return state; }
}

However, as soon as we try to access a parent property via the child class, i.e.:

SslConnection* conn = ...;
auto& skt = conn->access<to::skt>(); // oops!

the compiler cannot resolve the access to the socket via SslConnection, because there is no to::skt specialization for SslConnection; there is one in its parent class, but in accordance with C++ rules the compiler does not see it. The solution is to cast to the base class:

SslConnection* conn = ...;
auto& skt = static_cast<Connection*>(conn)->access<to::skt>();

This becomes even more unhandy when an object is stored behind a smart pointer.
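E.g. with a shared pointer:

auto conn = std::make_shared<SslConnection>();
auto& skt = static_cast<Connection&>(*conn).access<to::skt>();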

Let's enumerate key points:

  • accessors multiplication due to const and noexcept variants

  • not so handy access to private methods (too verbose due to multiple template parameters), although the "injection" of one's own accessor-methods seems an advantage

  • too clumsy syntax for accessing private properties in a class hierarchy

Conclusion

The proposed solution is far from perfect. I found the following golden mean for my projects on the topic of access to implementation details:

  • if a property is stable enough, or it is part of the class interface, then a public accessor should be written for it, desirably as read-only access, i.e. the accessor should be just a getter. For example, the address property in actor_base in rotor.

  • otherwise, if implementation details might be usable in descendants, make them private

  • provide a generic templated accessor (template<typename T> auto& access()), but for properties only; no access to private methods, as I don't see possible use cases for that now. This point might differ from project to project (see the sketch below).
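Putting these recommendations together, a class might be shaped like this (a sketch; the names are illustrative):

class Connection {
    public:
        /* stable, part of the interface: a read-only public accessor */
        const int& socket() const noexcept { return skt; }

        /* unstable implementation details: the generic templated accessor,
           specialized on demand by the few consumers who really need it */
        template<typename T> auto& access() noexcept;

    private:
        int skt;
        void* ssl_ctx; /* an unstable detail, not exposed by default */
};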

The described approach is applied in the soon-to-be-released rotor v0.09.