Adaptive time-triggered systems use a set of schedules that the system can switch between at runtime. All schedules and the conditions for changing between them are connected in a graph, which is traversed based on the system state. If a condition to trigger a schedule change is met, the system switches into the associated schedule. Each schedule is computed for optimal energy usage and fault mitigation under the observed system state.
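The graph traversal described above can be illustrated with a minimal sketch. It is not the implementation used in this thesis: the class `ScheduleGraph`, the schedule names `nominal` and `low_power`, and the state field `slack` are hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical representation of the observed system state.
State = Dict[str, float]
Condition = Callable[[State], bool]


@dataclass
class ScheduleGraph:
    """Precomputed schedules (nodes) and change conditions (edges)."""
    current: str
    # edges[schedule] = list of (condition, target schedule)
    edges: Dict[str, List[Tuple[Condition, str]]] = field(default_factory=dict)

    def add_change(self, source: str, condition: Condition, target: str) -> None:
        """Register a planned schedule change from source to target."""
        self.edges.setdefault(source, []).append((condition, target))

    def step(self, state: State) -> str:
        """Follow the first outgoing edge whose condition holds, if any."""
        for condition, target in self.edges.get(self.current, []):
            if condition(state):
                self.current = target
                break
        return self.current


# Illustrative usage with assumed schedule names and an assumed slack threshold.
graph = ScheduleGraph(current="nominal")
graph.add_change("nominal", lambda s: s["slack"] > 0.5, "low_power")
graph.add_change("low_power", lambda s: s["slack"] <= 0.5, "nominal")
graph.step({"slack": 0.8})  # condition met: switch into "low_power"
```

Because every edge is fixed before runtime, the system can only ever reach schedules that were planned in advance, which is the property the safety argument relies on.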
Critical domains like avionics and railway can profit from the advantages of online system reconfiguration because the schedule changes, while optional as they depend on the system state, are still planned within the set of potential system states. As no unpredicted changes can occur, the system's runtime behavior is predetermined by the schedules, which provides the guarantees needed for safety-critical systems.
While locally adapting single cores is already common practice, this thesis shows that performing the adaptation on a global level provides more advantages w.r.t. energy savings and fault mitigation. We show that the global approach leads to higher energy savings because an exploitable system state can be used not only by the tile where it occurred but also by other tiles.
The adaptation is performed in a decentralized manner within the distributed network of tiles of the MPSoC, which is why a common view of the global state is paramount. Before a schedule change decision can be taken, the tiles must agree on a global view.
After introducing the adaptive architecture needed to perform a global adaptation, this thesis proposes an agreement protocol for a network on chip. We show how such a protocol can be implemented while keeping the energy and transmission overhead minimal. The challenge within a network on chip is that broadcast protocols are not available and each message has to be sent individually. Given that all tiles need to inform all other tiles about their current state during an agreement, this can lead to a heavy load on the network. The thesis shows that an additional, dedicated agreement network enables the system to run the agreement protocol without causing transmission delays on the normal network. We further show that by using such a protocol the adaptation can save up to 40\% of its usual energy consumption.
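To give a rough sense of the load, consider the simplifying assumption (ours, for illustration; the actual protocol may differ) that one agreement round requires each of the $n$ tiles to send exactly one unicast message to every other tile. The number of messages per round is then
\[
  M(n) = n\,(n-1),
\]
so that, for example, $n = 16$ tiles would exchange $M(16) = 16 \cdot 15 = 240$ messages per round. The quadratic growth in $n$ is what motivates keeping this traffic off the normal network.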
In the second part of the thesis, we introduce a fault-tolerant version of the protocol. We present two fault-tolerant architectures that enable the system to continue adapting when a fault occurs.
We show that the fault-tolerant version of the protocol is able to cope with arbitrary hardware faults, enabling the system to reconfigure itself. By providing the adaptation even if nodes fail, we enable the system to enter specialized safe states that are tailored to the specific fault that was observed. The functional system lifetime can be prolonged, as the fault can be handled by the rest of the system without having to enter a minimal-functionality safe state.