2024-12-13
This was a really interesting paper from HotNets '24 because it flips (as I understand it) the traditional perspective on its head by validating an SDN controller's inputs rather than its outputs. They challenge the assumption typically embedded in the process of validating outputs: the correctness of the inputs. This is motivated by reports that find "over one third" of major outages (I presume at Google) were caused by incorrect inputs to the SDN controller.
It reminded me of a presentation we had in my networking class (Security and Performance Challenges in Networked Systems) from Steffen Smolka titled P4-Based Automated Reasoning (P4-BAR) for the (Networking) Masses!. The premise of his work is to constrain the configuration space of switch ASICs from a variety of inputs so as to reduce the complexity of the SDN controller (and thus make it easier to reason about). It similarly flips the paradigm -- instead of programs programming the ASICs they run on, the ASICs "abide" by the programs themselves. I think his graphic is very intuitive. From the controller's perspective, the ASIC becomes very predictable despite its inherent complexity.
P4-BAR uses static tests to validate whether P4 programs run as intended on the ASICs. It's a case of validating outputs. I take this tangent to illustrate the novelty of Krentsel's paper, where the focus shifts to validating the inputs of the SDN controller instead.
The paper addresses how it is even possible for the SDN controller to receive "incorrect" inputs given that it "reads network state directly from routers". As always, it comes down to bugs: bugs in network operator code and even in the underlying fabric itself. They observe that network operators rely on ad-hoc checks that are usually static and don't reflect the dynamism of the network--thus they are difficult to manage and largely insufficient.
A key observation is that
Perhaps more fundamentally, our analysis reveals that inputs are often incorrect not because they cannot possibly occur or are unlikely to occur, but because they are not currently occurring; i.e., they do not reflect the current state of the network.
...
We argue that input validation must be based on dynamic invariants that ensure an SDN controller’s inputs reflect current network state.
once again indicating a need for dynamic as opposed to static tests. Because the signals through which the SDN controller observes the current network state might themselves be incorrect, they rely on the redundancy and symmetry inherent in networked systems to cross-check them. Most obviously, they use the conservation of flow (i.e., bytes_in $\approx$ bytes_out).
Their system is named Hodor (nice name).
Design
Hodor's redundancy checks
For $R_4$, I'm curious whether this can affect the accuracy of the network signals, especially since they "believe that the surface area for bugs that Hodor introduces is relatively small". It certainly seems unlikely for a breaking bug to be introduced in Hodor, but the claim that "Hodor does not process or aggregate signals but only reads and compares them" should be accompanied by an asterisk if they are sending active probes. If, however improbable, the probe-sending rate is bugged, it could significantly affect the fidelity of the signals.
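To make that worry concrete, here is a toy calculation (entirely my own, with hypothetical numbers and mechanism, not anything from the paper): if loss is inferred from probe returns but the validator's assumed send count disagrees with what was actually sent, the inferred loss is badly skewed.

```python
# Toy illustration of the probe-rate concern above (hypothetical numbers,
# not from the paper): loss inferred from probe returns is only as good
# as the sender's belief about how many probes actually went out.
def inferred_loss(received, assumed_sent):
    """Loss rate the validator would report, given its assumed send count."""
    return 1 - received / assumed_sent

true_loss = 0.01
actual_sent = 900    # a rate bug meant only 900 probes were actually sent...
assumed_sent = 1000  # ...while the validator believes it sent 1000
received = round(actual_sent * (1 - true_loss))  # 891 probes come back

print(f"{inferred_loss(received, assumed_sent):.3f}")  # 0.109, not the true 0.010
print(f"{inferred_loss(received, actual_sent):.3f}")   # 0.010 once the send count is right
```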
The flow conservation constraint is cool and succinct.
$$\forall v \in V,\quad \sum_{e \in E_{\text{in}}(v)} \text{counter}(e) = \sum_{e \in E_{\text{out}}(v)} \text{counter}(e) + \text{dropped}(v)$$
where $V$ is the set of vertices (routers), $E$ is the set of physical links connecting them, and $E_{\text{in}}(v)$, $E_{\text{out}}(v)$ are the links into and out of router $v$. With one such equation per router, this allows you to solve for up to $|V| - 1$ unknowns.
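A minimal sketch of that invariant, assuming a made-up three-router line topology and my own names for the counter signals (not Hodor's actual interfaces): check conservation at a router within a small tolerance (counters are sampled, not perfectly synchronized), and reuse the same equation to recover a counter that cannot be read.

```python
# Flow conservation on a hypothetical line topology A -> B -> C.
# Edge counters are bytes observed over some measurement window.
counters = {"A->B": 1_000_000, "B->C": 999_200}
dropped = {"B": 800}  # bytes dropped at router B

def conserved(router, in_edges, out_edges, tol=0.001):
    """Check sum(in) == sum(out) + dropped within a relative tolerance."""
    bytes_in = sum(counters[e] for e in in_edges)
    bytes_out = sum(counters[e] for e in out_edges)
    expected = bytes_out + dropped.get(router, 0)
    return abs(bytes_in - expected) <= tol * max(bytes_in, expected, 1)

print(conserved("B", in_edges=["A->B"], out_edges=["B->C"]))  # True

# The same equation pins down an unreadable counter: if B->C's counter
# were missing, conservation at B gives one equation in one unknown.
estimated_bc = counters["A->B"] - dropped["B"]
print(estimated_bc)  # 999200
```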
My reading of this paper coincided with the poster presentations in networking class. One of the graduate students had been working on a new project to automate constraint mining from network data. The search space for symbolic regression (SR) is massive, and automating the task (especially when results need to be sound) is difficult because of the computational complexity. This paper seemed tangentially connected to his work. The authors mention symbolic regression but disregard it because the mined results "may capture spurious relationships... that are not fundamental to the system's operation", essentially arguing that the utility of the captured results relies too heavily on the fidelity of the underlying raw data. I feel like this particular concern might be overstated, but I do recognize the difficulty of the problem. This is just to say that I feel like applying symbolic regression to hardened network signals (as in Hodor) could be quite interesting to investigate, and that I'm actively learning about the area.
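As a toy version of what I mean (entirely my own sketch, not the student's project or anything from the paper): with clean signals, even a brute-force search over small integer-coefficient linear combinations can surface conservation-style invariants, which is roughly why hardened signals seem like a nice substrate for SR.

```python
# Toy constraint mining over (assumed already-hardened) signals: search tiny
# integer-coefficient linear combinations for relations that hold on every
# sample. My own illustration, not the paper's or the poster project's method.
from itertools import product

# Hypothetical samples of (bytes_in, bytes_out, dropped) at one router.
samples = [
    (1_000_000, 999_200, 800),
    (2_500_000, 2_499_100, 900),
    (750_000, 750_000, 0),
]
names = ("bytes_in", "bytes_out", "dropped")

def holds(coeffs, tol=1e-6):
    """Does sum(c_i * signal_i) == 0 hold, relatively, on every sample?"""
    for sample in samples:
        value = sum(c * x for c, x in zip(coeffs, sample))
        scale = max(abs(x) for x in sample) or 1
        if abs(value) > tol * scale:
            return False
    return True

for coeffs in product((-1, 0, 1), repeat=len(names)):
    if any(coeffs) and holds(coeffs):
        terms = " + ".join(f"{c}*{n}" for c, n in zip(coeffs, names) if c)
        print(f"candidate invariant: {terms} = 0")
# Prints bytes_in - bytes_out - dropped = 0 (and its sign-flipped twin).
```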
The Case for Validating Inputs in Software-Defined WANs
Published: 2024
https://doi.org/10.1145/3696348.3696874