Optimization of Security-Oriented State Machine Inference using Low-Level Instrumentation Techniques and Checkpointing

Télécom SudParis

Theme Digital Trust

Active Automata Learning

State Machine Inference

Network Protocols

Syscalls

Checkpointing

Practical information

Thesis supervisor

Olivier Levillain

Supervisors

Olivier Levillain (Télécom SudParis/RST)

Thesis supervisory team

SAMOVAR/SCN

More information

Description

Most network stacks, e.g. TCP, TLS, or WPA, rely on an internal state machine, which can either be explicit (with the presence of state variables) or implicit (the actual state may be encoded in the call stack). Sometimes, this state machine deviates from the implemented specification and exhibits interesting bugs, including security flaws such as authentication bypass. In recent years, state machine issues have been identified in many high-profile network [1, 2, 3]. 

One known limitation of this approach, however, is that the inference can be a very long process. Indeed, the learning process consists in sending sequences of messages to the target (usually called the System Under Learning, or SUL), including malformed or inconsistent ones, and wait for the SUL’s answer to characterize its behavior. Since the response can consist of an arbitrary number of messages, we must rely on timeouts and hope we retrieved the complete answer. 

During Aina Rasoamanana’s thesis [4], we explored optimizations leveraging the deterministic aspect of the L* algorithm we rely on in our experiments. Despite promissing results (up to 25-fold speedups), we identified several bottlenecks, also discussed in the litterature [5]: the SUL initialization time, the time spent in context switches and the time wasted in the wait for timeouts. 

To overcome these limitations, parallelization and chepointing have been discussed [6], but the results regarding checkpointing were disappointing. We believe another approach is possible, and could lead to drastic improvements. We propose to rely on low-level instrumentation techniques using syscall capture; a similar idea has been proposed and implemented to improve stateful fuzzing tools [5]; this thesis consists in pushing the idea further in a more complex context. 

To capture syscall, we propose to preload a shared library overriding the interface provided by the C standard library, which would allow our inference tool to monitor closely what the SUL is currently doing. This would thus allow us to skip the timeouts entirely in a reliable way. The next step would be to embed the inference tool in the shared library to get rid of most context switches. Finally, this setup would be ideal to implement a new checkpointing approach, completely removing all the initialization steps. 

An inproved inference process could be fast enough to be included in Continuous Integration pipelines. Another application would be to use a wider set of input messages to explore new territories and produce more complex analyses. There are lots of interesting challenges to tackle to make this work in the general setting (e.g. multi-process and multi-thread programs, unconventional syscall sequences).

Bibliography

[1] Benjamin Beurdouche, Karthikeyan Bhargavan, Antoine Delignat-Lavaud, Cédric Fournet, Markulf Kohlweiss, Alfredo Pironti, Pierre-Yves Strub, Jean Karim Zinzindohoue. A Messy State of the Union: Taming the Composite State Machines of TLS. IEEE Symposium on Security and Privacy. 2015. 
[2] Mathy Vanhoef and Frank Piessens. Key Reinstallation Attacks: Forcing Nonce Reuse in WPA2. In Proceedings of the 24th ACM Conference on Computer and Communications Security (CCS). ACM. 2017. 
[3] Aina Rasoamanana, Olivier Levillain, Hervé Debar. Towards a Systematic and Automatic Use of State Machine Inference to Uncover Security Flaws and Fingerprint TLS Stacks. In ESORICS. 2022. 
[4] Aina Rasoamanana. Derivation and Analysis of Cryptographic Protocol Implementations. PhD Thesis. École Doctorale de l’Institut Polytechnique de Paris. 2023. 
[5] Seyed Behnam Andarzian, Cristian Daniele, Erik Poll. Green-Fuzz: Efficient Fuzzing for Network Protocol Implementations. Foundations and Practice of Security - 16th International Symposium. 2023. 
[6] M. Henrix. Performance improvement in automata learning. Master Thesis, Radboud University Nijmegen. 2015. 
[7] Arthur Tran Van. Improving Cryptographic Protocol Implementations using Grammatical Inference. PhD Thesis. École Doctorale de l’Institut Polytechnique de Paris. (to be defended in) 2025.