-
Notifications
You must be signed in to change notification settings - Fork 0
Automata Learning (Advanced)
Going a bit more in-depth, the learning is done by the Learner asking 2 different types of queries to the Teacher: membership and equivalence queries. With the answers to these queries, the Learner can extract behavioral information about the SUL and refine its hypothesis automaton (the automaton which the learner believes models the SUL’s behavior).
A membership query tests whether a sequence of input symbols is accepted by the SUL and what output is generated.
An equivalence query checks whether the hypothesis automaton constructed up till now correctly represents the behavior of the SUL. If it does, the learning procedure is successfully completed and the generated model correctly describes the SUL’s behavior. If it doesn't, a counterexample will be returned and the learning process continues.
Furthermore, we often use 2 different kind of alphabets; an abstract and a concrete alphabet. In HermieLab, abstract input symbols are the (generated) ID of symbols, concrete input symbols are pairs of method invocations (one method to cause behavior and one method to observe the results of this behavoir), the concrete output alphabet is a collection of Optional and the abstract output alphabet is made out of strings of booleans.
Automata learning has been applied create behavioral models for a plentitude of systems. Below a few examples can be found:
- Web applications
- Communication Protocol entities
- Bank cards (EMV protocol)
- The new biometric European passport
- Botnets
- Embedded Control Software of Printers
There are six main challenges which one faces when using active automata learning to learn real-world systems (inspired by this paper, I added the challenge of input applicability).
Queries consisting of abstract input symbols have to be translated to queries consisting of concrete input symbols. Furthermore, an interface to apply these queries on the SUL and observe their results has to be established.
Not all concrete input symbols can be executed by the SUL at any given moment. For example, a concrete input symbol representing a button press could not be executed if that button is not visible on screen.
Small learning experiments generally require only a few hundred membership queries, however learning real-world systems often requires several orders of magnitude more. For example, in this study where they learned the embedded control software of océ printers and copiers they needed around 30.000.000 membership queries to learn the complete model. Additionally, the speed of which the experimental simulation test environments can process membership queries is much faster than that of realistic test environments, in which a query could cost as much as a second or a minute to be fully answered.
Queries have to be independent. To achieve independence SULs might require to be reset, which can be a very time expensive task, especially for server-based systems or systems interacting with databases.
The abstract input and abstract output alphabets need to be a suitable abstraction of concrete system alphabet. However, since the concrete system alphabet contains all possible actions with all possible parameters we sometimes have almost infinite alphabets.
Realizing equivalence queries through testing is typically unrealistic, in particular when learning a black-box system. In practice, equivalence queries will have to be approximated using membership queries.
Due to equivalence queries having to be approximated through membership queries, black-box active learning in practice is inherently neither correct nor complete. Without assuming extra knowledge, e.g. about the number of states of the system under learning the possibility of not having tested extensively enough will always remain and the model could have incorrect or missing states and transitions.