
Automata Learning (Advanced)

HansvdLaan edited this page Apr 12, 2018 · 45 revisions

Going a bit more in-depth

Learning Setup

In a practical learning setup, the Learner does not interact with the SUL directly. Instead, a Mapper sits between them: the Learner formulates queries over a small abstract input alphabet, and the Mapper translates each abstract input symbol into a concrete input for the SUL and maps the SUL's concrete outputs back to abstract output symbols.
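A minimal sketch of such a mapper is shown below. All names here (the symbols `LOGIN`/`LOGOUT`, the dictionary-shaped concrete inputs and outputs) are illustrative assumptions, not the API of any particular learning library:

```python
class Mapper:
    """Translates abstract input symbols to concrete SUL inputs and back.

    Illustrative sketch only: the symbol names and the shape of the
    concrete inputs/outputs are assumptions for this example.
    """

    def __init__(self):
        # One abstract symbol may stand for a whole class of concrete inputs.
        self.input_map = {
            "LOGIN": ("login", {"user": "alice", "password": "secret"}),
            "LOGOUT": ("logout", {}),
        }

    def to_concrete(self, abstract_symbol):
        # Abstract input symbol -> concrete action plus parameters.
        return self.input_map[abstract_symbol]

    def to_abstract(self, concrete_output):
        # Collapse detailed concrete outputs into a few abstract output symbols.
        return "OK" if concrete_output.get("status") == 200 else "ERROR"
```

The abstraction runs in both directions: the Learner only ever sees `LOGIN`/`OK`-style symbols, while the SUL only ever sees concrete actions.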

Furthermore, the learning is driven by the Learner asking two different types of queries to the Teacher: membership queries and equivalence queries. From the answers to these queries, the Learner extracts behavioral information about the SUL and refines its hypothesis automaton (the automaton which the Learner believes models the SUL's behavior).

A membership query tests whether a sequence of input symbols is accepted by the SUL and what output is generated.

An equivalence query checks whether the hypothesis automaton constructed up till now correctly represents the behavior of the SUL. If it does, the learning procedure is successfully completed and the generated model correctly describes the SUL’s behavior. If it doesn't, a counterexample will be returned and the learning process continues.
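The two query types can be sketched with a toy Teacher for a known target language. Everything here is an assumption for illustration: the target language (strings over {'a','b'} with an even number of 'a's) stands in for a real SUL, and the equivalence query is realized by exhaustive testing up to a small length bound, which is only feasible for toy examples:

```python
from itertools import product


class Teacher:
    """Toy teacher whose 'SUL' accepts strings with an even number of 'a's.

    Illustrative sketch only: a real teacher would wrap the actual SUL.
    """

    def membership(self, word):
        # Membership query: is this word accepted by the SUL?
        return word.count("a") % 2 == 0

    def equivalence(self, hypothesis, max_len=5):
        # Equivalence query, approximated by testing all words up to max_len.
        # Returns a counterexample word, or None if none is found.
        for n in range(max_len + 1):
            for word in map("".join, product("ab", repeat=n)):
                if hypothesis(word) != self.membership(word):
                    return word
        return None


teacher = Teacher()
print(teacher.membership("aab"))            # True: two 'a's is even
print(teacher.equivalence(lambda w: True))  # 'a': the accept-everything hypothesis is wrong
```

A returned counterexample (here the word `a`) is exactly what the Learner uses to refine its hypothesis in the next round.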

Examples of automata learning in practice

Automata learning has been applied to create behavioral models for a wide variety of systems. A few examples are listed below; more can be found in [X].

  • Web applications [X]
  • Communication Protocol entities [X] (TSP)
  • CTI systems [X]
  • The new biometric European passport [X]
  • Botnets [X]
  • Embedded Control Software [X]

Challenges in practical applications

There are six main challenges which one faces when using active automata learning to learn real-world systems (partly inspired by [10] [21]).

Interacting with real systems:

Queries consisting of abstract input symbols have to be translated to queries consisting of concrete input symbols. Furthermore, an interface to apply these queries on the SUL and observe their results has to be established.

Input applicability:

Not every input can be meaningfully applied in every state of the SUL; the learning setup has to deal with inputs that the system rejects or ignores in its current state.

Quantity of required membership queries:

Small learning experiments generally require only a few hundred membership queries, but learning real-world systems often requires several orders of magnitude more [10]. For example, learning a piece of embedded control software of printers and copiers cost around 30,000,000 membership queries. Moreover, experimental simulation environments can process membership queries much faster than realistic test environments, in which answering a single query can take a second or even a minute.

Independence of queries:

Queries have to be independent of each other. To achieve this, the SUL may need to be reset before each query, which can be very time-consuming, especially for server-based systems or systems interacting with databases.
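The role of the reset can be sketched as follows. The `ResettableSUL` below is a deliberately trivial stateful system (it just counts `inc` inputs); its names and interface are assumptions for this example, not any library's API:

```python
class ResettableSUL:
    """Toy stateful SUL that counts 'inc' inputs. Illustrative names only."""

    def __init__(self):
        self.counter = 0

    def reset(self):
        # In practice this step can be very expensive,
        # e.g. restoring a database snapshot or restarting a server.
        self.counter = 0

    def step(self, symbol):
        if symbol == "inc":
            self.counter += 1
        return self.counter


def membership_query(sul, word):
    # Reset before every query so that earlier queries
    # cannot influence the outcome of this one.
    sul.reset()
    output = None
    for symbol in word:
        output = sul.step(symbol)
    return output


sul = ResettableSUL()
print(membership_query(sul, ["inc", "inc"]))  # 2
print(membership_query(sul, ["inc"]))         # 1, not 3: the reset made the query independent
```

Without the `reset()` call, the second query would observe state left behind by the first, and the Learner's observations would no longer describe the SUL's behavior from its initial state.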

Parameter and value domains:

The abstract input and abstract output alphabets need to be a suitable abstraction of the concrete system alphabet. However, the concrete system alphabet consists of all possible actions combined with all possible parameter values, which in general yields an infinite (or impractically large) alphabet. Choosing a finite abstraction that still captures the relevant behavior is therefore a key challenge.

Realizing equivalence queries:

Realizing equivalence queries through testing is typically unrealistic, in particular when learning a black-box system. In practice, equivalence queries will have to be approximated using membership queries.

Because equivalence queries have to be approximated through membership queries, black-box active learning is in practice inherently neither correct nor complete. Without extra assumptions, e.g. an upper bound on the number of states of the system under learning, the possibility of not having tested extensively enough always remains, and the model may contain incorrect or missing states and transitions.
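One simple way to approximate an equivalence query is random testing: sample random input words, ask a membership query for each, and compare against the hypothesis. The sketch below makes the incompleteness concrete: when it returns `None`, the hypothesis has merely survived the tests, not been proven correct. The target language and all parameter values are assumptions for this example; real tools use more systematic strategies such as the W-method:

```python
import random


def approximate_equivalence(sul_membership, hypothesis, alphabet,
                            num_tests=1000, max_len=10, seed=0):
    """Approximate an equivalence query with random membership queries.

    Sketch only. Returns a counterexample word, or None if all sampled
    words agree, which does NOT prove the hypothesis correct.
    """
    rng = random.Random(seed)
    for _ in range(num_tests):
        length = rng.randint(0, max_len)
        word = "".join(rng.choice(alphabet) for _ in range(length))
        if hypothesis(word) != sul_membership(word):
            return word  # counterexample found: the Learner must refine
    return None  # no counterexample found; the hypothesis may still be wrong


# Toy SUL: accepts words over {'a','b'} with an even number of 'a's.
def membership(word):
    return word.count("a") % 2 == 0


cex = approximate_equivalence(membership, lambda w: True, "ab")
print(cex is not None)  # True: the accept-everything hypothesis fails quickly
```

Increasing `num_tests` and `max_len` raises confidence in a `None` answer but can never eliminate the residual uncertainty described above.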
