This repository contains the source code of the paper ``One Automaton To Rule Them All: Beyond Multiple Regular Expressions Execution'' where we deal with the problem of efficient Regular Expressions (REs) matching. The key idea of this work is to exploit REs morphological similarities within the same dataset to enable memory reduction when storing the patterns and improving the dataset-matching throughput. This idea is expressed through:
- the Multi-RE Finite State Automata (MFSA) model that extends the Finite State Automata (FSA) model to improve REs parallelization by leveraging similarities within a specific application ruleset
- a multi-level compilation framework to manage REs merging and optimization to produce MFSA(s) automatically
- an execution engine called iMFAnt, which is an extension of the state-of-the-art iNFAnt algorithm
- Repo Overview
- CGO'24 AE Instructions
- Artifacts Setup
- Running the Experiments
- Charts Visualization
- Customization Options
- Credits and Contributors
- benchmark_scriptsand- benchmark_resultscontain the scripts to evaluate datasets similarity and the numericalresult of such evaluation, respectively (Fig. 1).
- datasetscontains the RE datasets
- frameworkcontains the compilation framework
- input_streamscontains the data streams to match
- matchingcontains the iMFAnt engine
- mfsacontains the automata in ANML format, resulting from the compilation framework
- output_matchescontains the numeric results of MFSAs analysis (time, #matches)
- plotscontains the plots resulting from the execution of the scripts in- scriptsfolder
- plotting scriptscontains the scripts for the plotting of the numerical results
- raw_resultscontains the numerical results corresponding to the plots
- scriptscontains the scripts to plot the compression, compilation time, execution time and throughput results (Figs. 7, 8, 9, 10)
To customize the experiments go to the customization section
Run the scripts in iMFAnt/scripts folder to reproduce the experimental results reported in the paper.
These include the automatic generation of the plots with the default merging factors (0, 1, 2, 5, 10, 20, 50, 100)
The following instructions are to reproduce the CGO'24 results. The fast way is to go to the artifacts setup
An x86 CPU is mandatory for the Docker setup, other CPUs are usable in general but not tested.
Clone the repository:
git clone https://github.com/necst/iMFAnt.git
and navigate into the repo folder
cd iMFAnt
Please follow one of the two setups: 1) native Ubuntu-based machine; 2) any OS with Docker and then to the Running the Experiments.
To execute CGO'24 AE are required
- make cmake
- flex
- bison
- g++
- openmp(included in- g++installation)
- python3.10with- numpy pandas matplotlib
A possible command to install everything you need:
sudo apt-get install -y locales curl wget tar sudo git apt-utils nano
sudo apt-get install -y g++ python3 python3-pip automake make cmake flex bison
sudo pip3 install numpy pandas matplotlib
navigate into the scripts folder
cd scripts/
It is supposed you have installed and configured Docker according to the docker instructions. An x86 CPU is mandatory.
Open a terminal and then run the script
run_docker.sh
that will build and run the docker container.
navigate into the scripts folder
cd iMFAnt/scripts/
All the experiments take a shorter time than expected if you open them and reduce the number of repetitions, changing the value of -r flag in the execution of python3 merging.py -r XXX.
More info on experiment customization here
Each experiment will execute and collect the corresponding results and concur to the paper figure.
To reproduce Fig. 7 (ETA: 5 min)
./compression.sh
To visualize these results go here
To reproduce Fig. 8 (ETA: 30 min with 30 reps)
./compilation_time.sh
To visualize these results go here
To reproduce Figs. 9 and 10 (ETA: 12 h with 15 reps)
./iMFAnt_performance.sh
Figure 9 and 10 are expected to display different optimal merging factor according to testing machine characteristics. Anyhow, the MFSAs are expected to improve the simple multi-thread scaling according to the trends in the paper.
To visualize these results go here
The scripts run automatically the desired experiments and you will find the corresponding PDF plots in  iMFAnt/plots folder.
If you follow the Docker setup, exit the container and you will find there the results.
Then open the PDFs as you prefer.
- To merge a new dataset save the RE file and the input stream file in datasetandinput_streamsfolders, directly. Add the corresponding directory intoframework/compiler/merging.pyfile, ininputRElist and run it to perform the merging.
- To pattern-match a new dataset, add the corresponding mfsaandinput_streamsdirectories toimfant.pyfile and run it, ensuring that the merging factors are consistent with the ones employed during the merging.
- 
The following options are available for the compilation of REs in framework/re2automata/re2automata.cpp:Variable Meaning STATES if 1 eval. states compression, else eval. trans. compression ANML if 1 output ANML file, else output DOT file STACK_TIME if 1 eval. compilation time V if 1 enable verbose mode These parameters are active by default. 
- 
Enter the compilation framework and compile the compiler: cd framework/compilerand build the compiler, entering the following:
cd framework/compiler
bison -d compiler.yy -v
flex -o compiler.yy.cc compiler.lex
- At compile-time, set the desired values of STATESandSTACK_TIMEaccording to the values in the table:
g++ -o compiler compiler.yy.cc compiler.tab.cc ast.cpp ../re2automata/re2automata.cpp -DSTATES=1 -DSTACK_TIME=0 -w -std=c++11 -g -O3
- Run
python merging.py -r REPS -b [M1 ... Mn]
to generate the automata for the benchmark datasets. -r indicates the number of repetitions to evaluate the compilation time, and -b indicates the list of merging factors to reproduce the results in the paper. In practice, any value for M is valid. To merge the entire dataset, set M=0. Adjust the merging factors consistently in matching/imfant.py to run the pattern matching engine.
- The generated automata are in mfsa
- The compression results are in raw_results
- Compile the matching algorithm:
cd ../../matching/ && make clean all
- To match a single folder of MFSAs run:
export OMP_NUM_THREADS=N; make && ./multithreaded_imfant STREAM_IN + MFSA_DIR NUM OUTPUT_FILE
where N is the number of threads, STREAM_IN is the input stream to be matched, MFSA_DIR is the directory of the folder containing the MFSAs to match, NUM is the number of MFSAs to match in that directory, and OUTPUT_FILE contains the results of the matching (matching time, #matches).
Contributors: Luisa Cicolini, Filippo Carloni, Marco Domenico Santambrogio, Conficconi Davide
If you find this repository useful, please use the following citation(s):
@inproceedings{cicolini2024one,
  title={One Automaton to Rule Them All: Beyond Multiple Regular Expressions Execution},
  author={Cicolini, Luisa and Carloni, Filippo and Santambrogio, Marco D and Conficconi, Davide},
  booktitle={2024 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)},
  pages={193--206},
  year={2024},
  organization={IEEE}
}