GMADC: Graph-based Modularization Algorithm for Disconnected Call Graphs

A comprehensive graph-based algorithm for software modularization using textual similarity in systems with disconnected call graphs.

📖 Overview

GMADC (Graph-based Modularization Algorithm for Disconnected Call Graphs) is a novel algorithm that addresses the critical limitation of traditional graph-based modularization approaches when dealing with disconnected call graphs. By integrating textual similarity analysis with structural dependency information, GMADC significantly improves modularization quality while maintaining computational efficiency.

Key Features

Hybrid Approach: Combines structural dependency analysis with textual similarity metrics
Multiple Text Processing Algorithms: Implements Levenshtein distance and Diff/LCS for comprehensive similarity assessment
Disconnection Detection: Automatically identifies entities with minimal structural dependencies
Optimized Performance: Efficient implementation with caching and parallel processing
Extensible Architecture: Modular design for easy customization and extension

🚀 Quick Start

Installation

# Clone the repository
git clone https://github.com/mablue/GMADC.git
cd GMADC

# Install dependencies
pip install -r requirements.txt

Basic Usage

from GMADC import GMADC
from callgraph import build_dependency_graph

# Build dependency graph from source code
graph = build_dependency_graph("path/to/source/code")

# Initialize GMADC
gmadc = GMADC()

# Perform modularization
modules = gmadc.modularize(graph)

# Print results
for i, module in enumerate(modules):
    print(f"Module {i}: {len(module)} entities")

📊 Results

GMADC demonstrates significant improvements over traditional approaches:

Algorithm	Mozilla Firefox	Apache HTTP Server	Linux Kernel	PostgreSQL
Bunch	0.67	0.62	0.59	0.65
DAGC	0.71	0.68	0.63	0.69
GMA	0.75	0.72	0.68	0.73
GMADC	0.89	0.86	0.83	0.88

Table: Modularization Quality Comparison (MoJoFM Scores)

🏗️ System Architecture

GMADC/
├── GMADC.py              # Main modularization controller
├── GMADCC.py             # Textual similarity computation module
├── callgraph.py          # Dependency graph visualization
├── similarity.py         # Structural similarity calculations
├── text_processor.py     # Text preprocessing and analysis
└── utils/                # Utility functions and helpers

Core Components

Structural Analysis: Traditional graph-based similarity computation using shortest path calculations
Disconnection Detection: Identification of entities with uniform minimal similarity
Textual Similarity Assessment: Vocabulary-based analysis using three complementary pillars:
- Class/File Name Similarity (Levenshtein distance)
- Code Description Similarity (Diff/LCS algorithm)
- Function/Variable Name Similarity (Composite identifier analysis)

📈 Performance

Execution Time Comparison

Algorithm	Mozilla Firefox	Apache HTTP Server	Linux Kernel	PostgreSQL
Bunch	45.2s	23.1s	38.7s	28.9s
GMA	12.3s	8.7s	15.2s	10.1s
GMADC	18.9s	13.5s	22.8s	15.7s

Scalability

GMADC shows near-linear scalability with system size, maintaining approximately 1.5-2× overhead compared to pure structural approaches due to text processing.

🛠️ Usage Examples

Command Line Interface

# Basic modularization
python GMADC.py --input path/to/source --output results/

# Custom parameters
python GMADC.py --input path/to/source --output results/ --weights 0.3 0.4 0.3 --threshold 0.02

# Generate visualization
python GMADC.py --input path/to/source --visualize --format png

Advanced Configuration

from GMADC import GMADC

config = {
    'similarity_weights': [0.3, 0.4, 0.3],  # Name, Description, Identifiers
    'disconnection_threshold': 0.02,
    'min_module_size': 3,
    'max_modules': 20,
    'parallel_processing': True
}

gmadc = GMADC(config=config)
modules = gmadc.modularize(graph)

📚 Datasets

The evaluation includes four open-source systems:

System	Files	KLOC	Disconnected Entities	Domain
Mozilla Firefox	881	245	23	Web Browser
Apache HTTP Server	452	128	15	Web Server
Linux Kernel (subsystem)	623	189	31	Operating System
PostgreSQL	387	156	12	Database

📋 Requirements

Python Dependencies

Python 3.8+
NetworkX >= 2.6
NumPy >= 1.21
NLTK >= 3.6
Matplotlib >= 3.5 (for visualization)
Scikit-learn >= 1.0 (for additional metrics)

Installation

pip install -r requirements.txt

🔧 Configuration

GMADC can be configured through a JSON configuration file:

{
    "similarity": {
        "weights": [0.3, 0.4, 0.3],
        "threshold": 0.02,
        "use_caching": true
    },
    "clustering": {
        "min_size": 3,
        "max_clusters": 20,
        "quality_metric": "mojofm"
    },
    "performance": {
        "parallel_processing": true,
        "max_workers": 4
    }
}

📝 Citation

If you use GMADC in your research, please cite our paper:

@article{azizi2024gmadc,
  title={GMADC: A Comprehensive Graph-Based Algorithm for Software Modularization Using Textual Similarity in Systems with Disconnected Call Graphs},
  author={Azizi, Masoud and Izadkhah, Habib and Isazadeh, Ayaz},
  journal={IEEE Transactions on Software Engineering},
  year={2024},
  publisher={IEEE}
}

🤝 Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add some amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

The contributors to the open-source systems used in our evaluation
The anonymous reviewers for their valuable feedback
The research community for their insights and suggestions

📞 Contact

For questions and support, please contact:

Masoud Azizi: [email protected]
Project Repository: https://github.com/mablue/GMADC

📚 Related Publications

Pourasghar, B., et al. "A graph-based algorithm for software systems modularization by considering the depth of relationships." Journal of Systems and Software (2020)
Mitchell, B. S., and Mancoridis, S. "On the automatic modularization of software systems using the bunch tool." IEEE Transactions on Software Engineering (2006)
Maqbool, O., and Babri, H. "Hierarchical clustering for software architecture recovery." IEEE Transactions on Software Engineering (2007)

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
CaseStudies		CaseStudies
Undrestand		Undrestand
mojo		mojo
result/g.csv		result/g.csv
xlsResult		xlsResult
.gitignore		.gitignore
GMADC.py		GMADC.py
GMADCC.py		GMADCC.py
LICENSE		LICENSE
README.md		README.md
SourceCodes.zip		SourceCodes.zip
callgraph.py		callgraph.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

GMADC: Graph-based Modularization Algorithm for Disconnected Call Graphs

📖 Overview

Key Features

🚀 Quick Start

Installation

Basic Usage

📊 Results

🏗️ System Architecture

Core Components

📈 Performance

Execution Time Comparison

Scalability

🛠️ Usage Examples

Command Line Interface

Advanced Configuration

📚 Datasets

📋 Requirements

Python Dependencies

Installation

🔧 Configuration

📝 Citation

🤝 Contributing

📄 License

🙏 Acknowledgments

📞 Contact

📚 Related Publications

About

Uh oh!

Releases

Packages

Languages

License

mablue/GMADC

Folders and files

Latest commit

History

Repository files navigation

GMADC: Graph-based Modularization Algorithm for Disconnected Call Graphs

📖 Overview

Key Features

🚀 Quick Start

Installation

Basic Usage

📊 Results

🏗️ System Architecture

Core Components

📈 Performance

Execution Time Comparison

Scalability

🛠️ Usage Examples

Command Line Interface

Advanced Configuration

📚 Datasets

📋 Requirements

Python Dependencies

Installation

🔧 Configuration

📝 Citation

🤝 Contributing

📄 License

🙏 Acknowledgments

📞 Contact

📚 Related Publications

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages