
Commit 2292ae4

Merge branch 'master' into testing-branch

Signed-off-by: Benji <[email protected]>

2 parents: 8b99b68 + dede713

File tree: 1 file changed (+31, -240 lines)


README.md

Lines changed: 31 additions & 240 deletions
@@ -1,246 +1,37 @@
-# AnalyzeMFT
+### Brief Introduction
+**AnalyzeMFT** is a Python-based tool designed for parsing and analyzing the NTFS Master File Table (MFT).
 
-AnalyzeMFT is a comprehensive Python tool for parsing and analyzing NTFS Master File Table (MFT) files. It converts binary MFT data into human-readable formats suitable for digital forensics investigations, file system analysis, and incident response activities.
-
-## Project Synopsis
+It transforms raw binary MFT data into structured, human-readable output suitable for digital forensics, incident response, and file system analysis. The tool supports a wide range of output formats and provides detailed metadata extraction, enabling investigators to examine file timestamps, attributes, and structural properties of NTFS volumes.
 
-This tool provides forensic analysts and security professionals with the ability to extract detailed file system metadata from NTFS volumes. AnalyzeMFT supports multiple output formats, configurable analysis profiles, and advanced features including hash computation, chunked processing for large files, and comprehensive timeline analysis.
-
-The project has evolved from a simple MFT parser into a full-featured forensic analysis platform with SQLite database integration, multiprocessing support, and extensive export capabilities.
+The primary purpose of AnalyzeMFT is to assist forensic analysts in reconstructing file system activity by decoding MFT records. Each record contains critical information such as file names, creation and modification times, file sizes, and directory relationships. The tool handles both active and deleted entries, allowing for comprehensive timeline analysis and artifact recovery. It includes robust error handling to manage corrupted or incomplete MFT entries, ensuring reliable processing even on damaged file systems.
 
-## Features
-
-### Core Analysis
-- Parse NTFS MFT files of any size
-- Extract file metadata, timestamps, and attributes
-- Support for all standard NTFS attribute types
-- Comprehensive record parsing with error handling
-
-### Export Formats
-- CSV (Comma Separated Values)
-- JSON (JavaScript Object Notation)
-- XML (eXtensible Markup Language)
-- SQLite database with relational schema
-- Excel spreadsheets (.xlsx)
-- Body file format (for mactime)
-- TSK timeline format
-- Log2timeline CSV format
-
-### Performance Features
-- Streaming/chunked processing for large MFT files
-- Multiprocessing support for hash computation
-- Configurable chunk sizes for memory optimization
-- Progress tracking for long-running operations
-
-### Analysis Profiles
-- **Default**: Standard analysis suitable for most use cases
-- **Quick**: Minimal processing for rapid triage
-- **Forensic**: Comprehensive analysis with all metadata
-- **Performance**: Optimized settings for large MFT files
-
-### Advanced Features
-- Hash computation (MD5, SHA256, SHA512, CRC32)
-- Configuration file support (JSON/YAML)
-- Test MFT generation for development and training
-- Extensible plugin architecture
-- Cross-platform compatibility
-
-## Requirements
-
-- Python 3.8 or higher
-- Required dependencies listed in requirements.txt
-- Optional: PyYAML for YAML configuration support
-
-## Installation
-
-### From Source
-```bash
-git clone https://github.com/rowingdude/analyzeMFT.git
-cd analyzeMFT
-pip install -e .
-```
-
-### Dependencies
-Install required dependencies:
-```bash
-pip install -r requirements.txt
-```
-
-Optional dependencies:
-```bash
-pip install PyYAML # For YAML configuration support
-```
-
-## Usage
-
-### Basic Usage
-```bash
-# Analyze MFT and export to CSV
-python analyzeMFT.py -f /path/to/MFT -o output.csv
-
-# Export to SQLite database
-python analyzeMFT.py -f /path/to/MFT -o database.db --sqlite
-
-# Use forensic analysis profile
-python analyzeMFT.py -f /path/to/MFT -o output.csv --profile forensic
-
-# Compute file hashes during analysis
-python analyzeMFT.py -f /path/to/MFT -o output.csv --hash
-```
-
-### Advanced Usage
-```bash
-# Use configuration file
-python analyzeMFT.py -f /path/to/MFT -o output.csv --config config.json
-
-# Process large files with custom chunk size
-python analyzeMFT.py -f /path/to/MFT -o output.csv --chunk-size 500
-
-# Generate test MFT for development
-python analyzeMFT.py --generate-test-mft test.mft --test-records 1000
-
-# List available analysis profiles
-python analyzeMFT.py --list-profiles
-```
-
-### Command Line Options
-```
-Usage: analyzeMFT.py -f <mft_file> -o <output_file> [options]
-
-Export Options:
-  --csv               Export as CSV (default)
-  --json              Export as JSON
-  --xml               Export as XML
-  --excel             Export as Excel
-  --body              Export as body file (for mactime)
-  --timeline          Export as TSK timeline
-  --sqlite            Export as SQLite database
-  --tsk               Export as TSK bodyfile format
-
-Performance Options:
-  --chunk-size=SIZE   Number of records per chunk (default: 1000)
-  -H, --hash          Compute hashes (MD5, SHA256, SHA512, CRC32)
-  --no-multiprocessing-hashes
-                      Disable multiprocessing for hash computation
-  --hash-processes=N  Number of hash computation processes
-
-Configuration Options:
-  -c FILE, --config=FILE
-                      Load configuration from JSON/YAML file
-  --profile=NAME      Use analysis profile (default, quick, forensic, performance)
-  --list-profiles     List available analysis profiles
-  --create-config=FILE
-                      Create sample configuration file
-
-Verbosity Options:
-  -v                  Increase output verbosity
-  -d                  Increase debug output
-```
-
-## Output Example
-
-```
-Starting MFT analysis...
-Processing MFT file: /evidence/MFT
-Using chunk size: 1000 records
-MFT file size: 83,886,080 bytes, estimated 81,920 records
-Processed 10000 records...
-Processed 20000 records...
-MFT processing complete. Total records processed: 81,920
-Writing output in csv format to analysis_results.csv
-Analysis complete.
-
-MFT Analysis Statistics:
-Total records processed: 81,920
-Active records: 45,231
-Directories: 12,847
-Files: 69,073
-Unique MD5 hashes: 31,256
-Analysis complete. Results written to analysis_results.csv
-```
-
-## Upcoming Features
-
-The following enhancements are planned for future releases:
-
-### Performance & Scalability
-- Parallel processing for record parsing
-- Progress bars with ETA calculations
-- Enhanced memory optimization
-
-### Analysis Features
-- Anomaly detection for timeline gaps
-- Suspicious file size detection
-- Parent-child directory tree mapping
-- Orphaned file detection
-- Timestamp comparison analysis
-
-### User Experience
-- Date range filtering
-- File type and size filtering
-- Interactive web interface
-- Enhanced CLI with auto-completion
-
-### Export & Integration
-- STIX/TAXII format support
-- Elasticsearch/Splunk integration
-- Neo4j graph database export
-- Custom field selection
-
-## Contributing
-
-Contributions are welcome and encouraged. To contribute:
-
-### Requirements for Contributions
-- Python 3.8+ compatibility
-- Comprehensive unit tests for new features
-- Type hints for all new code
-- Documentation for new functionality
-- Cross-platform compatibility (Windows, Linux, macOS)
-
-### Development Setup
-```bash
-git clone https://github.com/rowingdude/analyzeMFT.git
-cd analyzeMFT
-pip install -e .
-pip install -r requirements-dev.txt
-```
-
-### Testing
-```bash
-# Run all tests
-pytest tests/
-
-# Run with coverage
-pytest tests/ --cov=src --cov-report=html
-```
-
-### Code Quality
-- Follow PEP 8 style guidelines
-- Use type hints throughout
-- Maintain test coverage above 80%
-- Document all public APIs
-
-### Submitting Changes
-1. Fork the repository
-2. Create a feature branch
-3. Make your changes with tests
-4. Ensure all tests pass
-5. Submit a pull request
-
-## Version
-
-Current version: 3.1.0
-
-## Author
-
-Benjamin Cance ([email protected])
-
-## License
-
-Copyright Benjamin Cance 2024
-
-Licensed under the MIT License. See LICENSE.txt for details.
+### Outputs
+Multiple output formats are supported to integrate with common forensic workflows. Users can export results as CSV, JSON, XML, or Excel files for review and reporting. For timeline analysis, the body file format compatible with mactime and other tools is available. SQLite export creates a relational database structure for querying and long-term storage. The TSK timeline and log2timeline CSV formats allow direct ingestion into established forensic platforms.
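A sketch of the export flags this paragraph refers to, based on the option listing in the README text removed by this commit (the MFT path and output filenames are placeholders):

```bash
# Switch export format by flag; the input MFT stays the same
python analyzeMFT.py -f /path/to/MFT -o results.json --json
python analyzeMFT.py -f /path/to/MFT -o timeline.body --body    # body file for mactime
python analyzeMFT.py -f /path/to/MFT -o evidence.db --sqlite    # relational SQLite database
```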
+
+### Optimization
+Performance optimizations are built into the tool to handle large MFT files efficiently. Processing occurs in configurable chunks to manage memory usage, particularly important when analyzing MFTs that are hundreds of megabytes in size. Multiprocessing is used during hash computation to reduce processing time. Users can adjust the number of worker processes and chunk size based on system resources. Progress indicators provide real-time feedback during long-running operations.
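An illustrative tuning run; the flag names follow the removed option listing, but the specific values and the combination of flags on one command line are assumptions:

```bash
# Process a large MFT in smaller batches and cap the number of hash workers
python analyzeMFT.py -f /path/to/MFT -o output.csv --chunk-size 500 --hash --hash-processes 4
```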
+
+### Features
+The tool supports configurable analysis profiles to suit different operational needs. The default profile provides balanced processing for general use. The quick profile minimizes processing overhead for rapid triage. The forensic profile enables maximum data extraction, including all timestamp variants and extended attributes. The performance profile adjusts internal settings to prioritize speed and resource efficiency on large datasets.
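For example, the profile options described here correspond to invocations like the following, taken from the usage examples removed by this commit (paths are placeholders):

```bash
# List the built-in profiles, then run a full forensic-profile analysis
python analyzeMFT.py --list-profiles
python analyzeMFT.py -f /path/to/MFT -o output.csv --profile forensic
```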
+
+Hash computation is available for file record attributes that include data runs. MD5, SHA256, SHA512, and CRC32 hashes can be generated for resident and non-resident data. This feature supports file identification and integrity verification. Hashing runs in parallel by default, with the number of processes configurable. Users can disable multiprocessing if running in constrained environments.
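A sketch of the hashing options this paragraph describes; the flag names match the removed option listing and the path is a placeholder:

```bash
# Compute MD5, SHA256, SHA512, and CRC32 hashes during analysis
python analyzeMFT.py -f /path/to/MFT -o output.csv --hash

# Fall back to single-process hashing in constrained environments
python analyzeMFT.py -f /path/to/MFT -o output.csv --hash --no-multiprocessing-hashes
```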
+
+Configuration is managed through command-line options or external files in JSON or YAML format. A configuration file can define output settings, analysis profiles, hash options, and filtering criteria. Sample configuration files can be generated using the --create-config option. The --list-profiles option displays all available built-in profiles and their descriptions.
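An illustrative configuration workflow, assuming --create-config can be run on its own as the option listing suggests (config.json is a placeholder name):

```bash
# Write a sample configuration file, edit it, then reuse it for later runs
python analyzeMFT.py --create-config config.json
python analyzeMFT.py -f /path/to/MFT -o output.csv --config config.json
```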
+
+Input is specified using the -f option followed by the path to the MFT file. Output format is determined by the file extension or explicit export flags. The -o option sets the output destination. When exporting to SQLite, the --sqlite flag must be used. For CSV output, no additional flag is required if the output file ends in .csv.
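In practice, the input/output conventions described here look like the following, mirroring the Basic Usage examples removed by this commit (paths are placeholders):

```bash
# CSV output is inferred from the .csv extension
python analyzeMFT.py -f /path/to/MFT -o output.csv

# SQLite output requires the explicit flag
python analyzeMFT.py -f /path/to/MFT -o database.db --sqlite
```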
+
+A test MFT generator is included for development and training purposes. Using the --generate-test-mft option, users can create synthetic MFT files with a specified number of records. This feature is useful for validating tool functionality, testing parsers, or creating demonstration data.
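The generator is invoked as in the removed usage example; test.mft and the record count are placeholders:

```bash
# Create a synthetic 1000-record MFT for development and training
python analyzeMFT.py --generate-test-mft test.mft --test-records 1000
```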
+
+Command-line options include verbosity controls with -v for increased output and -d for debug-level logging. These help diagnose issues during processing. Export options allow selection of format without relying on file extensions. Performance tuning options include --chunk-size for record batch size and --hash-processes to set the number of hashing threads.
+
+The tool includes a structured help system. Running the script with --help displays all available options and their descriptions. The usage summary shows required and optional arguments. Detailed explanations are provided for each category of options, including export, performance, configuration, and debugging settings.
+
+### Development
+Future development will focus on improving processing speed through parallel parsing of MFT records. Enhanced progress reporting with estimated time to completion will be added. Memory management will be further optimized for systems with limited RAM. New analysis features will include detection of timestamp anomalies, orphaned records, and directory hierarchy reconstruction.
+
+Planned export formats include STIX/TAXII for threat intelligence sharing, and integration with Elasticsearch and Splunk for centralized log analysis. Graph database export to Neo4j will enable visualization of file system relationships. Users will be able to filter output by date range, file type, and size directly within the tool. An interactive mode may be introduced to allow step-by-step examination of records.
+
+Contributions to the project are accepted via GitHub pull requests. Developers must ensure compatibility with Python 3.8 and above. All new code must include type hints and comprehensive unit tests. The test suite is run using pytest, and coverage must remain above 80%. Code should follow PEP 8 guidelines and be cross-platform compatible. Documentation must be updated for any new features or changes.
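For reference, a development setup and test run consistent with these requirements, following the setup and testing commands removed by this commit:

```bash
git clone https://github.com/rowingdude/analyzeMFT.git
cd analyzeMFT
pip install -e .
pip install -r requirements-dev.txt

# Run the test suite with coverage (project guideline: keep coverage above 80%)
pytest tests/ --cov=src --cov-report=html
```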
 
 ## Disclaimer
 