⚠️ WARNING - USE AT YOUR OWN RISK ⚠️

This tool moves, copies, synchronizes, and can DELETE files across your system.
- Always keep current backups
- Start in test mode and review logs before using real data
- Review your `organizer_config.json` carefully
- Folder sync and deduplication can delete real files when misconfigured
- No warranty is provided - see LICENSE
File Organizer has three main capabilities:
- Organization: Scans your files and creates a tree of soft links under `output_base` (e.g. `~/organized/`), grouped by type, year, and discovered content categories. Original files stay where they are.
- Folder Sync: Bidirectionally synchronizes configured folder pairs (e.g. main drive ↔ external backup drive).
- Duplicate Removal (Deduplication): Optionally finds and removes duplicate real files under configured `source_folders`.
There are two primary ways to use it:
- Desktop app (`./manage_organizer.sh gui`) – recommended; wraps all modes with a GUI.
- Command line (`python file_organizer.py ...`) – for advanced/terminal-heavy use.
- How to run (CLI, test mode): `python file_organizer.py --scan-once`
- What it does:
  - Uses a local `test/` directory and auto-created sample files.
  - Runs one full organization cycle, then exits.
- Files affected:
  - Only under `test/` (created by `--create-test` or lazily on first run).
  - Real files outside `test/` are never touched.
- Output:
  - Soft-link tree under `test/organized/`.
- How to run (single pass): `python file_organizer.py --REAL --scan-once`
- How to run (daemon / continuous): `python file_organizer.py --REAL` (the desktop app can also start/stop this)
- What it does in one full cycle (`run_full_cycle`):
  - Scans and organizes files (creates/updates soft links under `output_base`).
  - Optionally synchronizes configured `sync_pairs` (bidirectional folder sync).
  - Optionally runs deduplication (if enabled and configured).
  - Optionally queues background backups (if enabled).
  (A sketch of how the config flags gate these optional phases appears after this list.)
- Files affected:
  - Real files in the directories referenced by `source_folders`, `sync_pairs`, and `backup_directories`.
  - Soft links under `output_base` (e.g. `~/organized/`).
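As a rough illustration of how the config flags gate these optional phases, here is a tiny, hypothetical sketch that only reads `organizer_config.json` and reports what a full cycle would include. The exact gating conditions (and the backup key used) are assumptions based on the config keys documented below, not the tool's actual logic.

```python
# Hypothetical sketch: report which full-cycle phases the current config enables.
# It performs no file operations; the gating conditions are assumptions.
import json

with open("organizer_config.json") as f:
    cfg = json.load(f)

phases = [
    ("organization", True),  # always part of a full cycle
    ("folder sync", bool(cfg.get("enable_folder_sync")) and bool(cfg.get("sync_pairs"))),
    ("deduplication", bool(cfg.get("enable_duplicate_detection")) and bool(cfg.get("source_folders"))),
    ("background backups", bool(cfg.get("backup_directories"))),  # assumed key
]
for name, enabled in phases:
    print(f"{name}: {'enabled' if enabled else 'skipped'}")
```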
- How to run: `python file_organizer.py --REAL --sync-only`
- What it does:
  - Skips organization and dedup.
  - Runs only folder synchronization for all `sync_pairs`.
- Files affected:
  - Real files in the folders referenced by `sync_pairs`.
- How to run: `python file_organizer.py --REAL --dedupe-only`
- What it does (current implementation):
  - Looks at the `source_folders` list in `organizer_config.json`.
  - Recursively scans those directories for real files.
  - Groups files by content hash (MD5) and removes duplicates, keeping only the newest copy in each group.
- Important:
  - This operates on original files, not on `~/organized` soft links.
  - It does not currently clean up duplicate soft links; it deletes duplicate real files under `source_folders`.

If you want dedup to do nothing, either:
- Leave `source_folders` unset/empty, or
- Set `"enable_duplicate_detection": false`.
The README text in earlier versions said dedupe only touched ~/organized soft links; that is no longer accurate for this implementation.
In organizer_config.json you can define drive shortcuts:
"drives": {
"MAIN_DRIVE": "/Users/yourname",
"EXTERNAL_DRIVE": "/Volumes/YourExternalDrive",
"PROTON_DRIVE": "MAIN_DRIVE/ProtonDrive",
"GOOGLE_DRIVE": "MAIN_DRIVE/GoogleDrive/MyFiles"
}- Placeholders like
MAIN_DRIVE,GOOGLE_DRIVE, etc. are resolved at startup. - Nested references are supported, e.g.
PROTON_DRIVEreferencingMAIN_DRIVE. - After resolution, the program rewrites:
"MAIN_DRIVE/dev"→"/Users/yourname/dev""GOOGLE_DRIVE/Documents"→"/Users/yourname/GoogleDrive/MyFiles/Documents"
- After this phase the code should never see the literal string
MAIN_DRIVEagain – only real paths.
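As a rough illustration of that resolution step, here is a small, hypothetical sketch of how nested placeholders could be expanded into absolute paths. The function names and the substitution strategy are assumptions for illustration, not the tool's actual code.

```python
# Hypothetical sketch of drive-placeholder resolution (not the tool's actual code).
def resolve_drives(drives):
    """Expand nested references like PROTON_DRIVE -> MAIN_DRIVE/ProtonDrive.

    Assumes there are no circular references between placeholders.
    """
    resolved = dict(drives)
    changed = True
    while changed:
        changed = False
        for name, path in resolved.items():
            head, _, rest = path.partition("/")
            if not path.startswith("/") and head in resolved:
                resolved[name] = resolved[head] + ("/" + rest if rest else "")
                changed = True
    return resolved

def expand_path(path, drives):
    """Rewrite a config path like 'MAIN_DRIVE/dev' into an absolute path."""
    head, _, rest = path.partition("/")
    if head in drives:
        return drives[head] + ("/" + rest if rest else "")
    return path

drives = resolve_drives({
    "MAIN_DRIVE": "/Users/yourname",
    "PROTON_DRIVE": "MAIN_DRIVE/ProtonDrive",
})
print(expand_path("MAIN_DRIVE/dev", drives))          # /Users/yourname/dev
print(expand_path("PROTON_DRIVE/Documents", drives))  # /Users/yourname/ProtonDrive/Documents
```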
sync_pairs describe which folders should be synchronized. The current recommended format uses a folders array of two paths (order doesn’t matter – sync is bidirectional):
"sync_pairs": [
{
"comment": "Example: main dev ↔ external dev",
"folders": ["MAIN_DRIVE/dev", "EXTERNAL_DRIVE/dev"]
},
{
"comment": "Example: documents ↔ Google Drive documents",
"folders": ["MAIN_DRIVE/Documents", "GOOGLE_DRIVE/Documents"]
}
]- Each
foldersentry becomes two absolute paths after drive resolution. - Sync logic (bidirectional):
- If file exists only in A → copy A → B.
- If file exists only in B → copy B → A.
- If file exists in both and B is newer → copy B → A.
- Otherwise → copy A → B.
- Exclusions:
exclude_patternsare respected (e.g..git,node_modules,.tmp*).
There is also an older format (`source`/`target`) that is still supported for backward compatibility, but new configs should prefer `folders`.
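Here is a small, hypothetical sketch of those copy rules for a single pair. The helper names, the sample exclusion patterns, and the use of plain `shutil.copy2` are assumptions for illustration; the real tool may also skip files whose copies are already identical.

```python
# Hypothetical sketch of one bidirectional sync pass (not the tool's actual code).
import fnmatch
import os
import shutil

EXCLUDE_PATTERNS = [".git", "node_modules", ".tmp*"]  # sample exclude_patterns

def is_excluded(name):
    return any(fnmatch.fnmatch(name, pat) for pat in EXCLUDE_PATTERNS)

def sync_pair(folder_a, folder_b):
    """Apply the documented copy rules to every relative path seen in either folder."""
    def relative_files(root):
        seen = set()
        for dirpath, dirnames, filenames in os.walk(root):
            # Prune excluded directories in place so os.walk skips them.
            dirnames[:] = [d for d in dirnames if not is_excluded(d)]
            for name in filenames:
                if not is_excluded(name):
                    seen.add(os.path.relpath(os.path.join(dirpath, name), root))
        return seen

    for rel in relative_files(folder_a) | relative_files(folder_b):
        a, b = os.path.join(folder_a, rel), os.path.join(folder_b, rel)
        if os.path.exists(a) and not os.path.exists(b):
            src, dst = a, b          # only in A -> copy A -> B
        elif os.path.exists(b) and not os.path.exists(a):
            src, dst = b, a          # only in B -> copy B -> A
        elif os.path.getmtime(b) > os.path.getmtime(a):
            src, dst = b, a          # both exist, B newer -> copy B -> A
        else:
            src, dst = a, b          # otherwise -> copy A -> B
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy2(src, dst)       # copy2 preserves timestamps
```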
source_folders are only used for deduplication in this implementation:
"source_folders": [
"MAIN_DRIVE/Documents",
"MAIN_DRIVE/Pictures"
]- These are resolved via the
drivessection the same way assync_pairs. - Dedup scans all real files under these directories (recursively, across drives if placeholders point there).
- For each group of identical files (by content hash):
- It keeps the newest file (by modification time).
- It deletes the older copies.
If source_folders is empty or missing, dedup logs:
No source_folders configured - skipping duplicate detection
…and does nothing else.
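As a rough illustration of that behavior, here is a small, hypothetical sketch (not the tool's actual code): it walks the configured roots, groups files by MD5 content hash, keeps the newest file in each duplicate group, and only prints what it would delete, so it is safe to experiment with. The example folder path is made up.

```python
# Hypothetical sketch of duplicate detection (not the tool's actual code).
# Groups real files by MD5 content hash; keeps only the newest copy per group.
import hashlib
import os
from collections import defaultdict

def md5_of(path):
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_duplicates(source_folders):
    groups = defaultdict(list)
    for root in source_folders:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if os.path.isfile(path) and not os.path.islink(path):  # real files only
                    groups[md5_of(path)].append(path)
    return {h: paths for h, paths in groups.items() if len(paths) > 1}

if __name__ == "__main__":
    for _digest, paths in find_duplicates(["/tmp/example"]).items():  # hypothetical root
        newest = max(paths, key=os.path.getmtime)
        for path in paths:
            if path != newest:
                # The real tool deletes here; this sketch only reports.
                print(f"would delete duplicate: {path} (keeping {newest})")
```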
The organizer writes soft links into output_base (default ~/organized in production, test/organized in test mode):
- Links are grouped by type (`documents/`, `images/`, etc.), year (`2024/`, `2025/`), and discovered content categories (`python/`, `budget/`, etc.).
- Each entry is a symlink pointing back to the original file.
- The organizer never moves your original files as part of the organization step; it only creates/removes symlinks.
Important distinction:
- Organization step → manipulates soft links under `output_base`.
- Sync and dedup → operate on real files under `sync_pairs` and `source_folders`.
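To make the link layout concrete, here is a tiny, hypothetical sketch of how a single file might be linked into the organized tree by type and year. The extension-to-type mapping and directory layout are assumptions for illustration; the real organizer also discovers content categories, which is not shown here.

```python
# Hypothetical sketch: link one file into output_base by coarse type and year.
# The TYPE_DIRS mapping and layout are illustrative assumptions, not the tool's rules.
import datetime
import os

TYPE_DIRS = {".pdf": "documents", ".txt": "documents", ".jpg": "images", ".png": "images"}

def link_into_tree(original, output_base):
    ext = os.path.splitext(original)[1].lower()
    type_dir = TYPE_DIRS.get(ext, "other")
    year = datetime.datetime.fromtimestamp(os.path.getmtime(original)).strftime("%Y")
    target_dir = os.path.join(os.path.expanduser(output_base), type_dir, year)
    os.makedirs(target_dir, exist_ok=True)
    link_path = os.path.join(target_dir, os.path.basename(original))
    if not os.path.lexists(link_path):  # never clobber an existing link
        os.symlink(os.path.abspath(original), link_path)

# Example: link_into_tree("test/sample.pdf", "test/organized") would create
# test/organized/documents/<year>/sample.pdf -> /abs/path/to/test/sample.pdf
```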
Core keys in organizer_config.json:
- `drives`: Drive shortcuts; can be nested and used in other paths.
- `sync_pairs`: Folder pairs to keep in sync (bidirectional).
- `source_folders`: Roots to scan for deduplication of real files.
- `exclude_patterns`: Names/patterns to skip (e.g. `.git`, `node_modules`, `.tmp*`).
- `output_base`: Root for the organized soft-link tree (e.g. `"~/organized"`).
- `enable_content_analysis`: Turn ML-based category discovery on/off.
- `enable_folder_sync`: Enable/disable running the sync step.
- `enable_duplicate_detection`: Enable/disable dedup across `source_folders`.
- `backup_drive_path` / `backup_directories`: Background backup destination and sources.
A minimal production config using drives and sync pairs might look like:
```json
{
"drives": {
"MAIN_DRIVE": "/Users/rod",
"EXTERNAL_DRIVE": "/Volumes/PASSPORT3",
"PROTON_DRIVE": "MAIN_DRIVE/ProtonDrive",
"GOOGLE_DRIVE": "MAIN_DRIVE/GoogleDrive/MyFiles"
},
"sync_pairs": [
{ "folders": ["MAIN_DRIVE/dev", "GOOGLE_DRIVE/dev"] },
{ "folders": ["MAIN_DRIVE/Documents", "PROTON_DRIVE/Documents"] }
],
"exclude_patterns": [
"node_modules", "_build", "deps", "ebin", ".git",
"__pycache__", ".pytest_cache", ".mypy_cache", ".tox",
".venv", "venv", "env", "dist", "build", "target",
".next", ".cache", ".parcel-cache", "coverage", ".nyc_output",
"elm-stuff", ".elixir_ls", ".stack-work", "Photos Library.photoslibrary",
".photoslibrary", "iPhoto Library", "Lightroom", ".bundle", "vendor",
"bundle", "priv/static", ".gradle", ".m2", "tmp/cache", ".tmp*",
".DS_Store", "*.pyc", "*.log", ".Spotlight-V100", ".TemporaryItems",
".fseventsd", ".DocumentRevisions-V100"
],
"output_base": "~/organized",
"enable_content_analysis": true,
"enable_folder_sync": true,
"enable_duplicate_detection": false
}
```

If you later want deduplication of real files, you would add, for example:
"source_folders": [
"MAIN_DRIVE/Documents",
"MAIN_DRIVE/Pictures"
],
"enable_duplicate_detection": true…and then run with --REAL (full cycle) or --REAL --dedupe-only (just dedup).
```
python file_organizer.py [OPTIONS]

Options:
  -R, --REAL      Run in PRODUCTION mode (default: TEST mode)
  --scan-once     Run a single organization/sync/dedupe cycle, then exit
  --create-test   Create test environment under ./test and exit
  --sync-only     Only synchronize folders (production mode)
  --dedupe-only   Only run deduplication (production mode)
  --config PATH   Use a custom config file (default: organizer_config.json)
```

Typical flows:
- Safe test run: `python file_organizer.py --scan-once`
- One-shot real run: `python file_organizer.py --REAL --scan-once`
- Daemon (continuous real mode): `python file_organizer.py --REAL`
- Just sync: `python file_organizer.py --REAL --sync-only`
- Just dedupe (real files under `source_folders`): `python file_organizer.py --REAL --dedupe-only`
Logs are written to ~/.file_organizer.log; the desktop app exposes them in a viewer, or you can use:
`tail -f ~/.file_organizer.log`

Before running in production mode with real files:
- Backups: You have current backups of anything important.
- Tested: You have run at least one full cycle in test mode and reviewed `test/organized/`.
- Config reviewed: `drives`, `sync_pairs`, and (if used) `source_folders` point only to locations you are comfortable modifying.
- Dedup clarity: You understand that current dedup logic deletes real files under `source_folders`, not just soft links.
- Logs monitored: You know how to watch `~/.file_organizer.log` and stop the process if something looks wrong.
If any of the above is unclear, stay in test mode or run with enable_duplicate_detection: false and limited sync_pairs until you’re confident.