Commit 437418c

Merge pull request #11 from PieceWiseProjects/feature/standardize-codebase
feat: standardize codebase and fix documentation for v1.0.2
2 parents 88291e3 + a89ea10 commit 437418c

12 files changed: +1336 -228 lines changed

.gitignore

Lines changed: 52 additions & 1 deletion
@@ -179,5 +179,56 @@ __pycache__
 *.egg-info
 aws
 *.egg
-.iml
+*.iml
+
+# Project-specific test files
+test_*.py
+temp_*.py
+example_*.py
+
+# Animation and media files (keep Animation.gif but exclude others)
+*.mp4
+*.mov
+*.avi
+
+# Temporary and backup files
+*.tmp
+*.temp
+*.bak
+*.swp
+*~
+
+# macOS specific
+.DS_Store
+.DS_Store?
+._*
+.Spotlight-V100
+.Trashes
+
+# Windows specific
+Thumbs.db
+ehthumbs.db
+Desktop.ini
+
+# VS Code
+.vscode/
+*.code-workspace
+
+# PyCharm (uncommented for safety)
+.idea/
+
+# Additional coverage and linting
+.ruff_cache/
+.mypy_cache/
+htmlcov/
+.coverage*
+
+# Build and dist artifacts
+build/
+dist/
+*.egg-info/
+
+# Documentation build
+docs/_build/
+docs/build/
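
The one-character change from `.iml` to `*.iml` alters which files the rule covers. The sketch below uses Python's `fnmatch` as a rough stand-in for gitignore globbing; the two are not identical (gitignore adds directory and negation rules), and the file names are hypothetical, chosen only to illustrate the patterns in this diff.

```python
from fnmatch import fnmatchcase

# Hypothetical file names, used only to illustrate the patterns in this diff.
candidates = ["project.iml", ".iml", "test_parser.py", "parser.py"]


def matching(pattern: str, names: list) -> list:
    """Return the names that a shell-style glob pattern matches."""
    return [name for name in names if fnmatchcase(name, pattern)]


old_rule = matching(".iml", candidates)        # matches the literal name only
new_rule = matching("*.iml", candidates)       # matches any name ending in .iml
test_rule = matching("test_*.py", candidates)  # the new test-file rule

print(old_rule, new_rule, test_rule)
```

A gitignore pattern without a slash applies at every directory level, so the new `test_*.py` rule would also cover new, untracked pytest modules that follow that naming. It may be worth confirming that this is intended.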

.pre-commit-config.yaml

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
+repos:
+  - repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v4.4.0
+    hooks:
+      - id: trailing-whitespace
+      - id: end-of-file-fixer
+      - id: check-yaml
+      - id: check-added-large-files
+      - id: check-merge-conflict
+      - id: debug-statements
+
+  - repo: https://github.com/psf/black
+    rev: 23.12.1
+    hooks:
+      - id: black
+        language_version: python3.9
+
+  - repo: https://github.com/pycqa/isort
+    rev: 5.13.2
+    hooks:
+      - id: isort
+        args: ["--profile", "black"]
+
+  - repo: https://github.com/astral-sh/ruff-pre-commit
+    rev: v0.1.8
+    hooks:
+      - id: ruff
+        args: [--fix, --exit-non-zero-on-fix]
+
+  - repo: https://github.com/pre-commit/mirrors-mypy
+    rev: v1.7.1
+    hooks:
+      - id: mypy
+        additional_dependencies: [types-all]
+        args: [--strict, --ignore-missing-imports]
+
+  - repo: local
+    hooks:
+      - id: pytest
+        name: pytest
+        entry: uv run pytest
+        language: system
+        types: [python]
+        pass_filenames: false
+        always_run: true

README.md

Lines changed: 27 additions & 24 deletions
@@ -1,8 +1,8 @@
-# 🕒 formatify
+# formatify
 
 ---
 
-> 🧠 Auto-detect and standardize messy timestamp formats.
+> Auto-detect and standardize messy timestamp formats.
 > Perfect for log parsers, data pipelines, or anyone tired of wrestling with inconsistent datetime strings.
 
 [![PyPI version](https://img.shields.io/pypi/v/formatify_py.svg)](https://pypi.org/project/formatify_py)
@@ -20,7 +20,7 @@
 
 ---
 
-## ⚠️ Problem
+## Problem
 
 Ever pulled in a CSV or log file and found timestamps like this?
 
@@ -36,7 +36,7 @@ How do you reliably infer and **standardize** them — especially when:
 
 ---
 
-## Solution
+## Solution
 
 `formatify` infers the datetime format(s) from a list of timestamp strings and gives you:
 
@@ -49,7 +49,7 @@ No dependencies. Works out of the box.
 
 ---
 
-## 📄 What This Library Does
+## What This Library Does
 
 Behind the scenes, `formatify` uses:
 
@@ -68,10 +68,10 @@ It produces:
 
 ---
 
-## 🚀 Quick Example
+## Quick Example
 
 ```python
-from formatify_py.main import analyze_heterogeneous_timestamp_formats
+from formatify_py import analyze_heterogeneous_timestamp_formats
 
 samples = [
     "2023-07-15T14:23:05Z",
@@ -90,19 +90,19 @@ for gid, group in results.items():
 
 ---
 
-## 🔍 Features
+## Features
 
-Auto-detect `strftime` format
-Handles ISO 8601, text months, UNIX epoch
-Infers year/month/day/hour/minute roles
-Groups mixed formats automatically
-Timezone-aware
-No dependencies
-Fast and customizable
+- Auto-detect `strftime` format
+- Handles ISO 8601, text months, UNIX epoch
+- Infers year/month/day/hour/minute roles
+- Groups mixed formats automatically
+- Timezone-aware
+- No dependencies
+- Fast and customizable
 
 ---
 
-## 🧪 API
+## API
 
 ### Main Entry Point
 
@@ -120,6 +120,9 @@ Returns a dictionary mapping group IDs to result dictionaries. Each result inclu
 * `detected_timezone`: parsed offset (if any)
 * `coverage`: fraction of total samples in this group
 * `accuracy`: percent of valid parses in group
+* `primary_delimiter`: most common delimiter used
+* `samples`: original timestamps in this group
+* `group_features`: detected structural features
 
 ### Lower-Level Functions
 
@@ -131,7 +134,7 @@ infer_datetime_format_from_samples(samples: List[str]) -> Dict[str, Any]
 
 ---
 
-## 🔊 Mixed Format Handling
+## Mixed Format Handling
 
 `formatify` is designed to handle **real-world timestamp mess**. When your input includes a mix of styles — ISO, slashed, text-months, or epoch — it:
 
@@ -143,20 +146,20 @@ This lets you feed in 3 formats or 30, and still get clean, grouped results.
 
 ---
 
-## 👁️ Design Notes
+## Design Notes
 
 Want to know how the internals work? Check out:
 
 * [How Formatify Thinks About Timestamps](docs/design.md)
 
 ---
 
-## 🔍 Dev Guide
+## Dev Guide
 
 ```bash
 # Clone the repo
 git clone https://github.com/PieceWiseProjects/formatify.git
-cd formatify_py
+cd formatify
 
 # Set up environment
 uv pip install -e .[dev,test]
@@ -173,26 +176,26 @@ uv run python -m build
 
 ---
 
-## 🚰 Contributing
+## Contributing
 
 We're just getting started — contributions, issues, and ideas welcome!
 
 1. Fork and branch: `git checkout -b feature/my-feature`
 2. Code and test
 3. Lint and push
-4. Open a pull request 💡
+4. Open a pull request
 
 Follow our [Contributor Guidelines](https://www.contributor-covenant.org).
 
 ---
 
-## 📜 License
+## License
 
 MIT — see [LICENSE](LICENSE) for details.
 
 ---
 
-## 🙌 Credits
+## Credits
 
 Built and maintained by [Aalekh Roy](https://github.com/RoyAalekh)
 Part of the [PieceWiseProjects](https://github.com/PieceWiseProjects) initiative.
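
The README's API hunk adds three documented result fields (`primary_delimiter`, `samples`, `group_features`). As a rough sketch of how a caller might read them, the helper below walks a dictionary shaped like the documented return value. `summarize_groups` is a hypothetical helper, not part of `formatify`, and the placeholder values, including the field types, are assumptions invented for illustration rather than real library output.

```python
from typing import Any, Dict, List


def summarize_groups(results: Dict[Any, Dict[str, Any]]) -> List[str]:
    """Build one summary line per group from the documented result fields."""
    lines = []
    for gid, group in results.items():
        lines.append(
            f"group {gid}: "
            f"delimiter={group.get('primary_delimiter')!r}, "
            f"coverage={group.get('coverage')}, "
            f"accuracy={group.get('accuracy')}, "
            f"samples={len(group.get('samples', []))}"
        )
    return lines


# Placeholder data shaped like the documented return value.
# These values and types are assumptions, not real library output.
placeholder_results = {
    0: {
        "detected_timezone": "UTC",
        "coverage": 0.5,
        "accuracy": 100.0,
        "primary_delimiter": "-",
        "samples": ["2023-07-15T14:23:05Z"],
        "group_features": {},
    },
}

summary = summarize_groups(placeholder_results)
for line in summary:
    print(line)
```

With the package installed, the same helper could presumably be given the dictionary returned by `analyze_heterogeneous_timestamp_formats` in place of the placeholder.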

docs/design.md

Lines changed: 8 additions & 8 deletions
@@ -4,7 +4,7 @@ Just a quick note on how the core logic of `formatify` works, what design choice
 
 ---
 
-## 🧭 Why This Exists
+## Why This Exists
 
 Parsing timestamps in the wild is messy:
 
@@ -16,7 +16,7 @@ So we built something lean and explicit — especially useful for logs, ETL pipe
 
 ---
 
-## What We Chose To Do
+## What We Chose To Do
 
 * Use **regex + tokenization** instead of fuzzy parsing
 * Look at **how often parts change** (e.g., days vs. years) to guess roles
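
The two design bullets visible in this hunk (regex tokenization, then looking at how often each part changes) can be illustrated with a small sketch. This is not the library's implementation: `distinct_counts` is a hypothetical helper and the samples are made up. It only shows the general idea that a field which rarely changes across samples is more likely a year than a day.

```python
import re
from typing import List


def distinct_counts(samples: List[str]) -> List[int]:
    """Count distinct values seen at each numeric token position."""
    # Tokenize each timestamp into its runs of digits.
    rows = [re.findall(r"\d+", s) for s in samples]
    # Compare only the positions that every sample has.
    width = min(len(row) for row in rows)
    return [len({row[i] for row in rows}) for i in range(width)]


# Hypothetical samples: the first field never changes, the last changes most.
samples = ["2023-07-15", "2023-07-16", "2023-08-01"]
counts = distinct_counts(samples)
print(counts)
```

A position with few distinct values is a candidate for a slow-moving role such as the year, while a position with many distinct values is a candidate for the day or a finer unit.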
@@ -32,7 +32,7 @@ And we return:
 
 ---
 
-## 💡 What We Didnt Use
+## What We Didn't Use
 
 We intentionally skipped tools like:
 
@@ -42,23 +42,23 @@ We intentionally skipped tools like:
 
 ---
 
-## 📉 Tradeoffs
+## Tradeoffs
 
-👍 Pros:
+Pros:
 
 * Fast
 * Easy to reason about
 * Plays nice with your data pipeline
 
-👎 Cons:
+Cons:
 
 * No locale support (e.g. `1 Mars 2023`)
 * Not built for unstructured natural language
 * Not a fuzzy parser — it’s deterministic
 
 ---
 
-## 🧱 How It All Fits Together
+## How It All Fits Together
 
 ```mermaid
 graph TD
@@ -74,7 +74,7 @@ graph TD
 
 ---
 
-## 🧠 TL;DR
+## TL;DR
 
 `formatify` is built to be: