Fast profanity filtering tool for English, Spanish, Chinese, Turkish and more.

viddexa/safetext

🤔 why safetext?

Fast profanity detection and filtering for 13 languages.

  • Multi-format Detection: Single words, phrases, and contextual profanity
  • Whitelisting: Exclude specific words from detection
  • Auto Language Detection: From text or subtitle files
  • Precise Filtering: Exact position tracking and custom censoring
  • Simple Integration: One-line setup with clean API

📦 installation

easily install safetext with pip:

pip install safetext

for development setup, see our scripts documentation.

🎯 quickstart

check and censor profanity

>>> from safetext import SafeText

>>> st = SafeText(language='en')

>>> results = st.check_profanity(text='Some text with <profanity-word>.')
>>> results
{'word': '<profanity-word>', 'index': 4, 'start': 15, 'end': 31}

>>> text = st.censor_profanity(text='Some text with <profanity-word>.')
>>> text
'Some text with ***.'

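the `start` and `end` offsets in the result above can also drive a custom mask. a minimal sketch, assuming matches arrive as a list of dicts shaped like the quickstart result and that `end` is exclusive (which the quickstart numbers suggest); `mask_spans` is a hypothetical helper, not part of safetext:

```python
def mask_spans(text, matches, mask_char="#"):
    """Replace every matched span with mask_char, keeping the text length."""
    chars = list(text)
    for match in matches:
        # Assumption: 'start' is inclusive and 'end' is exclusive.
        for i in range(match["start"], match["end"]):
            chars[i] = mask_char
    return "".join(chars)


text = "Some text with <profanity-word>."
# Hand-written stand-in for a check_profanity result (see the quickstart).
matches = [{"word": "<profanity-word>", "index": 4, "start": 15, "end": 31}]

masked = mask_spans(text, matches)
print(masked)
```

because the mask keeps the original length, offsets of any later matches stay valid after masking.
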
using whitelist

exclude specific words from profanity detection:

# Using a list of words
>>> st = SafeText(language='en', whitelist=['word1', 'word2'])

# Using a file (one word per line)
>>> st = SafeText(language='en', whitelist='path/to/whitelist.txt')
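a whitelist file in the one-word-per-line layout can be generated from code. a small sketch using only the standard library; `write_whitelist` and the temporary path are illustrative assumptions, not safetext helpers:

```python
import tempfile
from pathlib import Path


def write_whitelist(path, words):
    """Write one word per line, skipping blanks and duplicates."""
    unique = []
    for word in words:
        word = word.strip()
        if word and word not in unique:
            unique.append(word)
    Path(path).write_text("\n".join(unique) + "\n", encoding="utf-8")
    return unique


# Hypothetical location; any writable path works.
whitelist_path = Path(tempfile.mkdtemp()) / "whitelist.txt"
written = write_whitelist(whitelist_path, ["word1", "word2", " word1 ", ""])
print(whitelist_path.read_text(encoding="utf-8"))

# The file can then be passed as shown above:
# st = SafeText(language='en', whitelist=str(whitelist_path))
```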

automated language detection

  • from text:
>>> from safetext import SafeText

>>> eng_text = "This story is about to take a dark turn."

>>> st = SafeText(language=None)
>>> st.set_language_from_text(eng_text)

>>> st.language
'en'

  • from .srt (subtitle) file:
>>> from safetext import SafeText

>>> turkish_srt_file_path = "turkish.srt"

>>> st = SafeText(language=None)
>>> st.set_language_from_srt(turkish_srt_file_path)

>>> st.language
'tr'
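when subtitles are already in memory rather than on disk, the dialogue can be pulled out and handed to `set_language_from_text`. the sketch below is one possible way to strip cue numbers and timestamps from SRT content; it is not necessarily how `set_language_from_srt` works internally, and `srt_to_text` is a hypothetical helper:

```python
import re

# Matches SRT timing lines such as "00:00:01,000 --> 00:00:03,000".
TIMESTAMP = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)


def srt_to_text(srt_content):
    """Return the dialogue lines of an SRT document joined by spaces."""
    kept = []
    for raw in srt_content.splitlines():
        line = raw.strip()
        # Skip blank lines, cue numbers and timing lines.
        # Caveat: a dialogue line made only of digits is dropped too.
        if not line or line.isdigit() or TIMESTAMP.match(line):
            continue
        kept.append(line)
    return " ".join(kept)


sample = """1
00:00:01,000 --> 00:00:03,000
This story is about

2
00:00:03,500 --> 00:00:05,000
to take a dark turn.
"""

text = srt_to_text(sample)
print(text)

# The extracted text could then feed the detector shown above:
# st.set_language_from_text(text)
```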

🌍 supported languages

safetext currently supports profanity detection in 13 languages:

Flag  ISO 639-1 code  Language name
🇸🇦    ar              Arabic
🇦🇿    az              Azerbaijani
🇩🇪    de              German
🇬🇧    en              English
🇪🇸    es              Spanish
🇮🇷    fa              Persian (Farsi)
🇫🇷    fr              French
🇮🇳    hi              Hindi
🇯🇵    ja              Japanese
🇵🇹    pt              Portuguese
🇷🇺    ru              Russian
🇹🇷    tr              Turkish
🇨🇳    zh              Chinese
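when the language code comes from user input, it can be checked against the table above before constructing `SafeText`. a minimal sketch; the `SUPPORTED_LANGUAGES` mapping is copied by hand from the table and `normalize_language` is a hypothetical helper, so both may drift from what the installed package actually ships:

```python
# Hand-copied from the table above; not read from the safetext package.
SUPPORTED_LANGUAGES = {
    "ar": "Arabic", "az": "Azerbaijani", "de": "German", "en": "English",
    "es": "Spanish", "fa": "Persian (Farsi)", "fr": "French", "hi": "Hindi",
    "ja": "Japanese", "pt": "Portuguese", "ru": "Russian", "tr": "Turkish",
    "zh": "Chinese",
}


def normalize_language(code):
    """Return a lowercase ISO 639-1 code or raise if it is not in the table."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {code!r}")
    return normalized


language = normalize_language(" TR ")
print(language, SUPPORTED_LANGUAGES[language])

# The validated code can then be used as in the quickstart:
# st = SafeText(language=language)
```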

🤝 contribute to safetext

join our mission in refining content moderation!

contribute by:

  • adding new languages: create a folder with the ISO 639-1 code and include a words.txt.
  • enhancing word lists: improve detection accuracy.
  • sharing feedback: your ideas can shape safetext.
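the folder layout for a new language can be scaffolded with a few lines. a sketch under stated assumptions: the base directory is hypothetical (check the contributing guidelines for the real location), `words.txt` is assumed to hold one entry per line, and `scaffold_language` is not a safetext tool:

```python
import re
import tempfile
from pathlib import Path


def scaffold_language(base_dir, code, words):
    """Create <base_dir>/<code>/words.txt with one entry per line."""
    # ISO 639-1 codes are two lowercase letters.
    if not re.fullmatch(r"[a-z]{2}", code):
        raise ValueError(f"not an ISO 639-1 style code: {code!r}")
    lang_dir = Path(base_dir) / code
    lang_dir.mkdir(parents=True, exist_ok=True)
    # Drop blanks and duplicates, then sort for stable diffs.
    unique = sorted({w.strip() for w in words if w.strip()})
    words_file = lang_dir / "words.txt"
    words_file.write_text("\n".join(unique) + "\n", encoding="utf-8")
    return words_file


# Hypothetical base directory and placeholder entries.
base_dir = tempfile.mkdtemp()
words_file = scaffold_language(base_dir, "it", ["example2", "example1", "example1 "])
print(words_file)
print(words_file.read_text(encoding="utf-8"))
```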

see our contributing guidelines for development workflow, test documentation for running tests, and scripts guide for automation tools.


🏆 contributors

meet our awesome contributors who make safetext better every day!


follow us for more!

LinkedIn · Hugging Face · X