Squashed commit of the following:
commit 3f2a62a124926cfeb840796f104a702878ac10e5
Author: Carsten Schnober <[email protected]>
Date:   Wed Sep 18 18:18:29 2024 +0200

    Update Gensim to >=4.3.3, <4.4.0 (#450)

    * Update Gensim to >=4.3.3, <4.4.0

    * update nltk as well

    ---------

    Co-authored-by: Dale Wahl <[email protected]>
    Co-authored-by: Sal Hagen <[email protected]>

commit fee2c8c08617094f28496963da282d2e2dddeab7
Merge: 3d94b666 f8e93eda
Author: sal-phd-desktop <[email protected]>
Date:   Wed Sep 18 18:11:19 2024 +0200

    Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat

commit 3d94b666cedd0de4e0bee953cbf1d787fdc38854
Author: sal-phd-desktop <[email protected]>
Date:   Wed Sep 18 18:11:04 2024 +0200

    FINALLY remove 'News' from the front page, replace with 4CAT BlueSky updates and potential information about the specific server (to be set on config page)

commit f8e93edabe9013a2c1229caa4c454fab09620125
Author: Stijn Peeters <[email protected]>
Date:   Wed Sep 18 15:11:21 2024 +0200

    Simple extensions page in Control Panel

commit b5be128c7b8682fb233d962326d9118a61053165
Author: Stijn Peeters <[email protected]>
Date:   Wed Sep 18 14:08:13 2024 +0200

    Remove 'docs' directory

commit 1e2010af44817016c274c9ec9f7f9971deb57f66
Author: Stijn Peeters <[email protected]>
Date:   Wed Sep 18 14:07:38 2024 +0200

    Forgot TikTok and Douyin

commit c757dd51884e7ec9cf62ca1726feacab4b2283b7
Author: Stijn Peeters <[email protected]>
Date:   Wed Sep 18 14:01:31 2024 +0200

    Say 'zeeschuimer' instead of 'extension' to avoid confusion with 4CAT extensions

commit ee7f4345478f923541536c86a5b06246deae03f6
Author: Stijn Peeters <[email protected]>
Date:   Wed Sep 18 14:00:40 2024 +0200

    RIP Parler data source

commit 11300f2430b51887823b280405de4ded4f15ede1
Author: Stijn Peeters <[email protected]>
Date:   Wed Sep 18 11:21:37 2024 +0200

    Tuplestring

commit 547265240eba81ca0ad270cd3c536a2b1dcf512d
Author: Stijn Peeters <[email protected]>
Date:   Wed Sep 18 11:15:29 2024 +0200

    Pass user obj instead of str to ConfigWrapper in Processor

commit b21866d7900b5d20ed6ce61ee9aff50f3c0df910
Author: Stijn Peeters <[email protected]>
Date:   Tue Sep 17 17:45:01 2024 +0200

    Ensure request-aware config reader in user object when using config wrapper

commit bbe79e4b0fe870ccc36cab7bfe7963b28d1948e3
Author: Sal Hagen <[email protected]>
Date:   Tue Sep 17 15:12:46 2024 +0200

    Fix extension path walk for Windows

commit d6064beaf31a6a85b0e34ed4f8126eb4c4fc07e3
Author: Stijn Peeters <[email protected]>
Date:   Mon Sep 16 14:50:45 2024 +0200

    Allow tags that have no users

    Use case: tag-based frontend differentiation using X-4CAT-Config-Via-Proxy

commit b542ded6f976809ec88445e7b04f2c81b900188e
Author: Stijn Peeters <[email protected]>
Date:   Mon Sep 16 14:13:14 2024 +0200

    Trailing slash in query results list

commit a4bddae575b22a009925206a1337bdd89349e567
Author: Dale Wahl <[email protected]>
Date:   Mon Sep 16 13:57:23 2024 +0200

    4CAT Extension - easy(ier) adding of new datasources/processors that can be maintained separately from 4CAT base code (#451)

    * domain only

    * fix reference

    * try and collect links with selenium

    * update column_filter to find multiple matches

    * fix up the normal url_scraper datasource

    * ensure all selenium links are strings for join

    * change output of url_scraper to ndjson with map_items

    * missed key/index change

    * update web archive to use json and map to 4CAT

    * fix no text found

    * and none on scraped_links

    * check key first

    * fix up web_archive error reporting

    * handle None type for error

    * record web archive "bad request"

    * add wait after redirect movement

    * increase waittime for redirects

    * add processor for trackers

    * dict to list for addition

    * allow both newline and comma separated links

    * attempt to scrape iframes as separate pages

    * Fixes for selenium scraper to work with config database

    * installation of packages, geckodriver, and firefox if selenium enabled

    * update install instructions

    * fix merge error

    * fix dropped function

    * have to be kidding me

    * add note; setup requires docker... need to think about IF this will ever
    be installed without Docker

    * separate selenium class into wrapper and Search class so wrapper can be
    used in processors!

    * add screenshots; add firefox extension support

    * update selenium definitions

    * regex for extracting urls from strings

    * screenshots processor; extract urls from text and takes screenshots

    * Allow producing zip files from data sources

    * import time

    * pick better default

    * test screenshot datasource

    * validate all params

    * fix enable extension

    * haha break out of while loop

    * count my items

    * whoops, len() is important here

    * must be getting tired...

    * remove redundant logging

    * Eager loading for screenshots, viewport options, etc

    * Woops, wrong folder

    * Fix label shortening

    * Just 'queue' instead of 'search queue'

    * Yeah, make it headless

    * README -> DESCRIPTION

    * h1 -> h2

    * Actually just have no header

    * Use proper filename for downloaded files

    * Configure whether to offer pseudonymisation etc

    * Tweak descriptions

    * fix log missing data

    * add columns to post_topic_matrix

    * fix breadcrumb bug

    * Add top topics column

    * Fix selenium config install parameter (Docker uses this/manual would
    need to run install_selenium, well, manually)

    * this processor is slow; i thought it was broken long before it updated!

    * refactor detect_trackers as conversion processor not filter

    * add geckodriver executable to docker install

    * Auto-configure webdrivers if available in PATH

    * update screenshots to act as image-downloader and benefit from processors

    * fix is_compatible_with

    * Delete helper-scripts/migrate/migrate-1.30-1.31.py

    * fix embeddings is_compatible_with

    * fix up UI options for hashing and private

    * abstract was moved to lib

    * various fixes to selenium based datasources

    * processors not compatible with image datasets

    * update firefox extension handling

    * screenshots datasource fix get_options

    * rename screenshots processor to be detected as image dataset

    * add monthly and weekly frequencies to wayback machine datasource

    * wayback ds: fix fail if all attempts do not realize results; add frequency options to options; add daily

    * add scroll down page to allow lazy loading for entire page screenshots

    * screenshots: adjust pause time so it can be used to force a wait for images to load

    I have not successfully come up with or found a way to wait for all images to load; document.readyState == 'complete' does not function in this way on certain sites including the wayback machine

    * hash URLs to create filenames

    * remove log

    * add setting to toggle display advanced options

    * add progress bars

    * web archive fix query validation

    * count subpages in progress

    * remove overwritten function

    * move http response to own column

    * special filenames

    * add timestamps to all screenshots

    * restart selenium on failure

    * new build have selenium

    * process urls after start (keep original query parameters)

    * undo default firefox

    * quick max

    * rename SeleniumScraper to SeleniumSearch

    todo: build SeleniumProcessor!

    * max number screenshots configurable

    * method to get url with error handling

    * use get_with_error_handling

    * d'oh, screenshot processor needs to quit selenium

    * update log to contain URL

    * Update scrolling to use Page down key if necessary

    * improve logs

    * update image_category_wall as screenshot datasource does not have category column; this is not ideal and ought to be solved in another way.

    Also, could I get categories from the metadata? That's... ugh.

    * no category, no processor

    * str errors

    * screenshots: dismiss alerts when checking ready state is complete

    * set screenshot timeout to 30 seconds

    * update gensim package

    * screenshots: move processor interrupt into attempts loop

    * if alert disappears before we can dismiss it...

    * selenium specific logger

    * do not switch window when no alert found on dismiss

    * extract wait for page to load to selenium class

    * improve descriptions of screenshot options

    * remove unused line

    * treat timeouts differently from other errors

    these are more likely due to an issue with the website in question

    * debug if requested

    * increase pause time

    * restart browser w/ PID

    * increase max_workers for selenium

    this is by individual worker class not for all selenium classes... so you can really crank them out if desired

    * quick fix restart by pid

    * avoid bad urls

    * missing bracket & attempt to fix-missing dependencies in Docker install

    * Allow dynamic form options in processors

    * Allow 'requires' on data source options as well

    * Handle list values with requires

    * basic processor for apple store; setup checks for additional requirements

    * fix is_4cat_class

    * show preview when no map_item

    * add google store datasource

    * Docker setup.py use extensions

    * Wider support for file upload in processors

    * Log file uploads in DMI service manager

    * add map_item methods and record more data per item

    need additional item data as map_item is staticmethod

    * update from master; merge conflicts

    * fix docker build context (ignore data files)

    * fix option requirements

    * apple store fix: list still tries to get query

    * apple & google stores fix up item mapping

    * missed merge error

    * minor fix

    * remove unused import

    * fix datasources w/ files frontend error

    * fix error w/ datasources having file option

    * better way to name docker volumes

    * update two other docker compose files

    * fix docker-compose ymls

    * minor bug: fix and add warning; fix no results fail

    * update apple field names to better match interface

    * update google store fieldnames and order

    * sneak in jinja logger if needed

    * fix fourcat.js handling checkboxes for dynamic settings

    * add new endpoint for app details to apple store

    * apple_store map new beta app data

    * add default lang/country

    * not all apps have advisories

    * revert so button works

    * add chart positions to beta map items

    * basic scheduler

    To-do
    - fix up and add options to scheduler view (e.g. delete/change)
    - add scheduler view to navigator
    - tie jobs to datasets? (either in scheduler view or, perhaps, filter dataset view)
    - more testing...

    * update scheduler view, add functions to update job interval

    * revert .env

    * working scheduler!

    * basic scheduler view w/ datasets

    * fix postgres tag

    * update job status in scheduled_jobs table

    * fix timestamp; end_date needed for last run check; add dataset label

    * improve scheduler view

    * remove dataset from scheduled_jobs table on delete

    * scheduler view order by last creation

    * scheduler views: separate scheduler list from scheduled dataset list

    * additional update from master fixes

    * apple_store map_items fix missing locales

    * add back depth for pagination

    * correct route

    * modify pagination to accept args

    * pagination fun

    * pagination: i hate testing on live servers...

    * ok ok need the pagination route

    * pagination: add route_args

    * fix up scheduler header

    * improve app store descriptions

    * add azure store

    * fix azure links

    * azure_store: add category search

    * azure fix type of config update timestamp

    OPTION_DATE does not appear correctly in settings and causes it to be written incorrectly

    * basic aws store

    * check if selenium available; get correct app_id

    * aws: implement pagination

    * add logging; wait for elements to load after next page; attempts to rework filter option collection

    * apple_store: handle invalid param error

    * fix filter_options

    * aws: fix filter option collection!

    * more merge

    * move new datasources and processors to extensions and modify setup.py and module loader to use the new locations

    * migrate.py to run extension "fourcat_install.py" files

    * formatting

    * remove extensions; add gitignore

    * excise scheduler merge

    * some additional cleanup from app_studies branch

    * allow nested datasources folders; ignore files in extensions main folder

    * allow extension install scripts to run pip if migrate.py has not

    * Remove unused URL functions we could use ural for

    * Take care of git commit hash tracking for extension processors

    * Get rid of unused path.versionfile config setting

    * Add extensions README

    * Squashed commit of the following:

    commit cd356f7a69d15e8ecc8efffc6d63a16368e62962
    Author: Stijn Peeters <[email protected]>
    Date:   Sat Sep 14 17:36:18 2024 +0200

        UI setting for 4CAT install ad in login

    commit 0945d8c0a11803a6bb411f15099d50fea25f10ab
    Author: Stijn Peeters <[email protected]>
    Date:   Sat Sep 14 17:32:55 2024 +0200

        UI setting for anonymisation controls

        Todo: make per-datasource

    commit 1a2562c2f9a368dbe0fc03264fb387e44313213b
    Author: Stijn Peeters <[email protected]>
    Date:   Sat Sep 14 15:53:27 2024 +0200

        Debug panel for HTTP headers in control panel

    commit 203314ec83fb631d985926a0b5c5c440cfaba9aa
    Author: Stijn Peeters <[email protected]>
    Date:   Sat Sep 14 15:53:17 2024 +0200

        Preview for HTML datasets

    commit 48c20c2ebac382bd41b92da4481ff7d832dc1538
    Author: Desktop Sal <[email protected]>
    Date:   Wed Sep 11 13:54:23 2024 +0200

        Remove spacy processors (linguistic extractor, get nouns, get entities) and remove dependencies

    commit 657ffd75a7f48ba4537449127e5fa39debf4fdf3
    Author: Dale Wahl <[email protected]>
    Date:   Fri Sep 6 16:29:19 2024 +0200

        fix nltk where it matters

    commit 2ef5c80f2d1a5b5f893c8977d8394740de6d796d
    Author: Stijn Peeters <[email protected]>
    Date:   Tue Sep 3 12:05:14 2024 +0200

        Actually check progress in text annotator

    commit 693960f41b73e39eda0c2f23eb361c18bde632cd
    Author: Stijn Peeters <[email protected]>
    Date:   Mon Sep 2 18:03:18 2024 +0200

        Add processor for stormtrooper DMI service

    commit 6ae964aad492527bc5d016a00f870145aab6e1af
    Author: Stijn Peeters <[email protected]>
    Date:   Fri Aug 30 17:31:37 2024 +0200

        Fix reference to old stopwords list in neologisms preset

    * Fix Github links for extensions

    * Fix commit detection in extensions

    * Fix extension detection in module loader

    * Follow symlinks when loading extensions

    Probably not uncommon to have a checked out repo somewhere to then symlink into the extensions dir

    * Make queue message on create page more generic

    * Markdown in datasource option tooltips

    * Remove Spacy model from requirements

    * Add software_source to database SQL

    ---------

    Co-authored-by: Stijn Peeters <[email protected]>
    Co-authored-by: Stijn Peeters <[email protected]>
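
One of the bullets in the commit above reads "hash URLs to create filenames". The log does not show how this was implemented, so the following is only a minimal sketch of the general idea: derive a short, filesystem-safe name from a URL by hashing it. The function name `filename_for_url`, the choice of SHA-256, the 16-character truncation, and the `.png` extension are all assumptions made for illustration, not taken from the 4CAT code.

```python
import hashlib


def filename_for_url(url: str, extension: str = ".png", length: int = 16) -> str:
    """Return a filesystem-safe filename derived from a URL (hypothetical helper)."""
    # Hash the UTF-8 bytes of the URL; the hex digest contains only 0-9 and a-f,
    # so it avoids characters such as "/", "?" and ":" that are awkward in filenames.
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    # Truncating shortens the name at the cost of a (small) collision risk.
    return digest[:length] + extension


if __name__ == "__main__":
    print(filename_for_url("https://example.com/page?a=1&b=2"))
```

The same URL always maps to the same name, which makes repeated downloads easy to detect, while the original URL has to be stored elsewhere (for example in a metadata file) because the hash cannot be reversed.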

commit cd356f7a69d15e8ecc8efffc6d63a16368e62962
Author: Stijn Peeters <[email protected]>
Date:   Sat Sep 14 17:36:18 2024 +0200

    UI setting for 4CAT install ad in login

commit 0945d8c0a11803a6bb411f15099d50fea25f10ab
Author: Stijn Peeters <[email protected]>
Date:   Sat Sep 14 17:32:55 2024 +0200

    UI setting for anonymisation controls

    Todo: make per-datasource

commit 1a2562c2f9a368dbe0fc03264fb387e44313213b
Author: Stijn Peeters <[email protected]>
Date:   Sat Sep 14 15:53:27 2024 +0200

    Debug panel for HTTP headers in control panel

commit 203314ec83fb631d985926a0b5c5c440cfaba9aa
Author: Stijn Peeters <[email protected]>
Date:   Sat Sep 14 15:53:17 2024 +0200

    Preview for HTML datasets

commit 48c20c2ebac382bd41b92da4481ff7d832dc1538
Author: Desktop Sal <[email protected]>
Date:   Wed Sep 11 13:54:23 2024 +0200

    Remove spacy processors (linguistic extractor, get nouns, get entities) and remove dependencies

commit 657ffd75a7f48ba4537449127e5fa39debf4fdf3
Author: Dale Wahl <[email protected]>
Date:   Fri Sep 6 16:29:19 2024 +0200

    fix nltk where it matters

commit 2ef5c80f2d1a5b5f893c8977d8394740de6d796d
Author: Stijn Peeters <[email protected]>
Date:   Tue Sep 3 12:05:14 2024 +0200

    Actually check progress in text annotator

commit 693960f41b73e39eda0c2f23eb361c18bde632cd
Author: Stijn Peeters <[email protected]>
Date:   Mon Sep 2 18:03:18 2024 +0200

    Add processor for stormtrooper DMI service

commit 6ae964aad492527bc5d016a00f870145aab6e1af
Author: Stijn Peeters <[email protected]>
Date:   Fri Aug 30 17:31:37 2024 +0200

    Fix reference to old stopwords list in neologisms preset

commit 4ba872bef2968f7f8bf5831fd3a4f413420b36ed
Author: Dale Wahl <[email protected]>
Date:   Tue Aug 27 13:04:46 2024 +0200

    fix hatebase: default column option for OPTION_MULTI_SELECT must be list

commit e276033542f2d22e7f614f318a01d65114a21482
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Wed Aug 21 12:53:10 2024 +0200

    Bump nltk from 3.6.7 to 3.9 (#447)

    Bumps [nltk](https://github.com/nltk/nltk) from 3.6.7 to 3.9.
    - [Changelog](https://github.com/nltk/nltk/blob/develop/ChangeLog)
    - [Commits](https://github.com/nltk/nltk/compare/3.6.7...3.9)

    ---
    updated-dependencies:
    - dependency-name: nltk
      dependency-type: direct:production
    ...

    Signed-off-by: dependabot[bot] <[email protected]>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

commit 1d749c3cf83b130ba70bdb09174f382d6711a14b
Author: sal-phd-desktop <[email protected]>
Date:   Wed Aug 21 12:52:54 2024 +0200

    Set UTF-8 encoding when opening stop words (fixes Windows bug)
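
The kind of fix this commit message describes can be sketched as follows. The helper name `load_stopwords` and the one-word-per-line file layout are assumptions for illustration; the actual 4CAT code is not shown in this log. The point is only that passing `encoding="utf-8"` explicitly makes reading independent of the platform default, which on Windows is often a legacy code page unless Python's UTF-8 mode is enabled.

```python
import tempfile
from pathlib import Path


def load_stopwords(path):
    """Read one stop word per line from a UTF-8 text file (hypothetical helper)."""
    # Without encoding=..., open() typically uses the locale's preferred
    # encoding, which can mis-decode or reject non-ASCII stop words on Windows.
    with open(path, encoding="utf-8") as infile:
        return [line.strip() for line in infile if line.strip()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        stopwords_file = Path(folder) / "stopwords.txt"
        stopwords_file.write_text("être\nüber\nи\n", encoding="utf-8")
        print(load_stopwords(stopwords_file))
```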

commit a03e5fd4252e7242563c291558606440256eb3d1
Author: Dale Wahl <[email protected]>
Date:   Mon Aug 19 14:19:21 2024 +0200

    remove duplicate line

commit aa07e8c13c2d59c6b699f78133036514659ee420
Author: Dale Wahl <[email protected]>
Date:   Mon Jul 29 09:35:22 2024 +0200

    tweet import fix: author banner key missing when author has no banner

commit 32dac5d2ffb936210f12f5c725514fd25a0286f1
Author: Dale Wahl <[email protected]>
Date:   Mon Jul 29 08:52:08 2024 +0200

    tell user when dataset is not found

    we could have a proper 404 page, but at least leave a message

commit 2c8c860fc5378113d1352016ac26ca761adecb32
Author: Dale Wahl <[email protected]>
Date:   Mon Jul 22 17:41:00 2024 +0200

    telegram fix: reactions datastructure

commit 1c0bf5e580eb16d8a6f9afa415f9febce449a537
Author: Dale Wahl <[email protected]>
Date:   Mon Jul 22 11:19:52 2024 +0200

    fix telegram: crawl_max_depth can be None if it is not enabled for a user

commit 3dfe7af292b33574a31630e3a0da10954ed87d0a
Author: Dale Wahl <[email protected]>
Date:   Fri Jul 19 11:52:31 2024 +0200

    fix more config.get() magic

commit 2453182bcee6e54b396b762ab77b60b8a0893638
Author: Dale Wahl <[email protected]>
Date:   Fri Jul 19 10:54:23 2024 +0200

    config_manager - fix `get_all` w/ one result (super rare edge); fix overwriting self.db in `with_db`

commit 6b9cb0b5479e6e64e09a49fa2ca9effe1c5a7415
Author: Dale Wahl <[email protected]>
Date:   Wed Jul 17 15:20:49 2024 +0200

    add surf nginx init file

commit 5e984e13a08d9fba7d5806a7ef4e012ce7d57319
Author: Dale Wahl <[email protected]>
Date:   Wed Jul 17 14:30:34 2024 +0200

    change port for surf

commit 2ce8c354e90f939a16dad3f0155fd7d79405c79e
Author: Dale Wahl <[email protected]>
Date:   Wed Jul 17 12:54:11 2024 +0200

    use latest image on surf

commit 13ec0fd3f2bed86c3b2dff73014093a6a92fbfb5
Author: Dale Wahl <[email protected]>
Date:   Wed Jul 17 12:46:59 2024 +0200

    update surf docker-compose.yml

     this may require a new release

commit 78698f6ac1b22b1154d31f69543ba7b266d33191
Author: Dale Wahl <[email protected]>
Date:   Wed Jul 17 10:34:56 2024 +0200

    clip: handle new and old format

commit eb7693780cb191403f107817ca30d90373929bf0
Author: Dale Wahl <[email protected]>
Date:   Tue Jul 16 14:27:08 2024 +0200

    DMI SM updates to use status endpoint w/ database records; run on CPU if no GPU enabled

commit d2a787e2c1559417bb5401f3208c82954052504f
Author: Stijn Peeters <[email protected]>
Date:   Mon Jul 15 15:58:06 2024 +0200

    Require most recent Telethon version

commit 346150bd9cc96ac099abd4d15fa3de39bd65e9d1
Author: Stijn Peeters <[email protected]>
Date:   Mon Jul 15 15:57:55 2024 +0200

    Catch UPDATE_APP_TO_LOGIN in Telegram

commit 04acc06e95098d7e2f9b4af404447c9cfaee5b99
Author: Stijn Peeters <[email protected]>
Date:   Mon Jul 15 11:27:30 2024 +0200

    Unbreak Twitter error handling

commit e9b5232a963be02c2e86dabacb607b2315a4e0e6
Author: Stijn Peeters <[email protected]>
Date:   Fri Jul 12 13:27:15 2024 +0200

    Ensure str type when trying to extract video URLs from a field

commit d69dd6f337cac05ed31c05334890679976a1e6de
Author: Stijn Peeters <[email protected]>
Date:   Fri Jul 12 12:31:14 2024 +0200

    Make CSV column mapping params look nicer on result page

commit 9bd9da568f593085a8d54744836e3290a75b51a7
Author: Stijn Peeters <[email protected]>
Date:   Fri Jul 12 12:22:03 2024 +0200

    Add "empty" and "current timestamp" as options to CSV mapping

commit 0b574571952a206904440faf8601ddf95ab42b24
Author: Dale Wahl <[email protected]>
Date:   Thu Jul 11 16:59:56 2024 +0200

    image_wall: backup fit method

commit eeb1ddeb7ca85b6802dfed3c74d1352062383d50
Merge: 2504c37b 43239467
Author: Stijn Peeters <[email protected]>
Date:   Thu Jul 11 16:47:45 2024 +0200

    Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat

commit 43239467db046eea5eb5268f91d1b63a1042238d
Author: Dale Wahl <[email protected]>
Date:   Thu Jul 11 12:08:08 2024 +0200

    fix processor more button

    would only show top level analysis if not logged in

commit d6ab2b0783f8e40ecd8fadbc2abccffa6f093e39
Author: Dale Wahl <[email protected]>
Date:   Tue Jul 9 15:35:25 2024 +0200

    search_gab - use MappedItem

commit 2504c37b67ff6f19720b44d8bb6054b1c3d5a155
Author: Stijn Peeters <[email protected]>
Date:   Sat Jul 6 17:51:22 2024 +0200

    Fix multiline spacing in multi select list

commit fea66ce38be0717da6c1f847e7124f7069c096e2
Author: Dale Wahl <[email protected]>
Date:   Fri Jul 5 13:15:45 2024 +0200

    use processor media_type if dataset does not have media_type; set default media_type for downloaders

commit d41fa34514e8177efdac7e64a31f2ee75c7d1652
Author: Dale Wahl <[email protected]>
Date:   Fri Jul 5 12:57:18 2024 +0200

    video_hasher: handle no metadata file

commit 2820dcecc36ed4705a2776064d387ff7ed14e84f
Author: Dale Wahl <[email protected]>
Date:   Fri Jul 5 12:50:09 2024 +0200

    num_rows not num_items()

commit fb09162db902fa22fdf2d7a3ed171ce1489bd92f
Author: Dale Wahl <[email protected]>
Date:   Fri Jul 5 12:44:03 2024 +0200

    Google vision API returning 400s; properly log and record processed entries; google networks should not run on empty datasets

commit ebf39d8262d199895aedc4f7fa275c5685e58563
Author: Dale Wahl <[email protected]>
Date:   Fri Jul 5 12:28:13 2024 +0200

    fix image_category_wall

    whoops, cleared categories and post_values after filling them!

commit 1ad9ec2c2e76604793ec37584c051f116af2fdab
Author: Stijn Peeters <[email protected]>
Date:   Fri Jul 5 12:03:54 2024 +0200

    fsdfdsgd sorry

commit c7254c08a477c6cdc8497507e8452c3eff7101c9
Author: Stijn Peeters <[email protected]>
Date:   Fri Jul 5 12:01:21 2024 +0200

    Fix razdel versioning

commit b9a327abe99f2d9ede4f2747f34f20d1dc6803cb
Author: Stijn Peeters <[email protected]>
Date:   Fri Jul 5 11:57:47 2024 +0200

    Reorganise tokeniser, stopwords

commit fb13bc483af9ba0d677ee35fd045bf36ab1cddf7
Merge: 0b745692 e3046496
Author: Stijn Peeters <[email protected]>
Date:   Fri Jul 5 11:56:08 2024 +0200

    Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat

commit e30464964262870c54c73f65a3bce630d6576f45
Author: Dale Wahl <[email protected]>
Date:   Fri Jul 5 10:51:53 2024 +0200

    media_upload allow setting for max_form_part and warn users of failure above certain number of files

commit e4f982b4550b352a5d1a131abd78d52e6c196e48
Author: Dale Wahl <[email protected]>
Date:   Fri Jul 5 09:50:49 2024 +0200

    Update media_import help text; looks like failure happens somewhere between 600-1000 files due to Flask request size limits

commit 0b74569280f8f87376a964a6b160ea1993cb3354
Author: Stijn Peeters <[email protected]>
Date:   Thu Jul 4 17:55:36 2024 +0200

    Add razdel as option for Russian tokenisation

commit 9f15a2b8d666c3b6fddeb151b7c424cb44df18a6
Author: Dale Wahl <[email protected]>
Date:   Thu Jul 4 17:13:15 2024 +0200

    remove the log

commit ffcb6a4239075ba190fb534b25b89507e09e5f56
Author: Dale Wahl <[email protected]>
Date:   Thu Jul 4 17:12:43 2024 +0200

    Inform user if too many files are uploaded

    I do not understand why this is appearing. app.config['MAX_CONTENT_LENGTH'] is set to None. Problem persists in Flask alone (i.e., does not appear to be Gunicorn/Nginx/Apache).

commit 9cad12dd6f64a63c48d3b5b304b5c7d9d1a6ddb7
Author: Stijn Peeters <[email protected]>
Date:   Thu Jul 4 15:09:42 2024 +0200

    Bump version

commit aad94f393de77cc9d4f578e1f5be66a3601a4c90
Author: Dale Wahl <[email protected]>
Date:   Thu Jul 4 10:51:01 2024 +0200

    Update setup.py to ensure videohash updates

commit d9154a6f9c46a5c793909b88da751bc71d6f759f
Author: Dale Wahl <[email protected]>
Date:   Tue Jul 2 17:45:26 2024 +0200

    clip: categorizing requires categories...

    seriously, guys?

commit 0af9a5ec49bd2bcfbb87bda33976c65683f68777
Author: Dale Wahl <[email protected]>
Date:   Tue Jul 2 17:31:49 2024 +0200

    blip2: fix no metadata file found (uploads...)

commit d695053f440bd938a57f06adea7b9c732ecf30d7
Author: Dale Wahl <[email protected]>
Date:   Tue Jul 2 17:25:26 2024 +0200

    cat_vis_wall - use str as category type if mixed

    i.e., use floats as string categories

commit bcb914076760ea1fb0e277cdcd1782ffa101b535
Author: Sal Hagen <[email protected]>
Date:   Tue Jul 2 16:06:43 2024 +0200

    Add Twitter author profile pic and banner URLs

commit 1b3b02f826578e8f702ea84a27c8ced7b1fab345
Author: Dale Wahl <[email protected]>
Date:   Tue Jul 2 11:42:50 2024 +0200

    add migrate.py log file in Docker

commit 2aaa972e6888743fc329d721c37fa626cf2eeae3
Author: Dale Wahl <[email protected]>
Date:   Tue Jul 2 11:42:22 2024 +0200

    add necessary pip packages for upgrade in Docker environment; add error logging and save to file for troubleshooting

commit 18b8a53c01b334e0f70610b1305d380b25dbe9c6
Author: Dale Wahl <[email protected]>
Date:   Tue Jul 2 11:41:36 2024 +0200

    update Dockerfile to keep build environment

    useful for interactive upgrade

commit 7b224b9b798c9aaf956b5b618b98d742c4a2e7cd
Author: Dale Wahl <[email protected]>
Date:   Tue Jul 2 11:41:12 2024 +0200

    remove docker-compose.yml versions

commit acf5de0ed02e144b920a80abfdfa35986dd0ed4c
Author: Stijn Peeters <[email protected]>
Date:   Mon Jul 1 17:38:32 2024 +0200

    Better issues.md, footer link

commit 1953ca3895656ca9a12d2657e58019795ae64b3a
Author: Dale Wahl <[email protected]>
Date:   Mon Jul 1 12:00:07 2024 +0200

    FIX: get_key() is more of a creating of a key than general getting of a key...

commit 12289bb5c766d1af23799ff11278b46b48fc2841
Author: Dale Wahl <[email protected]>
Date:   Mon Jul 1 11:37:06 2024 +0200

    .metadata.json may not have top_parent via Media Uploader

    This may exist in other processors if a proper check is not in place; will need to review

commit 25f4ed65ec2c32298a90490cf51037a7ea2d0bf9
Author: Dale Wahl <[email protected]>
Date:   Tue Jun 25 14:43:40 2024 +0200

    Media upload datasource! (#419)

    * basic changes to allow files box

    * basic imports, yay!

    * video_scene_timelines to work on video imports!

    * add is_compatible_with checks to processors that cannot run on new media top_datasets

    * more is_compatible fixes

    * necessary function for checking media_types

    * enable more processors on media datasets

    * consolidate user_input file type

    * detect mimetype from filename

    best I can do without downloading all the files first.

    * handle zip archives; allow log and metadata files

    * do not count metadata or log files in num_files

    * move machine learning processors so they can be imported elsewhere

    * audio_to_text datasource

    * When validating zip file uploads, send list of file attributes instead of the first 128K of the zip file

    * Check type of files in zip when uploading media

    * Skip useless files when uploading media as zip

    * check multiple zip types in JS

    * js !=== python

    * fix media_type for loose file imports; fix extension for audio_to_text preset; fix merge for some processors w/ media_type

    ---------

    Co-authored-by: Stijn Peeters <[email protected]>
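
The "detect mimetype from filename" step in the commit above can be illustrated with Python's standard `mimetypes` module, which guesses a type from the file extension alone and so works before any file contents are available. This is a sketch under that assumption; the function name `media_type_from_filename` is hypothetical and the log does not confirm which mechanism 4CAT actually uses.

```python
import mimetypes


def media_type_from_filename(filename):
    """Guess the top-level media type ("image", "video", ...) from a file name."""
    # guess_type() looks only at the extension; it never opens the file,
    # so a mislabelled file will be mis-detected.
    mime_type, _encoding = mimetypes.guess_type(filename)
    if mime_type is None:
        return None
    # "image/png" -> "image"
    return mime_type.split("/")[0]


if __name__ == "__main__":
    for name in ("photo.png", "no_extension"):
        print(name, media_type_from_filename(name))
```

Because the guess relies on the extension only, a later check of the actual file contents (as the "Check type of files in zip" bullet suggests) remains useful.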

commit 4ce689bdc3e441a7adf85883ddcda6bae0525ed9
Author: Stijn Peeters <[email protected]>
Date:   Mon Jun 24 11:58:50 2024 +0200

    Avoid KeyError

commit 155522d0817d19ac7b6b0b0164242156d6f7443a
Author: Dale Wahl <[email protected]>
Date:   Thu Jun 20 15:58:21 2024 +0200

    add generated images to image wall w/ text visual

commit eecde519eab1208eeb6ee53c2d8febff7fb8febf
Author: Dale Wahl <[email protected]>
Date:   Thu Jun 20 15:57:56 2024 +0200

    allow users to NOT generate all images from prompts

commit d0b9574093a109997e63b1062b2bdd8e71300a29
Author: Stijn Peeters <[email protected]>
Date:   Wed Jun 19 16:28:26 2024 +0200

    ...don't mangle URLs in preview links

commit c105e368a521ec54ae717bb9eb2fe9fae66cf6e8
Merge: 0028a999 8d4f99b2
Author: Dale Wahl <[email protected]>
Date:   Wed Jun 19 16:25:36 2024 +0200

    Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat

commit 0028a9994d698611dd8b546b9b3bccbeec30b74f
Author: Dale Wahl <[email protected]>
Date:   Wed Jun 19 16:25:12 2024 +0200

    add followups to processors

commit 8d4f99b22e0308606c7f713ef704dfa939e85247
Author: Stijn Peeters <[email protected]>
Date:   Wed Jun 19 16:17:22 2024 +0200

    More flexible URL linking in CSV preview

commit f4f8e6621bd6f2504dc3afc2078280bf5edb6444
Author: Dale Wahl <[email protected]>
Date:   Wed Jun 19 13:54:00 2024 +0200

    tokeniser fix: use default lang for word_tokenize if language is 'other'

commit 127472e91d8e510f3de2a9cc4a87be6cf2d0deaa
Author: Stijn Peeters <[email protected]>
Date:   Tue Jun 18 16:45:01 2024 +0200

    Better log messages for Telegram data source

commit e8714b6fba72e00c690a8d643d8dc54d2250c94a
Author: Stijn Peeters <[email protected]>
Date:   Mon Jun 17 17:42:21 2024 +0200

    Add 'crawl' feature to Telegram data source

    Fixes #321 (though might need a bit more testing)

commit 25fded7b596097f7916e1793f1841bae2b63d453
Merge: d67cf440 b10e3bb8
Author: sal-phd-desktop <[email protected]>
Date:   Fri Jun 14 16:23:02 2024 +0200

    Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat

commit d67cf440730ea1d4e124c76a4c21d65b56f39c68
Author: sal-phd-desktop <[email protected]>
Date:   Fri Jun 14 16:22:59 2024 +0200

    Fix export 4chan script and remove some unnecessary code

commit b10e3bb8f0c8a67aa5fdbba1962301d8acdf625c
Author: Dale Wahl <[email protected]>
Date:   Thu Jun 13 15:14:06 2024 +0200

    video_hasher prefix: fix extension type

commit ba565cdaa2ebeecf23fd60889d546c76b9ea5eb1
Author: Dale Wahl <[email protected]>
Date:   Thu Jun 13 14:53:13 2024 +0200

    video_hasher: fix to work with Pillow updates; add max amount videos

commit 90da5d231eff6a4249bef5468fcdbf1ebcf9247a
Author: Dale Wahl <[email protected]>
Date:   Thu Jun 13 10:25:24 2024 +0200

    image_cat_wall fix the fix

commit a8b943d8e2c5471f82ea0442e2659d84fe8d9760
Author: Dale Wahl <[email protected]>
Date:   Wed Jun 12 13:29:41 2024 +0200

    add OCR processor to image w/ text visualization

commit e7e636b6b89b6163fa6976e67edba68e7d75b7ac
Author: Dale Wahl <[email protected]>
Date:   Tue Jun 11 15:23:12 2024 +0200

    add image_wall_w_text to follow on BLIP captions

commit f74b97827f0465baf8483040471a77e4654e70b1
Author: Dale Wahl <[email protected]>
Date:   Thu Jun 6 11:05:25 2024 +0200

    image_category_wall: allow multiple images per item/post

commit e3c9ea57d46b32ba47b00a6047a278ddd530adc1
Author: Dale Wahl <[email protected]>
Date:   Thu May 30 16:27:50 2024 +0200

    image_category_wall convert None to str for category

commit 00874576c354235f4655f1d433ec4382010e18e3
Author: Dale Wahl <[email protected]>
Date:   Thu May 30 14:54:51 2024 +0200

    image_category_wall fix float categories

commit e0c55a8ae132bedef5da27ecbbb9489a094d454c
Author: Dale Wahl <[email protected]>
Date:   Thu May 30 12:51:42 2024 +0200

    download_images fix divide by zero when user can download all

commit 3580fc9450501262badb8e61ef4b4df4b4c54322
Author: Dale Wahl <[email protected]>
Date:   Thu May 30 12:51:24 2024 +0200

    image_category_wall remove 'max' when user can use all images

commit f2145bdeff1d68e46cdd3521ecbb61573f01a2f2
Author: Dale Wahl <[email protected]>
Date:   Wed May 29 17:59:23 2024 +0200

    rank_attributes: option to count missing data or blanks

commit 01e7ab9677a75181bbedc62fa00e636ce2b17c18
Author: Dale Wahl <[email protected]>
Date:   Wed May 29 16:53:57 2024 +0200

    fix missing field strategy so default_stategy not overwritten on second loop

    default_stategy would be set correctly to the callable, but overwritten on the second loop (and map_missing is a dictionary at that point).

commit 097f838af1f5f2748578dd9072eb9e3a8b3a7057
Author: Dale Wahl <[email protected]>
Date:   Tue May 28 12:16:08 2024 +0200

    add log_level arg to 4cat-daemon.py

    I've been using this forever and don't know why I haven't committed it

commit fd3ac238e60f052889d99c71588170570a384900
Author: Dale Wahl <[email protected]>
Date:   Tue May 28 10:10:56 2024 +0200

    google & clarifai to csv had identical "type"

    possibly caused issue w/ preset

commit 1b9965d40aa33035a73f685c13a1ab50cc877f78
Author: Stijn Peeters <[email protected]>
Date:   Mon May 27 15:54:20 2024 +0200

    Ensure file cleanup worker always exists

commit 0e0917f2232e240df3412fd4df51cf0be19248b5
Author: Stijn Peeters <[email protected]>
Date:   Thu May 23 17:36:22 2024 +0200

    Also update Spacy model versions...

commit f40128213529d154cfb77afa7aa67a72d5bb640f
Author: Stijn Peeters <[email protected]>
Date:   Thu May 23 17:32:35 2024 +0200

    *Actually* remove typing_extensions dependency

    ???

commit ba3d83b824c5fb6fcb0aec5e1c36b35070d6e5d9
Author: Stijn Peeters <[email protected]>
Date:   Thu May 23 17:30:08 2024 +0200

    Update minimum Pillow dependency version

commit 1c3485648bf2a911052eeeae4f293f303a944aec
Author: Stijn Peeters <[email protected]>
Date:   Thu May 23 17:27:27 2024 +0200

    Do not require typing_extensions explicitly

    This was required to ensure Spacy could load - looks like Spacy has since been updated to work with newer versions of typing_extensions as well

commit 3828de83ba123254463a904392f24daec626c136
Author: Stijn Peeters <[email protected]>
Date:   Thu May 23 17:02:04 2024 +0200

    Bump version

commit 8f0d098107a4bbc9d55cc6048f7a38f1d1891a32
Author: Stijn Peeters <[email protected]>
Date:   Thu May 23 17:01:28 2024 +0200

    Require non-broken version of emoji library

commit 4b2ad805fcc99a83e46732fc991d98d78ef06c6c
Author: Stijn Peeters <[email protected]>
Date:   Thu May 23 13:11:03 2024 +0200

    Show worker progress in control panel if available

commit 9144d4503f46108437616d6bc0cf4fde74df3aca
Author: Stijn Peeters <[email protected]>
Date:   Thu May 23 11:07:41 2024 +0200

    Bump version

commit 807ab77101d197ec897640480a2140439d570c05
Author: Stijn Peeters <[email protected]>
Date:   Wed May 22 21:57:11 2024 +0200

    Fix Instagram upload with missing media URL

commit d0b4840fd465b6d21657c3d50f9291ac911b6082
Author: Stijn Peeters <[email protected]>
Date:   Wed May 22 17:35:04 2024 +0200

    Comma comma comma

commit 7fd2e14c9505d0ed1ac77dc09c24f766ea61ee6c
Author: Stijn Peeters <[email protected]>
Date:   Wed May 22 17:25:26 2024 +0200

    Fix progress indicator for scene extractor

commit 661c42c2d083da7004335b0e14910935c3d392f6
Author: Stijn Peeters <[email protected]>
Date:   Wed May 22 17:12:21 2024 +0200

    Don't crash video hasher on non-str item IDs

commit 1f280321cdde27a9909885fa2f64dbeffa549fb1
Author: Stijn Peeters <[email protected]>
Date:   Wed May 22 17:09:53 2024 +0200

    Do not crash timelines processor when metadata has unexpected format

commit 572d03f1f368f0ad5f47e705a119b37646148d1d
Author: Stijn Peeters <[email protected]>
Date:   Wed May 22 17:09:30 2024 +0200

    More efficient video frame extractor

commit 1b51d224ca544d7e2913238adbff2049412bc41e
Author: Stijn Peeters <[email protected]>
Date:   Wed May 22 17:04:27 2024 +0200

    Fix crash in video stack processor with ffmpeg < 5.1

commit ddc73cb2e2f0985e64f84ca86bc167fa9e9dc81a
Author: Stijn Peeters <[email protected]>
Date:   Wed May 22 17:03:48 2024 +0200

    Helper function for determining ffmpeg version

commit ef9dd482b2258c428584997dc661156f63f68b91
Author: Stijn Peeters <[email protected]>
Date:   Wed May 22 12:14:58 2024 +0200

    Allow absence of articleComponent in LinkedIn posts

commit 060f2cd7f922e7fae337b0697f7c477442d21ef1
Author: Stijn Peeters <[email protected]>
Date:   Wed May 22 12:12:54 2024 +0200

    Cast post IDs to string when mapping video scenes

commit ab34c415c9ada23763b45676639ce3e80a34f594
Author: Stijn Peeters <[email protected]>
Date:   Wed May 22 11:46:39 2024 +0200

    Twitter -> X/Twitter

commit de6d97554ccb68375979e5ff09c7e65d8d70a6cd
Author: Stijn Peeters <[email protected]>
Date:   Wed May 22 11:45:19 2024 +0200

    Colleges -> Collages

commit 30365580dc59b4d95e8a62d1b3c666bef60ce7e8
Author: Stijn Peeters <[email protected]>
Date:   Tue May 21 15:41:55 2024 +0200

    Explicit disconnect after Telegram image download

commit 5727ff7230db42463a824f45d63f0b8343caac14
Author: Stijn Peeters <[email protected]>
Date:   Tue May 21 14:05:50 2024 +0200

    Catch TimedOutError while downloading Telegram images

commit e0e06686e78976f971aac620267d7e009eaaadff
Author: Sal Hagen <[email protected]>
Date:   Mon May 13 13:01:42 2024 +0200

    Typo in LinkedIn search

commit 51e58dde6ca21278a80f252a8c22dc83d87ace1f
Author: Dale Wahl <[email protected]>
Date:   Tue May 7 13:10:43 2024 +0200

    text_from_image: fix metadata missing (indent issue)

commit c1f8ecc1674375bba2b2e38cb29c9d4d44098f0a
Author: Dale Wahl <[email protected]>
Date:   Tue May 7 09:45:25 2024 +0200

    text_from_image fix: ensure metadata success before attempting to update original

commit 72dbf80db71499c59133e1128205b756d240b300
Merge: d7561625 baacc86b
Author: Stijn Peeters <[email protected]>
Date:   Fri May 3 13:14:08 2024 +0200

    Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat

commit d7561625b127573fbb0332fbb713be6a3cb3d953
Author: Stijn Peeters <[email protected]>
Date:   Fri May 3 13:14:03 2024 +0200

    Comments without replies don't always have reply_comment_total

commit baacc86b269612b4b0956345f8b9fa902df1b61f
Author: Dale Wahl <[email protected]>
Date:   Fri May 3 12:01:22 2024 +0200

    DSM fix and simplify GPU mem check

commit 9b662e9f9b4f4ce194608c8e20a8fc50bc6d9ae3
Author: Parker-Kasiewicz <[email protected]>
Date:   Thu May 2 00:53:45 2024 -0700

    Adding Gab as a Data Source! (#401)

    * Can successfully import gab data, although
    can't tell if formatting is right because
    waiting on queued requests.

    * Version w/ different item types

    * Ingest Gab posts from Zeeschuimer

    * Small fix for merge conflicts (whoops)

    * Gab processing logic transferred from Zeeschuimer

    * fixing small errors for Gab data source

    * basic processing for truth social from Zeeschuimer

    ---------

    Co-authored-by: Dale Wahl <[email protected]>

commit 3ecb8fd9c27aee4c457f03516794c6c4eac19c09
Author: Stijn Peeters <[email protected]>
Date:   Wed May 1 17:51:36 2024 +0200

    Fix duplicate line in views_admin.py

commit 8b66ae7e467913f8e7571cf4b45493f63804266f
Author: Stijn Peeters <[email protected]>
Date:   Wed May 1 17:49:54 2024 +0200

    Allow processors to define which fields should be pseudonymised

commit c973750c8cabb8698704c5997903e92d1de866d2
Author: Stijn Peeters <[email protected]>
Date:   Wed May 1 17:15:32 2024 +0200

    Allow auto-queue of pseudonymisation after import

commit 49ad9f0ff785fd44ae494755b785c7fdf7c9cf15
Author: Stijn Peeters <[email protected]>
Date:   Wed May 1 17:08:35 2024 +0200

    Get rid of redundant and buggy next/copy_to implementation in Search class

commit 106d3659e2fda89867d3a4f587c1c1addfaff2f7
Author: Dale Wahl <[email protected]>
Date:   Wed May 1 16:14:03 2024 +0200

    use current branch in settings

commit 60bef4157d807f7c01ef3b425295244e91919f31
Author: Stijn Peeters <[email protected]>
Date:   Wed May 1 11:04:07 2024 +0200

    Nicer code

commit 4182c436e4fb5109c5e041dc729f77a58d877889
Author: Stijn Peeters <[email protected]>
Date:   Tue Apr 30 16:19:36 2024 +0200

    Always shut down API worker only after everything else has been shut down

commit e685108b3cbe5f005ce2df21906267071ad8118e
Author: Stijn Peeters <[email protected]>
Date:   Tue Apr 30 16:12:42 2024 +0200

    Properly interrupt expiration worker when asked

commit 27a568eca7f2f3742223fef6285eaf80583e0fc4
Author: Stijn Peeters <[email protected]>
Date:   Tue Apr 30 13:40:50 2024 +0200

    Allow floats-as-strings as timestamps when importing CSV
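
An illustrative sketch of what accepting floats-as-strings can look like; this is not the code from the commit. The helper name `parse_timestamp` is hypothetical. The point is that `int("1714475650.0")` raises `ValueError`, while going through `float` first accepts both forms.

```python
from datetime import datetime, timezone


def parse_timestamp(value):
    """Hypothetical helper: accepts 1714475650, "1714475650" or "1714475650.0"."""
    # float() parses both integer-like and float-like strings;
    # int() then drops the fractional part
    return int(float(value))


ts = parse_timestamp("1714475650.0")
print(ts)  # 1714475650
print(datetime.fromtimestamp(ts, tz=timezone.utc).isoformat())
```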

commit 2d2bbb9fdb9b426b8f4a80782f04257721a97f2e
Author: Dale Wahl <[email protected]>
Date:   Tue Apr 30 13:05:07 2024 +0200

    douyin: add consistency to map_item stats

commit 289aa342c9912aceeca35887c079c72aa6ffbf52
Author: Dale Wahl <[email protected]>
Date:   Mon Apr 29 15:26:38 2024 +0200

    fix collection data in Douyin to handle $undefined

commit 5b9b23fb1696bc1b69e1d902c0a2ad4b7d168984
Author: Dale Wahl <[email protected]>
Date:   Mon Apr 29 13:00:03 2024 +0200

    add scipy requirement to make compatible with gensim

    https://stackoverflow.com/questions/78279136/importerror-cannot-import-name-triu-from-scipy-linalg-gensim

commit 7eab746e944f1ababe3dcd6a5d25387a64c2237d
Author: Stijn Peeters <[email protected]>
Date:   Mon Apr 29 12:00:09 2024 +0200

    stupid, stupid, stupid

commit 90577982ac05019a7ac76818a62f91e84dd65902
Author: Stijn Peeters <[email protected]>
Date:   Mon Apr 29 11:56:22 2024 +0200

    Fix leftover iterate_mapped_items

commit 57dbdf74c49c34c05784debb9f7e258da7ae7d54
Author: Stijn Peeters <[email protected]>
Date:   Fri Apr 26 15:26:39 2024 +0200

    Woops

commit f11760d2c13e817e23cfa5e26b24f74cf817f65e
Author: Stijn Peeters <[email protected]>
Date:   Fri Apr 26 15:26:04 2024 +0200

    Update list of supported platforms in readme

commit 760ff1cdeb006f70acaa00ded82fb3cbc7617c9d
Author: Stijn Peeters <[email protected]>
Date:   Fri Apr 26 12:13:28 2024 +0200

    Bump version

commit 1fd78b2362840299e80f5540c9fedc1be3b06da1
Author: Stijn Peeters <[email protected]>
Date:   Thu Apr 25 12:58:24 2024 +0200

    Use MissingMappedField for Douyin fields undefined in the source data

commit 6918baeabc7a08b6a63495c5d38c86b2c88bca44
Author: Stijn Peeters <[email protected]>
Date:   Thu Apr 25 12:31:11 2024 +0200

    Fix Douyin mapping failure if cellRoom is $undefined

commit aad6208167c07686348234daff4dcf9cd036f5a5
Author: Stijn Peeters <[email protected]>
Date:   Thu Apr 25 12:30:53 2024 +0200

    Better error when trying to import data for unknown datasource

commit 43c6ed646994111188bde66d5bcfe4ab602e8512
Author: Stijn Peeters <[email protected]>
Date:   Thu Apr 25 12:30:31 2024 +0200

    Fix Twitter mapping on URLs that cannot be expanded

commit 91c3da176fad90ba16871fa8892fac5a0df13785
Author: Stijn Peeters <[email protected]>
Date:   Thu Apr 25 12:12:54 2024 +0200

    Safe cast to int in CrowdTangle import

commit 765f29e9232afdf284ab1667b0f371951e0bf2f4
Author: Stijn Peeters <[email protected]>
Date:   Wed Apr 24 12:37:02 2024 +0200

    Fix erroneous shell command in front-end restart trigger

commit c99fdd9eca8f5925d93375cac846e8b7633194fb
Merge: 342a4037 bc1deddf
Author: Stijn Peeters <[email protected]>
Date:   Tue Apr 23 12:29:35 2024 +0200

    Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat

commit 342a4037411e7ccaa50b25a4686434bec39e2568
Author: Stijn Peeters <[email protected]>
Date:   Tue Apr 23 12:29:32 2024 +0200

    Enable TikTok comment and Gab import by default

commit bc1deddf57aa5049fb79622c4309fb7051d77bdb
Merge: 537d7645 3c644f01
Author: Dale Wahl <[email protected]>
Date:   Tue Apr 23 12:16:37 2024 +0200

    Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat

commit 537d76456e2826e8c4dd7026ec5b2d436370fad8
Author: Dale Wahl <[email protected]>
Date:   Tue Apr 23 12:14:46 2024 +0200

    do the todo: fix column_filter to match exact/contains with int

commit 3c644f01baeca34e712d36efdf5c77ccd3ef7a06
Author: Stijn Peeters <[email protected]>
Date:   Tue Apr 23 11:16:07 2024 +0200

    Don't crash on empty URLs in dataset merge

commit f1574c26e2e3bdc40cc04bb8193cf6d3fa14792b
Author: Dale Wahl <[email protected]>
Date:   Thu Apr 18 12:08:55 2024 +0200

    fix: do not fail when no processor exists

    weird! failed on a dataset `type="custom-search"` which was created by an import script w/ no processor. Also likely would make deprecated processors fail.
    500 server error:
    ```
    File "/opt/4cat/common/lib/dataset.py", line 800, in get_columns
         return self.get_item_keys(processor=self.get_own_processor())
       File "/opt/4cat/common/lib/dataset.py", line 405, in get_item_keys
         keys = list(items.__next__().keys())
       File "/opt/4cat/common/lib/dataset.py", line 337, in iterate_items
         if own_processor.map_item_method_available(dataset=self):
     AttributeError: 'NoneType' object has no attribute 'map_item_method_available'
    ```
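
The traceback above comes from calling a method on a processor that is `None`. A minimal sketch of one way to guard against that follows. The method names are taken from the traceback, but the `Dataset` class here is a hypothetical stand-in and not 4CAT's actual implementation.

```python
class Dataset:
    """Hypothetical stand-in for a dataset; not 4CAT's real class."""

    def __init__(self, items, processor=None):
        self.items = items
        self.processor = processor

    def get_own_processor(self):
        # may be None, e.g. for imported data or deprecated processors
        return self.processor

    def iterate_items(self):
        own_processor = self.get_own_processor()
        # check for None before calling anything on the processor
        can_map = (
            own_processor is not None
            and own_processor.map_item_method_available(dataset=self)
        )
        for item in self.items:
            yield own_processor.map_item(item) if can_map else item

    def get_item_keys(self):
        first = next(self.iterate_items(), None)
        return list(first.keys()) if first is not None else []


print(Dataset([{"id": 1, "body": "hi"}]).get_item_keys())  # ['id', 'body']
```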

commit 50a4434a37d71af6a9470c7fc4a236b043cbfb4d
Author: Stijn Peeters <[email protected]>
Date:   Wed Apr 17 14:30:58 2024 +0200

    Add "TikTok comments" data source

commit c43e76daae3c2e6ecdb218ee749315b985eccca4
Author: Stijn Peeters <[email protected]>
Date:   Tue Apr 16 17:59:25 2024 +0200

    Allow notifications per tag

commit 36984104e674e8577756bfc3fdd5c72f6569d9e1
Author: Dale Wahl <[email protected]>
Date:   Tue Apr 16 17:25:38 2024 +0200

    fix: pass dataset to get_options when queuing processors

commit 59cb19a3c88f7f4a4ac02d0b7a891afde50ea069
Author: Dale Wahl <[email protected]>
Date:   Tue Apr 16 10:55:29 2024 +0200

    fix: dicts are shared in classes & you cannot delete a key more than once

    randomly found this; probably as no one else has reddit enabled!
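
A small sketch of the failure mode this message describes, under assumed names (`Source`, `amount` and both method names are hypothetical). A dict defined on the class is one object shared by all callers, so a second `del` of the same key raises `KeyError`. Working on a copy is one way around it.

```python
import copy


class Source:
    # hypothetical option definition; one dict object shared via the class
    options = {"amount": {"type": "int", "max": 100}}

    @classmethod
    def get_options_unsafe(cls):
        del cls.options["amount"]["max"]  # mutates the shared dict
        return cls.options

    @classmethod
    def get_options_safe(cls):
        options = copy.deepcopy(cls.options)  # private copy per call
        options["amount"].pop("max", None)
        return options


Source.get_options_safe()
Source.get_options_safe()    # fine: the class attribute is untouched

Source.get_options_unsafe()  # first call removes the key for everyone
try:
    Source.get_options_unsafe()
    second_call_failed = False
except KeyError:
    second_call_failed = True
print(second_call_failed)  # True
```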

commit 3ec9c6ea471bcdbe9fb1caad1e5fe1502a705444
Author: Dale Wahl <[email protected]>
Date:   Mon Apr 15 13:22:19 2024 +0200

    fix results page error when dataset was being created; do not check for resultspage updates when user not focused on page

commit db05ae5e565248e865e67b8ea60e6653357bb1f4
Author: Dale Wahl <[email protected]>
Date:   Mon Apr 15 11:27:33 2024 +0200

    on import file, differentiate between missing field(s) and unable to map item

commit 940bac72c7e53bec9e136867c13e2a0a355961a4
Author: Stijn Peeters <[email protected]>
Date:   Fri Apr 12 12:57:48 2024 +0200

    Case-insensitive username/note matching in user list

commit d0f34245bd07b5ad2fd3e90754ef0264ffc350a9
Author: Stijn Peeters <[email protected]>
Date:   Fri Apr 12 12:29:12 2024 +0200

    Only determine settings tab name in one place

commit 9f69d7bc0bbb657be1e725d5fb3fe350b7205bff
Author: Stijn Peeters <[email protected]>
Date:   Fri Apr 12 12:20:34 2024 +0200

    git != github

commit 9b4981d8c7358f31ed65d9f161d556e578389801
Author: Stijn Peeters <[email protected]>
Date:   Fri Apr 12 11:56:04 2024 +0200

    Fix issues with user tags

    Fix number of users in tag overview; allow filtering by user tags on user list; don't delete all user tags when deleting one

commit 9e8ccd3a78765acdfd2005eaa215dc0dc07266e0
Author: Stijn Peeters <[email protected]>
Date:   Fri Apr 12 11:32:45 2024 +0200

    Do not hide all non-hidden child processors

    lol

commit 3f15410af3a278f5644f41f49e25498a1fac3c76
Author: Stijn Peeters <[email protected]>
Date:   Fri Apr 12 11:23:52 2024 +0200

    Disable standard video downloader for Telegram

commit 94c814b9cab2ae2be10d5c5d3f6cfe20898e349c
Author: Stijn Peeters <[email protected]>
Date:   Fri Apr 12 11:14:16 2024 +0200

    Telegram video downloader processor

commit d36254a188947fff507e8df59f793e98b3be1570
Author: Stijn Peeters <[email protected]>
Date:   Fri Apr 12 11:14:04 2024 +0200

    Better styling for 4CAT settings, alphabetic order, submenus

commit 808300fa109f306a921f2048b2cf4b6dafc4ba5f
Author: Stijn Peeters <[email protected]>
Date:   Thu Apr 11 14:44:32 2024 +0200

    Fix multiselect in UI

commit 131a0eca0ad514b1ee57803e5c560ab0e56de42d
Author: Stijn Peeters <[email protected]>
Date:   Mon Apr 8 18:28:04 2024 +0200

    Do not attempt to load crashed file as module in Slack webhook. Fixes #422 (hopefully)

commit 6d8cb067bc12f8be68749f74a7291e0849494225
Author: Stijn Peeters <[email protected]>
Date:   Fri Apr 5 19:43:58 2024 +0200

    Allow comma-separated list when adding new dataset owners

commit 2612aea49f63c37ac691cc89c553c764ead2344f
Author: Stijn Peeters <[email protected]>
Date:   Fri Apr 5 19:40:04 2024 +0200

    Include number of users with tag on tag page

commit 39f2ec40faa3b8493bd5525279aeaeb2e4f586e0
Author: Stijn Peeters <[email protected]>
Date:   Fri Apr 5 19:26:02 2024 +0200

    Fix confirmation before deleting user tag

commit b00a410a3441e7f2a9d73a9f2dfb0f4ef70ea8a5
Author: Stijn Peeters <[email protected]>
Date:   Fri Apr 5 19:25:01 2024 +0200

    Add link to users with tag on tag admin page

commit 3ef3e5ec9adbd8ddd128ce2b3f8fa3b1de1297e3
Author: Stijn Peeters <[email protected]>
Date:   Fri Apr 5 18:49:25 2024 +0200

    Give filtered datasets a more sensible label, based on source dataset

commit 0d5870b78fb73cb58231736cc8a2efbb0b3cd88a
Author: Dale Wahl <[email protected]>
Date:   Fri Apr 5 17:40:57 2024 +0200

    update iterate methods (#418)

    * working to make iterate_mapped_item primary method used by processors and elsewhere in 4CAT; iterate_item method only internally (and provide item directly as is from file) with iterate_mapped_object as intermediate method to use map_missing method and handle missing values as well as warn if needed

    * switch from iterate_items to iterate_mapped_items; careful attention to item_to_yield allowing a choice of the original item, the mapped item, or both

    * revert some unnecessary renaming

    * fix annotations bug...

    this fixes the bug, but i noticed that the notations saved in the database do not have the correct post IDs.

    * Introduce DatasetItem class and simplify iterate_items

    * Don't crash when no item mapper

    * ...actually commit the DatasetItem class

    * Fix typos in comment

    ---------

    Co-authored-by: Stijn Peeters <[email protected]>
    Co-authored-by: Sal Hagen <[email protected]>

commit 17b77351c51ace21b7057276bbae9da2643a3fc4
Author: Stijn Peeters <[email protected]>
Date:   Fri Apr 5 16:20:19 2024 +0200

    Allow dynamic form options in processors (#397)

    * Allow dynamic form options in processors

    * Allow 'requires' on data source options as well

    * Handle list values with requires

    * Wider support for file upload in processors

    * Log file uploads in DMI service manager

    * fix error w/ datasources having file option

    * fix fourcat.js use of checkboxes for dynamic settings

    * Fix faulty toggleButton targeting

    ---------

    Co-authored-by: Dale Wahl <[email protected]>

commit 693fcedc93ee4476a60d0e0876e688f82a8526fa
Author: Dale Wahl <[email protected]>
Date:   Fri Apr 5 15:59:10 2024 +0200

    Add method to processors to toggle display in UI (#411)

    * add ui_only parameter to DataSet.get_available_processors() and BasicProcessor.display_in_ui()

    Allow using `display_in_ui` to hide processors from UI but allow them to be queued either via API or presets. This avoids issue of is_compatible_with() having to be used to hide processors with sometimes ill effects.

    * keep same data structure....

    * don't delete twice; it's redundant... and raises an error

    * Rename arguments/properties

    * Exclude hidden processors in top level view

    * fix logic

    * Exclude in child template as well

    ---------

    Co-authored-by: Stijn Peeters <[email protected]>
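
A rough sketch of the idea in #411, assuming a simplified processor registry. `display_in_ui` and `is_compatible_with` are named in the message. The `exclude_hidden` argument, the `HiddenHelper` class and the dict-based registry are hypothetical, since the message notes the arguments were renamed later.

```python
class BasicProcessor:
    @classmethod
    def is_compatible_with(cls, dataset=None):
        return True

    @classmethod
    def display_in_ui(cls):
        return True


class HiddenHelper(BasicProcessor):
    """Hypothetical processor that should only run via API or presets."""

    @classmethod
    def display_in_ui(cls):
        return False


def get_available_processors(processors, dataset=None, exclude_hidden=False):
    available = {}
    for name, processor in processors.items():
        if not processor.is_compatible_with(dataset=dataset):
            continue
        # hidden processors stay queueable; they are only left out of UI listings
        if exclude_hidden and not processor.display_in_ui():
            continue
        available[name] = processor
    return available


registry = {"basic": BasicProcessor, "helper": HiddenHelper}
print(list(get_available_processors(registry)))                       # ['basic', 'helper']
print(list(get_available_processors(registry, exclude_hidden=True)))  # ['basic']
```

Keeping visibility separate from compatibility means a preset can still queue `helper`, which is the problem the message attributes to hiding processors through `is_compatible_with()`.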

commit 3cd146c2908da6b3a06a0c1511bf042c4223af0f
Author: Dale Wahl <[email protected]>
Date:   Thu Apr 4 16:41:39 2024 +0200

    fix: whoops remove debug

commit daa7291e813e62fed4600a4acb8430004836cb86
Author: Dale Wahl <[email protected]>
Date:   Thu Apr 4 15:16:30 2024 +0200

    CSV preview add hyperlinks if "url" or "link" in column header

commit 5f2d6e65bad4f71b2c3cc75d2cdab76f15671d4c
Author: Dale Wahl <[email protected]>
Date:   Thu Apr 4 15:16:01 2024 +0200

    blip2 processor to work w/ DMI Service Manager

commit fe881dec18778d99ac4a0f60ca40a1f43fdb1689
Author: Dale Wahl <[email protected]>
Date:   Thu Apr 4 09:53:30 2024 +0200

    catch AttributeError on slackhook if unable to read file

    ever vigilant against a lack of flavour...

commit 2808256b1fabf2e6e8a5a94aad98af60c50fb7b0
Merge: 14123847 eb474640
Author: Dale Wahl <[email protected]>
Date:   Wed Apr 3 17:28:40 2024 +0200

    Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat

commit 14123847b5852bf0e7c84fced6c2380165ec93f6
Author: Dale Wahl <[email protected]>
Date:   Wed Apr 3 17:28:38 2024 +0200

    staging_areas should not be made for completed datasets (else they may be deleted prematurely)

commit eb474640559ee3e914d9c95adb60be09b906f1d6
Merge: bbdf2ab9 3f8b285c
Author: sal-phd-desktop <[email protected]>
Date:   Wed Apr 3 16:50:54 2024 +0200

    Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat

commit bbdf2ab9b4292c14911ac01b481c829defa85e5c
Author: sal-phd-desktop <[email protected]>
Date:   Wed Apr 3 16:50:36 2024 +0200

    Helper script to export the 'classic' 4CAT 4chan data

commit 3f8b285c44c33a3ce08e885889b311bc454a70ea
Merge: 8f40f3f5 f7cc5b8d
Author: Sal Hagen <[email protected]>
Date:   Wed Apr 3 12:12:17 2024 +0200

    Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat

commit 8f40f3f5222a63e93f46eb3b57791d10060a0cc8
Author: Sal Hagen <[email protected]>
Date:   Wed Apr 3 12:12:13 2024 +0200

    Tumblr search typo

commit f7cc5b8d012dec3d8e0c8847ae16c662e82040b5
Author: Stijn Peeters <[email protected]>
Date:   Tue Apr 2 12:32:51 2024 +0200

    More/less flavour in restart worker

commit 073587efc581adca0608988573ac83ea8b0c93d0
Author: Dale Wahl <[email protected]>
Date:   Wed Mar 27 14:15:27 2024 +0100

    create favicon.ico (remove from repo)

    be sure to keep webtool/static/img/favicon/favicon-bw.ico as basis

commit 28d733d56204231f4089660ff61282174aac7aed
Author: Dale Wahl <[email protected]>
Date:   Wed Mar 27 09:44:45 2024 +0100

    add allow_access_request check to request-password page

    clicking it would only return the user to the login page anyway, but better not even show it

commit 1f2cb77e3cb0fc9b5403da52aaa925b33089d18f
Author: Dale Wahl <[email protected]>
Date:   Wed Mar 27 09:37:51 2024 +0100

    fix can_request_access to use 4cat.allow_access_request option

commit 0d66f11d3619af798d5acc41dbf4fe118b7ddad8
Merge: 25825383 05b3fc07
Author: Stijn Peeters <[email protected]>
Date:   Tue Mar 26 17:54:48 2024 +0100

    Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat

commit 2582538303e31470ed6bf8a01645f7b45af15e5d
Author: Stijn Peeters <[email protected]>
Date:   Tue Mar 26 17:54:45 2024 +0100

    More permissive timeout for pixplot

commit 05b3fc0771ded10dc55db799e8f47e42add08d43
Author: Dale Wahl <[email protected]>
Date:   Tue Mar 26 14:01:59 2024 +0100

    remove redundant call of Path

commit e4a93442efb84d73d6a4c9af9bc46a8f3e3fdda2
Author: Stijn Peeters <[email protected]>
Date:   Tue Mar 26 11:52:09 2024 +0100

    Include column with link description in Telegram mapping

commit 876f4a4b6df51ec4b30a048c32191438b6778f90
Author: Dale Wahl <[email protected]>
Date:   Mon Mar 25 14:48:47 2024 +0100

    douyin handle image posts

commit 81ad61baabaf965b1c848f55a80c23bd3e1a9000
Author: Stijn Peeters <[email protected]>
Date:   Mon Mar 25 08:01:44 2024 +0100

    Accept non-numeric IDs in Telegram image downloader

commit a8b36dc5682df7c16e25474ea8fdbfc4f12f9d46
Author: Stijn Peeters <[email protected]>
Date:   Sun Mar 24 23:15:51 2024 +0100

    Ensure unique IDs for Telegram datasets

commit 4a3e9ffee072c4d3efb7bfd8744369b46f19eef2
Merge: 0c119130 d749237e
Author: Stijn Peeters <[email protected]>
Date:   Sun Mar 24 22:56:59 2024 +0100

    Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat

commit 0c11913049aabb5a83ffe26d58bdf17affdbc0b9
Author: Stijn Peeters <[email protected]>
Date:   Sun Mar 24 20:09:10 2024 +0100

    Better string formatting in Telegram image downloader

commit 8a7da5317defdafb5bdbf74dcbeb68e464fa21f4
Author: Stijn Peeters <[email protected]>
Date:   Sun Mar 24 20:06:06 2024 +0100

    Add 'link thumbnails' option to Telegram image downloader

commit a0baae17d8f11e4cae7cc261f8d406b1b1ce628a
Author: Stijn Peeters <[email protected]>
Date:   Sun Mar 24 20:05:24 2024 +0100

    Add 'Fetch URL metadata' processor

commit b9a0668f35c6d1fc5bfb42e1ae706418cbe6e0a7
Author: Stijn Peeters <[email protected]>
Date:   Sun Mar 24 20:05:15 2024 +0100

    Update ural dependency

commit a28036186f5d35e435cade7638ed35361054967e
Author: Stijn Peeters <[email protected]>
Date:   Sun Mar 24 20:05:08 2024 +0100

    Add emoji library dependency

commit bb50fc946fb6cdd8454969514bdc6d5ecf3f3530
Author: Stijn Peeters <[email protected]>
Date:   Sun Mar 24 20:04:59 2024 +0100

    Add 'emoji' option to Count Values processor

commit e653e3d8fb9c01697d96316df6f7634454671191
Author: Stijn Peeters <[email protected]>
Date:   Sun Mar 24 20:04:42 2024 +0100

    Add 'forwards', 'reactions', 'link_title', 'link_attached' columns to mapped Telegram items

commit d749237ec5c103b286ba8086904e405e232fc14c
Author: Dale Wahl <[email protected]>
Date:   Fri Mar 22 11:02:14 2024 +0100

    telegram: sp too?

    this is why i test locally first...

commit 9d7d27c61425bbbbccd18a8e3de35ab372dbfbf3
Author: Dale Wahl <[email protected]>
Date:   Fri Mar 22 10:58:48 2024 +0100

    telegram: missed reference to options

commit c1671ce0ef69c71c81c3ae69a59e4ad7dc1bda79
Author: Dale Wahl <[email protected]>
Date:   Fri Mar 22 10:49:02 2024 +0100

    telegram fix: class dictionaries are shared between all workers

    admin calls get_options and `del options["max_posts"]["max"]` runs, then normal user calls get_options and there is no longer max. could also copy cls.options, but not sure why we cannot create the options in `get_options`.
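
The leak described above can be reproduced in a few lines. This is a sketch with hypothetical class names and option values. Only `get_options`, `max_posts` and the `del options["max_posts"]["max"]` statement come from the message.

```python
class SharedOptionsSource:
    # class-level dict: the same object for every caller in the process
    options = {"max_posts": {"type": "int", "default": 10, "max": 50}}

    @classmethod
    def get_options(cls, is_admin=False):
        options = cls.options
        if is_admin:
            del options["max_posts"]["max"]  # also changes it for later callers
        return options


class PerCallOptionsSource:
    @classmethod
    def get_options(cls, is_admin=False):
        # build the dict inside the method, so each call gets its own
        options = {"max_posts": {"type": "int", "default": 10, "max": 50}}
        if is_admin:
            del options["max_posts"]["max"]
        return options


SharedOptionsSource.get_options(is_admin=True)
leaked = SharedOptionsSource.get_options(is_admin=False)
print("max" in leaked["max_posts"])  # False: the limit is gone for normal users

PerCallOptionsSource.get_options(is_admin=True)
safe = PerCallOptionsSource.get_options(is_admin=False)
print("max" in safe["max_posts"])  # True
```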

commit cd2e74d251491a93bc66dc7a64e8b2a60b0ed8ae
Author: Stijn Peeters <[email protected]>
Date:   Wed Mar 20 11:10:30 2024 +0100

    Make Telegram max entities a setting

commit 38fcabb81da956e5513bd0246ee086d1ab4896c9
Author: Stijn Peeters <[email protected]>
Date:   Fri Mar 15 18:47:59 2024 +0100

    Make metrics table use BIGINT

    Folder size may not fit otherwise!

commit 34013cb91eed7fac725defd408b67bddee4b806b
Author: Stijn Peeters <[email protected]>
Date:   Fri Mar 15 18:37:10 2024 +0100

    Fix duplicate stats in metrics table

commit c8ad90b3436cff600320d3b2efdf6144240ea59d
Author: Stijn Peeters <[email protected]>
Date:   Fri Mar 15 18:14:39 2024 +0100

    Calculate disk use stats via worker instead of on demand

commit e4e0c4e3a375bf14bdca7b633231b60e34c322e0
Author: Stijn Peeters <[email protected]>
Date:   Thu Mar 14 10:25:23 2024 +0100

    Spelling thing

commit ae1c00fb3a521a2c3258b2597b04322d202c3ee7
Author: Stijn Peeters <[email protected]>
Date:   Thu Mar 14 10:25:10 2024 +0100

    Disable direct editing of tag order

commit e3ce81452ad8ee3231309383c24fb26e553b0dff
Merge: fa3be93b a7b5820c
Author: Dale Wahl <[email protected]>
Date:   Wed Mar 13 16:25:46 2024 +0100

    Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat

commit fa3be93bafef17e95881207604efa1212d562d9e
Author: Dale Wahl <[email protected]>
Date:   Wed Mar 13 16:25:43 2024 +0100

    instagram: check both user and owner for full_name

commit a7b5820c9f2acb5081ef80ea0293f42ee91925a3
Author: Dale Wahl <[email protected]>
Date:   Tue Mar 12 15:59:43 2024 +0100

    proposed fix to results filter (#417)

    * proposed fix to results filter

    * do not filter datasources at all for results/ view

commit b930b6e964b460ef5160398c6cd1038f766b0548
Author: Dale Wahl <[email protected]>
Date:   Mon Mar 11 12:00:12 2024 +0100

    remove unused code

    the `can_preview` attribute does not appear to exist so this is always hidden

commit 97cd2d52966bd751da704a4a06cfa5478f999885
Author: Dale Wahl <[email protected]>
Date:   Mon Mar 11 11:51:28 2024 +0100

    faster collection of folder size for admin panel

    (was between five and six times faster in my tests on around 11G of data files)

commit 108fd28b594a95b94727ccc601fec59da61a8d3d
Author: Dale Wahl <[email protected]>
Date:   Thu Mar 7 11:09:33 2024 +0100

    typo fixes, log fix

commit 44848a8f4b9fea07e7f9ce03e4fe0d696d5f1d27
Author: Dale Wahl <[email protected]>
Date:   Wed Mar 6 10:17:34 2024 +0100

    fix tf_idf - sometimes less results than max

commit e5f1f703247a5763d3d0e03c44ee31ab60b8a8ed
Author: Dale Wahl <[email protected]>
Date:   Wed Mar 6 09:33:21 2024 +0100

    fix image downloader failing on 4chan images

    we do not often rename datasources, but when we do...

commit f5e50d508096729bccdc0dafa460f83c419c2606
Author: Stijn Peeters <[email protected]>
Date:   Tue Mar 5 16:23:34 2024 +0100

    Version 1.39 -> 1.40

commit 4b3e4efa25914f5f9509f69596a82935440e5f9f
Author: Stijn Peeters <[email protected]>
Date:   Mon Feb 26 18:28:15 2024 +0100

    Add 'safe' parameter to get_item_data

commit b98f62ab6a3a21815cc0fa899cdca1d48eab0fdb
Author: Stijn Peeters <[email protected]>
Date:   Mon Feb 26 18:27:57 2024 +0100

    Use iterate_mapped_items in dataset view

commit 6d9baa9c228168dce7fe946681c95d471d45c6e0
Author: Stijn Peeters <[email protected]>
Date:   Mon Feb 26 18:27:33 2024 +0100

    Update TikTok downloader for new item mapper

commit 1622ec660754582eb2791f0d114df76e71640370
Author: Dale Wahl <[email protected]>
Date:   Mon Feb 26 12:51:31 2024 +0100

    flawless was removed from dataset class, but used by telegram

    adding back to fix telegram, but perhaps it should be changed

commit 84168e945e2ecf963cfdac3409d60544b521f694
Author: Dale Wahl <[email protected]>
Date:   Wed Feb 21 15:56:24 2024 +0100

    webtool checks for gunicorn and if exists sets up error log

    this normally only ran in Docker

commit 7119862feac1e9993b8dedccc59887830e7715a1
Author: Stijn Peeters <[email protected]>
Date:   Tue Feb 20 18:36:21 2024 +0100

    Use MappedItem in ML processors

commit 32b8790420af8572f4a3db2d2bc8ffd696872114
Author: Stijn Peeters <[email protected]>
Date:   Tue Feb 20 16:58:22 2024 +0100

    Map items to objects instead of dicts (#409)

    * Consistent parameter name for map_item()

    * Wrap mapped items in MappedItem() object

    * Keep track of import warnings in search.py

    * Add warning when mapping a tweet with missing metric data

    * Add new iterate_mapped_objects method

    * Log mapping warnings when merging datasets

    * Pass object instead of dict

    * Clarify Twitter warning

    * Documenting MappedItem

    * Explain things to myself

    * R…
stijn-uva committed Sep 19, 2024
1 parent da99f39 commit 941007c
Showing 1 changed file with 0 additions and 4 deletions.
4 changes: 0 additions & 4 deletions webtool/__init__.py
@@ -69,10 +69,6 @@
file_handler.setFormatter(logFormatter)
app.logger.addHandler(file_handler)

else:
log = Logger()

fourcat_modules = ModuleCollector()
db = Database(logger=log, dbname=config.get("DB_NAME"), user=config.get("DB_USER"),
password=config.get("DB_PASSWORD"), host=config.get("DB_HOST"),
port=config.get("DB_PORT"), appname="frontend")
