Refactor module loader #455

Open
wants to merge 13 commits into
base: master
Conversation

stijn-uva
Member

@stijn-uva stijn-uva commented Sep 19, 2024

This is a more radical version of #425 that makes some significant changes to how configuration contexts are used and handled. TL;DR, in order of preference:

  • Use self.config when in an object context to read settings
  • Use the method argument config when in a constructor, or static or class method, to read settings
  • Use the config imported from webtool when in a Flask view to read settings
  • Use the global config variable to read settings otherwise. In principle this should only occur in two places in the code, bootstrap.py and Flask's __init__.py (there are currently a few other places where relying on injection would be too inconvenient)
  • Do not pass around user objects. Instead pass around user-aware ConfigWrappers. Where you need to use User objects, pass a config as argument to their constructor so that they will read from their own configuration context (if desired).
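
To illustrate the first two preferences, here is a minimal sketch. It does not use 4CAT's actual classes: `DictConfig` and `ExampleProcessor` are hypothetical stand-ins, and the setting names `example.api_key` and `example.max_items` are made up for the example. Only the pattern (read from `self.config` in an object context, read from the `config` argument in a constructor or class method) is taken from the description above.

```python
class DictConfig:
    """Hypothetical stand-in for a configuration reader (not 4CAT's own class)."""

    def __init__(self, values):
        self._values = dict(values)

    def get(self, name, default=None):
        # Return the stored value, or the default if the setting is unknown
        return self._values.get(name, default)


class ExampleProcessor:
    """Hypothetical processor that follows the order of preference above."""

    def __init__(self, config):
        # Constructor: read from the injected argument, then keep a reference
        # so that instance methods can use self.config afterwards
        self.config = config
        self.max_items = config.get("example.max_items", 100)

    @classmethod
    def is_compatible_with(cls, module=None, config=None):
        # Class method: there is no self.config here, so the reader has to
        # arrive as an argument
        return bool(config.get("example.api_key"))

    def process(self):
        # Object context: read from self.config
        return self.config.get("example.max_items", 100)


reader = DictConfig({"example.api_key": "abc", "example.max_items": 50})
if ExampleProcessor.is_compatible_with(config=reader):
    print(ExampleProcessor(reader).process())
```

Neither method touches a global `config` variable, so whichever reader (wrapped or not) is injected determines the values that are read.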

The long version:

  • Eliminate the passing of user objects as much as possible in favour of heavier usage of ConfigWrapper. This means that processors no longer need to be aware of users (with very few exceptions) - I think this is good because users are a front-end concept, not one the back-end should need to care about. Users were only ever really needed in the back-end to make sure the right configuration values were read.
  • ConfigWrapper is used to create a configuration manager (reader) that reads configuration values taking into account a certain user and optionally a Flask request. This is then used to make sure that the right configuration tags are applied. The wrapping can happen in low-level parts of the code such as the main worker class, and the result can just be passed on to higher-level code such as a processor's process() method. As long as a processor reads from self.config, the right values should always be read.
  • Because get_options(), validate_query() and is_compatible_with() are class methods they cannot access self.config. Instead they now have an argument config (replacing user) to which a reader or wrapped reader is passed. They then read from that for e.g. checking whether an API key is present or how many items can be processed.
  • Instead of each set of views defining their own config as before, this is now done in the main Flask app, so that the front-end only ever instantiates a single (wrapped) configuration reader.
  • In the backend, the configuration reader is instantiated as early as in bootstrap.py, and then passed all the way down to the individual modules where it is wrapped as needed. Thus in the backend there is also only a single reader that is wrapped in various ways as needed.
  • ModuleCollector now also gets its config reader via injection rather than instantiating its own, to avoid some config managers being aware of module-defined settings and others not (this was the original goal of #425, "Refactor module loading v2: move writing of module_config.bin to bootstrap").
  • If a processor ever still needs to access the user object, it is available via the ConfigWrapper().user property (which only exists for wrapped readers). This should be avoided but is necessary in some places e.g. the Telegram search worker which needs to behave differently when used by an anonymous user.
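
The wrapping idea can be sketched roughly as follows. This is an illustration under assumptions, not 4CAT's implementation: `BaseReader`, `WrappedReader` and `User` are hypothetical names, the tag lookup is deliberately simplified, and request handling is left out. It only shows a single base reader being wrapped per user, with the wrapped reader exposing a `user` property as described above.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    """Hypothetical user with a list of configuration tags."""
    name: str
    tags: list = field(default_factory=list)


class BaseReader:
    """Hypothetical unwrapped reader: global defaults plus per-tag overrides."""

    def __init__(self, defaults, tag_overrides):
        self.defaults = defaults
        self.tag_overrides = tag_overrides

    def get(self, name, default=None, tags=()):
        # The first tag that defines the setting wins; otherwise fall back
        # to the global value
        for tag in tags:
            if name in self.tag_overrides.get(tag, {}):
                return self.tag_overrides[tag][name]
        return self.defaults.get(name, default)


class WrappedReader:
    """Hypothetical wrapper that applies one user's tags on every read."""

    def __init__(self, reader, user=None):
        self.reader = reader
        self.user = user  # only wrapped readers carry a user

    def get(self, name, default=None):
        tags = self.user.tags if self.user else ()
        return self.reader.get(name, default, tags=tags)


# One base reader, created once and wrapped as needed
base = BaseReader({"max_items": 100}, {"staff": {"max_items": 1000}})
staff_config = WrappedReader(base, User("alice", ["staff"]))
anonymous_config = WrappedReader(base)
```

Code that receives `staff_config` or `anonymous_config` calls `get()` in the same way in both cases; the user is only consulted inside the wrapper, which is what lets processors stay unaware of users.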

stijn-uva and others added 12 commits November 7, 2023 12:39
I think that is everywhere in the frontend.

Backend is a bit odd as we are passing dataset.modules when it is None and thus creating children that would require individual inits of ModuleCollector. Could be more to look at there.
commit 3f2a62a124926cfeb840796f104a702878ac10e5
Author: Carsten Schnober <[email protected]>
Date:   Wed Sep 18 18:18:29 2024 +0200

    Update Gensim to >=4.3.3, <4.4.0 (#450)

    * Update Gensim to >=4.3.3, <4.4.0

    * update nltk as well

    ---------

    Co-authored-by: Dale Wahl <[email protected]>
    Co-authored-by: Sal Hagen <[email protected]>

commit fee2c8c08617094f28496963da282d2e2dddeab7
Merge: 3d94b666 f8e93eda
Author: sal-phd-desktop <[email protected]>
Date:   Wed Sep 18 18:11:19 2024 +0200

    Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat

commit 3d94b666cedd0de4e0bee953cbf1d787fdc38854
Author: sal-phd-desktop <[email protected]>
Date:   Wed Sep 18 18:11:04 2024 +0200

    FINALLY remove 'News' from the front page, replace with 4CAT BlueSky updates and potential information about the specific server (to be set on config page)

commit f8e93edabe9013a2c1229caa4c454fab09620125
Author: Stijn Peeters <[email protected]>
Date:   Wed Sep 18 15:11:21 2024 +0200

    Simple extensions page in Control Panel

commit b5be128c7b8682fb233d962326d9118a61053165
Author: Stijn Peeters <[email protected]>
Date:   Wed Sep 18 14:08:13 2024 +0200

    Remove 'docs' directory

commit 1e2010af44817016c274c9ec9f7f9971deb57f66
Author: Stijn Peeters <[email protected]>
Date:   Wed Sep 18 14:07:38 2024 +0200

    Forgot TikTok and Douyin

commit c757dd51884e7ec9cf62ca1726feacab4b2283b7
Author: Stijn Peeters <[email protected]>
Date:   Wed Sep 18 14:01:31 2024 +0200

    Say 'zeeschuimer' instead of 'extension' to avoid confusion with 4CAT extensions

commit ee7f4345478f923541536c86a5b06246deae03f6
Author: Stijn Peeters <[email protected]>
Date:   Wed Sep 18 14:00:40 2024 +0200

    RIP Parler data source

commit 11300f2430b51887823b280405de4ded4f15ede1
Author: Stijn Peeters <[email protected]>
Date:   Wed Sep 18 11:21:37 2024 +0200

    Tuplestring

commit 547265240eba81ca0ad270cd3c536a2b1dcf512d
Author: Stijn Peeters <[email protected]>
Date:   Wed Sep 18 11:15:29 2024 +0200

    Pass user obj instead of str to ConfigWrapper in Processor

commit b21866d7900b5d20ed6ce61ee9aff50f3c0df910
Author: Stijn Peeters <[email protected]>
Date:   Tue Sep 17 17:45:01 2024 +0200

    Ensure request-aware config reader in user object when using config wrapper

commit bbe79e4b0fe870ccc36cab7bfe7963b28d1948e3
Author: Sal Hagen <[email protected]>
Date:   Tue Sep 17 15:12:46 2024 +0200

    Fix extension path walk for Windows

commit d6064beaf31a6a85b0e34ed4f8126eb4c4fc07e3
Author: Stijn Peeters <[email protected]>
Date:   Mon Sep 16 14:50:45 2024 +0200

    Allow tags that have no users

    Use case: tag-based frontend differentiation using X-4CAT-Config-Via-Proxy

commit b542ded6f976809ec88445e7b04f2c81b900188e
Author: Stijn Peeters <[email protected]>
Date:   Mon Sep 16 14:13:14 2024 +0200

    Trailing slash in query results list

commit a4bddae575b22a009925206a1337bdd89349e567
Author: Dale Wahl <[email protected]>
Date:   Mon Sep 16 13:57:23 2024 +0200

    4CAT Extension - easy(ier) adding of new datasources/processors that can be mainted seperately from 4CAT base code (#451)

    * domain only

    * fix reference

    * try and collect links with selenium

    * update column_filter to find multiple matches

    * fix up the normal url_scraper datasource

    * ensure all selenium links are strings for join

    * change output of url_scraper to ndjson with map_items

    * missed key/index change

    * update web archive to use json and map to 4CAT

    * fix no text found

    * and none on scraped_links

    * check key first

    * fix up web_archive error reporting

    * handle None type for error

    * record web archive "bad request"

    * add wait after redirect movement

    * increase waittime for redirects

    * add processor for trackers

    * dict to list for addition

    * allow both newline and comma seperated links

    * attempt to scrape iframes as seperate pages

    * Fixes for selenium scraper to work with config database

    * installation of packages, geckodriver, and firefox if selenium enabled

    * update install instructions

    * fix merge error

    * fix dropped function

    * have to be kidding me

    * add note; setup requires docker... need to think about IF this will ever
    be installed without Docker

    * seperate selenium class into wrapper and Search class so wrapper can be
    used in processors!

    * add screenshots; add firefox extension support

    * update selenium definitions

    * regex for extracting urls from strings

    * screenshots processor; extract urls from text and takes screenshots

    * Allow producing zip files from data sources

    * import time

    * pick better default

    * test screenshot datasource

    * validate all params

    * fix enable extension

    * haha break out of while loop

    * count my items

    * whoops, len() is important here

    * must be getting tired...

    * remove redundant logging

    * Eager loading for screenshots, viewport options, etc

    * Woops, wrong folder

    * Fix label shortening

    * Just 'queue' instead of 'search queue'

    * Yeah, make it headless

    * README -> DESCRIPTION

    * h1 -> h2

    * Actually just have no header

    * Use proper filename for downloaded files

    * Configure whether to offer pseudonymisation etc

    * Tweak descriptions

    * fix log missing data

    * add columns to post_topic_matrix

    * fix breadcrumb bug

    * Add top topics column

    * Fix selenium config install parameter (Docker uses this/manual would
    need to run install_selenium, well, manually)

    * this processor is slow; i thought it was broken long before it updated!

    * refactor detect_trackers as conversion processor not filter

    * add geckodriver executable to docker install

    * Auto-configure webdrivers if available in PATH

    * update screenshots to act as image-downloader and benefit from processors

    * fix is_compatible_with

    * Delete helper-scripts/migrate/migrate-1.30-1.31.py

    * fix embeddings is_compatible_with

    * fix up UI options for hashing and private

    * abstract was moved to lib

    * various fixes to selenium based datasources

    * processors not compatible with image datasets

    * update firefox extension handling

    * screenshots datasource fix get_options

    * rename screenshots processor to be detected as image dataset

    * add monthly and weekly frequencies to wayback machine datasource

    * wayback ds: fix fail if all attempts do not realize results; addion frequency options to options; add daily

    * add scroll down page to allow lazy loading for entire page screenshots

    * screenshots: adjust pause time so it can be used to force a wait for images to load

    I have not successfully come up with or found a way to wait for all images to load; document.readyState == 'complete' does not function in this way on certain sites including the wayback machine

    * hash URLs to create filenames

    * remove log

    * add setting to toggle display advanced options

    * add progress bars

    * web archive fix query validation

    * count subpages in progress

    * remove overwritten function

    * move http response to own column

    * special filenames

    * add timestamps to all screenshots

    * restart selenium on failure

    * new build have selenium

    * process urls after start (keep original query parameters)

    * undo default firefox

    * quick max

    * rename SeleniumScraper to SeleniumSearch

    todo: build SeleniumProcessor!

    * max number screenshots configurable

    * method to get url with error handling

    * use get_with_error_handling

    * d'oh, screenshot processor needs to quit selenium

    * update log to contain URL

    * Update scrolling to use Page down key if necessary

    * improve logs

    * update image_category_wall as screenshot datasource does not have category column; this is not ideal and ought to be solved in another way.

    Also, could I get categories from the metadata? That's... ugh.

    * no category, no processor

    * str errors

    * screenshots: dismiss alerts when checking ready state is complete

    * set screenshot timeout to 30 seconds

    * update gensim package

    * screenshots: move processor interrupt into attempts loop

    * if alert disappears before we can dismiss it...

    * selenium specific logger

    * do not switch window when no alert found on dismiss

    * extract wait for page to load to selenium class

    * improve descriptions of screenshot options

    * remove unused line

    * treat timeouts differently from other errors

    these are more likely due to an issue with the website in question

    * debug if requested

    * increase pause time

    * restart browser w/ PID

    * increase max_workers for selenium

    this is by individual worker class not for all selenium classes... so you can really crank them out if desired

    * quick fix restart by pid

    * avoid bad urls

    * missing bracket & attempt to fix-missing dependencies in Docker install

    * Allow dynamic form options in processors

    * Allow 'requires' on data source options as well

    * Handle list values with requires

    * basic processor for apple store; setup checks for additional requirements

    * fix is_4cat_class

    * show preview when no map_item

    * add google store datasource

    * Docker setup.py use extensions

    * Wider support for file upload in processors

    * Log file uploads in DMI service manager

    * add map_item methods and record more data per item

    need additional item data as map_item is staticmethod

    * update from master; merge conflicts

    * fix docker build context (ignore data files)

    * fix option requirements

    * apple store fix: list still tries to get query

    * apple & google stores fix up item mapping

    * missed merge error

    * minor fix

    * remove unused import

    * fix datasources w/ files frontend error

    * fix error w/ datasources having file option

    * better way to name docker volumes

    * update two other docker compose files

    * fix docker-compose ymls

    * minor bug: fix and add warning; fix no results fail

    * update apple field names to better match interface

    * update google store fieldnames and order

    * sneak in jinja logger if needed

    * fix fourcat.js handling checkboxes for dynamic settings

    * add new endpoint for app details to apple store

    * apple_store map new beta app data

    * add default lang/country

    * not all apps have advisories

    * revert so button works

    * add chart positions to beta map items

    * basic scheduler

    To-do
    - fix up and add options to scheduler view (e.g. delete/change)
    - add scheduler view to navigator
    - tie jobs to datasets? (either in scheduler view or, perhaps, filter dataset view)
    - more testing...

    * update scheduler view, add functions to update job interval

    * revert .env

    * working scheduler!

    * basic scheduler view w/ datasets

    * fix postgres tag

    * update job status in scheduled_jobs table

    * fix timestamp; end_date needed for last run check; add dataset label

    * improve scheduler view

    * remove dataset from scheduled_jobs table on delete

    * scheduler view order by last creation

    * scheduler views: separate scheduler list from scheduled dataset list

    * additional update from master fixes

    * apple_store map_items fix missing locales

    * add back depth for pagination

    * correct route

    * modify pagination to accept args

    * pagination fun

    * pagination: i hate testing on live servers...

    * ok ok need the pagination route

    * pagination: add route_args

    * fix up scheduler header

    * improve app store descriptions

    * add azure store

    * fix azure links

    * azure_store: add category search

    * azure fix type of config update timestamp

    OPTION_DATE does not appear correctly in settings and causes it to be written incorrectly

    * basic aws store

    * check if selenium available; get correct app_id

    * aws: implement pagination

    * add logging; wait for elements to load after next page; attempts to rework filter option collection

    * apple_store: handle invalid param error

    * fix filter_options

    * aws: fix filter option collection!

    * more merge

    * move new datasources and processors to extensions and modify setup.py and module loader to use the new locations

    * migrate.py to run extension "fourcat_install.py" files

    * formatting

    * remove extensions; add gitignore

    * excise scheduler merge

    * some additional cleanup from app_studies branch

    * allow nested datasources folders; ignore files in extensions main folder

    * allow extension install scripts to run pip if migrate.py has not

    * Remove unused URL functions we could use ural for

    * Take care of git commit hash tracking for extension processors

    * Get rid of unused path.versionfile config setting

    * Add extensions README

    * Squashed commit of the following:

    commit cd356f7a69d15e8ecc8efffc6d63a16368e62962
    Author: Stijn Peeters <[email protected]>
    Date:   Sat Sep 14 17:36:18 2024 +0200

        UI setting for 4CAT install ad in login

    commit 0945d8c0a11803a6bb411f15099d50fea25f10ab
    Author: Stijn Peeters <[email protected]>
    Date:   Sat Sep 14 17:32:55 2024 +0200

        UI setting for anonymisation controls

        Todo: make per-datasource

    commit 1a2562c2f9a368dbe0fc03264fb387e44313213b
    Author: Stijn Peeters <[email protected]>
    Date:   Sat Sep 14 15:53:27 2024 +0200

        Debug panel for HTTP headers in control panel

    commit 203314ec83fb631d985926a0b5c5c440cfaba9aa
    Author: Stijn Peeters <[email protected]>
    Date:   Sat Sep 14 15:53:17 2024 +0200

        Preview for HTML datasets

    commit 48c20c2ebac382bd41b92da4481ff7d832dc1538
    Author: Desktop Sal <[email protected]>
    Date:   Wed Sep 11 13:54:23 2024 +0200

        Remove spacy processors (linguistic extractor, get nouns, get entities) and remove dependencies

    commit 657ffd75a7f48ba4537449127e5fa39debf4fdf3
    Author: Dale Wahl <[email protected]>
    Date:   Fri Sep 6 16:29:19 2024 +0200

        fix nltk where it matters

    commit 2ef5c80f2d1a5b5f893c8977d8394740de6d796d
    Author: Stijn Peeters <[email protected]>
    Date:   Tue Sep 3 12:05:14 2024 +0200

        Actually check progress in text annotator

    commit 693960f41b73e39eda0c2f23eb361c18bde632cd
    Author: Stijn Peeters <[email protected]>
    Date:   Mon Sep 2 18:03:18 2024 +0200

        Add processor for stormtrooper DMI service

    commit 6ae964aad492527bc5d016a00f870145aab6e1af
    Author: Stijn Peeters <[email protected]>
    Date:   Fri Aug 30 17:31:37 2024 +0200

        Fix reference to old stopwords list in neologisms preset

    * Fix Github links for extensions

    * Fix commit detection in extensions

    * Fix extension detection in module loader

    * Follow symlinks when loading extensions

    Probably not uncommon to have a checked out repo somewhere to then symlink into the extensions dir

    * Make queue message on create page more generic

    * Markdown in datasource option tooltips

    * Remove Spacy model from requirements

    * Add software_source to database SQL

    ---------

    Co-authored-by: Stijn Peeters <[email protected]>
    Co-authored-by: Stijn Peeters <[email protected]>

commit cd356f7a69d15e8ecc8efffc6d63a16368e62962
Author: Stijn Peeters <[email protected]>
Date:   Sat Sep 14 17:36:18 2024 +0200

    UI setting for 4CAT install ad in login

commit 0945d8c0a11803a6bb411f15099d50fea25f10ab
Author: Stijn Peeters <[email protected]>
Date:   Sat Sep 14 17:32:55 2024 +0200

    UI setting for anonymisation controls

    Todo: make per-datasource

commit 1a2562c2f9a368dbe0fc03264fb387e44313213b
Author: Stijn Peeters <[email protected]>
Date:   Sat Sep 14 15:53:27 2024 +0200

    Debug panel for HTTP headers in control panel

commit 203314ec83fb631d985926a0b5c5c440cfaba9aa
Author: Stijn Peeters <[email protected]>
Date:   Sat Sep 14 15:53:17 2024 +0200

    Preview for HTML datasets

commit 48c20c2ebac382bd41b92da4481ff7d832dc1538
Author: Desktop Sal <[email protected]>
Date:   Wed Sep 11 13:54:23 2024 +0200

    Remove spacy processors (linguistic extractor, get nouns, get entities) and remove dependencies

commit 657ffd75a7f48ba4537449127e5fa39debf4fdf3
Author: Dale Wahl <[email protected]>
Date:   Fri Sep 6 16:29:19 2024 +0200

    fix nltk where it matters

commit 2ef5c80f2d1a5b5f893c8977d8394740de6d796d
Author: Stijn Peeters <[email protected]>
Date:   Tue Sep 3 12:05:14 2024 +0200

    Actually check progress in text annotator

commit 693960f41b73e39eda0c2f23eb361c18bde632cd
Author: Stijn Peeters <[email protected]>
Date:   Mon Sep 2 18:03:18 2024 +0200

    Add processor for stormtrooper DMI service

commit 6ae964aad492527bc5d016a00f870145aab6e1af
Author: Stijn Peeters <[email protected]>
Date:   Fri Aug 30 17:31:37 2024 +0200

    Fix reference to old stopwords list in neologisms preset

commit 4ba872bef2968f7f8bf5831fd3a4f413420b36ed
Author: Dale Wahl <[email protected]>
Date:   Tue Aug 27 13:04:46 2024 +0200

    fix hatebase: default column option for OPTION_MULTI_SELECT must be list

commit e276033542f2d22e7f614f318a01d65114a21482
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Wed Aug 21 12:53:10 2024 +0200

    Bump nltk from 3.6.7 to 3.9 (#447)

    Bumps [nltk](https://github.com/nltk/nltk) from 3.6.7 to 3.9.
    - [Changelog](https://github.com/nltk/nltk/blob/develop/ChangeLog)
    - [Commits](https://github.com/nltk/nltk/compare/3.6.7...3.9)

    ---
    updated-dependencies:
    - dependency-name: nltk
      dependency-type: direct:production
    ...

    Signed-off-by: dependabot[bot] <[email protected]>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

commit 1d749c3cf83b130ba70bdb09174f382d6711a14b
Author: sal-phd-desktop <[email protected]>
Date:   Wed Aug 21 12:52:54 2024 +0200

    Set UTF-8 encoding when opening stop words (fixes Windows bug)

commit a03e5fd4252e7242563c291558606440256eb3d1
Author: Dale Wahl <[email protected]>
Date:   Mon Aug 19 14:19:21 2024 +0200

    remove duplicate line

commit aa07e8c13c2d59c6b699f78133036514659ee420
Author: Dale Wahl <[email protected]>
Date:   Mon Jul 29 09:35:22 2024 +0200

    tweet import fix: author banner key missing when author has no banner

commit 32dac5d2ffb936210f12f5c725514fd25a0286f1
Author: Dale Wahl <[email protected]>
Date:   Mon Jul 29 08:52:08 2024 +0200

    tell user when dataset is not found

    we could have a proper 404 page, but at least leave a message

commit 2c8c860fc5378113d1352016ac26ca761adecb32
Author: Dale Wahl <[email protected]>
Date:   Mon Jul 22 17:41:00 2024 +0200

    telegram fix: reactions datastructure

commit 1c0bf5e580eb16d8a6f9afa415f9febce449a537
Author: Dale Wahl <[email protected]>
Date:   Mon Jul 22 11:19:52 2024 +0200

    fix telegram: crawl_max_depth can be None if it is not enabled for a user

commit 3dfe7af292b33574a31630e3a0da10954ed87d0a
Author: Dale Wahl <[email protected]>
Date:   Fri Jul 19 11:52:31 2024 +0200

    fix more config.get() magic

commit 2453182bcee6e54b396b762ab77b60b8a0893638
Author: Dale Wahl <[email protected]>
Date:   Fri Jul 19 10:54:23 2024 +0200

    config_manager - fix `get_all` w/ one results (super rare edge); fix overwriting self.db in `with_db`

commit 6b9cb0b5479e6e64e09a49fa2ca9effe1c5a7415
Author: Dale Wahl <[email protected]>
Date:   Wed Jul 17 15:20:49 2024 +0200

    add surf nginx init file

commit 5e984e13a08d9fba7d5806a7ef4e012ce7d57319
Author: Dale Wahl <[email protected]>
Date:   Wed Jul 17 14:30:34 2024 +0200

    change port for surf

commit 2ce8c354e90f939a16dad3f0155fd7d79405c79e
Author: Dale Wahl <[email protected]>
Date:   Wed Jul 17 12:54:11 2024 +0200

    use latest image on surf

commit 13ec0fd3f2bed86c3b2dff73014093a6a92fbfb5
Author: Dale Wahl <[email protected]>
Date:   Wed Jul 17 12:46:59 2024 +0200

    update surf docker-compose.yml

     this may require a new release

commit 78698f6ac1b22b1154d31f69543ba7b266d33191
Author: Dale Wahl <[email protected]>
Date:   Wed Jul 17 10:34:56 2024 +0200

    clip: handle new and old format

commit eb7693780cb191403f107817ca30d90373929bf0
Author: Dale Wahl <[email protected]>
Date:   Tue Jul 16 14:27:08 2024 +0200

    DMI SM updates to use status endpoint w/ database records; run on CPU if no GPU enabled

commit d2a787e2c1559417bb5401f3208c82954052504f
Author: Stijn Peeters <[email protected]>
Date:   Mon Jul 15 15:58:06 2024 +0200

    Require most recent Telethon version

commit 346150bd9cc96ac099abd4d15fa3de39bd65e9d1
Author: Stijn Peeters <[email protected]>
Date:   Mon Jul 15 15:57:55 2024 +0200

    Catch UPDATE_APP_TO_LOGIN in Telegram

commit 04acc06e95098d7e2f9b4af404447c9cfaee5b99
Author: Stijn Peeters <[email protected]>
Date:   Mon Jul 15 11:27:30 2024 +0200

    Unbreak Twitter error handling

commit e9b5232a963be02c2e86dabacb607b2315a4e0e6
Author: Stijn Peeters <[email protected]>
Date:   Fri Jul 12 13:27:15 2024 +0200

    Ensure str type when trying to extract video URLs from a field

commit d69dd6f337cac05ed31c05334890679976a1e6de
Author: Stijn Peeters <[email protected]>
Date:   Fri Jul 12 12:31:14 2024 +0200

    Make CSV column mapping params look nicer on result page

commit 9bd9da568f593085a8d54744836e3290a75b51a7
Author: Stijn Peeters <[email protected]>
Date:   Fri Jul 12 12:22:03 2024 +0200

    Add "empty" and "current timestamp" as options to CSV mapping

commit 0b574571952a206904440faf8601ddf95ab42b24
Author: Dale Wahl <[email protected]>
Date:   Thu Jul 11 16:59:56 2024 +0200

    image_wall: backup fit method

commit eeb1ddeb7ca85b6802dfed3c74d1352062383d50
Merge: 2504c37b 43239467
Author: Stijn Peeters <[email protected]>
Date:   Thu Jul 11 16:47:45 2024 +0200

    Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat

commit 43239467db046eea5eb5268f91d1b63a1042238d
Author: Dale Wahl <[email protected]>
Date:   Thu Jul 11 12:08:08 2024 +0200

    fix processor more button

    would only show top level analysis if not logged in

commit d6ab2b0783f8e40ecd8fadbc2abccffa6f093e39
Author: Dale Wahl <[email protected]>
Date:   Tue Jul 9 15:35:25 2024 +0200

    search_gab - use MappedItem

commit 2504c37b67ff6f19720b44d8bb6054b1c3d5a155
Author: Stijn Peeters <[email protected]>
Date:   Sat Jul 6 17:51:22 2024 +0200

    Fix multiline spacing in multi select list

commit fea66ce38be0717da6c1f847e7124f7069c096e2
Author: Dale Wahl <[email protected]>
Date:   Fri Jul 5 13:15:45 2024 +0200

    use processor media_type if dataset does not have media_type; set default media_type for downloaders

commit d41fa34514e8177efdac7e64a31f2ee75c7d1652
Author: Dale Wahl <[email protected]>
Date:   Fri Jul 5 12:57:18 2024 +0200

    video_hasher: handle no metadata file

commit 2820dcecc36ed4705a2776064d387ff7ed14e84f
Author: Dale Wahl <[email protected]>
Date:   Fri Jul 5 12:50:09 2024 +0200

    num_rows not num_items()

commit fb09162db902fa22fdf2d7a3ed171ce1489bd92f
Author: Dale Wahl <[email protected]>
Date:   Fri Jul 5 12:44:03 2024 +0200

    Google vision API returning 400s; properly log and record processed entries; google networks should not run on empty datasets

commit ebf39d8262d199895aedc4f7fa275c5685e58563
Author: Dale Wahl <[email protected]>
Date:   Fri Jul 5 12:28:13 2024 +0200

    fix image_category_wall

    whoops, cleared categories and post_values after filling them!

commit 1ad9ec2c2e76604793ec37584c051f116af2fdab
Author: Stijn Peeters <[email protected]>
Date:   Fri Jul 5 12:03:54 2024 +0200

    fsdfdsgd sorry

commit c7254c08a477c6cdc8497507e8452c3eff7101c9
Author: Stijn Peeters <[email protected]>
Date:   Fri Jul 5 12:01:21 2024 +0200

    Fix razdel versioning

commit b9a327abe99f2d9ede4f2747f34f20d1dc6803cb
Author: Stijn Peeters <[email protected]>
Date:   Fri Jul 5 11:57:47 2024 +0200

    Reorganise tokeniser, stopwords

commit fb13bc483af9ba0d677ee35fd045bf36ab1cddf7
Merge: 0b745692 e3046496
Author: Stijn Peeters <[email protected]>
Date:   Fri Jul 5 11:56:08 2024 +0200

    Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat

commit e30464964262870c54c73f65a3bce630d6576f45
Author: Dale Wahl <[email protected]>
Date:   Fri Jul 5 10:51:53 2024 +0200

    media_upload allow setting for max_form_part and warn users of failure above certain number of files

commit e4f982b4550b352a5d1a131abd78d52e6c196e48
Author: Dale Wahl <[email protected]>
Date:   Fri Jul 5 09:50:49 2024 +0200

    Update media_import help text; looks like failure happens somewhere between 600-1000 files due to Flask request size limits

commit 0b74569280f8f87376a964a6b160ea1993cb3354
Author: Stijn Peeters <[email protected]>
Date:   Thu Jul 4 17:55:36 2024 +0200

    Add razdel as option for Russian tokenisation

commit 9f15a2b8d666c3b6fddeb151b7c424cb44df18a6
Author: Dale Wahl <[email protected]>
Date:   Thu Jul 4 17:13:15 2024 +0200

    remove the log

commit ffcb6a4239075ba190fb534b25b89507e09e5f56
Author: Dale Wahl <[email protected]>
Date:   Thu Jul 4 17:12:43 2024 +0200

    Inform user if too many files are uploaded

    I do not understand why this is appearing. app.config['MAX_CONTENT_LENGTH'] is set to None. Problem persists in Flask alone (i.e., does not appear to be Gunicorn/Nginx/Apache).

commit 9cad12dd6f64a63c48d3b5b304b5c7d9d1a6ddb7
Author: Stijn Peeters <[email protected]>
Date:   Thu Jul 4 15:09:42 2024 +0200

    Bump version

commit aad94f393de77cc9d4f578e1f5be66a3601a4c90
Author: Dale Wahl <[email protected]>
Date:   Thu Jul 4 10:51:01 2024 +0200

    Update setup.py to ensure videohash updates

commit d9154a6f9c46a5c793909b88da751bc71d6f759f
Author: Dale Wahl <[email protected]>
Date:   Tue Jul 2 17:45:26 2024 +0200

    clip: categorizing requires categories...

    seriously, guys?

commit 0af9a5ec49bd2bcfbb87bda33976c65683f68777
Author: Dale Wahl <[email protected]>
Date:   Tue Jul 2 17:31:49 2024 +0200

    blip2: fix no metadata file found (uploads...)

commit d695053f440bd938a57f06adea7b9c732ecf30d7
Author: Dale Wahl <[email protected]>
Date:   Tue Jul 2 17:25:26 2024 +0200

    cat_vis_wall - use str as category type if mixed

    i.e., use floats as string categories

commit bcb914076760ea1fb0e277cdcd1782ffa101b535
Author: Sal Hagen <[email protected]>
Date:   Tue Jul 2 16:06:43 2024 +0200

    Add Twitter author profile pic and banner URLs

commit 1b3b02f826578e8f702ea84a27c8ced7b1fab345
Author: Dale Wahl <[email protected]>
Date:   Tue Jul 2 11:42:50 2024 +0200

    add migrate.py log file in Docker

commit 2aaa972e6888743fc329d721c37fa626cf2eeae3
Author: Dale Wahl <[email protected]>
Date:   Tue Jul 2 11:42:22 2024 +0200

    add necessary pip packages for upgrade in Docker environment; add error logging and save to file for trouble shooting

commit 18b8a53c01b334e0f70610b1305d380b25dbe9c6
Author: Dale Wahl <[email protected]>
Date:   Tue Jul 2 11:41:36 2024 +0200

    update Dockerfile to keep build environment

    useful for interactive upgrade

commit 7b224b9b798c9aaf956b5b618b98d742c4a2e7cd
Author: Dale Wahl <[email protected]>
Date:   Tue Jul 2 11:41:12 2024 +0200

    remove docker-compose.yml versions

commit acf5de0ed02e144b920a80abfdfa35986dd0ed4c
Author: Stijn Peeters <[email protected]>
Date:   Mon Jul 1 17:38:32 2024 +0200

    Better issues.md, footer link

commit 1953ca3895656ca9a12d2657e58019795ae64b3a
Author: Dale Wahl <[email protected]>
Date:   Mon Jul 1 12:00:07 2024 +0200

    FIX: get_key() is more of a creating of a key than general getting of a key...

commit 12289bb5c766d1af23799ff11278b46b48fc2841
Author: Dale Wahl <[email protected]>
Date:   Mon Jul 1 11:37:06 2024 +0200

    .metadata.json may not have top_parent via Media Uploader

    This may exist in other processors if a proper check is not in place; will need to review

commit 25f4ed65ec2c32298a90490cf51037a7ea2d0bf9
Author: Dale Wahl <[email protected]>
Date:   Tue Jun 25 14:43:40 2024 +0200

    Media upload datasource! (#419)

    * basic changes to allow files box

    * basic imports, yay!

    * video_scene_timelines to work on video imports!

    * add is_compatible_with checks to processors that cannot run on new media top_datasets

    * more is_compatible fixes

    * necessary function for checking media_types

    * enable more processors on media datasets

    * consolidate user_input file type

    * detect mimetype from filename

    best I can do without downloading all the files first.

    * handle zip archives; allow log and metadata files

    * do not count metadata or log files in num_files

    * move machine learning processors so they can be imported elsewhere

    * audio_to_text datasource

    * When validating zip file uploads, send list of file attributes instead of the first 128K of the zip file

    * Check type of files in zip when uploading media

    * Skip useless files when uploading media as zip

    * check multiple zip types in JS

    * js !=== python

    * fix media_type for loose file imports; fix extension for audio_to_text preset; fix merge for some processors w/ media_type

    ---------

    Co-authored-by: Stijn Peeters <[email protected]>

commit 4ce689bdc3e441a7adf85883ddcda6bae0525ed9
Author: Stijn Peeters <[email protected]>
Date:   Mon Jun 24 11:58:50 2024 +0200

    Avoid KeyError

commit 155522d0817d19ac7b6b0b0164242156d6f7443a
Author: Dale Wahl <[email protected]>
Date:   Thu Jun 20 15:58:21 2024 +0200

    add generated images to image wall w/ text visual

commit eecde519eab1208eeb6ee53c2d8febff7fb8febf
Author: Dale Wahl <[email protected]>
Date:   Thu Jun 20 15:57:56 2024 +0200

    allow users to NOT generate all images from prompts

commit d0b9574093a109997e63b1062b2bdd8e71300a29
Author: Stijn Peeters <[email protected]>
Date:   Wed Jun 19 16:28:26 2024 +0200

    ...don't mangle URLs in preview links

commit c105e368a521ec54ae717bb9eb2fe9fae66cf6e8
Merge: 0028a999 8d4f99b2
Author: Dale Wahl <[email protected]>
Date:   Wed Jun 19 16:25:36 2024 +0200

    Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat

commit 0028a9994d698611dd8b546b9b3bccbeec30b74f
Author: Dale Wahl <[email protected]>
Date:   Wed Jun 19 16:25:12 2024 +0200

    add followups to processors

commit 8d4f99b22e0308606c7f713ef704dfa939e85247
Author: Stijn Peeters <[email protected]>
Date:   Wed Jun 19 16:17:22 2024 +0200

    More flexible URL linking in CSV preview

commit f4f8e6621bd6f2504dc3afc2078280bf5edb6444
Author: Dale Wahl <[email protected]>
Date:   Wed Jun 19 13:54:00 2024 +0200

    tokeniser fix: use default lang for word_tokenize if language is 'other'

commit 127472e91d8e510f3de2a9cc4a87be6cf2d0deaa
Author: Stijn Peeters <[email protected]>
Date:   Tue Jun 18 16:45:01 2024 +0200

    Better log messages for Telegram data source

commit e8714b6fba72e00c690a8d643d8dc54d2250c94a
Author: Stijn Peeters <[email protected]>
Date:   Mon Jun 17 17:42:21 2024 +0200

    Add 'crawl' feature to Telegram data source

    Fixes #321 (though might need a bit more testing)

commit 25fded7b596097f7916e1793f1841bae2b63d453
Merge: d67cf440 b10e3bb8
Author: sal-phd-desktop <[email protected]>
Date:   Fri Jun 14 16:23:02 2024 +0200

    Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat

commit d67cf440730ea1d4e124c76a4c21d65b56f39c68
Author: sal-phd-desktop <[email protected]>
Date:   Fri Jun 14 16:22:59 2024 +0200

    Fix export 4chan script and remove some unnecessary code

commit b10e3bb8f0c8a67aa5fdbba1962301d8acdf625c
Author: Dale Wahl <[email protected]>
Date:   Thu Jun 13 15:14:06 2024 +0200

    video_hasher prefix: fix extension type

commit ba565cdaa2ebeecf23fd60889d546c76b9ea5eb1
Author: Dale Wahl <[email protected]>
Date:   Thu Jun 13 14:53:13 2024 +0200

    video_hasher: fix to work with Pillow updates; add max amount videos

commit 90da5d231eff6a4249bef5468fcdbf1ebcf9247a
Author: Dale Wahl <[email protected]>
Date:   Thu Jun 13 10:25:24 2024 +0200

    image_cat_wall fix the fix

commit a8b943d8e2c5471f82ea0442e2659d84fe8d9760
Author: Dale Wahl <[email protected]>
Date:   Wed Jun 12 13:29:41 2024 +0200

    add OCR processor to image w/ text visualization

commit e7e636b6b89b6163fa6976e67edba68e7d75b7ac
Author: Dale Wahl <[email protected]>
Date:   Tue Jun 11 15:23:12 2024 +0200

    add image_wall_w_text to follow on BLIP captions

commit f74b97827f0465baf8483040471a77e4654e70b1
Author: Dale Wahl <[email protected]>
Date:   Thu Jun 6 11:05:25 2024 +0200

    image_category_wall: allow multiple images per item/post

commit e3c9ea57d46b32ba47b00a6047a278ddd530adc1
Author: Dale Wahl <[email protected]>
Date:   Thu May 30 16:27:50 2024 +0200

    image_category_wall convert None to str for category

commit 00874576c354235f4655f1d433ec4382010e18e3
Author: Dale Wahl <[email protected]>
Date:   Thu May 30 14:54:51 2024 +0200

    image_category_wall fix float categories

commit e0c55a8ae132bedef5da27ecbbb9489a094d454c
Author: Dale Wahl <[email protected]>
Date:   Thu May 30 12:51:42 2024 +0200

    download_images fix divide by zero when user can download all

commit 3580fc9450501262badb8e61ef4b4df4b4c54322
Author: Dale Wahl <[email protected]>
Date:   Thu May 30 12:51:24 2024 +0200

    image_category_wall remove 'max' when user can use all images

commit f2145bdeff1d68e46cdd3521ecbb61573f01a2f2
Author: Dale Wahl <[email protected]>
Date:   Wed May 29 17:59:23 2024 +0200

    rank_attributes: option to count missing data or blanks

commit 01e7ab9677a75181bbedc62fa00e636ce2b17c18
Author: Dale Wahl <[email protected]>
Date:   Wed May 29 16:53:57 2024 +0200

    fix missing field strategy so default_strategy not overwritten on second loop

    default_strategy would be set correctly to the callable, but overwritten on second loop (and map_missing is a dictionary at that point).

commit 097f838af1f5f2748578dd9072eb9e3a8b3a7057
Author: Dale Wahl <[email protected]>
Date:   Tue May 28 12:16:08 2024 +0200

    add log_level arg to 4cat-daemon.py

    I've been using this forever and don't know why I haven't committed it

commit fd3ac238e60f052889d99c71588170570a384900
Author: Dale Wahl <[email protected]>
Date:   Tue May 28 10:10:56 2024 +0200

    google & clarifai to csv had identical "type"

    possibly caused issue w/ preset

commit 1b9965d40aa33035a73f685c13a1ab50cc877f78
Author: Stijn Peeters <[email protected]>
Date:   Mon May 27 15:54:20 2024 +0200

    Ensure file cleanup worker always exists

commit 0e0917f2232e240df3412fd4df51cf0be19248b5
Author: Stijn Peeters <[email protected]>
Date:   Thu May 23 17:36:22 2024 +0200

    Also update Spacy model versions...

commit f40128213529d154cfb77afa7aa67a72d5bb640f
Author: Stijn Peeters <[email protected]>
Date:   Thu May 23 17:32:35 2024 +0200

    *Actually* remove typing_extensions dependency

    ???

commit ba3d83b824c5fb6fcb0aec5e1c36b35070d6e5d9
Author: Stijn Peeters <[email protected]>
Date:   Thu May 23 17:30:08 2024 +0200

    Update minimum Pillow dependency version

commit 1c3485648bf2a911052eeeae4f293f303a944aec
Author: Stijn Peeters <[email protected]>
Date:   Thu May 23 17:27:27 2024 +0200

    Do not require typing_extensions explicitly

    This was required to ensure Spacy could load - looks like Spacy has since been updated to work with newer versions of typing_extensions as well

commit 3828de83ba123254463a904392f24daec626c136
Author: Stijn Peeters <[email protected]>
Date:   Thu May 23 17:02:04 2024 +0200

    Bump version

commit 8f0d098107a4bbc9d55cc6048f7a38f1d1891a32
Author: Stijn Peeters <[email protected]>
Date:   Thu May 23 17:01:28 2024 +0200

    Require non-broken version of emoji library

commit 4b2ad805fcc99a83e46732fc991d98d78ef06c6c
Author: Stijn Peeters <[email protected]>
Date:   Thu May 23 13:11:03 2024 +0200

    Show worker progress in control panel if available

commit 9144d4503f46108437616d6bc0cf4fde74df3aca
Author: Stijn Peeters <[email protected]>
Date:   Thu May 23 11:07:41 2024 +0200

    Bump version

commit 807ab77101d197ec897640480a2140439d570c05
Author: Stijn Peeters <[email protected]>
Date:   Wed May 22 21:57:11 2024 +0200

    Fix Instagram upload with missing media URL

commit d0b4840fd465b6d21657c3d50f9291ac911b6082
Author: Stijn Peeters <[email protected]>
Date:   Wed May 22 17:35:04 2024 +0200

    Comma comma comma

commit 7fd2e14c9505d0ed1ac77dc09c24f766ea61ee6c
Author: Stijn Peeters <[email protected]>
Date:   Wed May 22 17:25:26 2024 +0200

    Fix progress indicator for scene extractor

commit 661c42c2d083da7004335b0e14910935c3d392f6
Author: Stijn Peeters <[email protected]>
Date:   Wed May 22 17:12:21 2024 +0200

    Don't crash video hasher on non-str item IDs

commit 1f280321cdde27a9909885fa2f64dbeffa549fb1
Author: Stijn Peeters <[email protected]>
Date:   Wed May 22 17:09:53 2024 +0200

    Do not crash timelines processor when metadata has unexpected format

commit 572d03f1f368f0ad5f47e705a119b37646148d1d
Author: Stijn Peeters <[email protected]>
Date:   Wed May 22 17:09:30 2024 +0200

    More efficient video frame extractor

commit 1b51d224ca544d7e2913238adbff2049412bc41e
Author: Stijn Peeters <[email protected]>
Date:   Wed May 22 17:04:27 2024 +0200

    Fix crash in video stack processor with ffmpeg < 5.1

commit ddc73cb2e2f0985e64f84ca86bc167fa9e9dc81a
Author: Stijn Peeters <[email protected]>
Date:   Wed May 22 17:03:48 2024 +0200

    Helper function for determining ffmpeg version

commit ef9dd482b2258c428584997dc661156f63f68b91
Author: Stijn Peeters <[email protected]>
Date:   Wed May 22 12:14:58 2024 +0200

    Allow absence of articleComponent in LinkedIn posts

commit 060f2cd7f922e7fae337b0697f7c477442d21ef1
Author: Stijn Peeters <[email protected]>
Date:   Wed May 22 12:12:54 2024 +0200

    Cast post IDs to string when mapping video scenes

commit ab34c415c9ada23763b45676639ce3e80a34f594
Author: Stijn Peeters <[email protected]>
Date:   Wed May 22 11:46:39 2024 +0200

    Twitter -> X/Twitter

commit de6d97554ccb68375979e5ff09c7e65d8d70a6cd
Author: Stijn Peeters <[email protected]>
Date:   Wed May 22 11:45:19 2024 +0200

    Colleges -> Collages

commit 30365580dc59b4d95e8a62d1b3c666bef60ce7e8
Author: Stijn Peeters <[email protected]>
Date:   Tue May 21 15:41:55 2024 +0200

    Explicit disconnect after Telegram image download

commit 5727ff7230db42463a824f45d63f0b8343caac14
Author: Stijn Peeters <[email protected]>
Date:   Tue May 21 14:05:50 2024 +0200

    Catch TimedOutError while downloading Telegram images

commit e0e06686e78976f971aac620267d7e009eaaadff
Author: Sal Hagen <[email protected]>
Date:   Mon May 13 13:01:42 2024 +0200

    Typo in LinkedIn search

commit 51e58dde6ca21278a80f252a8c22dc83d87ace1f
Author: Dale Wahl <[email protected]>
Date:   Tue May 7 13:10:43 2024 +0200

    text_from_image: fix metadata missing (indent issue)

commit c1f8ecc1674375bba2b2e38cb29c9d4d44098f0a
Author: Dale Wahl <[email protected]>
Date:   Tue May 7 09:45:25 2024 +0200

    text_from_image fix: ensure metadata success before attempting to update original

commit 72dbf80db71499c59133e1128205b756d240b300
Merge: d7561625 baacc86b
Author: Stijn Peeters <[email protected]>
Date:   Fri May 3 13:14:08 2024 +0200

    Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat

commit d7561625b127573fbb0332fbb713be6a3cb3d953
Author: Stijn Peeters <[email protected]>
Date:   Fri May 3 13:14:03 2024 +0200

    Comments without replies don't always have reply_comment_total

commit baacc86b269612b4b0956345f8b9fa902df1b61f
Author: Dale Wahl <[email protected]>
Date:   Fri May 3 12:01:22 2024 +0200

    DSM fix and simplify GPU mem check

commit 9b662e9f9b4f4ce194608c8e20a8fc50bc6d9ae3
Author: Parker-Kasiewicz <[email protected]>
Date:   Thu May 2 00:53:45 2024 -0700

    Adding Gab as a Data Source! (#401)

    * Can successfully import gab data, although
    can't tell if formatting is right because
    waiting on queued requests.

    * Version w/ different item types

    * Ingest Gab posts from Zeeschuimer

    * Small fix for merge conflicts (whoops)

    * Gab processing logic transferred from Zeeschuimer

    * fixing small errors for Gab data source

    * basic processing for truth social from Zeeschuimer

    ---------

    Co-authored-by: Dale Wahl <[email protected]>

commit 3ecb8fd9c27aee4c457f03516794c6c4eac19c09
Author: Stijn Peeters <[email protected]>
Date:   Wed May 1 17:51:36 2024 +0200

    Fix duplicate line in views_admin.py

commit 8b66ae7e467913f8e7571cf4b45493f63804266f
Author: Stijn Peeters <[email protected]>
Date:   Wed May 1 17:49:54 2024 +0200

    Allow processors to define which fields should be pseudonymised

commit c973750c8cabb8698704c5997903e92d1de866d2
Author: Stijn Peeters <[email protected]>
Date:   Wed May 1 17:15:32 2024 +0200

    Allow auto-queue of pseudonymisation after import

commit 49ad9f0ff785fd44ae494755b785c7fdf7c9cf15
Author: Stijn Peeters <[email protected]>
Date:   Wed May 1 17:08:35 2024 +0200

    Get rid of redundant and buggy next/copy_to implementation in Search class

commit 106d3659e2fda89867d3a4f587c1c1addfaff2f7
Author: Dale Wahl <[email protected]>
Date:   Wed May 1 16:14:03 2024 +0200

    use current branch in settings

commit 60bef4157d807f7c01ef3b425295244e91919f31
Author: Stijn Peeters <[email protected]>
Date:   Wed May 1 11:04:07 2024 +0200

    Nicer code

commit 4182c436e4fb5109c5e041dc729f77a58d877889
Author: Stijn Peeters <[email protected]>
Date:   Tue Apr 30 16:19:36 2024 +0200

    Always shut down API worker only after everything else has been shut down

commit e685108b3cbe5f005ce2df21906267071ad8118e
Author: Stijn Peeters <[email protected]>
Date:   Tue Apr 30 16:12:42 2024 +0200

    Properly interrupt expiration worker when asked

commit 27a568eca7f2f3742223fef6285eaf80583e0fc4
Author: Stijn Peeters <[email protected]>
Date:   Tue Apr 30 13:40:50 2024 +0200

    Allow floats-as-strings as timestamps when importing CSV

commit 2d2bbb9fdb9b426b8f4a80782f04257721a97f2e
Author: Dale Wahl <[email protected]>
Date:   Tue Apr 30 13:05:07 2024 +0200

    douyin: add consistency to map_item stats

commit 289aa342c9912aceeca35887c079c72aa6ffbf52
Author: Dale Wahl <[email protected]>
Date:   Mon Apr 29 15:26:38 2024 +0200

    fix collection data in Douyin to handle $undefined

commit 5b9b23fb1696bc1b69e1d902c0a2ad4b7d168984
Author: Dale Wahl <[email protected]>
Date:   Mon Apr 29 13:00:03 2024 +0200

    add scipy requirement to make compatible with gensim

    https://stackoverflow.com/questions/78279136/importerror-cannot-import-name-triu-from-scipy-linalg-gensim

commit 7eab746e944f1ababe3dcd6a5d25387a64c2237d
Author: Stijn Peeters <[email protected]>
Date:   Mon Apr 29 12:00:09 2024 +0200

    stupid, stupid, stupid

commit 90577982ac05019a7ac76818a62f91e84dd65902
Author: Stijn Peeters <[email protected]>
Date:   Mon Apr 29 11:56:22 2024 +0200

    Fix leftover iterate_mapped_items

commit 57dbdf74c49c34c05784debb9f7e258da7ae7d54
Author: Stijn Peeters <[email protected]>
Date:   Fri Apr 26 15:26:39 2024 +0200

    Woops

commit f11760d2c13e817e23cfa5e26b24f74cf817f65e
Author: Stijn Peeters <[email protected]>
Date:   Fri Apr 26 15:26:04 2024 +0200

    Update list of supported platforms in readme

commit 760ff1cdeb006f70acaa00ded82fb3cbc7617c9d
Author: Stijn Peeters <[email protected]>
Date:   Fri Apr 26 12:13:28 2024 +0200

    Bump version

commit 1fd78b2362840299e80f5540c9fedc1be3b06da1
Author: Stijn Peeters <[email protected]>
Date:   Thu Apr 25 12:58:24 2024 +0200

    Use MissingMappedField for Douyin fields undefined in the source data

commit 6918baeabc7a08b6a63495c5d38c86b2c88bca44
Author: Stijn Peeters <[email protected]>
Date:   Thu Apr 25 12:31:11 2024 +0200

    Fix Douyin mapping failure if cellRoom is $undefined

commit aad6208167c07686348234daff4dcf9cd036f5a5
Author: Stijn Peeters <[email protected]>
Date:   Thu Apr 25 12:30:53 2024 +0200

    Better error when trying to import data for unknown datasource

commit 43c6ed646994111188bde66d5bcfe4ab602e8512
Author: Stijn Peeters <[email protected]>
Date:   Thu Apr 25 12:30:31 2024 +0200

    Fix Twitter mapping on URLs that cannot be expanded

commit 91c3da176fad90ba16871fa8892fac5a0df13785
Author: Stijn Peeters <[email protected]>
Date:   Thu Apr 25 12:12:54 2024 +0200

    Safe cast to int in CrowdTangle import

commit 765f29e9232afdf284ab1667b0f371951e0bf2f4
Author: Stijn Peeters <[email protected]>
Date:   Wed Apr 24 12:37:02 2024 +0200

    Fix erroneous shell command in front-end restart trigger

commit c99fdd9eca8f5925d93375cac846e8b7633194fb
Merge: 342a4037 bc1deddf
Author: Stijn Peeters <[email protected]>
Date:   Tue Apr 23 12:29:35 2024 +0200

    Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat

commit 342a4037411e7ccaa50b25a4686434bec39e2568
Author: Stijn Peeters <[email protected]>
Date:   Tue Apr 23 12:29:32 2024 +0200

    Enable TikTok comment and Gab import by default

commit bc1deddf57aa5049fb79622c4309fb7051d77bdb
Merge: 537d7645 3c644f01
Author: Dale Wahl <[email protected]>
Date:   Tue Apr 23 12:16:37 2024 +0200

    Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat

commit 537d76456e2826e8c4dd7026ec5b2d436370fad8
Author: Dale Wahl <[email protected]>
Date:   Tue Apr 23 12:14:46 2024 +0200

    do the todo: fix column_filter to match exact/contains with int

commit 3c644f01baeca34e712d36efdf5c77ccd3ef7a06
Author: Stijn Peeters <[email protected]>
Date:   Tue Apr 23 11:16:07 2024 +0200

    Don't crash on empty URLs in dataset merge

commit f1574c26e2e3bdc40cc04bb8193cf6d3fa14792b
Author: Dale Wahl <[email protected]>
Date:   Thu Apr 18 12:08:55 2024 +0200

    fix: do not fail when no processor exists

    weird! failed on a dataset `type="custom-search"` which was created by an import script w/ no processor. Also likely would make deprecated processors fail.
    500 server error:
    ```
    File "/opt/4cat/common/lib/dataset.py", line 800, in get_columns
      return self.get_item_keys(processor=self.get_own_processor())
    File "/opt/4cat/common/lib/dataset.py", line 405, in get_item_keys
      keys = list(items.__next__().keys())
    File "/opt/4cat/common/lib/dataset.py", line 337, in iterate_items
      if own_processor.map_item_method_available(dataset=self):
    AttributeError: 'NoneType' object has no attribute 'map_item_method_available'
    ```

commit 50a4434a37d71af6a9470c7fc4a236b043cbfb4d
Author: Stijn Peeters <[email protected]>
Date:   Wed Apr 17 14:30:58 2024 +0200

    Add "TikTok comments" data source

commit c43e76daae3c2e6ecdb218ee749315b985eccca4
Author: Stijn Peeters <[email protected]>
Date:   Tue Apr 16 17:59:25 2024 +0200

    Allow notifications per tag

commit 36984104e674e8577756bfc3fdd5c72f6569d9e1
Author: Dale Wahl <[email protected]>
Date:   Tue Apr 16 17:25:38 2024 +0200

    fix: pass dataset to get_options when queuing processors

commit 59cb19a3c88f7f4a4ac02d0b7a891afde50ea069
Author: Dale Wahl <[email protected]>
Date:   Tue Apr 16 10:55:29 2024 +0200

    fix: dicts are shared in classes & you cannot delete a key more than once

    randomly found this; probably as no one else has reddit enabled!

commit 3ec9c6ea471bcdbe9fb1caad1e5fe1502a705444
Author: Dale Wahl <[email protected]>
Date:   Mon Apr 15 13:22:19 2024 +0200

    fix results page error when dataset was being created; do not check for resultspage updates when user not focused on page

commit db05ae5e565248e865e67b8ea60e6653357bb1f4
Author: Dale Wahl <[email protected]>
Date:   Mon Apr 15 11:27:33 2024 +0200

    on import file, differentiate between missing field(s) and unable to map item

commit 940bac72c7e53bec9e136867c13e2a0a355961a4
Author: Stijn Peeters <[email protected]>
Date:   Fri Apr 12 12:57:48 2024 +0200

    Case-insensitive username/note matching in user list

commit d0f34245bd07b5ad2fd3e90754ef0264ffc350a9
Author: Stijn Peeters <[email protected]>
Date:   Fri Apr 12 12:29:12 2024 +0200

    Only determine settings tab name in one place

commit 9f69d7bc0bbb657be1e725d5fb3fe350b7205bff
Author: Stijn Peeters <[email protected]>
Date:   Fri Apr 12 12:20:34 2024 +0200

    git != github

commit 9b4981d8c7358f31ed65d9f161d556e578389801
Author: Stijn Peeters <[email protected]>
Date:   Fri Apr 12 11:56:04 2024 +0200

    Fix issues with user tags

    Fix number of users in tag overview; allow filtering by user tags on user list; don't delete all user tags when deleting one

commit 9e8ccd3a78765acdfd2005eaa215dc0dc07266e0
Author: Stijn Peeters <[email protected]>
Date:   Fri Apr 12 11:32:45 2024 +0200

    Do not hide all non-hidden child processors

    lol

commit 3f15410af3a278f5644f41f49e25498a1fac3c76
Author: Stijn Peeters <[email protected]>
Date:   Fri Apr 12 11:23:52 2024 +0200

    Disable standard video downloader for Telegram

commit 94c814b9cab2ae2be10d5c5d3f6cfe20898e349c
Author: Stijn Peeters <[email protected]>
Date:   Fri Apr 12 11:14:16 2024 +0200

    Telegram video downloader processor

commit d36254a188947fff507e8df59f793e98b3be1570
Author: Stijn Peeters <[email protected]>
Date:   Fri Apr 12 11:14:04 2024 +0200

    Better styling for 4CAT settings, alphabetic order, submenus

commit 808300fa109f306a921f2048b2cf4b6dafc4ba5f
Author: Stijn Peeters <[email protected]>
Date:   Thu Apr 11 14:44:32 2024 +0200

    Fix multiselect in UI

commit 131a0eca0ad514b1ee57803e5c560ab0e56de42d
Author: Stijn Peeters <[email protected]>
Date:   Mon Apr 8 18:28:04 2024 +0200

    Do not attempt to load crashed file as module in Slack webhook. Fixes #422 (hopefully)

commit 6d8cb067bc12f8be68749f74a7291e0849494225
Author: Stijn Peeters <[email protected]>
Date:   Fri Apr 5 19:43:58 2024 +0200

    Allow comma-separated list when adding new dataset owners

commit 2612aea49f63c37ac691cc89c553c764ead2344f
Author: Stijn Peeters <[email protected]>
Date:   Fri Apr 5 19:40:04 2024 +0200

    Include number of users with tag on tag page

commit 39f2ec40faa3b8493bd5525279aeaeb2e4f586e0
Author: Stijn Peeters <[email protected]>
Date:   Fri Apr 5 19:26:02 2024 +0200

    Fix confirmation before deleting user tag

commit b00a410a3441e7f2a9d73a9f2dfb0f4ef70ea8a5
Author: Stijn Peeters <[email protected]>
Date:   Fri Apr 5 19:25:01 2024 +0200

    Add link to users with tag on tag admin page

commit 3ef3e5ec9adbd8ddd128ce2b3f8fa3b1de1297e3
Author: Stijn Peeters <[email protected]>
Date:   Fri Apr 5 18:49:25 2024 +0200

    Give filtered datasets a more sensible label, based on source dataset

commit 0d5870b78fb73cb58231736cc8a2efbb0b3cd88a
Author: Dale Wahl <[email protected]>
Date:   Fri Apr 5 17:40:57 2024 +0200

    update iterate methods (#418)

    * working to make iterate_mapped_item primary method used by processors and elsewhere in 4CAT; iterate_item method only internally (and provide item directly as is from file) with iterate_mapped_object as intermediate method to use map_missing method and handle missing values as well as warn if needed

    * switch from iterate_items to iterate_mapped_items; careful attention to item_to_yield allowing a choice of the original item, the mapped item, or both

    * revert some unnecessary renaming

    * fix annotations bug...

    this fixes the bug, but i noticed that the annotations saved in the database do not have the correct post IDs.

    * Introduce DatasetItem class and simplify iterate_items

    * Don't crash when no item mapper

    * ...actually commit the DatasetItem class

    * Fix typos in comment

    ---------

    Co-authored-by: Stijn Peeters <[email protected]>
    Co-authored-by: Sal Hagen <[email protected]>

commit 17b77351c51ace21b7057276bbae9da2643a3fc4
Author: Stijn Peeters <[email protected]>
Date:   Fri Apr 5 16:20:19 2024 +0200

    Allow dynamic form options in processors (#397)

    * Allow dynamic form options in processors

    * Allow 'requires' on data source options as well

    * Handle list values with requires

    * Wider support for file upload in processors

    * Log file uploads in DMI service manager

    * fix error w/ datasources having file option

    * fix fourcat.js use of checkboxes for dynamic settings

    * Fix faulty toggleButton targeting

    ---------

    Co-authored-by: Dale Wahl <[email protected]>

commit 693fcedc93ee4476a60d0e0876e688f82a8526fa
Author: Dale Wahl <[email protected]>
Date:   Fri Apr 5 15:59:10 2024 +0200

    Add method to processors to toggle display in UI (#411)

    * add ui_only parameter to DataSet.get_available_processors() and BasicProcessor.display_in_ui()

    Allow using `display_in_ui` to hide processors from UI but allow them to be queued either via API or presets. This avoids issue of is_compatible_with() having to be used to hide processors with sometimes ill effects.

    * keep same data structure....

    * don't delete twice; it's redundant... and raises an error

    * Rename arguments/properties

    * Exclude hidden processors in top level view

    * fix logic

    * Exclude in child template as well

    ---------

    Co-authored-by: Stijn Peeters <[email protected]>

commit 3cd146c2908da6b3a06a0c1511bf042c4223af0f
Author: Dale Wahl <[email protected]>
Date:   Thu Apr 4 16:41:39 2024 +0200

    fix: whoops remove debug

commit daa7291e813e62fed4600a4acb8430004836cb86
Author: Dale Wahl <[email protected]>
Date:   Thu Apr 4 15:16:30 2024 +0200

    CSV preview add hyperlinks if "url" or "link" in column header

commit 5f2d6e65bad4f71b2c3cc75d2cdab76f15671d4c
Author: Dale Wahl <[email protected]>
Date:   Thu Apr 4 15:16:01 2024 +0200

    blip2 processor to work w/ DMI Service Manager

commit fe881dec18778d99ac4a0f60ca40a1f43fdb1689
Author: Dale Wahl <[email protected]>
Date:   Thu Apr 4 09:53:30 2024 +0200

    catch AttributeError on slackhook if unable to read file

    ever vigilant against a lack of flavour...

commit 2808256b1fabf2e6e8a5a94aad98af60c50fb7b0
Merge: 14123847 eb474640
Author: Dale Wahl <[email protected]>
Date:   Wed Apr 3 17:28:40 2024 +0200

    Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat

commit 14123847b5852bf0e7c84fced6c2380165ec93f6
Author: Dale Wahl <[email protected]>
Date:   Wed Apr 3 17:28:38 2024 +0200

    staging_areas should not be made for completed datasets (else they may be deleted prematurely)

commit eb474640559ee3e914d9c95adb60be09b906f1d6
Merge: bbdf2ab9 3f8b285c
Author: sal-phd-desktop <[email protected]>
Date:   Wed Apr 3 16:50:54 2024 +0200

    Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat

commit bbdf2ab9b4292c14911ac01b481c829defa85e5c
Author: sal-phd-desktop <[email protected]>
Date:   Wed Apr 3 16:50:36 2024 +0200

    Helper script to export the 'classic' 4CAT 4chan data

commit 3f8b285c44c33a3ce08e885889b311bc454a70ea
Merge: 8f40f3f5 f7cc5b8d
Author: Sal Hagen <[email protected]>
Date:   Wed Apr 3 12:12:17 2024 +0200

    Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat

commit 8f40f3f5222a63e93f46eb3b57791d10060a0cc8
Author: Sal Hagen <[email protected]>
Date:   Wed Apr 3 12:12:13 2024 +0200

    Tumblr search typo

commit f7cc5b8d012dec3d8e0c8847ae16c662e82040b5
Author: Stijn Peeters <[email protected]>
Date:   Tue Apr 2 12:32:51 2024 +0200

    More/less flavour in restart worker

commit 073587efc581adca0608988573ac83ea8b0c93d0
Author: Dale Wahl <[email protected]>
Date:   Wed Mar 27 14:15:27 2024 +0100

    create favicon.ico (remove from repo)

    be sure to keep webtool/static/img/favicon/favicon-bw.ico as basis

commit 28d733d56204231f4089660ff61282174aac7aed
Author: Dale Wahl <[email protected]>
Date:   Wed Mar 27 09:44:45 2024 +0100

    add allow_access_request check to request-password page

    clicking it would only return the user to the login page anyway, but better not even show it

commit 1f2cb77e3cb0fc9b5403da52aaa925b33089d18f
Author: Dale Wahl <[email protected]>
Date:   Wed Mar 27 09:37:51 2024 +0100

    fix can_request_access to use 4cat.allow_access_request option

commit 0d66f11d3619af798d5acc41dbf4fe118b7ddad8
Merge: 25825383 05b3fc07
Author: Stijn Peeters <[email protected]>
Date:   Tue Mar 26 17:54:48 2024 +0100

    Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat

commit 2582538303e31470ed6bf8a01645f7b45af15e5d
Author: Stijn Peeters <[email protected]>
Date:   Tue Mar 26 17:54:45 2024 +0100

    More permissive timeout for pixplot

commit 05b3fc0771ded10dc55db799e8f47e42add08d43
Author: Dale Wahl <[email protected]>
Date:   Tue Mar 26 14:01:59 2024 +0100

    remove redundant call of Path

commit e4a93442efb84d73d6a4c9af9bc46a8f3e3fdda2
Author: Stijn Peeters <[email protected]>
Date:   Tue Mar 26 11:52:09 2024 +0100

    Include column with link description in Telegram mapping

commit 876f4a4b6df51ec4b30a048c32191438b6778f90
Author: Dale Wahl <[email protected]>
Date:   Mon Mar 25 14:48:47 2024 +0100

    douyin handle image posts

commit 81ad61baabaf965b1c848f55a80c23bd3e1a9000
Author: Stijn Peeters <[email protected]>
Date:   Mon Mar 25 08:01:44 2024 +0100

    Accept non-numeric IDs in Telegram image downloader

commit a8b36dc5682df7c16e25474ea8fdbfc4f12f9d46
Author: Stijn Peeters <[email protected]>
Date:   Sun Mar 24 23:15:51 2024 +0100

    Ensure unique IDs for Telegram datasets

commit 4a3e9ffee072c4d3efb7bfd8744369b46f19eef2
Merge: 0c119130 d749237e
Author: Stijn Peeters <[email protected]>
Date:   Sun Mar 24 22:56:59 2024 +0100

    Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat

commit 0c11913049aabb5a83ffe26d58bdf17affdbc0b9
Author: Stijn Peeters <[email protected]>
Date:   Sun Mar 24 20:09:10 2024 +0100

    Better string formatting in Telegram image downloader

commit 8a7da5317defdafb5bdbf74dcbeb68e464fa21f4
Author: Stijn Peeters <[email protected]>
Date:   Sun Mar 24 20:06:06 2024 +0100

    Add 'link thumbnails' option to Telegram image downloader

commit a0baae17d8f11e4cae7cc261f8d406b1b1ce628a
Author: Stijn Peeters <[email protected]>
Date:   Sun Mar 24 20:05:24 2024 +0100

    Add 'Fetch URL metadata' processor

commit b9a0668f35c6d1fc5bfb42e1ae706418cbe6e0a7
Author: Stijn Peeters <[email protected]>
Date:   Sun Mar 24 20:05:15 2024 +0100

    Update ural dependency

commit a28036186f5d35e435cade7638ed35361054967e
Author: Stijn Peeters <[email protected]>
Date:   Sun Mar 24 20:05:08 2024 +0100

    Add emoji library dependency

commit bb50fc946fb6cdd8454969514bdc6d5ecf3f3530
Author: Stijn Peeters <[email protected]>
Date:   Sun Mar 24 20:04:59 2024 +0100

    Add 'emoji' option to Count Values processor

commit e653e3d8fb9c01697d96316df6f7634454671191
Author: Stijn Peeters <[email protected]>
Date:   Sun Mar 24 20:04:42 2024 +0100

    Add 'forwards', 'reactions', 'link_title', 'link_attached' columns to mapped Telegram items

commit d749237ec5c103b286ba8086904e405e232fc14c
Author: Dale Wahl <[email protected]>
Date:   Fri Mar 22 11:02:14 2024 +0100

    telegram: sp too?

    this is why i test locally first...

commit 9d7d27c61425bbbbccd18a8e3de35ab372dbfbf3
Author: Dale Wahl <[email protected]>
Date:   Fri Mar 22 10:58:48 2024 +0100

    telegram: missed reference to options

commit c1671ce0ef69c71c81c3ae69a59e4ad7dc1bda79
Author: Dale Wahl <[email protected]>
Date:   Fri Mar 22 10:49:02 2024 +0100

    telegram fix: class dictionaries are shared between all workers

    admin calls get_options and `del options["max_posts"]["max"]` runs, then normal user calls get_options and there is no longer max. could also copy cls.options, but not sure why we cannot create the options in `get_options`.
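
The failure mode described in this commit message can be reproduced in a few lines. This is a minimal sketch, not the actual Telegram data source code: `BuggySource`, `FixedSource`, the `is_admin` flag and the option values are hypothetical names chosen for illustration. It shows why deleting a key from a class-level dictionary affects every later caller, and how building the dictionary inside `get_options` (the approach the message suggests) avoids that.

```python
class BuggySource:
    # Class-level dict: a single object shared by every caller of get_options()
    options = {"max_posts": {"type": "int", "default": 10, "max": 100}}

    @classmethod
    def get_options(cls, is_admin=False):
        options = cls.options  # a reference to the shared dict, not a copy
        if is_admin:
            # Mutates the shared dict; raises KeyError on the second admin call
            del options["max_posts"]["max"]
        return options


class FixedSource:
    @classmethod
    def get_options(cls, is_admin=False):
        # Build a fresh dict on every call so edits never leak between callers
        options = {"max_posts": {"type": "int", "default": 10, "max": 100}}
        if is_admin:
            del options["max_posts"]["max"]
        return options
```

With `BuggySource`, one admin call removes the limit for every later non-admin call, and a second admin call fails because the key is already gone. With `FixedSource`, each call starts from a fresh dictionary, so neither problem occurs.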

commit cd2e74d251491a93bc66dc7a64e8b2a60b0ed8ae
Author: Stijn Peeters <[email protected]>
Date:   Wed Mar 20 11:10:30 2024 +0100

    Make Telegram max entities a setting

commit 38fcabb81da956e5513bd0246ee086d1ab4896c9
Author: Stijn Peeters <[email protected]>
Date:   Fri Mar 15 18:47:59 2024 +0100

    Make metrics table use BIGINT

    Folder size may not fit otherwise!
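
A quick check of why a 4-byte integer column can be too small for folder sizes. This is a sketch that assumes sizes are stored in bytes, which the commit message does not state; the 11 GiB figure is only an example value.

```python
INT4_MAX = 2**31 - 1  # upper bound of a 4-byte signed integer (PostgreSQL INTEGER)
INT8_MAX = 2**63 - 1  # upper bound of an 8-byte signed integer (PostgreSQL BIGINT)


def fits(value: int, upper_bound: int) -> bool:
    """Return True if value lies within a signed integer type with this upper bound."""
    return -upper_bound - 1 <= value <= upper_bound


folder_size_bytes = 11 * 1024**3  # hypothetical folder of 11 GiB

print(fits(folder_size_bytes, INT4_MAX))  # False: INTEGER tops out just under 2 GiB
print(fits(folder_size_bytes, INT8_MAX))  # True
```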

commit 34013cb91eed7fac725defd408b67bddee4b806b
Author: Stijn Peeters <[email protected]>
Date:   Fri Mar 15 18:37:10 2024 +0100

    Fix duplicate stats in metrics table

commit c8ad90b3436cff600320d3b2efdf6144240ea59d
Author: Stijn Peeters <[email protected]>
Date:   Fri Mar 15 18:14:39 2024 +0100

    Calculate disk use stats via worker instead of on demand

commit e4e0c4e3a375bf14bdca7b633231b60e34c322e0
Author: Stijn Peeters <[email protected]>
Date:   Thu Mar 14 10:25:23 2024 +0100

    Spelling thing

commit ae1c00fb3a521a2c3258b2597b04322d202c3ee7
Author: Stijn Peeters <[email protected]>
Date:   Thu Mar 14 10:25:10 2024 +0100

    Disable direct editing of tag order

commit e3ce81452ad8ee3231309383c24fb26e553b0dff
Merge: fa3be93b a7b5820c
Author: Dale Wahl <[email protected]>
Date:   Wed Mar 13 16:25:46 2024 +0100

    Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat

commit fa3be93bafef17e95881207604efa1212d562d9e
Author: Dale Wahl <[email protected]>
Date:   Wed Mar 13 16:25:43 2024 +0100

    instagram: check both user and owner for full_name

commit a7b5820c9f2acb5081ef80ea0293f42ee91925a3
Author: Dale Wahl <[email protected]>
Date:   Tue Mar 12 15:59:43 2024 +0100

    proposed fix to results filter (#417)

    * proposed fix to results filter

    * do not filter datasources at all for results/ view

commit b930b6e964b460ef5160398c6cd1038f766b0548
Author: Dale Wahl <[email protected]>
Date:   Mon Mar 11 12:00:12 2024 +0100

    remove unused code

    the `can_preview` attribute does not appear to exist so this is always hidden

commit 97cd2d52966bd751da704a4a06cfa5478f999885
Author: Dale Wahl <[email protected]>
Date:   Mon Mar 11 11:51:28 2024 +0100

    faster collection of folder size for admin panel

    (was between five and six times faster in my tests with around 11G of data files)

commit 108fd28b594a95b94727ccc601fec59da61a8d3d
Author: Dale Wahl <[email protected]>
Date:   Thu Mar 7 11:09:33 2024 +0100

    typo fixes, log fix

commit 44848a8f4b9fea07e7f9ce03e4fe0d696d5f1d27
Author: Dale Wahl <[email protected]>
Date:   Wed Mar 6 10:17:34 2024 +0100

    fix tf_idf - sometimes less results than max

commit e5f1f703247a5763d3d0e03c44ee31ab60b8a8ed
Author: Dale Wahl <[email protected]>
Date:   Wed Mar 6 09:33:21 2024 +0100

    fix image downloader failing on 4chan images

    we do not often rename datasources, but when we do...

commit f5e50d508096729bccdc0dafa460f83c419c2606
Author: Stijn Peeters <[email protected]>
Date:   Tue Mar 5 16:23:34 2024 +0100

    Version 1.39 -> 1.40

commit 4b3e4efa25914f5f9509f69596a82935440e5f9f
Author: Stijn Peeters <[email protected]>
Date:   Mon Feb 26 18:28:15 2024 +0100

    Add 'safe' parameter to get_item_data

commit b98f62ab6a3a21815cc0fa899cdca1d48eab0fdb
Author: Stijn Peeters <[email protected]>
Date:   Mon Feb 26 18:27:57 2024 +0100

    Use iterate_mapped_items in dataset view

commit 6d9baa9c228168dce7fe946681c95d471d45c6e0
Author: Stijn Peeters <[email protected]>
Date:   Mon Feb 26 18:27:33 2024 +0100

    Update TikTok downloader for new item mapper

commit 1622ec660754582eb2791f0d114df76e71640370
Author: Dale Wahl <[email protected]>
Date:   Mon Feb 26 12:51:31 2024 +0100

    flawless was removed from dataset class, but used by telegram

    adding back to fix telegram, but perhaps it should be changed

commit 84168e945e2ecf963cfdac3409d60544b521f694
Author: Dale Wahl <[email protected]>
Date:   Wed Feb 21 15:56:24 2024 +0100

    webtool checks for gunicorn and if exists sets up error log

    this normally only ran in Docker

commit 7119862feac1e9993b8dedccc59887830e7715a1
Author: Stijn Peeters <[email protected]>
Date:   Tue Feb 20 18:36:21 2024 +0100

    Use MappedItem in ML processors

commit 32b8790420af8572f4a3db2d2bc8ffd696872114
Author: Stijn Peeters <[email protected]>
Date:   Tue Feb 20 16:58:22 2024 +0100

    Map items to objects instead of dicts (#409)

    * Consistent parameter name for map_item()

    * Wrap mapped items in MappedItem() object

    * Keep track of import warnings in search.py

    * Add warning when mapping a tweet with missing metric data

    * Add new iterate_mapped_objects method

    * Log mapping warnings when merging datasets

    * Pass object instead of dict

    * Clarify Twitter warning

    * Documenting MappedItem

    * Explain things to myself

    * R…
commit 3f2a62a124926cfeb840796f104a702878ac10e5
Author: Carsten Schnober <[email protected]>
Date:   Wed Sep 18 18:18:29 2024 +0200

    Update Gensim to >=4.3.3, <4.4.0 (#450)

    * Update Gensim to >=4.3.3, <4.4.0

    * update nltk as well

    ---------

    Co-authored-by: Dale Wahl <[email protected]>
    Co-authored-by: Sal Hagen <[email protected]>

commit fee2c8c08617094f28496963da282d2e2dddeab7
Merge: 3d94b666 f8e93eda
Author: sal-phd-desktop <[email protected]>
Date:   Wed Sep 18 18:11:19 2024 +0200

    Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat

commit 3d94b666cedd0de4e0bee953cbf1d787fdc38854
Author: sal-phd-desktop <[email protected]>
Date:   Wed Sep 18 18:11:04 2024 +0200

    FINALLY remove 'News' from the front page, replace with 4CAT BlueSky updates and potential information about the specific server (to be set on config page)

commit f8e93edabe9013a2c1229caa4c454fab09620125
Author: Stijn Peeters <[email protected]>
Date:   Wed Sep 18 15:11:21 2024 +0200

    Simple extensions page in Control Panel

commit b5be128c7b8682fb233d962326d9118a61053165
Author: Stijn Peeters <[email protected]>
Date:   Wed Sep 18 14:08:13 2024 +0200

    Remove 'docs' directory

commit 1e2010af44817016c274c9ec9f7f9971deb57f66
Author: Stijn Peeters <[email protected]>
Date:   Wed Sep 18 14:07:38 2024 +0200

    Forgot TikTok and Douyin

commit c757dd51884e7ec9cf62ca1726feacab4b2283b7
Author: Stijn Peeters <[email protected]>
Date:   Wed Sep 18 14:01:31 2024 +0200

    Say 'zeeschuimer' instead of 'extension' to avoid confusion with 4CAT extensions

commit ee7f4345478f923541536c86a5b06246deae03f6
Author: Stijn Peeters <[email protected]>
Date:   Wed Sep 18 14:00:40 2024 +0200

    RIP Parler data source

commit 11300f2430b51887823b280405de4ded4f15ede1
Author: Stijn Peeters <[email protected]>
Date:   Wed Sep 18 11:21:37 2024 +0200

    Tuplestring

commit 547265240eba81ca0ad270cd3c536a2b1dcf512d
Author: Stijn Peeters <[email protected]>
Date:   Wed Sep 18 11:15:29 2024 +0200

    Pass user obj instead of str to ConfigWrapper in Processor

commit b21866d7900b5d20ed6ce61ee9aff50f3c0df910
Author: Stijn Peeters <[email protected]>
Date:   Tue Sep 17 17:45:01 2024 +0200

    Ensure request-aware config reader in user object when using config wrapper

commit bbe79e4b0fe870ccc36cab7bfe7963b28d1948e3
Author: Sal Hagen <[email protected]>
Date:   Tue Sep 17 15:12:46 2024 +0200

    Fix extension path walk for Windows

commit d6064beaf31a6a85b0e34ed4f8126eb4c4fc07e3
Author: Stijn Peeters <[email protected]>
Date:   Mon Sep 16 14:50:45 2024 +0200

    Allow tags that have no users

    Use case: tag-based frontend differentiation using X-4CAT-Config-Via-Proxy

commit b542ded6f976809ec88445e7b04f2c81b900188e
Author: Stijn Peeters <[email protected]>
Date:   Mon Sep 16 14:13:14 2024 +0200

    Trailing slash in query results list

commit a4bddae575b22a009925206a1337bdd89349e567
Author: Dale Wahl <[email protected]>
Date:   Mon Sep 16 13:57:23 2024 +0200

    4CAT Extension - easy(ier) adding of new datasources/processors that can be maintained separately from 4CAT base code (#451)

    * domain only

    * fix reference

    * try and collect links with selenium

    * update column_filter to find multiple matches

    * fix up the normal url_scraper datasource

    * ensure all selenium links are strings for join

    * change output of url_scraper to ndjson with map_items

    * missed key/index change

    * update web archive to use json and map to 4CAT

    * fix no text found

    * and none on scraped_links

    * check key first

    * fix up web_archive error reporting

    * handle None type for error

    * record web archive "bad request"

    * add wait after redirect movement

    * increase waittime for redirects

    * add processor for trackers

    * dict to list for addition

    * allow both newline and comma separated links

    * attempt to scrape iframes as separate pages

    * Fixes for selenium scraper to work with config database

    * installation of packages, geckodriver, and firefox if selenium enabled

    * update install instructions

    * fix merge error

    * fix dropped function

    * have to be kidding me

    * add note; setup requires docker... need to think about IF this will ever
    be installed without Docker

    * separate selenium class into wrapper and Search class so wrapper can be
    used in processors!

    * add screenshots; add firefox extension support

    * update selenium definitions

    * regex for extracting urls from strings

    * screenshots processor; extract urls from text and takes screenshots

    * Allow producing zip files from data sources

    * import time

    * pick better default

    * test screenshot datasource

    * validate all params

    * fix enable extension

    * haha break out of while loop

    * count my items

    * whoops, len() is important here

    * must be getting tired...

    * remove redundant logging

    * Eager loading for screenshots, viewport options, etc

    * Woops, wrong folder

    * Fix label shortening

    * Just 'queue' instead of 'search queue'

    * Yeah, make it headless

    * README -> DESCRIPTION

    * h1 -> h2

    * Actually just have no header

    * Use proper filename for downloaded files

    * Configure whether to offer pseudonymisation etc

    * Tweak descriptions

    * fix log missing data

    * add columns to post_topic_matrix

    * fix breadcrumb bug

    * Add top topics column

    * Fix selenium config install parameter (Docker uses this/manual would
    need to run install_selenium, well, manually)

    * this processor is slow; i thought it was broken long before it updated!

    * refactor detect_trackers as conversion processor not filter

    * add geckodriver executable to docker install

    * Auto-configure webdrivers if available in PATH

    * update screenshots to act as image-downloader and benefit from processors

    * fix is_compatible_with

    * Delete helper-scripts/migrate/migrate-1.30-1.31.py

    * fix embeddings is_compatible_with

    * fix up UI options for hashing and private

    * abstract was moved to lib

    * various fixes to selenium based datasources

    * processors not compatible with image datasets

    * update firefox extension handling

    * screenshots datasource fix get_options

    * rename screenshots processor to be detected as image dataset

    * add monthly and weekly frequencies to wayback machine datasource

    * wayback ds: fix fail if all attempts do not yield results; add frequency options to options; add daily

    * add scroll down page to allow lazy loading for entire page screenshots

    * screenshots: adjust pause time so it can be used to force a wait for images to load

    I have not successfully come up with or found a way to wait for all images to load; document.readyState == 'complete' does not function in this way on certain sites including the wayback machine

    * hash URLs to create filenames

    * remove log

    * add setting to toggle display advanced options

    * add progress bars

    * web archive fix query validation

    * count subpages in progress

    * remove overwritten function

    * move http response to own column

    * special filenames

    * add timestamps to all screenshots

    * restart selenium on failure

    * new build have selenium

    * process urls after start (keep original query parameters)

    * undo default firefox

    * quick max

    * rename SeleniumScraper to SeleniumSearch

    todo: build SeleniumProcessor!

    * max number screenshots configurable

    * method to get url with error handling

    * use get_with_error_handling

    * d'oh, screenshot processor needs to quit selenium

    * update log to contain URL

    * Update scrolling to use Page down key if necessary

    * improve logs

    * update image_category_wall as screenshot datasource does not have category column; this is not ideal and ought to be solved in another way.

    Also, could I get categories from the metadata? That's... ugh.

    * no category, no processor

    * str errors

    * screenshots: dismiss alerts when checking ready state is complete

    * set screenshot timeout to 30 seconds

    * update gensim package

    * screenshots: move processor interrupt into attempts loop

    * if alert disappears before we can dismiss it...

    * selenium specific logger

    * do not switch window when no alert found on dismiss

    * extract wait for page to load to selenium class

    * improve descriptions of screenshot options

    * remove unused line

    * treat timeouts differently from other errors

    these are more likely due to an issue with the website in question

    * debug if requested

    * increase pause time

    * restart browser w/ PID

    * increase max_workers for selenium

    this is by individual worker class not for all selenium classes... so you can really crank them out if desired

    * quick fix restart by pid

    * avoid bad urls

    * missing bracket & attempt to fix-missing dependencies in Docker install

    * Allow dynamic form options in processors

    * Allow 'requires' on data source options as well

    * Handle list values with requires

    * basic processor for apple store; setup checks for additional requirements

    * fix is_4cat_class

    * show preview when no map_item

    * add google store datasource

    * Docker setup.py use extensions

    * Wider support for file upload in processors

    * Log file uploads in DMI service manager

    * add map_item methods and record more data per item

    need additional item data as map_item is staticmethod

    * update from master; merge conflicts

    * fix docker build context (ignore data files)

    * fix option requirements

    * apple store fix: list still tries to get query

    * apple & google stores fix up item mapping

    * missed merge error

    * minor fix

    * remove unused import

    * fix datasources w/ files frontend error

    * fix error w/ datasources having file option

    * better way to name docker volumes

    * update two other docker compose files

    * fix docker-compose ymls

    * minor bug: fix and add warning; fix no results fail

    * update apple field names to better match interface

    * update google store fieldnames and order

    * sneak in jinja logger if needed

    * fix fourcat.js handling checkboxes for dynamic settings

    * add new endpoint for app details to apple store

    * apple_store map new beta app data

    * add default lang/country

    * not all apps have advisories

    * revert so button works

    * add chart positions to beta map items

    * basic scheduler

    To-do
    - fix up and add options to scheduler view (e.g. delete/change)
    - add scheduler view to navigator
    - tie jobs to datasets? (either in scheduler view or, perhaps, filter dataset view)
    - more testing...

    * update scheduler view, add functions to update job interval

    * revert .env

    * working scheduler!

    * basic scheduler view w/ datasets

    * fix postgres tag

    * update job status in scheduled_jobs table

    * fix timestamp; end_date needed for last run check; add dataset label

    * improve scheduler view

    * remove dataset from scheduled_jobs table on delete

    * scheduler view order by last creation

    * scheduler views: separate scheduler list from scheduled dataset list

    * additional update from master fixes

    * apple_store map_items fix missing locales

    * add back depth for pagination

    * correct route

    * modify pagination to accept args

    * pagination fun

    * pagination: i hate testing on live servers...

    * ok ok need the pagination route

    * pagination: add route_args

    * fix up scheduler header

    * improve app store descriptions

    * add azure store

    * fix azure links

    * azure_store: add category search

    * azure fix type of config update timestamp

    OPTION_DATE does not appear correctly in settings and causes it to be written incorrectly

    * basic aws store

    * check if selenium available; get correct app_id

    * aws: implement pagination

    * add logging; wait for elements to load after next page; attempts to rework filter option collection

    * apple_store: handle invalid param error

    * fix filter_options

    * aws: fix filter option collection!

    * more merge

    * move new datasources and processors to extensions and modify setup.py and module loader to use the new locations

    * migrate.py to run extension "fourcat_install.py" files

    * formatting

    * remove extensions; add gitignore

    * excise scheduler merge

    * some additional cleanup from app_studies branch

    * allow nested datasources folders; ignore files in extensions main folder

    * allow extension install scripts to run pip if migrate.py has not

    * Remove unused URL functions we could use ural for

    * Take care of git commit hash tracking for extension processors

    * Get rid of unused path.versionfile config setting

    * Add extensions README

    * Squashed commit of the following:

    commit cd356f7a69d15e8ecc8efffc6d63a16368e62962
    Author: Stijn Peeters <[email protected]>
    Date:   Sat Sep 14 17:36:18 2024 +0200

        UI setting for 4CAT install ad in login

    commit 0945d8c0a11803a6bb411f15099d50fea25f10ab
    Author: Stijn Peeters <[email protected]>
    Date:   Sat Sep 14 17:32:55 2024 +0200

        UI setting for anonymisation controls

        Todo: make per-datasource

    commit 1a2562c2f9a368dbe0fc03264fb387e44313213b
    Author: Stijn Peeters <[email protected]>
    Date:   Sat Sep 14 15:53:27 2024 +0200

        Debug panel for HTTP headers in control panel

    commit 203314ec83fb631d985926a0b5c5c440cfaba9aa
    Author: Stijn Peeters <[email protected]>
    Date:   Sat Sep 14 15:53:17 2024 +0200

        Preview for HTML datasets

    commit 48c20c2ebac382bd41b92da4481ff7d832dc1538
    Author: Desktop Sal <[email protected]>
    Date:   Wed Sep 11 13:54:23 2024 +0200

        Remove spacy processors (linguistic extractor, get nouns, get entities) and remove dependencies

    commit 657ffd75a7f48ba4537449127e5fa39debf4fdf3
    Author: Dale Wahl <[email protected]>
    Date:   Fri Sep 6 16:29:19 2024 +0200

        fix nltk where it matters

    commit 2ef5c80f2d1a5b5f893c8977d8394740de6d796d
    Author: Stijn Peeters <[email protected]>
    Date:   Tue Sep 3 12:05:14 2024 +0200

        Actually check progress in text annotator

    commit 693960f41b73e39eda0c2f23eb361c18bde632cd
    Author: Stijn Peeters <[email protected]>
    Date:   Mon Sep 2 18:03:18 2024 +0200

        Add processor for stormtrooper DMI service

    commit 6ae964aad492527bc5d016a00f870145aab6e1af
    Author: Stijn Peeters <[email protected]>
    Date:   Fri Aug 30 17:31:37 2024 +0200

        Fix reference to old stopwords list in neologisms preset

    * Fix Github links for extensions

    * Fix commit detection in extensions

    * Fix extension detection in module loader

    * Follow symlinks when loading extensions

    Probably not uncommon to have a checked out repo somewhere to then symlink into the extensions dir

    * Make queue message on create page more generic

    * Markdown in datasource option tooltips

    * Remove Spacy model from requirements

    * Add software_source to database SQL

    ---------

    Co-authored-by: Stijn Peeters <[email protected]>
    Co-authored-by: Stijn Peeters <[email protected]>

commit cd356f7a69d15e8ecc8efffc6d63a16368e62962
Author: Stijn Peeters <[email protected]>
Date:   Sat Sep 14 17:36:18 2024 +0200

    UI setting for 4CAT install ad in login

commit 0945d8c0a11803a6bb411f15099d50fea25f10ab
Author: Stijn Peeters <[email protected]>
Date:   Sat Sep 14 17:32:55 2024 +0200

    UI setting for anonymisation controls

    Todo: make per-datasource

commit 1a2562c2f9a368dbe0fc03264fb387e44313213b
Author: Stijn Peeters <[email protected]>
Date:   Sat Sep 14 15:53:27 2024 +0200

    Debug panel for HTTP headers in control panel

commit 203314ec83fb631d985926a0b5c5c440cfaba9aa
Author: Stijn Peeters <[email protected]>
Date:   Sat Sep 14 15:53:17 2024 +0200

    Preview for HTML datasets

commit 48c20c2ebac382bd41b92da4481ff7d832dc1538
Author: Desktop Sal <[email protected]>
Date:   Wed Sep 11 13:54:23 2024 +0200

    Remove spacy processors (linguistic extractor, get nouns, get entities) and remove dependencies

commit 657ffd75a7f48ba4537449127e5fa39debf4fdf3
Author: Dale Wahl <[email protected]>
Date:   Fri Sep 6 16:29:19 2024 +0200

    fix nltk where it matters

commit 2ef5c80f2d1a5b5f893c8977d8394740de6d796d
Author: Stijn Peeters <[email protected]>
Date:   Tue Sep 3 12:05:14 2024 +0200

    Actually check progress in text annotator

commit 693960f41b73e39eda0c2f23eb361c18bde632cd
Author: Stijn Peeters <[email protected]>
Date:   Mon Sep 2 18:03:18 2024 +0200

    Add processor for stormtrooper DMI service

commit 6ae964aad492527bc5d016a00f870145aab6e1af
Author: Stijn Peeters <[email protected]>
Date:   Fri Aug 30 17:31:37 2024 +0200

    Fix reference to old stopwords list in neologisms preset

commit 4ba872bef2968f7f8bf5831fd3a4f413420b36ed
Author: Dale Wahl <[email protected]>
Date:   Tue Aug 27 13:04:46 2024 +0200

    fix hatebase: default column option for OPTION_MULTI_SELECT must be list

commit e276033542f2d22e7f614f318a01d65114a21482
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Wed Aug 21 12:53:10 2024 +0200

    Bump nltk from 3.6.7 to 3.9 (#447)

    Bumps [nltk](https://github.com/nltk/nltk) from 3.6.7 to 3.9.
    - [Changelog](https://github.com/nltk/nltk/blob/develop/ChangeLog)
    - [Commits](https://github.com/nltk/nltk/compare/3.6.7...3.9)

    ---
    updated-dependencies:
    - dependency-name: nltk
      dependency-type: direct:production
    ...

    Signed-off-by: dependabot[bot] <[email protected]>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

commit 1d749c3cf83b130ba70bdb09174f382d6711a14b
Author: sal-phd-desktop <[email protected]>
Date:   Wed Aug 21 12:52:54 2024 +0200

    Set UTF-8 encoding when opening stop words (fixes Windows bug)

commit a03e5fd4252e7242563c291558606440256eb3d1
Author: Dale Wahl <[email protected]>
Date:   Mon Aug 19 14:19:21 2024 +0200

    remove duplicate line

commit aa07e8c13c2d59c6b699f78133036514659ee420
Author: Dale Wahl <[email protected]>
Date:   Mon Jul 29 09:35:22 2024 +0200

    tweet import fix: author banner key missing when author has no banner

commit 32dac5d2ffb936210f12f5c725514fd25a0286f1
Author: Dale Wahl <[email protected]>
Date:   Mon Jul 29 08:52:08 2024 +0200

    tell user when dataset is not found

    we could have a proper 404 page, but at least leave a message

commit 2c8c860fc5378113d1352016ac26ca761adecb32
Author: Dale Wahl <[email protected]>
Date:   Mon Jul 22 17:41:00 2024 +0200

    telegram fix: reactions datastructure

commit 1c0bf5e580eb16d8a6f9afa415f9febce449a537
Author: Dale Wahl <[email protected]>
Date:   Mon Jul 22 11:19:52 2024 +0200

    fix telegram: crawl_max_depth can be None if it is not enabled for a user

commit 3dfe7af292b33574a31630e3a0da10954ed87d0a
Author: Dale Wahl <[email protected]>
Date:   Fri Jul 19 11:52:31 2024 +0200

    fix more config.get() magic

commit 2453182bcee6e54b396b762ab77b60b8a0893638
Author: Dale Wahl <[email protected]>
Date:   Fri Jul 19 10:54:23 2024 +0200

    config_manager - fix `get_all` w/ one result (super rare edge); fix overwriting self.db in `with_db`

commit 6b9cb0b5479e6e64e09a49fa2ca9effe1c5a7415
Author: Dale Wahl <[email protected]>
Date:   Wed Jul 17 15:20:49 2024 +0200

    add surf nginx init file

commit 5e984e13a08d9fba7d5806a7ef4e012ce7d57319
Author: Dale Wahl <[email protected]>
Date:   Wed Jul 17 14:30:34 2024 +0200

    change port for surf

commit 2ce8c354e90f939a16dad3f0155fd7d79405c79e
Author: Dale Wahl <[email protected]>
Date:   Wed Jul 17 12:54:11 2024 +0200

    use latest image on surf

commit 13ec0fd3f2bed86c3b2dff73014093a6a92fbfb5
Author: Dale Wahl <[email protected]>
Date:   Wed Jul 17 12:46:59 2024 +0200

    update surf docker-compose.yml

     this may require a new release

commit 78698f6ac1b22b1154d31f69543ba7b266d33191
Author: Dale Wahl <[email protected]>
Date:   Wed Jul 17 10:34:56 2024 +0200

    clip: handle new and old format

commit eb7693780cb191403f107817ca30d90373929bf0
Author: Dale Wahl <[email protected]>
Date:   Tue Jul 16 14:27:08 2024 +0200

    DMI SM updates to use status endpoint w/ database records; run on CPU if no GPU enabled

commit d2a787e2c1559417bb5401f3208c82954052504f
Author: Stijn Peeters <[email protected]>
Date:   Mon Jul 15 15:58:06 2024 +0200

    Require most recent Telethon version

commit 346150bd9cc96ac099abd4d15fa3de39bd65e9d1
Author: Stijn Peeters <[email protected]>
Date:   Mon Jul 15 15:57:55 2024 +0200

    Catch UPDATE_APP_TO_LOGIN in Telegram

commit 04acc06e95098d7e2f9b4af404447c9cfaee5b99
Author: Stijn Peeters <[email protected]>
Date:   Mon Jul 15 11:27:30 2024 +0200

    Unbreak Twitter error handling

commit e9b5232a963be02c2e86dabacb607b2315a4e0e6
Author: Stijn Peeters <[email protected]>
Date:   Fri Jul 12 13:27:15 2024 +0200

    Ensure str type when trying to extract video URLs from a field

commit d69dd6f337cac05ed31c05334890679976a1e6de
Author: Stijn Peeters <[email protected]>
Date:   Fri Jul 12 12:31:14 2024 +0200

    Make CSV column mapping params look nicer on result page

commit 9bd9da568f593085a8d54744836e3290a75b51a7
Author: Stijn Peeters <[email protected]>
Date:   Fri Jul 12 12:22:03 2024 +0200

    Add "empty" and "current timestamp" as options to CSV mapping

commit 0b574571952a206904440faf8601ddf95ab42b24
Author: Dale Wahl <[email protected]>
Date:   Thu Jul 11 16:59:56 2024 +0200

    image_wall: backup fit method

commit eeb1ddeb7ca85b6802dfed3c74d1352062383d50
Merge: 2504c37b 43239467
Author: Stijn Peeters <[email protected]>
Date:   Thu Jul 11 16:47:45 2024 +0200

    Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat

commit 43239467db046eea5eb5268f91d1b63a1042238d
Author: Dale Wahl <[email protected]>
Date:   Thu Jul 11 12:08:08 2024 +0200

    fix processor more button

    would only show top level analysis if not logged in

commit d6ab2b0783f8e40ecd8fadbc2abccffa6f093e39
Author: Dale Wahl <[email protected]>
Date:   Tue Jul 9 15:35:25 2024 +0200

    search_gab - use MappedItem

commit 2504c37b67ff6f19720b44d8bb6054b1c3d5a155
Author: Stijn Peeters <[email protected]>
Date:   Sat Jul 6 17:51:22 2024 +0200

    Fix multiline spacing in multi select list

commit fea66ce38be0717da6c1f847e7124f7069c096e2
Author: Dale Wahl <[email protected]>
Date:   Fri Jul 5 13:15:45 2024 +0200

    use processor media_type if dataset does not have media_type; set default media_type for downloaders

commit d41fa34514e8177efdac7e64a31f2ee75c7d1652
Author: Dale Wahl <[email protected]>
Date:   Fri Jul 5 12:57:18 2024 +0200

    video_hasher: handle no metadata file

commit 2820dcecc36ed4705a2776064d387ff7ed14e84f
Author: Dale Wahl <[email protected]>
Date:   Fri Jul 5 12:50:09 2024 +0200

    num_rows not num_items()

commit fb09162db902fa22fdf2d7a3ed171ce1489bd92f
Author: Dale Wahl <[email protected]>
Date:   Fri Jul 5 12:44:03 2024 +0200

    Google vision API returning 400s; properly log and record processed entries; google networks should not run on empty datasets

commit ebf39d8262d199895aedc4f7fa275c5685e58563
Author: Dale Wahl <[email protected]>
Date:   Fri Jul 5 12:28:13 2024 +0200

    fix image_category_wall

    whoops, cleared categories and post_values after filling them!

commit 1ad9ec2c2e76604793ec37584c051f116af2fdab
Author: Stijn Peeters <[email protected]>
Date:   Fri Jul 5 12:03:54 2024 +0200

    fsdfdsgd sorry

commit c7254c08a477c6cdc8497507e8452c3eff7101c9
Author: Stijn Peeters <[email protected]>
Date:   Fri Jul 5 12:01:21 2024 +0200

    Fix razdel versioning

commit b9a327abe99f2d9ede4f2747f34f20d1dc6803cb
Author: Stijn Peeters <[email protected]>
Date:   Fri Jul 5 11:57:47 2024 +0200

    Reorganise tokeniser, stopwords

commit fb13bc483af9ba0d677ee35fd045bf36ab1cddf7
Merge: 0b745692 e3046496
Author: Stijn Peeters <[email protected]>
Date:   Fri Jul 5 11:56:08 2024 +0200

    Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat

commit e30464964262870c54c73f65a3bce630d6576f45
Author: Dale Wahl <[email protected]>
Date:   Fri Jul 5 10:51:53 2024 +0200

    media_upload allow setting for max_form_part and warn users of failure above certain number of files

commit e4f982b4550b352a5d1a131abd78d52e6c196e48
Author: Dale Wahl <[email protected]>
Date:   Fri Jul 5 09:50:49 2024 +0200

    Update media_import help text; looks like failure happens somewhere between 600-1000 files due to Flask request size limits

commit 0b74569280f8f87376a964a6b160ea1993cb3354
Author: Stijn Peeters <[email protected]>
Date:   Thu Jul 4 17:55:36 2024 +0200

    Add razdel as option for Russian tokenisation

commit 9f15a2b8d666c3b6fddeb151b7c424cb44df18a6
Author: Dale Wahl <[email protected]>
Date:   Thu Jul 4 17:13:15 2024 +0200

    remove the log

commit ffcb6a4239075ba190fb534b25b89507e09e5f56
Author: Dale Wahl <[email protected]>
Date:   Thu Jul 4 17:12:43 2024 +0200

    Inform user if too many files are uploaded

    I do not understand why this is appearing. app.config['MAX_CONTENT_LENGTH'] is set to None. Problem persists in Flask alone (i.e., does not appear to be Gunicorn/Nginx/Apache).

commit 9cad12dd6f64a63c48d3b5b304b5c7d9d1a6ddb7
Author: Stijn Peeters <[email protected]>
Date:   Thu Jul 4 15:09:42 2024 +0200

    Bump version

commit aad94f393de77cc9d4f578e1f5be66a3601a4c90
Author: Dale Wahl <[email protected]>
Date:   Thu Jul 4 10:51:01 2024 +0200

    Update setup.py to ensure videohash updates

commit d9154a6f9c46a5c793909b88da751bc71d6f759f
Author: Dale Wahl <[email protected]>
Date:   Tue Jul 2 17:45:26 2024 +0200

    clip: categorizing requires categories...

    seriously, guys?

commit 0af9a5ec49bd2bcfbb87bda33976c65683f68777
Author: Dale Wahl <[email protected]>
Date:   Tue Jul 2 17:31:49 2024 +0200

    blip2: fix no metadata file found (uploads...)

commit d695053f440bd938a57f06adea7b9c732ecf30d7
Author: Dale Wahl <[email protected]>
Date:   Tue Jul 2 17:25:26 2024 +0200

    cat_vis_wall - use str as category type if mixed

    i.e., use floats as string categories

commit bcb914076760ea1fb0e277cdcd1782ffa101b535
Author: Sal Hagen <[email protected]>
Date:   Tue Jul 2 16:06:43 2024 +0200

    Add Twitter author profile pic and banner URLs

commit 1b3b02f826578e8f702ea84a27c8ced7b1fab345
Author: Dale Wahl <[email protected]>
Date:   Tue Jul 2 11:42:50 2024 +0200

    add migrate.py log file in Docker

commit 2aaa972e6888743fc329d721c37fa626cf2eeae3
Author: Dale Wahl <[email protected]>
Date:   Tue Jul 2 11:42:22 2024 +0200

    add necessary pip packages for upgrade in Docker environment; add error logging and save to file for troubleshooting

commit 18b8a53c01b334e0f70610b1305d380b25dbe9c6
Author: Dale Wahl <[email protected]>
Date:   Tue Jul 2 11:41:36 2024 +0200

    update Dockerfile to keep build environment

    useful for interactive upgrade

commit 7b224b9b798c9aaf956b5b618b98d742c4a2e7cd
Author: Dale Wahl <[email protected]>
Date:   Tue Jul 2 11:41:12 2024 +0200

    remove docker-compose.yml versions

commit acf5de0ed02e144b920a80abfdfa35986dd0ed4c
Author: Stijn Peeters <[email protected]>
Date:   Mon Jul 1 17:38:32 2024 +0200

    Better issues.md, footer link

commit 1953ca3895656ca9a12d2657e58019795ae64b3a
Author: Dale Wahl <[email protected]>
Date:   Mon Jul 1 12:00:07 2024 +0200

    FIX: get_key() is more of a creating of a key than general getting of a key...

commit 12289bb5c766d1af23799ff11278b46b48fc2841
Author: Dale Wahl <[email protected]>
Date:   Mon Jul 1 11:37:06 2024 +0200

    .metadata.json may not have top_parent via Media Uploader

    This may exist in other processors if a proper check is not in place; will need to review

commit 25f4ed65ec2c32298a90490cf51037a7ea2d0bf9
Author: Dale Wahl <[email protected]>
Date:   Tue Jun 25 14:43:40 2024 +0200

    Media upload datasource! (#419)

    * basic changes to allow files box

    * basic imports, yay!

    * video_scene_timelines to work on video imports!

    * add is_compatible_with checks to processors that cannot run on new media top_datasets

    * more is_compatible fixes

    * necessary function for checking media_types

    * enable more processors on media datasets

    * consolidate user_input file type

    * detect mimetype from filename

    best I can do without downloading all the files first.

    * handle zip archives; allow log and metadata files

    * do not count metadata or log files in num_files

    * move machine learning processors so they can be imported elsewhere

    * audio_to_text datasource

    * When validating zip file uploads, send list of file attributes instead of the first 128K of the zip file

    * Check type of files in zip when uploading media

    * Skip useless files when uploading media as zip

    * check multiple zip types in JS

    * js !=== python

    * fix media_type for loose file imports; fix extension for audio_to_text preset; fix merge for some processors w/ media_type

    ---------

    Co-authored-by: Stijn Peeters <[email protected]>

commit 4ce689bdc3e441a7adf85883ddcda6bae0525ed9
Author: Stijn Peeters <[email protected]>
Date:   Mon Jun 24 11:58:50 2024 +0200

    Avoid KeyError

commit 155522d0817d19ac7b6b0b0164242156d6f7443a
Author: Dale Wahl <[email protected]>
Date:   Thu Jun 20 15:58:21 2024 +0200

    add generated images to image wall w/ text visual

commit eecde519eab1208eeb6ee53c2d8febff7fb8febf
Author: Dale Wahl <[email protected]>
Date:   Thu Jun 20 15:57:56 2024 +0200

    allow users to NOT generate all images from prompts

commit d0b9574093a109997e63b1062b2bdd8e71300a29
Author: Stijn Peeters <[email protected]>
Date:   Wed Jun 19 16:28:26 2024 +0200

    ...don't mangle URLs in preview links

commit c105e368a521ec54ae717bb9eb2fe9fae66cf6e8
Merge: 0028a999 8d4f99b2
Author: Dale Wahl <[email protected]>
Date:   Wed Jun 19 16:25:36 2024 +0200

    Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat

commit 0028a9994d698611dd8b546b9b3bccbeec30b74f
Author: Dale Wahl <[email protected]>
Date:   Wed Jun 19 16:25:12 2024 +0200

    add followups to processors

commit 8d4f99b22e0308606c7f713ef704dfa939e85247
Author: Stijn Peeters <[email protected]>
Date:   Wed Jun 19 16:17:22 2024 +0200

    More flexible URL linking in CSV preview

commit f4f8e6621bd6f2504dc3afc2078280bf5edb6444
Author: Dale Wahl <[email protected]>
Date:   Wed Jun 19 13:54:00 2024 +0200

    tokeniser fix: use default lang for word_tokenize if language is 'other'

commit 127472e91d8e510f3de2a9cc4a87be6cf2d0deaa
Author: Stijn Peeters <[email protected]>
Date:   Tue Jun 18 16:45:01 2024 +0200

    Better log messages for Telegram data source

commit e8714b6fba72e00c690a8d643d8dc54d2250c94a
Author: Stijn Peeters <[email protected]>
Date:   Mon Jun 17 17:42:21 2024 +0200

    Add 'crawl' feature to Telegram data source

    Fixes #321 (though might need a bit more testing)

commit 25fded7b596097f7916e1793f1841bae2b63d453
Merge: d67cf440 b10e3bb8
Author: sal-phd-desktop <[email protected]>
Date:   Fri Jun 14 16:23:02 2024 +0200

    Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat

commit d67cf440730ea1d4e124c76a4c21d65b56f39c68
Author: sal-phd-desktop <[email protected]>
Date:   Fri Jun 14 16:22:59 2024 +0200

    Fix export 4chan script and remove some unnecessary code

commit b10e3bb8f0c8a67aa5fdbba1962301d8acdf625c
Author: Dale Wahl <[email protected]>
Date:   Thu Jun 13 15:14:06 2024 +0200

    video_hasher prefix: fix extension type

commit ba565cdaa2ebeecf23fd60889d546c76b9ea5eb1
Author: Dale Wahl <[email protected]>
Date:   Thu Jun 13 14:53:13 2024 +0200

    video_hasher: fix to work with Pillow updates; add max amount videos

commit 90da5d231eff6a4249bef5468fcdbf1ebcf9247a
Author: Dale Wahl <[email protected]>
Date:   Thu Jun 13 10:25:24 2024 +0200

    image_cat_wall fix the fix

commit a8b943d8e2c5471f82ea0442e2659d84fe8d9760
Author: Dale Wahl <[email protected]>
Date:   Wed Jun 12 13:29:41 2024 +0200

    add OCR processor to image w/ text visualization

commit e7e636b6b89b6163fa6976e67edba68e7d75b7ac
Author: Dale Wahl <[email protected]>
Date:   Tue Jun 11 15:23:12 2024 +0200

    add image_wall_w_text to follow on BLIP captions

commit f74b97827f0465baf8483040471a77e4654e70b1
Author: Dale Wahl <[email protected]>
Date:   Thu Jun 6 11:05:25 2024 +0200

    image_category_wall: allow multiple images per item/post

commit e3c9ea57d46b32ba47b00a6047a278ddd530adc1
Author: Dale Wahl <[email protected]>
Date:   Thu May 30 16:27:50 2024 +0200

    image_category_wall convert None to str for category

commit 00874576c354235f4655f1d433ec4382010e18e3
Author: Dale Wahl <[email protected]>
Date:   Thu May 30 14:54:51 2024 +0200

    image_category_wall fix float categories

commit e0c55a8ae132bedef5da27ecbbb9489a094d454c
Author: Dale Wahl <[email protected]>
Date:   Thu May 30 12:51:42 2024 +0200

    download_images fix divide by zero when user can download all

commit 3580fc9450501262badb8e61ef4b4df4b4c54322
Author: Dale Wahl <[email protected]>
Date:   Thu May 30 12:51:24 2024 +0200

    image_category_wall remove 'max' when user can use all images

commit f2145bdeff1d68e46cdd3521ecbb61573f01a2f2
Author: Dale Wahl <[email protected]>
Date:   Wed May 29 17:59:23 2024 +0200

    rank_attributes: option to count missing data or blanks

commit 01e7ab9677a75181bbedc62fa00e636ce2b17c18
Author: Dale Wahl <[email protected]>
Date:   Wed May 29 16:53:57 2024 +0200

    fix missing field strategy so default_stategy not overwritten on second loop

    default_stategy would be set correctly to the callable, but overwritten on second loop (and map_missing is a dictionary at that point).

commit 097f838af1f5f2748578dd9072eb9e3a8b3a7057
Author: Dale Wahl <[email protected]>
Date:   Tue May 28 12:16:08 2024 +0200

    add log_level arg to 4cat-daemon.py

    I've been using this forever and don't know why I haven't committed it

commit fd3ac238e60f052889d99c71588170570a384900
Author: Dale Wahl <[email protected]>
Date:   Tue May 28 10:10:56 2024 +0200

    google & clarifai to csv had identical "type"

    possibly caused issue w/ preset

commit 1b9965d40aa33035a73f685c13a1ab50cc877f78
Author: Stijn Peeters <[email protected]>
Date:   Mon May 27 15:54:20 2024 +0200

    Ensure file cleanup worker always exists

commit 0e0917f2232e240df3412fd4df51cf0be19248b5
Author: Stijn Peeters <[email protected]>
Date:   Thu May 23 17:36:22 2024 +0200

    Also update Spacy model versions...

commit f40128213529d154cfb77afa7aa67a72d5bb640f
Author: Stijn Peeters <[email protected]>
Date:   Thu May 23 17:32:35 2024 +0200

    *Actually* remove typing_extensions dependency

    ???

commit ba3d83b824c5fb6fcb0aec5e1c36b35070d6e5d9
Author: Stijn Peeters <[email protected]>
Date:   Thu May 23 17:30:08 2024 +0200

    Update minimum Pillow dependency version

commit 1c3485648bf2a911052eeeae4f293f303a944aec
Author: Stijn Peeters <[email protected]>
Date:   Thu May 23 17:27:27 2024 +0200

    Do not require typing_extensions explicitly

    This was required to ensure Spacy could load - looks like Spacy has since been updated to work with newer versions of typing_extensions as well

commit 3828de83ba123254463a904392f24daec626c136
Author: Stijn Peeters <[email protected]>
Date:   Thu May 23 17:02:04 2024 +0200

    Bump version

commit 8f0d098107a4bbc9d55cc6048f7a38f1d1891a32
Author: Stijn Peeters <[email protected]>
Date:   Thu May 23 17:01:28 2024 +0200

    Require non-broken version of emoji library

commit 4b2ad805fcc99a83e46732fc991d98d78ef06c6c
Author: Stijn Peeters <[email protected]>
Date:   Thu May 23 13:11:03 2024 +0200

    Show worker progress in control panel if available

commit 9144d4503f46108437616d6bc0cf4fde74df3aca
Author: Stijn Peeters <[email protected]>
Date:   Thu May 23 11:07:41 2024 +0200

    Bump version

commit 807ab77101d197ec897640480a2140439d570c05
Author: Stijn Peeters <[email protected]>
Date:   Wed May 22 21:57:11 2024 +0200

    Fix Instagram upload with missing media URL

commit d0b4840fd465b6d21657c3d50f9291ac911b6082
Author: Stijn Peeters <[email protected]>
Date:   Wed May 22 17:35:04 2024 +0200

    Comma comma comma

commit 7fd2e14c9505d0ed1ac77dc09c24f766ea61ee6c
Author: Stijn Peeters <[email protected]>
Date:   Wed May 22 17:25:26 2024 +0200

    Fix progress indicator for scene extractor

commit 661c42c2d083da7004335b0e14910935c3d392f6
Author: Stijn Peeters <[email protected]>
Date:   Wed May 22 17:12:21 2024 +0200

    Don't crash video hasher on non-str item IDs

commit 1f280321cdde27a9909885fa2f64dbeffa549fb1
Author: Stijn Peeters <[email protected]>
Date:   Wed May 22 17:09:53 2024 +0200

    Do not crash timelines processor when metadata has unexpected format

commit 572d03f1f368f0ad5f47e705a119b37646148d1d
Author: Stijn Peeters <[email protected]>
Date:   Wed May 22 17:09:30 2024 +0200

    More efficient video frame extractor

commit 1b51d224ca544d7e2913238adbff2049412bc41e
Author: Stijn Peeters <[email protected]>
Date:   Wed May 22 17:04:27 2024 +0200

    Fix crash in video stack processor with ffmpeg < 5.1

commit ddc73cb2e2f0985e64f84ca86bc167fa9e9dc81a
Author: Stijn Peeters <[email protected]>
Date:   Wed May 22 17:03:48 2024 +0200

    Helper function for determining ffmpeg version

commit ef9dd482b2258c428584997dc661156f63f68b91
Author: Stijn Peeters <[email protected]>
Date:   Wed May 22 12:14:58 2024 +0200

    Allow absence of articleComponent in LinkedIn posts

commit 060f2cd7f922e7fae337b0697f7c477442d21ef1
Author: Stijn Peeters <[email protected]>
Date:   Wed May 22 12:12:54 2024 +0200

    Cast post IDs to string when mapping video scenes

commit ab34c415c9ada23763b45676639ce3e80a34f594
Author: Stijn Peeters <[email protected]>
Date:   Wed May 22 11:46:39 2024 +0200

    Twitter -> X/Twitter

commit de6d97554ccb68375979e5ff09c7e65d8d70a6cd
Author: Stijn Peeters <[email protected]>
Date:   Wed May 22 11:45:19 2024 +0200

    Colleges -> Collages

commit 30365580dc59b4d95e8a62d1b3c666bef60ce7e8
Author: Stijn Peeters <[email protected]>
Date:   Tue May 21 15:41:55 2024 +0200

    Explicit disconnect after Telegram image download

commit 5727ff7230db42463a824f45d63f0b8343caac14
Author: Stijn Peeters <[email protected]>
Date:   Tue May 21 14:05:50 2024 +0200

    Catch TimedOutError while downloading Telegram images

commit e0e06686e78976f971aac620267d7e009eaaadff
Author: Sal Hagen <[email protected]>
Date:   Mon May 13 13:01:42 2024 +0200

    Typo in LinkedIn search

commit 51e58dde6ca21278a80f252a8c22dc83d87ace1f
Author: Dale Wahl <[email protected]>
Date:   Tue May 7 13:10:43 2024 +0200

    text_from_image: fix metadata missing (indent issue)

commit c1f8ecc1674375bba2b2e38cb29c9d4d44098f0a
Author: Dale Wahl <[email protected]>
Date:   Tue May 7 09:45:25 2024 +0200

    text_from_image fix: ensure metadata success before attempting to update original

commit 72dbf80db71499c59133e1128205b756d240b300
Merge: d7561625 baacc86b
Author: Stijn Peeters <[email protected]>
Date:   Fri May 3 13:14:08 2024 +0200

    Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat

commit d7561625b127573fbb0332fbb713be6a3cb3d953
Author: Stijn Peeters <[email protected]>
Date:   Fri May 3 13:14:03 2024 +0200

    Comments without replies don't always have reply_comment_total

commit baacc86b269612b4b0956345f8b9fa902df1b61f
Author: Dale Wahl <[email protected]>
Date:   Fri May 3 12:01:22 2024 +0200

    DSM fix and simplify GPU mem check

commit 9b662e9f9b4f4ce194608c8e20a8fc50bc6d9ae3
Author: Parker-Kasiewicz <[email protected]>
Date:   Thu May 2 00:53:45 2024 -0700

    Adding Gab as a Data Source! (#401)

    * Can successfully import gab data, although
    can't tell if formatting is right because
    waiting on queued requests.

    * Version w/ different item types

    * Ingest Gab posts from Zeeschuimer

    * Small fix for merge conflicts (whoops)

    * Gab processing logic transferred from Zeeschuimer

    * fixing small errors for Gab data source

    * basic processing for truth social from Zeeschuimer

    ---------

    Co-authored-by: Dale Wahl <[email protected]>

commit 3ecb8fd9c27aee4c457f03516794c6c4eac19c09
Author: Stijn Peeters <[email protected]>
Date:   Wed May 1 17:51:36 2024 +0200

    Fix duplicate line in views_admin.py

commit 8b66ae7e467913f8e7571cf4b45493f63804266f
Author: Stijn Peeters <[email protected]>
Date:   Wed May 1 17:49:54 2024 +0200

    Allow processors to define which fields should be pseudonymised

commit c973750c8cabb8698704c5997903e92d1de866d2
Author: Stijn Peeters <[email protected]>
Date:   Wed May 1 17:15:32 2024 +0200

    Allow auto-queue of pseudonymisation after import

commit 49ad9f0ff785fd44ae494755b785c7fdf7c9cf15
Author: Stijn Peeters <[email protected]>
Date:   Wed May 1 17:08:35 2024 +0200

    Get rid of redundant and buggy next/copy_to implementation in Search class

commit 106d3659e2fda89867d3a4f587c1c1addfaff2f7
Author: Dale Wahl <[email protected]>
Date:   Wed May 1 16:14:03 2024 +0200

    use current branch in settings

commit 60bef4157d807f7c01ef3b425295244e91919f31
Author: Stijn Peeters <[email protected]>
Date:   Wed May 1 11:04:07 2024 +0200

    Nicer code

commit 4182c436e4fb5109c5e041dc729f77a58d877889
Author: Stijn Peeters <[email protected]>
Date:   Tue Apr 30 16:19:36 2024 +0200

    Always shut down API worker only after everything else has been shut down

commit e685108b3cbe5f005ce2df21906267071ad8118e
Author: Stijn Peeters <[email protected]>
Date:   Tue Apr 30 16:12:42 2024 +0200

    Properly interrupt expiration worker when asked

commit 27a568eca7f2f3742223fef6285eaf80583e0fc4
Author: Stijn Peeters <[email protected]>
Date:   Tue Apr 30 13:40:50 2024 +0200

    Allow floats-as-strings as timestamps when importing CSV

commit 2d2bbb9fdb9b426b8f4a80782f04257721a97f2e
Author: Dale Wahl <[email protected]>
Date:   Tue Apr 30 13:05:07 2024 +0200

    douyin: add consistency to map_item stats

commit 289aa342c9912aceeca35887c079c72aa6ffbf52
Author: Dale Wahl <[email protected]>
Date:   Mon Apr 29 15:26:38 2024 +0200

    fix collection data in Douyin to handle $undefined

commit 5b9b23fb1696bc1b69e1d902c0a2ad4b7d168984
Author: Dale Wahl <[email protected]>
Date:   Mon Apr 29 13:00:03 2024 +0200

    add scipy requirement to make compatible with gensim

    https://stackoverflow.com/questions/78279136/importerror-cannot-import-name-triu-from-scipy-linalg-gensim

commit 7eab746e944f1ababe3dcd6a5d25387a64c2237d
Author: Stijn Peeters <[email protected]>
Date:   Mon Apr 29 12:00:09 2024 +0200

    stupid, stupid, stupid

commit 90577982ac05019a7ac76818a62f91e84dd65902
Author: Stijn Peeters <[email protected]>
Date:   Mon Apr 29 11:56:22 2024 +0200

    Fix leftover iterate_mapped_items

commit 57dbdf74c49c34c05784debb9f7e258da7ae7d54
Author: Stijn Peeters <[email protected]>
Date:   Fri Apr 26 15:26:39 2024 +0200

    Woops

commit f11760d2c13e817e23cfa5e26b24f74cf817f65e
Author: Stijn Peeters <[email protected]>
Date:   Fri Apr 26 15:26:04 2024 +0200

    Update list of supported platforms in readme

commit 760ff1cdeb006f70acaa00ded82fb3cbc7617c9d
Author: Stijn Peeters <[email protected]>
Date:   Fri Apr 26 12:13:28 2024 +0200

    Bump version

commit 1fd78b2362840299e80f5540c9fedc1be3b06da1
Author: Stijn Peeters <[email protected]>
Date:   Thu Apr 25 12:58:24 2024 +0200

    Use MissingMappedField for Douyin fields undefined in the source data

commit 6918baeabc7a08b6a63495c5d38c86b2c88bca44
Author: Stijn Peeters <[email protected]>
Date:   Thu Apr 25 12:31:11 2024 +0200

    Fix Douyin mapping failure if cellRoom is $undefined

commit aad6208167c07686348234daff4dcf9cd036f5a5
Author: Stijn Peeters <[email protected]>
Date:   Thu Apr 25 12:30:53 2024 +0200

    Better error when trying to import data for unknown datasource

commit 43c6ed646994111188bde66d5bcfe4ab602e8512
Author: Stijn Peeters <[email protected]>
Date:   Thu Apr 25 12:30:31 2024 +0200

    Fix Twitter mapping on URLs that cannot be expanded

commit 91c3da176fad90ba16871fa8892fac5a0df13785
Author: Stijn Peeters <[email protected]>
Date:   Thu Apr 25 12:12:54 2024 +0200

    Safe cast to int in CrowdTangle import

commit 765f29e9232afdf284ab1667b0f371951e0bf2f4
Author: Stijn Peeters <[email protected]>
Date:   Wed Apr 24 12:37:02 2024 +0200

    Fix erroneous shell command in front-end restart trigger

commit c99fdd9eca8f5925d93375cac846e8b7633194fb
Merge: 342a4037 bc1deddf
Author: Stijn Peeters <[email protected]>
Date:   Tue Apr 23 12:29:35 2024 +0200

    Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat

commit 342a4037411e7ccaa50b25a4686434bec39e2568
Author: Stijn Peeters <[email protected]>
Date:   Tue Apr 23 12:29:32 2024 +0200

    Enable TikTok comment and Gab import by default

commit bc1deddf57aa5049fb79622c4309fb7051d77bdb
Merge: 537d7645 3c644f01
Author: Dale Wahl <[email protected]>
Date:   Tue Apr 23 12:16:37 2024 +0200

    Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat

commit 537d76456e2826e8c4dd7026ec5b2d436370fad8
Author: Dale Wahl <[email protected]>
Date:   Tue Apr 23 12:14:46 2024 +0200

    do the todo: fix column_filter to match exact/contains with int

commit 3c644f01baeca34e712d36efdf5c77ccd3ef7a06
Author: Stijn Peeters <[email protected]>
Date:   Tue Apr 23 11:16:07 2024 +0200

    Don't crash on empty URLs in dataset merge

commit f1574c26e2e3bdc40cc04bb8193cf6d3fa14792b
Author: Dale Wahl <[email protected]>
Date:   Thu Apr 18 12:08:55 2024 +0200

    fix: do not fail when no processor exists

    weird! failed on a dataset `type="custom-search"` which was created by an import script w/ no processor. Also likely would make deprecated processors fail.
    500 server error:
    ```
    File "/opt/4cat/common/lib/dataset.py", line 800, in get_columns
         return self.get_item_keys(processor=self.get_own_processor())
       File "/opt/4cat/common/lib/dataset.py", line 405, in get_item_keys
         keys = list(items.__next__().keys())
       File "/opt/4cat/common/lib/dataset.py", line 337, in iterate_items
         if own_processor.map_item_method_available(dataset=self):
     AttributeError: 'NoneType' object has no attribute 'map_item_method_available'
    ```
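
The traceback above comes from calling a method on a processor that is `None`. A minimal sketch of the kind of guard that avoids this is shown below. `FakeDataset` and `FakeProcessor` are hypothetical stand-ins written for illustration; they are not 4CAT's actual classes, and the real fix may differ.

```python
class FakeProcessor:
    """Hypothetical stand-in for a processor class that can map items."""

    @staticmethod
    def map_item_method_available(dataset):
        return True

    @staticmethod
    def map_item(item):
        # Illustrative mapping only: lowercase all keys
        return {key.lower(): value for key, value in item.items()}


class FakeDataset:
    """Hypothetical stand-in for a dataset; not 4CAT's DataSet class."""

    def __init__(self, items, processor=None):
        self.items = items
        self.processor = processor  # None when no processor exists for this type

    def get_own_processor(self):
        return self.processor

    def iterate_items(self):
        own_processor = self.get_own_processor()
        # Check for None before calling any method on the processor
        can_map = (
            own_processor is not None
            and own_processor.map_item_method_available(dataset=self)
        )
        for item in self.items:
            yield own_processor.map_item(item) if can_map else item


# A dataset without a processor (e.g. created by an import script)
orphan = FakeDataset([{"ID": 1}])
# A dataset whose processor provides an item mapper
mapped = FakeDataset([{"ID": 1}], FakeProcessor)

orphan_items = list(orphan.iterate_items())
mapped_items = list(mapped.iterate_items())
```

With the guard in place, the dataset without a processor yields its items unmapped instead of raising `AttributeError`.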

commit 50a4434a37d71af6a9470c7fc4a236b043cbfb4d
Author: Stijn Peeters <[email protected]>
Date:   Wed Apr 17 14:30:58 2024 +0200

    Add "TikTok comments" data source

commit c43e76daae3c2e6ecdb218ee749315b985eccca4
Author: Stijn Peeters <[email protected]>
Date:   Tue Apr 16 17:59:25 2024 +0200

    Allow notifications per tag

commit 36984104e674e8577756bfc3fdd5c72f6569d9e1
Author: Dale Wahl <[email protected]>
Date:   Tue Apr 16 17:25:38 2024 +0200

    fix: pass dataset to get_options when queuing processors

commit 59cb19a3c88f7f4a4ac02d0b7a891afde50ea069
Author: Dale Wahl <[email protected]>
Date:   Tue Apr 16 10:55:29 2024 +0200

    fix: dicts are shared in classes & you cannot delete a key more than once

    randomly found this; probably as no one else has reddit enabled!

commit 3ec9c6ea471bcdbe9fb1caad1e5fe1502a705444
Author: Dale Wahl <[email protected]>
Date:   Mon Apr 15 13:22:19 2024 +0200

    fix results page error when dataset was being created; do not check for resultspage updates when user not focused on page

commit db05ae5e565248e865e67b8ea60e6653357bb1f4
Author: Dale Wahl <[email protected]>
Date:   Mon Apr 15 11:27:33 2024 +0200

    on import file, differentiate between missing field(s) and unable to map item

commit 940bac72c7e53bec9e136867c13e2a0a355961a4
Author: Stijn Peeters <[email protected]>
Date:   Fri Apr 12 12:57:48 2024 +0200

    Case-insensitive username/note matching in user list

commit d0f34245bd07b5ad2fd3e90754ef0264ffc350a9
Author: Stijn Peeters <[email protected]>
Date:   Fri Apr 12 12:29:12 2024 +0200

    Only determine settings tab name in one place

commit 9f69d7bc0bbb657be1e725d5fb3fe350b7205bff
Author: Stijn Peeters <[email protected]>
Date:   Fri Apr 12 12:20:34 2024 +0200

    git != github

commit 9b4981d8c7358f31ed65d9f161d556e578389801
Author: Stijn Peeters <[email protected]>
Date:   Fri Apr 12 11:56:04 2024 +0200

    Fix issues with user tags

    Fix number of users in tag overview; allow filtering by user tags on user list; don't delete all user tags when deleting one

commit 9e8ccd3a78765acdfd2005eaa215dc0dc07266e0
Author: Stijn Peeters <[email protected]>
Date:   Fri Apr 12 11:32:45 2024 +0200

    Do not hide all non-hidden child processors

    lol

commit 3f15410af3a278f5644f41f49e25498a1fac3c76
Author: Stijn Peeters <[email protected]>
Date:   Fri Apr 12 11:23:52 2024 +0200

    Disable standard video downloader for Telegram

commit 94c814b9cab2ae2be10d5c5d3f6cfe20898e349c
Author: Stijn Peeters <[email protected]>
Date:   Fri Apr 12 11:14:16 2024 +0200

    Telegram video downloader processor

commit d36254a188947fff507e8df59f793e98b3be1570
Author: Stijn Peeters <[email protected]>
Date:   Fri Apr 12 11:14:04 2024 +0200

    Better styling for 4CAT settings, alphabetic order, submenus

commit 808300fa109f306a921f2048b2cf4b6dafc4ba5f
Author: Stijn Peeters <[email protected]>
Date:   Thu Apr 11 14:44:32 2024 +0200

    Fix multiselect in UI

commit 131a0eca0ad514b1ee57803e5c560ab0e56de42d
Author: Stijn Peeters <[email protected]>
Date:   Mon Apr 8 18:28:04 2024 +0200

    Do not attempt to load crashed file as module in Slack webhook. Fixes #422 (hopefully)

commit 6d8cb067bc12f8be68749f74a7291e0849494225
Author: Stijn Peeters <[email protected]>
Date:   Fri Apr 5 19:43:58 2024 +0200

    Allow comma-separated list when adding new dataset owners

commit 2612aea49f63c37ac691cc89c553c764ead2344f
Author: Stijn Peeters <[email protected]>
Date:   Fri Apr 5 19:40:04 2024 +0200

    Include number of users with tag on tag page

commit 39f2ec40faa3b8493bd5525279aeaeb2e4f586e0
Author: Stijn Peeters <[email protected]>
Date:   Fri Apr 5 19:26:02 2024 +0200

    Fix confirmation before deleting user tag

commit b00a410a3441e7f2a9d73a9f2dfb0f4ef70ea8a5
Author: Stijn Peeters <[email protected]>
Date:   Fri Apr 5 19:25:01 2024 +0200

    Add link to users with tag on tag admin page

commit 3ef3e5ec9adbd8ddd128ce2b3f8fa3b1de1297e3
Author: Stijn Peeters <[email protected]>
Date:   Fri Apr 5 18:49:25 2024 +0200

    Give filtered datasets a more sensible label, based on source dataset

commit 0d5870b78fb73cb58231736cc8a2efbb0b3cd88a
Author: Dale Wahl <[email protected]>
Date:   Fri Apr 5 17:40:57 2024 +0200

    update iterate methods (#418)

    * working to make iterate_mapped_item primary method used by processors and elsewhere in 4CAT; iterate_item method only internally (and provide item directly as is from file) with iterate_mapped_object as intermediate method to use map_missing method and handle missing values as well as warn if needed

    * switch from iterate_items to iterate_mapped_items; careful attention to item_to_yield allowing a choice of the original item, the mapped item, or both

    * revert some unnecessary renaming

    * fix annotations bug...

    this fixes the bug, but i noticed that the notations saved in the database do not have the correct post IDs.

    * Introduce DatasetItem class and simplify iterate_items

    * Don't crash when no item mapper

    * ...actually commit the DatasetItem class

    * Fix typos in comment

    ---------

    Co-authored-by: Stijn Peeters <[email protected]>
    Co-authored-by: Sal Hagen <[email protected]>

commit 17b77351c51ace21b7057276bbae9da2643a3fc4
Author: Stijn Peeters <[email protected]>
Date:   Fri Apr 5 16:20:19 2024 +0200

    Allow dynamic form options in processors (#397)

    * Allow dynamic form options in processors

    * Allow 'requires' on data source options as well

    * Handle list values with requires

    * Wider support for file upload in processors

    * Log file uploads in DMI service manager

    * fix error w/ datasources having file option

    * fix fourcat.js use of checkboxes for dynamic settings

    * Fix faulty toggleButton targeting

    ---------

    Co-authored-by: Dale Wahl <[email protected]>

commit 693fcedc93ee4476a60d0e0876e688f82a8526fa
Author: Dale Wahl <[email protected]>
Date:   Fri Apr 5 15:59:10 2024 +0200

    Add method to processors to toggle display in UI (#411)

    * add ui_only parameter to DataSet.get_available_processors() and BasicProcessor.display_in_ui()

    Allow using `display_in_ui` to hide processors from UI but allow them to be queued either via API or presets. This avoids issue of is_compatible_with() having to be used to hide processors with sometimes ill effects.

    * keep same data structure....

    * don't delete twice; it's redundant... and raises an error

    * Rename arguments/properties

    * Exclude hidden processors in top level view

    * fix logic

    * Exclude in child template as well

    ---------

    Co-authored-by: Stijn Peeters <[email protected]>

commit 3cd146c2908da6b3a06a0c1511bf042c4223af0f
Author: Dale Wahl <[email protected]>
Date:   Thu Apr 4 16:41:39 2024 +0200

    fix: whoops remove debug

commit daa7291e813e62fed4600a4acb8430004836cb86
Author: Dale Wahl <[email protected]>
Date:   Thu Apr 4 15:16:30 2024 +0200

    CSV preview add hyperlinks if "url" or "link" in column header

commit 5f2d6e65bad4f71b2c3cc75d2cdab76f15671d4c
Author: Dale Wahl <[email protected]>
Date:   Thu Apr 4 15:16:01 2024 +0200

    blip2 processor to work w/ DMI Service Manager

commit fe881dec18778d99ac4a0f60ca40a1f43fdb1689
Author: Dale Wahl <[email protected]>
Date:   Thu Apr 4 09:53:30 2024 +0200

    catch AttributeError on slackhook if unable to read file

    ever vigilant against a lack of flavour...

commit 2808256b1fabf2e6e8a5a94aad98af60c50fb7b0
Merge: 14123847 eb474640
Author: Dale Wahl <[email protected]>
Date:   Wed Apr 3 17:28:40 2024 +0200

    Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat

commit 14123847b5852bf0e7c84fced6c2380165ec93f6
Author: Dale Wahl <[email protected]>
Date:   Wed Apr 3 17:28:38 2024 +0200

    staging_areas should not be made for completed datasets (else they may be deleted prematurely)

commit eb474640559ee3e914d9c95adb60be09b906f1d6
Merge: bbdf2ab9 3f8b285c
Author: sal-phd-desktop <[email protected]>
Date:   Wed Apr 3 16:50:54 2024 +0200

    Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat

commit bbdf2ab9b4292c14911ac01b481c829defa85e5c
Author: sal-phd-desktop <[email protected]>
Date:   Wed Apr 3 16:50:36 2024 +0200

    Helper script to export the 'classic' 4CAT 4chan data

commit 3f8b285c44c33a3ce08e885889b311bc454a70ea
Merge: 8f40f3f5 f7cc5b8d
Author: Sal Hagen <[email protected]>
Date:   Wed Apr 3 12:12:17 2024 +0200

    Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat

commit 8f40f3f5222a63e93f46eb3b57791d10060a0cc8
Author: Sal Hagen <[email protected]>
Date:   Wed Apr 3 12:12:13 2024 +0200

    Tumblr search typo

commit f7cc5b8d012dec3d8e0c8847ae16c662e82040b5
Author: Stijn Peeters <[email protected]>
Date:   Tue Apr 2 12:32:51 2024 +0200

    More/less flavour in restart worker

commit 073587efc581adca0608988573ac83ea8b0c93d0
Author: Dale Wahl <[email protected]>
Date:   Wed Mar 27 14:15:27 2024 +0100

    create favicon.ico (remove from repo)

    be sure to keep webtool/static/img/favicon/favicon-bw.ico as basis

commit 28d733d56204231f4089660ff61282174aac7aed
Author: Dale Wahl <[email protected]>
Date:   Wed Mar 27 09:44:45 2024 +0100

    add allow_access_request check to request-password page

    clicking it would only return the user to the login page anyway, but better not even show it

commit 1f2cb77e3cb0fc9b5403da52aaa925b33089d18f
Author: Dale Wahl <[email protected]>
Date:   Wed Mar 27 09:37:51 2024 +0100

    fix can_request_access to use 4cat.allow_access_request option

commit 0d66f11d3619af798d5acc41dbf4fe118b7ddad8
Merge: 25825383 05b3fc07
Author: Stijn Peeters <[email protected]>
Date:   Tue Mar 26 17:54:48 2024 +0100

    Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat

commit 2582538303e31470ed6bf8a01645f7b45af15e5d
Author: Stijn Peeters <[email protected]>
Date:   Tue Mar 26 17:54:45 2024 +0100

    More permissive timeout for pixplot

commit 05b3fc0771ded10dc55db799e8f47e42add08d43
Author: Dale Wahl <[email protected]>
Date:   Tue Mar 26 14:01:59 2024 +0100

    remove redundant call of Path

commit e4a93442efb84d73d6a4c9af9bc46a8f3e3fdda2
Author: Stijn Peeters <[email protected]>
Date:   Tue Mar 26 11:52:09 2024 +0100

    Include column with link description in Telegram mapping

commit 876f4a4b6df51ec4b30a048c32191438b6778f90
Author: Dale Wahl <[email protected]>
Date:   Mon Mar 25 14:48:47 2024 +0100

    douyin handle image posts

commit 81ad61baabaf965b1c848f55a80c23bd3e1a9000
Author: Stijn Peeters <[email protected]>
Date:   Mon Mar 25 08:01:44 2024 +0100

    Accept non-numeric IDs in Telegram image downloader

commit a8b36dc5682df7c16e25474ea8fdbfc4f12f9d46
Author: Stijn Peeters <[email protected]>
Date:   Sun Mar 24 23:15:51 2024 +0100

    Ensure unique IDs for Telegram datasets

commit 4a3e9ffee072c4d3efb7bfd8744369b46f19eef2
Merge: 0c119130 d749237e
Author: Stijn Peeters <[email protected]>
Date:   Sun Mar 24 22:56:59 2024 +0100

    Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat

commit 0c11913049aabb5a83ffe26d58bdf17affdbc0b9
Author: Stijn Peeters <[email protected]>
Date:   Sun Mar 24 20:09:10 2024 +0100

    Better string formatting in Telegram image downloader

commit 8a7da5317defdafb5bdbf74dcbeb68e464fa21f4
Author: Stijn Peeters <[email protected]>
Date:   Sun Mar 24 20:06:06 2024 +0100

    Add 'link thumbnails' option to Telegram image downloader

commit a0baae17d8f11e4cae7cc261f8d406b1b1ce628a
Author: Stijn Peeters <[email protected]>
Date:   Sun Mar 24 20:05:24 2024 +0100

    Add 'Fetch URL metadata' processor

commit b9a0668f35c6d1fc5bfb42e1ae706418cbe6e0a7
Author: Stijn Peeters <[email protected]>
Date:   Sun Mar 24 20:05:15 2024 +0100

    Update ural dependency

commit a28036186f5d35e435cade7638ed35361054967e
Author: Stijn Peeters <[email protected]>
Date:   Sun Mar 24 20:05:08 2024 +0100

    Add emoji library dependency

commit bb50fc946fb6cdd8454969514bdc6d5ecf3f3530
Author: Stijn Peeters <[email protected]>
Date:   Sun Mar 24 20:04:59 2024 +0100

    Add 'emoji' option to Count Values processor

commit e653e3d8fb9c01697d96316df6f7634454671191
Author: Stijn Peeters <[email protected]>
Date:   Sun Mar 24 20:04:42 2024 +0100

    Add 'forwards', 'reactions', 'link_title', 'link_attached' columns to mapped Telegram items

commit d749237ec5c103b286ba8086904e405e232fc14c
Author: Dale Wahl <[email protected]>
Date:   Fri Mar 22 11:02:14 2024 +0100

    telegram: sp too?

    this is why i test locally first...

commit 9d7d27c61425bbbbccd18a8e3de35ab372dbfbf3
Author: Dale Wahl <[email protected]>
Date:   Fri Mar 22 10:58:48 2024 +0100

    telegram: missed reference to options

commit c1671ce0ef69c71c81c3ae69a59e4ad7dc1bda79
Author: Dale Wahl <[email protected]>
Date:   Fri Mar 22 10:49:02 2024 +0100

    telegram fix: class dictionaries are shared between all workers

    admin calls get_options and `del options["max_posts"]["max"]` runs, then normal user calls get_options and there is no longer max. could also copy cls.options, but not sure why we cannot create the options in `get_options`.

commit cd2e74d251491a93bc66dc7a64e8b2a60b0ed8ae
Author: Stijn Peeters <[email protected]>
Date:   Wed Mar 20 11:10:30 2024 +0100

    Make Telegram max entities a setting

commit 38fcabb81da956e5513bd0246ee086d1ab4896c9
Author: Stijn Peeters <[email protected]>
Date:   Fri Mar 15 18:47:59 2024 +0100

    Make metrics table use BIGINT

    Folder size may not fit otherwise!

commit 34013cb91eed7fac725defd408b67bddee4b806b
Author: Stijn Peeters <[email protected]>
Date:   Fri Mar 15 18:37:10 2024 +0100

    Fix duplicate stats in metrics table

commit c8ad90b3436cff600320d3b2efdf6144240ea59d
Author: Stijn Peeters <[email protected]>
Date:   Fri Mar 15 18:14:39 2024 +0100

    Calculate disk use stats via worker instead of on demand

commit e4e0c4e3a375bf14bdca7b633231b60e34c322e0
Author: Stijn Peeters <[email protected]>
Date:   Thu Mar 14 10:25:23 2024 +0100

    Spelling thing

commit ae1c00fb3a521a2c3258b2597b04322d202c3ee7
Author: Stijn Peeters <[email protected]>
Date:   Thu Mar 14 10:25:10 2024 +0100

    Disable direct editing of tag order

commit e3ce81452ad8ee3231309383c24fb26e553b0dff
Merge: fa3be93b a7b5820c
Author: Dale Wahl <[email protected]>
Date:   Wed Mar 13 16:25:46 2024 +0100

    Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat

commit fa3be93bafef17e95881207604efa1212d562d9e
Author: Dale Wahl <[email protected]>
Date:   Wed Mar 13 16:25:43 2024 +0100

    instagram: check both user and owner for full_name

commit a7b5820c9f2acb5081ef80ea0293f42ee91925a3
Author: Dale Wahl <[email protected]>
Date:   Tue Mar 12 15:59:43 2024 +0100

    proposed fix to results filter (#417)

    * proposed fix to results filter

    * do not filter datasources at all for results/ view

commit b930b6e964b460ef5160398c6cd1038f766b0548
Author: Dale Wahl <[email protected]>
Date:   Mon Mar 11 12:00:12 2024 +0100

    remove unused code

    the `can_preview` attribute does not appear to exist so this is always hidden

commit 97cd2d52966bd751da704a4a06cfa5478f999885
Author: Dale Wahl <[email protected]>
Date:   Mon Mar 11 11:51:28 2024 +0100

    faster collection of folder size for admin panel

    (was between five and six times faster in my tests with around 11G of data files)

commit 108fd28b594a95b94727ccc601fec59da61a8d3d
Author: Dale Wahl <[email protected]>
Date:   Thu Mar 7 11:09:33 2024 +0100

    typo fixes, log fix

commit 44848a8f4b9fea07e7f9ce03e4fe0d696d5f1d27
Author: Dale Wahl <[email protected]>
Date:   Wed Mar 6 10:17:34 2024 +0100

    fix tf_idf - sometimes less results than max

commit e5f1f703247a5763d3d0e03c44ee31ab60b8a8ed
Author: Dale Wahl <[email protected]>
Date:   Wed Mar 6 09:33:21 2024 +0100

    fix image downloader failing on 4chan images

    we do not often rename datasources, but when we do...

commit f5e50d508096729bccdc0dafa460f83c419c2606
Author: Stijn Peeters <[email protected]>
Date:   Tue Mar 5 16:23:34 2024 +0100

    Version 1.39 -> 1.40

commit 4b3e4efa25914f5f9509f69596a82935440e5f9f
Author: Stijn Peeters <[email protected]>
Date:   Mon Feb 26 18:28:15 2024 +0100

    Add 'safe' parameter to get_item_data

commit b98f62ab6a3a21815cc0fa899cdca1d48eab0fdb
Author: Stijn Peeters <[email protected]>
Date:   Mon Feb 26 18:27:57 2024 +0100

    Use iterate_mapped_items in dataset view

commit 6d9baa9c228168dce7fe946681c95d471d45c6e0
Author: Stijn Peeters <[email protected]>
Date:   Mon Feb 26 18:27:33 2024 +0100

    Update TikTok downloader for new item mapper

commit 1622ec660754582eb2791f0d114df76e71640370
Author: Dale Wahl <[email protected]>
Date:   Mon Feb 26 12:51:31 2024 +0100

    flawless was removed from dataset class, but used by telegram

    adding back to fix telegram, but perhaps it should be changed

commit 84168e945e2ecf963cfdac3409d60544b521f694
Author: Dale Wahl <[email protected]>
Date:   Wed Feb 21 15:56:24 2024 +0100

    webtool checks for gunicorn and if exists sets up error log

    this normally only ran in Docker

commit 7119862feac1e9993b8dedccc59887830e7715a1
Author: Stijn Peeters <[email protected]>
Date:   Tue Feb 20 18:36:21 2024 +0100

    Use MappedItem in ML processors

commit 32b8790420af8572f4a3db2d2bc8ffd696872114
Author: Stijn Peeters <[email protected]>
Date:   Tue Feb 20 16:58:22 2024 +0100

    Map items to objects instead of dicts (#409)

    * Consistent parameter name for map_item()

    * Wrap mapped items in MappedItem() object

    * Keep track of import warnings in search.py

    * Add warning when mapping a tweet with missing metric data

    * Add new iterate_mapped_objects method

    * Log mapping warnings when merging datasets

    * Pass object instead of dict

    * Clarify Twitter warning

    * Documenting MappedItem

    * Explain things to myself

    * R…
In spirit, at least
stijn-uva marked this pull request as ready for review September 20, 2024 17:19

dale-wahl commented Nov 11, 2024

I have been reviewing this and ran into an issue that prevents logging in. It's cool that flask_login manages to replace current_user with our User class (on login, I assume), but I am not sure how to handle this particular chicken-and-egg situation. Maybe I can give the not-really-a-user current_user the config in the show_login function if the request is a GET?

Fixed in below commits.

File "/usr/src/app/webtool/views/views_user.py", line 331, in show_login
... 
File "/usr/src/app/webtool/lib/template_filters.py", line 384, in inject_now
2024-11-11 16:32:09     "__notifications": current_user.get_notifications(),
2024-11-11 16:32:09   File "/usr/src/app/common/lib/user.py", line 447, in get_notifications
2024-11-11 16:32:09     raise ValueError("User not instantiated with a configuration reader. Provide a ConfigManager at "
2024-11-11 16:32:09 ValueError: User not instantiated with a configuration reader. Provide a ConfigManager at instantiation or use with_config().
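
For readers following along, here is a minimal sketch of the chicken-and-egg situation in the traceback above. It is not the actual 4CAT code: `ConfigReader` and `inject_notifications` are hypothetical stand-ins, and the bodies of `User.with_config()` and `User.get_notifications()` are assumptions based only on the method names and error message shown in the traceback. The idea is that a user object created without a configuration reader fails as soon as it needs a setting, so the reader has to be attached before the template context is built.

```python
class ConfigReader:
    """Hypothetical stand-in for a configuration reader (not 4CAT's class)."""

    def __init__(self, settings):
        self._settings = dict(settings)

    def get(self, key, default=None):
        return self._settings.get(key, default)


class User:
    """Hypothetical user class; only the method names mirror the traceback."""

    def __init__(self, name, config=None):
        self.name = name
        self.config = config

    def with_config(self, config):
        # Attach a configuration reader after instantiation.
        self.config = config
        return self

    def get_notifications(self):
        # Assumed behaviour: refuse to read settings without a reader.
        if self.config is None:
            raise ValueError("User not instantiated with a configuration reader.")
        return list(self.config.get("notifications", []))


def inject_notifications(current_user, config):
    """Hypothetical context processor: attach the config before it is needed."""
    if current_user.config is None:
        current_user.with_config(config)
    return {"__notifications": current_user.get_notifications()}
```

With this shape, an anonymous user created by the login machinery can still be rendered on a GET request to the login page, as long as whatever builds the template context attaches a configuration reader first.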
