Commit
commit 3f2a62a124926cfeb840796f104a702878ac10e5
Author: Carsten Schnober <[email protected]>
Date:   Wed Sep 18 18:18:29 2024 +0200

    Update Gensim to >=4.3.3, <4.4.0 (#450)

    * Update Gensim to >=4.3.3, <4.4.0
    * update nltk as well

    ---------

    Co-authored-by: Dale Wahl <[email protected]>
    Co-authored-by: Sal Hagen <[email protected]>

commit fee2c8c08617094f28496963da282d2e2dddeab7
Merge: 3d94b666 f8e93eda
Author: sal-phd-desktop <[email protected]>
Date:   Wed Sep 18 18:11:19 2024 +0200

    Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat

commit 3d94b666cedd0de4e0bee953cbf1d787fdc38854
Author: sal-phd-desktop <[email protected]>
Date:   Wed Sep 18 18:11:04 2024 +0200

    FINALLY remove 'News' from the front page, replace with 4CAT BlueSky updates and potential information about the specific server (to be set on config page)

commit f8e93edabe9013a2c1229caa4c454fab09620125
Author: Stijn Peeters <[email protected]>
Date:   Wed Sep 18 15:11:21 2024 +0200

    Simple extensions page in Control Panel

commit b5be128c7b8682fb233d962326d9118a61053165
Author: Stijn Peeters <[email protected]>
Date:   Wed Sep 18 14:08:13 2024 +0200

    Remove 'docs' directory

commit 1e2010af44817016c274c9ec9f7f9971deb57f66
Author: Stijn Peeters <[email protected]>
Date:   Wed Sep 18 14:07:38 2024 +0200

    Forgot TikTok and Douyin

commit c757dd51884e7ec9cf62ca1726feacab4b2283b7
Author: Stijn Peeters <[email protected]>
Date:   Wed Sep 18 14:01:31 2024 +0200

    Say 'zeeschuimer' instead of 'extension' to avoid confusion with 4CAT extensions

commit ee7f4345478f923541536c86a5b06246deae03f6
Author: Stijn Peeters <[email protected]>
Date:   Wed Sep 18 14:00:40 2024 +0200

    RIP Parler data source

commit 11300f2430b51887823b280405de4ded4f15ede1
Author: Stijn Peeters <[email protected]>
Date:   Wed Sep 18 11:21:37 2024 +0200

    Tuplestring

commit 547265240eba81ca0ad270cd3c536a2b1dcf512d
Author: Stijn Peeters <[email protected]>
Date:   Wed Sep 18 11:15:29 2024 +0200

    Pass user obj instead of str to ConfigWrapper in Processor

commit b21866d7900b5d20ed6ce61ee9aff50f3c0df910 Author: Stijn Peeters <[email protected]> Date: Tue Sep 17 17:45:01 2024 +0200 Ensure request-aware config reader in user object when using config wrapper commit bbe79e4b0fe870ccc36cab7bfe7963b28d1948e3 Author: Sal Hagen <[email protected]> Date: Tue Sep 17 15:12:46 2024 +0200 Fix extension path walk for Windows commit d6064beaf31a6a85b0e34ed4f8126eb4c4fc07e3 Author: Stijn Peeters <[email protected]> Date: Mon Sep 16 14:50:45 2024 +0200 Allow tags that have no users Use case: tag-based frontend differentiation using X-4CAT-Config-Via-Proxy commit b542ded6f976809ec88445e7b04f2c81b900188e Author: Stijn Peeters <[email protected]> Date: Mon Sep 16 14:13:14 2024 +0200 Trailing slash in query results list commit a4bddae575b22a009925206a1337bdd89349e567 Author: Dale Wahl <[email protected]> Date: Mon Sep 16 13:57:23 2024 +0200 4CAT Extension - easy(ier) adding of new datasources/processors that can be mainted seperately from 4CAT base code (#451) * domain only * fix reference * try and collect links with selenium * update column_filter to find multiple matches * fix up the normal url_scraper datasource * ensure all selenium links are strings for join * change output of url_scraper to ndjson with map_items * missed key/index change * update web archive to use json and map to 4CAT * fix no text found * and none on scraped_links * check key first * fix up web_archive error reporting * handle None type for error * record web archive "bad request" * add wait after redirect movement * increase waittime for redirects * add processor for trackers * dict to list for addition * allow both newline and comma seperated links * attempt to scrape iframes as seperate pages * Fixes for selenium scraper to work with config database * installation of packages, geckodriver, and firefox if selenium enabled * update install instructions * fix merge error * fix dropped function * have to be kidding me * add note; setup requires docker... 
need to think about IF this will ever be installed without Docker * seperate selenium class into wrapper and Search class so wrapper can be used in processors! * add screenshots; add firefox extension support * update selenium definitions * regex for extracting urls from strings * screenshots processor; extract urls from text and takes screenshots * Allow producing zip files from data sources * import time * pick better default * test screenshot datasource * validate all params * fix enable extension * haha break out of while loop * count my items * whoops, len() is important here * must be getting tired... * remove redundant logging * Eager loading for screenshots, viewport options, etc * Woops, wrong folder * Fix label shortening * Just 'queue' instead of 'search queue' * Yeah, make it headless * README -> DESCRIPTION * h1 -> h2 * Actually just have no header * Use proper filename for downloaded files * Configure whether to offer pseudonymisation etc * Tweak descriptions * fix log missing data * add columns to post_topic_matrix * fix breadcrumb bug * Add top topics column * Fix selenium config install parameter (Docker uses this/manual would need to run install_selenium, well, manually) * this processor is slow; i thought it was broken long before it updated! 
* refactor detect_trackers as conversion processor not filter * add geckodriver executable to docker install * Auto-configure webdrivers if available in PATH * update screenshots to act as image-downloader and benefit from processors * fix is_compatible_with * Delete helper-scripts/migrate/migrate-1.30-1.31.py * fix embeddings is_compatible_with * fix up UI options for hashing and private * abstract was moved to lib * various fixes to selenium based datasources * processors not compatible with image datasets * update firefox extension handling * screenshots datasource fix get_options * rename screenshots processor to be detected as image dataset * add monthly and weekly frequencies to wayback machine datasource * wayback ds: fix fail if all attempts do not realize results; addion frequency options to options; add daily * add scroll down page to allow lazy loading for entire page screenshots * screenshots: adjust pause time so it can be used to force a wait for images to load I have not successfully come up with or found a way to wait for all images to load; document.readyState == 'complete' does not function in this way on certain sites including the wayback machine * hash URLs to create filenames * remove log * add setting to toggle display advanced options * add progress bars * web archive fix query validation * count subpages in progress * remove overwritten function * move http response to own column * special filenames * add timestamps to all screenshots * restart selenium on failure * new build have selenium * process urls after start (keep original query parameters) * undo default firefox * quick max * rename SeleniumScraper to SeleniumSearch todo: build SeleniumProcessor! 
* max number screenshots configurable * method to get url with error handling * use get_with_error_handling * d'oh, screenshot processor needs to quit selenium * update log to contain URL * Update scrolling to use Page down key if necessary * improve logs * update image_category_wall as screenshot datasource does not have category column; this is not ideal and ought to be solved in another way. Also, could I get categories from the metadata? That's... ugh. * no category, no processor * str errors * screenshots: dismiss alerts when checking ready state is complete * set screenshot timeout to 30 seconds * update gensim package * screenshots: move processor interrupt into attempts loop * if alert disappears before we can dismiss it... * selenium specific logger * do not switch window when no alert found on dismiss * extract wait for page to load to selenium class * improve descriptions of screenshot options * remove unused line * treat timeouts differently from other errors these are more likely due to an issue with the website in question * debug if requested * increase pause time * restart browser w/ PID * increase max_workers for selenium this is by individual worker class not for all selenium classes... 
so you can really crank them out if desired * quick fix restart by pid * avoid bad urls * missing bracket & attempt to fix-missing dependencies in Docker install * Allow dynamic form options in processors * Allow 'requires' on data source options as well * Handle list values with requires * basic processor for apple store; setup checks for additional requirements * fix is_4cat_class * show preview when no map_item * add google store datasource * Docker setup.py use extensions * Wider support for file upload in processors * Log file uploads in DMI service manager * add map_item methods and record more data per item need additional item data as map_item is staticmethod * update from master; merge conflicts * fix docker build context (ignore data files) * fix option requirements * apple store fix: list still tries to get query * apple & google stores fix up item mapping * missed merge error * minor fix * remove unused import * fix datasources w/ files frontend error * fix error w/ datasources having file option * better way to name docker volumes * update two other docker compose files * fix docker-compose ymls * minor bug: fix and add warning; fix no results fail * update apple field names to better match interface * update google store fieldnames and order * sneak in jinja logger if needed * fix fourcat.js handling checkboxes for dynamic settings * add new endpoint for app details to apple store * apple_store map new beta app data * add default lang/country * not all apps have advisories * revert so button works * add chart positions to beta map items * basic scheduler To-do - fix up and add options to scheduler view (e.g. delete/change) - add scheduler view to navigator - tie jobs to datasets? (either in scheduler view or, perhaps, filter dataset view) - more testing... * update scheduler view, add functions to update job interval * revert .env * working scheduler! 
* basic scheduler view w/ datasets * fix postgres tag * update job status in scheduled_jobs table * fix timestamp; end_date needed for last run check; add dataset label * improve scheduler view * remove dataset from scheduled_jobs table on delete * scheduler view order by last creation * scheduler views: separate scheduler list from scheduled dataset list * additional update from master fixes * apple_store map_items fix missing locales * add back depth for pagination * correct route * modify pagination to accept args * pagination fun * pagination: i hate testing on live servers... * ok ok need the pagination route * pagination: add route_args * fix up scheduler header * improve app store descriptions * add azure store * fix azure links * azure_store: add category search * azure fix type of config update timestamp OPTION_DATE does not appear correctly in settings and causes it to be written incorrectly * basic aws store * check if selenium available; get correct app_id * aws: implement pagination * add logging; wait for elements to load after next page; attempts to rework filter option collection * apple_store: handle invalid param error * fix filter_options * aws: fix filter option collection! 
* more merge * move new datasources and processors to extensions and modify setup.py and module loader to use the new locations * migrate.py to run extension "fourcat_install.py" files * formatting * remove extensions; add gitignore * excise scheduler merge * some additional cleanup from app_studies branch * allow nested datasources folders; ignore files in extensions main folder * allow extension install scripts to run pip if migrate.py has not * Remove unused URL functions we could use ural for * Take care of git commit hash tracking for extension processors * Get rid of unused path.versionfile config setting * Add extensions README * Squashed commit of the following: commit cd356f7a69d15e8ecc8efffc6d63a16368e62962 Author: Stijn Peeters <[email protected]> Date: Sat Sep 14 17:36:18 2024 +0200 UI setting for 4CAT install ad in login commit 0945d8c0a11803a6bb411f15099d50fea25f10ab Author: Stijn Peeters <[email protected]> Date: Sat Sep 14 17:32:55 2024 +0200 UI setting for anonymisation controls Todo: make per-datasource commit 1a2562c2f9a368dbe0fc03264fb387e44313213b Author: Stijn Peeters <[email protected]> Date: Sat Sep 14 15:53:27 2024 +0200 Debug panel for HTTP headers in control panel commit 203314ec83fb631d985926a0b5c5c440cfaba9aa Author: Stijn Peeters <[email protected]> Date: Sat Sep 14 15:53:17 2024 +0200 Preview for HTML datasets commit 48c20c2ebac382bd41b92da4481ff7d832dc1538 Author: Desktop Sal <[email protected]> Date: Wed Sep 11 13:54:23 2024 +0200 Remove spacy processors (linguistic extractor, get nouns, get entities) and remove dependencies commit 657ffd75a7f48ba4537449127e5fa39debf4fdf3 Author: Dale Wahl <[email protected]> Date: Fri Sep 6 16:29:19 2024 +0200 fix nltk where it matters commit 2ef5c80f2d1a5b5f893c8977d8394740de6d796d Author: Stijn Peeters <[email protected]> Date: Tue Sep 3 12:05:14 2024 +0200 Actually check progress in text annotator commit 693960f41b73e39eda0c2f23eb361c18bde632cd Author: Stijn Peeters <[email protected]> Date: Mon 
Sep 2 18:03:18 2024 +0200 Add processor for stormtrooper DMI service commit 6ae964aad492527bc5d016a00f870145aab6e1af Author: Stijn Peeters <[email protected]> Date: Fri Aug 30 17:31:37 2024 +0200 Fix reference to old stopwords list in neologisms preset * Fix Github links for extensions * Fix commit detection in extensions * Fix extension detection in module loader * Follow symlinks when loading extensions Probably not uncommon to have a checked out repo somewhere to then symlink into the extensions dir * Make queue message on create page more generic * Markdown in datasource option tooltips * Remove Spacy model from requirements * Add software_source to database SQL --------- Co-authored-by: Stijn Peeters <[email protected]> Co-authored-by: Stijn Peeters <[email protected]> commit cd356f7a69d15e8ecc8efffc6d63a16368e62962 Author: Stijn Peeters <[email protected]> Date: Sat Sep 14 17:36:18 2024 +0200 UI setting for 4CAT install ad in login commit 0945d8c0a11803a6bb411f15099d50fea25f10ab Author: Stijn Peeters <[email protected]> Date: Sat Sep 14 17:32:55 2024 +0200 UI setting for anonymisation controls Todo: make per-datasource commit 1a2562c2f9a368dbe0fc03264fb387e44313213b Author: Stijn Peeters <[email protected]> Date: Sat Sep 14 15:53:27 2024 +0200 Debug panel for HTTP headers in control panel commit 203314ec83fb631d985926a0b5c5c440cfaba9aa Author: Stijn Peeters <[email protected]> Date: Sat Sep 14 15:53:17 2024 +0200 Preview for HTML datasets commit 48c20c2ebac382bd41b92da4481ff7d832dc1538 Author: Desktop Sal <[email protected]> Date: Wed Sep 11 13:54:23 2024 +0200 Remove spacy processors (linguistic extractor, get nouns, get entities) and remove dependencies commit 657ffd75a7f48ba4537449127e5fa39debf4fdf3 Author: Dale Wahl <[email protected]> Date: Fri Sep 6 16:29:19 2024 +0200 fix nltk where it matters commit 2ef5c80f2d1a5b5f893c8977d8394740de6d796d Author: Stijn Peeters <[email protected]> Date: Tue Sep 3 12:05:14 2024 +0200 Actually check progress in text 
annotator commit 693960f41b73e39eda0c2f23eb361c18bde632cd Author: Stijn Peeters <[email protected]> Date: Mon Sep 2 18:03:18 2024 +0200 Add processor for stormtrooper DMI service commit 6ae964aad492527bc5d016a00f870145aab6e1af Author: Stijn Peeters <[email protected]> Date: Fri Aug 30 17:31:37 2024 +0200 Fix reference to old stopwords list in neologisms preset commit 4ba872bef2968f7f8bf5831fd3a4f413420b36ed Author: Dale Wahl <[email protected]> Date: Tue Aug 27 13:04:46 2024 +0200 fix hatebase: default column option for OPTION_MULTI_SELECT must be list commit e276033542f2d22e7f614f318a01d65114a21482 Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Date: Wed Aug 21 12:53:10 2024 +0200 Bump nltk from 3.6.7 to 3.9 (#447) Bumps [nltk](https://github.com/nltk/nltk) from 3.6.7 to 3.9. - [Changelog](https://github.com/nltk/nltk/blob/develop/ChangeLog) - [Commits](https://github.com/nltk/nltk/compare/3.6.7...3.9) --- updated-dependencies: - dependency-name: nltk dependency-type: direct:production ... 
Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> commit 1d749c3cf83b130ba70bdb09174f382d6711a14b Author: sal-phd-desktop <[email protected]> Date: Wed Aug 21 12:52:54 2024 +0200 Set UTF-8 encoding when opening stop words (fixes Windows bug) commit a03e5fd4252e7242563c291558606440256eb3d1 Author: Dale Wahl <[email protected]> Date: Mon Aug 19 14:19:21 2024 +0200 remove duplicate line commit aa07e8c13c2d59c6b699f78133036514659ee420 Author: Dale Wahl <[email protected]> Date: Mon Jul 29 09:35:22 2024 +0200 tweet import fix: author banner key missing when author has no banner commit 32dac5d2ffb936210f12f5c725514fd25a0286f1 Author: Dale Wahl <[email protected]> Date: Mon Jul 29 08:52:08 2024 +0200 tell user when dataset is not found we could have a proper 404 page, but at least leave a message commit 2c8c860fc5378113d1352016ac26ca761adecb32 Author: Dale Wahl <[email protected]> Date: Mon Jul 22 17:41:00 2024 +0200 telegram fix: reactions datastructure commit 1c0bf5e580eb16d8a6f9afa415f9febce449a537 Author: Dale Wahl <[email protected]> Date: Mon Jul 22 11:19:52 2024 +0200 fix telegram: crawl_max_depth can be None if it is not enabled for a user commit 3dfe7af292b33574a31630e3a0da10954ed87d0a Author: Dale Wahl <[email protected]> Date: Fri Jul 19 11:52:31 2024 +0200 fix more config.get() magic commit 2453182bcee6e54b396b762ab77b60b8a0893638 Author: Dale Wahl <[email protected]> Date: Fri Jul 19 10:54:23 2024 +0200 config_manager - fix `get_all` w/ one results (super rare edge); fix overwriting self.db in `with_db` commit 6b9cb0b5479e6e64e09a49fa2ca9effe1c5a7415 Author: Dale Wahl <[email protected]> Date: Wed Jul 17 15:20:49 2024 +0200 add surf nginx init file commit 5e984e13a08d9fba7d5806a7ef4e012ce7d57319 Author: Dale Wahl <[email protected]> Date: Wed Jul 17 14:30:34 2024 +0200 change port for surf commit 2ce8c354e90f939a16dad3f0155fd7d79405c79e Author: Dale Wahl <[email protected]> 
Date: Wed Jul 17 12:54:11 2024 +0200 use latest image on surf commit 13ec0fd3f2bed86c3b2dff73014093a6a92fbfb5 Author: Dale Wahl <[email protected]> Date: Wed Jul 17 12:46:59 2024 +0200 update surf docker-compose.yml this may require a new release commit 78698f6ac1b22b1154d31f69543ba7b266d33191 Author: Dale Wahl <[email protected]> Date: Wed Jul 17 10:34:56 2024 +0200 clip: handle new and old format commit eb7693780cb191403f107817ca30d90373929bf0 Author: Dale Wahl <[email protected]> Date: Tue Jul 16 14:27:08 2024 +0200 DMI SM updates to use status endpoint w/ database records; run on CPU if no GPU enabled commit d2a787e2c1559417bb5401f3208c82954052504f Author: Stijn Peeters <[email protected]> Date: Mon Jul 15 15:58:06 2024 +0200 Require most recent Telethon version commit 346150bd9cc96ac099abd4d15fa3de39bd65e9d1 Author: Stijn Peeters <[email protected]> Date: Mon Jul 15 15:57:55 2024 +0200 Catch UPDATE_APP_TO_LOGIN in Telegram commit 04acc06e95098d7e2f9b4af404447c9cfaee5b99 Author: Stijn Peeters <[email protected]> Date: Mon Jul 15 11:27:30 2024 +0200 Unbreak Twitter error handling commit e9b5232a963be02c2e86dabacb607b2315a4e0e6 Author: Stijn Peeters <[email protected]> Date: Fri Jul 12 13:27:15 2024 +0200 Ensure str type when trying to extract video URLs from a field commit d69dd6f337cac05ed31c05334890679976a1e6de Author: Stijn Peeters <[email protected]> Date: Fri Jul 12 12:31:14 2024 +0200 Make CSV column mapping params look nicer on result page commit 9bd9da568f593085a8d54744836e3290a75b51a7 Author: Stijn Peeters <[email protected]> Date: Fri Jul 12 12:22:03 2024 +0200 Add "empty" and "current timestamp" as options to CSV mapping commit 0b574571952a206904440faf8601ddf95ab42b24 Author: Dale Wahl <[email protected]> Date: Thu Jul 11 16:59:56 2024 +0200 image_wall: backup fit method commit eeb1ddeb7ca85b6802dfed3c74d1352062383d50 Merge: 2504c37b 43239467 Author: Stijn Peeters <[email protected]> Date: Thu Jul 11 16:47:45 2024 +0200 Merge branch 'master' of 
https://github.com/digitalmethodsinitiative/4cat commit 43239467db046eea5eb5268f91d1b63a1042238d Author: Dale Wahl <[email protected]> Date: Thu Jul 11 12:08:08 2024 +0200 fix processor more button would only show top level analysis if not logged in commit d6ab2b0783f8e40ecd8fadbc2abccffa6f093e39 Author: Dale Wahl <[email protected]> Date: Tue Jul 9 15:35:25 2024 +0200 search_gab - use MappedItem commit 2504c37b67ff6f19720b44d8bb6054b1c3d5a155 Author: Stijn Peeters <[email protected]> Date: Sat Jul 6 17:51:22 2024 +0200 Fix multiline spacing in multi select list commit fea66ce38be0717da6c1f847e7124f7069c096e2 Author: Dale Wahl <[email protected]> Date: Fri Jul 5 13:15:45 2024 +0200 use processor media_type if dataset does not have media_type; set default media_type for downloaders commit d41fa34514e8177efdac7e64a31f2ee75c7d1652 Author: Dale Wahl <[email protected]> Date: Fri Jul 5 12:57:18 2024 +0200 video_hasher: handle no metadata file commit 2820dcecc36ed4705a2776064d387ff7ed14e84f Author: Dale Wahl <[email protected]> Date: Fri Jul 5 12:50:09 2024 +0200 num_rows not num_items() commit fb09162db902fa22fdf2d7a3ed171ce1489bd92f Author: Dale Wahl <[email protected]> Date: Fri Jul 5 12:44:03 2024 +0200 Google vision API returning 400s; properly log and record processed entries; google networks should not run on empty datasets commit ebf39d8262d199895aedc4f7fa275c5685e58563 Author: Dale Wahl <[email protected]> Date: Fri Jul 5 12:28:13 2024 +0200 fix image_category_wall whoops, cleared categories and post_values after filling them! 
commit 1ad9ec2c2e76604793ec37584c051f116af2fdab Author: Stijn Peeters <[email protected]> Date: Fri Jul 5 12:03:54 2024 +0200 fsdfdsgd sorry commit c7254c08a477c6cdc8497507e8452c3eff7101c9 Author: Stijn Peeters <[email protected]> Date: Fri Jul 5 12:01:21 2024 +0200 Fix razdel versioning commit b9a327abe99f2d9ede4f2747f34f20d1dc6803cb Author: Stijn Peeters <[email protected]> Date: Fri Jul 5 11:57:47 2024 +0200 Reorganise tokeniser, stopwords commit fb13bc483af9ba0d677ee35fd045bf36ab1cddf7 Merge: 0b745692 e3046496 Author: Stijn Peeters <[email protected]> Date: Fri Jul 5 11:56:08 2024 +0200 Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat commit e30464964262870c54c73f65a3bce630d6576f45 Author: Dale Wahl <[email protected]> Date: Fri Jul 5 10:51:53 2024 +0200 media_upload allow setting for max_form_part and warn users of failure above certain number of files commit e4f982b4550b352a5d1a131abd78d52e6c196e48 Author: Dale Wahl <[email protected]> Date: Fri Jul 5 09:50:49 2024 +0200 Update media_import help text; looks like failure happens somewhere between 600-1000 files due to Flask request size limits commit 0b74569280f8f87376a964a6b160ea1993cb3354 Author: Stijn Peeters <[email protected]> Date: Thu Jul 4 17:55:36 2024 +0200 Add razdel as option for Russian tokenisation commit 9f15a2b8d666c3b6fddeb151b7c424cb44df18a6 Author: Dale Wahl <[email protected]> Date: Thu Jul 4 17:13:15 2024 +0200 remove the log commit ffcb6a4239075ba190fb534b25b89507e09e5f56 Author: Dale Wahl <[email protected]> Date: Thu Jul 4 17:12:43 2024 +0200 Inform user if too many files are uploaded I do not understand why this is appearing. app.config['MAX_CONTENT_LENGTH'] is set to None. Problem persists in Flask alone (i.e., does not appear to be Gunicorn/Nginx/Apache). 
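Note on the upload failure described in ffcb6a42 and e4f982b4 above: the symptoms (failure somewhere between 600 and 1000 files, with `MAX_CONTENT_LENGTH` unset) look consistent with a cap on the number of multipart form parts rather than on request size. Recent Werkzeug releases have such a cap (`max_form_parts` on `Request`, default 1000), though whether that is the cause here is an assumption. The sketch below only illustrates the "warn users above a certain number of files" idea; `upload_warning`, `fields_per_file` and `extra_fields` are hypothetical names and not part of 4CAT.

```python
def upload_warning(num_files, fields_per_file=1, extra_fields=0, max_form_parts=1000):
    """Return a warning string if an upload would likely exceed the
    multipart form-part limit, or None if it should fit.

    Hypothetical helper: the default of 1000 mirrors the Werkzeug default,
    but the real limit depends on the server configuration.
    """
    # Every uploaded file is at least one form part; other form fields
    # (options, labels, etc.) count towards the same limit.
    parts = num_files * fields_per_file + extra_fields
    if parts > max_form_parts:
        return (
            f"Upload needs about {parts} form parts, but the server accepts "
            f"at most {max_form_parts}; consider uploading a zip archive instead."
        )
    return None


if __name__ == "__main__":
    print(upload_warning(400, extra_fields=5))   # fits: prints None
    print(upload_warning(1200, extra_fields=5))  # too many parts: prints a warning
```

Counting form parts rather than bytes matches the observation that the failure threshold varied with the number of files, not their size.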
commit 9cad12dd6f64a63c48d3b5b304b5c7d9d1a6ddb7 Author: Stijn Peeters <[email protected]> Date: Thu Jul 4 15:09:42 2024 +0200 Bump version commit aad94f393de77cc9d4f578e1f5be66a3601a4c90 Author: Dale Wahl <[email protected]> Date: Thu Jul 4 10:51:01 2024 +0200 Update setup.py to ensure videohash updates commit d9154a6f9c46a5c793909b88da751bc71d6f759f Author: Dale Wahl <[email protected]> Date: Tue Jul 2 17:45:26 2024 +0200 clip: categorizing requires categories... seriously, guys? commit 0af9a5ec49bd2bcfbb87bda33976c65683f68777 Author: Dale Wahl <[email protected]> Date: Tue Jul 2 17:31:49 2024 +0200 blip2: fix no metadata file found (uploads...) commit d695053f440bd938a57f06adea7b9c732ecf30d7 Author: Dale Wahl <[email protected]> Date: Tue Jul 2 17:25:26 2024 +0200 cat_vis_wall - use str as category type if mixed i.e., use floats as string categories commit bcb914076760ea1fb0e277cdcd1782ffa101b535 Author: Sal Hagen <[email protected]> Date: Tue Jul 2 16:06:43 2024 +0200 Add Twitter author profile pic and banner URLs commit 1b3b02f826578e8f702ea84a27c8ced7b1fab345 Author: Dale Wahl <[email protected]> Date: Tue Jul 2 11:42:50 2024 +0200 add migrate.py log file in Docker commit 2aaa972e6888743fc329d721c37fa626cf2eeae3 Author: Dale Wahl <[email protected]> Date: Tue Jul 2 11:42:22 2024 +0200 add necessary pip packages for upgrade in Docker environment; add error logging and save to file for trouble shooting commit 18b8a53c01b334e0f70610b1305d380b25dbe9c6 Author: Dale Wahl <[email protected]> Date: Tue Jul 2 11:41:36 2024 +0200 update Dockerfile to keep build environment useful for interactive upgrade commit 7b224b9b798c9aaf956b5b618b98d742c4a2e7cd Author: Dale Wahl <[email protected]> Date: Tue Jul 2 11:41:12 2024 +0200 remove docker-compose.yml versions commit acf5de0ed02e144b920a80abfdfa35986dd0ed4c Author: Stijn Peeters <[email protected]> Date: Mon Jul 1 17:38:32 2024 +0200 Better issues.md, footer link commit 1953ca3895656ca9a12d2657e58019795ae64b3a Author: Dale 
Wahl <[email protected]> Date: Mon Jul 1 12:00:07 2024 +0200 FIX: get_key() is more of a creating of a key then general getting of a key... commit 12289bb5c766d1af23799ff11278b46b48fc2841 Author: Dale Wahl <[email protected]> Date: Mon Jul 1 11:37:06 2024 +0200 .metadata.json may not have top_parent via Media Uploader This may exist in other processors if a proper check is not in place; will need to review commit 25f4ed65ec2c32298a90490cf51037a7ea2d0bf9 Author: Dale Wahl <[email protected]> Date: Tue Jun 25 14:43:40 2024 +0200 Media upload datasource! (#419) * basic changes to allow files box * basic imports, yay! * video_scene_timelines to work on video imports! * add is_compatible_with checks to processors that cannot run on new media top_datasets * more is_compatible fixes * necessary function for checking media_types * enable more processors on media datasets * consolidate user_input file type * detect mimetype from filename best I can do without downloading all the files first. * handle zip archives; allow log and metadata files * do not count metadata or log files in num_files * move machine learning processors so they can be imported elsewhere * audio_to_text datasource * When validating zip file uploads, send list of file attributes instead of the first 128K of the zip file * Check type of files in zip when uploading media * Skip useless files when uploading media as zip * check multiple zip types in JS * js !=== python * fix media_type for loose file imports; fix extension for audio_to_text preset; fix merge for some processors w/ media_type --------- Co-authored-by: Stijn Peeters <[email protected]> commit 4ce689bdc3e441a7adf85883ddcda6bae0525ed9 Author: Stijn Peeters <[email protected]> Date: Mon Jun 24 11:58:50 2024 +0200 Avoid KeyError commit 155522d0817d19ac7b6b0b0164242156d6f7443a Author: Dale Wahl <[email protected]> Date: Thu Jun 20 15:58:21 2024 +0200 add generated images to image wall w/ text visual commit eecde519eab1208eeb6ee53c2d8febff7fb8febf 
Author: Dale Wahl <[email protected]> Date: Thu Jun 20 15:57:56 2024 +0200 allow users to NOT generate all images from prompts commit d0b9574093a109997e63b1062b2bdd8e71300a29 Author: Stijn Peeters <[email protected]> Date: Wed Jun 19 16:28:26 2024 +0200 ...don't mangle URLs in preview links commit c105e368a521ec54ae717bb9eb2fe9fae66cf6e8 Merge: 0028a999 8d4f99b2 Author: Dale Wahl <[email protected]> Date: Wed Jun 19 16:25:36 2024 +0200 Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat commit 0028a9994d698611dd8b546b9b3bccbeec30b74f Author: Dale Wahl <[email protected]> Date: Wed Jun 19 16:25:12 2024 +0200 add followups to processors commit 8d4f99b22e0308606c7f713ef704dfa939e85247 Author: Stijn Peeters <[email protected]> Date: Wed Jun 19 16:17:22 2024 +0200 More flexible URL linking in CSV preview commit f4f8e6621bd6f2504dc3afc2078280bf5edb6444 Author: Dale Wahl <[email protected]> Date: Wed Jun 19 13:54:00 2024 +0200 tokeniser fix: use default lang for word_tokenize if language is 'other' commit 127472e91d8e510f3de2a9cc4a87be6cf2d0deaa Author: Stijn Peeters <[email protected]> Date: Tue Jun 18 16:45:01 2024 +0200 Better log messages for Telegram data source commit e8714b6fba72e00c690a8d643d8dc54d2250c94a Author: Stijn Peeters <[email protected]> Date: Mon Jun 17 17:42:21 2024 +0200 Add 'crawl' feature to Telegram data source Fixes #321 (though might need a bit more testing) commit 25fded7b596097f7916e1793f1841bae2b63d453 Merge: d67cf440 b10e3bb8 Author: sal-phd-desktop <[email protected]> Date: Fri Jun 14 16:23:02 2024 +0200 Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat commit d67cf440730ea1d4e124c76a4c21d65b56f39c68 Author: sal-phd-desktop <[email protected]> Date: Fri Jun 14 16:22:59 2024 +0200 Fix export 4chan script and remove some unecessary code commit b10e3bb8f0c8a67aa5fdbba1962301d8acdf625c Author: Dale Wahl <[email protected]> Date: Thu Jun 13 15:14:06 2024 +0200 video_hasher prefix: fix extension type 
commit ba565cdaa2ebeecf23fd60889d546c76b9ea5eb1 Author: Dale Wahl <[email protected]> Date: Thu Jun 13 14:53:13 2024 +0200 video_hasher: fix to work with Pillow updates; add max amount videos commit 90da5d231eff6a4249bef5468fcdbf1ebcf9247a Author: Dale Wahl <[email protected]> Date: Thu Jun 13 10:25:24 2024 +0200 image_cat_wall fix the fix commit a8b943d8e2c5471f82ea0442e2659d84fe8d9760 Author: Dale Wahl <[email protected]> Date: Wed Jun 12 13:29:41 2024 +0200 add OCR processor to image w/ text visualization commit e7e636b6b89b6163fa6976e67edba68e7d75b7ac Author: Dale Wahl <[email protected]> Date: Tue Jun 11 15:23:12 2024 +0200 add image_wall_w_text to follow on BLIP captions commit f74b97827f0465baf8483040471a77e4654e70b1 Author: Dale Wahl <[email protected]> Date: Thu Jun 6 11:05:25 2024 +0200 image_category_wall: allow multiple images per item/post commit e3c9ea57d46b32ba47b00a6047a278ddd530adc1 Author: Dale Wahl <[email protected]> Date: Thu May 30 16:27:50 2024 +0200 image_category_wall convert None to str for category commit 00874576c354235f4655f1d433ec4382010e18e3 Author: Dale Wahl <[email protected]> Date: Thu May 30 14:54:51 2024 +0200 image_category_wall fix float categories commit e0c55a8ae132bedef5da27ecbbb9489a094d454c Author: Dale Wahl <[email protected]> Date: Thu May 30 12:51:42 2024 +0200 download_images fix divide by zero when user can download all commit 3580fc9450501262badb8e61ef4b4df4b4c54322 Author: Dale Wahl <[email protected]> Date: Thu May 30 12:51:24 2024 +0200 image_category_wall remove 'max' when user can use all images commit f2145bdeff1d68e46cdd3521ecbb61573f01a2f2 Author: Dale Wahl <[email protected]> Date: Wed May 29 17:59:23 2024 +0200 rank_attributes: option to count missing data or blanks commit 01e7ab9677a75181bbedc62fa00e636ce2b17c18 Author: Dale Wahl <[email protected]> Date: Wed May 29 16:53:57 2024 +0200 fix missing field strategy so default_stategy not overwritten on second loop default_stategy would be set to correctly to 
the callable, but overwritten on second loop (and map_missing is a dictionary at that point). commit 097f838af1f5f2748578dd9072eb9e3a8b3a7057 Author: Dale Wahl <[email protected]> Date: Tue May 28 12:16:08 2024 +0200 add log_level arg to 4cat-daemon.py I've been using this forever and don't know why I haven't commited it commit fd3ac238e60f052889d99c71588170570a384900 Author: Dale Wahl <[email protected]> Date: Tue May 28 10:10:56 2024 +0200 google & clarifai to csv had identical "type" possibly caused issue w/ preset commit 1b9965d40aa33035a73f685c13a1ab50cc877f78 Author: Stijn Peeters <[email protected]> Date: Mon May 27 15:54:20 2024 +0200 Ensure file cleanup worker always exists commit 0e0917f2232e240df3412fd4df51cf0be19248b5 Author: Stijn Peeters <[email protected]> Date: Thu May 23 17:36:22 2024 +0200 Also update Spacy model versions... commit f40128213529d154cfb77afa7aa67a72d5bb640f Author: Stijn Peeters <[email protected]> Date: Thu May 23 17:32:35 2024 +0200 *Actually* remove typing_extensions dependency ??? 
commit ba3d83b824c5fb6fcb0aec5e1c36b35070d6e5d9 Author: Stijn Peeters <[email protected]> Date: Thu May 23 17:30:08 2024 +0200 Update minimum Pillow dependency version commit 1c3485648bf2a911052eeeae4f293f303a944aec Author: Stijn Peeters <[email protected]> Date: Thu May 23 17:27:27 2024 +0200 Do not require typing_extensions explicitly This was required to ensure Spacy could load - looks like Spacy has since been updated to work with newer versions of typing_extensions as well commit 3828de83ba123254463a904392f24daec626c136 Author: Stijn Peeters <[email protected]> Date: Thu May 23 17:02:04 2024 +0200 Bump version commit 8f0d098107a4bbc9d55cc6048f7a38f1d1891a32 Author: Stijn Peeters <[email protected]> Date: Thu May 23 17:01:28 2024 +0200 Require non-broken version of emoji library commit 4b2ad805fcc99a83e46732fc991d98d78ef06c6c Author: Stijn Peeters <[email protected]> Date: Thu May 23 13:11:03 2024 +0200 Show worker progress in control panel if available commit 9144d4503f46108437616d6bc0cf4fde74df3aca Author: Stijn Peeters <[email protected]> Date: Thu May 23 11:07:41 2024 +0200 Bump version commit 807ab77101d197ec897640480a2140439d570c05 Author: Stijn Peeters <[email protected]> Date: Wed May 22 21:57:11 2024 +0200 Fix Instagram upload with missing media URL commit d0b4840fd465b6d21657c3d50f9291ac911b6082 Author: Stijn Peeters <[email protected]> Date: Wed May 22 17:35:04 2024 +0200 Comma comma comma commit 7fd2e14c9505d0ed1ac77dc09c24f766ea61ee6c Author: Stijn Peeters <[email protected]> Date: Wed May 22 17:25:26 2024 +0200 Fix progress indicator for scene extractor commit 661c42c2d083da7004335b0e14910935c3d392f6 Author: Stijn Peeters <[email protected]> Date: Wed May 22 17:12:21 2024 +0200 Don't crash video hasher non non-str item IDs commit 1f280321cdde27a9909885fa2f64dbeffa549fb1 Author: Stijn Peeters <[email protected]> Date: Wed May 22 17:09:53 2024 +0200 Do not crash timelines processor when metadata has unexpected format commit 
572d03f1f368f0ad5f47e705a119b37646148d1d Author: Stijn Peeters <[email protected]> Date: Wed May 22 17:09:30 2024 +0200 More efficient video frame extractor commit 1b51d224ca544d7e2913238adbff2049412bc41e Author: Stijn Peeters <[email protected]> Date: Wed May 22 17:04:27 2024 +0200 Fix crash in video stack processor with ffmpeg < 5.1 commit ddc73cb2e2f0985e64f84ca86bc167fa9e9dc81a Author: Stijn Peeters <[email protected]> Date: Wed May 22 17:03:48 2024 +0200 Helper function for determining ffmpeg version commit ef9dd482b2258c428584997dc661156f63f68b91 Author: Stijn Peeters <[email protected]> Date: Wed May 22 12:14:58 2024 +0200 Allow absence of articleComponent in LinkedIn posts commit 060f2cd7f922e7fae337b0697f7c477442d21ef1 Author: Stijn Peeters <[email protected]> Date: Wed May 22 12:12:54 2024 +0200 Cast post IDs to string when mapping video scenes commit ab34c415c9ada23763b45676639ce3e80a34f594 Author: Stijn Peeters <[email protected]> Date: Wed May 22 11:46:39 2024 +0200 Twitter -> X/Twitter commit de6d97554ccb68375979e5ff09c7e65d8d70a6cd Author: Stijn Peeters <[email protected]> Date: Wed May 22 11:45:19 2024 +0200 Colleges -> Collages commit 30365580dc59b4d95e8a62d1b3c666bef60ce7e8 Author: Stijn Peeters <[email protected]> Date: Tue May 21 15:41:55 2024 +0200 Explicit disconnect after Telegram image download commit 5727ff7230db42463a824f45d63f0b8343caac14 Author: Stijn Peeters <[email protected]> Date: Tue May 21 14:05:50 2024 +0200 Catch TimedOutError while downloading Telegram images commit e0e06686e78976f971aac620267d7e009eaaadff Author: Sal Hagen <[email protected]> Date: Mon May 13 13:01:42 2024 +0200 Typo in LinkedIn search commit 51e58dde6ca21278a80f252a8c22dc83d87ace1f Author: Dale Wahl <[email protected]> Date: Tue May 7 13:10:43 2024 +0200 text_from_image: fix metadata missing (indent issue) commit c1f8ecc1674375bba2b2e38cb29c9d4d44098f0a Author: Dale Wahl <[email protected]> Date: Tue May 7 09:45:25 2024 +0200 text_from_image fix: ensure 
metadata success before attempting to update original commit 72dbf80db71499c59133e1128205b756d240b300 Merge: d7561625 baacc86b Author: Stijn Peeters <[email protected]> Date: Fri May 3 13:14:08 2024 +0200 Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat commit d7561625b127573fbb0332fbb713be6a3cb3d953 Author: Stijn Peeters <[email protected]> Date: Fri May 3 13:14:03 2024 +0200 Comments without replies don't always have reply_comment_total commit baacc86b269612b4b0956345f8b9fa902df1b61f Author: Dale Wahl <[email protected]> Date: Fri May 3 12:01:22 2024 +0200 DSM fix and simplify GPU mem check commit 9b662e9f9b4f4ce194608c8e20a8fc50bc6d9ae3 Author: Parker-Kasiewicz <[email protected]> Date: Thu May 2 00:53:45 2024 -0700 Adding Gab as a Data Source! (#401) * Can successfully import gab data, although can't tell if formatting is right becuase waiting on queued requests. * Version w/ different item types * Ingest Gab posts from Zeeschuimer * Small fix for merge conflicts (whoops) * Gab processing logic transferred from Zeeschuimer * fixing small errors for Gab data source * basic processing for truth social from Zeeschuimer --------- Co-authored-by: Dale Wahl <[email protected]> commit 3ecb8fd9c27aee4c457f03516794c6c4eac19c09 Author: Stijn Peeters <[email protected]> Date: Wed May 1 17:51:36 2024 +0200 Fix duplicate line in views_admin.py commit 8b66ae7e467913f8e7571cf4b45493f63804266f Author: Stijn Peeters <[email protected]> Date: Wed May 1 17:49:54 2024 +0200 Allow processors to define which fields should be pseudonymised commit c973750c8cabb8698704c5997903e92d1de866d2 Author: Stijn Peeters <[email protected]> Date: Wed May 1 17:15:32 2024 +0200 Allow auto-queue of pseudonymisation after import commit 49ad9f0ff785fd44ae494755b785c7fdf7c9cf15 Author: Stijn Peeters <[email protected]> Date: Wed May 1 17:08:35 2024 +0200 Get rid of redundant and buggy next/copy_to implementation in Search class commit 106d3659e2fda89867d3a4f587c1c1addfaff2f7 
Author: Dale Wahl <[email protected]> Date: Wed May 1 16:14:03 2024 +0200 use current branch in settings commit 60bef4157d807f7c01ef3b425295244e91919f31 Author: Stijn Peeters <[email protected]> Date: Wed May 1 11:04:07 2024 +0200 Nicer code commit 4182c436e4fb5109c5e041dc729f77a58d877889 Author: Stijn Peeters <[email protected]> Date: Tue Apr 30 16:19:36 2024 +0200 Always shut down API worker only after everything else has been shut down commit e685108b3cbe5f005ce2df21906267071ad8118e Author: Stijn Peeters <[email protected]> Date: Tue Apr 30 16:12:42 2024 +0200 Properly interrupt expiration worker when asked commit 27a568eca7f2f3742223fef6285eaf80583e0fc4 Author: Stijn Peeters <[email protected]> Date: Tue Apr 30 13:40:50 2024 +0200 Allow floats-as-strings as timestamps when importing CSV commit 2d2bbb9fdb9b426b8f4a80782f04257721a97f2e Author: Dale Wahl <[email protected]> Date: Tue Apr 30 13:05:07 2024 +0200 douyin: add consistency to map_item stats commit 289aa342c9912aceeca35887c079c72aa6ffbf52 Author: Dale Wahl <[email protected]> Date: Mon Apr 29 15:26:38 2024 +0200 fix collection data in Douyin to handle $undefined commit 5b9b23fb1696bc1b69e1d902c0a2ad4b7d168984 Author: Dale Wahl <[email protected]> Date: Mon Apr 29 13:00:03 2024 +0200 add scipy requirement to make compatible with gensim https://stackoverflow.com/questions/78279136/importerror-cannot-import-name-triu-from-scipy-linalg-gensim commit 7eab746e944f1ababe3dcd6a5d25387a64c2237d Author: Stijn Peeters <[email protected]> Date: Mon Apr 29 12:00:09 2024 +0200 stupid, stupid, stupid commit 90577982ac05019a7ac76818a62f91e84dd65902 Author: Stijn Peeters <[email protected]> Date: Mon Apr 29 11:56:22 2024 +0200 Fix leftover iterate_mapped_items commit 57dbdf74c49c34c05784debb9f7e258da7ae7d54 Author: Stijn Peeters <[email protected]> Date: Fri Apr 26 15:26:39 2024 +0200 Woops commit f11760d2c13e817e23cfa5e26b24f74cf817f65e Author: Stijn Peeters <[email protected]> Date: Fri Apr 26 15:26:04 2024 +0200 
Update list of supported platforms in readme commit 760ff1cdeb006f70acaa00ded82fb3cbc7617c9d Author: Stijn Peeters <[email protected]> Date: Fri Apr 26 12:13:28 2024 +0200 Bump version commit 1fd78b2362840299e80f5540c9fedc1be3b06da1 Author: Stijn Peeters <[email protected]> Date: Thu Apr 25 12:58:24 2024 +0200 Use MissingMappedField for Douyin fields undefined in the source data commit 6918baeabc7a08b6a63495c5d38c86b2c88bca44 Author: Stijn Peeters <[email protected]> Date: Thu Apr 25 12:31:11 2024 +0200 Fix Douyin mapping failure if cellRoom is $undefined commit aad6208167c07686348234daff4dcf9cd036f5a5 Author: Stijn Peeters <[email protected]> Date: Thu Apr 25 12:30:53 2024 +0200 Better error when trying to import data for unknown datasource commit 43c6ed646994111188bde66d5bcfe4ab602e8512 Author: Stijn Peeters <[email protected]> Date: Thu Apr 25 12:30:31 2024 +0200 Fix Twitter mapping on URLs that cannot be expanded commit 91c3da176fad90ba16871fa8892fac5a0df13785 Author: Stijn Peeters <[email protected]> Date: Thu Apr 25 12:12:54 2024 +0200 Safe cast to int in CrowdTangle import commit 765f29e9232afdf284ab1667b0f371951e0bf2f4 Author: Stijn Peeters <[email protected]> Date: Wed Apr 24 12:37:02 2024 +0200 Fix erroneous shell command in front-end restart trigger commit c99fdd9eca8f5925d93375cac846e8b7633194fb Merge: 342a4037 bc1deddf Author: Stijn Peeters <[email protected]> Date: Tue Apr 23 12:29:35 2024 +0200 Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat commit 342a4037411e7ccaa50b25a4686434bec39e2568 Author: Stijn Peeters <[email protected]> Date: Tue Apr 23 12:29:32 2024 +0200 Enable TikTok comment and Gab import by default commit bc1deddf57aa5049fb79622c4309fb7051d77bdb Merge: 537d7645 3c644f01 Author: Dale Wahl <[email protected]> Date: Tue Apr 23 12:16:37 2024 +0200 Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat commit 537d76456e2826e8c4dd7026ec5b2d436370fad8 Author: Dale Wahl <[email protected]> Date: Tue 
Apr 23 12:14:46 2024 +0200 do the todo: fix column_filter to match exact/contains with int commit 3c644f01baeca34e712d36efdf5c77ccd3ef7a06 Author: Stijn Peeters <[email protected]> Date: Tue Apr 23 11:16:07 2024 +0200 Don't crash on empty URLs in dataset merge commit f1574c26e2e3bdc40cc04bb8193cf6d3fa14792b Author: Dale Wahl <[email protected]> Date: Thu Apr 18 12:08:55 2024 +0200 fix: do not fail when no processor exists weird! failed on a dataset `type="custom-search"` which was created by an import script w/ no processor. Also likely would make deprecated processors fail. 500 server error: ``` File "/opt/4cat/common/lib/dataset.py", line 800, in get_columns return self.get_item_keys(processor=self.get_own_processor()) File "/opt/4cat/common/lib/dataset.py", line 405, in get_item_keys keys = list(items.__next__().keys()) File "/opt/4cat/common/lib/dataset.py", line 337, in iterate_items if own_processor.map_item_method_available(dataset=self): AttributeError: 'NoneType' object has no attribute 'map_item_method_available' ``` commit 50a4434a37d71af6a9470c7fc4a236b043cbfb4d Author: Stijn Peeters <[email protected]> Date: Wed Apr 17 14:30:58 2024 +0200 Add "TikTok comments" data source commit c43e76daae3c2e6ecdb218ee749315b985eccca4 Author: Stijn Peeters <[email protected]> Date: Tue Apr 16 17:59:25 2024 +0200 Allow notifications per tag commit 36984104e674e8577756bfc3fdd5c72f6569d9e1 Author: Dale Wahl <[email protected]> Date: Tue Apr 16 17:25:38 2024 +0200 fix: pass dataset to get_options when queuing processors commit 59cb19a3c88f7f4a4ac02d0b7a891afde50ea069 Author: Dale Wahl <[email protected]> Date: Tue Apr 16 10:55:29 2024 +0200 fix: dicts are shared in classes & you cannot delete a key more than once randomly found this; probably as no one else has reddit enabled! 
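Note on the "dicts are shared in classes" fix above: a minimal sketch of the general Python behaviour that commit message describes, not the actual 4CAT code. The class `ExampleSearch`, both method names and the `is_admin` flag are hypothetical and exist only for illustration; the `max_posts`/`max` keys are borrowed from a related Telegram commit message in this log.

```python
import copy


class ExampleSearch:
    # Hypothetical processor class. A dict defined at class level is a single
    # object shared by every caller, so in-place changes persist between calls.
    options = {"max_posts": {"default": 10, "max": 100}}

    @classmethod
    def get_options_mutating(cls, is_admin=False):
        # Problematic pattern: this edits the shared class-level dict.
        options = cls.options  # same object, not a copy
        if is_admin:
            # Works once; a second admin call raises KeyError because the
            # key was already removed from the shared dict.
            del options["max_posts"]["max"]
        return options

    @classmethod
    def get_options_safe(cls, is_admin=False):
        # One possible fix: work on a deep copy so the class-level dict
        # is never modified.
        options = copy.deepcopy(cls.options)
        if is_admin:
            # pop() with a default does not raise if the key is absent.
            options["max_posts"].pop("max", None)
        return options
```

With the mutating variant, one admin call removes the limit for every later caller and a second admin call fails with KeyError; the copying variant leaves the class attribute untouched. Building the options dict from scratch inside the method would presumably work just as well.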
commit 3ec9c6ea471bcdbe9fb1caad1e5fe1502a705444 Author: Dale Wahl <[email protected]> Date: Mon Apr 15 13:22:19 2024 +0200 fix results page error when dataset was being created; do not check for resultspage updates when user not focused on page commit db05ae5e565248e865e67b8ea60e6653357bb1f4 Author: Dale Wahl <[email protected]> Date: Mon Apr 15 11:27:33 2024 +0200 on import file, differentiate between missing field(s) and unable to map item commit 940bac72c7e53bec9e136867c13e2a0a355961a4 Author: Stijn Peeters <[email protected]> Date: Fri Apr 12 12:57:48 2024 +0200 Case-insensitive username/note matching in user list commit d0f34245bd07b5ad2fd3e90754ef0264ffc350a9 Author: Stijn Peeters <[email protected]> Date: Fri Apr 12 12:29:12 2024 +0200 Only determine settings tab name in one place commit 9f69d7bc0bbb657be1e725d5fb3fe350b7205bff Author: Stijn Peeters <[email protected]> Date: Fri Apr 12 12:20:34 2024 +0200 git != github commit 9b4981d8c7358f31ed65d9f161d556e578389801 Author: Stijn Peeters <[email protected]> Date: Fri Apr 12 11:56:04 2024 +0200 Fix issues with user tags Fix number of users in tag overview; allow filtering by user tags on user list; don't delete all user tags when deleting one commit 9e8ccd3a78765acdfd2005eaa215dc0dc07266e0 Author: Stijn Peeters <[email protected]> Date: Fri Apr 12 11:32:45 2024 +0200 Do not hide all non-hidden child processors lol commit 3f15410af3a278f5644f41f49e25498a1fac3c76 Author: Stijn Peeters <[email protected]> Date: Fri Apr 12 11:23:52 2024 +0200 Disable standard video downloader for Telegram commit 94c814b9cab2ae2be10d5c5d3f6cfe20898e349c Author: Stijn Peeters <[email protected]> Date: Fri Apr 12 11:14:16 2024 +0200 Telegram video downloader processor commit d36254a188947fff507e8df59f793e98b3be1570 Author: Stijn Peeters <[email protected]> Date: Fri Apr 12 11:14:04 2024 +0200 Better styling for 4CAT settings, alphabetic order, submenus commit 808300fa109f306a921f2048b2cf4b6dafc4ba5f Author: Stijn Peeters <[email 
protected]> Date: Thu Apr 11 14:44:32 2024 +0200 Fix multiselect in UI commit 131a0eca0ad514b1ee57803e5c560ab0e56de42d Author: Stijn Peeters <[email protected]> Date: Mon Apr 8 18:28:04 2024 +0200 Do not attempt to load crashed file as module in Slack webhook. Fixes #422 (hopefully) commit 6d8cb067bc12f8be68749f74a7291e0849494225 Author: Stijn Peeters <[email protected]> Date: Fri Apr 5 19:43:58 2024 +0200 Allow comma-separated list when adding new dataset owners commit 2612aea49f63c37ac691cc89c553c764ead2344f Author: Stijn Peeters <[email protected]> Date: Fri Apr 5 19:40:04 2024 +0200 Include number of users with tag on tag page commit 39f2ec40faa3b8493bd5525279aeaeb2e4f586e0 Author: Stijn Peeters <[email protected]> Date: Fri Apr 5 19:26:02 2024 +0200 Fix confirmation before deleting user tag commit b00a410a3441e7f2a9d73a9f2dfb0f4ef70ea8a5 Author: Stijn Peeters <[email protected]> Date: Fri Apr 5 19:25:01 2024 +0200 Add link to users with tag on tag admin page commit 3ef3e5ec9adbd8ddd128ce2b3f8fa3b1de1297e3 Author: Stijn Peeters <[email protected]> Date: Fri Apr 5 18:49:25 2024 +0200 Give filtered datasets a more sensible label, based on source dataset commit 0d5870b78fb73cb58231736cc8a2efbb0b3cd88a Author: Dale Wahl <[email protected]> Date: Fri Apr 5 17:40:57 2024 +0200 update iterate methods (#418) * working to make iterate_mapped_item primary method used by processors and elsewhere in 4CAT; iterate_item method only internally (and provide item directly as is from file) with iterate_mapped_object as intermediate method to use map_missing method and handle missing values as well as warn if needed * switch from iterate_items to iterate_mapped_items; careful attention to item_to_yield allowing a choice of the original item, the mapped item, or both * revert some unecessary renaming * fix annotations bug... this fixes the bug, but i noticed that the notations saved in the database do not have the correct post IDs. 
* Introduce DatasetItem class and simplify iterate_items * Don't crash when no item mapper * ...actually commit the DatasetItem class * Fix typos in comment --------- Co-authored-by: Stijn Peeters <[email protected]> Co-authored-by: Sal Hagen <[email protected]> commit 17b77351c51ace21b7057276bbae9da2643a3fc4 Author: Stijn Peeters <[email protected]> Date: Fri Apr 5 16:20:19 2024 +0200 Allow dynamic form options in processors (#397) * Allow dynamic form options in processors * Allow 'requires' on data source options as well * Handle list values with requires * Wider support for file upload in processors * Log file uploads in DMI service manager * fix error w/ datasources having file option * fix fourcat.js use of checkboxes for dynamic settings * Fix faulty toggleButton targeting --------- Co-authored-by: Dale Wahl <[email protected]> commit 693fcedc93ee4476a60d0e0876e688f82a8526fa Author: Dale Wahl <[email protected]> Date: Fri Apr 5 15:59:10 2024 +0200 Add method to processors to toggle display in UI (#411) * add ui_only parameter to DataSet.get_available_processors() and BasicProcessor.display_in_ui() Allow using `display_in_ui` to hide processors from UI but allow them to be queued either via API or presets. This avoids issue of is_compatible_with() having to be used to hide processors with sometimes ill effects. * keep same data structure.... * don't delete twice; it's redundant... 
and raises an error * Rename arguments/properties * Exclude hidden processors in top level view * fix logic * Exclude in child template as well --------- Co-authored-by: Stijn Peeters <[email protected]> commit 3cd146c2908da6b3a06a0c1511bf042c4223af0f Author: Dale Wahl <[email protected]> Date: Thu Apr 4 16:41:39 2024 +0200 fix: whoops remove debug commit daa7291e813e62fed4600a4acb8430004836cb86 Author: Dale Wahl <[email protected]> Date: Thu Apr 4 15:16:30 2024 +0200 CSV preview add hyperlinks if "url" or "link" in column header commit 5f2d6e65bad4f71b2c3cc75d2cdab76f15671d4c Author: Dale Wahl <[email protected]> Date: Thu Apr 4 15:16:01 2024 +0200 blip2 processor to work w/ DMI Service Manager commit fe881dec18778d99ac4a0f60ca40a1f43fdb1689 Author: Dale Wahl <[email protected]> Date: Thu Apr 4 09:53:30 2024 +0200 catch AttributeError on slackhook if unable to read file ever vigilant against a lack of flavour... commit 2808256b1fabf2e6e8a5a94aad98af60c50fb7b0 Merge: 14123847 eb474640 Author: Dale Wahl <[email protected]> Date: Wed Apr 3 17:28:40 2024 +0200 Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat commit 14123847b5852bf0e7c84fced6c2380165ec93f6 Author: Dale Wahl <[email protected]> Date: Wed Apr 3 17:28:38 2024 +0200 staging_areas should not be made for completed datasets (else they may be deleted prematurely) commit eb474640559ee3e914d9c95adb60be09b906f1d6 Merge: bbdf2ab9 3f8b285c Author: sal-phd-desktop <[email protected]> Date: Wed Apr 3 16:50:54 2024 +0200 Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat commit bbdf2ab9b4292c14911ac01b481c829defa85e5c Author: sal-phd-desktop <[email protected]> Date: Wed Apr 3 16:50:36 2024 +0200 Helper script to export the 'classic' 4CAT 4chan data commit 3f8b285c44c33a3ce08e885889b311bc454a70ea Merge: 8f40f3f5 f7cc5b8d Author: Sal Hagen <[email protected]> Date: Wed Apr 3 12:12:17 2024 +0200 Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat 
commit 8f40f3f5222a63e93f46eb3b57791d10060a0cc8 Author: Sal Hagen <[email protected]> Date: Wed Apr 3 12:12:13 2024 +0200 Tumblr search typo commit f7cc5b8d012dec3d8e0c8847ae16c662e82040b5 Author: Stijn Peeters <[email protected]> Date: Tue Apr 2 12:32:51 2024 +0200 More/less flavour in restart worker commit 073587efc581adca0608988573ac83ea8b0c93d0 Author: Dale Wahl <[email protected]> Date: Wed Mar 27 14:15:27 2024 +0100 create favicon.ico (remove from repo) be sure to keep webtool/static/img/favicon/favicon-bw.ico as basis commit 28d733d56204231f4089660ff61282174aac7aed Author: Dale Wahl <[email protected]> Date: Wed Mar 27 09:44:45 2024 +0100 add allow_access_request check to request-password page clicking it would only return the user to the login page anyway, but better not even show it commit 1f2cb77e3cb0fc9b5403da52aaa925b33089d18f Author: Dale Wahl <[email protected]> Date: Wed Mar 27 09:37:51 2024 +0100 fix can_request_access to use 4cat.allow_access_request option commit 0d66f11d3619af798d5acc41dbf4fe118b7ddad8 Merge: 25825383 05b3fc07 Author: Stijn Peeters <[email protected]> Date: Tue Mar 26 17:54:48 2024 +0100 Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat commit 2582538303e31470ed6bf8a01645f7b45af15e5d Author: Stijn Peeters <[email protected]> Date: Tue Mar 26 17:54:45 2024 +0100 More permissive timeout for pixplot commit 05b3fc0771ded10dc55db799e8f47e42add08d43 Author: Dale Wahl <[email protected]> Date: Tue Mar 26 14:01:59 2024 +0100 remove redundant call of Path commit e4a93442efb84d73d6a4c9af9bc46a8f3e3fdda2 Author: Stijn Peeters <[email protected]> Date: Tue Mar 26 11:52:09 2024 +0100 Include column with link description in Telegram mapping commit 876f4a4b6df51ec4b30a048c32191438b6778f90 Author: Dale Wahl <[email protected]> Date: Mon Mar 25 14:48:47 2024 +0100 douyin handle image posts commit 81ad61baabaf965b1c848f55a80c23bd3e1a9000 Author: Stijn Peeters <[email protected]> Date: Mon Mar 25 08:01:44 2024 +0100 Accept 
non-numeric IDs in Telegram image downloader commit a8b36dc5682df7c16e25474ea8fdbfc4f12f9d46 Author: Stijn Peeters <[email protected]> Date: Sun Mar 24 23:15:51 2024 +0100 Ensure unique IDs for Telegram datasets commit 4a3e9ffee072c4d3efb7bfd8744369b46f19eef2 Merge: 0c119130 d749237e Author: Stijn Peeters <[email protected]> Date: Sun Mar 24 22:56:59 2024 +0100 Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat commit 0c11913049aabb5a83ffe26d58bdf17affdbc0b9 Author: Stijn Peeters <[email protected]> Date: Sun Mar 24 20:09:10 2024 +0100 Better string formatting in Telegram image downloader commit 8a7da5317defdafb5bdbf74dcbeb68e464fa21f4 Author: Stijn Peeters <[email protected]> Date: Sun Mar 24 20:06:06 2024 +0100 Add 'link thumbnails' option to Telegram image downloader commit a0baae17d8f11e4cae7cc261f8d406b1b1ce628a Author: Stijn Peeters <[email protected]> Date: Sun Mar 24 20:05:24 2024 +0100 Add 'Fetch URL metadata' processor commit b9a0668f35c6d1fc5bfb42e1ae706418cbe6e0a7 Author: Stijn Peeters <[email protected]> Date: Sun Mar 24 20:05:15 2024 +0100 Update ural dependency commit a28036186f5d35e435cade7638ed35361054967e Author: Stijn Peeters <[email protected]> Date: Sun Mar 24 20:05:08 2024 +0100 Add emoji library dependency commit bb50fc946fb6cdd8454969514bdc6d5ecf3f3530 Author: Stijn Peeters <[email protected]> Date: Sun Mar 24 20:04:59 2024 +0100 Add 'emoji' option to Count Values processor commit e653e3d8fb9c01697d96316df6f7634454671191 Author: Stijn Peeters <[email protected]> Date: Sun Mar 24 20:04:42 2024 +0100 Add 'forwards', 'reactions', 'link_title', 'link_attached' columns to mapped Telegram items commit d749237ec5c103b286ba8086904e405e232fc14c Author: Dale Wahl <[email protected]> Date: Fri Mar 22 11:02:14 2024 +0100 telegram: sp too? this is why i test locally first... 
commit 9d7d27c61425bbbbccd18a8e3de35ab372dbfbf3 Author: Dale Wahl <[email protected]> Date: Fri Mar 22 10:58:48 2024 +0100 telegram: missed reference to options commit c1671ce0ef69c71c81c3ae69a59e4ad7dc1bda79 Author: Dale Wahl <[email protected]> Date: Fri Mar 22 10:49:02 2024 +0100 telegram fix: class dictionaries are shared between all workers admin calls get_options and `del options["max_posts"]["max"]` runs, then normal user calls get_options and there is no longer max. could also copy cls.options, but not sure why we cannot create the options in `get_options`. commit cd2e74d251491a93bc66dc7a64e8b2a60b0ed8ae Author: Stijn Peeters <[email protected]> Date: Wed Mar 20 11:10:30 2024 +0100 Make Telegram max entities a setting commit 38fcabb81da956e5513bd0246ee086d1ab4896c9 Author: Stijn Peeters <[email protected]> Date: Fri Mar 15 18:47:59 2024 +0100 Make metrics table use BIGINT Folder size may not fit otherwise! commit 34013cb91eed7fac725defd408b67bddee4b806b Author: Stijn Peeters <[email protected]> Date: Fri Mar 15 18:37:10 2024 +0100 Fix duplicate stats in metrics table commit c8ad90b3436cff600320d3b2efdf6144240ea59d Author: Stijn Peeters <[email protected]> Date: Fri Mar 15 18:14:39 2024 +0100 Calculate disk use stats via worker instead of on demand commit e4e0c4e3a375bf14bdca7b633231b60e34c322e0 Author: Stijn Peeters <[email protected]> Date: Thu Mar 14 10:25:23 2024 +0100 Spelling thing commit ae1c00fb3a521a2c3258b2597b04322d202c3ee7 Author: Stijn Peeters <[email protected]> Date: Thu Mar 14 10:25:10 2024 +0100 Disable direct editing of tag order commit e3ce81452ad8ee3231309383c24fb26e553b0dff Merge: fa3be93b a7b5820c Author: Dale Wahl <[email protected]> Date: Wed Mar 13 16:25:46 2024 +0100 Merge branch 'master' of https://github.com/digitalmethodsinitiative/4cat commit fa3be93bafef17e95881207604efa1212d562d9e Author: Dale Wahl <[email protected]> Date: Wed Mar 13 16:25:43 2024 +0100 instagram: check both user and owner for full_name commit 
a7b5820c9f2acb5081ef80ea0293f42ee91925a3 Author: Dale Wahl <[email protected]> Date: Tue Mar 12 15:59:43 2024 +0100 proposed fix to results filter (#417) * proposed fix to results filter * do not filter datasources at all for results/ view commit b930b6e964b460ef5160398c6cd1038f766b0548 Author: Dale Wahl <[email protected]> Date: Mon Mar 11 12:00:12 2024 +0100 remove unused code the `can_preview` attribute does not appear to exist so this is always hidden commit 97cd2d52966bd751da704a4a06cfa5478f999885 Author: Dale Wahl <[email protected]> Date: Mon Mar 11 11:51:28 2024 +0100 faster collection of folder size for admin panel was between five and six times faster in my tests around 11G of data files) commit 108fd28b594a95b94727ccc601fec59da61a8d3d Author: Dale Wahl <[email protected]> Date: Thu Mar 7 11:09:33 2024 +0100 typo fixes, log fix commit 44848a8f4b9fea07e7f9ce03e4fe0d696d5f1d27 Author: Dale Wahl <[email protected]> Date: Wed Mar 6 10:17:34 2024 +0100 fix tf_idf - sometimes less results than max commit e5f1f703247a5763d3d0e03c44ee31ab60b8a8ed Author: Dale Wahl <[email protected]> Date: Wed Mar 6 09:33:21 2024 +0100 fix image downloader failing on 4chan images we do not often rename datasources, but when we do... 
commit f5e50d508096729bccdc0dafa460f83c419c2606 Author: Stijn Peeters <[email protected]> Date: Tue Mar 5 16:23:34 2024 +0100 Version 1.39 -> 1.40 commit 4b3e4efa25914f5f9509f69596a82935440e5f9f Author: Stijn Peeters <[email protected]> Date: Mon Feb 26 18:28:15 2024 +0100 Add 'safe' parameter to get_item_data commit b98f62ab6a3a21815cc0fa899cdca1d48eab0fdb Author: Stijn Peeters <[email protected]> Date: Mon Feb 26 18:27:57 2024 +0100 Use iterate_mapped_items in dataset view commit 6d9baa9c228168dce7fe946681c95d471d45c6e0 Author: Stijn Peeters <[email protected]> Date: Mon Feb 26 18:27:33 2024 +0100 Update TikTok downloader for new item mapper commit 1622ec660754582eb2791f0d114df76e71640370 Author: Dale Wahl <[email protected]> Date: Mon Feb 26 12:51:31 2024 +0100 flawless was removed from dataset class, but used by telegram adding back to fix telegram, but perhaps it should be changed commit 84168e945e2ecf963cfdac3409d60544b521f694 Author: Dale Wahl <[email protected]> Date: Wed Feb 21 15:56:24 2024 +0100 webtool checks for gunicorn and if exists sets up error log this normally only ran in Docker commit 7119862feac1e9993b8dedccc59887830e7715a1 Author: Stijn Peeters <[email protected]> Date: Tue Feb 20 18:36:21 2024 +0100 Use MappedItem in ML processors commit 32b8790420af8572f4a3db2d2bc8ffd696872114 Author: Stijn Peeters <[email protected]> Date: Tue Feb 20 16:58:22 2024 +0100 Map items to objects instead of dicts (#409) * Consistent parameter name for map_item() * Wrap mapped items in MappedItem() object * Keep track of import warnings in search.py * Add warning when mapping a tweet with missing metric data * Add new iterate_mapped_objects method * Log mapping warnings when merging datasets * Pass object instead of dict * Clarify Twitter warning * Documenting MappedItem * Explain things to myself * R…