@diehl diehl commented Jul 14, 2020

Additional data elements that are now collected per post:

  • Post creator
  • Post creation datetime
  • Post like count
  • Post share count
    -- Previously collected inconsistently as a string. Now collected reliably as an integer.
  • Post comment count
  • Complete post text
    -- Previously, when a FB user shared a post and added text of their own while sharing, that added text was lost. This is now fixed.
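
As a rough illustration of collecting a count as an integer rather than a string, here is a minimal sketch. The function name `parse_count`, the label formats ("1,234 shares", "1.2K shares"), the suffix table, and the choice to return 0 when no number is present are all assumptions for illustration, not taken from this PR's code.

```python
import re

# Hypothetical multipliers for abbreviated counts such as "1.2K" or "3M".
_SUFFIXES = {"": 1, "K": 1_000, "M": 1_000_000}


def parse_count(text):
    """Return the first count found in `text` as an int, or 0 if there is none."""
    match = re.search(r"(\d[\d,]*(?:\.\d+)?)\s*([KkMm]?)\b", text)
    if match is None:
        # Assumption: a missing count is treated as zero.
        return 0
    number = float(match.group(1).replace(",", ""))
    return int(round(number * _SUFFIXES[match.group(2).upper()]))
```

Normalizing at collection time means downstream code can sum or compare counts without handling mixed string formats.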

Fixed a bug in the collection of comment threads. In the previous implementation, comment text was saved in dictionaries keyed by comment author. As a result, content was dropped whenever the same FB user posted more than once in a comment thread, because each later comment overwrote the earlier one.
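
The failure mode described above can be reproduced in a few lines. This is a sketch with made-up data and hypothetical function names (`collect_by_author`, `collect_in_order`). It is not the PR's actual code, and storing comments as an ordered list of records is one possible fix rather than necessarily the one used here.

```python
# Made-up comment thread: "alice" comments twice.
comments = [
    ("alice", "First!"),
    ("bob", "Nice post."),
    ("alice", "Second thought."),
]


def collect_by_author(pairs):
    """Buggy pattern: a dict keyed by author keeps only each author's last comment."""
    by_author = {}
    for author, text in pairs:
        by_author[author] = text  # overwrites any earlier comment by this author
    return by_author


def collect_in_order(pairs):
    """One possible fix: keep every comment as its own record, in thread order."""
    return [{"author": author, "text": text} for author, text in pairs]
```

With the sample data, `collect_by_author(comments)` holds two entries while `collect_in_order(comments)` holds all three comments.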

The code has also been refactored so that the scraped page contents can be read back from disk and parsed. The scraped contents are saved to disk before parsing, so if an error occurs downstream, the saved copy is available for debugging.
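
A minimal sketch of this save-then-parse split, using only the Python standard library. The function names, the UTF-8 encoding, and the toy `TextExtractor` parser are assumptions for illustration; the PR's own scraping and parsing code may differ.

```python
from html.parser import HTMLParser
from pathlib import Path


def save_page_source(page_source, path):
    """Write the raw page source to a binary file before any parsing happens."""
    Path(path).write_bytes(page_source.encode("utf-8"))


def load_page_source(path):
    """Read a previously saved page source back from disk."""
    return Path(path).read_bytes().decode("utf-8")


class TextExtractor(HTMLParser):
    """Toy parser that collects the non-empty text nodes of a page."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        stripped = data.strip()
        if stripped:
            self.chunks.append(stripped)


def parse_saved_page(path):
    """Parse a saved page without touching the network or the browser."""
    parser = TextExtractor()
    parser.feed(load_page_source(path))
    parser.close()
    return parser.chunks
```

Because parsing reads only from the saved file, a parser bug can be fixed and the parse rerun against the same bytes without scraping the page again.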

…ments, number of shares, number of likes, and the full comment history. Also added the capability to parse the html separately after acquiring the page source which is now written to a binary file.