Skip to content

Simple, pseudo-generic tooling for extracting user Info from LinkedIn

Notifications You must be signed in to change notification settings

smcclure17/linkedin-scraper

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

15 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Rugby Alumni LinkedIn Scraper

Simple, pseudo-generic tooling for extracting user Info from LinkedIn profiles, and posting that data to Google Sheets.

Setup

Create a Conda env. and install dependencies

  • If necessary, download and install Anaconda, a Python environment handler.

  • Create an environment (I'll call mine rugby-alumni for this example),

    conda create -n rugby-alumni python=3.9
  • Say yes to anything it asks you, then activate your environment,

    conda activate rugby-alumni
  • Run the makefile to install the necessary dependencies,

    make setup-dev

Setup gsheet service account credentials

  • Ask Sean (McClure) for a copy of the ghseet service account key

  • Folow the step 7 in the gspread package documentation to put the account key in the correct location such that the library can access it:

    Move the downloaded file to ~/.config/gspread/service_account.json. Windows users should put this file to %APPDATA%\gspread\service_account.json.

  • Test that everything was successful by running the following in an interactive python window:

    from gsheet.google_sheet_helpers import init_service_account
    init_service_account()

    NOTE: I really don't like this method, and we should find a way to use a standard env variable or something.

Setup LinkedIn account credentials

This repo relies on LinkedIn credentials (email and password) stored in environment variables to make requests against the LinkedIn API/server. I have created an empty/new account specifically for this so that we don't use our real accounts, reach out to me (Sean McClure) for the credentials.

Setting environment variables

To set the environment variables such that the code can access them, on Mac or Linux, run:

export LINKEDIN_USER="{email address here}"
export LINKEDIN_PASS="{password here}"

Note that this will only store the variables for the current terminal session. These will need to be reinitialized each time you open/close the terminal. To make these permanent, you will need to add them to either your ~/.zshrc or ~/.bash_profile.

These variables are then accessed by the Alum class.

Usage

The main usage at the moment is to update a google sheet with personal data from LinkedIn profiles. We specify the profiles we want to scrape with URLS in a csv (by default data/alumni_urls.csv).

To run, we can use command line tools from the invoke package.

To update the google sheet with data from the urls in data/alumni_urls.csv, run:

inv update-from-csv-urls

To update from any other csv, run:

inv update-from-csv-urls --file "path/to/file.csv"

At the moment, there is near-zero error handling, so things may go very wrong.

About

Simple, pseudo-generic tooling for extracting user Info from LinkedIn

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published