barahona-research-group/Adzuna_skills_clustering

Skills_clustering

Code and data for Patterns of co-occurrent skills in UK job adverts

Overview

This repository contains two main files:

  1. Jupyter Notebook: This notebook is a step-by-step guide to processing raw JSON files from Adzuna into a co-occurrence matrix. The matrix is then used as input for the clustering analysis with PyGenStability.
  2. Excel Spreadsheet: This spreadsheet contains the two optimal clusterings of the co-occurrence matrix obtained with PyGenStability.

Repository Contents

Jupyter Notebook

  • File Name: tutorial.ipynb
  • Description: This notebook walks you through the process of converting raw JSON data into a co-occurrence matrix. It includes data loading and matrix generation.

Key Steps in the Notebook:

  1. Get co-occurrences from raw Adzuna data: Instructions on how to load the raw JSON files and extract skill co-occurrences after deduplication.
  2. Convert to pandas DataFrame: Build a pandas DataFrame whose rows and columns are skills and whose entries are the number of co-occurrences of each pair of skills.
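The two steps above can be sketched roughly as follows. This is a minimal illustration and not the code in tutorial.ipynb: the advert structure, the `skills` key, and the function name `cooccurrence_matrix` are assumptions made for the example, since the raw Adzuna schema is not described here. The diagonal is left at zero, which may differ from the notebook's convention.

```python
from collections import Counter
from itertools import combinations

import pandas as pd


def cooccurrence_matrix(adverts, skills_key="skills"):
    """Count how often each pair of skills appears in the same advert.

    `adverts` is an iterable of dicts; `skills_key` is a hypothetical
    field name holding the list of skills for one advert.
    """
    pair_counts = Counter()
    all_skills = set()
    for advert in adverts:
        # Drop repeated skills within an advert and sort so that each
        # pair is always stored in the same (a, b) order.
        skills = sorted(set(advert.get(skills_key, [])))
        all_skills.update(skills)
        pair_counts.update(combinations(skills, 2))

    labels = sorted(all_skills)
    matrix = pd.DataFrame(0, index=labels, columns=labels, dtype=int)
    for (a, b), count in pair_counts.items():
        # Fill both halves so the matrix is symmetric.
        matrix.loc[a, b] = count
        matrix.loc[b, a] = count
    return matrix


# Toy adverts standing in for deduplicated Adzuna records.
adverts = [
    {"skills": ["Python", "SQL", "Statistics"]},
    {"skills": ["Python", "SQL"]},
    {"skills": ["Communication", "SQL"]},
]
matrix = cooccurrence_matrix(adverts)
print(matrix)
```

The resulting symmetric DataFrame can be treated as a weighted adjacency matrix of a skills graph, which is the kind of input a community detection method such as PyGenStability works on.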

Excel Spreadsheet

  • File Name: optimal_clusterings_fulldata.xlsx
  • Description: This spreadsheet contains two sheets: one with the clustering of the skills into 7 clusters and one with the clustering into 21 clusters.
  • Columns: Lightcast_skills lists the unique skills found in the job adverts. Cluster gives the cluster to which each skill is assigned.
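One possible way to read the spreadsheet and group skills by cluster is sketched below. The column names `Lightcast_skills` and `Cluster` come from the description above; the helper names `load_clusterings` and `skills_by_cluster` are hypothetical, and the sheet names and the type of the cluster labels are not assumed, so all sheets are loaded at once. Reading .xlsx files with pandas typically requires the openpyxl package.

```python
import pandas as pd


def load_clusterings(path="optimal_clusterings_fulldata.xlsx"):
    """Load every sheet of the spreadsheet into a dict of DataFrames.

    sheet_name=None makes pandas return {sheet name: DataFrame}, so the
    actual sheet names do not need to be known in advance.
    """
    return pd.read_excel(path, sheet_name=None)


def skills_by_cluster(sheet):
    """Map each cluster label to the list of skills assigned to it."""
    return sheet.groupby("Cluster")["Lightcast_skills"].apply(list).to_dict()


# Small in-memory stand-in with the same two columns as the spreadsheet.
example = pd.DataFrame(
    {
        "Lightcast_skills": ["Python", "SQL", "Nursing"],
        "Cluster": [0, 0, 1],
    }
)
groups = skills_by_cluster(example)
print(groups)
```

With the real file, something like `{name: skills_by_cluster(df) for name, df in load_clusterings().items()}` would give one such mapping per sheet.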

Cite

Please cite our paper if you use this code in your own work:

@article{liu2025patterns,
  title={Patterns of co-occurrent skills in UK job adverts},
  author={Liu, Zhaolu and Clarke, Jonathan M and Rohenkohl, Bertha and Barahona, Mauricio},
  journal={PLOS Complex Systems},
  volume={2},
  number={2},
  pages={e0000028},
  year={2025},
  publisher={Public Library of Science San Francisco, CA USA}
}
