nighttttrain/JDs-BigDataAnalysis-pyspark

JDsAnalysis

This project uses PySpark to solve a big data problem. It runs on Databricks Community Edition, with Amazon Web Services (AWS) as the cloud provider. The analysis is a PySpark notebook in the Databricks workspace, attached to a cluster running Databricks Runtime 12.2 LTS (Scala 2.12, Spark 3.3.2).

  • Work out how often each distinct skill is mentioned in job descriptions (JDs) and present the top 10 skills with their frequencies across the entire dataset; then check how this frequency distribution changes when all skills are lowercased
  • Find the 5 most frequent skill counts per JD (the number of skills listed in a JD) across the dataset
  • Join the skills from the JDs with the O*NET dataset to gain more insight
  • Find the 10 most frequent “Commodity Title” values across all the job descriptions
