The task is to use PySpark to solve a big data problem. This project uses Databricks Community Edition, with Amazon Web Services (AWS) as the cloud provider. A notebook is created in the Databricks workspace and run with PySpark on a cluster (Databricks Runtime 12.2 LTS, Scala 2.12, Spark 3.3.2).
- Compute the frequencies with which distinct skills are mentioned in job descriptions (JDs), and present the top 10 skills with their frequencies across the entire dataset; then check how the frequency distribution changes when all skills are lowercased
- Find the 5 most frequent numbers of skills per JD across the dataset
- Join the skills from the JDs with the O*NET dataset to gain more insight
- Find the 10 most frequent “Commodity Title” values across all the job descriptions