The main analysis draws on data from Kaggle that was scraped and cleaned by Shanshan Lu. This dataset originates from the Indeed website and contains 7,000 data scientist job postings across the U.S. as of August 3rd, 2018. The main variables include Company Name, Position Name, Location, Job Description, and Number of Reviews of the Company. We focused mainly on the job description column, which contains information such as a short description of the company and position, the requirements, and the route of application.
Fortune magazine’s annual list of the 500 largest companies in the U.S., ranked by each company’s total revenue for its respective fiscal year, has long been regarded as a reliable measure of a company’s value. Many of the Fortune 500 companies now have a Chief Data Scientist or Head of Analytics position, and some Internet giants have invested heavily in data mining, artificial intelligence, and related areas.
With a flag added to indicate whether each company falls into the Fortune 500 category, this full dataset is used for our exploratory analysis.
In this project, the main R packages we used to create the content are: