Dr Padma Murali
5 min read · Jun 3, 2021

An Analysis of Salary Data Using the ANOVA Technique

ANOVA (analysis of variance) is a technique from the domain called “Experimental Designs”. It tests whether the mean of a response variable differs across the levels of one or more categorical factors; a cause-effect reading of the result is justified only when the data come from a designed experiment. From the statistical inference point of view, ANOVA is an extension of the independent t test, which tests the equality of two population means. When more than two population means have to be compared, the ANOVA technique is used. In this case, the null hypothesis (H0) is defined as

H0: µ1 = µ2 = µ3 = ... = µk, where µi denotes the mean of the i-th population and k is the number of populations being compared.
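
To make the test concrete, here is a minimal sketch of the one-way ANOVA F statistic computed by hand in plain Python. The function name `one_way_f` and the toy numbers are made up for illustration and are not taken from SalaryData.csv. The statistic compares the variation between group means with the variation within groups; a large F is evidence against H0.

```python
from statistics import mean

def one_way_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)                                  # number of populations
    n_total = sum(len(g) for g in groups)            # total observations
    grand_mean = mean(x for g in groups for x in g)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Made-up toy samples for three groups (illustration only).
toy = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, dfb, dfw = one_way_f(toy)
print(f_stat, dfb, dfw)
```

The p value is the upper-tail probability of an F distribution with `dfb` and `dfw` degrees of freedom at `f_stat`; statistical libraries compute this for you.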

This article analyzes a salary data set and lists the results and the business insights drawn from them.

Salary is hypothesized to depend on educational qualification and occupation. To understand the dependency, the salaries of 40 individuals [SalaryData.csv] were collected, and each person’s educational qualification and occupation were noted. Educational qualification has three levels: High school graduate, Bachelor, and Doctorate. Occupation has four levels: Administrative and clerical, Sales, Professional or specialty, and Executive or managerial. The number of observations differs across the education-occupation combinations.

We first perform one-way ANOVA on each factor separately.

One way ANOVA(Education)

Null Hypothesis 𝐻0: The mean salary is the same across all the 3 categories of education (Doctorate, Bachelors, HS-Grad).

Alternate Hypothesis 𝐻1: The mean salary is different in at least one category of education.

One way ANOVA(Occupation)

Null Hypothesis 𝐻0: The mean salary is the same across all the 4 categories of occupation (Prof-Specialty, Sales, Adm-clerical, Exec-Managerial).

Alternate Hypothesis 𝐻1: The mean salary is different in at least one category of occupation.

One way ANOVA for ‘Education’

The above is the ANOVA table for Education variable.

Since the p value = 1.257709e-08 is less than the significance level (alpha = 0.05), we can reject the null hypothesis and conclude that there is a significant difference in the mean salaries for at least one category of education.

One way ANOVA for ‘Occupation’

The above is the ANOVA table for Occupation variable.

Since the p value = 0.458508 is greater than the significance level (alpha = 0.05), we fail to reject the null hypothesis. This does not prove that H0 is true; it means the data provide no significant evidence that the mean salaries differ across the 4 categories of occupation.
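
A one-way ANOVA table of this kind can be produced with statsmodels. The sketch below is a minimal example on made-up numbers, not the article's data. The column names `Education` and `Salary`, the helper `one_way_anova`, and the sequential sum of squares (`typ=1`) are assumptions for illustration.

```python
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Made-up toy data (salaries in thousands); NOT the values in SalaryData.csv.
df = pd.DataFrame({
    "Education": ["HS-grad"] * 3 + ["Bachelors"] * 3 + ["Doctorate"] * 3,
    "Salary": [70, 80, 90, 150, 160, 170, 200, 210, 220],
})

def one_way_anova(data, factor, response="Salary"):
    """Fit response ~ C(factor) by OLS and return the ANOVA table."""
    model = ols(f"{response} ~ C({factor})", data=data).fit()
    return anova_lm(model, typ=1)

table = one_way_anova(df, "Education")
p_value = table.loc["C(Education)", "PR(>F)"]
print(table)
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```

The same helper could be called with `"Occupation"` once that column is present, which mirrors the two one-way tests described above.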

To find out which class means are significantly different, the Tukey Honest Significant Difference test is performed.

Using the Tukey Honest Significant Difference test, we get the following table for the variable Education:

Tukey HSD for variable ‘Education’

In the table, the adjusted p-values (p-adj) are less than the significance level for all three pairwise comparisons between education categories. This implies that the mean salaries of every pair of education categories differ significantly.

For the variable Occupation, the Tukey Honest Significant Difference test finds no significant pairwise differences: in the table below, all adjusted p-values are greater than 0.05. This is consistent with the one-way ANOVA result, which gave no evidence that mean salaries differ across occupation classes.

Tukey HSD for variable ‘Occupation’
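
Pairwise comparisons like these can be run with `pairwise_tukeyhsd` from statsmodels, whose summary has a `p-adj` column like the one mentioned above. The article does not say which library was used, so treat this as a sketch under that assumption. The data are made up and the column names are illustrative.

```python
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Made-up toy data (salaries in thousands); NOT the values in SalaryData.csv.
df = pd.DataFrame({
    "Education": ["HS-grad"] * 3 + ["Bachelors"] * 3 + ["Doctorate"] * 3,
    "Salary": [70, 80, 90, 150, 160, 170, 200, 210, 220],
})

# One row per pair of groups, with the adjusted p-value and a reject flag.
tukey = pairwise_tukeyhsd(endog=df["Salary"], groups=df["Education"], alpha=0.05)
print(tukey.summary())

n_pairs = len(tukey.reject)            # 3 levels give 3 pairwise comparisons
all_differ = bool(tukey.reject.all())  # True if every pair differs at alpha
```

With k levels there are k(k-1)/2 pairs, so three education levels give three comparisons and four occupation levels give six.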

We examine how the effect of one factor on salary changes with the level of the other factor (Education and Occupation) with the help of an interaction plot.

Interaction Plot

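The interaction plot suggests a substantial interaction between the categorical variables Education and Occupation: the salary pattern across occupations differs from one education level to another. Whether this interaction is statistically significant is tested later with two-way ANOVA.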

The following are some of the observations from the interaction plot:

· People with HS-grad education do not reach Exec-managerial positions; they hold only Adm-clerical, Sales and Prof-Specialty occupations.

· People with Bachelors or Doctorate education in Adm-clerical and Sales occupations earn almost the same salaries (ranging from 170000 to 190000).

· Among people with Bachelors education, those in Prof-Specialty earn less than those in Adm-clerical and Sales.

· Among people with Bachelors education, those in Sales earn more than those in Prof-Specialty, whereas among people with Doctorate education, those in Sales earn less than those in Prof-Specialty. We see a reversal in this part of the plot.

· Similarly, among people with Bachelors education, those in Prof-Specialty earn less than those in Exec-Managerial, whereas among people with Doctorate education, those in Prof-Specialty earn more than those in Exec-Managerial. There is a reversal in this part of the plot too.

· Salespeople with Bachelors or Doctorate education earn about the same salaries, which are higher than those of salespeople with HS-grad education.

· In Adm-clerical occupations, people with HS-grad education earn less than people with Bachelors or Doctorate education.

· In Prof-Specialty occupations, people with Doctorate education earn the highest salaries and people with HS-grad education earn the lowest.

· Overall, people with HS-grad education earn the lowest salaries.

· People with Bachelors education in Sales and Exec-Managerial occupations earn the same salaries.
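
Observations like these are read from the mean salary of each education-occupation cell, which is what an interaction plot draws. A minimal sketch of that table of cell means with pandas follows. The numbers and column names are made up for illustration and are not the article's data.

```python
import pandas as pd

# Made-up toy data (salaries in thousands); NOT the values in SalaryData.csv.
df = pd.DataFrame({
    "Education": ["HS-grad", "HS-grad", "Bachelors", "Bachelors",
                  "Bachelors", "Doctorate", "Doctorate", "Doctorate"],
    "Occupation": ["Adm-clerical", "Sales", "Sales", "Sales",
                   "Exec-Managerial", "Sales", "Prof-Specialty", "Prof-Specialty"],
    "Salary": [60, 80, 170, 190, 180, 185, 240, 260],
})

# Rows are occupations, columns are education levels, values are mean salaries.
# Combinations with no observations show up as NaN.
cell_means = df.pivot_table(index="Occupation", columns="Education",
                            values="Salary", aggfunc="mean")
print(cell_means)
```

Each column of `cell_means` corresponds to one line of the plot. Lines that are not parallel, or that cross, hint at an interaction, and a NaN marks an empty cell such as the HS-grad and Exec-Managerial combination in this toy table.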

We next perform two-way ANOVA.

Two way ANOVA

𝐻0: The effect of the independent variable ‘education’ on the mean ‘salary’ does not depend on the level of the other independent variable ‘occupation’ (i.e. there is no interaction effect between the 2 independent variables, education and occupation).

𝐻1: There is an interaction effect between the independent variable ‘education’ and the independent variable ‘occupation’ on the mean Salary.

By performing two way ANOVA, we get the following table:

Two way ANOVA table

From the table, the p value for the interaction term between Education and Occupation is 2.232500e-05. As this is less than the significance level (alpha = 0.05), we reject the null hypothesis.

Thus, we see that there is a significant interaction effect between education and occupation on the mean salary.
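
A two-way ANOVA with an interaction term can be sketched with statsmodels as follows. The toy data are made up, balanced and complete (two observations in every cell), unlike the article's data set, which has unequal cell counts. The column names and the choice of `typ=1` are assumptions for illustration.

```python
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Made-up toy data (salaries in thousands); NOT the values in SalaryData.csv.
# The cell means cross over, so an interaction is built in.
df = pd.DataFrame({
    "Education": ["Bachelors"] * 4 + ["Doctorate"] * 4,
    "Occupation": ["Sales", "Sales", "Prof-Specialty", "Prof-Specialty"] * 2,
    "Salary": [180, 190, 100, 110, 170, 180, 240, 250],
})

# Main effects plus the Education x Occupation interaction term.
formula = "Salary ~ C(Education) + C(Occupation) + C(Education):C(Occupation)"
model = ols(formula, data=df).fit()
table = anova_lm(model, typ=1)
print(table)

# The interaction row carries the test of H0: no interaction effect.
p_interaction = table.loc["C(Education):C(Occupation)", "PR(>F)"]
print("reject H0" if p_interaction < 0.05 else "fail to reject H0")
```

In a balanced design such as this toy example, the different sum-of-squares types agree. With unequal cell counts they can differ, so the type is worth stating alongside the table.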

From the ANOVA results and the interaction plot, we see that salary depends on education, and that the effect of occupation on salary depends on the level of education. Occupation on its own showed no significant difference in mean salary in the one-way ANOVA, but its interaction with education is significant. People with a Doctorate draw the highest salaries and people with HS-grad education earn the least. Thus, we can conclude that salary is associated with educational qualification and with the combination of education and occupation.

Dr Padma Murali

Senior AI Research Scientist with 19 years experience working in AI/ML,NLP, Responsible AI & Large Language Models