matthewputra edited this page Dec 5, 2019 · 25 revisions

College Value Based On Starting Salary

Team Members

Affiliation

Info-201: Technical Foundations of Informatics
The Information School
University of Washington
Autumn 2019

Problem Situation

The stakeholders of this problem situation are: students, colleges, and employers. Indirectly affected by skyrocketing tuitions are parents, who often bear the brunt of college costs. The setting of the problem is the United States in 2019. The values and tensions at play are the opportunities afforded by college and the costs of attending college. There is an element of prestige in attending a good college and earning a high income. Relevant ethics and policies around the issue include government support of education, greed in the educational industry, and student-lending practices.

What is the problem?

In recent decades, the price of college has skyrocketed while the median income coming out of college has not experienced a dramatic increase. As such, the value of college, particularly of the degrees being obtained, is facing scrutiny. In other words, the problem facing many Americans is this: is it worth it to attend college? If so, which colleges and which majors offer the best chances of financial success?

Why does it matter?

The rising issue of expensive college tuition and student loans has affected millions–both in the short and long term–including ourselves. In this situation, it would be crucial to evaluate whether the investment we are making into education is actually "worth" its cost. By investigating whether the cost of college increases the "worthiness" of college, students are able to make better decisions about their education.

How will it be addressed?

The problem will be addressed by analyzing the data set while providing the context of the overall issue and its impact. The analysis compares college tuition with the starting salary of students from various colleges by taking the ratio between the two. The ratio will then be used to calculate the value of a college, demonstrating whether a “better” life (defined by financial status in this case) is guaranteed by the price of a college education.
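
The ratio described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the project's R code. The function name `value_ratio`, the decision to divide starting salary by annual tuition (rather than the reverse), and every school name and dollar figure are assumptions made for illustration; the source only states that a ratio between tuition and starting salary is used.

```python
def value_ratio(starting_salary, annual_tuition):
    """Return starting salary per dollar of annual tuition (higher means better value)."""
    if annual_tuition <= 0:
        raise ValueError("annual_tuition must be positive")
    return starting_salary / annual_tuition


# Hypothetical placeholder figures, NOT taken from the data set.
schools = {
    "Example State University": {"starting_salary": 50_000, "annual_tuition": 10_000},
    "Example Private College": {"starting_salary": 60_000, "annual_tuition": 40_000},
}

# Compute the ratio for every school, then rank from best to worst value.
ratios = {
    name: value_ratio(info["starting_salary"], info["annual_tuition"])
    for name, info in schools.items()
}
ranked = sorted(ratios, key=ratios.get, reverse=True)

for name in ranked:
    print(f"{name}: {ratios[name]:.2f}")
```

Under this definition, a school with a lower starting salary can still rank higher if its tuition is much lower, which is the kind of comparison the analysis is after.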

Research Questions

  1. How is the value of each college affected by the starting salary?
  2. How is the value of a major affected by the starting salary?
  3. Which area (for working) compensates for the value of college?

The Data Set

The original data set that we will use is called "Where it Pays to Attend College". The Wall Street Journal created the Kaggle page, but the original data set is also present on the Wall Street Journal website. It was originally created to demonstrate which college, region, major, and type of school "leads" to the highest starting salary.
This file contains 3 data sets: the starting salary by degree, by college, and by region. The data sets we are using are well organized and have high credibility. However, some majors and schools are excluded; the included values were chosen for specific reasons that are not documented. There may also be hidden political realities behind these choices, which remain a mystery.
Our data, as mentioned, is very organized and clean. There were no missing values that we needed to deal with, but the data set did lack some information we needed. We will be using additional data sets to fill in other values we need (college tuition and income tax by state). More information about these data sets can be found in Appendix 1: Data Dictionary below.

Data Set 1: "Degrees that Pay Back"

  • Each of the 50 observations represents one major.
  • Variables:
    • Undergraduate Major
    • Starting Median Salary
    • Mid-Career Median Salary
    • Percent change from Starting to Mid-Career Salary
    • Mid-Career 10th Percentile Salary
    • Mid-Career 25th Percentile Salary
    • Mid-Career 75th Percentile Salary
    • Mid-Career 90th Percentile Salary
  • More specific or uncommon majors were excluded.
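
As a worked example of the "Percent change from Starting to Mid-Career Salary" variable, here is a minimal Python sketch. It assumes the column uses the standard percent-change formula, (mid-career − starting) / starting × 100; the source does not state how the column was computed. The function name `percent_change` and the salary figures are hypothetical.

```python
def percent_change(starting, mid_career):
    """Percent growth from starting median salary to mid-career median salary."""
    if starting <= 0:
        raise ValueError("starting salary must be positive")
    return (mid_career - starting) / starting * 100


# Hypothetical placeholder salaries, NOT taken from the data set.
growth = percent_change(40_000, 70_000)
print(f"{growth:.1f}%")  # prints "75.0%"
```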

Data Set 2: "Salaries by College Type"

  • Each of the 269 observations represents one school.
  • Variables:
    • School Name
    • School Type
    • Starting Median Salary
    • Mid-Career Median Salary
    • Mid-Career 10th Percentile Salary
    • Mid-Career 25th Percentile Salary
    • Mid-Career 75th Percentile Salary
    • Mid-Career 90th Percentile Salary
  • Not all colleges across the US have been included.

Information Visualizations

Our main approach for visualizing the data sets is using maps.

  • The rise of college tuition over the past 50 years will be displayed on a scatterplot.
  • The mid-career salaries will be displayed by each percentile through a bar graph along with the first bar graph.
  • The income tax of each state will be plotted depending on the starting salary of each school/major and the tax bracket (low income, middle income, high income, etc). This will help the users determine the most "efficient" place to work.
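
The income-tax comparison in the last bullet can be sketched roughly as follows. This is an illustrative Python sketch, not the project's R code. The bracket thresholds, state names, and tax rates are all invented placeholders, and applying a single flat rate per bracket is a simplifying assumption (real state income taxes are usually marginal). The helper names `bracket_for` and `take_home` are hypothetical.

```python
# Hypothetical bracket upper bounds and labels, invented for illustration only.
BRACKETS = [(40_000, "low"), (90_000, "middle"), (float("inf"), "high")]

# Hypothetical flat rates per bracket for two made-up states.
STATE_RATES = {
    "State A": {"low": 0.00, "middle": 0.00, "high": 0.00},  # no income tax
    "State B": {"low": 0.02, "middle": 0.05, "high": 0.08},
}


def bracket_for(salary):
    """Return the label of the first bracket whose upper bound covers the salary."""
    for upper_bound, label in BRACKETS:
        if salary <= upper_bound:
            return label


def take_home(salary, state):
    """Salary left after applying the state's flat rate for the salary's bracket."""
    rate = STATE_RATES[state][bracket_for(salary)]
    return salary * (1 - rate)


# Compare the same hypothetical starting salary across states.
salary = 60_000
by_state = {state: take_home(salary, state) for state in STATE_RATES}
best = max(by_state, key=by_state.get)
print(f"Highest take-home pay for ${salary:,}: {best}")
```

Here the most "efficient" state is simply the one with the highest take-home pay for a given starting salary; cost of living and federal taxes are deliberately left out of this sketch.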

Shiny Application

In this Shiny application, we load our data using the read.csv function. We use 3 data sets, all of which are stored in a “data” folder inside the repository. The major libraries that we use are shiny, ggplot2, and dplyr. We use shiny to build the application itself, ggplot2 to create a scatter plot of Starting Salary against Mid-Career Salary, and dplyr for easier data wrangling. Instead of creating a single app.R file, we split the application into two files, ui.R and server.R, to keep the UI code and the server code organized separately. We also created an analysis file where we perform our data wrangling; the server file simply uses the variables and functions defined there. Keeping the data sets in their own data folder likewise separates our code files from our data.

Conclusion

The project’s main strength is its relevance. For one, students at the University of Washington often choose their major based on median post-graduation salaries. Our discussion of student debt in America is also relevant to our place and time. Furthermore, the project reveals the relationships between college, major, median income, and region through visualizations such as maps and bar graphs. We obtained our data from a robust and reliable news source. The project’s main weakness is a lack of specificity. For instance, the dataset broadly categorizes colleges into U.S. regions, not states. It is also harder for us to analyze the dataset geographically because we are not provided the coordinates of each school.
Key takeaways from our research are that the most lucrative professions tend to be engineering related. Case in point: chemical, computer, electrical, and mechanical engineering are all among the top five best paying majors out of college. However, the single best paying degree out of college is actually Physician Assistant. Chemical engineering offers one of the best mid-career salaries, in large part due to its high demand in the petroleum industry. Spanish, religion, education, and interior design ranked among the worst majors for starting salary. The colleges that offered the best starting salaries were California Institute of Technology, Massachusetts Institute of Technology, Harvey Mudd, Princeton, and Harvard. California and the northeastern corridor were the best regions to attend college for a good salary. Finally, our analysis revealed that the price of college has nearly tripled in the last fifty years. Future work on this subject may include examining income disparities between the coastal regions and the rural parts of America. Further analysis could also compare the resources provided to public institutions with those provided to private ones, or funding in rural states with funding in urban areas.

References (Other Data Sets Used)

Main Data Set
Income Tax Data Set 1
Income Tax Data Set 2
Tuition by Year

Appendix 1: Data Dictionary

| Variable Name | Description | Data Type | Measurement Type |
| --- | --- | --- | --- |
| Starting Median Salary | The median starting salary of a student just out of college | integer | numerical |
| Tuition | The annual cost to attend a university | integer | numerical |
| Undergraduate Major | Major being pursued at university | string | categorical |
| Region | The region of the United States (e.g. Midwest, Northeast) | string | categorical |
| School Name | Name of university | string | nominal |

Appendix 2: Reflections

Daniel

I've learned a great deal about collaboration in coding, namely how to use GitHub for version control and how to resolve merge conflicts. More profoundly, I've learned how to divide programming and writing tasks to optimize for different skills, time constraints, and interests. Technically speaking, I've also learned a great deal about R and how to build a Shiny app. The data visualization aspects of this class informed my ability to design graphs and maps in our web application. It was satisfying to finally get the code to run as intended. Most frustrating of all was having to learn many aspects of the app on our own, through tedious web searching and trial and error. In the future, I would organize our workload and our production timeline better. This project felt rushed and, at times, like controlled chaos. Nevertheless, the project still positioned me to think creatively about visualizations and aspects of data that could be analyzed, visualized, and related.

June

So far, I've learned more about the Shiny app and the collaborative process of coding. Some frustration came along with coding enigmas (trying to create a Shiny app from scratch), but also from trying to prevent merge conflicts. Also, it was sometimes disappointing to discard ideas due to a lack of skill or knowledge about the app itself. Some things just weren't possible to execute. However, I think our team overcame this by presenting alternatives to original ideas. After going through these struggles, it was really satisfying to see our app completed and running after hours of work. One thing I would do differently is change our working schedule; I think we could've worked further ahead. This project has helped me not only as a coder, by presenting a whole new challenge, but also as a thinker, by pushing me into a new experience of working with teammates to code. This is my first time creating anything related to data science, and it's been an interesting ride!

Matthew

From this project I learned in depth how to create an application in R Shiny. Moreover, I now understand the difference between a user interface and a server. This has also taught me how a fluid website works. In the future, what I would do differently is not leave the programming part to the last minute. I also realize that I need to do extensive research on the dataset itself: one of the most important parts of analyzing something is understanding the dataset first. From this project I see myself mostly as a coder because I really enjoy creating something and putting my thoughts into code.

Appendix 3: Use of Envisioning Cards

Envisioning cards were used throughout the creation of this project.

  • Stakeholders: Indirect Stakeholders. We considered who would be affected by our project: colleges included in our project, college students, college applicants, and employers.
  • Values: Evaluate User Experience of Values. The values of the users may be affected by how we define the "value" of each college or major. Since we are measuring "value" in financial terms, users who value education more than its financial aspects may feel that their view of "value" is being challenged.