The mission of this site is to share and discuss information about applications of data science in the field of human resource development. Dhar defines data science as “the study of the generalizable extraction of knowledge from data.” Information is shared through blog postings, and your comments and discussion are invited.
This site is not designed to teach data science in general or R programming in particular. I teach these matters every fall at Penn State in a course, Data Analysis in Workforce Education and Development. Each May I also offer a free, two-hour webinar, Starting with R, that demonstrates processes and useful features of R programming. A registration link for this free webinar is posted on this site each year.
This site is oriented heavily toward R, a programming language and environment for statistical computing and graphics. R is free, open-source software that compiles and runs on a wide variety of UNIX platforms, Windows, and MacOS, but not on iOS or Android devices. R is commonly paired with RStudio, an integrated development environment that provides a complete graphical user interface for conducting a wide range of data analysis tasks.
R for Data Science outlines a model of the data science workflow.
The components of this workflow include:
⁌ Importing data so that it can be used for analysis. Data can arrive through many routes, such as online sources, databases, and spreadsheets.
⁌ Tidying imported data so that it is ready for subsequent transformation, visualization, and modeling. Imported data are often messy, with missing, miscoded, or disorganized entries that require correction and reorganization.
⁌ Understanding, an iterative cycle that involves transforming, visualizing, and modeling tidy data.
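⁌ Transforming data reshapes it into forms that can be visualized and modeled most easily and correctly, for example by narrowing in on observations of interest, creating new variables from existing ones, or computing summary statistics.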
⁌ Visualizing data reveals and highlights the patterns and trends that the data represent. Visualization can produce surprises, but because each visualization must be inspected by a person, reviewing many of them does not scale as well as modeling does.
⁌ Modeling data uncovers relationships between outcome variables of interest and the variables that explain or predict them. Reviewing many model results scales well, but models rarely produce the surprises that visualization does.
⁌ Understanding data is an iterative process, not a “one-and-done” activity. Analysis may require several cycles of transformation, visualization, and modeling before useful lessons are extracted from the data.
⁌ Communicating involves sharing the knowledge gained from imported and refined data through print and non-print reports tailored to the information needs and preferences of different audiences.
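The workflow stages map fairly directly onto R code. R for Data Science works through them with tidyverse packages such as readr, tidyr, dplyr, and ggplot2. The following is only a minimal sketch that sticks to base R so it runs without installing anything. It assumes a small, hypothetical set of training records: the columns `employee`, `dept`, `training_hours`, and `score`, and every value in them, are invented for illustration and are not the book's or this site's own example.

```r
# Minimal base-R sketch of the import -> tidy -> transform -> visualize ->
# model -> communicate workflow.
# ASSUMPTION: the records below are hypothetical, invented for illustration.

# Import: read comma-separated text. In practice the source might be a
# file path, a URL, or a database query.
raw_text <- "employee,dept,training_hours,score
1,Sales,10,72
2,sales,12,75
3,IT,,80
4,IT,20,88
5,HR,8,-99
6,HR,15,81
7,Sales,18,84
8,IT,25,91"
raw <- read.csv(text = raw_text, stringsAsFactors = FALSE)

# Tidy: correct a miscoded department label, recode the sentinel -99 as
# missing, then drop incomplete rows (the blank numeric field is read as NA).
tidy <- raw
tidy$dept[tidy$dept == "sales"] <- "Sales"
tidy$score[tidy$score == -99] <- NA
tidy <- na.omit(tidy)

# Transform: summarize the mean score within each department.
by_dept <- aggregate(score ~ dept, data = tidy, FUN = mean)

# Visualize: scatterplot of score against training hours, written to a
# temporary PDF so the sketch also runs non-interactively.
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)
plot(score ~ training_hours, data = tidy,
     xlab = "Training hours", ylab = "Score")
invisible(dev.off())

# Model: simple linear regression of score on training hours.
fit <- lm(score ~ training_hours, data = tidy)

# Communicate: report the summary table and the estimated slope.
print(by_dept)
cat(sprintf("Estimated change in score per training hour: %.2f\n",
            coef(fit)[["training_hours"]]))
```

Because understanding is iterative, a real analysis would likely loop back through the transform, visualize, and model steps, for instance adding `dept` as a second predictor in `lm()` if the plot and the departmental summary suggested differences between groups.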