This research began as an Independent Study / Undergraduate Research position for my HCI minor during the Spring 2020 semester. When my summer internship was pushed back, I decided to keep working on the project through the first half of the summer. The goal of the study was to answer the following questions:
What are the differences in tools and strategies that data scientists and auditors use?
Do auditors and data scientists have similar approaches when given a dataset to find fraud or outliers in data?
Does expertise influence data validation work? More specifically, is there a difference in the quality of financial data validation between data scientists and accountants?
The paper for our research can be found here!
Initially, our biggest question was how to study the thinking processes of data scientists and auditors. Instead of giving the participants a dataset and asking them to find fraud in the data, we decided to pre-create data visualizations in Tableau. We thought that handing over a dataset with little instruction would put the accounting students at a disadvantage, because data science students are likely more familiar with data manipulation and visualization. Furthermore, we hypothesized that people in different roles may favor certain visualizations over others because of their prior experience with looking at data. For example, we thought that auditors would spend more time looking at the data in tabular form, while data scientists would spend more time interpreting histograms and scatterplots.
Our study consisted of visualizations from two different datasets, as well as a toy dataset that let participants get accustomed to logging insights and working with the Tableau interface. One dataset covered a company's expenses, and the other covered salaries and bonus pay at a different company. The data originally came from the book "Data Analytics for Auditing using ACL, 4th Edition," but we added columns and injected fraud to make the visualizations more interesting and to make it easier for the participants to find outliers and insights in the data. Below are example visualizations from each dataset.
Company Expenses Dataset
Company Salaries Dataset
Each dataset had 10 different visualizations in its Tableau "story." To account for the possibility that auditors and data scientists prefer different kinds of visualizations, we included a variety of visualization types, such as boxplots, bar graphs, line plots, and tables. Most of these visualizations were interactive: the participant could sort and filter the data, and hovering over a specific point in a visualization showed more information about that data point.
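To give a rough sense of what injecting fraud into a dataset and then spotting it in a boxplot can look like, here is a minimal Python sketch. It is not the procedure we used in the study: the expense amounts are synthetic, the function names (make_expenses, inject_outliers, iqr_outliers) are hypothetical, and the 1.5 × IQR rule is simply the usual convention for drawing boxplot whiskers.

```python
import random
import statistics


def make_expenses(n=200, seed=0):
    """Generate hypothetical, strictly positive expense amounts (not the study's data)."""
    rng = random.Random(seed)
    return [max(1.0, round(rng.gauss(500, 100), 2)) for _ in range(n)]


def inject_outliers(amounts, k=5, factor=10, seed=1):
    """Return a copy in which k randomly chosen amounts are replaced by
    factor * (largest original amount), together with the changed indices."""
    rng = random.Random(seed)
    indices = sorted(rng.sample(range(len(amounts)), k))
    inflated = round(factor * max(amounts), 2)
    tampered = list(amounts)
    for i in indices:
        tampered[i] = inflated
    return tampered, indices


def iqr_outliers(amounts):
    """Return indices of values outside the 1.5 * IQR boxplot whiskers."""
    q1, _, q3 = statistics.quantiles(amounts, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [i for i, a in enumerate(amounts) if a < low or a > high]


if __name__ == "__main__":
    clean = make_expenses()
    tampered, injected = inject_outliers(clean)
    flagged = iqr_outliers(tampered)
    print("injected rows:", injected)
    print("all injected rows flagged:", set(injected) <= set(flagged))
```

Injected values this extreme always fall beyond the upper whisker, which is the kind of point a participant could notice in a boxplot; subtler manipulations would not necessarily be caught by this rule.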