Algorithms, Diversity, and Privacy: Better Data Practices to Create Greater Student Equity
Published by: WCET | 1/30/2020
Much of what we do online, and increasingly out in the world, is collected and stored as data. This data is tracked, analyzed, and used to inform predictions about the future. Data about our spending habits informs companies about strategies for Internet advertising. Data about our values and interests informs online dating websites. And data about our beliefs informs political predictions.
Student data is no different. Educational institutions and organizations can collect data about students including anything from where students are originally from to how much assistance they need in their classes. This data, in turn, can be used to make predictions about student outcomes and hopefully then used to have a positive impact on students’ success.
However, there are dark sides to the prevalence of data. While issues in algorithmic bias have made headlines recently in industries such as criminal justice and healthcare, these same issues can exist anywhere that data is analyzed and utilized by machines, including in higher education. Problems arise both from the way that algorithms themselves are written (and who is writing them) and from biased data being used to make future predictions. That data is biased because of human bias that already exists in our society, and predictions built on it can create feedback loops.
For many generations in the United States, the most successful and powerful people in society were white men. Other members of society were not permitted many of the rights that would allow them to flourish – the right to citizenship, the right to vote, the right to an education, and more. Although rights in the United States have changed significantly since its early days, it would be an exaggeration to say that all issues have been resolved. Racism, sexism, transphobia, and other forms of bigotry are alive in America today, as is evident in issues such as wealth disparities, housing access, unequal criminal sentencing, stereotyping and prejudice, and much more. And those human biases present themselves in the data that we are creating today.
The tricky thing about algorithms and about technology more generally is that the tech itself cannot evaluate the decisions that it is suggesting or understand if a piece of the puzzle is missing. Past data may be suggestive of certain trends, but if we don’t look at the events that led to those trends then we have an incomplete picture.
One of the best and most amusing examples to help understand algorithmic bias is to look at the neural network experiments being done by Janelle Shane, which she chronicles on her blog AI Weirdness. In her experiments, she feeds public data into a neural network to create something new. For example, she has collected names of real Pokémon characters to train a neural network to create new characters, collected names of cats to come up with new cat names, and, most amusingly, collected real, preexisting recipes to create new recipes.
In the last of these experiments, the neural network created recipes that call for bizarre ingredients, including mashed potato fillets and artichoke gelatin dogs, and gave them equally strange names such as Completely Meat Chocolate Pie and Strawberry-Onions Marshmallow Cracker Pie Filling. Something immediately apparent about all these neural network-created food items, aside from how strange and gross they sound, is that they are all based primarily on Western cuisine. The data that was put in was biased in favor of Western food, so the results that come out are biased in the same way. A collection of data can reflect a human bias, but algorithms do not have a mind of their own to correct the error.
Algorithms are increasingly being used in higher education to help with things such as admissions and retention, adaptive learning, student support in the form of things like financial aid and early warning systems, and more. However, without careful development of these algorithms, we will see bias negatively impacting our students, especially those who most need assistance and the opportunity to succeed.
Considering students only as numbers and data can have devastating effects. To look at students merely as these data points fails to see the societal barriers that they may be up against – as individuals or as members of a specific socioeconomic group.
Aside from failing to help certain students, schools may also be giving additional privilege to students who already have it when they use algorithms. If a school already has a historical bias towards having students from one background more than another, it is likely that the trend will be perpetuated with the addition of the algorithm. If fed biased data, algorithms will compute results that match the data and thus are also biased.
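To make that last point concrete, here is a minimal sketch in Python. It is not drawn from any real admissions system: the records, the group labels "A" and "B", the score bands, and the function names are all hypothetical, invented for illustration. The "model" simply learns the historical admit rate for each group and score band, then recommends admission wherever that rate is at least 50 percent.

```python
from collections import defaultdict

# Hypothetical historical records: (group, score_band, was_admitted).
# In this made-up history, group A was admitted more often than group B
# even when applicants fell in the same score band.
HISTORY = [
    ("A", "high", True), ("A", "high", True), ("A", "high", True), ("A", "high", True),
    ("A", "mid", True), ("A", "mid", True), ("A", "mid", False),
    ("B", "high", True), ("B", "high", False), ("B", "high", False),
    ("B", "mid", True), ("B", "mid", False), ("B", "mid", False),
]


def fit_admit_rates(records):
    """Learn the historical admit rate for every (group, score_band) pair."""
    counts = defaultdict(lambda: [0, 0])  # key -> [admitted, total]
    for group, band, admitted in records:
        counts[(group, band)][1] += 1
        if admitted:
            counts[(group, band)][0] += 1
    return {key: admitted / total for key, (admitted, total) in counts.items()}


def predict(rates, group, band, threshold=0.5):
    """Recommend admission when the learned rate meets the threshold."""
    return rates[(group, band)] >= threshold


def selection_rate(rates, group, bands):
    """Fraction of a pool of applicants (given by score band) recommended."""
    recommended = sum(predict(rates, group, band) for band in bands)
    return recommended / len(bands)


RATES = fit_admit_rates(HISTORY)

# Two applicants with the same score band, differing only by group.
print("A, high:", predict(RATES, "A", "high"))
print("B, high:", predict(RATES, "B", "high"))

# Identical applicant pools, one per group.
pool = ["high", "mid"]
print("Selection rate A:", selection_rate(RATES, "A", pool))
print("Selection rate B:", selection_rate(RATES, "B", pool))
```

Nothing in the code mentions a preference for either group. The different recommendations for equally qualified applicants come entirely from the historical records the model was given, which is why comparing selection rates across groups is one simple check a school could run before trusting such a tool.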
The issue of biased algorithms leads to another problem as well: the issue of students having ownership and privacy of their own data. Often in conversations about privacy and data security, some version of this popular argument will come up: “I don’t have anything to hide, so it doesn’t matter to me who can see my data.” However, without the knowledge of how algorithms are designed or how the data that informs them is collected, it is more difficult to say with certainty that it doesn’t matter who can see (and use) the data.
Unfortunately, there are no easy answers to these issues. However, here are some ideas of where we can start to ensure the development of unbiased algorithms:
Do you have experience working with algorithms and machine learning that you would like to share with us? We want to know! Tell us your stories of how algorithms are implemented at your school and what people working with them do to make sure they are not biased. We hope to publish additional blogs about specific experiences.
Manager, Digital Design
WCET – WICHE Cooperative for Educational Technologies