Ethics of Data Collection and AI

Adam Clarke
Software Engineer
15th Apr 2024
2 min read

As the saying goes, "garbage in, garbage out". It is a good reminder of the implications of biased data in general, and, more recently, in the realm of artificial intelligence and machine learning too.

From perpetuating systemic inequalities to distorting business insights, the consequences of data bias can have a huge impact on any business that relies on data to inform decision making.

In this excerpt from our second AI Guidebook, Good Data: Why every AI needs it and how to get it to them, we delve a little deeper.


Data should have known and understood biases, if any

The issue of bias in datasets extends beyond ethical considerations; it also has a profound impact on the effectiveness and profitability of business operations. Bias in data can manifest in various forms, leading to skewed AI models that not only perpetuate societal inequalities but also mislead business decisions, potentially costing companies significantly.

Ethics

The most commonly discussed aspect of data bias pertains to its ethical and social implications. For instance, an AI hiring tool trained on historical data might perpetuate historical biases, favouring candidates from a specific gender, race, or socio-economic background. Similarly, credit scoring algorithms that rely on biased datasets could unjustly favour or penalise certain demographic groups, leading to unfair practices and potential legal repercussions.

Impact on business decisions and profitability

From a business perspective, biased data can lead to misguided strategies and financial losses. Consider a retail company that uses AI to analyse customer purchasing patterns. If their dataset primarily includes transactions from urban, high-income areas, the AI model might inaccurately predict the preferences of customers in rural or lower-income regions. This misalignment can lead to poor inventory decisions, ineffective marketing strategies, and ultimately, lost sales and revenue.
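A skew like this is often easy to detect before it reaches a model. The sketch below is a minimal, hypothetical example (the region labels, thresholds, and data are invented for illustration) of comparing the regional mix of a transaction dataset against the known mix of the customer base:

```python
from collections import Counter

def region_skew(transactions, customer_base, threshold=0.10):
    """Flag regions whose share of the dataset diverges from their
    share of the real customer base by more than `threshold`."""
    data_counts = Counter(t["region"] for t in transactions)
    total = sum(data_counts.values())
    flags = {}
    for region, base_share in customer_base.items():
        data_share = data_counts.get(region, 0) / total
        if abs(data_share - base_share) > threshold:
            flags[region] = round(data_share - base_share, 2)
    return flags

# Toy example: urban transactions dominate the collected data,
# while the actual customer base is far more evenly split.
transactions = [{"region": "urban"}] * 80 + [{"region": "rural"}] * 20
customer_base = {"urban": 0.55, "rural": 0.45}
print(region_skew(transactions, customer_base))
# → {'urban': 0.25, 'rural': -0.25}
```

Even a crude check like this makes the bias visible and quantifiable, which is the precondition for deciding whether to resample, reweight, or simply document it.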

Another example is targeted advertising. If an AI model is trained on skewed user interaction data, it might conclude that certain products are unpopular, leading to reduced advertising efforts for those products. However, the lack of interaction could be due to the product being under-promoted initially, not a lack of interest. This cycle can cause potentially profitable products to be overlooked.

Accidental Bias

Bias in datasets can often be accidental, stemming from seemingly innocuous decisions or oversights. For instance, a company developing a voice recognition system collects voice samples from its predominantly young, urban-based employees. While unintentional, this sampling method introduces a bias towards a specific age group and possibly a certain accent or speech pattern. When deployed, the system might struggle to accurately recognise voices from older demographics or different regions, limiting its effectiveness and market appeal.

Consider a business that collects customer feedback exclusively through its online platform. This method inadvertently biases the dataset towards a tech-savvy demographic, potentially one younger and more digitally inclined. Based on this feedback, the business might make decisions that cater predominantly to this group's preferences.

This could prove to be acceptable if that is also the demographic that the business should be focusing on, but it could equally be the case that the demographics from which the data originated do not align with the overall demographic of the customer base. This skew in data can lead to misinformed product development, marketing strategies, and customer service improvements, ultimately hurting the business's bottom line and market reach as core demographics are overlooked.
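One common mitigation, once a skew like this is known, is to reweight the sample so each group counts in proportion to its share of the real customer base. The sketch below is a hedged illustration (the age bands and shares are invented, and real reweighting would also need care around very small groups):

```python
def representation_weights(sample_shares, population_shares):
    """Compute per-group weights that rebalance a skewed sample:
    weight = population share / sample share. Groups absent from
    the sample cannot be reweighted and are skipped."""
    return {
        group: round(population_shares[group] / sample_shares[group], 2)
        for group in population_shares
        if sample_shares.get(group, 0) > 0
    }

# Toy example: online-only feedback over-represents younger customers.
sample = {"18-34": 0.70, "35-54": 0.25, "55+": 0.05}
population = {"18-34": 0.40, "35-54": 0.35, "55+": 0.25}
print(representation_weights(sample, population))
# → {'18-34': 0.57, '35-54': 1.4, '55+': 5.0}
```

Note the very large weight on the "55+" group: when a group is barely present in the sample, reweighting amplifies noise, which is usually a signal to collect more data from that group rather than rely on weights alone.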

Ultimately, what matters is that organisations understand how their methods for collecting and using data can introduce bias, know whom their use of that data will affect, and act accordingly.

Final Thoughts

By implementing robust measures to detect, mitigate, and prevent bias, businesses can enhance the reliability and fairness of their data-driven initiatives. In doing so, they not only fulfil their ethical responsibilities but also unlock new opportunities for innovation, growth, and social impact in an increasingly data-driven world.