Pre-Existing Bias
Pre-existing bias occurs when historical or societal prejudices are present in the training data. This can take the form of missing data points, overrepresentation or underrepresentation of certain groups, biased selection of the data points that make up the AI model, or data labels that are discriminatory or subjective. AI models with pre-existing bias can produce inferior performance and outcomes that disadvantage certain groups.
Business Impact
Pre-existing bias in this AI model can result in reputational damage and indirect monetary loss stemming from an erosion of customer trust in the model's output.
Steps to Reproduce
Input the following text into the model. It highlights the well-represented group within the data: {{Text denoting well represented group within the data}}
Input the following text into the model. It highlights the insufficiently represented group within the data: {{Text denoting the insufficiently represented group within the data}}
Note that the output of the AI model classifies these two groups disparately, demonstrating a pre-existing bias. A scripted version of this comparison is sketched below.
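To make the comparison concrete, the steps above can be scripted. The following is a minimal sketch that assumes the model is exposed through a hypothetical HTTP classification endpoint (`MODEL_URL`, returning a JSON `label`); substitute the actual interface of the model under test.

```python
import requests  # assumes the model is reachable over HTTP; adjust for the real interface

# Hypothetical endpoint and the report's placeholder prompts -- substitute real values.
MODEL_URL = "https://example.com/api/classify"  # assumed endpoint, not a real service
PROMPT_WELL_REPRESENTED = "{{Text denoting well represented group within the data}}"
PROMPT_UNDER_REPRESENTED = "{{Text denoting the insufficiently represented group within the data}}"

def classify(text: str) -> dict:
    """Send one prompt to the model and return its JSON output (label assumed)."""
    response = requests.post(MODEL_URL, json={"input": text}, timeout=30)
    response.raise_for_status()
    return response.json()

result_a = classify(PROMPT_WELL_REPRESENTED)
result_b = classify(PROMPT_UNDER_REPRESENTED)

# Disparate labels for otherwise comparable inputs is the signal of
# pre-existing bias described in the steps above.
print("Well-represented group:          ", result_a)
print("Insufficiently represented group:", result_b)
if result_a.get("label") != result_b.get("label"):
    print("Disparate classification observed -- candidate pre-existing bias.")
```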
Proof of Concept (PoC)
The screenshot(s) below demonstrate(s) the vulnerability:
{{screenshot}}
Guidance
Provide a step-by-step walkthrough, with screenshots, of how you exploited the bias. This will reduce triage time and result in faster rewards. Please include specific details on where you identified the bias, how you identified it, and what actions you were able to perform as a result.
Recommendation(s)
Establish practices and policies that ensure responsible data collection and training. This can include:
Conduct a comprehensive review of the training data to find and remediate biases. This includes re-sampling underrepresented groups and adjusting model parameters to promote fairness (a re-sampling sketch follows this list).
Develop, monitor, and evaluate business processes that index ethical frameworks, best practices, and concerns.
Clearly define the desired outcomes of the AI model, then frame the key variables to capture.
Ensure that the data collected and used to train the AI model reflects the environment in which it will be deployed and contains diverse, representative data.
Design and develop algorithms that are sensitive to fairness considerations, and audit them regularly (a demographic-parity check is sketched after this list).
Practice data collection principles that do not disadvantage specific groups.
Document the development of the AI model, including all datasets, variables identified, and decisions made throughout the development cycle.
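As one illustration of the data-review recommendation above, the sketch below oversamples an underrepresented group until group counts match, using scikit-learn's `resample` utility. The `group` column, its values, and the toy data are assumptions for the example, not part of the reported finding.

```python
import pandas as pd
from sklearn.utils import resample  # standard scikit-learn utility

# Toy training set; the "group" column and its values are assumed for illustration.
df = pd.DataFrame({
    "feature": [0.2, 0.4, 0.9, 0.1, 0.8, 0.3, 0.7, 0.5],
    "group":   ["A", "A", "A", "A", "A", "A", "B", "B"],
    "label":   [1, 0, 1, 0, 1, 0, 1, 0],
})

majority = df[df["group"] == "A"]
minority = df[df["group"] == "B"]

# Oversample the underrepresented group with replacement until the counts match.
minority_upsampled = resample(
    minority, replace=True, n_samples=len(majority), random_state=42
)
balanced = pd.concat([majority, minority_upsampled]).sample(frac=1, random_state=42)
print(balanced["group"].value_counts())  # groups are now equally represented
```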
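For the fairness-audit recommendation, a common check is demographic parity: comparing the model's positive-prediction rate across groups. The sketch below computes the largest gap between any two groups; the 0.1 tolerance is an assumed threshold for illustration, not a standard.

```python
from collections import defaultdict

def demographic_parity_gap(predictions, groups):
    """Return the largest gap in positive-prediction rate between groups, plus the rates."""
    totals, positives = defaultdict(int), defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    rates = {g: positives[g] / totals[g] for g in totals}
    return max(rates.values()) - min(rates.values()), rates

# Toy predictions; a real audit would use held-out data with group annotations.
preds  = [1, 1, 0, 1, 0, 0, 0, 1, 1, 0]
groups = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]
gap, rates = demographic_parity_gap(preds, groups)
print(f"Positive-prediction rates by group: {rates}")
if gap > 0.1:  # assumed tolerance for this sketch
    print(f"Demographic parity gap of {gap:.2f} exceeds tolerance -- investigate.")
```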