Alignment practices in AI raise concerns over value encoding and ethics
A new study published in AI & Society critically examines the rise of alignment as a central ethical framework in machine learning, arguing that the term has become a dominant, yet under-scrutinized, method for regulating artificial intelligence (AI) systems. The research, led by scholars from Ruhr-Universität Bochum and Durham University, explores how alignment practices are reshaping both technical development and ethical thinking in AI.
The concept of alignment has become central to how companies and researchers approach machine learning ethics. It is now widely used to describe efforts to ensure AI systems behave according to human values - typically framed as being helpful, honest, and harmless.
The study, titled “‘Desired behaviors’: alignment and the emergence of a machine learning ethics”, traces how alignment evolved from a technical challenge in sequence modeling to a broad ethical imperative. Originally, alignment referred to the accurate mapping between inputs and outputs in tasks like language translation. As AI systems became more generalized and began producing unintended outputs, alignment shifted toward a method for regulating model behavior through fine-tuning, human feedback, and reinforcement learning.
This shift attempts to resolve what the authors describe as a tension between “is” and “ought”. AI models learn statistical patterns from their training data - a description of what they are - but are increasingly judged against normative expectations about what they “ought” to do. The alignment process, particularly in large models like GPT, uses curated examples and human evaluations to retrofit ethical expectations onto pretrained systems.
At the core of modern alignment practices is reinforcement learning from human feedback (RLHF). In this process, human labelers evaluate model outputs based on criteria like truthfulness or appropriateness. These evaluations are then used to train reward models that guide AI behavior.
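To make that pipeline concrete, the sketch below trains a toy reward model on pairwise human preference labels. The embedding size, architecture, and data are illustrative assumptions rather than details from the study; real systems derive embeddings of (prompt, response) pairs from a language model, not random vectors.

```python
# Minimal sketch of reward-model training from pairwise human preferences.
# All names and data are illustrative stand-ins so the example runs on its own.
import torch
import torch.nn as nn

torch.manual_seed(0)

EMBED_DIM = 16          # assumed size of a (prompt, response) embedding
N_COMPARISONS = 256     # number of labeler judgments (chosen vs. rejected)

# Stand-in embeddings: each row represents a hypothetical (prompt, response) pair.
chosen = torch.randn(N_COMPARISONS, EMBED_DIM)    # outputs labelers preferred
rejected = torch.randn(N_COMPARISONS, EMBED_DIM)  # outputs labelers rejected

# The reward model maps an embedding to a single scalar "desirability" score.
reward_model = nn.Sequential(nn.Linear(EMBED_DIM, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

for step in range(200):
    r_chosen = reward_model(chosen)      # score of the preferred output
    r_rejected = reward_model(rejected)  # score of the rejected output
    # Pairwise (Bradley-Terry style) loss: push preferred outputs above rejected ones.
    loss = -torch.log(torch.sigmoid(r_chosen - r_rejected)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# The trained model now assigns higher scores to outputs resembling
# those that labelers marked as "desired behavior".
```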
The researchers emphasize that this form of value encoding is indirect. Ethical values are not hardcoded but introduced through example-based learning. Labelers must interpret complex instructions and apply ethical reasoning to ambiguous outputs. This leads to what the authors describe as a recursive optimization process, where ethical judgments are approximated statistically.
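A simplified way to see that indirectness is to rerank candidate outputs against a learned score, as sketched below. The scoring function here is a crude hypothetical stand-in for a trained reward model, and production RLHF fine-tunes the model itself against such a score rather than reranking a handful of candidates; the point is only that no ethical rule is consulted directly, just a statistical approximation of labelers' judgments.

```python
# Simplified sketch of steering behavior with a learned, statistical score.
# reward_score is a hypothetical stand-in for a trained reward model - not how
# any real system scores text, only an illustration of the idea that "desired
# behavior" is whatever the score ranks highest.

def reward_score(response: str) -> float:
    """Toy proxy score: rewards hedged phrasing, penalizes overclaiming."""
    hedged = sum(response.count(w) for w in ("may", "consider", "consult"))
    overclaiming = sum(response.count(w) for w in ("definitely", "guaranteed"))
    return float(hedged - overclaiming)

# Candidate outputs a model might sample for the same prompt.
candidates = [
    "This is definitely safe, results guaranteed.",
    "You may want to consider the risks and consult an expert.",
    "It works. Trust me.",
]

# Best-of-n selection: the "aligned" answer is simply the highest-scoring one.
# The value judgment lives entirely in the learned score.
best = max(candidates, key=reward_score)
print(best)  # -> "You may want to consider the risks and consult an expert."
```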
While the values used - helpfulness, honesty, and harmlessness - appear universal, the study warns that they are often selected pragmatically rather than philosophically. Moreover, they are reduced to performance benchmarks rather than serving as guiding ethical principles.
Existential risk discourse adds weight - but limits debate
The paper also explores how alignment has taken on broader, more speculative meaning in the context of artificial general intelligence (AGI). Leading thinkers in AI safety have positioned alignment as essential to preventing existential risks, framing future AI systems as potentially catastrophic if not perfectly aligned with human interests.
In this narrative, alignment becomes less about present-day harms and more about averting theoretical extinction scenarios. The authors caution that this shift elevates speculative risks over immediate ethical and political concerns - such as labor displacement, surveillance, or algorithmic bias - and narrows the space for democratic debate about how AI should be governed.
Although alignment is often described in technical or neutral terms, the study argues that it carries significant political implications. By standardizing vague ethical values across systems, companies and researchers gain interpretive control over what constitutes “desirable behavior.” This process, the study suggests, risks excluding alternative value systems or modes of AI interaction.
The authors describe alignment as a form of ethical minimalism that facilitates commercial scalability and regulatory compliance. However, they warn that this minimalism may obscure deeper political and cultural questions, including whose values are embedded in AI systems and who has the authority to define them.
Rather than fostering ethical deliberation, alignment as currently practiced may function more like behavior management. AI outputs are steered away from undesired behaviors post hoc, based on observed failures rather than proactive ethical commitments. This reactive logic, the authors argue, limits alignment’s capacity to serve as a foundation for meaningful ethical engagement.
The study highlights that alignment does not escape the structures of power and control that shape the development and deployment of AI. In aligning model behaviors to narrowly defined benchmarks, it also aligns users to systems built around performance metrics, market incentives, and regulatory constraints.
Implications for AI governance and public trust
While alignment may help address safety concerns and reduce harmful outputs, it also shapes how ethical discourse around AI is framed and enacted. The authors call for greater scrutiny of alignment’s underlying assumptions and its impact on public values, arguing that ethical control should not replace political accountability. They suggest that alternative models - grounded in participatory governance, diverse ethical traditions, and transparency - are needed to ensure that AI systems serve broader societal goals.
First published in: Devdiscourse

