# Bayes Classifier with Asymmetric Costs

Thanks to Prof. Larry for this problem!

Consider the following binary classification problem. Every individual in a population is associated with an independent replicate of the pair $(\mathbf{X}, Y)$, which has a known joint distribution; the (observed) covariate $\mathbf{X}$ has (marginal) distribution $\pi$, and the (unobserved) response $Y$ takes values in $\{-1, 1\}$. Suppose the costs of misclassifying an individual with $Y = 1$ and with $Y = -1$ are $a > 0$ and $b > 0$, respectively. What’s the Bayes decision rule?

A classification rule, say $g$, is a function of $\mathbf{X}$ taking values in $\{-1, 1\}$. We incur a loss when,

• $Y=1$ and we predict $-1$ (i.e. $g(\mathbf{X}) = -1$). The loss in this case is $a$.
• $Y=-1$ and we predict $1$ (i.e. $g(\mathbf{X}) = 1$). The loss in this case is $b$.
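The two costly outcomes above can be written down as a small loss function. A minimal sketch in Python; the function name and the default cost values are illustrative, not part of the problem:

```python
def loss(y_true, y_pred, a=4.0, b=1.0):
    """Asymmetric misclassification cost: a when a true Y = 1 is
    predicted as -1, b when a true Y = -1 is predicted as 1,
    and 0 when the prediction is correct."""
    if y_true == 1 and y_pred == -1:
        return a
    if y_true == -1 and y_pred == 1:
        return b
    return 0.0
```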

Thus, the expected loss or cost $L(g)$ of using the classification rule $g$ may be expressed as

$$L(g) = a\,\mathbb{P}\left(Y = 1,\; g(\mathbf{X}) = -1\right) + b\,\mathbb{P}\left(Y = -1,\; g(\mathbf{X}) = 1\right).$$

To compute the above expected loss, it is useful to define the following quantities. Define the random variable

$$\eta(\mathbf{X}) = a\,\mathbb{P}\left(Y = 1 \mid \mathbf{X}\right) - b\,\mathbb{P}\left(Y = -1 \mid \mathbf{X}\right).$$

Moreover,

• $R_1 = \{\mathbf{x} : g(\mathbf{x}) = 1\}$ denotes the set of $\mathbf{x}$’s on which $g$ takes the value $1$.
• Similarly, $R_{-1} = \{\mathbf{x} : g(\mathbf{x}) = -1\}$ denotes the set of $\mathbf{x}$’s on which $g$ takes the value $-1$.

A touch of algebra (conditioning on $\mathbf{X}$) yields

$$L(g) = \mathbb{E}\left[a\,\mathbb{P}\left(Y = 1 \mid \mathbf{X}\right)\mathbf{1}\{\mathbf{X} \in R_{-1}\} + b\,\mathbb{P}\left(Y = -1 \mid \mathbf{X}\right)\mathbf{1}\{\mathbf{X} \in R_1\}\right].$$

This enables us to compute the expected cost as

$$L(g) = \int_{R_{-1}} a\,\mathbb{P}\left(Y = 1 \mid \mathbf{X} = \mathbf{x}\right)\, \pi(d\mathbf{x}) + \int_{R_1} b\,\mathbb{P}\left(Y = -1 \mid \mathbf{X} = \mathbf{x}\right)\, \pi(d\mathbf{x}).$$

Thus the Bayes decision rule $g^*$ that minimizes the cost is the one with regions $R_1, R_{-1}$ chosen to minimize

$$\int_{R_{-1}} a\,\mathbb{P}\left(Y = 1 \mid \mathbf{X} = \mathbf{x}\right)\, \pi(d\mathbf{x}) + \int_{R_1} b\,\mathbb{P}\left(Y = -1 \mid \mathbf{X} = \mathbf{x}\right)\, \pi(d\mathbf{x}).$$

How do we choose these regions? Pick any $\mathbf{x}$. If $a\,\mathbb{P}\left(Y = 1 \mid \mathbf{X} = \mathbf{x}\right) < b\,\mathbb{P}\left(Y = -1 \mid \mathbf{X} = \mathbf{x}\right)$, i.e. if $\eta(\mathbf{x}) < 0$, then we want that $\mathbf{x}$ to be part of $R_{-1}$, since placing it in $R_1$ would only serve to increase the above expression. This yields

$$R_{-1} = \left\{\mathbf{x} : \eta(\mathbf{x}) < 0\right\}$$

and

$$R_1 = \left\{\mathbf{x} : \eta(\mathbf{x}) \ge 0\right\}.$$

Since by definition

$$\eta(\mathbf{x}) = a\,\mathbb{P}\left(Y = 1 \mid \mathbf{X} = \mathbf{x}\right) - b\left(1 - \mathbb{P}\left(Y = 1 \mid \mathbf{X} = \mathbf{x}\right)\right) = (a+b)\,\mathbb{P}\left(Y = 1 \mid \mathbf{X} = \mathbf{x}\right) - b,$$

checking if $\eta(\mathbf{x}) < 0$ amounts to checking if

$$\mathbb{P}\left(Y = 1 \mid \mathbf{X} = \mathbf{x}\right) < \frac{b}{a+b}.$$

Similarly, checking if $\eta(\mathbf{x}) \ge 0$ amounts to checking if

$$\mathbb{P}\left(Y = 1 \mid \mathbf{X} = \mathbf{x}\right) \ge \frac{b}{a+b}.$$

This yields the optimum decision rule

$$g^*(\mathbf{x}) = \begin{cases} 1 & \text{if } \mathbb{P}\left(Y = 1 \mid \mathbf{X} = \mathbf{x}\right) \ge \dfrac{b}{a+b}, \\ -1 & \text{otherwise.} \end{cases}$$
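The pointwise argument above can be checked numerically: on a toy discrete distribution, the rule that thresholds $\mathbb{P}(Y=1 \mid \mathbf{X}=\mathbf{x})$ at $b/(a+b)$ attains the minimum expected cost among all possible rules. A sketch in Python; the cost values and the toy distribution are illustrative assumptions, not from the problem:

```python
from itertools import product

a, b = 4.0, 1.0  # assumed cost values for illustration

# Toy distribution: X takes 3 values with marginal pi,
# and eta[i] = P(Y = 1 | X = x_i) is known.
pi  = [0.5, 0.3, 0.2]
eta = [0.1, 0.3, 0.9]

def expected_cost(rule):
    """rule[i] in {-1, 1}: the prediction made for the i-th value of X."""
    cost = 0.0
    for p, m, g in zip(pi, eta, rule):
        if g == -1:
            cost += p * a * m        # cost of missing Y = 1
        else:
            cost += p * b * (1 - m)  # cost of a false alarm on Y = -1
    return cost

# Bayes rule: predict 1 iff P(Y = 1 | X = x) >= b / (a + b)
bayes = tuple(1 if m >= b / (a + b) else -1 for m in eta)

# Exhaustive check over all 2^3 classification rules
best = min(product([-1, 1], repeat=3), key=expected_cost)
print("Bayes rule:", bayes, " exhaustive best:", best)
```

With these numbers the threshold is $b/(a+b) = 0.2$, so the Bayes rule predicts $-1$ only for the first value of $X$, and the exhaustive search returns the same rule.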

Intuitively, if $a$ is much larger than $b$, then we care much more about (not) misclassifying $Y = 1$, which makes us more likely to classify a given covariate $\mathbf{x}$ as $1$. The decision rule derived above satisfies this intuition: the threshold $b/(a+b)$ shrinks as $a$ grows, and in the limit $a \to \infty$ the classifier always predicts $1$. Finally, when $a$ and $b$ are equal, we recover the original Bayes classifier, which simply predicts whichever response is more likely:

$$g^*(\mathbf{x}) = \begin{cases} 1 & \text{if } \mathbb{P}\left(Y = 1 \mid \mathbf{X} = \mathbf{x}\right) \ge \dfrac{1}{2}, \\ -1 & \text{otherwise.} \end{cases}$$
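Both limiting observations reduce to the behaviour of the threshold $b/(a+b)$. A quick numeric sanity check (the cost values are illustrative):

```python
# Threshold of the derived rule: predict 1 iff P(Y = 1 | X = x) >= b / (a + b).
def threshold(a, b):
    return b / (a + b)

print(threshold(1.0, 1.0))  # equal costs: threshold is 1/2 (usual Bayes classifier)
print(threshold(1e9, 1.0))  # a >> b: threshold near 0, so we almost always predict 1
```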