Bayes Classifier with Asymmetric Costs
Thanks to Prof. Larry for this problem!
Consider the following binary classification problem. Every individual of a population is associated with an independent replicate of the pair $(X, Y)$, having known joint distribution, and where the (observed) covariate $X$ has a (marginal) distribution $P_X$, and the (unobserved) response $Y \in \{0, 1\}$. Suppose the costs of misclassifying an individual with $Y = 0$ and $Y = 1$ are $c_0$ and $c_1$, respectively. What’s the Bayes decision rule?
A classification rule, say $h$, is a function of $x$ taking values in $\{0, 1\}$. We incur a loss when,
- $Y = 1$ and we predict $0$ (i.e. $h(X) = 0$). The loss in this case is $c_1$.
- $Y = 0$ and we predict $1$ (i.e. $h(X) = 1$). The loss in this case is $c_0$.
Thus, the expected loss or cost of using the classification rule $h$ may be expressed as
$$R(h) = \mathbb{E}\left[c_1 \mathbf{1}\{Y = 1,\, h(X) = 0\} + c_0 \mathbf{1}\{Y = 0,\, h(X) = 1\}\right].$$
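To make the expected loss concrete, here is a short Monte Carlo sketch. The joint distribution, costs, and function names below are assumptions purely for illustration, not part of the problem:

```python
import random

# Toy joint distribution (an assumption for illustration):
# X is uniform on {0, 1, 2} and P(Y = 1 | X = x) = (x + 1) / 4.
def sample_pair(rng):
    x = rng.randrange(3)
    y = 1 if rng.random() < (x + 1) / 4 else 0
    return x, y

def expected_loss(h, c0, c1, n=200_000, seed=0):
    """Monte Carlo estimate of the expected cost of the rule h:
    E[ c1 * 1{Y = 1, h(X) = 0} + c0 * 1{Y = 0, h(X) = 1} ]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x, y = sample_pair(rng)
        if y == 1 and h(x) == 0:
            total += c1
        elif y == 0 and h(x) == 1:
            total += c0
    return total / n

# The rule that always predicts 0 pays c1 whenever Y = 1; in this toy
# model P(Y = 1) = 1/2, so its expected cost is about c1 / 2.
print(expected_loss(lambda x: 0, c0=1.0, c1=2.0))  # roughly 1.0
```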
To compute the above expected loss, it is useful to define the following quantities. Define the random variable,
$$m(X) = \mathbb{P}(Y = 1 \mid X).$$
Moreover,
- $A_0 = \{x : h(x) = 0\}$ denotes the set of $x$’s on which $h$ takes the value $0$.
- Similarly, $A_1 = \{x : h(x) = 1\}$ denotes the set of $x$’s on which $h$ takes the value $1$.
A touch of algebra (conditioning on $X$) yields,
$$\mathbb{E}\left[\mathbf{1}\{Y = 1,\, h(X) = 0\}\right] = \mathbb{E}\left[m(X)\, \mathbf{1}\{X \in A_0\}\right] = \int_{A_0} m(x)\, dP_X(x),$$
and analogously $\mathbb{E}\left[\mathbf{1}\{Y = 0,\, h(X) = 1\}\right] = \int_{A_1} \left(1 - m(x)\right) dP_X(x)$.
This enables us to compute the expected cost as
$$R(h) = c_1 \int_{A_0} m(x)\, dP_X(x) + c_0 \int_{A_1} \left(1 - m(x)\right) dP_X(x).$$
Thus the Bayes decision rule, i.e. the rule that minimizes the cost, is the one with regions $A_0, A_1$ chosen to minimize
$$c_1 \int_{A_0} m(x)\, dP_X(x) + c_0 \int_{A_1} \left(1 - m(x)\right) dP_X(x).$$
How do we choose these regions? Pick any $x$. If $c_1 m(x) \le c_0 \left(1 - m(x)\right)$, then we want that $x$ to be part of $A_0$, since otherwise it would only serve to increase the above expression. This yields
$$A_0 = \left\{x : c_1 m(x) \le c_0 \left(1 - m(x)\right)\right\}$$
and
$$A_1 = \left\{x : c_0 \left(1 - m(x)\right) < c_1 m(x)\right\}.$$
Since by definition
$$m(x) = \mathbb{P}(Y = 1 \mid X = x) \quad \text{and} \quad 1 - m(x) = \mathbb{P}(Y = 0 \mid X = x),$$
checking if $c_1 m(x) \le c_0 \left(1 - m(x)\right)$ amounts to checking if
$$\mathbb{P}(Y = 1 \mid X = x) \le \frac{c_0}{c_0 + c_1}.$$
Similarly, checking if $c_0 \left(1 - m(x)\right) < c_1 m(x)$ amounts to checking if
$$\mathbb{P}(Y = 1 \mid X = x) > \frac{c_0}{c_0 + c_1}.$$
This yields the optimum decision rule,
$$h^*(x) = \begin{cases} 1 & \text{if } \mathbb{P}(Y = 1 \mid X = x) > \dfrac{c_0}{c_0 + c_1}, \\ 0 & \text{otherwise}. \end{cases}$$
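The rule itself is a one-liner. A minimal sketch, where the function name and the example numbers are illustrative assumptions:

```python
def bayes_rule(m_x, c0, c1):
    """Cost-sensitive Bayes rule: predict 1 iff m(x) = P(Y = 1 | X = x)
    exceeds the threshold c0 / (c0 + c1)."""
    return 1 if m_x > c0 / (c0 + c1) else 0

# Symmetric costs give the familiar threshold 1/2:
assert bayes_rule(0.6, c0=1, c1=1) == 1
# Making a misclassified Y = 1 nine times as costly drops the
# threshold to 0.1, so even m(x) = 0.2 gets classified as 1:
assert bayes_rule(0.2, c0=1, c1=9) == 1
# With the costs flipped, the threshold rises to 0.9:
assert bayes_rule(0.2, c0=9, c1=1) == 0
```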
Intuitively, if $c_1$ is much larger than $c_0$, then we care much more about (not) misclassifying $Y = 1$, which makes us more likely to classify a given covariate as $1$. The decision rule derived above satisfies this intuition, since the threshold $c_0 / (c_0 + c_1)$ shrinks as $c_1$ grows. In the limit $c_1 / c_0 \to \infty$, it is easy to see that the classifier will always classify an observation as $1$. Finally, when $c_0$ and $c_1$ are the same, we recover the original Bayes classifier which simply looks at which response is most likely:
$$h^*(x) = \begin{cases} 1 & \text{if } \mathbb{P}(Y = 1 \mid X = x) > \mathbb{P}(Y = 0 \mid X = x), \\ 0 & \text{otherwise}. \end{cases}$$
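The derivation can also be checked numerically. On a small assumed toy model (a uniform covariate on an 11-point grid with regression function $m(x) = x$), a brute-force search over every possible rule on the grid recovers exactly the thresholded rule derived above:

```python
from itertools import product

c0, c1 = 1.0, 3.0                      # asymmetric costs (illustrative)
xs = [i / 10 for i in range(11)]       # X uniform on {0.0, 0.1, ..., 1.0}
m = lambda x: x                        # toy regression fn P(Y = 1 | X = x)

def risk(labels):
    """Exact expected cost of the rule assigning labels[i] to xs[i]:
    pay c1*m(x) on the region labeled 0, c0*(1 - m(x)) on the region
    labeled 1, averaged over the uniform distribution of X."""
    return sum(c1 * m(x) if lab == 0 else c0 * (1 - m(x))
               for x, lab in zip(xs, labels)) / len(xs)

# The rule derived above: predict 1 iff m(x) > c0 / (c0 + c1) = 0.25.
derived = tuple(1 if m(x) > c0 / (c0 + c1) else 0 for x in xs)

# Brute force over all 2^11 possible rules on this grid.
best = min(product((0, 1), repeat=len(xs)), key=risk)
assert best == derived
```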