Bayes Classifier with Asymmetric Costs

Thanks to Prof. Larry for this problem!

Consider the following binary classification problem. Every individual of a population is associated with an independent replicate of the pair $(X, Y)$, having known joint distribution, where the (observed) covariate $X \in \mathcal{X}$ has a (marginal) distribution $P_X$, and the (unobserved) response $Y \in \{0, 1\}$. Suppose the costs of misclassifying an individual with $Y = 0$ and $Y = 1$ are $c_0$ and $c_1$, respectively. What's the Bayes decision rule?

A classification rule, say $h$, is a function of $X$ taking values in $\{0, 1\}$. We incur a loss when,

  • $Y = 0$ and we predict $1$ (i.e., $h(X) = 1$). The loss in this case is $c_0$.
  • $Y = 1$ and we predict $0$ (i.e., $h(X) = 0$). The loss in this case is $c_1$ (see the sketch after this list).
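
To make this concrete, here is a minimal Python sketch of the loss; the function and argument names (`loss`, `y_hat`, `c0`, `c1`) are our own choices for illustration, not part of the original problem statement.

```python
def loss(y, y_hat, c0, c1):
    """Asymmetric misclassification loss.

    Charges c0 for a false positive (y = 0 predicted as 1) and
    c1 for a false negative (y = 1 predicted as 0).
    """
    if y == 0 and y_hat == 1:
        return c0
    if y == 1 and y_hat == 0:
        return c1
    return 0.0  # correct predictions incur no loss
```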

Thus, the expected loss or cost of using the classification rule $h$ may be expressed as

$$R(h) = \mathbb{E}\Big[c_0\,\mathbf{1}\{Y = 0,\ h(X) = 1\} + c_1\,\mathbf{1}\{Y = 1,\ h(X) = 0\}\Big].$$
To compute the above expected loss, it is useful to define the following quantities. Define the random variable,

$$m(X) = \mathbb{P}(Y = 1 \mid X).$$
Moreover,

  • $A_0 = \{x \in \mathcal{X} : h(x) = 0\}$ denotes the set of $x$'s on which $h$ takes the value $0$.
  • Similarly, $A_1 = \{x \in \mathcal{X} : h(x) = 1\}$ denotes the set of $x$'s on which $h$ takes the value $1$.

A touch of algebra (condition on $X$ and use the tower property, so that, e.g., $\mathbb{P}(Y = 1,\ X \in A_0) = \mathbb{E}[\mathbf{1}\{X \in A_0\}\, m(X)]$) yields,

$$\mathbb{P}(Y = 0,\ h(X) = 1) = \int_{A_1} (1 - m(x))\, dP_X(x), \qquad \mathbb{P}(Y = 1,\ h(X) = 0) = \int_{A_0} m(x)\, dP_X(x).$$
This enables us to compute the expected cost as

$$R(h) = c_0 \int_{A_1} (1 - m(x))\, dP_X(x) + c_1 \int_{A_0} m(x)\, dP_X(x).$$
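
As a sanity check on this formula, the following sketch uses a toy setup of our own choosing (not from the original problem): $Y \sim \text{Bernoulli}(1/2)$ and $X \mid Y = j \sim N(2j, 1)$, for which $m(x) = 1/(1 + e^{2 - 2x})$ in closed form. It estimates the expected cost two ways, once from realized losses on $(X, Y)$ pairs and once from the integral form above, and checks that they agree.

```python
import numpy as np

rng = np.random.default_rng(0)
c0, c1 = 1.0, 5.0  # costs chosen arbitrarily for the demo
n = 200_000

# Toy model: Y ~ Bernoulli(1/2), X | Y = j ~ N(2j, 1).
y = rng.integers(0, 2, size=n)
x = rng.normal(loc=2.0 * y, scale=1.0)

def m(x):
    """Closed-form regression function m(x) = P(Y = 1 | X = x) for this toy model."""
    return 1.0 / (1.0 + np.exp(2.0 - 2.0 * x))

h = (x > 0.5).astype(int)  # some arbitrary rule, not yet the Bayes rule

# (i) Empirical average of the realized loss over the (X, Y) pairs.
realized = np.mean(c0 * ((y == 0) & (h == 1)) + c1 * ((y == 1) & (h == 0)))

# (ii) The integral form: E[ c0 1{X in A1} (1 - m(X)) + c1 1{X in A0} m(X) ].
integral = np.mean(c0 * (h == 1) * (1.0 - m(x)) + c1 * (h == 0) * m(x))

print(realized, integral)  # the two estimates should agree up to Monte Carlo noise
```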
Thus the Bayes decision rule, i.e., the rule that minimizes this cost, is the one with regions $A_0$ and $A_1$ chosen to minimize

$$c_0 \int_{A_1} (1 - m(x))\, dP_X(x) + c_1 \int_{A_0} m(x)\, dP_X(x).$$
How do we choose these regions? Pick any $x \in \mathcal{X}$. Placing $x$ in $A_0$ contributes $c_1 m(x)\, dP_X(x)$ to the cost above, while placing it in $A_1$ contributes $c_0 (1 - m(x))\, dP_X(x)$. So if $c_1 m(x) > c_0 (1 - m(x))$, then we want that $x$ to be part of $A_1$, since otherwise that would only serve to increase the above expression. This yields

$$A_1 = \left\{x : c_1 m(x) \geq c_0 (1 - m(x))\right\}$$

and

$$A_0 = \left\{x : c_1 m(x) < c_0 (1 - m(x))\right\}.$$

(Points with $c_1 m(x) = c_0 (1 - m(x))$ cost the same either way; we break the tie in favor of $A_1$.)
Since by definition

$$A_1 = \left\{x : c_1 m(x) \geq c_0 (1 - m(x))\right\},$$

checking if $x \in A_1$ amounts to checking if

$$m(x) \geq \frac{c_0}{c_0 + c_1},$$

after rearranging $c_1 m(x) \geq c_0 (1 - m(x))$ into $(c_0 + c_1)\, m(x) \geq c_0$.
Similarly, checking if $x \in A_0$ amounts to checking if

$$m(x) < \frac{c_0}{c_0 + c_1}.$$
This yields the optimum decision rule,

$$h^*(x) = \mathbf{1}\left\{m(x) \geq \frac{c_0}{c_0 + c_1}\right\}.$$
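
Continuing the toy example above, one can check numerically that thresholding $m(x)$ at $c_0 / (c_0 + c_1)$ is the best choice among all thresholds; a minimal self-contained sketch (same assumed toy model, all names ours):

```python
import numpy as np

rng = np.random.default_rng(1)
c0, c1 = 1.0, 5.0
n = 200_000

# Toy model as before: Y ~ Bernoulli(1/2), X | Y = j ~ N(2j, 1),
# with m(x) = P(Y = 1 | X = x) available in closed form.
y = rng.integers(0, 2, size=n)
x = rng.normal(loc=2.0 * y, scale=1.0)
mx = 1.0 / (1.0 + np.exp(2.0 - 2.0 * x))

# Estimate the cost of each threshold rule h_t(x) = 1{m(x) >= t}.
thresholds = np.linspace(0.01, 0.99, 99)
costs = [np.mean(c0 * ((y == 0) & (mx >= t)) + c1 * ((y == 1) & (mx < t)))
         for t in thresholds]

best = thresholds[int(np.argmin(costs))]
print(best, c0 / (c0 + c1))  # minimizer should land near the Bayes threshold 1/6
```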
Intuitively, if $c_1$ is much larger than $c_0$, then we care much more about (not) misclassifying individuals with $Y = 1$, which makes us more likely to classify a given covariate as $1$. The decision rule derived above satisfies this intuition: the threshold $c_0 / (c_0 + c_1)$ shrinks toward $0$ as $c_1$ grows. For example, with $c_0 = 1$ and $c_1 = 9$, the threshold is $1/10$, so we predict $1$ whenever $\mathbb{P}(Y = 1 \mid X = x)$ is at least a tenth. In the limit $c_1 / c_0 \to \infty$, it is easy to see that the classifier will always classify an observation as $1$. Finally, when $c_0$ and $c_1$ are the same, we recover the original Bayes classifier, which simply looks at which response is most likely:

$$h^*(x) = \mathbf{1}\left\{m(x) \geq \frac{1}{2}\right\} = \mathbf{1}\left\{\mathbb{P}(Y = 1 \mid X = x) \geq \mathbb{P}(Y = 0 \mid X = x)\right\}.$$