Logistic regression, binary cross-entropy, and error metrics
Logistic regression
- Last class we talked about how we can use linear regression to model linear relationships between features $X$ and a target $y$.
- But not all relationships between features and targets are linear. Sometimes the relationship is categorical (e.g., different instruments or different musical genres).
- Now picture this scenario:
  - You have $N$ datapoints, each with $D$ features, and you have organized them in a matrix $X \in \mathbb{R}^{N \times D}$.
  - Half of these datapoints have features extracted from Violin tones, while the other half were extracted from Bass Tuba tones.
  - You also have a vector $y \in \{0, 1\}^N$, which is filled with zeros and ones, with each "zero" indicating that the features in the corresponding row of $X$ were extracted from a Violin tone, and each "one" indicating a Bass Tuba tone.
- You can use the logistic regression formula to find a weight vector $w \in \mathbb{R}^D$ and a bias term $b$ that allow you to transform the features $X$ into values $\hat{y}$ between $0$ and $1$.
- The logistic regression formula is $\hat{y} = \sigma(Xw + b)$, where $\sigma(z) = \frac{1}{1 + e^{-z}}$ (see the sketch after this list).
- Once we have transformed our features into $\hat{y}$, we can define a threshold (usually $0.5$) below (above) which all values in $\hat{y}$ will be treated as zeros (ones).
- With this procedure, we can assess the performance of our logistic regression model against the ground-truth data $y$.
- Question: how many parameters does logistic regression involve? How about linear regression? Why?
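Here is a minimal NumPy sketch of the prediction and thresholding procedure described in the list above. The function names (`sigmoid`, `predict`) and the toy data are illustrative assumptions, not part of the course materials.

```python
import numpy as np

def sigmoid(z):
    """The logistic function: squashes any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(X, w, b, threshold=0.5):
    """Compute y_hat = sigmoid(Xw + b) and threshold it into 0/1 labels.

    X: (N, D) feature matrix; w: (D,) weight vector; b: scalar bias.
    Returns the probabilities y_hat and the hard labels
    (0 = Violin, 1 = Bass Tuba).
    """
    y_hat = sigmoid(X @ w + b)                 # values between 0 and 1
    labels = (y_hat >= threshold).astype(int)  # above threshold -> 1
    return y_hat, labels

# Toy usage: N = 4 datapoints with D = 3 hypothetical features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))
w = rng.normal(size=3)
b = 0.0
y_hat, labels = predict(X, w, b)
```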
Binary cross-entropy
- Last class, when we optimized linear regression, we used the mean squared error function $L = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$.
- For logistic regression we must use the binary cross-entropy loss, which is defined by $L = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]$ (the origins of this function come from statistics; if you are curious, you should take or review the materials for an introductory machine learning class, like Stanford's CS229).
- Inspecting the binary cross-entropy loss, you can see that when $y_i = 1$, the loss for that datapoint reduces to $-\log(\hat{y}_i)$. In contrast, when $y_i = 0$, it reduces to $-\log(1 - \hat{y}_i)$.
- When minimizing the binary cross-entropy loss using an algorithm like gradient descent, what we are effectively doing is making $\hat{y}$ and $y$ as similar to each other as possible (see the sketch after this list).
- Question: why does the binary cross-entropy loss have a negative sign at the beginning?
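As a sketch of how this loss is computed and minimized, the NumPy code below implements binary cross-entropy and one gradient-descent update. The gradients $\partial L / \partial w = \frac{1}{N} X^\top (\hat{y} - y)$ and $\partial L / \partial b = \frac{1}{N} \sum_i (\hat{y}_i - y_i)$ follow from the standard derivation; the function names and learning rate are illustrative.

```python
import numpy as np

def bce_loss(y, y_hat, eps=1e-12):
    """Binary cross-entropy, averaged over the N datapoints.

    eps keeps log() finite when y_hat saturates at exactly 0 or 1.
    """
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def gradient_descent_step(X, y, w, b, lr=0.1):
    """One gradient-descent update of w and b under the BCE loss."""
    y_hat = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    error = y_hat - y                     # (N,) prediction residuals
    w = w - lr * (X.T @ error) / len(y)   # dL/dw = X^T (y_hat - y) / N
    b = b - lr * np.mean(error)           # dL/db = mean(y_hat - y)
    return w, b
```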
Error metrics for binary classification
- When we are done optimizing our logistic regression model, we must evaluate it using our validation data splits (also, remember the evaluation data?).
- It’s very common practice to calculate a confusion matrix, which tells us the number of:
- true positives
- false negatives
- false positives
- true negatives
- Once we have the confusion matrix, it is also easy to calculate the:
- overall model accuracy
- true positive rate
- true negative rate
- false positive rate (type-I error)
- false negative rate (type-II error)
- Error metrics are essential for interpreting how our model performs on the different data splits of cross-validation (see the sketch below).
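A minimal sketch of the confusion matrix and the rates above, assuming 0/1 NumPy label vectors where 1 is the positive class; the function names are illustrative.

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    """Return (TP, FN, FP, TN) for 0/1 label vectors, with 1 as positive."""
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    return tp, fn, fp, tn

def error_metrics(tp, fn, fp, tn):
    """Derive the five rates listed above from the confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "true_positive_rate": tp / (tp + fn),    # sensitivity / recall
        "true_negative_rate": tn / (tn + fp),    # specificity
        "false_positive_rate": fp / (fp + tn),   # type-I error rate
        "false_negative_rate": fn / (fn + tp),   # type-II error rate
    }
```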
© Iran R. Roman & Camille Noufi 2022