Softmax classification
Multilabel classification
- We have talked about two types of regression: linear and logistic (i.e. binary).
- We also covered how to use gradient descent to optimize a linear regressor and a logistic (i.e. binary) classifier.
- Linear and binary classification are extremely powerful. However, sometimes we want to separate data into $C$ distinct categories. This is known as multilabel (or multiclass) classification.
- As an example, imagine you want to classify the different vowels of human speech.
The PCVC speech dataset
- The Persian Consonant Vowel Combination (PCVC) dataset contains audio for Persian vowels, pronounced in combination with a preceding consonant.
- Imagine that we want to use this dataset to develop a classification algorithm that can hear and identify which Persian vowel is being said.
- You can read more about this dataset in the original publication.
Softmax
- Softmax is a function that allows you to take an input and use it to generate “probabilities” across each of the $C$ classes.
The equation of softmax is
-
If we want to use softmax to classify datapoints , then (where is subtracted for computational stability purposes) and is our matrix with the parameters we are optimizing to carry out classification.
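As a minimal sketch in numpy (the function and example values here are illustrative, not taken from the course materials):

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability; this does not change the result.
    z = z - np.max(z, axis=-1, keepdims=True)
    exp_z = np.exp(z)
    return exp_z / np.sum(exp_z, axis=-1, keepdims=True)

scores = np.array([2.0, 1.0, 0.1])  # raw scores for C = 3 classes
print(softmax(scores))              # ≈ [0.659 0.242 0.099], sums to 1
```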
Cross-entropy loss
- Last class we saw that we can use the binary cross entropy loss to optimize a logistic regression classifier.
- To optimize a softmax (i.e. multiclass) classifier, we will use the categorical cross entropy loss

$$L = -\sum_{i=1}^{C} y_i \log(\hat{y}_i)$$

- $y$, the ground truth, is a one-hot vector, where only one entry is the number $1$ and all other entries are $0$. Each index in $y$ represents a different category. The index with the number $1$ indicates which class the corresponding datapoint (i.e. $x$) truly belongs to.
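A minimal numpy sketch of this loss (the `eps` guard against taking log of zero is an assumption for numerical safety, not part of the formula above):

```python
import numpy as np

def categorical_cross_entropy(y, y_hat, eps=1e-12):
    # y: one-hot ground truth; y_hat: softmax "probabilities".
    return -np.sum(y * np.log(y_hat + eps))

y = np.array([0.0, 1.0, 0.0])      # datapoint truly belongs to class 1
y_hat = np.array([0.2, 0.7, 0.1])  # softmax output
print(categorical_cross_entropy(y, y_hat))  # -log(0.7) ≈ 0.357
```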
Regularization
- So far, the optimization routines we have implemented (linear and logistic regression) can give different solutions for the parameters.
- This is due to the objective function being constrained only by the comparison between a predicted value $\hat{y}$ and its ground truth $y$.
- We can add a term to the loss to regularize the parameters $W$, so that we impose a condition over the type of values that can be present in $W$.
- The most common type of regularization is the squared L2 norm, which results in the loss function $L = -\sum_{i=1}^{C} y_i \log(\hat{y}_i) + \lambda \lVert W \rVert_2^2$, where $\lambda$ is the “regularization strength” term.
- There are other types of regularization though.
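A sketch of how the squared L2 penalty could be added to the categorical cross entropy in numpy (`lam` plays the role of $\lambda$; the names are illustrative):

```python
import numpy as np

def regularized_loss(y, y_hat, W, lam, eps=1e-12):
    # Categorical cross entropy plus the squared L2 norm of the parameters.
    cross_entropy = -np.sum(y * np.log(y_hat + eps))
    l2_penalty = lam * np.sum(W ** 2)
    return cross_entropy + l2_penalty
```

During gradient descent, the penalty simply contributes an extra $2 \lambda W$ term to the gradient with respect to $W$.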
Softmax
Raw audio standardization
- In this homework you will work with raw audio signals.
- Standardizing audio signals is challenging when the number of datapoints is limited.
- What would happen if we standardized raw audio samples the way we standardized other features?
- As a result, our standardization will have to be different.
- For this assignment, we will standardize each datapoint to have samples with zero mean and floating-point values normalized to be in the range of $-1$ and $1$.
- Why is this a good standardization when working with raw audio (especially when datapoints are limited)?
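One way to implement this standardization, assuming each datapoint is a 1-D numpy array of raw audio samples (a sketch, not the official homework solution):

```python
import numpy as np

def standardize_audio(x):
    # Remove the DC offset so samples have zero mean.
    x = x - np.mean(x)
    # Scale by the peak magnitude so values fall in [-1, 1].
    peak = np.max(np.abs(x))
    if peak > 0:
        x = x / peak
    return x.astype(np.float32)
```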
Data augmentation
- When training data is limited, we can use a few techniques to expand the number of datapoints.
- To expand the number of audio datapoints we can create copies of the training data and:
  - mix them with different levels of noise to simulate “noisy” samples.
    - noise can also be added as a series of bursts instead of using uniform noise across samples in a datapoint.
  - shift the pitch of each datapoint using an effect like `librosa.effects.pitch_shift`.
  - convolve with an impulse response to simulate different acoustic conditions.
  - filter with diverse types of digital filters.
  - compress with an audio format like mp3.
- The project `audiomentations` provides you with readily-available functions to augment audio data.
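A sketch of how such an augmentation pipeline might look (adapted from the usage example in the `audiomentations` documentation; parameter names may differ between library versions):

```python
import numpy as np
from audiomentations import Compose, AddGaussianNoise, PitchShift

# Each transform is applied with probability p, with parameters
# sampled uniformly from the given ranges.
augment = Compose([
    AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=0.5),
    PitchShift(min_semitones=-4, max_semitones=4, p=0.5),
])

# A placeholder one-second waveform at 16 kHz stands in for a real datapoint.
samples = np.random.uniform(low=-0.2, high=0.2, size=16000).astype(np.float32)
augmented = augment(samples=samples, sample_rate=16000)
```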
© Iran R. Roman & Camille Noufi 2022