Website for the graduate-level Deep Learning for Music Information Retrieval I & II workshops, covering deep learning theory, literature, and practice as applied to digital audio. More specifically, the course covers generative models (including autoencoders).
Taught at the Center for Computer Research in Music and Acoustics (CCRMA), Stanford University by Iran R. Roman and Camille Noufi as part of the CCRMA Summer 2022 Workshop Series.
Course content
- Introduction and digital audio review
- Audio features and musical objects
- Datasets and dimensionality reduction
- Cross-validation and linear regression
- Logistic regression, binary cross-entropy, and evaluation metrics
- Softmax classification
- Feedforward neural networks
- CNNs and optimization techniques
- Autoencoders
- VAEs and GANs
- Transformers and RNNs
Prerequisites
This is a graduate-level workshop that assumes knowledge of digital audio signal processing, object-oriented programming (we will work with Python 3), differential calculus (chain rule), linear algebra, and basic probability/statistics. To ensure that everybody is on the same page, we will review these concepts as they become relevant to course content. However, if you have never been exposed to these concepts, this course will likely be more challenging than it has to be.
If you need to review these concepts, check out the following:
- Introduction to digital filters (make sure you understand everything in Chapter 1)
- Python review
- Chain rule
- Linear algebra (at least sections 1, 2, and most of 3)
- Normal distribution
Or search online for other materials covering these concepts.
Course logistics
The course runs from August 8th to 19th (2022) and meets daily from 10AM to 5PM (Pacific Daylight Time) in person at CCRMA and over Zoom.
Please register for the course if you would like to attend either in person or online. The course welcomes all students who meet the course prerequisites. All course materials are in English, so strong knowledge of English reading, writing, and speaking is assumed.
Getting help
You may also find a past version of the course’s subreddit, deeplearningaudio, helpful for particular questions relating to this workshop.
© Iran R. Roman & Camille Noufi 2022