Machine learning is a data analysis methodology that allows software to learn from data, identify patterns, and make predictions with minimal human intervention. Using machine learning, researchers can automatically generate high-quality images and write novels. Creating art with machine learning is a state-of-the-art practice that bridges several research areas, including computer science, cognitive science, psychology, and music. In this dissertation, the research objectives are (i) to identify effective machine learning practices for music and audio understanding based on emotional context, advancing knowledge in affective computing, and (ii) to design a generative model for music and investigate melodic anticipation/expectation and information dynamics in machine-generated music.
Music and emotion are strongly linked, and listeners can feel different emotions directly or indirectly through music. Engaging emotion as a component of a musical interface has great potential for composing creative music and conveying messages effectively. However, emotions are not tangible objects that can be readily exploited in music information retrieval, as they are difficult to capture and quantify algorithmically. In this dissertation, we combine machine learning techniques for understanding and extracting emotional context in music and audio data. Several machine learning models are implemented and evaluated to examine the connection between music and emotion in ways that were not previously possible.
First, we examine the technology of music and audio understanding and design an algorithm that improves real-world applications of sound understanding. Next, we introduce a generative machine learning model for automatic music composition that incorporates emotional aspects of musical dynamics into the machine-generated music. To classify music according to emotion, we test the deep audio embeddings method and show its efficacy for the automatic music emotion recognition task. Lastly, we propose an interactive audio interface that sonifies emotion. The idea is to use human facial gesture data to detect emotions and categorize them into several emotional states for sonification. Rather than simply detecting facial gestures, the interface automatically extracts emotional states and produces sound transitions between them. This dissertation makes the technology more accessible for creative purposes, so that people can analyze the emotions of music using machine learning methods and build machine learning applications that draw on the emotional attributes of music.
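To make the deep audio embeddings approach concrete, the sketch below shows one possible music emotion classification pipeline: a pretrained embedding model turns each audio clip into a fixed-size vector, and a lightweight classifier maps those vectors to emotion categories. It is a minimal illustration, not the dissertation's exact implementation; the choice of OpenL3 as the embedding model, the file paths, the emotion labels, and the logistic regression classifier are all assumptions made for the example.

```python
# Minimal sketch of emotion classification from deep audio embeddings.
# Assumes `pip install openl3 soundfile scikit-learn`; dataset paths/labels
# below are hypothetical placeholders, not data from the dissertation.
import numpy as np
import soundfile as sf
import openl3
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report


def clip_embedding(path):
    """Return one fixed-size embedding vector for an audio file."""
    audio, sr = sf.read(path)
    # OpenL3 yields one 512-d embedding per analysis frame; mean-pool over time
    # to obtain a single clip-level vector.
    emb, _ = openl3.get_audio_embedding(
        audio, sr, content_type="music", embedding_size=512
    )
    return emb.mean(axis=0)


# Hypothetical annotated clips: replace with a real emotion-labelled dataset.
paths = ["clip_001.wav", "clip_002.wav", "clip_003.wav", "clip_004.wav"]
labels = ["happy", "sad", "angry", "relaxed"]

X = np.stack([clip_embedding(p) for p in paths])
X_tr, X_te, y_tr, y_te = train_test_split(
    X, labels, test_size=0.25, random_state=0
)

# A simple linear classifier on top of the frozen embeddings.
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
```

Mean-pooling the frame-level embeddings keeps the classifier simple; a temporal model over the embedding sequence would be an equally valid design choice for capturing emotional dynamics over time.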