Semantic structures of timbre emerging from social and acoustic descriptions of music
© Ferrer and Eerola; licensee Springer. 2011
- Received: 25 March 2011
- Accepted: 7 December 2011
- Published: 7 December 2011
The perceptual attributes of timbre have inspired a considerable amount of multidisciplinary research, but because of the complexity of the phenomena, such work has traditionally been confined to laboratory conditions, much to the detriment of its ecological validity. In this study, we present a purely bottom-up approach for mapping the concepts that emerge from sound qualities. A social media service (http://www.last.fm) is used to obtain a wide sample of verbal descriptions of music (in the form of tags) that go beyond the commonly studied concept of genre, and the underlying semantic structure of this sample is then extracted. The structure thereby obtained is evaluated through a careful investigation of the acoustic features that characterize it. The results outline the degree to which such structures in music (connected to affects, instrumentation and performance characteristics) have particular timbral characteristics. Samples representing these semantic structures were then submitted to a similarity rating experiment to validate the findings. The outcome of this experiment strengthened the discovered links between the semantic structures and their perceived timbral qualities. The findings of both the computational and behavioural parts of the study imply that it is possible to derive useful and meaningful structures, transcending musical genres, from free verbal descriptions of music, and that such descriptions can be linked to a set of acoustic features. This approach not only provides insights into the definition of timbre from an ecological perspective, but could also be implemented to develop applications in music information research that organize music collections according to both semantic and sound qualities.
- natural language processing
- vector-based semantic analysis
- music information retrieval
- social media
In this study, we have taken a purely bottom-up approach for mapping sound qualities to the conceptual meanings that emerge from them. We used a social media service (http://www.last.fm) to obtain as wide a sample of music as possible, together with the free verbal descriptions made of the music in this sample, and from these we determined an underlying semantic structure. We then empirically evaluated the validity of the structure obtained by investigating the acoustic features that corresponded to the semantic categories that had emerged. This was done through an experiment in which participants were asked to rate the perceived similarity between acoustic examples of prototypical semantic categories. In this way, we attempted to recover the correspondences between semantic and acoustic features that are ecologically relevant in the perceptual domain. This aim also meant that the study was designed to be exploratory rather than confirmatory. We applied appropriate and recommended techniques for clustering, acoustic feature extraction and comparisons of similarities, but only after assessing the alternatives. Throughout, the main focus of this study has been to demonstrate the elusive link that exists between the semantic, perceptual and physical properties of timbre.
1.1 The perception of timbre
Even short bursts of sound are enough to evoke mental imagery, memories and emotions, and thus provoke immediate reactions, such as the sensation of pleasure or fear. Attempts to build a bridge between such acoustic features and the subjective sensations they provoke have usually started by describing instrument sounds via adjectives on bipolar scales (e.g. bright-dark, static-dynamic) and matching these with more precise acoustic descriptors (such as the envelope shape, or high-frequency energy content) [2, 3]. However, it has been difficult to compare these studies, since such different patterns between acoustic features and listeners' evaluations have emerged. These differences may be attributed to cross-study variations in context effects, as well as in the choice of terms, stimuli and rating scales used. It has also been challenging to link the findings of such studies to the context of actual music, given that real music consists of a complex combination of sounds. A promising approach has been to evaluate short excerpts of recorded music with a combination of bipolar scales and acoustic analysis. However, even this approach may well omit certain sounds and concepts that are important for the majority of people, since the music and scales have usually been chosen by the researcher, not the listeners.
1.2 Social tagging
Social tagging is a way of labelling items of interest, such as songs, images or links, as part of the normal use of popular online services, so that the tags themselves become a form of categorization. Tags are usually semantic representations of abstract concepts, created essentially for mnemonic purposes and typically used to organize items [7, 8]. Within the theory of information foraging, tagging behaviour is one example of a transition from internalized to externalized forms of knowledge where, through transactive memory, people no longer have to know everything themselves but can draw on other people's knowledge. What is most evident in the social context is that what escapes one individual's perception can be captured by another, thus transforming tags into memory or knowledge cues for the undisclosed transaction.
Social tags are usually thought to have an underlying ontology defined simply by people interested in the matter, but with no institutional or uniform direction. These characteristics make the vocabulary and the implicit relations among the terms considerably richer and more complex than in formal taxonomies, where a hierarchical structure and set of rules are designed a priori (cf. folksonomy versus taxonomy). When comparing ontologies based on social tagging with classification by experts, it is presumed that there is an underlying organization of musical knowledge hidden among the tags. But, as raised by Celma and Serra, this should perhaps not be taken for granted. For this reason, Section 2 addresses the uncovering of an ontology from the tags in an unsupervised form, to investigate whether such an ontology emerges on its own rather than being an imposed construction. Because a latent structure has been assumed, we use a technique called vector-based semantic analysis, which is a generalization of Latent Semantic Analysis and similar to the methods used in latent semantic mapping and latent perceptual indexing. Thus, although some of the terminology is borrowed from these areas, our method differs in several crucial respects. While ours is designed to explore emergent structures in the semantic space (i.e. clusters of musical descriptions), the other methods are designed primarily to improve information retrieval by reducing the dimensionality of the space. In our method, the reduction is not part of the analytical step, but is instead implemented as a pre-filtering stage (see Appendix sections A.1 and A.2). The indexing of documents (songs in our case) is also treated separately in Section 2.2, which presents our solution based on the Euclidean distances of cluster profiles in a vector space. The reasons outlined above show that tags, and the structures that can be derived from them, impart crucial cues about how people organize and make sense of their experiences, which in this case is music and in particular its timbre.
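As an illustration of this kind of vector-space treatment, the sketch below builds a toy song-by-tag matrix and ranks songs by the Euclidean distance between their tag profiles and a cluster profile. The matrix contents, the weighting and the uniform cluster profile are hypothetical simplifications of the procedure described in Section 2.

```python
import numpy as np

# Hypothetical toy example: rows = songs, columns = tags; entries are
# normalized tag-application frequencies (the paper's actual weighting
# is described in its Section 2 and is not reproduced here).
songs = ["song_a", "song_b", "song_c"]
tags = ["mellow", "sad", "rousing", "piano"]
X = np.array([[0.9, 0.7, 0.0, 0.4],
              [0.1, 0.2, 0.8, 0.0],
              [0.0, 0.1, 0.3, 0.9]])

# A cluster is a set of tags; here its profile is a vector in tag space
# with equal weight on the member tags and zero elsewhere.
cluster = [0, 1]                          # e.g. a "mellow"/"sad" cluster
profile = np.zeros(len(tags))
profile[cluster] = 1.0 / len(cluster)

# Rank songs by the Euclidean distance between their tag profile and the
# cluster profile; the nearest songs are the cluster's best exemplars.
dists = np.linalg.norm(X - profile, axis=1)
for i in np.argsort(dists):
    print(songs[i], round(float(dists[i]), 3))
```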
Previous research on explaining the semantic qualities of music in terms of its acoustic features has taken many forms: genre discrimination tasks [36, 37], the description of soundscapes, bipolar ratings encompassing a set of musical examples and the prediction of musical tags from acoustic features [21, 38–40]. A common approach in these studies has been to extract a range of features, often low-level ones such as timbre, dynamics, articulation and Mel-frequency cepstral coefficients (MFCCs), and to subject them to further analysis. The parameters of the actual feature extraction depend on the goals of the particular study; some focus on shorter musical elements, particularly the MFCCs and their derivatives [21, 39, 40], while others utilize higher-level concepts, such as harmonic progression [41–43].
In this study, the aim was to characterize the semantic structures with a combined set of non-redundant, robust low-level acoustic and musical features suitable for this particular set of data. These requirements meant that we employed various data reduction operations to arrive at a stable and compact list of acoustic features suitable for this particular dataset. Initially, we considered a large number of acoustic and musical features divided into the following categories: dynamics (e.g. root-mean-square energy); rhythm (e.g. fluctuation and attack slope); spectral (e.g. brightness, roll-off [47, 48], spectral regularity and roughness); spectro-temporal (e.g. spectral flux) and tonal features (e.g. key clarity and harmonic change). By considering the mean and variance of these features across 5-s samples of the excerpts (details given in the following section), we were initially presented with 50 possible features. However, these features contained significant redundancy, which limits the feasibility of constructing predictive classification or regression models and also hinders the interpretation of the results. For this reason, we did not include MFCCs, since they are particularly problematic in terms of redundancy and interpretation.
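As an illustration of this kind of redundancy reduction, the sketch below drops one feature from every highly correlated pair in a plain features-by-excerpts matrix; the 0.9 threshold and the greedy strategy are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def prune_redundant(features: np.ndarray, names: list, threshold: float = 0.9):
    """Greedily drop one feature from each pair whose absolute Pearson
    correlation exceeds the threshold. `features` is (n_excerpts, n_features)."""
    corr = np.abs(np.corrcoef(features, rowvar=False))
    keep = []
    for j in range(features.shape[1]):
        # Keep feature j only if it is not too correlated with any kept one.
        if all(corr[j, k] <= threshold for k in keep):
            keep.append(j)
    return features[:, keep], [names[j] for j in keep]

# Hypothetical usage with random data standing in for the real features:
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
X[:, 5] = X[:, 0] + 0.01 * rng.normal(size=100)   # a nearly duplicate feature
Xr, kept = prune_redundant(X, [f"f{i}" for i in range(6)])
print(kept)   # f5 is dropped as redundant with f0
```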
The features were extracted with the MIRtoolbox using a frame-based approach, with analysis frames of 50 ms and a 50% overlap for the dynamic, rhythmic, spectral and spectro-temporal features, and frames of 100 ms with an overlap of 87.5% for the remaining tonal features.
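The extraction itself was performed in MATLAB with the MIRtoolbox; purely as an illustration, the sketch below reproduces the equivalent frame decomposition in Python. The hop sizes follow from the stated overlaps (50% of 50 ms gives a 25-ms hop; 87.5% of 100 ms gives a 12.5-ms hop), and the signal is a random placeholder.

```python
import numpy as np

def frame_signal(x: np.ndarray, sr: int, frame_ms: float, overlap: float):
    """Split a mono signal into overlapping analysis frames.

    frame_ms: frame length in milliseconds; overlap: fraction in [0, 1).
    Returns an array of shape (n_frames, frame_len)."""
    frame_len = int(sr * frame_ms / 1000)
    hop = int(frame_len * (1.0 - overlap))
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop: i * hop + frame_len] for i in range(n_frames)])

sr = 44100
x = np.random.randn(sr * 5)                        # stand-in for a 5-s excerpt
spectral_frames = frame_signal(x, sr, 50, 0.50)    # 50-ms frames, 50% overlap
tonal_frames = frame_signal(x, sr, 100, 0.875)     # 100-ms frames, 87.5% overlap
print(spectral_frames.shape, tonal_frames.shape)
```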
The 20 selected acoustic features
3.1 Classification of the clusters based on acoustic features
To investigate whether the clusters differed in their acoustic qualities, four sets of excerpts were prepared to represent them. For each cluster, the 50 most representative songs were selected using the ranking operation defined in Section 2.2. This number was chosen because an analysis of the rankings within clusters showed that the top 50 songs per cluster remained predominantly within the target cluster alone (89%), whereas this discriminative property became less clear with larger sets (100 songs at 80%, 150 songs at 71% and so on). From these candidates, two random 5-s excerpts were then extracted from each song to establish two sets, used to train and test each classification, respectively. For the 19 clusters, this resulted in 950 excerpts per set; and for the 5 meta-clusters, it resulted in 250 excerpts per set. After this, classification was carried out using Random Forest (RF) analysis. RF is a recent variant of the regression tree approach, which constructs classification rules by recursively partitioning the observations into smaller groups based on a single variable at a time. These splits are created to maximize the between-groups sum of squares. Being non-parametric, regression trees are able to uncover hierarchical structure in the observations, while allowing interactions and nonlinearity between the predictors. RF is designed to overcome the problem of overfitting; bootstrapped samples are drawn to construct multiple trees (typically 500 to 1000), each using randomized subsets of predictors. Out-of-bag samples are used to estimate the error rate and variable importance, eliminating the need for cross-validation, although in this particular case we still resorted to validation with a separate test set. Another advantage of RF is that it depends on only one tuning parameter, namely the number of predictors chosen randomly at each node, heuristically set to 4 in this study. Most applications of RF have demonstrated improved accuracy in comparison with other supervised learning methods.
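A minimal sketch of this classification setup, using scikit-learn's RandomForestClassifier rather than the implementation used by the authors; the 500 trees, the four predictors per split and the out-of-bag estimate follow the description above, while the random matrices are placeholders for the real acoustic features (so the accuracies printed are at chance level).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
n_features = 20                                 # the 20 selected acoustic features
X_train = rng.normal(size=(950, n_features))    # 19 clusters x 50 excerpts
y_train = np.repeat(np.arange(19), 50)
X_test = rng.normal(size=(950, n_features))     # the second, held-out excerpt set
y_test = np.repeat(np.arange(19), 50)

rf = RandomForestClassifier(
    n_estimators=500,      # number of bootstrapped trees
    max_features=4,        # predictors drawn at random at each node
    oob_score=True,        # out-of-bag estimate of the error rate
    random_state=0,
)
rf.fit(X_train, y_train)
print("OOB accuracy:", rf.oob_score_)
print("Test accuracy:", rf.score(X_test, y_test))
print("Variable importance:", rf.feature_importances_.round(3))
```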
Confusion matrix for five meta-clusters (showing 54.8% success in RF classification)
The classification results imply that the acoustic correlates of the clusters can be established if we are looking only at the broadest semantic level (meta-clusters). Even then, however, some of the meta-clusters were not adequately discriminated by their acoustical properties. This and the analysis with all 19 clusters suggest that many of the pairs of clusters have similar acoustic contents and are thus indistinguishable in terms of classification analysis. However, there remains the possibility that the overall structure of the cluster solution is nevertheless distributed in terms of the acoustic features along dimensions of the cluster space. The cluster space itself will therefore be explored in more detail next.
3.2 Acoustic characteristics of the cluster space
As classifying the clusters according to their acoustic features was not hugely accurate at the most detailed cluster level, another approach was taken: to define the differences between the clusters in terms of their mutual distances. This approach examined their underlying acoustic properties in more detail; in other words, whether there were any salient acoustic markers delineating the concepts of cluster 19 ("Rousing, Exuberant, Confident, Playful, Passionate") from the "Mellow, Beautiful, Chillout, Chill, Sad" tags of cluster 7, even though the actual boundaries between the clusters were blurred.
The distance between two clusters was defined by the single-linkage criterion,

$d(C_i, C_j) = \min_{x \in C_i,\; y \in C_j} \delta(x, y),$

where $C_i$ and $C_j$ represent a pair of clusters, x and y two different tags, and $\delta(x, y)$ the distance between those tags in the vector space.
Nevertheless, before settling on this method of single linkage, we checked three other intercluster distance measures (Hausdorff, complete and average) for purposes of comparison. Single linkage was finally chosen due to its intuitive and discriminative performance, both on this material and in general.
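For concreteness, the sketch below computes the four intercluster distances just mentioned for two clusters given as coordinate matrices in the vector space; the cluster contents are random placeholders.

```python
import numpy as np
from scipy.spatial.distance import cdist

def intercluster_distances(A: np.ndarray, B: np.ndarray) -> dict:
    """Single, complete, average and Hausdorff distances between two
    clusters given as (n_points, n_dims) coordinate matrices."""
    D = cdist(A, B)                                    # all pairwise distances
    return {
        "single": D.min(),                             # closest pair
        "complete": D.max(),                           # farthest pair
        "average": D.mean(),                           # mean over all pairs
        "hausdorff": max(D.min(axis=1).max(),          # directed Hausdorff,
                         D.min(axis=0).max()),         # symmetrized
    }

rng = np.random.default_rng(2)
A = rng.normal(0.0, 1.0, size=(10, 5))   # hypothetical tag coordinates
B = rng.normal(2.0, 1.0, size=(12, 5))
print(intercluster_distances(A, B))
```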
The number of dimensions m needed to represent the cluster distances was estimated from the proportion of variance explained by the largest eigenvalues,

$P_m = \frac{\sum_{i=1}^{m} \lambda_i}{\sum_{i=1}^{p} \lambda_i},$

where p is the number of dimensions and $\lambda_i$ represents the eigenvalues sorted in decreasing order.
Table 5. Correlations between acoustic features and the inter-item distances between the clusters. The features correlated were: fluctuation centroid (M), chromagram centroid (M), harmonic change (SD), attack time (M), harmonic change (M), chromagram centroid (SD), attack time (SD) and chromagram peak (M) (M = mean, SD = standard deviation).
In order to explore whether the obtained clusters were perceptually meaningful, and to further understand what kinds of acoustic and musical attributes they actually consisted of, new empirical data about the clusters needed to be gathered. For this purpose, a similarity rating experiment was designed, which assessed the timbral qualities of songs from each of the tag clusters. We chose to focus on the low-level, non-structural qualities of music, since we wanted to minimize the possible confounding factor of association caused by recognition of lyrics, songs or artists. The stimuli for the experiment therefore consisted of brief, semi-randomly spliced excerpts [37, 65]. These stimuli, together with other details of the experiment, are explained more fully in the remaining parts of this section.
4.1 Experiment details
Five-second excerpts were randomly taken from the middle part (P(t) for 0.25T ≤ t ≤ 0.75T, where T represents the total duration of a song) of each of the 25 top-ranked songs from each cluster (see the ranking procedure detailed in Section 2.2). However, when splicing the excerpts together for similarity rating, we wanted to minimize the confounds caused by disrupting the onsets (i.e. bursts of energy). Therefore, the exact temporal position of the onsets in each excerpt was detected with the aid of the MIRtoolbox. This process consisted of computing the spectral flux within each excerpt, focussing on the increase in energy between successive frames. It produced a temporal curve from which the highest peak was selected as the reference point for taking a slice, provided that this point was not too close to the end of the signal (t ≤ 4500 ms).
Slices of random length (150 ≤ t ≤ 250 ms) were then taken from a point that was 10 ms before the peak onset for each excerpt that was being used to represent a tag cluster. The slices were then equalized in loudness, and finally mixed together using a fade in/out of 50 ms and an overlap window of 100 ms. This resulted in 19 stimuli (examples of the spliced stimuli can be found at http://www.jyu.fi/music/coe/materials/splicedstimuli) of variable length, each corresponding to a cluster, and each of which was finally trimmed to 1750 ms (with a fade in/out of 100 ms). To finally prepare these 19 stimuli for a similarity rating experiment, the resulting 171 paired combinations were mixed with a silence of 600 ms between them.
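A condensed sketch of this splicing pipeline, assuming mono excerpts at a common sample rate; a simple spectral-flux peak stands in for the MIRtoolbox onset detector, loudness equalization is omitted for brevity, and the fade, overlap and trim values follow the description above.

```python
import numpy as np

SR = 44100

def flux_peak(x: np.ndarray, frame: int = 1024, hop: int = 512) -> int:
    """Sample index of the largest spectral-flux peak, a crude stand-in
    for the MIRtoolbox onset curve used in the paper."""
    mags = [np.abs(np.fft.rfft(x[i:i + frame]))
            for i in range(0, len(x) - frame, hop)]
    flux = [np.sum(np.maximum(b - a, 0)) for a, b in zip(mags, mags[1:])]
    peak_frame = int(np.argmax(flux)) + 1
    return min(peak_frame * hop, int(4.5 * SR))    # keep away from the end

def take_slice(x: np.ndarray, rng) -> np.ndarray:
    """Slice of random length (150-250 ms) starting 10 ms before the peak."""
    start = max(flux_peak(x) - int(0.010 * SR), 0)
    length = int(rng.uniform(0.150, 0.250) * SR)
    return x[start:start + length]

def splice(slices, fade_ms=50, overlap_ms=100):
    """Mix slices with linear fade in/out and a fixed overlap window,
    then trim the result to 1750 ms."""
    fade, overlap = int(fade_ms / 1000 * SR), int(overlap_ms / 1000 * SR)
    out = np.zeros(sum(len(s) for s in slices))
    pos = 0
    for s in slices:
        s = s.copy()
        s[:fade] *= np.linspace(0, 1, fade)        # fade in
        s[-fade:] *= np.linspace(1, 0, fade)       # fade out
        out[pos:pos + len(s)] += s
        pos += len(s) - overlap
    return out[:int(1.750 * SR)]

rng = np.random.default_rng(4)
excerpt = np.random.randn(5 * SR)                  # stand-in for a 5-s excerpt
stimulus = splice([take_slice(excerpt, rng) for _ in range(25)])
print(len(stimulus) / SR, "s")
```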
Twelve females and nine males participated in this experiment (age M = 26.8, SD = 4.15). Nine of them had at least 1 year of musical training. Twelve reported listening to music attentively between 1 and 10 h/week, and 19 of the subjects listened to music while doing another activity (63% 1 ≤ t ≤ 10, 26% 11 ≤ t ≤ 20, 11% t ≥ 21 h/week).
Participants were presented with pairs of sound excerpts in random order using a computer interface and high-quality headphones. Their task was to rate the similarity of sounds on a 9-level Likert scale, the extremes of which were labelled as dissimilar and similar. Before the actual experimental trials, the participants were also given instructions and some practice to familiarize themselves with the task.
4.2 Results of experiment
The level at which participants' ratings agreed with each other was estimated with Cronbach's method (α = 0.94), and the similarity matrices derived from their ratings were used to build a representation of the perceptual space. Individual responses were aggregated by computing a mean similarity matrix, which was then subjected to a classical metric MDS analysis. Using Cox and Cox's method (8), we estimated that four dimensions were enough to represent the original space, since these explain 70% of the variance.
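A minimal sketch of this step, assuming a mean dissimilarity matrix is already available: classical (Torgerson) scaling by double-centring and eigendecomposition, reporting the proportion of variance explained per retained dimension as in Cox and Cox's criterion.

```python
import numpy as np

def classical_mds(D: np.ndarray, m: int = 4):
    """Classical metric MDS of a symmetric dissimilarity matrix D.
    Returns the m-dimensional configuration and the share of variance
    explained by each retained dimension."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centred Gram matrix
    lam, V = np.linalg.eigh(B)
    order = np.argsort(lam)[::-1]              # eigenvalues, decreasing
    lam, V = lam[order], V[:, order]
    lam_m = np.clip(lam[:m], 0.0, None)        # guard against tiny negatives
    explained = lam_m / lam[lam > 0].sum()
    coords = V[:, :m] * np.sqrt(lam_m)
    return coords, explained

# Hypothetical usage with a random symmetric matrix standing in for the
# 19 x 19 mean dissimilarity matrix from the rating experiment:
rng = np.random.default_rng(3)
P = rng.random((19, 2))
D = np.linalg.norm(P[:, None] - P[None, :], axis=-1)
coords, explained = classical_mds(D, m=4)
print("variance explained per dimension:", explained.round(3))
```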
4.2.1 Perceptual distances
4.2.2 Acoustic attributes of the similarities between stimuli
Table 6. Correlations between the MDS solution (dimensions 1 and 2) and acoustic features for the experiment
The first dimension correlates with features related to the organization of pitch and harmonics, as revealed by the mean chromagram peaks (r = 0.82) and the degree of variation between successive peaks in the spectrum (mean spectral regularity r = 0.72). There is also correlation with the variance of the energy distribution (standard deviation of the spectral roll-off at 95% r = 0.7); the distance between the spectrum of successive frames (mean spectral flux r = -0.7); and to a lesser degree with the shape of the spectrum in terms of its "width" (mean spectral spread r = -0.61). The second dimension correlates significantly with the perceived dissonance (mean roughness r = -0.74); pitch salience (chromagram centroid r = -0.72); and also captures the mean spectral spread (r = 0.65), although in an inverse fashion. Table 6 provides a more detailed summary of this.
4.2.3 Comparing a semantic structure based on social tags, to one based on perceptual similarities
As we have now explored the emergent structure from tags using a direct acoustic analysis of the best exemplars in each cluster, and probed this semantic space further in a perceptual experiment, the question remains as to whether the two approaches bear any similarities. The most direct way to examine this is to look at the pattern of correlations in both, i.e. to compare Tables 5 and 6. Although the lists of features vary slightly, due to the difference in redundancy and robustness criteria applied to each set of data, convergent patterns can still be found. An important shared feature is the variation in brightness, which is present both in dimension 1 of the direct cluster analysis and in the perceptual space depicting the spliced stimuli (from the same 19 clusters). In the first case, it takes the form of brightness SD, and in the second, roll-off SD (virtually identical measures). In addition, the second dimension in both solutions is characterized by roughness, although the underlying polarities of the space are flipped in each. Of course, one major reason for differences between the two sets of data must be the effects of splicing, conducted in the perceptual experiment but not in the other analysis. Nevertheless, analogies between the two perspectives of the semantic structure could be detected in the acoustic substrates. The spliced excerpts were used here precisely to highlight features that are little affected by form, harmony, lyrics and other high-level musical (and extra-musical) characteristics. From this perspective, a tentative convergence between the two approaches was successfully obtained.
Semantic structures within music have been extracted from social media previously [20, 25], but the main difference between the prior genre-based studies and this one is that we focussed more on the way people describe music in terms of how it sounds, in conceptual expressions. We argue that these expressions are more stable than musical genres, which have previously proven to be of a transient nature and a source of disagreement, despite important arguments vindicating their value for classification systems. Perhaps the biggest problem with expert classifications (such as genre) is that the result may not reach the same level of ecological validity in describing how music sounds as a semantic structure derived from social tags. This is a very important reason to examine tag-based semantic structures further, in spite of their inherent weaknesses, as pointed out by Lamere.
A second way in which this study differs from previous ones lies in the careful filtering of the retrieved tags, using manual and automatic methods, before the actual analysis of the semantic structures was conducted. In addition, a prudent trimming of the acoustic features was carried out to avoid overfitting and any unnecessary increase in model complexity. Finally, a perceptual exploration of the semantic structure found was carried out to assess whether the sound qualities alone would be sufficient to uncover the tag-based structure.
The whole design of this study offers a preliminary approach to the cognition of timbre in semantic terms. In other words, it uses verbal descriptions of music, expressed by the general population (in the form of social tags), as a window onto how a critical feature of music (timbre) is represented in semantic memory. It is, however, evident that if each major step of this study were treated separately, there would be plenty of room for refining the respective methodologies, namely tag filtering, uncovering the semantic structure, acoustic summarization and the perceptual experiment comparing the two empirical perspectives. That being said, we did consider some of the alternatives for these steps in order to avoid methodological pitfalls (particularly in the clustering and the distance measures). But even if each analytical step were optimized to enhance the solution to an isolated part of the problem, this would inevitably come at the expense of unbalancing the overall picture. Since this study relies on an exploratory approach, we chose mainly conventional techniques for each step, with the expectation that further research will corroborate the findings and improve the techniques used here.
The usefulness of signal summarization based on the random splicing method has been assessed for audio pattern recognition. Our findings in the perceptual domain seem to vindicate the method when listeners rate sounds differing in timbral qualities, especially if the scope is the long-term, non-structural qualities of music. Such a focus is attained by cutting the slices in a way that preserves important aspects of the music (onsets and sample lengths), while ensuring that they come from a wide cross-section of timbrally related songs (i.e. belonging to the same semantic region or timbral environment in the perceptual space).
In conclusion, this study provided a bottom-up approach for finding the semantic qualities of music descriptions, capitalizing on the benefits of social media, NLP, similarity ratings and acoustic analysis to do so. We learned that when listeners are presented with brief, spliced excerpts taken from the clusters representing a tag-based categorization of the music, they are able to form coherent distinctions between them. Through an acoustic analysis of the excerpts, clear correlations between the dimensional and timbral qualities of the music emerged. It should be emphasized that the high relevance of many timbral features is only natural, since the timbral characteristics of the excerpts were preserved while structural aspects were masked by the semi-random splicing. Nevertheless, we are positively surprised at the level of coherence between the listener ratings and their explanations in terms of the acoustic features, in spite of the limitations we imposed on the setting with the random splicing method, and the fact that we tested a large number of clusters.
The implications of the present findings relate to several open issues. The first is whether structural aspects of music are required to explain the semantic structures or whether low-level, timbral characteristics are sufficient, as was suggested by the present findings. Secondly, what new semantic layers (as indicated by the categories of tags) can meaningfully be connected with the acoustic properties of the music? Finally, if the timbral characteristics are indeed strongly connected with such semantic layers as adjectives, nouns and verbs, do these arise by means of learning and associations or are the underlying regularities connected with the emotional, functional and gestural cues of the sounds?
A natural continuation of this study would be to go deeper into the different layers of tags to explore which layers are more amenable to direct mapping by acoustic qualities, and which are mostly dependent on the functional associations and cultural conventions of the music.
Preprocessing is necessary in any text-mining application, because the retrieved data do not follow any particular set of rules and there are no standard steps to follow. Moreover, with the aid of Natural Language Processing (NLP) methods [71, 72], it is possible to explore the nature of the tags from statistical and lexicological perspectives. In the following sections, the rationale and explanation for each preprocessing step is given.
Three filtering rules were applied to the corpus (illustrated in the sketch below):

1. Remove hapax legomena (i.e. tags that appear only once in the corpus), under the rationale of discarding unrelated data (see Table 1).
2. Capture the most prevalent tags by eliminating from the vocabulary those whose index of usage (see Section 2) is below the mean.
3. Discard tags composed of three or more words, in order to prune short sentence-like descriptions from the corpus.
The subset resulting from such reductions represents 46.6% of the corpus (N = 169,052; vocabulary = 2029 tags).
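A minimal sketch of the three filters on a toy tag list, assuming the corpus is simply a list of tag applications and using the mean count as the prevalence threshold; the paper's actual index of usage is defined in Section 2.

```python
from collections import Counter

# Hypothetical corpus: each element is one tag application.
corpus = (3 * ["mellow"] + 3 * ["piano"] + 3 * ["late romantic"]
          + 3 * ["songs to drive to"] + ["sad", "xyzzzy"])

counts = Counter(corpus)

# Rule 1: remove hapax legomena (tags appearing only once).
counts = {t: c for t, c in counts.items() if c > 1}

# Rule 2: keep only tags whose usage is at or above the mean usage.
mean_usage = sum(counts.values()) / len(counts)
counts = {t: c for t, c in counts.items() if c >= mean_usage}

# Rule 3: discard tags made of three or more words.
vocabulary = {t: c for t, c in counts.items() if len(t.split()) < 3}

print(vocabulary)   # {'mellow': 3, 'piano': 3, 'late romantic': 3}
```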
A.2 Lexical categories for tags
Main categories of tags (category: example tags):

- Musical genre or style: Rock, Alternative, Pop
- General category of adjectives: Beautiful, Mellow, Awesome
- General category of nouns: Love, Melancholy, Memories
- Artists or group names: Coldplay, Radiohead, Queen
- Geographic situation or locality: British, American, Finnish
- Words used to manage personal collections: Seen Live, Favourites, My Radio
- Instrumentation: Female vocalists, Piano, Guitar
- Unknown terms: aitch, prda, < 3
- Temporal references: 80's, 2000, Late Romantic
- Musical form or compositional technique: Ballad, Cover, Fusion
- Record label, radio station, etc.: Motown, Guitar Hero, Disney
- General category of verbs: Chillout, Relax, Wake up
- Emphasis in the message or literary content: Political, Great lyrics, Love song
- Expressions: Wow, Yeah, lol
The greatest percentage of tags refers to musical genres, but there are significant percentages in other categories. For instance, the second most common tags are adjectives, followed by nouns, which, except for some particular contextual connotations, are mostly used adjectivally to describe the general sound of a song (e.g. mellow and beautiful for adjectives, memories and melancholy for nouns).
The rest of the categories suggest that music is often tagged in terms of association, whether it be to known auditory objects (e.g. instruments and band names), specific circumstances (e.g. geographical locations and time of the day or season) or idiosyncratic things that only make sense at a personal level. This classification is mainly consistent with past efforts, although the vocabulary analysed here is larger, and there are consequently more categories.
The result allowed a finer discrimination of tags to be made, one that might better uncover the semantic structure. Since one of the main motivations of this project was to obtain prototypical timbral descriptions, we focused on only a few of the categories (adjectives, nouns, instruments, temporal references and verbs), resulting in a vocabulary of 618 tags.
The rest of the tag categories were left for future analysis. Note that this meant discarding such commonly used descriptors as musical genres, which on the one hand provide an easy way to discriminate music in terms of fairly broad categories, but on the other hand are hard to define adequately by virtue of this very same quality. This manuscript is devoted to exploring timbre, and by extension the way people describe the general sound of a piece of music; hence the idea has been to explore the concepts that lie underneath the genre descriptions. For this reason, genre was used as the most significant semantic filter. The other discarded categories had their own reasons: Personal and Locale contents are strongly centred on the individual's perspective, and Artist contents refer redundantly to the creator or performer of the music. The remaining omissions concerned categories that were either rare (e.g. unknown terms, expressions, commercial brands or recording companies) or not explicitly related to timbre (e.g. musical form, descriptions of the lyrics); these were left out to simplify the results.
This study was supported by the Finnish Centre of Excellence in Interdisciplinary Music Research.
- Celma O, Serra X: FOAFing the Music: Bridging the semantic gap in music recommendation. Web Semantics: Science, Services and Agents on the World Wide Web 2008, 6(4): 250-256. doi:10.1016/j.websem.2008.09.004
- Grey J: Multidimensional Perceptual Scaling of Musical Timbres. The Journal of the Acoustical Society of America 1977, 61(5): 1270-1277. doi:10.1121/1.381428
- McAdams S, Winsberg S, Donnadieu S, De Soete G, Krimphoff J: Perceptual Scaling of Synthesized Musical Timbres: Common dimensions, specificities and latent subject classes. Psychological Research 1995, 58(3): 177-192. doi:10.1007/BF00419633
- Burgoyne J, McAdams S: A Meta-analysis of Timbre Perception Using Nonlinear Extensions to CLASCAL. Computer Music Modeling and Retrieval: Sense of Sounds 2009, 181-202.
- Aucouturier JJ, Pachet F, Sandler M: "The Way it Sounds": Timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia 2005, 7(6): 1028-1035.
- Alluri V, Toiviainen P: Exploring Perceptual and Acoustical Correlates of Polyphonic Timbre. Music Perception 2010, 27(3): 223-242. doi:10.1525/mp.2010.27.3.223
- Lamere P: Social Tagging and Music Information Retrieval. Journal of New Music Research 2008, 37(2): 101-114. doi:10.1080/09298210802479284
- Aucouturier JJ, Pampalk E: Introduction - From Genres to Tags: A little epistemology of music information retrieval research. Journal of New Music Research 2008, 37(2): 87-92. doi:10.1080/09298210802479318
- Held C, Cress U: Learning by Foraging: The impact of social tags on knowledge acquisition. In Learning in the Synergy of Multiple Disciplines: 4th European Conference on Technology Enhanced Learning. Nice, France; 2009.
- Hesse F: Use and Acquisition of Externalized Knowledge. In Learning in the Synergy of Multiple Disciplines: 4th European Conference on Technology Enhanced Learning. Nice, France: Springer; 2009: 5.
- Chi E: Augmented Social Cognition: Using social web technology to enhance the ability of groups to remember, think, and reason. In Proceedings of the 35th SIGMOD International Conference on Management of Data. Providence, Rhode Island, USA; 2009.
- Kim H, Decker S, Breslin J: Representing and Sharing Folksonomies with Semantics. Journal of Information Science 2010, 36: 57-72. doi:10.1177/0165551509346785
- Mathes A: Folksonomies - Cooperative Classification and Communication Through Shared Metadata. Online; 2004. http://www.adammathes.com/academic/computer-mediated-communication/folksonomies.html
- Lin H, Davis J, Zhou Y: An Integrated Approach to Extracting Ontological Structures from Folksonomies. In Proceedings of the 6th European Semantic Web Conference on The Semantic Web: Research and Applications. Heraklion, Greece: Springer; 2009: 668.
- Deerwester S, Dumais S, Furnas G, Landauer T, Harshman R: Indexing by Latent Semantic Analysis. Journal of the American Society for Information Science 1990, 41(6): 391-407.
- Bellegarda J: Latent Semantic Mapping: Principles & applications. Synthesis Lectures on Speech and Audio Processing 2007, 3: 1-101. doi:10.2200/S00048ED1V01Y200609SAP003
- Sundaram S, Narayanan S: Audio Retrieval by Latent Perceptual Indexing. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2008). IEEE; 2008: 49-52.
- Dumais S: Latent Semantic Analysis. Annual Review of Information Science and Technology (ARIST) 2004, 38: 189-230.
- Eerola T, Ferrer R: Setting the Standards: Normative data on audio-based musical features for musical genres. In Proceedings of the 7th Triennial Conference of the European Society for the Cognitive Sciences of Music. Jyväskylä, Finland; 2009.
- Levy M, Sandler M: Learning Latent Semantic Models for Music from Social Tags. Journal of New Music Research 2008, 37(2): 137-150. doi:10.1080/09298210802479292
- Bertin-Mahieux T, Eck D, Maillet F, Lamere P: Autotagger: A model for predicting social tags from acoustic features on large music databases. Journal of New Music Research 2008, 37(2): 115-135. doi:10.1080/09298210802479250
- Rentfrow P, Gosling S: Message in a Ballad. Psychological Science 2006, 17(3): 236-242. doi:10.1111/j.1467-9280.2006.01691.x
- Delsing M, ter Bogt T, Engels R, Meeus W: Adolescents' Music Preferences and Personality Characteristics. European Journal of Personality 2008, 22(2): 109-130. doi:10.1002/per.665
- Popescu I, Altmann G: Word Frequency Studies. Berlin: Walter de Gruyter; 2009.
- Levy M, Sandler M: A Semantic Space for Music Derived from Social Tags. In Proceedings of the 8th International Society for Music Information Retrieval Conference, Volume 1. Edited by Dixon S, Bainbridge D, Typke R. Vienna, Austria: Österreichische Computer Gesellschaft; 2007: 12.
- Zhang B, Xiang Q, Lu H, Shen J, Wang Y: Comprehensive Query-dependent Fusion Using Regression-on-folksonomies: A case study of multimodal music search. In Proceedings of the 17th ACM International Conference on Multimedia. Beijing, China: ACM; 2009: 213-222.
- Halpin H, Robu V, Shepherd H: The Complex Dynamics of Collaborative Tagging. In Proceedings of the 16th International Conference on World Wide Web. Banff, Alberta, Canada: ACM; 2007: 220.
- Brank J, Grobelnik M, Mladenic D: Automatic Evaluation of Ontologies. In Natural Language Processing and Text Mining. Edited by Kao A, Poteet SR. USA: Springer; 2007.
- Siskind J: Learning Word-to-meaning Mappings. In Models of Language Acquisition: Inductive and Deductive Approaches. USA: Oxford University Press; 2000: 121-153.
- Zipf G: Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. Addison-Wesley Press; 1949.
- Gower J, Legendre P: Metric and Euclidean Properties of Dissimilarity Coefficients. Journal of Classification 1986, 3: 5-48. doi:10.1007/BF01896809
- Walesiak M, Dudek A: clusterSim: Searching for optimal clustering procedure for a data set. 2011. R package version 0.39-2. http://CRAN.R-project.org/package=clusterSim
- R Development Core Team: R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria; 2009. ISBN 3-900051-07-0. http://www.R-project.org
- Jain A, Dubes R: Algorithms for Clustering Data. Englewood Cliffs, NJ: Prentice Hall; 1988.
- Langfelder P, Zhang B, Horvath S: dynamicTreeCut: Methods for detection of clusters in hierarchical clustering dendrograms. 2009. R package version 1.20. http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/BranchCutting/
- Tzanetakis G, Cook P: Musical Genre Classification of Audio Signals. IEEE Transactions on Speech and Audio Processing 2002, 10(5): 293-302. doi:10.1109/TSA.2002.800560
- Gjerdingen R, Perrott D: Scanning the Dial: The rapid recognition of music genres. Journal of New Music Research 2008, 37(2): 93-100. doi:10.1080/09298210802479268
- Hoffman M, Blei D, Cook P: Easy as CBA: A simple probabilistic model for tagging music. In Proceedings of the 10th International Society for Music Information Retrieval Conference. Kobe, Japan; 2009.
- Mandel MI, Ellis DPW: A Web-based Game for Collecting Music Metadata. Journal of New Music Research 2008, 37(2): 151-165. doi:10.1080/09298210802479300
- Turnbull D, Barrington L, Torres D, Lanckriet G: Towards Musical Query-by-semantic-description Using the CAL500 Data Set. In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '07). New York, NY, USA: ACM; 2007: 439-446.
- Jacobson K, Sandler M, Fields B: Using Audio Analysis and Network Structure to Identify Communities in On-line Social Networks of Artists. In Proceedings of the 9th International Society for Music Information Retrieval Conference. Edited by Bello JP, Chew E, Turnbull D. Philadelphia, USA; 2008: 269-274.
- Laurier C, Meyers O, Serrà J, Blech M, Herrera P, Serra X: Indexing Music by Mood: Design and integration of an automatic content-based annotator. Multimedia Tools and Applications 2010, 48: 161-184. doi:10.1007/s11042-009-0360-2
- Bello J, Pickens J: A Robust Mid-level Representation for Harmonic Content in Music Signals. In Proceedings of the 6th International Society for Music Information Retrieval Conference. London, UK; 2005: 304-311.
- Chu S, Narayanan S, Kuo CC: Environmental Sound Recognition With Time-frequency Audio Features. IEEE Transactions on Audio, Speech, and Language Processing 2009, 17(6): 1142-1158.
- Pampalk E, Rauber A, Merkl D: Content-based Organization and Visualization of Music Archives. In Proceedings of the 10th ACM International Conference on Multimedia. Juan les Pins, France: ACM; 2002: 579.
- Peeters G: A Large Set of Audio Features for Sound Description (Similarity and Classification) in the CUIDADO Project. CUIDADO IST Project Report 2004, 1-25.
- Juslin P: Cue Utilization in Communication of Emotion in Music Performance: Relating performance to perception. Journal of Experimental Psychology: Human Perception and Performance 2000, 26(6): 1797-1812.
- Laukka P, Juslin P, Bresin R: A Dimensional Approach to Vocal Expression of Emotion. Cognition & Emotion 2005, 19(5): 633-653. doi:10.1080/02699930441000445
- Jensen K: Timbre Models of Musical Sounds. Department of Computer Science, University of Copenhagen; 1999.
- Sethares W: Tuning, Timbre, Spectrum, Scale. Springer Verlag; 2005.
- Bello J, Duxbury C, Davies M, Sandler M: On the Use of Phase and Energy for Musical Onset Detection in the Complex Domain. IEEE Signal Processing Letters 2004, 11(6): 553-556. doi:10.1109/LSP.2004.827951
- Lartillot O, Toiviainen P, Eerola T: A Matlab Toolbox for Music Information Retrieval. In Data Analysis, Machine Learning and Applications. Edited by Preisach C, Burkhardt H, Schmidt-Thieme L, Decker R. Berlin, Germany: Springer; 2008: 261-268. Studies in Classification, Data Analysis, and Knowledge Organization.
- Harte C, Sandler M, Gasser M: Detecting Harmonic Change in Musical Audio. In Proceedings of the 1st ACM Workshop on Audio and Music Computing Multimedia. Santa Barbara, CA, USA: ACM; 2006: 26.
- Guyon I, Elisseeff A: An Introduction to Variable and Feature Selection. Journal of Machine Learning Research 2003, 3: 1157-1182.
- Tzanetakis G, Cook P: Manipulation, Analysis and Retrieval Systems for Audio Signals. PhD thesis. Princeton University, Princeton, NJ; 2002.
- Fox J, Monette G: Generalized Collinearity Diagnostics. Journal of the American Statistical Association 1992, 87(417): 178-183. doi:10.2307/2290467
- Breiman L: Random Forests. Machine Learning 2001, 45: 5-32. doi:10.1023/A:1010933404324
- Ripley B: Pattern Recognition and Neural Networks. Cambridge: Cambridge University Press; 1996.
- Archer K, Kimes R: Empirical Characterization of Random Forest Variable Importance Measures. Computational Statistics & Data Analysis 2008, 52(4): 2249-2260. doi:10.1016/j.csda.2007.08.015
- Pang H, Lin A, Holford M, Enerson B, Lu B, Lawton M, Floyd E, Zhao H: Pathway Analysis Using Random Forests Classification and Regression. Bioinformatics 2006, 22(16): 2028. doi:10.1093/bioinformatics/btl344
- Nieweglowski L: clv: Cluster validation techniques. 2009. R package version 0.3-2. http://CRAN.R-project.org/package=clv
- Gower J: Some Distance Properties of Latent Root and Vector Methods Used in Multivariate Analysis. Biometrika 1966, 53(3-4): 325. doi:10.1093/biomet/53.3-4.325
- Cox M, Cox T: Multidimensional Scaling. In Handbook of Data Visualization. USA: Chapman & Hall; 2001.
- Borg I, Groenen P: Modern Multidimensional Scaling: Theory and Applications. Springer Verlag; 2005.
- Aucouturier JJ, Defreville B, Pachet F: The Bag-of-frames Approach to Audio Pattern Recognition: A sufficient model for urban soundscapes but not for polyphonic music. The Journal of the Acoustical Society of America 2007, 122(2): 881-891. doi:10.1121/1.2750160
- McKay C, Fujinaga I: Musical Genre Classification: Is it worth pursuing and how can it be improved? In Proceedings of the 7th International Society for Music Information Retrieval Conference 2006, 101-106.
- Balota D, Cohane J: Semantic Memory. In Learning and Memory: A Comprehensive Reference, Volume 2: Cognitive Psychology of Memory. Edited by Byrne JH, Roediger HL III. Oxford, UK: Academic Press; 2008: 511-534.
- Sandell G: Macrotimbre: Contribution of attack, steady state, and verbal attributes. The Journal of the Acoustical Society of America 1998, 103: 2966.
- Ferrer R: Embodied Cognition Applied to Timbre and Musical Appreciation: Theoretical foundation. British Postgraduate Musicology 2009, X. http://www.bpmonline.org.uk/bpm10/ferrer_rafael-embodied_cognition_applied_to_timbre_and_musical_appreciation_theoretical_foundation.pdf
- Kao A, Poteet SR (Eds): Natural Language Processing and Text Mining. Springer Verlag; 2006.
- Manning C, Schütze H: Foundations of Statistical Natural Language Processing. MIT Press; 2002.
- Bird S, Klein E, Loper E: Natural Language Processing with Python. O'Reilly & Associates; 2009.
- Francis W, Kucera H: Brown Corpus: A Standard Corpus of Present-Day Edited American English, for Use with Digital Computers. Department of Linguistics, Brown University, Providence, Rhode Island, USA; 1979.
- Fellbaum C (Ed): WordNet: An Electronic Lexical Database. Language, Speech, and Communication. Cambridge, Mass: MIT Press; 1998.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.