Sunday, July 14, 2013

Questionable "help" from Gracenote, CDDB and other online music databases

I have ripped all of my roughly three hundred (original) compact discs so that I can listen to them on my computer, mobile phone, wherever and whenever. A natural part of this collection is correct metadata - tracks have correct artists, song names and other related information. Now, I am not a music archive or metadata professional, just someone who likes music and has some tens of gigabytes of legal music files.

At some point, using sophisticated rippers and players and online services like CDDB and later Gracenote (which iTunes uses), getting CD track lists automatically from the Internet became kind of easy and convenient. I noticed many people praising how easy ripping your CDs and maintaining your music collection had become. Friends of classical music had some problems, though: because music databases were designed for popular music (with a distinct "artist" and "song name"), it was difficult to fit the relevant information into the fields.

A problem relatively few seem to complain about is the quality of this data. Much (if not all) of it is community-produced, and not all contributors are interested in ensuring that their information about the contents is correct, well formatted or free of typos and other errors.

This is how my iTunes sees Higher State of Consciousness. (It is under the genre "House" simply because it seemed to fit the album in general.)
Let's take Josh Wink's 1995 club hit Higher State of Consciousness as an example. It was indeed a huge hit, and it has been released in many versions (or remixes), both at the time and since. Josh Wink has elected to release his music under different pseudonyms (Josh Wink, Wink, Winc, Winx, Winks), often derived from his real name (Joshua Winkelman) - and this specific track was released under the pseudonym Wink. To add to the confusion, many releases have accidentally used the wrong pseudonym, or perhaps chosen to use Josh Wink to conform to his usual artist name and to make it easier to associate this record with his other work.

The label for the original 12" maxi - with a misspelled track name
Perhaps the most popular version of this song is what on the original 12" maxi was called "Version 3 - Tweekin Acid Funk". The name of the version (or mix/remix) is often omitted from CD sleeve information, even if it would be easily identifiable, or it might be mislabeled or erroneously reported as something different.

Even leaving aside the many different dashes and hyphens, there is still a multitude of ways of reporting the same track (here I assume that the character separating the artist and song is the colon):
  • Wink: Higher State of Consciousness (Tweekin Acid Funk Mix)
  • Wink: Higher State of Conciousness (Version 3 - Tweekin Acid Funk) (from original 12" maxi - yes, they got it wrong!)
  • Josh Wink: Higher State of Consciousness (Tweekin' Acid Funk Mix)
  • DJ Josh Wink: Wink - Higher State Of Consciousness
  • Wink: Higher State of Consciousness (John Wink's Tweakin Acid Funk Remix)
  • josh wink: higher stae of conciousness
  • Wink: Higher State of Consciousness [Tweekin Acid Funk RMX]
  • VARIOUS ARTISTS: This Is Strictly Rhythm Volume Five [1995] - Josh Wink - Higher State Of Consciousness
  • V/A: Higher State of Consciousness
  • PLASTIKMAN: highter states of conciousnes
All these are based on actual examples I have seen.
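
For what it's worth, a program could at least notice that most of these are probably the same song. Below is a minimal Python sketch of that idea, not anything that Gracenote, CDDB or iTunes actually does: it strips the bracketed version information, ignores case and punctuation, and then compares what is left with the standard library's difflib. The names normalize, similarity, probably_same and CANONICAL, as well as the 0.85 threshold, are my own assumptions for illustration.

```python
import re
from difflib import SequenceMatcher

# Assumed "correct" title, used only as a reference point in this sketch.
CANONICAL = "Higher State of Consciousness"


def normalize(title: str) -> str:
    """Lower-case the title, drop (...) and [...] parts, collapse punctuation."""
    title = title.lower()
    # Remove bracketed version/remix information such as "(... Mix)" or "[... RMX]".
    title = re.sub(r"[\(\[].*?[\)\]]", " ", title)
    # Replace every run of other characters with a single space.
    title = re.sub(r"[^a-z0-9]+", " ", title)
    return title.strip()


def similarity(a: str, b: str) -> float:
    """Return a ratio between 0.0 and 1.0 for the two normalized titles."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def probably_same(a: str, b: str, threshold: float = 0.85) -> bool:
    """Guess whether two titles refer to the same song (threshold is arbitrary)."""
    return similarity(a, b) >= threshold


if __name__ == "__main__":
    seen_titles = [
        "Higher State of Consciousness (Tweekin Acid Funk Mix)",
        "Higher State of Conciousness (Version 3 - Tweekin Acid Funk)",
        "higher stae of conciousness",
        "highter states of conciousnes",
    ]
    for seen in seen_titles:
        print(f"{similarity(seen, CANONICAL):.2f}  {seen}")
```

Note that this only says the titles look alike. It throws away the version name on purpose, so it cannot tell one remix from another, and it does nothing about a wrong artist such as PLASTIKMAN. A human still has to decide which spelling is the right one.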

I'm listing these mostly because it's annoying that the music database, instead of helping you, contains misspelled and incorrect information. I think that I have only encountered a couple of CDs for which the information you get from Gracenote (iTunes) is completely correct and also correctly spelled.

Of course, another problem is whether other information such as genre or year of release is correct. In electronic dance music, the genre is usually "Electronica", and I have no idea what that is. For me, this song, like most of Josh Wink's music, would fall under the genre "Techno" or "Acid house", so I will have to change this information for most CDs I rip anyway.

Some of the above examples imply that the submitter has his or her own classification system, like putting the artist of the CD ("Various Artists") as part of the artist name of the track, contrary to the idea of the database. With different kinds of parentheses you might also mention featuring artists or other things, but once again, that is part of a personal naming convention, not a general one that a database should have.

With Discogs you can sometimes find records with incorrect track information or, for example, remix names that were omitted. Should you submit them to CDDB, too, or let them have whatever was written in the sleeve notes?

There is a general metadata and archiving problem here, but largely community-supported music databases do not have professional standards or professional administrators who would certify their quality. Conventions are also dynamic, thus people may start to punctuate their submissions consistently in different ways. Nowadays it is common to separate the artist and the song name using a dash, not a colon! What a terrible sin, but since these are in different fields, it is up to the application to display these in some specific format.
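
To illustrate that last point, here is a small Python sketch in which the artist, the song name and the version are stored as separate fields and the separator is chosen only when the track is displayed. The Track class, its field names and the display function are hypothetical and made up for this example. They are not the actual CDDB or Gracenote data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    """Hypothetical track record with one field per piece of information."""
    artist: str
    title: str
    version: str = ""  # mix/remix name, empty if unknown


def display(track: Track, separator: str = ": ") -> str:
    """Format a track for display; the separator is purely a display choice."""
    label = f"{track.artist}{separator}{track.title}"
    if track.version:
        label += f" ({track.version})"
    return label


if __name__ == "__main__":
    track = Track("Wink", "Higher State of Consciousness", "Tweekin Acid Funk Mix")
    print(display(track))                   # colon style
    print(display(track, separator=" - "))  # dash style
```

If the data is kept like this, the colon versus dash question never needs to reach the database at all.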

This could be seen as an example of a personal informatics problem, maybe? You can approach it from the perspective of archiving and music libraries, but then again, it is your music collection, you just want to maintain it with reasonable effort, and if the "help" from CDDB and Gracenote makes it more difficult, you get annoyed or frustrated. Maybe you think you know better and introduce some other - just as incorrect - style, convention or misconception to the database.

My solution: write a rant about this to the blog and then keep on doing what you were doing all along, which is making sure that all song information is correct anyway. And keep ranting about it until the problem magically disappears.
