The Carnatic music tagging project - Arvind Narayanan's journal

The Carnatic music tagging project [Nov. 22nd, 2005|02:31 am]
Arvind Narayanan

Everyone who listens to Carnatic, listen up.

I have a stash of several thousand Carnatic tracks, and I'm sure you do too, which is great. Trying to find something in there is a nightmare, however, and needs to be fixed.

There are two problems. The first is missing tag information. Personally I'm primarily interested in ragam info, but talam, composer, etc. are also important. Much of my music has missing or incorrect tags. I even have some digitized vinyl recordings that don't have track info at all.

The second is the myriad spelling variants. Bombay Jayashri is also Bombay Jayashree, Bombay S. Jayashri and Bombay S. Jayashree. A ragam like shuddhasImAntini can be spelt in about 32 different ways by the average aspiration-challenged Tam. What's worse, some ragams go by entirely different names: cakravAkam is also called vEgavAhini, AbOghi is also karnATakadEvagAndhAri, and so on. With simple-minded text search, finding what you're looking for is always going to be hit and miss.
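To make the spelling problem concrete, here is a minimal Python sketch of one way variants could be folded onto a single standard name. Everything in it is an assumption for illustration: the `loose_key` rules (ignore case, vowel length, aspiration, doubled letters and single-letter initials), the function names, and the tiny tables, which only contain the examples mentioned above. A real table would need far more care.

```python
import re

def loose_key(name):
    """Crude matching key (hypothetical rules): ignores case, vowel length,
    aspiration, doubled letters, punctuation and single-letter initials."""
    s = name.lower().replace("ee", "i").replace("oo", "u").replace("aa", "a")
    # Keep alphabetic words only; drop one-letter words such as the initial "S."
    words = [w for w in re.findall(r"[a-z]+", s) if len(w) > 1]
    key = "".join(words)
    # Drop an "h" that merely marks aspiration after a consonant (dh, th, sh, ...)
    key = re.sub(r"(?<=[bcdgjkpst])h", "", key)
    # Collapse doubled letters (dd -> d)
    return re.sub(r"(.)\1+", r"\1", key)

# Illustrative tables built from the examples in the post, not a real standard.
STANDARD_RAGAMS = ["cakravAkam", "AbOghi", "shuddhasImAntini"]
OTHER_NAMES = {"vEgavAhini": "cakravAkam", "karnATakadEvagAndhAri": "AbOghi"}

LOOKUP = {loose_key(r): r for r in STANDARD_RAGAMS}
LOOKUP.update({loose_key(alias): std for alias, std in OTHER_NAMES.items()})

def canonical_ragam(name):
    """Return the standard spelling for a ragam name, or None if unknown."""
    return LOOKUP.get(loose_key(name))

print(canonical_ragam("Chakravakam"))
print(canonical_ragam("sudhaseemanthini"))
print(loose_key("Bombay S. Jayashree") == loose_key("Bombay Jayashri"))
```

Rules this loose will also merge names that really are different, so a key like this is better suited to suggesting candidates for a human to confirm than to rewriting tags automatically.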

As far as I know there is neither a standardization mechanism (for instance, which id3 field to use for raga/tala, standardized ragam spelling and so on) nor a centralized database of tracks on the internet.

What I propose is a web-based collaborative semi-automated tagging project. I will write all the code (but of course other contributors are welcome). The user experience in the ideal state looks like this: you run a script on your computer and extract the tags of all your files, and upload them to the server. Then you use the web interface which will help you quickly (say within a couple of hours) correct wrong info or fill in missing info on your tracks. You then download the updated track info and run another script to update all your music files. If your collection has tracks that aren't in the database, they too get standardized in the process and added to the database. A lot of your standardized tags will also include lyrics.
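As a rough idea of what the first script (the one that extracts tags for upload) might look like, here is a minimal Python sketch. It is only an assumption about how such a script could work: it reads just the old ID3v1 tag (the fixed 128-byte block at the end of an MP3 file), and the function names and JSON layout are made up for illustration. Note that ID3v1 has no field for ragam or talam at all, which is part of the standardization problem.

```python
import json
import os
import struct
from pathlib import Path

# ID3v1 layout: "TAG", title, artist, album, year, comment, genre byte = 128 bytes
ID3V1 = struct.Struct("<3s30s30s30s4s30sB")

def parse_id3v1(block):
    """Decode a 128-byte ID3v1 block into a dict, or return None if it is not one."""
    if len(block) != ID3V1.size or not block.startswith(b"TAG"):
        return None
    _, title, artist, album, year, comment, genre = ID3V1.unpack(block)

    def text(raw):
        # Fields are NUL- or space-padded; keep the part before the first NUL
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    return {"title": text(title), "artist": text(artist), "album": text(album),
            "year": text(year), "comment": text(comment), "genre": genre}

def read_id3v1(path):
    """Read the ID3v1 tag from the last 128 bytes of a file, if there is one."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() < ID3V1.size:
            return None
        f.seek(-ID3V1.size, os.SEEK_END)
        return parse_id3v1(f.read(ID3V1.size))

def dump_collection(root):
    """JSON listing (hypothetical upload format) of every .mp3 under root."""
    entries = [{"path": str(p), "tags": read_id3v1(p)}
               for p in sorted(Path(root).rglob("*.mp3"))]
    return json.dumps(entries, indent=2)
```

Calling `dump_collection` on a music folder returns a JSON string with one entry per file, with `"tags": null` for untagged files. A real version would also need to read ID3v2 tags, which is where most tag information lives these days.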

I have a fairly good idea of how the server and the user interface are going to work, but that is not really relevant at the moment. Suffice it to say that the server will match your tracks with those in the database, and the interface will intelligently let you edit a number of tracks with one action. The combined effect is to allow you to rapidly standardize thousands of tracks.

To reach this ideal state, however, some groundwork is needed, and the initial users will have to invest a little bit more effort, because a significant percentage of your tracks will be unknown to the database. Nevertheless, you will be able to standardize all your tags in way less time than it would take you to do it all by yourself on your own computer.

If this thing gets off the ground, it will be huge. For instance, publishers of future albums (and websites), in their own best interests, will make sure their tags are standardized. If you would be willing to be a volunteer in the initial stages of this project, please reply to this post. If you know other people who listen to Carnatic and who you think might be willing to help out, please spam them and point them here. I need about 10 people willing to participate before I start writing the code. Happy listening.

Comments:
From: skthewimp
2005-11-22 05:30 am (UTC)

duplication of standards...

just hope nobody else is doing the same stuff simultaneously which would create two independent standards... once you embark on this project (kudos for the idea) publicize it well enuff...

From: arvindn
2005-11-22 10:22 am (UTC)

Re: duplication of standards...

Thanks. As for duplication, I did some googling and came up with nothing. So if someone else is doing it they are either proprietary or unknown. Besides, I've had the idea for several years and in the course of that time nobody else has come up with something similar... no reason to suspect it's happening now.

From: arvindn
2005-11-22 10:24 am (UTC)

Re: duplication of standards...

Sorry for the duplicate post, but even if there were two projects it would be relatively simple to write a script to convert from one to the other. The important thing is not so much what the standard is but that a standard exists. Someone needs to get off their ass and do something about the tagging mess that we have.

From: sunson
2005-11-24 02:42 am (UTC)

Re: duplication of standards...

The spelling problem with ragas I "sorta" solved by referring to this as my standard. My script + emacs (auto-complete) helps me easily choose the right ragam from a list. Only later did I realize that all my music is tagged fairly neatly but that this ragalist isn't sufficient (comparison). And now I've been having plans to come up with a simple solution to find the equivalents (soundex, levenshtein, etc.) between one list and the other. Ideas apart, I have no time ;/
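For what it's worth, the Levenshtein part of that idea is short enough to sketch in Python. This is only an illustration under assumptions: the function names, the case-insensitive comparison, the `max_distance` cutoff of 2 and the sample lists are all made up, and they are not taken from either of the raga lists linked above.

```python
def levenshtein(a, b):
    """Edit distance: minimum number of single-character insertions,
    deletions and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = cur
    return prev[-1]

def match_lists(names, standard, max_distance=2):
    """Map each name to its closest entry in `standard` (ignoring case),
    or to None when nothing is within max_distance edits."""
    result = {}
    for name in names:
        best = min(standard, key=lambda s: levenshtein(name.lower(), s.lower()))
        close = levenshtein(name.lower(), best.lower()) <= max_distance
        result[name] = best if close else None
    return result

print(match_lists(["Abhogi", "Kalyani", "Mohanam"], ["aboghi", "kalyANi", "tODi"]))
```

Edit distance only catches spelling drift. It cannot pair up ragams that have two unrelated names, so those would still need a hand-made equivalence table.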

From: koyaanisqatsi
2005-11-22 07:12 am (UTC)
This sounds a bit like MusicBrainz in principle, but I don't know if their user base covers Carnatic.

From: arvindn
2005-11-22 10:17 am (UTC)
Yes, it's like MusicBrainz. In theory MusicBrainz covers everything, but in reality there's no Carnatic there because the software needs to be Carnatic-specific, due to several peculiarities of the genre.

From: kadambarid
2005-11-22 09:08 pm (UTC)
A lot of your standardized tags will also include lyrics.

A coupla months ago, a prof. down here asked me whether there was any site that offered such tags for Carnatic tracks and also for old Hindi songs (say, when the Burmans ruled the roost). In particular, he asked whether I could recommend some software that kinda displays the lyrics of any song in his database (say, like a marquee; if you've seen Karaoke on [V] channel, you'll get the idea). I said I'd check things out and let him know, but despite its brilliance never got around to it (hell, I didn't even google it up)!
Way cool!:D

From: sunson
2005-11-24 02:45 am (UTC)
I think there was one buggy karaoke plugin for XMMS that I stopped using because it kept crashing. It attempted to do exactly what you are talking about -- lyrics in a 'karaoke'-like fashion, etc. But that was like 3 years ago, and to me, software music players are dead :)

From: koyaanisqatsi
2005-11-23 08:03 pm (UTC)
I see. All right then. As you were. ^_^

From: nivedita_n
2005-11-23 03:04 am (UTC)
Count me in.

From: sunson
2005-11-24 02:37 am (UTC)
Count me in! So what you are talking about is nothing but the equivalent of a CDDB for MP3. I'm wondering how you plan to solve the problem of 'uniquely' identifying a track (that has no tags). For all you know, I might do some adjustments on your MP3 and alas, it becomes yet another 'file'.

From: shivku
2005-11-24 02:51 am (UTC)
I guess the hope is that somebody else will have the same edited copy (like me) and has actually taken the pains to put in the tags...
And regarding matching of two mp3s... How about approximate string matching at the file level? I have done something similar on images (and know it is possible, basically affinity of two images as a probability or something to that effect). But I don't even know if two mp3s with very little difference have little binary difference.

From: arvindn
2005-11-24 10:13 am (UTC)
The hope is that since there are a small number of compositions compared to the total number of tracks, if we can get everybody's databases together then there are going to be multiple renditions of each composition, and there is going to be a very high chance that one of them is going to be tagged correctly. A lot of my tracks have tags, but not all. So each person needs to contribute only a small percentage of tagged tracks.

I hadn't thought about string matching. The simplest way would be to compute the normalized amplitudes over 1-second time periods and do an (exact) substring match on those.
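A minimal Python sketch of that amplitude idea, under several assumptions: the audio has already been decoded to a plain list of PCM samples (decoding MP3 is not shown), "amplitude" means the mean absolute sample value in each one-second window, and the values are quantized to a handful of levels so that an exact match is possible at all. The function names and the choice of 8 levels are made up for illustration.

```python
def fingerprint(samples, rate, levels=8):
    """One quantized loudness level per full second of audio.

    samples: decoded PCM values (ints or floats); rate: samples per second.
    Each level is the window's mean absolute amplitude, normalized by the
    loudest window in the track and rounded onto 0..levels-1.
    """
    windows = [samples[i:i + rate]
               for i in range(0, len(samples) - rate + 1, rate)]
    means = [sum(abs(s) for s in w) / rate for w in windows]
    peak = max(means, default=0)
    if peak == 0:
        return tuple(0 for _ in means)   # silence (or less than one second)
    return tuple(round(m / peak * (levels - 1)) for m in means)

def contains(haystack, needle):
    """True if `needle` occurs as a contiguous run inside `haystack`."""
    n = len(needle)
    return n > 0 and any(haystack[i:i + n] == needle
                         for i in range(len(haystack) - n + 1))

# Toy "track" at 4 samples per second, and the same audio at half the volume.
rate = 4
track = [s for amp in (3, 10, 7, 0, 3) for s in (amp, -amp, amp, -amp)]
quiet = [s * 0.5 for s in track]
print(fingerprint(track, rate) == fingerprint(quiet, rate))
```

In this form the match is fragile. A clip only matches if it was cut on whole-second boundaries and still contains the loudest second of the original (otherwise the normalization changes), and re-encoding can nudge a value across a quantization boundary. A practical version would need some tolerance, which brings it back towards approximate matching.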

From: kadambarid
2005-11-24 08:46 pm (UTC)
(i)The hope is that since there are a small number of compositions compared to the total number of tracks, if we can get everybody's databases together then there are going to be multiple renditions of each composition, and there is going to be a very high chance that one of them is going to be tagged correctly.
(ii)Suffice it to say that the server will match your tracks with those in the database and the interface will intelligently enable you to edit a number of tracks with one action the combined effect of which is to allow you to rapidly standardize thousands of tracks.
I dunno if I'm right, but taking the above two points into consideration, listen up anyway: if a single composition has been sung in different ragams (by more than one artiste), wouldn't that require different tags altogether (at least the ragam and artiste info will be different)? Hence there's every possibility that the track info might not match the one in the database if only the composition is taken into consideration. I'm stressing this because it is unique to Carnatic music, and hence I think each track would have to be considered distinct if, say, missing info has to be filled in as well.
This ought to be enough, though, if one is looking only towards maintaining a standard and correcting wrong tag info!

From: kadambarid
2005-11-24 08:23 pm (UTC)
How about approximate string matching at the file level? I have done something similar on images (and know it is possible, basically affinity of two images as a probability or something to that effect). But I don't even know if two mp3s with very little difference have little binary difference.
Wouldn't that be too complicated for music (Carnatic at that, if it can be done for music at all)? I'd done something similar for images as well, but don't think it's feasible for songs, on a large scale anyway!

From: mssnlayam
2005-11-27 06:28 pm (UTC)

Count me in

I have not had a look at LJ for quite a while. Count me in, even though I do not completely understand the proposal.

From: (Anonymous)
2007-03-01 10:44 am (UTC)

any development on tags for carnatic music?

Please let me know if there has been any progress on this issue. My friend has nearly 300GB of Carnatic music and I would like to organize it using tags. That way the metadata is not stored separately from the data, which avoids all kinds of messy situations when moving files around, etc. It also makes indexing and searching easy. Do you know if user-defined tags can be added (for raga, composer, etc.)?

Thanks,

Arvind

From: (Anonymous)
2007-10-22 12:35 pm (UTC)

Did this go anywhere?

I see the last entry dated 2005. This is something I would like to pursue. Did the effort proceed or is it something that has to be built up from scratch?
- srama5

From: arvindn
2007-10-22 06:41 pm (UTC)

Re: Did this go anywhere?

Nope, never got started. This was a long time ago, and I've since moved on. If someone else is going to start it I'd be happy to chip in, though.