beets is a tool for organizing music. It can automatically import your music collection, but in my opinion the auto-import feature is missing some things. I will work around some of its limitations and report on the progress.
The current circumstances are:
- I have a lot of music sorted in different ways. This means that I have multiple hard disks/folders with music on them. Each of them might be sorted differently, e.g. one with $artist/$album/$number - $title and another one with $artist - $album/$title. Some folders contain music that is also in others. Some may contain full albums, others might not. Some folders may contain the same song in a different file format. Some folders may also contain non-music files, e.g. the cover art, log files from converting to another file format, or just other random things.
- I want to sort my music collection in one place, sorted by $artist/$album/$number - $title
- I want to have everything just once
- I have ~30000 tracks in my collection that are already imported into beets and sorted properly.
- I have several 100GB of music lying around that has to be sorted.
- I want to do as little as possible by hand, because a manual approach does not scale well.
- I am not afraid to lose some of the things that are somewhere on some hard disk, because the difference between not finding them because I am too lazy to search several disks and not having them at all is not relevant from a practical perspective.
Okay, let's dive into the sorting process.
Everything I have changed in the beets settings:
directory: /data/music/library/
library: /data/music/library.blb
import:
    move: yes
The first line is where my music library will be stored, and the second line defines where the database used by beets is stored. The relevant setting for me is the fourth line. It tells beets to move files into the library folder, which leaves me with folders that do not contain music files anymore.
First I run beets' automatic import feature over everything. The important point is to create a log file, because this will help us with duplicates.
The command to do this is:
beet import -q -l import.log
beets can find duplicates, but it does not automatically remove them. We will take care of those duplicates now with the log file we made in the last step. The log file now says something like this:
import started Wed May 2 15:05:42 2018
skip /data/music/F/F.R. - Mixtape
skip /data/music/F/FAME - Real Talk Mixtape
skip /data/music/F/FaSy - 33
skip /data/music/F/Fabricant - Demo 2010
skip /data/music/F/Face Of Ruin - Within The Infinite
skip /data/music/F/Faeces - Upstream
skip /data/music/F/Fail Emotions - 2009 - Side A
skip /data/music/F/Fail Emotions - 2010 - Dance Macabre
skip /data/music/F/Fail Emotions - 2010 - Make Bad
skip /data/music/F/Falconer - 2003 - The Sceptre Of Deception
duplicate-skip /data/music/F/Fall Out Boy - Infinity On High
skip /data/music/F/Fall Out Boy - Take This To Your Grave-Direct
skip /data/music/F/Farin Urlaub - Am Ende der Sonne
skip /data/music/F/Farin Urlaub - Die Wahrheit übers Lügen
skip /data/music/F/Farin Urlaub - Endlich Urlaub
skip /data/music/F/Farin Urlaub - Livealbum of Death
skip /data/music/F/Farin Urlaub - Porzellan
skip /data/music/F/Fasics - Ich tu was ich kann! - EP
duplicate-skip /data/music/F/Fear Factory - Archetype (2004) [320KB]
skip /data/music/F/Fear Factory - Demanufacture (1995) [320KB]
duplicate-skip /data/music/F/Fear Factory - Demanufacture [Remastered] (2005) [320KB]
duplicate-skip /data/music/F/Fear Factory - Digimortal (2001) [320KB]
skip /data/music/F/Fear Factory - Mechanize (2010) [320KB]
skip /data/music/F/Fear Factory - Transgression (2005) [320KB]
... [chopped off here]
All lines that start with duplicate-skip have automatically been detected as duplicates. They are not needed anymore and can be removed. A simple bash one-liner for this could look like this:
grep "^duplicate-skip" import.log | cut -d" " -f2- | while IFS= read -r line; do rm -r "$line"; done
First we find all lines that start with duplicate-skip, then we chop off the duplicate-skip prefix to get the path. Then we delete each of the duplicates. (IFS= and -r make read keep leading whitespace and backslashes in the paths intact.) Since building one-liners is dangerous, I would first start with
grep "^duplicate-skip" import.log | cut -d" " -f2- | while IFS= read -r line; do echo rm -r "$line"; done
The little but important difference lies in the echo. This prints all the rm -r commands that would be executed to the terminal instead of running them, so they can be checked manually to prevent major screw-ups before anything is deleted.
In my case it looks like this:
rm -r /data/music/F/Fall Out Boy - Infinity On High
rm -r /data/music/F/Fear Factory - Archetype (2004) [320KB]
rm -r /data/music/F/Fear Factory - Demanufacture [Remastered] (2005) [320KB]
rm -r /data/music/F/Fear Factory - Digimortal (2001) [320KB]
rm -r /data/music/F/Finntroll - Jaktens Tid
rm -r /data/music/F/Finntroll - Jaktens Tid (2001)
rm -r /data/music/F/Finntroll - Midnattens Widunder (1999)
rm -r /data/music/F/Finntroll - Nattfodd (2004)
rm -r /data/music/F/Finntroll - Nattfödd
rm -r /data/music/F/Finntroll - Trollhammaren (2004)
rm -r /data/music/F/Finntroll - Ur Jordens Djup
rm -r /data/music/F/Finntroll - Visor Om Slutet (2003)
rm -r /data/music/F/For The Fallen Dreams - Back Burner 2011
rm -r /data/music/F/For The Fallen Dreams - Changes
rm -r /data/music/F/For The Fallen Dreams - Wasted Youth (2012)
rm -r /data/music/F/For the Fallen Dreams - Relentless
rm -r /data/music/F/Fort Minor - The Rising Tied
rm -r /data/music/F/Frithjof Brauer - Nevertheless
rm -r /data/music/F/Frithjof Brauer - Tales from the past
rm -r /data/music/F/From Autumn to Ashes - Holding a Wolf by the Ears
rm -r /data/music/F/From Autumn to Ashes - The Fiction We Live
rm -r /data/music/F/From Autumn to Ashes - The Fiction We Live_
rm -r /data/music/F/From Autumn to Ashes - Too Bad You're Beautiful_
rm -r /data/music/F/Funeral for a Friend - Hours
rm -r /data/music/F/Funeral for a Friend - Memory and Humanity
rm -r /data/music/F/Funeral for a Friend - Seven Ways to Scream Your Name
rm -r /data/music/F/Funeral for a Friend - Tales Don't Tell Themselves
rm -r /data/music/F/Funeral for a Friend - Welcome Home Armageddon
Since these seem to match the log file, we can probably use the command without the echo and get rid of the duplicates.
This code has one little corner case: when beets decides to group two folders together, e.g. if you have a 2-CD album with the different CDs in different folders, the log file contains something like this:
duplicate-skip path1; path2
The semicolon-separated paths break the command, because the whole line is passed to rm as a single path.
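One way to handle this corner case is to split each duplicate-skip line on the "; " separator before looping over the paths. The following is only a sketch: the function name duplicate_paths is made up for this example, and it assumes that the separator is exactly a semicolon followed by a space, as in the log line above, and that no real path contains that sequence.

```shell
# Print every path from the duplicate-skip lines of a beets import log,
# one path per line. Grouped entries ("path1; path2") are split up.
# Assumption: "; " never occurs inside a real path.
duplicate_paths() {
    awk '/^duplicate-skip / {
        sub(/^duplicate-skip /, "")        # drop the prefix, keep the path(s)
        n = split($0, paths, "; ")         # split grouped folders
        for (i = 1; i <= n; i++) print paths[i]
    }' "$1"
}

# Dry run, same idea as the echo variant above:
# duplicate_paths import.log | while IFS= read -r line; do echo rm -r "$line"; done
```

As before, I would only remove the echo after checking the printed commands by hand.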
Remove empty directories and other crap
Many of my music folders contain things like cover art, conversion logs or other files. Now it is time to get rid of them. A first and easy approach is to go for file extensions. This is only an approximation because technically the file extension is irrelevant. But it works pretty well.
To find all files I use
find * -type f -print
This prints all files to the terminal. But we are only interested in the file extension, which is everything after the last dot. I don't care about files that don't have a file extension. They are probably not music files, so they will be deleted later.
find * -type f -name "*.*" -print | rev | cut -d"." -f1 | rev | sort -u > musicfileextensions.txt
The rev command is a little workaround for cut, which can select fields counted from the start of a line but not from the end. The first rev reverses the input, then we cut and keep everything in front of the first . which was the ending of the file name. The second rev reverses our file extensions back again. The sort -u sorts them and removes duplicates. The > musicfileextensions.txt writes the list to a file. Now we have a list of all file endings that exist in the directory we are working on.
Mine looks like this:
JPG
MP3
db
gif
ini
jpg
log
m4a
mp3
My list is pretty short because I am only working on a subset of my collection. The final list might be longer.
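The rev/cut/rev pipeline can also be sketched with a single sed expression that strips everything up to and including the last dot. This is just an alternative under the same simplification (file extension = text after the last dot); the function name list_extensions and its optional directory argument are made up for this example.

```shell
# List the distinct file extensions below a directory (default: current one).
# sed's greedy ".*\." removes everything up to the last dot of each path,
# which leaves only the extension. Files without a dot in their name are
# skipped by -name '*.*', just like in the pipeline above.
list_extensions() {
    find "${1:-.}" -type f -name '*.*' -print \
        | sed 's/.*\.//' \
        | sort -u
}

# Usage: list_extensions /path/to/unsorted/music > musicfileextensions.txt
```

The order of the resulting list depends on the locale that sort runs in.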
Now I sadly have to edit this list manually. After removing every non music file extension the list is a little bit shorter.
MP3
m4a
mp3
This shows us something which is pretty annoying: people cannot decide whether the file extension is written in upper- or lowercase letters... We could now care about this and build all commands in a way that ignores case, but since we gathered this list from all our music files, it should be complete and contain all occurring case-sensitive variations.
Time to get rid of some other files.
find to the rescue (again)! We want to delete:
- empty directories
- all files which are not music files (which in our simplified view means all files that don't have one of the music file extensions)
find * -type f ! -name "*.mp3" ! -name "*.MP3" ! -name "*.m4a" -print
find -type d -empty -print
The first command finds all non-music files, the second one the empty directories. If we append -delete, the files/directories are deleted. Use this with care; first try it without the -delete.
find * -type f ! -name "*.mp3" ! -name "*.MP3" ! -name "*.m4a" -print -delete
find -type d -empty -print -delete
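With a longer extension list, typing every ! -name "*.ext" by hand gets tedious. The sketch below builds those tests from the manually edited musicfileextensions.txt instead. The function name find_non_music is made up for this example, and it assumes the file holds one extension per line without the leading dot. It only prints the matches and deliberately leaves the -delete step to a separate, manual decision.

```shell
# Print all files below DIR whose name does not end in one of the
# extensions listed (one per line, case-sensitive) in EXTFILE.
find_non_music() {
    dir=$1
    extfile=$2
    set --                                  # collect the find tests in "$@"
    while IFS= read -r ext; do
        [ -n "$ext" ] || continue           # ignore empty lines
        set -- "$@" ! -name "*.$ext"
    done < "$extfile"
    if [ "$#" -eq 0 ]; then
        # Without any extension every file would count as non-music.
        echo "no extensions found in $extfile" >&2
        return 1
    fi
    find "$dir" -type f "$@" -print
}

# Usage: find_non_music . musicfileextensions.txt
```

The guard against an empty list is there because an empty musicfileextensions.txt would otherwise mark every single file as non-music.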
Now we got rid of a lot of things (hopefully).
The next steps will be harder and I will talk about them in the next part.