Importing music with beets (part 1)

2018-05-02

beets is a tool for organizing music. It can automatically import your music collection, but in my opinion the automatic import is missing some things. I will work around some of its limitations and report on my progress.

The current circumstances are:

Okay, let's dive into the sorting process.

beets settings

Everything I have changed in the beets settings (~/.config/beets/config.yaml) is:

directory: /data/music/library/
library: /data/music/library.blb
import:
        move: yes

The first line sets where my music library will be stored, and the second line defines where beets stores its database. The relevant setting for me is the fourth line: it tells beets to move files into the library folder instead of copying them, which leaves me with the old folders no longer containing any music files.

Automatic importing

First I run beets' automatic import over everything. The important point is to write a log file, because it will help us deal with duplicates.

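The command to do this is: beet import -q -l import.log, followed by the directory to import. -q runs the import in quiet mode: beets never prompts and conservatively skips every album it is unsure about. -l import.log writes every skipped album, including the skipped duplicates, into the log file.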

Removing duplicates

beets can find duplicates, but it does not remove them automatically. We will take care of them now, using the log file from the last step. The log file now looks something like this:

import started Wed May  2 15:05:42 2018
skip /data/music/F/F.R. - Mixtape
skip /data/music/F/FAME - Real Talk Mixtape
skip /data/music/F/FaSy - 33
skip /data/music/F/Fabricant - Demo 2010
skip /data/music/F/Face Of Ruin - Within The Infinite
skip /data/music/F/Faeces - Upstream
skip /data/music/F/Fail Emotions - 2009 - Side A
skip /data/music/F/Fail Emotions - 2010 - Dance Macabre
skip /data/music/F/Fail Emotions - 2010 - Make Bad
skip /data/music/F/Falconer - 2003 - The Sceptre Of Deception
duplicate-skip /data/music/F/Fall Out Boy - Infinity On High
skip /data/music/F/Fall Out Boy - Take This To Your Grave-Direct
skip /data/music/F/Farin Urlaub - Am Ende der Sonne
skip /data/music/F/Farin Urlaub - Die Wahrheit übers Lügen
skip /data/music/F/Farin Urlaub - Endlich Urlaub
skip /data/music/F/Farin Urlaub - Livealbum of Death
skip /data/music/F/Farin Urlaub - Porzellan
skip /data/music/F/Fasics - Ich tu was ich kann! - EP
duplicate-skip /data/music/F/Fear Factory - Archetype (2004) [320KB]
skip /data/music/F/Fear Factory - Demanufacture (1995) [320KB]
duplicate-skip /data/music/F/Fear Factory - Demanufacture [Remastered] (2005) [320KB]
duplicate-skip /data/music/F/Fear Factory - Digimortal (2001) [320KB]
skip /data/music/F/Fear Factory - Mechanize (2010) [320KB]
skip /data/music/F/Fear Factory - Transgression (2005) [320KB]
... [chopped off here]

All lines that start with duplicate-skip were automatically detected as duplicates. They are not needed anymore and can be removed. A simple bash one-liner for this could look like this:

grep "^duplicate-skip" import.log | cut -d" " -f2- | while IFS= read -r line; do rm -r "$line"; done

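First we find all lines that start with duplicate-skip, then we chop off the duplicate-skip to get the path. Then we delete each of the duplicates. IFS= and -r make read keep each path exactly as written, including backslashes and leading or trailing spaces. Since one-liners like this are dangerous, I would start with a dry run: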

grep "^duplicate-skip" import.log | cut -d" " -f2- | while IFS= read -r line; do echo rm -r "$line"; done

The small but important difference is the echo. Instead of running rm -r, it prints every command that would be executed, so you can check them manually and prevent major screw-ups. Only when everything looks fine do you run the version without echo.

In my case it looks like this:

rm -r /data/music/F/Fall Out Boy - Infinity On High
rm -r /data/music/F/Fear Factory - Archetype (2004) [320KB]
rm -r /data/music/F/Fear Factory - Demanufacture [Remastered] (2005) [320KB]
rm -r /data/music/F/Fear Factory - Digimortal (2001) [320KB]
rm -r /data/music/F/Finntroll - Jaktens Tid
rm -r /data/music/F/Finntroll - Jaktens Tid (2001)
rm -r /data/music/F/Finntroll - Midnattens Widunder (1999)
rm -r /data/music/F/Finntroll - Nattfodd (2004)
rm -r /data/music/F/Finntroll - Nattfödd
rm -r /data/music/F/Finntroll - Trollhammaren (2004)
rm -r /data/music/F/Finntroll - Ur Jordens Djup
rm -r /data/music/F/Finntroll - Visor Om Slutet (2003)
rm -r /data/music/F/For The Fallen Dreams - Back Burner 2011
rm -r /data/music/F/For The Fallen Dreams - Changes
rm -r /data/music/F/For The Fallen Dreams - Wasted Youth (2012)
rm -r /data/music/F/For the Fallen Dreams - Relentless
rm -r /data/music/F/Fort Minor - The Rising Tied
rm -r /data/music/F/Frithjof Brauer - Nevertheless
rm -r /data/music/F/Frithjof Brauer - Tales from the past
rm -r /data/music/F/From Autumn to Ashes - Holding a Wolf by the Ears
rm -r /data/music/F/From Autumn to Ashes - The Fiction We Live
rm -r /data/music/F/From Autumn to Ashes - The Fiction We Live_
rm -r /data/music/F/From Autumn to Ashes - Too Bad You're Beautiful_
rm -r /data/music/F/Funeral for a Friend - Hours
rm -r /data/music/F/Funeral for a Friend - Memory and Humanity
rm -r /data/music/F/Funeral for a Friend - Seven Ways to Scream Your Name
rm -r /data/music/F/Funeral for a Friend - Tales Don't Tell Themselves
rm -r /data/music/F/Funeral for a Friend - Welcome Home Armageddon

Since these match the log file, we can probably run the command without the echo and get rid of the duplicates.
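Instead of comparing by eye, a small bash check can flag every duplicate-skip entry whose path is not an existing directory. This is a sketch, not part of beets: the function name check_duplicate_paths is made up, and it assumes the "<status> <path>" log format shown above. Entries that name more than one folder show up as missing.

```shell
# Hypothetical helper (not part of beets): for every duplicate-skip entry in
# an import log, report whether the logged path is an existing directory.
# Assumes lines of the form "<status> <path>" as in the log above.
check_duplicate_paths() {
    local log="${1:-import.log}"
    grep '^duplicate-skip ' "$log" | cut -d' ' -f2- |
        while IFS= read -r path; do
            if [ -d "$path" ]; then
                echo "ok      $path"
            else
                echo "missing $path"
            fi
        done
}
```

check_duplicate_paths import.log | grep '^missing' then lists only the entries that need a closer look.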

This code has one small corner case: sometimes beets groups two folders together, e.g. if you have a 2-CD album with each CD in its own folder. The log file then contains a line like this:

duplicate-skip path1; path2

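The command above does not handle this: the whole string, semicolon included, is passed to rm -r as a single path. That path does not exist, so these duplicates are not removed.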
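One way around this is to split each entry at the "; " separator before deleting. The bash sketch below does that with awk. It assumes that no folder name itself contains "; ", and the function name remove_duplicates and the DRY_RUN variable are my own inventions: by default it only prints the rm commands, like the echo version above.

```shell
# Hypothetical helper: delete every folder listed in duplicate-skip lines,
# splitting entries where beets joined several folders with "; ".
# Dry run by default; only DRY_RUN=0 actually deletes.
# Assumption: no folder name contains the sequence "; " itself.
remove_duplicates() {
    local log="${1:-import.log}"
    grep '^duplicate-skip ' "$log" | cut -d' ' -f2- |
        # Print each "; "-separated path on its own line
        awk -F'; ' '{ for (i = 1; i <= NF; i++) print $i }' |
        while IFS= read -r path; do
            if [ "${DRY_RUN:-1}" = 0 ]; then
                rm -r -- "$path"
            else
                echo rm -r -- "$path"
            fi
        done
}
```

Run it once as remove_duplicates import.log to review the output, and only then as DRY_RUN=0 remove_duplicates import.log.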

Removing empty directories and other crap

Many of my music folders contain things like cover art, conversion logs or other files. Now it is time to get rid of them. A first, easy approach is to go by file extension. This is only an approximation, because technically the extension says nothing about what a file contains. But it works pretty well.

To find all files I use find:

find * -type f -print

This prints all files to the terminal. But we are only interested in the file extension, which is everything after the last dot. I don't care about files without an extension; they are probably not music files, so they will be deleted later.

find * -type f -name "*.*" -print | rev | cut -d"." -f1 | rev | sort -u > musicfileextensions.txt

The rev command works around a limitation of cut: cut counts fields from the start of the line, so it cannot simply keep the last field of its input. The first rev reverses each line, which puts the extension in front; cut then keeps everything before the first dot, which is the reversed extension. The second rev turns the extensions back around. sort -u sorts them and removes duplicates, and > musicfileextensions.txt writes the list to a file. Now we have a list of all file extensions that occur in the directory we are working on.
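The rev | cut | rev trick can be tried on a few made-up paths without touching the collection. For comparison, the sketch also does the same thing with bash's ${f##*.} parameter expansion, which strips everything up to and including the last dot. Both function names are hypothetical.

```shell
# Hypothetical helper: read file paths on stdin, print the unique extensions
# (the text after the LAST dot) using the rev | cut | rev trick from above.
extensions() {
    rev | cut -d'.' -f1 | rev | sort -u
}

# Same result with bash parameter expansion instead of rev and cut:
# ${f##*.} removes the longest prefix that ends in a dot.
extensions_pe() {
    local f
    while IFS= read -r f; do
        printf '%s\n' "${f##*.}"
    done | sort -u
}
```

Both only make sense for names that contain a dot, which is why the find command above filters with -name "*.*": for a file like Vol. 2/README, both would return " 2/README".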

Mine looks like this:

JPG
MP3
db
gif
ini
jpg
log
m4a
mp3

My list is pretty short because I am only working on a subset of my collection. The final list might be longer.

Now I sadly have to edit this list by hand. After removing every non-music file extension, the list is a bit shorter:

MP3
m4a
mp3

This shows something pretty annoying: people cannot decide whether file extensions are written in upper- or lowercase letters. We could build all commands so that they ignore case, but since we gathered this list from all our music files, it should already be complete and contain every case variation that occurs.
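If you would rather not rely on the list being complete, find's -iname test matches names case-insensitively. It is supported by GNU and BSD find but is not part of POSIX. A bash sketch with a hypothetical function name, using the mp3 and m4a extensions from my list:

```shell
# Hypothetical helper: list files below a directory whose name does not end
# in .mp3 or .m4a, ignoring case (so .MP3, .Mp3 and .M4A count as music).
list_non_music() {
    local dir="${1:-.}"
    find "$dir" -type f ! -iname '*.mp3' ! -iname '*.m4a' -print
}
```

For example, list_non_music /data/music/F prints every non-music file in that folder, whatever case its music extensions use.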

Time to get rid of some other files. find to the rescue (again)! We want to delete all files that are not music files, and all empty directories:

find * -type f ! -name "*.mp3" ! -name "*.MP3" ! -name "*.m4a" -print
find . -type d -empty -print

The first command finds all non-music files, the second one all empty directories. If we append -delete, the files and directories are deleted. Use this with care and always try it without the -delete flag first.

find * -type f ! -name "*.mp3" ! -name "*.MP3" ! -name "*.m4a" -print -delete
find . -type d -empty -print -delete
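Typing one ! -name per extension gets tedious once the list grows. Here is a bash sketch that builds the filters from the hand-edited musicfileextensions.txt instead. The function name is hypothetical and it expects one extension per line. Any extra arguments (such as -delete) are passed on to find after -print, so without them it only lists files.

```shell
# Hypothetical helper: build "! -name *.<ext>" filters from an extension
# list (one extension per line) and run find with them.
# Usage: non_music_files LIST DIR [extra find actions, e.g. -delete]
non_music_files() {
    local list="$1" dir="$2"
    shift 2
    local args=() ext
    # "|| [ -n "$ext" ]" also catches a last line without a trailing newline
    while IFS= read -r ext || [ -n "$ext" ]; do
        if [ -n "$ext" ]; then
            args+=('!' -name "*.$ext")
        fi
    done < "$list"
    # Safety net: with an empty list every single file would match
    if [ "${#args[@]}" -eq 0 ]; then
        echo "no extensions in $list, refusing to run" >&2
        return 1
    fi
    find "$dir" -type f "${args[@]}" -print "$@"
}
```

Run non_music_files musicfileextensions.txt . first, and only if the output looks right, non_music_files musicfileextensions.txt . -delete. The guard matters here: an empty extension list would otherwise turn the -delete run into "delete every file".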

Now we have gotten rid of a lot of things (hopefully).

The next steps will be harder and I will talk about them in the next part.