Engineering Case Study

Rekordbox Metadata Enrichment: Filling the Gaps

A Python CLI that reads a Rekordbox library export, looks up each track against MusicBrainz and Discogs, scores candidate matches by confidence, and writes a delta XML containing only the tracks that changed, ready to import straight back into Rekordbox. In a library of 2,500+ tracks, over a thousand of which were missing label data entirely, five iterative runs raised the number of enriched tracks from 436 to 876.

Type: Engineering Case Study

Domain: AI · Music Tools · Data Pipelines

Stack: Python · MusicBrainz API · Discogs API · MiniMax · Groq · Gemini

Status: Active

2,500+ tracks in the library (2,354 processed in the latest run)

800+ tracks enriched

~65 LLM calls per full run

~90 min for the first cold run

The Problem

Rekordbox auto-analyses BPM and key reliably on import. Label, year, remixer and album are a different story.

A library built over years from Bandcamp, Juno, Beatport and Traxsource purchases, SoundCloud downloads, promo folders and other sources accumulates inconsistencies. Those fields are often blank, wrong, or formatted in ways that carry no meaning. A track tagged with a label, a year and a recognisable release context tells a much richer story than one with empty fields.

That matters because metadata is not cosmetic. MixLab, the AI mix curation tool in this ecosystem, uses metadata as contextual input when building set concepts and narratives. “Classic Warp era” means something. “Metalheadz 2019 reissue” means something. “Label: blank” means nothing.

In a library of more than 2,500 tracks, over a thousand were missing label entirely. Many had year set to 0. Album values were inconsistent: some were genuine release names, some were Apple Music style single titles, some were little more than the original download folder. The information existed somewhere. The library was not carrying it.

Both MusicBrainz and Discogs hold large release databases. The challenge was not whether the data existed. The challenge was matching the right release metadata to the right Rekordbox tracks in a safe, repeatable way.

The Solution

Rekordbox Metadata Enrichment is a Python CLI that reads a Rekordbox XML export, looks up each track against MusicBrainz and Discogs, scores candidate matches by confidence, and writes a delta XML containing only the tracks that changed, ready to import back into Rekordbox.

A few design decisions shaped the tool from the start.

Delta output, not full library replacement. The output XML contains only tracks that changed. Once the cache is warm, the export can be very small, which keeps Rekordbox import fast and low-risk.
Colour as confidence signal. Rather than silently applying guesses, the Rekordbox Colour field is repurposed as a visual confidence indicator. Enriched tracks are colour-coded by how confident the match was, making uncertain results visible at a glance inside Rekordbox itself.
Permanent cache. Metadata for a processed track does not need to be looked up again. Once resolved, it is stored and reused on later runs. Cold runs take around 90 minutes; warm runs are near-instant.
Failures are never cached. Tracks with no match are always retried on future runs. Every improvement in query logic automatically recovers previously missed tracks without any cache cleanup.
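
The two cache rules above can be sketched in a few lines. This is a minimal illustration, assuming a single JSON file keyed by a track identifier; the class name `EnrichmentCache` and the `resolve` method are hypothetical and not taken from the tool itself.

```python
import json
from pathlib import Path
from typing import Callable, Optional


class EnrichmentCache:
    """JSON-backed cache that stores successful lookups only (hypothetical name)."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.entries: dict[str, dict] = {}
        if path.exists():
            self.entries = json.loads(path.read_text(encoding="utf-8"))

    def resolve(self, key: str, lookup: Callable[[str], Optional[dict]]) -> Optional[dict]:
        if key in self.entries:
            return self.entries[key]  # warm hit: no API call is made
        result = lookup(key)
        if result is not None:
            self.entries[key] = result  # only successes are stored
        return result  # a miss returns None and will be retried next run

    def save(self) -> None:
        self.path.write_text(json.dumps(self.entries, indent=2), encoding="utf-8")
```

Because a miss leaves no trace in the file, any later improvement to the lookup function is applied to previously unmatched tracks on the next run without touching the cache.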

System Architecture

Each track goes through the same pipeline on every run. The pipeline is intentionally sequential: cache check first to avoid redundant API calls, then MusicBrainz, then Discogs if MusicBrainz alone is not sufficient, then an LLM only if the result is still ambiguous.
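
The ordering described above could be expressed roughly as follows. This is a sketch under stated assumptions: the `HIGH_CONFIDENCE` threshold value, the `Candidate` fields and the injected lookup functions are all hypothetical stand-ins, since the real scoring rules and API clients are not shown here.

```python
from dataclasses import dataclass
from typing import Callable, Optional

HIGH_CONFIDENCE = 0.85  # assumed threshold; the tool's real value is not stated


@dataclass
class Candidate:
    source: str   # "musicbrainz" or "discogs"
    label: str
    year: int
    score: float  # 0.0 to 1.0, produced by a scoring step not shown here


Lookup = Callable[[str, str], list]
Chooser = Callable[[list], Optional[Candidate]]


def best(candidates: list) -> Optional[Candidate]:
    return max(candidates, key=lambda c: c.score, default=None)


def enrich(artist: str, title: str, cache: dict,
           musicbrainz: Lookup, discogs: Lookup, llm_choose: Chooser):
    key = f"{artist}|{title}"
    if key in cache:                      # 1. cache check
        return cache[key], "cache"

    candidates = list(musicbrainz(artist, title))   # 2. MusicBrainz
    top = best(candidates)
    if top is None or top.score < HIGH_CONFIDENCE:
        candidates += discogs(artist, title)        # 3. Discogs only if needed
        top = best(candidates)

    stage = "auto"
    if top is not None and top.score < HIGH_CONFIDENCE:
        top, stage = llm_choose(candidates), "llm"  # 4. LLM only if still ambiguous

    if top is None:
        return None, "none"               # failures are not cached
    cache[key] = top
    return top, stage
```

Each later stage is reached only when the earlier one could not settle the match, which is what keeps the LLM call count low relative to the number of tracks.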

Figure: two-row pipeline flow. Row one: Rekordbox XML input, Cache Check, MusicBrainz API, Score Candidates. Row two: Discogs API, Score and Auto-Enrich, LLM Disambiguation cascade, Delta XML output. The LLM step is highlighted in amber because it is used sparingly, around 65 calls per full cold run across 2,354 tracks. Only changed tracks are written to the delta XML.

Library

Rekordbox XML Export

2,500+ tracks · artist, title, BPM, Camelot key · label, year, album, remixer fields to be enriched

Cache Check

Skip track if already enriched and cached · failures are never cached and are always retried on subsequent runs

MusicBrainz API Lookup

Query release database by normalised artist and title · returns candidate releases with label, year, album data

Score Candidates

Confidence threshold check · high-confidence MusicBrainz result goes straight to enrichment · low-confidence triggers Discogs lookup

Discogs API Lookup

Queried when MusicBrainz alone is insufficient · particularly strong for underground electronic music where commercial release databases have gaps

Score Again + Auto-Enrich

Re-score all candidates from both sources · high-confidence result applies enrichment and marks green · still ambiguous triggers LLM

LLM Disambiguation

MiniMax M2.7 first, then Groq, then Gemini 2.5 Flash · used only to resolve between grounded candidates · orange confidence colour applied · if no confident decision, track is left unenriched rather than guessed at

Delta XML: only changed tracks, ready for Rekordbox import

Unresolved playlist: no-match tracks exported for review inside Rekordbox
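
A delta writer along these lines might look like the sketch below. It assumes the usual Rekordbox export layout of a `DJ_PLAYLISTS` root with `TRACK` elements under `COLLECTION`, keyed by a `TrackID` attribute; verify those names against your own export. The tool itself lists lxml in its stack; the standard-library `xml.etree.ElementTree` is swapped in here only to keep the example dependency-free, and `build_delta` is a hypothetical name.

```python
import copy
import xml.etree.ElementTree as ET


def build_delta(library_xml: str, updates: dict) -> str:
    """Return an XML string holding only tracks whose attributes actually change.

    `updates` maps a TrackID to the attribute values proposed for that track,
    for example {"12": {"Label": "Some Label", "Year": "1995"}}.
    """
    source_root = ET.fromstring(library_xml)
    delta_root = ET.Element(source_root.tag, dict(source_root.attrib))
    collection = ET.SubElement(delta_root, "COLLECTION")

    for track in source_root.findall("./COLLECTION/TRACK"):
        proposed = updates.get(track.get("TrackID", ""), {})
        # Keep only values that differ from what the library already holds.
        changed = {k: v for k, v in proposed.items() if track.get(k) != v}
        if not changed:
            continue
        patched = copy.deepcopy(track)   # preserve every untouched attribute
        patched.attrib.update(changed)
        collection.append(patched)

    collection.set("Entries", str(len(collection)))
    return ET.tostring(delta_root, encoding="unicode")
```

Copying the whole original element and patching it, rather than building a new one, keeps fields such as the file location intact so the import can still line each entry up with the existing track.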

The Discogs lookup was the most important source for this particular library. MusicBrainz covers mainstream releases thoroughly but has thinner coverage of underground electronic music, white labels and small-run digital releases. Discogs fills that gap. It is also the database where the key fix in Phase 1 was required, covered below.

Query normalisation matters throughout. Titles arrive from Rekordbox in all kinds of states: multi-artist entries, mix designators, feature credits, BPM fragments that have leaked into track names, catalogue numbers embedded in titles. Stripping and normalising all of that before lookup is what separates a useful match rate from a poor one.
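
As an illustration of that kind of clean-up, the sketch below strips feature credits, bracketed mix designators, BPM fragments and bracketed catalogue numbers from a title. The patterns are assumptions written for this example, not the tool's actual rules, and real libraries will need more cases than these.

```python
import re

# Feature credits, bracketed or not: "(feat. X)", "ft. X", "featuring X"
_FEATURE = re.compile(
    r"\s*[\(\[]?\s*\b(?:feat|ft|featuring)\b\.?\s+[^\(\)\[\]]+[\)\]]?",
    re.IGNORECASE,
)
# Bracketed mix designators: "(Original Mix)", "[Club Edit]"
_BRACKETED_MIX = re.compile(
    r"\s*[\(\[][^\)\]]*\b(?:mix|remix|edit|version|dub|rework|bootleg)\b[^\)\]]*[\)\]]",
    re.IGNORECASE,
)
# Leaked tempo fragments: "174bpm", "(128 BPM)"
_BPM = re.compile(r"\s*[\(\[]?\b\d{2,3}\s*bpm\b[\)\]]?", re.IGNORECASE)
# Bracketed catalogue numbers: "[ABC001]", "(XYZ-12)"
_CATALOGUE = re.compile(r"\s*[\(\[][A-Z]{2,}[A-Z0-9]*[- ]?\d{2,}[\)\]]")


def normalise_title(raw: str) -> str:
    """Return a lookup-friendly title with common non-title fragments removed."""
    title = raw
    for pattern in (_FEATURE, _BRACKETED_MIX, _BPM, _CATALOGUE):
        title = pattern.sub("", title)
    return re.sub(r"\s+", " ", title).strip(" -")


print(normalise_title("Some Track (feat. MC Example) (Original Mix) 174bpm [ABC001]"))
```

The normalised string is meant only for building the query; the original title in the library is left as it is.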

The Confidence System

Applying metadata silently at the same apparent confidence level is not good enough. Some matches are deterministic. Some are the result of LLM reasoning. Some are heuristic guesses for tracks that no public database is ever likely to cover. The library deserves to know the difference.

The Colour field in Rekordbox is normally used as a personal crate organisation tool. This tool repurposes it as a confidence signal, surfacing enrichment quality directly in the DJ software where the tracks will actually be used. This works for me because I do not use the colour field for anything else. Anyone who already has a colour-coding system in place should be aware that enrichment will overwrite those values.

The import is manual. The output is a delta XML (only changed tracks) loaded back into Rekordbox by hand. Rekordbox does not auto-apply changes; you load the file and review what it brings in. That makes the colour confidence system genuinely useful: before accepting anything, you can see at a glance which results are trustworthy and which are worth checking, and decide whether to keep them.

Colour	Meaning	Action
● Green	High-confidence match from MusicBrainz or Discogs	None, safe to trust
● Orange	LLM used to disambiguate between candidates	Optional review
● Red	Low-confidence or heuristic metadata applied	Manual review recommended
○ None	No match found, track left unenriched	Appears in unresolved playlist
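
The table maps naturally onto a small lookup. In this sketch the `MatchKind` names are hypothetical, and the hex strings are assumed values for the `Colour` attribute in Rekordbox XML; check them against a track you have coloured by hand in your own export before relying on them.

```python
from enum import Enum


class MatchKind(Enum):
    DATABASE = "database"    # high-confidence MusicBrainz or Discogs match
    LLM = "llm"              # LLM chose between grounded candidates
    HEURISTIC = "heuristic"  # low-confidence or fallback metadata
    NONE = "none"            # no match, track left unenriched


# Assumed Colour attribute values; confirm against a real export.
COLOUR_FOR_KIND = {
    MatchKind.DATABASE: "0x00FF00",   # green
    MatchKind.LLM: "0xFFA500",        # orange
    MatchKind.HEURISTIC: "0xFF0000",  # red
}


def apply_confidence_colour(track_attrs: dict, kind: MatchKind) -> dict:
    """Return a copy of the track attributes with the confidence colour set."""
    updated = dict(track_attrs)
    colour = COLOUR_FOR_KIND.get(kind)
    if colour is not None:
        updated["Colour"] = colour  # overwrites any existing colour
    return updated
```

Unmatched tracks get no colour at all here, which matches the last row of the table: they are left alone and surface through the unresolved playlist instead.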

The key shift here was treating uncertainty as information rather than something to hide. The tool stopped pretending every result was equally trustworthy and gave the library a practical review workflow to act on. A track with red or orange metadata is still more useful than a track with blank fields, but the colour tells you how much to trust it before you go digging.

A small heuristic layer also helped with tracks that no public database was likely to cover: bootlegs, white labels, unofficial edits. Those now receive meaningful fallback values instead of staying blank, and are marked accordingly.

The Journey

The implementation went through several iterations. Three phases mattered most.

Phase 1: A healthy run with the wrong result

The first full run looked promising on paper. Hundreds of tracks were enriched and there were no API errors. But opening the export XML showed something was clearly wrong: every Label field was still blank.

The report confirmed it. Year and album were being updated, but not label. That pointed to the real issue: Discogs, the source expected to do most of the heavy lifting for underground electronic music, was not actually contributing anything.

The problem was a bad query filter. Discogs had been given a literal format string that matched nothing, so every lookup silently returned zero results. One line fixed it. With the format restriction removed, Discogs immediately began returning useful label data.

The architecture was sound, but one bad assumption had effectively disabled the most important source for this library. The failure was completely invisible. The pipeline ran cleanly, the numbers looked reasonable, and everything appeared to be working. The export was the only signal.
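
One way a failure like this could be surfaced earlier is an end-of-run check that every expected source and field contributed something. The sketch below is purely illustrative and is not described as part of the tool; `run_health` and its input shape (one dict per enriched track, mapping each written field to the source that supplied it) are assumptions.

```python
from collections import Counter


def run_health(field_sources: list, expected_sources: set, expected_fields: set) -> dict:
    """Report sources that supplied nothing and fields that were never written."""
    sources = Counter()
    fields = Counter()
    for track in field_sources:
        for field, source in track.items():
            fields[field] += 1
            sources[source] += 1
    return {
        "silent_sources": sorted(expected_sources - set(sources)),
        "unwritten_fields": sorted(expected_fields - set(fields)),
    }


report = run_health(
    [{"Year": "musicbrainz", "Album": "musicbrainz"}, {"Year": "musicbrainz"}],
    expected_sources={"musicbrainz", "discogs"},
    expected_fields={"Label", "Year", "Album"},
)
print(report)
```

On data shaped like the Phase 1 run, a check of this kind would have flagged Discogs as silent and Label as never written, even though no request raised an error.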

Phase 2: Making confidence visible

With label data flowing, the next question was how to expose uncertainty safely. Leaving everything quietly applied at the same apparent confidence level was not good enough. The library needed to distinguish between results worth trusting and results worth checking.

The answer was to surface confidence inside Rekordbox itself using the Colour field. High-confidence results could be accepted immediately. Lower-confidence results were still applied but now stood out clearly for manual review. Colour confidence moved from optional to default.

The heuristic layer for tracks unlikely to be in any public database also landed here. Bootlegs and white labels got meaningful fallback values instead of staying blank, and were marked red. Better to have plausible metadata flagged as uncertain than empty fields that contribute nothing to downstream tools.

Phase 3: Better queries, fewer misses

With the main bug fixed and confidence made visible, the remaining challenge was recall. The unresolved list showed clear patterns.

Multi-artist entries were being queried too literally
Mix designators and feature credits were polluting titles
BPM or key fragments had leaked into some track names
Catalogue numbers were embedded in track titles
Built-in Rekordbox demo tracks were being sent to live APIs

Improving artist and title normalisation recovered a significant number of previously missed tracks. Primary artist fallback, stripping feature credits, removing mix markers, cleaning trailing metadata and excluding demo content all pushed the enrichment rate higher.
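
Primary artist fallback, for instance, can be sketched as generating a short list of queries to try in order. The separator list and the function name `artist_queries` are assumptions for illustration; the tool's actual rules are not shown.

```python
import re

# Assumed separators between collaborating artists.
_ARTIST_SEPARATORS = re.compile(
    r"\s*(?:,|&|\bx\b|\b(?:vs|feat|ft|featuring)\b\.?)\s*",
    re.IGNORECASE,
)


def artist_queries(raw_artist: str) -> list:
    """Return artist strings to try in order: full credit first, then primary artist."""
    full = " ".join(raw_artist.split())  # collapse stray whitespace
    primary = _ARTIST_SEPARATORS.split(full, maxsplit=1)[0].strip()
    queries = [full]
    if primary and primary != full:
        queries.append(primary)
    return queries


print(artist_queries("Artist A feat. Artist B"))
```

Trying the full credit first matters because a separator such as "&" can be part of a single act's name; the primary-artist query is only a fallback when the literal string finds nothing.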

The most useful improvement, though, was not a query tweak. It was turning unresolved tracks into a first-class output. Instead of burying them in a text report, the tool now exports playlists that let unresolved tracks be reviewed directly inside Rekordbox. That made the process far easier to inspect and act on.
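
A playlist export of this kind might be built as in the sketch below. The `NODE` attributes (`Type`, `KeyType`, `Entries`) and the `TRACK Key` references follow the layout commonly seen in Rekordbox XML exports, but they are assumptions here and should be compared with a playlist in your own export; `unresolved_playlist` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def unresolved_playlist(track_ids: list, name: str = "Unresolved") -> ET.Element:
    """Build a PLAYLISTS element holding one playlist of unmatched track IDs."""
    playlists = ET.Element("PLAYLISTS")
    # Assumed: Type="0" is a folder node, Type="1" is a playlist node.
    root = ET.SubElement(playlists, "NODE", Type="0", Name="ROOT", Count="1")
    node = ET.SubElement(
        root, "NODE",
        Name=name, Type="1", KeyType="0",  # assumed: KeyType="0" keys by TrackID
        Entries=str(len(track_ids)),
    )
    for track_id in track_ids:
        ET.SubElement(node, "TRACK", Key=track_id)
    return playlists


print(ET.tostring(unresolved_playlist(["101", "202"]), encoding="unicode"))
```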

Results

Each run built on the last. The five-run progression shows the effect of each improvement directly in the enrichment numbers.

Run	Enriched	No Match	Change
1: Discogs effectively disabled	436	801	Baseline
2: Discogs fixed	506	613	+70 enriched
3: Colour confidence + heuristics	655	628	+149 enriched
4: Query improvements	715	535	+60 enriched
5: Additional normalisation fixes	876	354	+161 enriched

A few headline numbers from the latest run:

2,354 tracks processed, excluding SoundCloud entries and Rekordbox demo tracks
876 tracks enriched
354 unresolved
Around 65 LLM calls per full run
Zero API errors across all runs
Around 90 minutes for a cold run, near-instant when the cache is warm

The enrichment rate is not perfect, but that is part of the point. The tool is designed to prefer trustworthy results over aggressive guesswork. A track that remains unenriched is better than a track with silently wrong metadata. The unresolved playlist makes the gaps visible and actionable.

Why the LLM Is There At All

This is not an AI-first tool. It is a metadata pipeline built on deterministic sources.

MusicBrainz and Discogs do the real work. The LLM is used only when multiple plausible candidates remain and a lightweight reasoning step can help decide between them. That keeps the process grounded, cost-aware and explainable. Around 65 LLM calls per cold run across 2,354 tracks works out at fewer than three calls per hundred tracks, and most of those are genuinely ambiguous cases where a human would have to make the same judgement call.

This also follows the same pattern used elsewhere in the ecosystem: a provider cascade with lower-cost models for first-pass work, stronger models held back for the smaller number of cases where deeper reasoning genuinely adds value. If the LLM cannot make a confident decision between candidates, the track is left unenriched rather than guessed at.
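
A provider cascade of this shape can be sketched without reference to any particular SDK. In the example below each provider is just a callable that returns the index of the chosen candidate, or None when it cannot decide; the function name, the callable signature, and the choice to fall through on both errors and non-decisions are assumptions for illustration.

```python
from typing import Callable, Optional, Sequence, Tuple

# A provider takes a prompt and returns a candidate index, or None if unsure.
Provider = Callable[[str], Optional[int]]


def choose_with_cascade(
    prompt: str,
    n_candidates: int,
    providers: Sequence[Tuple[str, Provider]],
) -> Optional[Tuple[int, str]]:
    """Try providers in order; return (candidate index, provider name) or None."""
    for name, ask in providers:
        try:
            index = ask(prompt)
        except Exception:
            continue  # provider unavailable or errored: try the next one
        # Accept only an index that points at one of the grounded candidates.
        if index is not None and 0 <= index < n_candidates:
            return index, name
    return None  # no confident decision: the track stays unenriched
```

Restricting the answer to an index into the existing candidate list is what keeps the LLM choosing between grounded options rather than inventing metadata.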

In Practice

The measure of an enrichment pipeline is not the enrichment rate. It is what downstream tools do with the data.

With label, release year and Discogs genre tags now flowing into the MixLab pipeline, the model has context it could not previously infer. The Gutterfunk imprint is being read as a quality and arrangement-depth signal. Sneaker Social Club is surfacing as a marker of contemporary UK breaks sensibility. Finger Lickin’ Records credits and Plump DJs remix attributions are being treated as reliable indicators of arrangement quality. Moving Shadow releases are being recognised as heritage markers. Output like “a genuine peak weapon from the golden era of progressive breaks” only becomes possible when the pipeline knows where a track came from and when.

Discogs also surfaces style and scene tags that go beyond what Rekordbox captures. That gives MixLab a clearer picture of what a track sits alongside and which genres it would cross into, without having to infer it from the title or artist name.

Before enrichment, the LLM was reasoning from artist names and titles alone. With label, year and genre context in the prompt, it has something concrete to work with. The difference shows in the specificity of the output.

What’s Next

There are still obvious areas to improve.

Concurrent processing. The current pipeline processes one track at a time. A bounded concurrent model should shorten the 90-minute cold run considerably, within whatever request rates the MusicBrainz and Discogs APIs allow.
Smarter artist normalisation. Real libraries contain messy metadata: duplicated feature credits, inconsistent spacing, stray territory markers and artist separators used in unpredictable ways. Cleaning more of that before lookup should recover additional matches.
Detecting swapped artist and title. A small number of tracks have the fields reversed entirely. Those are hard to match automatically and are better treated as review candidates unless detected up front.
Richer metadata targets. Release title and remixer are still useful contextual signals for MixLab and are not yet being enriched consistently.
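
For the first item, a bounded concurrent model could look something like the sketch below: a small thread pool for overlap, plus a shared limiter so the combined request rate stays within what the APIs allow. `MinInterval`, `enrich_concurrently` and the default worker count are hypothetical, and the right interval depends on each API's published limits.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


class MinInterval:
    """Thread-safe limiter that spaces calls at least `seconds` apart."""

    def __init__(self, seconds: float) -> None:
        self._seconds = seconds
        self._lock = threading.Lock()
        self._next = 0.0  # earliest monotonic time the next call may start

    def wait(self) -> None:
        with self._lock:
            now = time.monotonic()
            delay = max(0.0, self._next - now)
            self._next = max(now, self._next) + self._seconds  # reserve a slot
        if delay:
            time.sleep(delay)  # sleep outside the lock so others can queue


def enrich_concurrently(tracks: Iterable, enrich_one: Callable,
                        limiter: MinInterval, max_workers: int = 4) -> list:
    """Run enrich_one over tracks with bounded workers; results keep input order."""

    def task(track):
        limiter.wait()
        return enrich_one(track)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(task, tracks))
```

With a limiter in place the speed-up comes from overlapping network latency rather than from sending more requests per second, so the gain is bounded by the rate limit rather than by the worker count.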

The Ecosystem

This tool is one part of a wider system built around a single Rekordbox library and a long history of recorded mixes.

TuneFinder discovers new music weekly, ranking candidates from Beatport, Juno, Bandcamp and other sources against a taste profile inferred from years of recorded mixes
Rekordbox Metadata Enrichment fills in the release metadata gaps so the library is more useful to both Rekordbox and MixLab
MixLab uses that enriched library to generate structured mix concepts with narrative arcs, surfacing forgotten tracks and sequencing them for harmonic compatibility
The Mixes API catalogues recorded SoundCloud mixes, including tracklists, dates, moods and transitions, making the back catalogue queryable as structured data
changsta.com is the public-facing DJ site, with an AI search layer that lets visitors explore mixes and music in natural language

Each layer feeds the next. Better discovery builds a richer library. Better metadata produces stronger mix concepts. The enrichment tool sits in the middle of that chain; its output is only as visible as the downstream tools that consume it, but the difference it makes to MixLab’s reasoning is the whole reason it was built.

Tech Stack

Rekordbox Metadata Enrichment GitHub ↗

Python · lxml · Rekordbox XML · MusicBrainz API · Discogs API · MiniMax M2.7 · Groq fallback (Llama 3.3 70B) · Gemini 2.5 Flash fallback · JSON cache


Get in touch

Building something similar?

Projects like this are built around a simple pattern: identify where a data gap is causing downstream quality problems, build a repeatable pipeline to close it, and design the output to fit naturally into the tools already in use.

I enjoy designing practical data enrichment systems like this. If you are working on something in this space, feel free to get in touch.
