Metadata

In DAM Survival Guide: Digital Asset Management Initiative Planning, David Diamond lays out the “primary components of digital asset management software” as:

Database
Permissions Engine
Metadata Engine
Search Engine
Processing Engine

In this section of the book (called “The Anatomy of DAM”), he uses the term “database” to refer to the mechanism for storing metadata, rather than the mechanism for storing the actual digital assets. I will attempt to be more explicit and refer to this portion of the system as the “metadata database”.

As I’ve quickly come to learn, metadata is the lifeblood of DAM. The extraction, storage, and searchability of a digital asset’s metadata are what give DAM software its power and value.

Because of the importance of metadata, I’ve decided to make it the initial focus of my learning. While I have worked with several types of databases in the past, I have never used any of their full-text search capabilities (if they possessed any at all) other than simple sub-string matching within a column. It seems obvious to me that this would be an untenable option when needing to query thousands or millions of metadata records at once!
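To make the limitation concrete for myself, here is a minimal sketch of sub-string matching using Python’s built-in sqlite3 module. The table name, column name, and sample descriptions are all made up for illustration. A pattern with a leading wildcard generally cannot take advantage of an ordinary column index, so the database ends up checking every row, and the matches themselves are crude.

```python
import sqlite3

# Hypothetical table, column, and sample rows, chosen only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE assets (id INTEGER PRIMARY KEY, description TEXT)")
conn.executemany(
    "INSERT INTO assets (id, description) VALUES (?, ?)",
    [
        (1, "Red barn at sunset"),
        (2, "Barn, red, photographed in winter"),
        (3, "Product catalog cover"),
        (4, "Cat sleeping on a sofa"),
    ],
)


def substring_search(term):
    """Return ids of assets whose description contains `term` as a raw substring."""
    rows = conn.execute(
        "SELECT id FROM assets WHERE description LIKE ? ORDER BY id",
        (f"%{term}%",),
    )
    return [row[0] for row in rows]


# "cat" also matches "catalog", which is probably not what the user wanted.
cat_hits = substring_search("cat")

# "red barn" misses asset 2, where the same words appear in a different order.
barn_hits = substring_search("red barn")
```

Both results show the same problem from opposite directions: sub-string matching has no notion of words, so it returns too much for one query and too little for the other.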

The DAM Survival Guide was originally published in 2012, and a lot has changed in the world of databases since then. At that time, David was imploring his readers to choose a DAM software vendor that used MySQL, and to avoid vendors that used proprietary database technology. In 2024, the thought of developing a custom database technology for such a routine use case sounds like madness. And indeed, a more robust solution may actually be to use two types of commercially available databases for metadata. Specifically, one to serve as the source of truth (e.g., a SQL database, MongoDB, or DynamoDB), and a replica which serves as the core of the search engine.
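A rough sketch of the two-database idea, under heavy assumptions: SQLite stands in for whichever source-of-truth database is chosen, a plain Python dict stands in for the search replica, and the class and method names (MetadataStore, save, rebuild_replica) are hypothetical. The point is only the direction of data flow, where every write lands in the source of truth first and the replica can always be rebuilt from it.

```python
import json
import sqlite3


class MetadataStore:
    """Writes go to the source of truth first, then are copied to a search replica."""

    def __init__(self):
        # Source of truth: SQLite is a stand-in for the real primary database.
        self.primary = sqlite3.connect(":memory:")
        self.primary.execute(
            "CREATE TABLE metadata (asset_id TEXT PRIMARY KEY, doc TEXT NOT NULL)"
        )
        # Search replica: a dict is a stand-in for a real search engine's index.
        self.replica = {}

    def save(self, asset_id, metadata):
        # 1. Commit the metadata to the source of truth.
        with self.primary:
            self.primary.execute(
                "INSERT OR REPLACE INTO metadata (asset_id, doc) VALUES (?, ?)",
                (asset_id, json.dumps(metadata)),
            )
        # 2. Copy the same document into the search replica.
        self.replica[asset_id] = dict(metadata)

    def rebuild_replica(self):
        """Discard the replica and repopulate it from the source of truth."""
        rows = self.primary.execute("SELECT asset_id, doc FROM metadata")
        self.replica = {asset_id: json.loads(doc) for asset_id, doc in rows}


store = MetadataStore()
store.save("asset-1", {"title": "Red barn", "photographer": "A. Example"})

store.replica.clear()    # simulate losing the search index entirely
store.rebuild_replica()  # nothing is lost, because the primary still has it
```

The appeal of this split, as I understand it, is that the search side becomes disposable: it can be tuned, re-indexed, or replaced without putting the authoritative metadata at risk.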

As mentioned above, this second type of database would need to be capable of doing full-text searching over dozens of metadata properties for thousands (or even millions) of digital assets both very quickly and accurately. Several full-text databases have evolved over the last decade, but the most popular seems to be ElasticSearch.

I’ve started a deep dive into ElasticSearch to evaluate its fit as the “search engine” portion of the metadata database. So far, the features I’ve learned about seem very promising. Specifically, ElasticSearch stores data as JSON documents, which are well suited to storing the key/value pairs that make up metadata. Furthermore, it natively processes all text inputs upon ingestion, passing the text through a tokenizer and a series of filters to create what is known as an “inverted index”, which is what makes blazing-fast text matching possible!
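To check my own understanding of the inverted index, I wrote a toy version in Python. This is not how ElasticSearch implements it internally; the function names, the stop-word list, and the sample documents are my own assumptions, and the "analysis" step here is just a regex tokenizer, a lowercase filter, and a stop-word filter.

```python
import re
from collections import defaultdict

# A tiny, made-up stop-word list; real analyzers are configurable.
STOP_WORDS = {"a", "an", "and", "at", "in", "of", "on", "the"}


def analyze(text):
    """Lowercase the text, split it into alphanumeric tokens, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


def build_inverted_index(documents):
    """Map each token to the set of ids of documents containing it in any field."""
    index = defaultdict(set)
    for doc_id, doc in documents.items():
        for value in doc.values():
            for token in analyze(str(value)):
                index[token].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents that contain every token in the query."""
    tokens = analyze(query)
    if not tokens:
        return []
    postings = [index.get(token, set()) for token in tokens]
    return sorted(set.intersection(*postings))


# Metadata as JSON-like documents (plain dicts of key/value pairs).
documents = {
    1: {"title": "Red barn at sunset", "keywords": "farm, rural"},
    2: {"title": "Barn, red, photographed in winter", "keywords": "snow"},
    3: {"title": "Product catalog cover", "keywords": "marketing"},
}

index = build_inverted_index(documents)
red_barn = search(index, "red barn")  # word order and punctuation no longer matter
cat = search(index, "cat")            # "catalog" is a different token, so no match
```

Because the expensive text processing happens once at ingestion, a query only needs to look up a few tokens and intersect their document sets, rather than scanning the text of every document.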

The plan for now is to continue educating myself on the capabilities of ElasticSearch, while simultaneously learning about the requirements of metadata storage and searching within modern DAM software solutions.

If it ticks all the boxes, I will start building a prototype to put theory to the test.
