Temporal metadata isn’t currently being leveraged by the video industry, but it should be. No single development would be more important to the evolution of video, both online and offline, than moving from today’s top-level, directory-powered descriptions to a more robust, second-by-second metadata-driven system. This sea change would transform how users interact with digital video content across platforms, and how advertisers and content creators approach the medium.
But first, a working understanding of metadata as it relates to television content is in order. So what exactly is metadata? It is the descriptive information about content: title, cast, release date, promotional images, the details frequently used by programming guides. Without this embedded data, you would have no idea whether you were watching Law & Order: Criminal Intent or Law & Order: Special Victims Unit on late-night cable.
Let’s take it a step further and consider temporal metadata in the same television context. While TV metadata alone is useful enough, it’s also boring and basic: a digitized version of an old TV Guide issue. The same core data taxonomies for television have been used for decades. While this metadata accurately describes a television episode as a whole, it doesn’t provide any details about individual scenes. What’s the name of that actor? What song is playing in the background? Where was that waterfall scene filmed? Where did that actress get her dress? This sort of scene-based data is known as temporal metadata, and it applies not only to featured programming but to commercials as well (if not even more so).
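To make the distinction concrete, here is a minimal sketch of what scene-level metadata might look like in practice. The schema is hypothetical (there is no industry-standard format yet, as discussed below); the key idea is simply that each tag is bound to a time span rather than to the program as a whole:

```python
from dataclasses import dataclass, field

@dataclass
class SceneTag:
    """One piece of temporal metadata, valid over a time span within a program."""
    start: float  # seconds from the start of the program
    end: float
    kind: str     # e.g. "actor", "song", "location", "product"
    value: str

@dataclass
class Program:
    title: str
    tags: list = field(default_factory=list)

    def tags_at(self, t: float):
        """Return every tag active at playback time t."""
        return [tag for tag in self.tags if tag.start <= t < tag.end]

# Hypothetical episode: traditional metadata is just the title;
# temporal metadata answers "what's on screen right now?"
episode = Program("Example Episode", tags=[
    SceneTag(120.0, 185.0, "song", "Background Track A"),
    SceneTag(150.0, 200.0, "location", "Yosemite Falls"),
])

on_screen = episode.tags_at(160.0)  # both the song and the location are active
```

A directory-style guide can only tell you the episode's title; the `tags_at` lookup is what lets a second-screen app answer questions about the current scene.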
The problem is that temporal metadata is currently not embedded, or “watermarked,” within programming, but it should be. While there is a behind-the-scenes effort within the industry to develop a common metadata format that would allow creation-level tagging, the fruition of that process is probably still years away. In the meantime, mobile and second-screen devices have provided a reliable workaround in the form of Automatic Content Recognition (ACR) technology, which identifies, or “fingerprints,” content using assorted cues, mostly audio. And let’s not stop at television: let’s bring temporal metadata to every video-based medium, online and off. As with TV, web-based video can benefit from temporal metadata’s nano-level insights.
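The fingerprinting idea behind ACR can be sketched in a few lines. This toy version hashes coarsely quantized windows of raw audio samples and matches a clip against a database of known fingerprints; production systems such as those from the vendors mentioned later use far more robust techniques (spectral-peak landmark hashing that survives noise, compression, and room acoustics), so treat this purely as an illustration of the matching principle:

```python
import hashlib

def fingerprint(samples, window=256):
    """Toy ACR-style fingerprint: hash the coarse shape of each audio window.

    `samples` is a sequence of 16-bit PCM amplitude values. Quantizing
    heavily (dropping the low 8 bits) keeps small variations from
    changing the hash; real systems work in the frequency domain.
    """
    prints = []
    for i in range(0, len(samples) - window, window):
        chunk = samples[i:i + window]
        coarse = bytes((abs(s) >> 8) & 0xFF for s in chunk)
        prints.append(hashlib.md5(coarse).hexdigest()[:8])
    return prints

def identify(clip_prints, database):
    """Match a clip's fingerprints against a database of known content."""
    for title, prints in database.items():
        if any(p in prints for p in clip_prints):
            return title
    return None

# Hypothetical "song" and a database of pre-computed fingerprints.
samples = [(i * 37) % 20000 for i in range(4096)]
database = {"Song A": fingerprint(samples)}

# A short excerpt from the middle of the song still matches.
match = identify(fingerprint(samples[512:1536]), database)
```

The point of the design is that the viewer's device never needs the full recording: it ships a handful of compact hashes to a lookup service, which is what makes second-screen identification practical over a phone connection.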
One such example dates back to 2011, when Google’s YouTube platform first started automatically adding captions to some of its videos using speech recognition technology. Although this clearly benefits web users who may be hard of hearing, it also provides queryable metadata for the search giant to sell advertising against more accurately. I expect this to eventually play out across Google’s television platform, Google TV, as well.
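What makes captions "queryable metadata" is that every cue carries a timestamp, so a transcript doubles as a time-coded index into the video. A minimal sketch, assuming caption cues have already been parsed out of a WebVTT or SRT track into `(start, end, text)` tuples:

```python
def search_captions(captions, keyword):
    """Find every caption cue mentioning a keyword.

    `captions` is a list of (start_seconds, end_seconds, text) cues,
    the shape you would get from parsing a WebVTT/SRT track.
    Returns (start_seconds, text) pairs, i.e. where to seek to.
    """
    keyword = keyword.lower()
    return [(start, text)
            for start, end, text in captions
            if keyword in text.lower()]

# Hypothetical caption track for a short clip.
cues = [
    (0.0, 3.5, "Welcome back to the show."),
    (3.5, 8.0, "Today we review the new running shoes."),
    (8.0, 12.0, "These shoes retail for ninety dollars."),
]

hits = search_captions(cues, "shoes")  # each hit says when the topic comes up
```

This is the mechanism that lets an advertiser target not just "a video about sports" but the specific seconds in which a product is actually being discussed.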
With this in mind, let’s look at the single most commercially important aspect of temporal metadata: advertising. Because advertisers and brands strive with every fiber of their being to be contextually relevant, the potential monetization models presented by temporal metadata and complementary second-screen apps are both far-reaching and high-impact. In the future, apps leveraging temporal metadata could recognize objects on the screen and instantly direct viewers to online purchasing options; product placement would become far more powerful, completely eclipsing the 30-second spot. Furthermore, the ability of service providers to disaggregate a show into its component parts would lead to stronger programming recommendation and search options, which in turn would allow for more accurate user preference settings and personalized marketing opportunities.
Startups like Vobile, Zeitera, and Civolution are developing software for smart TV platforms, but there is currently no large-scale commercial deployment of ACR by any of the major television manufacturers. That means if you want to figure out what song is playing during a basketball shoe commercial, the one option you don’t have is simply pressing a button on the remote and pulling up the song title and artist, not to mention info about the shoe.
People spend more than 60 percent of their leisure time watching television. Changing that passive viewing experience into an active one (engaging, searching, shopping) is like finding a gold mine on top of an oil deposit. We’ve been saying this for years; temporal metadata is in fact the key we’ve been looking for to unlock that potential.