Storing metadata in files

Reading Lamp

Users can easily read out the content of an XMP packet with the exempi command-line tool and its -x option. The tool comes with the library of the same name and is found in the exempi package on Debian and in exempi-tools on openSUSE.
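The exempi tool presumably locates the packet through the format-specific file handlers that Exempi inherits from Adobe's SDK. For files of unknown format, the XMP specification also describes a simpler fallback: scanning the raw bytes for the <?xpacket ...?> wrapper. The following Python sketch shows that scanning approach with only the standard library. The names scan_xmp_packet() and scan_xmp_file() are made up for this example, and the code assumes an uncompressed, UTF-8-encoded packet.

```python
import re
from typing import Optional

# An XMP packet is wrapped in two processing instructions:
# <?xpacket begin="..." id="..."?> ... <?xpacket end="w"?>  (or end="r")
_PACKET_RE = re.compile(
    rb"<\?xpacket begin=.*?\?>.*?<\?xpacket end=['\"][rw]['\"]\s*\?>",
    re.DOTALL,
)


def scan_xmp_packet(data: bytes) -> Optional[bytes]:
    """Return the first XMP packet found in raw file bytes, or None.

    Only finds packets stored uncompressed in UTF-8 (ASCII-compatible)
    form; UTF-16/32 packets and compressed streams are not handled.
    """
    match = _PACKET_RE.search(data)
    return match.group(0) if match else None


def scan_xmp_file(path: str) -> Optional[bytes]:
    """Read a whole file into memory and scan it for an XMP packet."""
    with open(path, "rb") as handle:
        return scan_xmp_packet(handle.read())
```

For a PDF, a call such as scan_xmp_file("document.pdf") (hypothetical file name) would return the packet as bytes, which can then be decoded and handed to an XML parser.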

Listing 1 shows a shortened version of a typical XMP packet from a PDF file. The x:xmpmeta root element wraps all the data in the packet. Inside it, the rdf:RDF element declares the rdf namespace with the URI http://www.w3.org/1999/02/22-rdf-syntax-ns#.

Listing 1

PDF File XMP Packet

 

The XMP elements always reside in rdf:Description blocks. These blocks require the rdf:about attribute, although its value always remains empty in XMP. XMP data can be written in different styles. Commonly, the properties of one schema are collected in their own Description block, which also declares the corresponding namespace.
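To make this layout concrete, the following sketch walks through a small, made-up packet (not the one from Listing 1) with Python's xml.etree.ElementTree module. For each Description block, it reports the rdf:about value and the namespaces of the properties the block contains. It covers two common styles: properties written as child elements and properties written as attributes of rdf:Description. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, Set, Tuple

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical, heavily shortened packet (not the packet from Listing 1):
# one rdf:Description block per schema, each with an empty rdf:about.
SAMPLE = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
   <dc:format>application/pdf</dc:format>
  </rdf:Description>
  <rdf:Description rdf:about="" xmlns:xmp="http://ns.adobe.com/xap/1.0/"
      xmp:CreatorTool="FrameMaker 8.0"/>
 </rdf:RDF>
</x:xmpmeta>"""


def namespace_of(clark_name: str) -> str:
    """'{uri}local' -> 'uri' (ElementTree's Clark notation)."""
    return clark_name[1:].split("}", 1)[0]


def describe_blocks(packet: str) -> Iterator[Tuple[str, Set[str]]]:
    """Yield (rdf:about value, property namespaces) for each rdf:Description."""
    root = ET.fromstring(packet)
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        about = desc.get(f"{{{RDF_NS}}}about")
        # Properties written as child elements ...
        namespaces = {namespace_of(child.tag) for child in desc}
        # ... or as attributes of rdf:Description (the shorthand style).
        namespaces |= {
            namespace_of(attr)
            for attr in desc.attrib
            if attr.startswith("{") and namespace_of(attr) != RDF_NS
        }
        yield about, namespaces
```

ElementTree drops the xmlns declarations from the attribute dictionary, so only real properties and rdf:about remain to be inspected.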

Three Dublin Core elements – dc:format, dc:title, and dc:creator – appear in the XMP packet. The document title is stored in several language versions (xml:lang) in an alternative list (rdf:Alt), and the creator is stored in an ordered list (rdf:Seq).
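Reading these two list types with generic XML tools works as follows: the rdf:li items of an rdf:Alt are keyed by their xml:lang attribute, while the items of an rdf:Seq keep their order. Here is a minimal standard-library sketch, again with an invented sample packet and a hypothetical read_dc() helper:

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}
# ElementTree maps the predefined xml: prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Invented example values, not the contents of Listing 1.
SAMPLE = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
   <dc:title><rdf:Alt>
     <rdf:li xml:lang="x-default">Example title</rdf:li>
     <rdf:li xml:lang="de">Beispieltitel</rdf:li>
   </rdf:Alt></dc:title>
   <dc:creator><rdf:Seq>
     <rdf:li>Jane Doe</rdf:li>
     <rdf:li>John Roe</rdf:li>
   </rdf:Seq></dc:creator>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""


def read_dc(packet: str):
    """Return ({lang: title}, [creator, ...]) from a packet."""
    root = ET.fromstring(packet)
    # rdf:Alt: one entry per language, identified by xml:lang.
    titles = {
        li.get(XML_LANG): li.text
        for li in root.iterfind(".//dc:title/rdf:Alt/rdf:li", NS)
    }
    # rdf:Seq: order is significant, so keep the items as a list.
    creators = [
        li.text for li in root.iterfind(".//dc:creator/rdf:Seq/rdf:li", NS)
    ]
    return titles, creators
```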

Core XMP elements that carry date stamps, such as xmp:CreateDate and xmp:ModifyDate, appear in a further Description block. Here, xmp:CreatorTool (line 20) states that FrameMaker 8.0 produced this document. Other blocks follow with description elements specific to PDF documents (xmlns:pdf) and data fields for media management (xmlns:xmpMM). Table 1 provides an overview of some XMP elements and namespaces.
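Date stamps such as xmp:CreateDate are ISO 8601 strings, so Python's datetime module can turn them into comparable objects. The sketch below uses invented values and a hypothetical read_xmp_core() helper. It only handles full date-time stamps with a time zone offset or a trailing Z; reduced-precision stamps such as a bare year would need extra handling.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "xmp": "http://ns.adobe.com/xap/1.0/",
}

# Hypothetical example values, not the ones from Listing 1.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
 <rdf:Description rdf:about="" xmlns:xmp="http://ns.adobe.com/xap/1.0/">
  <xmp:CreateDate>2008-03-12T10:15:27+01:00</xmp:CreateDate>
  <xmp:ModifyDate>2008-03-14T09:02:11+01:00</xmp:ModifyDate>
  <xmp:CreatorTool>FrameMaker 8.0</xmp:CreatorTool>
 </rdf:Description>
</rdf:RDF>"""


def parse_xmp_date(value: str) -> datetime:
    """Parse a full XMP date stamp. A trailing 'Z' is mapped to +00:00
    because datetime.fromisoformat() only accepts 'Z' from Python 3.11 on."""
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def read_xmp_core(packet: str) -> dict:
    """Collect xmp:CreatorTool and the xmp date stamps from all blocks."""
    root = ET.fromstring(packet)
    result = {}
    for desc in root.iter(f"{{{NS['rdf']}}}Description"):
        tool = desc.find("xmp:CreatorTool", NS)
        if tool is not None:
            result["CreatorTool"] = tool.text
        for name in ("CreateDate", "ModifyDate", "MetadataDate"):
            elem = desc.find(f"xmp:{name}", NS)
            if elem is not None and elem.text:
                result[name] = parse_xmp_date(elem.text.strip())
    return result
```

Because the parsed values carry their UTC offsets, comparisons such as "was the file modified after it was created?" work correctly across time zones.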

Table 1

A Selection of XMP Elements and Namespaces

Element               Content                             Format

Dublin Core (http://purl.org/dc/elements/1.1/)
dc:title              Title of the document or item       Alternative list with xml:lang
dc:creator            Creator (person or organization)    Ordered list
dc:publisher          Name of the publishing entity       Unordered list
dc:subject            Collection of keywords              Unordered list
dc:language           Language of the item                Unordered list with RFC 3066 tags
dc:format             File format of the object           MIME type
dc:identifier         ISBN/ISSN, URN, DOI, and others     Text

XMP Core Elements (http://ns.adobe.com/xap/1.0/)
xmp:CreateDate        Creation date of the object         Date stamp
xmp:CreatorTool       Tool that created the object        Text
xmp:MetadataDate      Modification date of the metadata   Date stamp
xmp:ModifyDate        Modification date of the object     Date stamp
xmp:Rating            Rating of the object                Score from -1 to 5

XMP Rights Management (http://ns.adobe.com/xap/1.0/rights/)
xmpRights:Marked      Copyright marking                   True/false
xmpRights:Owner       Rights holder                       Unordered list
xmpRights:UsageTerms  License/usage terms                 Alternative list with xml:lang

XMP Media Management (http://ns.adobe.com/xap/1.0/mm/)
xmpMM:DocumentID      Identifier of the object            GUID
xmpMM:InstanceID      Identifier of an object instance    GUID
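When XMP is processed with generic XML tools, the namespace URIs from Table 1 are what actually identify the properties; the prefixes are only local shorthand. A small lookup table in ElementTree's {uri}name notation might look like the following. expand() and is_valid_rating() are illustrative helpers, not part of any XMP library, and the rating check simply encodes the range given in the table.

```python
# Prefix-to-URI mapping for the schemas listed in Table 1.
XMP_NAMESPACES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "xmp": "http://ns.adobe.com/xap/1.0/",
    "xmpRights": "http://ns.adobe.com/xap/1.0/rights/",
    "xmpMM": "http://ns.adobe.com/xap/1.0/mm/",
}


def expand(qname: str) -> str:
    """Turn 'xmpMM:DocumentID' into ElementTree's '{uri}DocumentID' form."""
    prefix, _, local = qname.partition(":")
    if not local:
        raise ValueError(f"not a prefixed name: {qname!r}")
    return f"{{{XMP_NAMESPACES[prefix]}}}{local}"


def is_valid_rating(value: float) -> bool:
    """Check an xmp:Rating value against the -1 to 5 range from Table 1."""
    return -1 <= value <= 5
```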

Python XMP Toolkit

With the help of a few Python libraries, XMP applications can be programmed without much effort (see the "XMP and Exif with Python" box). The Python XMP Toolkit [7] was developed by the European Space Agency (ESA), among others, to manage images from the Hubble Space Telescope (Figure 1). The current version is 2.0.1.

XMP and Exif with Python

Besides the XMP Toolkit, free Python libraries for programming Exif applications can also handle XMP (although the Exif libraries do not support the same file formats). None of these tools is implemented in pure Python; instead, they are all bindings to existing C or C++ libraries.

Under the hood, the XMP Toolkit is a ctypes wrapper around Exempi [8] (current version 2.3.0), an offshoot of Adobe's official XMP Software Development Kit; Exempi 2.3.0 is based on Adobe XMP SDK 4.1.1.

Pyexiv2 [9] (current version 0.3.2) is a Boost.Python binding to the Exiv2 C++ library [10] that developers can use to program applications for Exif, IPTC, and XMP metadata. Because Pyexiv2 is no longer being developed, switching to GExiv2 is recommended.

GExiv2 [11] (current version 0.10.3) is a wrapper around the Exiv2 library for the GObject programming environment. The software supports GObject Introspection, which Python programmers can access via PyGObject [12]. On a Debian system, for example, this requires the gir1.2-gexiv2, python-gi, and python3-gi packages. The following lines then import the library (the require_version() call selects the 0.10 API and avoids a PyGObject warning):

import gi
gi.require_version('GExiv2', '0.10')
from gi.repository import GExiv2

Figure 1: The Python XMP Toolkit helped sort the Hubble Telescope's numerous images.

If you install the XMP Toolkit's Linux package, you also get the required Exempi library. So far, the Toolkit is packaged for only a few distributions, such as Debian and its derivatives, where it comes in the python-libxmp and python3-libxmp packages. Alternatively, you can install it from the Python Package Index [13] (pip install python-xmp-toolkit).

The online documentation is currently incomplete [14]; Debian users are better off installing the python-libxmp-doc documentation package [15]. Alternatively, programmers can retrieve the documentation from the GitHub repository or read the docstrings directly in the source code.

In the Python XMP Toolkit, the libxmp.files.XMPFiles class handles files, and the libxmp.core.XMPMeta class offers a range of methods for manipulating XMP packets in memory. To represent XMP, the Toolkit uses its own complex data object, which Python's usual XML tools cannot process (although they do not need to). The example in Listing 2 demonstrates a few simple operations with the Toolkit in an IPython session.

Listing 2

Python XMP Toolkit Demo

 

Action Mode

The script in the listing imports both main classes, opens the requested file, and loads its XMP packet into the myxmp object in memory with get_xmp(). If the file does not yet contain an XMP packet, the Toolkit creates an empty template in memory with x:xmpmeta and the basic RDF framework.
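The exact template the Toolkit builds is not reproduced here. As a rough standard-library approximation, an empty packet with x:xmpmeta, rdf:RDF, and an empty rdf:Description could be generated as follows. The real template may differ in details such as an x:xmptk attribute or the <?xpacket?> wrapper, and empty_xmp_packet() is a hypothetical name.

```python
import xml.etree.ElementTree as ET

X_NS = "adobe:ns:meta/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Make ElementTree emit the conventional x: and rdf: prefixes.
ET.register_namespace("x", X_NS)
ET.register_namespace("rdf", RDF_NS)


def empty_xmp_packet() -> str:
    """Build x:xmpmeta > rdf:RDF > rdf:Description rdf:about="" as a string."""
    meta = ET.Element(f"{{{X_NS}}}xmpmeta")
    rdf = ET.SubElement(meta, f"{{{RDF_NS}}}RDF")
    # XMP expects rdf:about to be present but empty.
    ET.SubElement(rdf, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": ""})
    return ET.tostring(meta, encoding="unicode")
```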

The consts module provides constants for the common namespaces: consts.XMP_NS_DC, for instance, holds the Dublin Core URI, and consts.XMP_NS_XMP holds the URI of the core XMP elements.

The get_localized_text() method returns entries of alternative lists that are localized with xml:lang (e.g., the x-default entry of dc:title). The XMP object can be manipulated in a targeted way with set_property(), for instance, by changing the x-default language setting to en. The set_localized_text() method changes localized data in alternative lists; in dc:title, for example, it can add a short German title with xml:lang=de.
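To illustrate what looking up and setting localized text in an rdf:Alt list involves, the following sketch implements a simplified version of that logic with ElementTree. It is not the Toolkit's implementation: get_localized() and set_localized() are hypothetical helpers, and the fallback order used here (requested language, then x-default, then the first item) is an assumption that ignores the generic-language matching (e.g., en for en-US) performed by Adobe's SDK.

```python
import xml.etree.ElementTree as ET
from typing import Optional

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def get_localized(alt: ET.Element, lang: str) -> Optional[str]:
    """Return the rdf:li text for lang, else x-default, else the first item."""
    items = alt.findall(f"{RDF}li")
    for wanted in (lang, "x-default"):
        for li in items:
            if li.get(XML_LANG) == wanted:
                return li.text
    return items[0].text if items else None


def set_localized(alt: ET.Element, lang: str, value: str) -> None:
    """Replace the rdf:li for lang, or append a new one if none exists."""
    for li in alt.findall(f"{RDF}li"):
        if li.get(XML_LANG) == lang:
            li.text = value
            return
    ET.SubElement(alt, f"{RDF}li", {XML_LANG: lang}).text = value
```

Calling set_localized() twice for the same language replaces the existing entry rather than adding a duplicate, which keeps one item per language in the alternative list.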

Next, get_property() requests the value of the xmp:MetadataDate element. This is a date stamp in ISO 8601 format. Developers can create a new date stamp (now) with the datetime module from Python's standard library [16] and overwrite xmp:MetadataDate in the XMP packet with set_property().
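Creating such a stamp needs nothing beyond the standard library. Here is a minimal sketch, assuming local time with an explicit UTC offset is wanted; xmp_now() is a hypothetical helper:

```python
from datetime import datetime, timezone


def xmp_now() -> str:
    """Current local time as an ISO 8601 string with UTC offset,
    e.g. for overwriting xmp:MetadataDate."""
    # astimezone() without an argument converts to the system's local zone.
    return datetime.now(timezone.utc).astimezone().isoformat(timespec="seconds")
```

Keeping the offset in the string matters because XMP readers otherwise cannot tell which time zone a stamp refers to.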

Alternatively, the set_property_datetime() method handles date stamps. The can_put_xmp() method checks whether the modified packet can be written back to the opened file (for example, whether the file is write protected). If it can, put_xmp() writes the packet to the file and close_file() closes it.
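Because the code of Listing 2 is not reproduced here, the following function strings together the steps just described. It is a hedged sketch, not the original listing. It assumes that the python-xmp-toolkit package and Exempi are installed, uses the XMPFiles, XMPMeta, and consts names mentioned above, and passes an empty string as the generic language argument. update_xmp() and its title_de parameter are hypothetical. The import sits inside the function so that the module can be loaded even on systems without Exempi.

```python
from datetime import datetime, timezone


def update_xmp(path: str, title_de: str) -> bool:
    """Read a file's XMP, add a German title, refresh xmp:MetadataDate,
    and write the packet back. Returns False if it cannot be written."""
    from libxmp import XMPFiles, XMPMeta, consts

    xmpfile = XMPFiles(file_path=path, open_forupdate=True)
    try:
        myxmp = xmpfile.get_xmp()
        if myxmp is None:
            # Some Toolkit versions return None for files without a packet;
            # start from an empty in-memory packet in that case.
            myxmp = XMPMeta()

        # Default title from the dc:title alternative list. Behavior for a
        # missing dc:title may differ between Toolkit versions.
        title = myxmp.get_localized_text(consts.XMP_NS_DC, "title", "", "x-default")
        print("x-default title:", title)

        # Add or replace the German entry (xml:lang="de") in dc:title.
        myxmp.set_localized_text(consts.XMP_NS_DC, "title", "", "de", title_de)

        # Overwrite xmp:MetadataDate with the current time (ISO 8601).
        stamp = datetime.now(timezone.utc).astimezone().isoformat(timespec="seconds")
        myxmp.set_property(consts.XMP_NS_XMP, "MetadataDate", stamp)

        if not xmpfile.can_put_xmp(myxmp):
            return False
        xmpfile.put_xmp(myxmp)
        return True
    finally:
        # Always release the file, whether or not the packet was written.
        xmpfile.close_file()
```

A call such as update_xmp("test.pdf", "Metadaten in Dateien") (hypothetical file and title) would return False if the Toolkit reports that the packet cannot be written back.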
