Unified system

In discussions about these systems, it was clear that the differences between the databases were simply a result of them being separate, and not due to any fundamental disagreements between developers. Everyone is keen to see them merged.

This spec proposes:

Further, the existing databases have been merged into a single package [SharedMIME].

Directory layout

There are two important requirements for the way the MIME database is stored:

  • Applications must be able to extend the database in any way when they are installed, adding both new rules for determining a file's type and new information about specific types.

  • It must be possible to install applications in /usr, /usr/local and the user's home directory (in the normal Unix way) and have the MIME information used.

The directories to be used to store the files in the database are:

  • /usr/share/mime/

  • /usr/local/share/mime/

  • ~/.mime/

In the rest of this document, paths shown with the prefix <MIME> indicate that the files should be loaded from all of the directories listed above. For example, "Load all the <MIME>/text/html.xml files" means load /usr/share/mime/text/html.xml, /usr/local/share/mime/text/html.xml, and ~/.mime/text/html.xml (if they exist).

Where the information in these files conflicts, information from directories lower in the list takes precedence.
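A minimal Python sketch of how a reader might resolve <MIME> paths. It assumes the three directories above, in that order, and models "lower in the list takes precedence" by applying the layers in order so that later values overwrite earlier ones. `mime_paths` and `merge_settings` are hypothetical helper names, not part of any existing library.

```python
from pathlib import Path

# Directories from the spec, lowest precedence first.
MIME_DIRS = [
    Path("/usr/share/mime"),
    Path("/usr/local/share/mime"),
    Path.home() / ".mime",
]


def mime_paths(relative, dirs=MIME_DIRS):
    """Return the existing <MIME>/relative files, lowest precedence first."""
    return [d / relative for d in dirs if (d / relative).exists()]


def merge_settings(layers):
    """Merge dicts in order; values from later (higher-precedence) layers win."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged
```

Keeping the list in precedence order means a simple left-to-right merge gives the correct result without any explicit priority numbers.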

Any file named Override.xml takes precedence over all other files in the same packages directory. Tools which let the user edit the database should edit the file ~/.mime/packages/Override.xml.
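One way to honour the Override.xml rule when reading a packages directory is to process that file last, so its information overwrites everything else from the same directory. The sketch below assumes later files win and sorts the remaining files alphabetically only to make the order deterministic; the spec does not say how the other packages are ordered among themselves, and `package_files` is a hypothetical name.

```python
from pathlib import Path


def package_files(packages_dir):
    """List the XML files in a packages directory, with Override.xml last.

    Assumes files later in the list take precedence, so putting
    Override.xml at the end lets it win over every other package file.
    """
    files = sorted(Path(packages_dir).glob("*.xml"))
    others = [f for f in files if f.name != "Override.xml"]
    override = [f for f in files if f.name == "Override.xml"]
    return others + override
```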

Each application that wishes to contribute to the MIME database will install a single XML file, named after the application, into one of the three <MIME>/packages/ directories (depending on where the user requested the application be installed). After installing, uninstalling or modifying this file, the application MUST run the update-mime-database command, which is provided by the freedesktop.org shared database [SharedMIME].
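A sketch of the install step, assuming Python's subprocess module and an update-mime-database binary on PATH. `install_package` is a hypothetical helper; its `run` parameter exists only so the external command can be replaced in tests.

```python
import subprocess
from pathlib import Path


def install_package(mime_dir, app_name, xml_text, run=subprocess.run):
    """Write <mime_dir>/packages/<app_name>.xml, then rebuild the database."""
    packages = Path(mime_dir) / "packages"
    packages.mkdir(parents=True, exist_ok=True)
    target = packages / f"{app_name}.xml"  # one file, named after the application
    target.write_text(xml_text, encoding="utf-8")
    # The MIME directory (not the packages subdirectory) is the only argument.
    run(["update-mime-database", str(mime_dir)], check=True)
    return target
```

An uninstaller would delete the same file and make the same update-mime-database call.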

update-mime-database takes a single argument: the mime directory whose packages subdirectory was modified. It scans all the XML files in that packages subdirectory, combines the information in them, and creates a number of output files:

  • <MIME>/globs (contains a mapping from extension to MIME type)

  • <MIME>/magic (contains a mapping from file contents to MIME type)

  • <MIME>/MEDIA/SUBTYPE.xml (one file for each MIME type, giving details about the type)

The formats of these generated files and of the source files in packages are explained in the following sections. This step serves several purposes. First, it allows applications to quickly get the data they need without parsing all the source XML files (the base package alone is over 700K). Second, it allows the database to be used for other purposes (such as creating /etc/mime.types if desired). Third, it allows some validation to be performed on the input data, and removes the need for other applications to carefully check the input for errors themselves.

The source XML files

Each application provides only a single XML source file, which is installed in the packages directory as described above. This file is an XML file whose document element is named mime-info and whose namespace URI is http://www.freedesktop.org/standards/shared-mime-info. All elements described in this specification MUST have this namespace too.

The document element may contain zero or more mime-type child nodes, in any order, each describing a single MIME type. Each mime-type element has a type attribute giving the MIME type that it describes.

Each mime-type node may contain any combination of the following elements, and in any order:

  • glob elements have a pattern attribute. Any file whose name matches this pattern will be given this MIME type (subject to conflicting rules in other files, of course).

  • magic elements contain a list of match elements, any of which may match, and an optional priority attribute for all of the contained rules. Low numbers should be used for more generic types (such as 'gzip compressed data') and higher values for specific subtypes (such as a word processor format that happens to use gzip to compress the file). The default priority value is 50.

    Each child element can be any of string, host16, host32, big16, big32, little16, little32 or byte; the element name gives the type of the match. Each of these elements has offset and value attributes and, optionally, a mask attribute. Each element corresponds to one line of file(1)'s magic.mime file. They can be nested in the same way to provide the equivalent of continuation lines.

  • comment elements give a human-readable textual description of the MIME type. There may be many of these elements with different xml:lang attributes to provide the text in multiple languages.
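The numeric match types can be read with Python's struct module. The byte orders below are inferred from the type names and are an assumption, as is treating the value and mask as integers already parsed from their attributes (the text above does not say how they are written). The mask is ANDed with the word read from the file before comparison, as in file(1). `numeric_match` is a hypothetical helper.

```python
import struct

# Assumed meaning of the type names: "big" = big-endian, "little" =
# little-endian, "host" = native byte order, digits = width in bits.
FORMATS = {
    "byte": "B",
    "host16": "=H", "host32": "=I",
    "big16": ">H", "big32": ">I",
    "little16": "<H", "little32": "<I",
}


def numeric_match(data, match_type, offset, value, mask=None):
    """Read a word of the given type at offset, apply mask, compare to value."""
    fmt = FORMATS[match_type]
    if offset + struct.calcsize(fmt) > len(data):
        return False  # file too short to contain the word
    (word,) = struct.unpack_from(fmt, data, offset)
    if mask is not None:
        word &= mask  # only the file's data is masked
    return word == value
```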

Applications may also define their own elements, provided they are namespaced to prevent collisions. Unknown elements are copied directly to the output XML files like comment elements.
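A reader for source files, sketched with Python's xml.etree.ElementTree. It checks the document element and namespace, then collects globs, comments (keyed by xml:lang, with "" for the untranslated text) and magic rules per type. Only direct children of magic are collected; nested continuation matches are left out of this sketch. `parse_source` and its return shape are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "http://www.freedesktop.org/standards/shared-mime-info"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def parse_source(xml_text):
    """Return {mime_type: {"globs": [...], "comments": {...}, "magic": [...]}}."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{NS}}}mime-info":
        raise ValueError("document element must be mime-info in the spec namespace")
    types = {}
    for mt in root.findall(f"{{{NS}}}mime-type"):
        info = types.setdefault(mt.get("type"), {"globs": [], "comments": {}, "magic": []})
        for child in mt:
            if child.tag == f"{{{NS}}}glob":
                info["globs"].append(child.get("pattern"))
            elif child.tag == f"{{{NS}}}comment":
                info["comments"][child.get(XML_LANG, "")] = child.text or ""
            elif child.tag == f"{{{NS}}}magic":
                priority = int(child.get("priority", "50"))  # default priority is 50
                rules = [(m.tag.split("}")[1], m.get("offset"), m.get("value")) for m in child]
                info["magic"].append((priority, rules))
    return types
```

Because XML attribute-value normalization turns a literal tab into a space, a tab that must survive in a value attribute would need to be written as a character reference such as &#9;.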

Here is an example source file, named diff.xml:

<?xml version="1.0"?>
<mime-info xmlns='http://www.freedesktop.org/standards/shared-mime-info'>
  <mime-type type="text/x-diff">
    <comment>Differences between files</comment>
    <comment xml:lang="af">verskille tussen lêers</comment>
    ...
    <magic priority="50">
      <string offset="0" value="diff	"/>
      <string offset="0" value="***	"/>
      <string offset="0" value="Common subdirectories: "/>
    </magic>
    <glob pattern="*.diff"/>
    <glob pattern="*.patch"/>
  </mime-type>
</mime-info>

In practice, common types such as text/x-diff are provided by the freedesktop.org shared database. Also, only new information needs to be provided, since this information will be merged with other information about the same type.

The MEDIA/SUBTYPE.xml files

These files have a mime-type element as the root node. The format is as described above. They are created by merging all the mime-type elements from the source files and creating one output file per MIME type. Each file may contain information from multiple source files. The magic and glob elements will have been removed.

The example source file given above would (on its own) create an output file called <MIME>/text/x-diff.xml containing the following:

<?xml version="1.0" encoding="utf-8"?>
<mime-type xmlns="http://www.freedesktop.org/standards/shared-mime-info" type="text/x-diff">
<!--Created automatically by update-mime-database. DO NOT EDIT!-->
  <comment>Differences between files</comment>
  <comment xml:lang="af">verskille tussen lêers</comment>
  ...
</mime-type>
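The merge that produces a MEDIA/SUBTYPE.xml file can be sketched with ElementTree: take every mime-type element for one type, drop glob and magic children, and copy everything else across. The sketch assumes the sources arrive lowest precedence first and simply keeps all remaining children in that order; how duplicate comments in the same language are resolved is not specified above. `build_type_file` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

NS = "http://www.freedesktop.org/standards/shared-mime-info"
ET.register_namespace("", NS)  # serialise the spec namespace as the default xmlns


def build_type_file(mime_type, sources):
    """Merge the mime-type elements for one type into a single output element."""
    out = ET.Element(f"{{{NS}}}mime-type", {"type": mime_type})
    out.append(ET.Comment("Created automatically by update-mime-database. DO NOT EDIT!"))
    for src in sources:
        for child in src:
            if child.tag in (f"{{{NS}}}glob", f"{{{NS}}}magic"):
                continue  # these go to the globs and magic files instead
            out.append(child)  # comments and extension elements are copied as-is
    return out
```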

The glob files

This is a simple list of lines containing a MIME type and pattern, separated by a colon. For example:

# This file was automatically generated by the
# update-mime-database command. DO NOT EDIT!
...
text/x-diff:*.diff
text/x-diff:*.patch
...
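Reading the globs file is a line-by-line split. The sketch splits on the first colon only: ':' is not a legal character in a MIME type, but it could appear in a pattern. `parse_globs` is a hypothetical helper.

```python
def parse_globs(text):
    """Parse a globs file into (mime_type, pattern) pairs, skipping comments."""
    rules = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        mime_type, _, pattern = line.partition(":")  # first colon only
        rules.append((mime_type, pattern))
    return rules
```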

KDE's glob system replaces GNOME's and ROX's ext/regex fields, since it is trivial to detect a pattern in the form '*.ext' and store it in an extension hash table internally. The full power of regular expressions was not being used by either desktop, and glob patterns are more suitable for filename matching anyway.

Applications MUST first try a case-sensitive match, then a case-insensitive one. This is so that main.C will be seen as a C++ file, but IMAGE.GIF will still use the *.gif pattern.

If several patterns match then the longest pattern SHOULD be used. In particular, files with multiple extensions (such as Data.tar.gz) MUST match the longest sequence of extensions (e.g. '*.tar.gz' in preference to '*.gz'). Literal patterns (e.g. 'Makefile') must be matched before all others. It is acceptable to match patterns of the form '*.text' before other wildcarded patterns (that is, to special-case extensions using a hash table).
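The matching rules above can be sketched with fnmatch.fnmatchcase. The text does not fix how the case rule and the literal-first rule interleave, so this sketch assumes a full case-sensitive pass (literals, then the longest wildcard) before a full case-insensitive pass. `match_glob` and the rule list in the usage check are illustrative only.

```python
import fnmatch


def is_literal(pattern):
    """True if the pattern has no glob metacharacters (e.g. 'Makefile')."""
    return not any(ch in pattern for ch in "*?[")


def match_glob(filename, rules):
    """Return the MIME type for filename from (mime_type, pattern) rules, or None."""
    for fold in (False, True):  # case-sensitive pass, then case-insensitive
        name = filename.lower() if fold else filename
        hits = []
        for mime_type, pattern in rules:
            pat = pattern.lower() if fold else pattern
            if is_literal(pat):
                if name == pat:
                    return mime_type  # literal patterns beat all wildcards
            elif fnmatch.fnmatchcase(name, pat):
                hits.append((len(pat), mime_type))
        if hits:
            return max(hits, key=lambda h: h[0])[1]  # longest pattern wins
    return None
```

With rules for *.C, *.c and *.gif, this gives main.C the *.C type on the first pass, while IMAGE.GIF only matches *.gif on the case-insensitive pass.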

There may be several rules mapping to the same type. They should all be merged. If the same pattern is defined twice, then they MUST be ordered by the directory the rule came from, as described above.

Common types (such as MS Word Documents) will be provided in the X Desktop Group's package, which SHOULD be required by all applications using this specification. Since each application will then only be providing information about its own types, conflicts should be rare.

The magic files

These files have a similar format to file(1)'s magic.mime file. Each line may be either a comment (starting with '#'), a new type (starting with '[') or a rule to match (anything else).

Type lines are in the form "[" PRIORITY ":" TYPE "]".

Match lines are in the form ">"* START [":" END] TAB TYPE ["&" MASK] TAB VALUE. The offset may be a range in the form START:END; the rule is then considered to match if there is a match at either of these offsets, or at any offset in between. The line may start with zero or more ">" characters, as in file(1)'s magic syntax. Whitespace in the value (after the tab) is significant, and fields are separated by exactly one tab character.
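A parser for one match line, following the grammar above. Splitting on at most two tabs keeps any tabs inside VALUE intact. The sketch assumes offsets are decimal and that string values are UTF-8, and it only evaluates the string type; `MagicRule`, `parse_match_line` and `string_matches` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MagicRule:
    indent: int            # number of leading '>' characters
    start: int
    end: int               # equal to start when no range is given
    match_type: str        # string, host16, big32, byte, ...
    mask: Optional[str]
    value: str


def parse_match_line(line):
    """Parse ">"* START [":" END] TAB TYPE ["&" MASK] TAB VALUE."""
    indent = len(line) - len(line.lstrip(">"))
    offsets, type_field, value = line[indent:].split("\t", 2)
    start, _, end = offsets.partition(":")
    match_type, _, mask = type_field.partition("&")
    return MagicRule(indent, int(start), int(end) if end else int(start),
                     match_type, mask or None, value)


def string_matches(rule, data: bytes) -> bool:
    """True if rule.value occurs at any offset from start to end inclusive."""
    needle = rule.value.encode("utf-8")
    return any(data.startswith(needle, off) for off in range(rule.start, rule.end + 1))
```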

The above example would create a magic file with these contents:

# This file was automatically generated by the
# update-mime-database command. DO NOT EDIT!
...
[50:text/x-diff]
0	string	diff 
0	string	*** 
0	string	Common subdirectories: 
...
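A sketch of reading the whole magic file and picking a type by priority. Type lines start a new section; following match lines belong to it. When several types match, the highest priority wins (ties go to the first section). To stay self-contained, this sketch only evaluates top-level string rules and treats a rule followed by ">" continuation lines as non-matching rather than risk a false positive; all function names are hypothetical.

```python
def parse_magic(text):
    """Split a magic file into (priority, mime_type, match_lines) sections."""
    sections = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        if line.startswith("["):
            priority, _, mime_type = line.strip()[1:-1].partition(":")
            sections.append((int(priority), mime_type, []))
        elif sections:
            sections[-1][2].append(line)  # match line for the current type
    return sections


def string_rule_matches(line, data):
    """Evaluate a top-level 'string' rule; other types are not handled."""
    offsets, match_type, value = line.split("\t", 2)
    if match_type != "string":
        return False
    start, _, end = offsets.partition(":")
    needle = value.encode("utf-8")  # assumption: values are UTF-8 text
    return any(data.startswith(needle, off) for off in range(int(start), int(end or start) + 1))


def section_matches(lines, data):
    """True if any top-level rule without continuation lines matches."""
    i = 0
    while i < len(lines):
        j = i + 1
        while j < len(lines) and lines[j].startswith(">"):
            j += 1
        if j == i + 1 and not lines[i].startswith(">") and string_rule_matches(lines[i], data):
            return True
        i = j
    return False


def sniff(data, sections):
    """Return the MIME type of the highest-priority matching section, or None."""
    best = None
    for priority, mime_type, lines in sections:
        if section_matches(lines, data) and (best is None or priority > best[0]):
            best = (priority, mime_type)
    return best[1] if best else None
```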

Security implications

The system described in this document is intended to allow different programs to see the same file as having the same type. This is to help interoperability. The type determined in this way is only a guess, and an application MUST NOT trust a file based simply on its MIME type. For example, a downloader should not pass a file directly to a launcher application without confirmation simply because the type looks 'harmless' (e.g. text/plain).

Do not rely on two applications getting the same type for the same file, even if they both use this system. The spec allows some leeway in implementation, and in any case the programs may be following different versions of the spec.

User preferences

The MIME database is NOT intended to store user preferences. Although users can edit the database, this is only to provide corrections and to allow them to install software themselves. Information such as "text/html files should be opened with Mozilla" should NOT go in the database. However, it may be used to store static information, such as "Mozilla can view text/html files", and even information such as "Galeon is the GNOME default text/html browser" (via an extension element with a GNOME namespace).