File encoding and content types
The platform runtime plug-in provides infrastructure for defining and discovering content types for data
streams. (See
Content types for an overview of the content framework.)
An important part of the content type system is the ability to specify different encodings (character sets)
for different kinds of content. The resources API further allows default character sets to be established for
projects, folders, and files. These default character sets are consulted if the content of the file itself
does not define a particular encoding inside its data stream.
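As an illustration of what it means for a data stream to define its own encoding, the following minimal sketch checks the first bytes of a file for a byte order mark. It is plain Java and is not the platform's content describer; the class name BomSniffer and its detect method are hypothetical, and the sketch deliberately ignores UTF-32 marks and textual declarations such as an XML prolog.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

/** Illustrative only: a hypothetical helper, not part of the platform API. */
class BomSniffer {

    /**
     * Returns the charset implied by a leading byte order mark, or an
     * empty Optional when the bytes do not start with a recognized mark.
     * UTF-32 marks are not handled in this sketch.
     */
    static Optional<Charset> detect(byte[] head) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) {
            return Optional.of(StandardCharsets.UTF_8);
        }
        if (startsWith(head, 0xFE, 0xFF)) {
            return Optional.of(StandardCharsets.UTF_16BE);
        }
        if (startsWith(head, 0xFF, 0xFE)) {
            return Optional.of(StandardCharsets.UTF_16LE);
        }
        return Optional.empty();
    }

    /** Compares the leading bytes of data, as unsigned values, with prefix. */
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

When detect returns an empty Optional, the stream carries no such mark, which corresponds to the case where the default character sets described here would be consulted.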
Setting a character set
We've seen in
Content types that default file encodings can be established
for content types. More fine-grained control is provided by the resources API.
IContainer
defines protocol for setting the default character set for a particular project or folder. This gives
plug-ins (and ultimately the user) more freedom in determining an appropriate character set for a set of files when
the default character sets from the content type may not be appropriate.
IFile
defines API for setting
the default character set for a particular file. If no encoding is specified inside the file contents,
then this character set will be used. The file's default character set takes precedence over any default
character set specified in the file's folder, project, or content type.
Both of these features are available to the end-user in the properties page for a resource.
Querying the character set
IFile
also defines API for
querying the character set of a file. A boolean flag specifies whether only the character set explicitly
defined for the file should be returned, or whether an implied character set should be returned. For example:
String charset = myFile.getCharset(false);
returns null if no character set was set explicitly on myFile. However,
String charset = myFile.getCharset(true);
will first check for a character set that was set explicitly on the file. If none is found, then the content of the file
will be checked for a description of the character set. If none is found, then the file's containing folders and projects
will be checked for a default character set. If none is found, the default character set defined for the content type
itself will be checked. Finally, if no other default character set has been designated, the platform default
character set will be returned. The convenience method getCharset() is equivalent to getCharset(true).
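The lookup order described above can be modeled as a simple first-match chain. The sketch below is plain Java and does not call the resources API; the class CharsetLookupSketch, the resolve method, and its parameters are hypothetical names introduced for illustration. It also assumes that containers are consulted nearest first, which the text above does not state explicitly.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

/** Illustrative model of the lookup order; not the Resources plug-in's code. */
class CharsetLookupSketch {

    /**
     * Returns the first non-null candidate in the documented order.
     * containerDefaults holds the defaults of the enclosing folders and
     * project (assumed here to be ordered nearest container first);
     * a null entry means that container has no default set.
     */
    static String resolve(String explicitOnFile,
                          String declaredInContents,
                          List<String> containerDefaults,
                          String contentTypeDefault,
                          String platformDefault) {
        if (explicitOnFile != null) {
            return explicitOnFile;
        }
        if (declaredInContents != null) {
            return declaredInContents;
        }
        for (String containerDefault : containerDefaults) {
            if (containerDefault != null) {
                return containerDefault;
            }
        }
        if (contentTypeDefault != null) {
            return contentTypeDefault;
        }
        // The platform default is the last resort and must always exist.
        return Objects.requireNonNull(platformDefault, "platform default must be set");
    }

    public static void main(String[] args) {
        // Nothing set on the file or in its contents; the folder has no
        // default, but the project specifies ISO-8859-1.
        String charset = resolve(null, null,
                Arrays.asList(null, "ISO-8859-1"), "UTF-8", "US-ASCII");
        System.out.println(charset); // prints ISO-8859-1
    }
}
```

In this model, getCharset(false) corresponds to returning only the first argument, which may be null.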
Content types for files in the workspace
For files in the workspace,
IFile provides API for obtaining
the file content description:
IFile file = ...;
IContentDescription description = file.getContentDescription();
This API should be used even when clients are only interested in determining
the content type, which can be easily obtained from the content
description. It is possible to detect the content type or describe files in
the workspace by obtaining the contents and name and using the API described
in
Using content types, but that is not recommended. Content type determination
using IFile.getContentDescription() takes into account
project natures and project-specific settings, which are ignored if you go
directly to the content type manager. More importantly, reading the
contents of files from disk is very expensive, so the Resources plug-in maintains
a cache of content descriptions for files in the workspace. This reduces the
cost of content description to an acceptable level.