5.1.3 Unicode Strings

This manual section was written by Marc-Andre Lemburg (mal at lemburg.com).

Python supports characters in different languages using the Unicode standard. Unicode data can be stored and manipulated in the same way as strings.

For example, creating Unicode strings in Python is as simple as creating normal strings:

    >>> u'Hello World !'
    u'Hello World !'

The prefix ‘u’ in front of the quote indicates that a Unicode string is to be created. If you want to include special characters in the string, you can do so using the Python Unicode-Escape encoding. The following example shows how:

    >>> u'Hello\u0020World !'
    u'Hello World !'

The escape sequence \u0020 inserts the Unicode character with the hexadecimal value 0x0020 (the space character) at the given position.
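As a small sketch of the escape mechanism described above, the snippet below checks that a \uXXXX escape produces the same string as typing the character directly. The variable names are illustrative, and the code assumes an interpreter that accepts the u prefix (Python 2, or Python 3.3 and later).

```python
# A \uXXXX escape names a character by four hexadecimal digits.
escaped = u'Hello\u0020World !'
literal = u'Hello World !'

# Both spellings produce the same Unicode string.
same = (escaped == literal)

# ord() recovers the numeric value of the character at index 5,
# which is the position where the escape was written.
code_point = ord(escaped[5])

print(same)
print(hex(code_point))
```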

There is also a raw mode like the one for normal strings, using the prefix ‘ur’ to specify Raw-Unicode-Escape encoding of the string. It will only apply the above \uXXXX conversion if there is an odd number of backslashes in front of the lowercase 'u'.
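The odd/even backslash rule can also be observed without the ‘ur’ prefix by decoding byte strings with the raw_unicode_escape codec. This sketch assumes that the codec implements the same Raw-Unicode-Escape encoding the prefix refers to, and that the interpreter accepts the b prefix (Python 2.6 or later). The variable names are illustrative.

```python
# One backslash before 'u' (an odd count): the escape is converted.
# The literal below spells the raw bytes  Hello\u0020World
odd = b'Hello\\u0020World'.decode('raw_unicode_escape')

# Two backslashes before 'u' (an even count): the text is left untouched.
# The literal below spells the raw bytes  Hello\\u0020World
even = b'Hello\\\\u0020World'.decode('raw_unicode_escape')

print(odd)
print(even)
```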

Python provides additional functions for manipulating Unicode strings. The built-in function unicode() provides access to standard Unicode encodings such as latin-1, ascii, utf-8, and utf-16. The default encoding is normally set to ascii, which passes through characters in the range 0 to 127 and rejects any other characters with an error. When a Unicode string is printed, written to a file, or converted with str(), conversion takes place using this default encoding.

    >>> u"abc"
    u'abc'
    >>> str(u"abc")
    'abc'
    >>> u"\u00e4\u00f6\u00fc"
    u'\xe4\xf6\xfc'
    >>> str(u"\u00e4\u00f6\u00fc")
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    UnicodeError: ASCII encoding error: ordinal not in range(128)

To convert a Unicode string into an 8-bit string using a specific encoding, Unicode objects provide an encode() method that takes one argument, the name of the encoding.

    >>> u"\u00e4\u00f6\u00fc".encode('utf-8')
    '\xc3\xa4\xc3\xb6\xc3\xbc'
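To illustrate how the choice of encoding affects the result, the following sketch encodes the same three characters with several of the encodings named earlier and records how many bytes each one produces. It also wraps the ASCII case in a hypothetical helper, to_ascii_or_none, that catches the error instead of letting it propagate. All names here are illustrative and not part of the library.

```python
text = u"\u00e4\u00f6\u00fc"

# Number of bytes each encoding produces for the same three characters.
sizes = {}
for name in ('latin-1', 'utf-8', 'utf-16'):
    sizes[name] = len(text.encode(name))


def to_ascii_or_none(value):
    """Return the ASCII bytes for value, or None if it cannot be encoded."""
    try:
        return value.encode('ascii')
    except UnicodeError:
        # Raised when a character lies outside the range 0 to 127.
        return None


print(sizes)
print(to_ascii_or_none(u"abc"))
print(to_ascii_or_none(text))
```

The utf-16 count includes a two-byte byte order mark that the codec writes in front of the data.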

If you have data in a specific encoding and want to produce a corresponding Unicode string from it, you can use the unicode() function with the encoding name as the second argument.

    >>> unicode('\xc3\xa4\xc3\xb6\xc3\xbc', 'utf-8')
    u'\xe4\xf6\xfc'
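Encoding and decoding are inverse operations, which a short round trip can confirm. The sketch below uses the decode() method of byte strings in place of the unicode() function; that substitution is an assumption made so the code also runs on interpreters that lack a unicode() built-in. The function name round_trip is hypothetical.

```python
def round_trip(text, encoding):
    """Encode text, decode the bytes again, and report whether it survived."""
    data = text.encode(encoding)
    return data.decode(encoding) == text


original = u"\u00e4\u00f6\u00fc"

# Decoding the UTF-8 bytes from the example above gives back the original.
decoded = b'\xc3\xa4\xc3\xb6\xc3\xbc'.decode('utf-8')

print(decoded == original)
print(round_trip(original, 'utf-16'))
```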

Published under the terms of the Python License.