String Built-in Functions
The following built-in functions are relevant to string manipulation:

chr(i) → character
    Return a string of one character with ordinal i; 0 ≤ i < 256.

len(object) → integer
    Return the number of items of a sequence or mapping.

ord(c) → integer
    Return the integer ordinal of a one character string.

repr(object) → string
    Return the canonical string representation of the object. For most
    object types, eval(repr(object)) == object.

str(object) → string
    Return a nice string representation of the object. If the argument
    is a string, the return value is the same object.

unichr(i) → Unicode string
    Return a Unicode string of one character with ordinal i;
    0 ≤ i < 65536.

unicode(string [, encoding] [, errors]) → Unicode string
    Creates a new Unicode object from the given encoded string.
    encoding defaults to the current default string encoding and
    errors, defining the error handling, to 'strict'.

For character code manipulation, there are three related
functions: chr, ord and unichr. chr returns the character that
belongs to an ASCII code number (more generally, any code number
from 0 to 255). unichr returns the Unicode character that belongs to
a Unicode number. ord transforms an ASCII character to its ASCII code
number, or transforms a Unicode character to its Unicode number.
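The relationship between these three functions can be checked with a short round trip. This is only a sketch: the sample characters are arbitrary, and the fallback that binds unichr to chr is an assumption added so the same lines also run on Python 3, where unichr no longer exists.

```python
try:
    unichr                  # Python 2 provides unichr as a built-in
except NameError:
    unichr = chr            # assumption: on Python 3, chr covers Unicode numbers

code = ord("A")             # character -> code number (65)
letter = chr(code)          # code number -> character ("A")

omega = unichr(0x3A9)       # Unicode number -> one-character Unicode string
omega_code = ord(omega)     # and back to the same number

print(code)
print(letter)
print(omega_code == 0x3A9)
```

Going through ord and then chr (or unichr) returns the original character, which is why the functions are usually described as inverses of each other.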
The len function returns the length of the
string.
>>> len("abcdefg")
7
>>> len(r"\n")
2
>>> len("\n")
1
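Because len accepts any sequence or mapping, the same call works beyond strings. The following sketch repeats the string cases above and adds a list and a dictionary; the sample values and the counts name are made up for illustration.

```python
word = "abcdefg"
raw_pair = r"\n"        # raw string: a backslash followed by n
newline = "\n"          # escape sequence: a single newline character

counts = {
    "word": len(word),
    "raw": len(raw_pair),
    "escape": len(newline),
    "list": len([10, 20, 30]),          # number of items in a sequence
    "dict": len({"a": 1, "b": 2}),      # number of keys in a mapping
}

for name in sorted(counts):
    print(name, counts[name])
```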
The str function converts any object to a
string.
>>> a = str(355.0/113.0)
>>> a
'3.14159292035'
>>> len(a)
13
The repr function also converts an object to a string. However,
repr usually creates a string suitable for use as Python source
code. For simple numeric types, it's not terribly interesting. For
more complex types, however, it reveals details of their structure.
It can also be invoked using the reverse quotes (`), also called the
grave accent (underneath the tilde, ~, on most keyboards).
>>> a="""a very
... long string
... on multiple lines"""
>>> print repr(a)
'a very\012long string\012on multiple lines'
>>> print `a`
'a very\012long string\012on multiple lines'
This representation shows the newline characters (\012; more recent
Python releases display them as \n) embedded within the triple-quoted
string. If we simply print a or str(a), we would see the string
interpreted instead of represented.
>>> a="""a very
... long string
... on multiple lines"""
>>> print a
a very
long string
on multiple lines
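The contrast between str and repr can also be checked programmatically. The sketch below assumes the same three-line string; the variable names are illustrative. It relies on the two properties stated in the function list: str returns a string argument unchanged, and eval(repr(object)) == object holds for strings.

```python
text = "a very\nlong string\non multiple lines"

shown = str(text)        # the string itself, with real newline characters
quoted = repr(text)      # a quoted literal with the newlines escaped

same_object = shown is text          # str() hands back the same string object
has_real_newline = "\n" in quoted    # False: the newlines appear only as escapes
round_trip = eval(quoted) == text    # the literal evaluates back to the string

print(quoted)
print(same_object, has_real_newline, round_trip)
```

The exact escape shown in the quoted form (\012 or \n) depends on the Python release, but in either case evaluating it rebuilds the original string.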
The unicode(string [, encoding] [, errors]) function decodes an
encoded string into a Unicode object. The default encoding is the
current default string encoding (normally 'ascii', as reported by
sys.getdefaultencoding()), with 'strict' error handling. Choices for
errors are 'strict', 'replace' and 'ignore'. Strict raises an
exception for bytes that cannot be decoded, replace substitutes the
Unicode replacement character (\uFFFD) and ignore skips over invalid
characters. The codecs and unicodedata modules provide more functions
for working with Unicode.
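The three errors choices can be compared side by side. This is a sketch under stated assumptions: the sample bytes are invented, the encoding is passed explicitly as 'ascii' so the result does not depend on the default, and the to_unicode name with its fallback to str is an assumption added so the lines also run on Python 3, where str(bytes, encoding, errors) performs the same decoding.

```python
try:
    to_unicode = unicode        # Python 2 built-in described above
except NameError:
    to_unicode = str            # assumption: Python 3 equivalent for byte input

raw = b"caf\xe9 au lait"        # byte 0xE9 is not a valid ASCII character

# 'strict' refuses the invalid byte and raises an exception.
try:
    to_unicode(raw, "ascii", "strict")
    strict_failed = False
except UnicodeError:
    strict_failed = True

# 'replace' substitutes U+FFFD; 'ignore' drops the invalid byte.
replaced = to_unicode(raw, "ascii", "replace")
ignored = to_unicode(raw, "ascii", "ignore")

print(strict_failed)
print(repr(replaced))
print(repr(ignored))
```

Which choice is appropriate depends on the data: strict surfaces bad input immediately, while replace and ignore keep going at the cost of losing the original byte.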