Chapter 12. Strings

String Operations

There are a number of operations on strings, operations which create strings and operations which create other objects from strings.

There are three operations (+, *, [ ]) that work with all sequences (including strings) and a unique operation, %, that can be performed only with strings.

The + operator creates a new string as the concatenation of the arguments.

>>> "hi " + 'mom'
'hi mom'
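A minimal sketch of + in Python 3 syntax. The variable names are illustrative only. Both operands must be strings, so a number is converted explicitly with str first. Mixing a string and an int with + raises TypeError.

```python
greeting = "hi " + 'mom'          # concatenation builds a brand-new string
count = 3
label = "count: " + str(count)    # convert the int explicitly before joining

# Mixing str and int with + is not allowed
try:
    "count: " + count
except TypeError as err:
    mixed_error = err             # keep the exception for inspection
else:
    mixed_error = None

print(greeting)                   # hi mom
print(label)                      # count: 3
print(type(mixed_error).__name__) # TypeError
```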

The * operator between a string and an integer (integer * string or string * integer) creates a new string that is the input string repeated that number of times.

>>> 3*"cool!"
'cool!cool!cool!'
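Here is a short illustration of repetition, written in Python 3 syntax with illustrative names. The operand order does not matter. A count of zero or less produces an empty string, and the original string is left unchanged.

```python
word = "cool!"
tripled = 3 * word      # integer * string
same = word * 3         # string * integer gives the same result
empty = word * 0        # zero repetitions -> empty string
negative = word * -2    # negative counts behave like zero
print(repr(tripled))    # 'cool!cool!cool!'
print(repr(empty), repr(negative))  # '' ''
print(word)             # cool!  (the original string is unchanged)
```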

The [ ] operator can extract a single character or a slice from the string. There are two forms. The single-item form is string[index]. Items are numbered from 0 to len(string)-1. They are also numbered in reverse, from -len(string) to -1. The slice form is string[start:end]. The characters from position start up to, but not including, position end are chosen to create a new string as a slice of the original string. When both positions lie within the string, the result has end - start characters. If start is omitted, the slice begins at the beginning of the string (position 0). If end is omitted, the slice runs to the end of the string (position len(string), not -1).

>>> s="adenosine"
>>> s[2]
'e'
>>> s[:5]
'adeno'
>>> s[-5:]
'osine'
>>> s[5:]
'sine'
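The indexing and slicing rules above can be checked directly. This is a sketch in Python 3 syntax that reuses the "adenosine" string. It shows the valid index ranges and the end - start length of a slice. It also shows that an omitted end means len(s) rather than -1, and that indexing one past the last item raises IndexError.

```python
s = "adenosine"
n = len(s)                          # 9

first, last = s[0], s[n - 1]        # valid positive indexes run 0 .. len(s)-1
rev_first, rev_last = s[-n], s[-1]  # negative indexes run -len(s) .. -1

start, end = 2, 6
piece = s[start:end]                # characters at positions 2, 3, 4, 5
assert len(piece) == end - start

# An omitted end means len(s), not -1: s[3:-1] drops the last character
whole_tail = s[3:]                  # same as s[3:len(s)]
minus_one = s[3:-1]

try:
    s[n]                            # one past the last valid index
except IndexError:
    out_of_range = True
else:
    out_of_range = False

print(first, last, rev_first, rev_last)  # a e a e
print(piece)                        # enos
print(whole_tail, minus_one)        # nosine nosin
print(out_of_range)                 # True
```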

The String Formatting Operation, %. The % operator is sometimes called string interpolation, since it interpolates literal text and converted values. We prefer to call it string formatting, since that is a more apt description. This formatting is closely modeled on the C library's printf function.

This operator has two forms. You can use it with a string and a single value, or with a string and a tuple. We'll cover tuples in detail later. For now, a tuple is a comma-separated collection of values in parentheses.
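A sketch of the two forms, in Python 3 syntax with illustrative names. A single value can appear directly on the right of %, and a tuple supplies several values. One subtlety follows from this: to format a tuple itself as one value, it has to be wrapped in a one-element tuple.

```python
n = 42
one = "value: %i" % n              # a single value on the right
two = "%i + %i" % (n, 1)           # a tuple supplies several values
point = (3, 4)
wrapped = "point is %s" % (point,) # wrap a tuple in a 1-tuple to format it as one value
print(one)      # value: 42
print(two)      # 42 + 1
print(wrapped)  # point is (3, 4)
```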

The string on the left-hand side of % contains a mixture of literal text and conversion specifications. A conversion specification begins with %. For example, integers are converted with %i. Each conversion specification uses a corresponding value from the tuple: the first conversion uses the first value, the second conversion uses the second value, and so on. For example:

import random
# randint(1, 6) includes both endpoints, like a six-sided die
d1, d2 = random.randint(1, 6), random.randint(1, 6)
r = "die 1 shows %i, and die 2 shows %i" % (d1, d2)

The first %i converts the value of d1 to a string and inserts it. The second %i does the same for d2. The % operator returns a new string based on the format, with each conversion specification replaced by the appropriate value.
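Because values are matched to conversions by position, the number of values must equal the number of conversion specifications. This sketch uses Python 3 syntax. The helper `try_format` is hypothetical and exists only for illustration. With fixed dice values in place of random ones, the sketch shows that too few or too many values raise TypeError.

```python
def try_format(fmt, values):
    """Hypothetical helper: return the formatted string, or the TypeError text."""
    try:
        return fmt % values
    except TypeError as err:
        return "TypeError: %s" % err

ok = try_format("die 1 shows %i, and die 2 shows %i", (3, 5))
too_few = try_format("die 1 shows %i, and die 2 shows %i", (3,))   # one value missing
too_many = try_format("die shows %i", (3, 5))                       # one value left over

print(ok)        # die 1 shows 3, and die 2 shows 5
print(too_few)
print(too_many)
```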

Conversion Specifications. Each conversion specification has from one to four elements, following this pattern:

%[ flags ][ width [. precision ]] code

The % and the final code in each conversion specification are required. The other elements are optional.

The optional flags element can have any combination of the following values:

-: Left adjust the converted value in a field that has a length given by the width element. The default is right adjustment.
+: Show positive signs (sign will be + or -). The default is to show negative signs only.
␣ (a space ): Show positive signs with a space (sign will be ␣ or −). The default is negative signs only.
#: Use the Python literal rules (a leading 0 for octal in Python 2, 0o in Python 3; 0x for hexadecimal). The default is undecorated notation.
0: Zero-fill the field that has a length given by the width element. The default is to space-fill the field. This doesn't make sense with the - (left-adjust) flag; if both are given, - takes precedence and the field is space-filled on the right.
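The flags can be compared side by side. This sketch uses Python 3 semantics: the alternate octal form there is 0o, where Python 2 gives a plain leading 0. The dictionary keys are just labels chosen for this example. The brackets make the padding visible.

```python
samples = {
    "left":    "[%-5i]" % 42,              # '-': left-adjust in a width of 5
    "right":   "[%5i]" % 42,               # default: right-adjust
    "plus":    "%+i %+i" % (42, -42),      # '+': always show the sign
    "space":   "[% i] [% i]" % (42, -42),  # ' ': a space where + would go
    "alt_hex": "%#x" % 255,                # '#': Python-style 0x prefix
    "alt_oct": "%#o" % 8,                  # '#': 0o prefix (Python 3)
    "zero":    "%05i" % 42,                # '0': zero-fill to width 5
    "both":    "[%-05i]" % 42,             # '-' overrides '0': space-filled on the right
}
for name, text in samples.items():
    print(name, text)
```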

The optional width element is a number that specifies the total number of characters for the field, including signs and decimal points. If it is omitted, the field is just wide enough to hold the converted value. A value wider than the given width is not truncated. If a * is used instead of a number, an item from the tuple of values is used as the width of the field. For example, "%*i" % ( 3, d1 ) uses the value 3 from the tuple as the field width and d1 as the value to convert to a string.
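A brief sketch of the width element, in Python 3 syntax with an illustrative d1. It shows a width written in the format, the same width taken from the tuple with *, a value too wide for its field, and a width applied to a string.

```python
d1 = 5
fixed = "[%3i]" % d1         # width written in the format
star = "[%*i]" % (3, d1)     # width taken from the tuple
wide = "[%3i]" % 12345       # too wide for the field: printed in full, not truncated
text = "[%-8s]" % "red"      # width applies to strings too (left-adjusted here)
print(fixed, star, wide, text)  # [  5] [  5] [12345] [red     ]
```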

The optional precision element (which must be preceded by a dot, ., if it is present) has a few different purposes. For the f and e conversions, it is the number of digits to the right of the decimal point; for the g conversion, it is the number of significant digits. For string conversions, it is the maximum number of characters to be printed; longer strings are truncated. If a * is used instead of a number, an item from the tuple of values is used as the precision of the conversion. For example, "%*.*f" % ( 6, 2, avg ) uses the value 6 from the tuple as the field width, the value 2 from the tuple as the precision and avg as the value.
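The different meanings of precision show up clearly in a few one-liners. These are written in Python 3 syntax, and avg is an illustrative value. The last line shows the starred width and precision from the paragraph above.

```python
avg = 3.14159
two_places = "%.2f" % avg           # digits after the decimal point
sci = "%.2e" % 31415.9              # also digits after the point, in scientific form
sig = "%.3g" % avg                  # for g, precision counts significant digits
short = "%.3s" % "adenosine"        # strings are truncated to 3 characters
starred = "[%*.*f]" % (6, 2, avg)   # width 6 and precision 2 taken from the tuple
print(two_places, sci, sig, short, starred)  # 3.14 3.14e+04 3.14 ade [  3.14]
```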

The standard conversion rules also permit a long or short indicator: l or h. These are tolerated by Python so that these formats will be compatible with C, but they have no effect. They reflect internal representation considerations for C programming, not external formatting of the data.
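A tiny check, in Python 3 syntax, that the l and h length indicators are accepted and have no effect on the output.

```python
plain = "%d" % 42
with_l = "%ld" % 42    # 'l' is accepted and ignored
with_h = "%hd" % 42    # so is 'h'
print(plain, with_l, with_h)  # 42 42 42
```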

The required one-letter code element specifies the conversion to perform. The codes are listed below.

%: Not a conversion, this creates a % in the resulting string. Use %% to put a % in the output string.
c: Convert a single-character string. This will also convert an integer value to the corresponding ASCII character. For example, "%c" % ( 65, ) results in "A".
s: Convert a string. This will convert non-string objects by implicitly calling the str function.
r: Call the repr function, and insert that value.
i, d: Convert a numeric value, showing ordinary decimal output. The code i stands for integer, d stands for decimal. They mean the same thing, but it's hard to reach a consensus on which is "correct".
u: Convert an unsigned number. While relevant to C programming, this is the same as the i or d format conversion.
o: Convert a numeric value, showing the octal representation. %#o gets the Python-style value with a leading zero (a leading 0o in Python 3).
x, X: Convert a numeric value, showing the hexadecimal representation. %#X gets the Python-style value with a leading 0X; %#x gets the Python-style value with a leading 0x.
e, E: Convert a numeric value, showing scientific notation. %e produces ±d.dddddde±xx and %E produces ±d.ddddddE±xx, with six digits after the decimal point by default. For example, "%E" % 6.02e23 produces 6.020000E+23.
f, F: Convert a numeric value, using ordinary decimal notation. In Python 2, a number whose magnitude exceeds 1e50 is switched to %g or %G notation; Python 3.1 and later always use decimal notation.
g, G: "Generic" floating-point conversion. If the value's exponent in scientific notation is at least -4 and less than the precision element (6 by default), the %f style is used. Otherwise, the %e or %E style is used. Trailing zeros are removed in either case.
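Each conversion code can be tried on a sample value. This sketch uses Python 3 syntax, and the sample values are arbitrary. The four %g lines sit on either side of the -4 and precision boundaries described above.

```python
examples = [
    ("%d%%", 50),         # '%%' gives a literal percent sign: 50%
    ("%c", 65),           # integer -> character: A
    ("%s", 2.5),          # str() of the value: 2.5
    ("%r", "hi"),         # repr() keeps the quotes: 'hi'
    ("%i", 42),           # 42
    ("%o", 8),            # octal: 10
    ("%x", 255),          # ff
    ("%X", 255),          # FF
    ("%#X", 255),         # 0XFF
    ("%e", 6.02e23),      # 6.020000e+23
    ("%E", 6.02e23),      # 6.020000E+23
    ("%f", 2.5),          # 2.500000
    ("%g", 0.0001),       # exponent -4: fixed notation, 0.0001
    ("%g", 0.00001),      # exponent -5: exponential, 1e-05
    ("%g", 123456.0),     # exponent 5 < precision 6: 123456
    ("%g", 1234567.0),    # exponent 6 >= precision 6: 1.23457e+06
]
results = [fmt % value for fmt, value in examples]
for (fmt, _), text in zip(examples, results):
    print(fmt.ljust(6), text)
```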

Here are some examples.

"%i: %i win, %i loss, %6.3f" % (count,win,loss,float(win)/loss)

This example does four conversions: three simple integer conversions and one floating-point conversion with a width of 6 and 3 digits of precision. A ratio of 0.75, for instance, becomes ' 0.750', right-adjusted in six characters. The rest of the string is included literally in the output. (The division requires loss to be nonzero.)
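To see the whole line, the example can be run with made-up tallies. The values of count, win and loss below are hypothetical, and the code uses Python 3 syntax.

```python
# Hypothetical tallies, chosen only for illustration
count, win, loss = 10, 6, 4
line = "%i: %i win, %i loss, %6.3f" % (count, win, loss, float(win) / loss)
print(line)   # 10: 6 win, 4 loss,  1.500
```

The extra space before 1.500 is padding: the five characters of 1.500 are right-adjusted in a six-character field.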

"Spin %3i: %2i, %s" % (spin,number,color)

This example does three conversions: one number is converted into a field with a width of 3, another converted with a width of 2, and a string is converted, using as much space as the string requires.
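The point of the fixed widths shows up when several lines are printed together. The spin results below are invented for illustration, and the code uses Python 3 syntax.

```python
# Hypothetical roulette results, for illustration only
spins = [(7, 32, "red"), (12, 0, "green"), (103, 5, "red")]
lines = ["Spin %3i: %2i, %s" % (spin, number, color) for spin, number, color in spins]
for line in lines:
    print(line)
# Spin   7: 32, red
# Spin  12:  0, green
# Spin 103:  5, red
```

The widths of 3 and 2 keep the colons and commas in the same columns. The %s field grows to fit each color.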

