Python Read Input One Letter at a Time
7. Strings
7.1. A compound data type
So far we have seen five types: int, float, bool, NoneType and str. Strings are qualitatively different from the other four because they are made up of smaller pieces: characters.
Types that comprise smaller pieces are called compound data types. Depending on what we are doing, we may want to treat a compound data type as a single thing, or we may want to access its parts. This ambiguity is useful.
The bracket operator selects a single character from a string:
    >>> fruit = "banana"
    >>> letter = fruit[1]
    >>> print letter
The expression fruit[1] selects character number 1 from fruit. The variable letter refers to the result. When we display letter, we get a surprise:
    a
The first letter of "banana" is not a, unless you are a computer scientist. For perverse reasons, computer scientists always start counting from zero. The 0th letter (zero-eth) of "banana" is b. The 1th letter (one-eth) is a, and the 2th (two-eth) letter is n.
If you want the zero-eth letter of a string, you just put 0, or any expression with the value 0, in the brackets:
    >>> letter = fruit[0]
    >>> print letter
    b
The expression in brackets is called an index. An index specifies a member of an ordered set, in this case the set of characters in the string. The index indicates which one you want, hence the name. It can be any integer expression.
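To make the zero-based counting concrete, here is a minimal sketch that reuses the fruit example. It wraps each printed value in parentheses so that it should run under Python 3 as well as Python 2; the names first, second, third and i are introduced only for illustration.

```python
fruit = "banana"

# Index 0 selects the first character, index 1 the second.
first = fruit[0]
second = fruit[1]

# Any integer expression can serve as an index.
i = 1
third = fruit[i + 1]

print(first)
print(second)
print(third)
```

The three printed characters are the zero-eth, one-eth and two-eth letters of the string.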
7.2. Length
The len function returns the number of characters in a string:
    >>> fruit = "banana"
    >>> len(fruit)
    6
To get the last letter of a string, you might be tempted to try something like this:
    length = len(fruit)
    last = fruit[length]       # ERROR!
That won't work. It causes the runtime error IndexError: string index out of range. The reason is that there is no letter at index 6 in "banana". Since we started counting at zero, the six letters are numbered 0 to 5. To get the last character, we have to subtract 1 from length:
    length = len(fruit)
    last = fruit[length - 1]
Alternatively, we can use negative indices, which count backward from the end of the string. The expression fruit[-1] yields the last letter, fruit[-2] yields the second to last, and so on.
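The sketch below puts the three ways of reaching the end of a string side by side. It assumes nothing beyond the fruit example; the names also_last, second_to_last and out_of_range are illustrative, and the try/except is used only to show the error without stopping the program.

```python
fruit = "banana"
length = len(fruit)

last = fruit[length - 1]      # index 5, the final character
also_last = fruit[-1]         # negative indices count from the end
second_to_last = fruit[-2]

# Indexing one past the end raises IndexError instead of returning a value.
try:
    fruit[length]
    out_of_range = False
except IndexError:
    out_of_range = True

print(last)
print(out_of_range)
```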
7.3. Traversal and the for loop
A lot of computations involve processing a string one character at a time. Often they start at the beginning, select each character in turn, do something to it, and continue until the end. This pattern of processing is called a traversal. One way to encode a traversal is with a while statement:
    index = 0
    while index < len(fruit):
        letter = fruit[index]
        print letter
        index += 1
This loop traverses the string and displays each letter on a line by itself. The loop condition is index < len(fruit), so when index is equal to the length of the string, the condition is false, and the body of the loop is not executed. The last character accessed is the one with the index len(fruit)-1, which is the last character in the string.
Using an index to traverse a set of values is so common that Python provides an alternative, simpler syntax, the for loop:
    for char in fruit:
        print char
Each time through the loop, the next character in the string is assigned to the variable char. The loop continues until no characters are left.
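As a rough check that the two loops visit the same characters in the same order, the following sketch records each visited character in a string instead of printing it. The names by_while and by_for, and the "." separator, are made up for this illustration.

```python
fruit = "banana"

# Traversal with an explicit index.
by_while = ""
index = 0
while index < len(fruit):
    by_while += fruit[index] + "."
    index += 1

# The same traversal written as a for loop.
by_for = ""
for char in fruit:
    by_for += char + "."

print(by_while)
print(by_for)
```

Both loops build the same record, which suggests the for loop is a shorthand for the index-based version here.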
The following example shows how to use concatenation and a for loop to generate an abecedarian series. Abecedarian refers to a series or list in which the elements appear in alphabetical order. For example, in Robert McCloskey's book Make Way for Ducklings, the names of the ducklings are Jack, Kack, Lack, Mack, Nack, Ouack, Pack, and Quack. This loop outputs these names in order:
    prefixes = "JKLMNOPQ"
    suffix = "ack"
    for letter in prefixes:
        print letter + suffix
The output of this program is:
    Jack
    Kack
    Lack
    Mack
    Nack
    Oack
    Pack
    Qack
Of course, that's not quite right because Ouack and Quack are misspelled. You'll fix this as an exercise below.
7.4. String slices
A substring of a string is called a slice. Selecting a slice is similar to selecting a character:
    >>> s = "Peter, Paul, and Mary"
    >>> print s[0:5]
    Peter
    >>> print s[7:11]
    Paul
    >>> print s[17:21]
    Mary
The operator [n:m] returns the part of the string from the n-eth character to the m-eth character, including the first but excluding the last. This behavior is counterintuitive; it makes more sense if you imagine the indices pointing between the characters, as in the following diagram:
      b   a   n   a   n   a
    ^   ^   ^   ^   ^   ^   ^
    0   1   2   3   4   5   6
If you omit the first index (before the colon), the slice starts at the beginning of the string. If you omit the second index, the slice goes to the end of the string. Thus:
    >>> fruit = "banana"
    >>> fruit[:3]
    'ban'
    >>> fruit[3:]
    'ana'
What do you think s[:] means?
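Two consequences of the "include the first, exclude the last" rule are easy to check. This is a small sketch reusing the strings above; the names name, width and rejoined are illustrative only.

```python
s = "Peter, Paul, and Mary"

name = s[7:11]            # characters 7, 8, 9 and 10
width = len(s[7:11])      # 11 - 7 characters when both indices are in range

fruit = "banana"
# Splitting at an index and rejoining gives back the original string.
rejoined = fruit[:3] + fruit[3:]

print(name)
print(width)
print(rejoined)
```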
7.5. String comparison
The comparison operators work on strings. To see if two strings are equal:
    if word == "banana":
        print "Yes, we have no bananas!"
Other comparison operations are useful for putting words in lexicographical order:
    if word < "banana":
        print "Your word, " + word + ", comes before banana."
    elif word > "banana":
        print "Your word, " + word + ", comes after banana."
    else:
        print "Yes, we have no bananas!"
This is similar to the alphabetical order you would use with a dictionary, except that all the uppercase letters come before all the lowercase letters. As a result:
    Your word, Zebra, comes before banana.
A common way to address this problem is to convert strings to a standard format, such as all lowercase, before performing the comparison. A more difficult problem is making the program realize that zebras are not fruit.
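One way to do the conversion is the lower method that string objects provide. The sketch below is only an illustration: comes_before is a hypothetical helper name, not something defined elsewhere in this chapter.

```python
def comes_before(word, other):
    """Compare two words while ignoring case (hypothetical helper)."""
    return word.lower() < other.lower()

# Without normalization, uppercase 'Z' sorts before lowercase 'b'.
plain = "Zebra" < "banana"

# After lowercasing both words, 'z' sorts after 'b'.
normalized = comes_before("Zebra", "banana")

print(plain)
print(normalized)
```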
7.6. Strings are immutable
It is tempting to use the [] operator on the left side of an assignment, with the intention of changing a character in a string. For example:
    greeting = "Hello, world!"
    greeting[0] = 'J'            # ERROR!
    print greeting
Instead of producing the output Jello, world!, this code produces the runtime error TypeError: 'str' object does not support item assignment.
Strings are immutable, which means you can't change an existing string. The best you can do is create a new string that is a variation on the original:
    greeting = "Hello, world!"
    new_greeting = 'J' + greeting[1:]
    print new_greeting
The solution here is to concatenate a new first letter onto a slice of greeting. This operation has no effect on the original string.
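Both halves of this section can be observed in one short run. The sketch below catches the TypeError so the program keeps going; the flag assignment_failed is a name invented for the illustration.

```python
greeting = "Hello, world!"

# Item assignment on a string raises TypeError.
try:
    greeting[0] = 'J'
    assignment_failed = False
except TypeError:
    assignment_failed = True

# Building a new string leaves the original untouched.
new_greeting = 'J' + greeting[1:]

print(assignment_failed)
print(new_greeting)
print(greeting)
```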
7.7. The in operator
The in operator tests if one string is a substring of another:
    >>> 'p' in 'apple'
    True
    >>> 'i' in 'apple'
    False
    >>> 'ap' in 'apple'
    True
    >>> 'pa' in 'apple'
    False
Note that a string is a substring of itself:
    >>> 'a' in 'a'
    True
    >>> 'apple' in 'apple'
    True
Combining the in operator with string concatenation using +, we can write a function that removes all the vowels from a string:
    def remove_vowels(s):
        vowels = "aeiouAEIOU"
        s_without_vowels = ""
        for letter in s:
            if letter not in vowels:
                s_without_vowels += letter
        return s_without_vowels
Test this function to confirm that it does what we wanted it to do.
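A quick way to run such a test is to call the function on a couple of inputs and look at the results. The function is repeated here so the sketch is self-contained; the two sample strings are arbitrary choices.

```python
def remove_vowels(s):
    vowels = "aeiouAEIOU"
    s_without_vowels = ""
    for letter in s:
        # Keep only the letters that are not vowels.
        if letter not in vowels:
            s_without_vowels += letter
    return s_without_vowels

print(remove_vowels("compsci"))
print(remove_vowels("aAbEefIijOopUus"))
```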
7.8. A find function
What does the following function do?
    def find(strng, ch):
        index = 0
        while index < len(strng):
            if strng[index] == ch:
                return index
            index += 1
        return -1
In a sense, find is the opposite of the [] operator. Instead of taking an index and extracting the corresponding character, it takes a character and finds the index where that character appears. If the character is not found, the function returns -1.
This is the first example we have seen of a return statement inside a loop. If strng[index] == ch, the function returns immediately, breaking out of the loop prematurely.
If the character doesn't appear in the string, then the program exits the loop normally and returns -1.
This pattern of computation is sometimes called a eureka traversal because as soon as we find what we are looking for, we can cry Eureka! and stop looking.
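The two exits from the loop can be seen by calling find once with a character that is present and once with one that is not. The function is copied from above; the names first_n and missing are illustrative.

```python
def find(strng, ch):
    index = 0
    while index < len(strng):
        if strng[index] == ch:
            return index      # eureka: stop at the first match
        index += 1
    return -1                 # loop finished without a match

first_n = find("banana", "n")
missing = find("banana", "z")

print(first_n)
print(missing)
```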
7.9. Looping and counting
The following program counts the number of times the letter a appears in a string, and is another example of the counter pattern introduced in Counting digits:
    fruit = "banana"
    count = 0
    for char in fruit:
        if char == 'a':
            count += 1
    print count
7.10. Optional parameters
To find the locations of the second or third occurrence of a character in a string, we can modify the find function, adding a third parameter for the starting position in the search string:
    def find2(strng, ch, start):
        index = start
        while index < len(strng):
            if strng[index] == ch:
                return index
            index += 1
        return -1
The call find2('banana', 'a', 2) now returns 3, the index of the first occurrence of 'a' in 'banana' at or after index 2. What does find2('banana', 'n', 3) return? If you said 4, there is a good chance you understand how find2 works.
Better still, we can combine find and find2 using an optional parameter:
    def find(strng, ch, start=0):
        index = start
        while index < len(strng):
            if strng[index] == ch:
                return index
            index += 1
        return -1
The call find('banana', 'a', 2) to this version of find behaves just like find2, while in the call find('banana', 'a'), start will be set to the default value of 0.
Adding another optional parameter to find makes it search both forward and backward:
    def find(strng, ch, start=0, step=1):
        index = start
        while 0 <= index < len(strng):
            if strng[index] == ch:
                return index
            index += step
        return -1
Passing in a value of len(strng)-1 for start and -1 for step will make it search toward the beginning of the string instead of the end. Note that we needed to check for a lower bound for index in the while loop as well as an upper bound to accommodate this change.
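The calls below exercise the final version of find in both directions. This is a sketch under the definitions above; word, forward, from_two, backward and last_n are names chosen for the example.

```python
def find(strng, ch, start=0, step=1):
    index = start
    # Check both bounds so a negative step cannot run off the front.
    while 0 <= index < len(strng):
        if strng[index] == ch:
            return index
        index += step
    return -1

word = "banana"
forward = find(word, "a")                       # defaults: start=0, step=1
from_two = find(word, "a", 2)                   # start given, step defaulted
backward = find(word, "a", len(word) - 1, -1)   # search from the end
last_n = find(word, "n", len(word) - 1, -1)

print(forward)
print(from_two)
print(backward)
print(last_n)
```

Searching backward from the last index is one way to locate the final occurrence of a character rather than the first.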
7.11. The string module
The string module contains useful functions that manipulate strings. As usual, we have to import the module before we can use it:
    >>> import string
To see what is inside it, use the dir function with the module name as an argument.
    >>> dir(string)
which will return the list of items inside the string module:
['Template', '_TemplateMetaclass', '__builtins__', '__doc__', '__file__', '__name__', '_float', '_idmap', '_idmapL', '_int', '_long', '_multimap', '_re', 'ascii_letters', 'ascii_lowercase', 'ascii_uppercase', 'atof', 'atof_error', 'atoi', 'atoi_error', 'atol', 'atol_error', 'capitalize', 'capwords', 'center', 'count', 'digits', 'expandtabs', 'find', 'hexdigits', 'index', 'index_error', 'join', 'joinfields', 'letters', 'ljust', 'lower', 'lowercase', 'lstrip', 'maketrans', 'octdigits', 'printable', 'punctuation', 'replace', 'rfind', 'rindex', 'rjust', 'rsplit', 'rstrip', 'split', 'splitfields', 'strip', 'swapcase', 'translate', 'upper', 'uppercase', 'whitespace', 'zfill']
To find out more about an item in this list, we can use the type command. We need to specify the module name followed by the item using dot notation.
    >>> type(string.digits)
    <type 'str'>
    >>> type(string.find)
    <type 'function'>
Since string.digits is a string, we can print it to see what it contains:
    >>> print string.digits
    0123456789
Not surprisingly, it contains each of the decimal digits.
string.find is a function which does much the same thing as the function we wrote. To find out more about it, we can print out its docstring, __doc__, which contains documentation on the function:
    >>> print string.find.__doc__
    find(s, sub [,start [,end]]) -> in

        Return the lowest index in s where substring sub is found,
        such that sub is contained within s[start,end].  Optional
        arguments start and end are interpreted as in slice notation.

        Return -1 on failure.
The parameters in square brackets are optional parameters. We can use string.find much as we did our own find:
    >>> fruit = "banana"
    >>> index = string.find(fruit, "a")
    >>> print index
    1
This example demonstrates one of the benefits of modules: they help avoid collisions between the names of built-in functions and user-defined functions. By using dot notation we can specify which version of find we want.
Actually, string.find is more general than our version. It can find substrings, not just characters:
    >>> string.find("banana", "na")
    2
Like ours, it takes an additional argument that specifies the index at which it should start:
    >>> string.find("banana", "na", 3)
    4
Unlike ours, its second optional parameter specifies the index at which the search should end:
    >>> string.find("bob", "b", 1, 2)
    -1
In this case, the search fails because the letter b does not appear in the index range from 1 to 2 (not including 2).
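The module-level function string.find belongs to Python 2 and is not present in Python 3's string module. String objects also provide a find method that takes the same optional start and end arguments, so the searches above can be sketched in a form that should run under either version. The variable names below are illustrative.

```python
fruit = "banana"

index = fruit.find("a")             # first occurrence of a character
substring = fruit.find("na")        # first occurrence of a substring
from_three = fruit.find("na", 3)    # start searching at index 3
bounded = "bob".find("b", 1, 2)     # search only index 1 (end is excluded)

print(index)
print(substring)
print(from_three)
print(bounded)
```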
7.12. Character classification
It is often helpful to examine a character and test whether it is upper- or lowercase, or whether it is a letter or a digit. The string module provides several constants that are useful for these purposes. One of these, string.digits, we have already seen.
The string string.lowercase contains all of the letters that the system considers to be lowercase. Similarly, string.uppercase contains all of the uppercase letters. Try the following and see what you get:
    print string.lowercase
    print string.uppercase
    print string.digits
We can use these constants and find to classify characters. For example, if find(lowercase, ch) returns a value other than -1, then ch must be lowercase:
    def is_lower(ch):
        return string.find(string.lowercase, ch) != -1
Alternatively, we can take advantage of the in operator:
    def is_lower(ch):
        return ch in string.lowercase
As yet another alternative, we can use the comparison operator:
    def is_lower(ch):
        return 'a' <= ch <= 'z'
If ch is between a and z, it must be a lowercase letter.
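The sketch below tries two of these versions on a few single characters. It makes one substitution, stated plainly: it uses string.ascii_lowercase (which appears in the module listing above and also exists in Python 3) instead of string.lowercase, which is Python 2 only. The function names is_lower_in and is_lower_cmp are invented here so both versions can coexist.

```python
import string

def is_lower_in(ch):
    # Membership test against the ASCII lowercase letters.
    return ch in string.ascii_lowercase

def is_lower_cmp(ch):
    # Range test using string comparison.
    return 'a' <= ch <= 'z'

for ch in "qQ7":
    print(ch + " " + str(is_lower_in(ch)) + " " + str(is_lower_cmp(ch)))
```

For these single-character inputs the two versions agree.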
Another constant defined in the string module may surprise you when you print it:
    >>> print string.whitespace
Whitespace characters move the cursor without printing anything. They create the white space between visible characters (at least on white paper). The constant string.whitespace contains all the whitespace characters, including space, tab (\t), and newline (\n).
There are other useful functions in the string module, but this book isn't intended to be a reference manual. On the other hand, the Python Library Reference is. Along with a wealth of other documentation, it's available from the Python website, http://www.python.org.
7.13. String formatting
The most concise and powerful way to format a string in Python is to use the string formatting operator, %, together with Python's string formatting operations. To see how this works, let's start with a few examples:
    >>> "His name is %s." % "Arthur"
    'His name is Arthur.'
    >>> name = "Alice"
    >>> age = 10
    >>> "I am %s and I am %d years old." % (name, age)
    'I am Alice and I am 10 years old.'
    >>> n1 = 4
    >>> n2 = 5
    >>> "2**10 = %d and %d * %d = %f" % (2**10, n1, n2, n1 * n2)
    '2**10 = 1024 and 4 * 5 = 20.000000'
    >>>
The syntax for the string formatting operation looks like this:
    "<FORMAT>" % (<VALUES>)
It begins with a format which contains a sequence of characters and conversion specifications. Conversion specifications start with a % operator. Following the format string is a single % and then a sequence of values, one per conversion specification, separated by commas and enclosed in parentheses. The parentheses are optional if there is only a single value.
In the first example above, there is a single conversion specification, %s, which indicates a string. The single value, "Arthur", maps to it, and is not enclosed in parentheses.
In the second example, name has string value, "Alice", and age has integer value, 10. These map to the two conversion specifications, %s and %d. The d in the second conversion specification indicates that the value is a decimal integer.
In the third example variables n1 and n2 have integer values 4 and 5 respectively. There are four conversion specifications in the format string: three %d's and a %f. The f indicates that the value should be represented as a floating point number. The four values that map to the four conversion specifications are: 2**10, n1, n2, and n1 * n2.
s, d, and f are all the conversion types we will need for this book. To see a complete list, see the String Formatting Operations section of the Python Library Reference.
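The three conversion types can be tried together in a short script. This is a sketch using the same values as the interactive examples; the names single, pair and product are made up for the illustration.

```python
name = "Alice"
age = 10

# One value: the parentheses may be omitted.
single = "His name is %s." % "Arthur"

# Several values: one per conversion specification, in order.
pair = "I am %s and I am %d years old." % (name, age)

# %f shows an integer value as a floating point number.
product = "%d * %d = %f" % (4, 5, 4 * 5)

print(single)
print(pair)
print(product)
```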
The following example illustrates the real utility of string formatting:
    i = 1
    print "i\ti**2\ti**3\ti**5\ti**10\ti**20"
    while i <= 10:
        print i, '\t', i**2, '\t', i**3, '\t', i**5, '\t', i**10, '\t', i**20
        i += 1
This program prints out a table of various powers of the numbers from 1 to 10. In its current form it relies on the tab character (\t) to align the columns of values, but this breaks down when the values in the table get larger than the 8 character tab width:
    i       i**2    i**3    i**5    i**10   i**20
    1       1       1       1       1       1
    2       4       8       32      1024    1048576
    3       9       27      243     59049   3486784401
    4       16      64      1024    1048576         1099511627776
    5       25      125     3125    9765625         95367431640625
    6       36      216     7776    60466176        3656158440062976
    7       49      343     16807   282475249       79792266297612001
    8       64      512     32768   1073741824      1152921504606846976
    9       81      729     59049   3486784401      12157665459056928801
    10      100     1000    100000  10000000000     100000000000000000000
One possible solution would be to change the tab width, but the first column already has more space than it needs. The best solution would be to set the width of each column independently. As you may have guessed by now, string formatting provides the solution:
    i = 1
    print "%-4s%-5s%-6s%-8s%-13s%-15s" % \
          ('i', 'i**2', 'i**3', 'i**5', 'i**10', 'i**20')
    while i <= 10:
        print "%-4d%-5d%-6d%-8d%-13d%-15d" % (i, i**2, i**3, i**5, i**10, i**20)
        i += 1
Running this version produces the following output:
    i   i**2 i**3  i**5    i**10        i**20
    1   1    1     1       1            1
    2   4    8     32      1024         1048576
    3   9    27    243     59049        3486784401
    4   16   64    1024    1048576      1099511627776
    5   25   125   3125    9765625      95367431640625
    6   36   216   7776    60466176     3656158440062976
    7   49   343   16807   282475249    79792266297612001
    8   64   512   32768   1073741824   1152921504606846976
    9   81   729   59049   3486784401   12157665459056928801
    10  100  1000  100000  10000000000  100000000000000000000
The - after each % in the conversion specifications indicates left justification. The numerical values specify the minimum length, so %-13d is a left justified number at least 13 characters wide.
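The word "minimum" matters: a value wider than its field is not cut off. The sketch below appends a "|" marker after each field so the padding is visible; the marker and the names cell, wide and row are illustrative choices only.

```python
# A 4-digit number in a 13-character, left-justified field.
cell = "%-13d|" % 1024

# A 6-digit number in a 4-character field: the width is only a minimum.
wide = "%-4d|" % 123456

# Three adjacent fields of widths 4, 5 and 6.
row = "%-4d%-5d%-6d|" % (2, 4, 8)

print(cell)
print(wide)
print(row)
```

In the first line the number is followed by nine spaces of padding; in the second there is no padding at all because the number already exceeds the field width.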
7.14. Summary and First Exercises
This chapter introduced a lot of new ideas. The following summary and set of exercises may prove helpful in remembering what you learned:
- indexing ( [] )
- Access a single character in a string using its position (starting from 0). Example: 'This'[2] evaluates to 'i'.
- length function ( len )
- Returns the number of characters in a string. Example: len('happy') evaluates to 5.
- for loop traversal ( for )
-
Traversing a string means accessing each character in the string, one at a time. For example, the following for loop:
    for letter in 'Example':
        print 2 * letter,
evaluates to EE xx aa mm pp ll ee
- slicing ( [:] )
- A slice is a substring of a string. Example: 'bananas and cream'[3:6] evaluates to ana (so does 'bananas and cream'[1:4]).
- string comparison ( >, <, >=, <=, == )
- The comparison operators work with strings, evaluating according to lexicographical order. Examples: 'apple' < 'banana' evaluates to True. 'Zeta' < 'Appricot' evaluates to False. 'Zebra' <= 'aardvark' evaluates to True because all upper case letters precede lower case letters.
- in operator ( in )
- The in operator tests whether one character or string is contained within another string. Examples: 'heck' in "I'll be checking for you." evaluates to True. 'cheese' in "I'll be checking for you." evaluates to False.
7.14.1. First Exercises
-
Write the Python interpreter's evaluation to each of the following expressions:
    >>> "Strings are sequences of characters."[5]
    >>> 'apple' in 'Pinapple'
-
Write Python code to make each of the following doctests pass:
    """
      >>> type(fruit)
      <type 'str'>
      >>> len(fruit)
      8
      >>> fruit[:3]
      'ram'
    """
    """
      >>> group = "John, Paul, George, and Ringo"
      >>> group[12:x]
      'George'
      >>> group[n:m]
      'Paul'
      >>> group[:r]
      'John'
      >>> group[s:]
      'Ringo'
    """
    """
      >>> len(s)
      8
      >>> s[4:6] == 'on'
      True
    """
7.15. Glossary
- compound data type
- A data type in which the values are made up of components, or elements, that are themselves values.
- default value
- The value given to an optional parameter if no argument for it is provided in the function call.
- docstring
- A string constant on the first line of a function or module definition (and as we will see later, in class and method definitions as well). Docstrings provide a convenient way to associate documentation with code. Docstrings are also used by the doctest module for automated testing.
- dot notation
- Use of the dot operator, ., to access functions inside a module.
- immutable
- A compound data type whose elements cannot be assigned new values.
- index
- A variable or value used to select a member of an ordered set, such as a character from a string.
- optional parameter
- A parameter written in a function header with an assignment to a default value which it will receive if no corresponding argument is given for it in the function call.
- slice
- A part of a string (substring) specified by a range of indices. More generally, a subsequence of any sequence type in Python can be created using the slice operator ( sequence[start:stop] ).
- traverse
- To iterate through the elements of a set, performing a similar operation on each.
- whitespace
- Any of the characters that move the cursor without printing visible characters. The constant string.whitespace contains all the white-space characters.
7.16. Exercises
-
Modify:
    prefixes = "JKLMNOPQ"
    suffix = "ack"
    for letter in prefixes:
        print letter + suffix
so that Ouack and Quack are spelled correctly.
-
Encapsulate
    fruit = "banana"
    count = 0
    for char in fruit:
        if char == 'a':
            count += 1
    print count
in a function named count_letters, and generalize it so that it accepts the string and the letter as arguments.
-
Now rewrite the count_letters function so that instead of traversing the string, it repeatedly calls find (the version from Optional parameters), with the optional third parameter to locate new occurrences of the letter being counted.
-
Which version of is_lower do you think will be fastest? Can you think of other reasons besides speed to prefer one version or the other?
-
Create a file named stringtools.py and put the following in it:
    def reverse(s):
        """
          >>> reverse('happy')
          'yppah'
          >>> reverse('Python')
          'nohtyP'
          >>> reverse("")
          ''
          >>> reverse("P")
          'P'
        """

    if __name__ == '__main__':
        import doctest
        doctest.testmod()
Add a function body to reverse to make the doctests pass.
-
Add mirror to stringtools.py.
    def mirror(s):
        """
          >>> mirror("good")
          'gooddoog'
          >>> mirror("yes")
          'yessey'
          >>> mirror('Python')
          'PythonnohtyP'
          >>> mirror("")
          ''
          >>> mirror("a")
          'aa'
        """
Write a function body for it that will make it work as indicated by the doctests.
-
Include remove_letter in stringtools.py.
    def remove_letter(letter, strng):
        """
          >>> remove_letter('a', 'apple')
          'pple'
          >>> remove_letter('a', 'banana')
          'bnn'
          >>> remove_letter('z', 'banana')
          'banana'
          >>> remove_letter('i', 'Mississippi')
          'Msssspp'
        """
Write a function body for it that will make it work as indicated by the doctests.
-
Finally, add bodies to each of the following functions, one at a time
    def is_palindrome(s):
        """
          >>> is_palindrome('abba')
          True
          >>> is_palindrome('abab')
          False
          >>> is_palindrome('tenet')
          True
          >>> is_palindrome('banana')
          False
          >>> is_palindrome('straw warts')
          True
        """
    def count(sub, s):
        """
          >>> count('is', 'Mississippi')
          2
          >>> count('an', 'banana')
          2
          >>> count('ana', 'banana')
          2
          >>> count('nana', 'banana')
          1
          >>> count('nanan', 'banana')
          0
        """
    def remove(sub, s):
        """
          >>> remove('an', 'banana')
          'bana'
          >>> remove('cyc', 'bicycle')
          'bile'
          >>> remove('iss', 'Mississippi')
          'Missippi'
          >>> remove('egg', 'bicycle')
          'bicycle'
        """
    def remove_all(sub, s):
        """
          >>> remove_all('an', 'banana')
          'ba'
          >>> remove_all('cyc', 'bicycle')
          'bile'
          >>> remove_all('iss', 'Mississippi')
          'Mippi'
          >>> remove_all('eggs', 'bicycle')
          'bicycle'
        """
until all the doctests pass.
-
Try each of the following formatted string operations in a Python shell and record the results:
- "%s %d %f" % (5, 5, 5)
- "%-.2f" % 3
- "%-10.2f%-10.2f" % (7, 1.0/2)
- print " $%5.2f\n $%5.2f\n $%5.2f" % (3, 4.5, 11.2)
-
The following formatted strings have errors. Fix them:
- "%s %s %s %s" % ('this', 'that', 'something')
- "%s %s %s" % ('yes', 'no', 'up', 'down')
- "%d %f %f" % (3, 3, 'three')
Source: https://www.openbookproject.net/thinkcs/python/english2e/ch07.html