Learning Python - Day 5
Experienced Full Stack Engineer | Next.JS, MERN
Objects and Types
Overview
While you can get a long way in Python with a fairly shallow understanding of its underlying model, we find that even a small understanding of its deeper structure can yield deep insight, making you more productive and helping you design better programs. In this module of Core Python: Getting Started, we'll seek to understand the Python object model. We'll see that the notion of named references to objects is key to how Python works. We'll discuss the important difference between value equality and identity equality. We'll see how argument passing and returning from functions in Python fits into the object model. We'll investigate Python's type system. We'll look at how Python uses scopes to limit access to names in a program. And we'll introduce you to a core insight for understanding Python programs, the idea that everything is an object. The topics in this module may seem abstract or even simplistic, but if you internalize these concepts, you'll find that you're able to reason about Python much more fluidly and with greater precision.
We've already talked about and used variables in Python, but what exactly is a variable? What's going on when we do something as straightforward as assigning an integer to a variable, such as x = 1000? In this case, Python creates an int object with a value of 1000, an object reference with the name x, and arranges for x to refer to the int 1000 object. If we now modify the value of x with another assignment, x = 500, what does not happen is a change in the value of the integer object. Integer objects in Python are immutable and cannot be changed. In fact, what happens is that Python creates a new immutable integer object with the value 500 and redirects the x reference to point to the new object. We now have no way of reaching the int 1000 object, and the Python garbage collector will reclaim it at some point.
When we assign from one variable to another, we're really assigning from one object reference to another object reference, so both references now refer to the same object. If we now reassign x, we have x referring to an int 3000 object and y referring to a separate int 500. There is no work for the garbage collector to do, because all objects are reachable from live references.
Let's dig a little deeper using the built-in id function. The id function returns an integer identifier that is unique and constant for the lifetime of an object. Let's rerun the previous experiment using id. First we'll assign a to the integer 496 and check its id. Then we'll assign b to 1729, and see that it has a different id. If we now make b refer to the same object as a, we'll see that they have the same id. Note that the id function is seldom used in production Python code. Its main use is in object model tutorials such as this one, and as a debugging tool. Much more commonly used than the id function is the is operator, which tests for equality of identity. That is, it tests whether two references refer to the same object. We've already met the is operator earlier in the course when we tested for None.
Even operations which seem naturally mutating in nature are not necessarily so. Consider the augmented assignment operator. If we create an integer and then increment it by 2, we see that the id of the incremented integer is different from the original. Now let's look at that pictorially. We start with t referring to an int 5 object. Augmented assignment creates an int 2 without assigning a reference to it. It then adds the int 2 to the int 5 to create a new int 7. Finally, it assigns t to the int 7, and the remaining ints are garbage collected. Python objects show this behavior for all types. A core rule to remember is this: the assignment operator only ever binds objects to names. It never copies an object by value. Let's look at another example using mutable objects, lists.
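The experiments described above can be sketched roughly as follows. The values come from the narration; the helper names (same_object, ids_differ, and so on) are assumptions introduced only to record each result.

```python
# Assignment binds a name to an object; it never changes the object itself.
x = 1000
x = 500             # x is rebound to a new int object; the int 1000 is now unreachable
y = x               # y refers to the same object as x
same_object = y is x
x = 3000            # x is rebound again; y still refers to the int 500

# id() returns an identifier that is unique for the lifetime of an object.
a = 496
b = 1729
ids_differ = id(a) != id(b)
b = a               # b now refers to the same object as a
ids_match = id(a) == id(b)

# Augmented assignment on an int creates a new object and rebinds the name.
t = 5
original = t        # keep a second reference so both objects stay alive
t += 2
id_changed = id(t) != id(original)

print(same_object, ids_differ, ids_match, id_changed)
```

The actual integers returned by id vary from run to run, which is why the sketch compares them rather than printing them.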
We create a list object with three elements, binding the list object to a reference named r. We then assign r to a new reference s. When we modify the list referred to by s, by changing the middle element, we see that the r list has changed as well. This happens because the names s and r in fact refer to the same object, which we can verify with the is operator. Let's see that again with a diagram. First we assign r to a new list. Then we assign s to r, creating a new name for the existing list. If we modify s, we also modify r, because we're modifying the same underlying object. The expression s is r evaluates to True, because both names refer to the same object. If you want to create an actual copy of an object such as a list, other techniques must be used, which we'll look at later. It turns out that Python doesn't really have variables in the metaphorical sense of a box holding a value. It only has named references to objects, and the references behave more like labels, which allow us to retrieve objects. That said, it's still common to talk about variables in Python. We will continue to do so, secure in the knowledge that you now understand what's really going on behind the scenes.
Let's contrast the behavior of the is operator with the test for value equality, or equivalence. We'll create two identical lists. First, we'll test them for equivalence with the double equals operator. Then we'll test them for identity equality with is. Here we see that p and q refer to different objects, but that the objects they refer to have the same value. Of course, an object should almost always be equivalent to itself. Here's how that looks pictorially. We have two separate list objects, each with a single reference to it. The values contained in the lists are the same, that is, they are equivalent or value equal, even though they have different identities. Value equality and identity are fundamentally different notions of equality, and it's important to keep them separate in your mind.
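A small sketch of the two list experiments follows. The element values are illustrative assumptions; the names r, s, p, and q follow the narration.

```python
# Two names for one list: a change made through s is visible through r.
r = [2, 4, 6]
s = r                   # s is a new name for the existing list, not a copy
s[1] = 17               # change the middle element through s
print(r)                # [2, 17, 6]
print(s is r)           # True: both names refer to the same object

# Two separate lists with the same contents.
p = [4, 7, 11]
q = [4, 7, 11]
print(p == q)           # True: value equality
print(p is q)           # False: different identities
print(p == p)           # True: an object is equivalent to itself
```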
It's also worth noting that value comparison is something that is defined programmatically. When you define a class, you can control how it determines value equality. In contrast, identity comparison is defined by the language, and you can't change that behavior.
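As a hedged illustration of programmatically defined equality, here is a hypothetical Point class (not part of the original material) that implements the special method __eq__. Identity comparison with is remains outside the class's control.

```python
class Point:
    """Hypothetical 2D point, used only to illustrate value equality."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        # Two points are equal when their coordinates are equal.
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)


p1 = Point(1, 2)
p2 = Point(1, 2)
equal_by_value = p1 == p2    # True: decided by our __eq__
same_identity = p1 is p2     # False: decided by the language
print(equal_by_value, same_identity)
```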
Passing Arguments and Returning Values
Now let's look at how all this relates to function arguments and return values. Let's define a function at the REPL, which appends a value to a list and prints the modified list. First, we'll create a list. Then we'll make a function, modify, which appends to and prints the list. The function accepts a single formal argument named k. We then call modify, passing our list m as the actual argument. This indeed prints the modified list with four elements. But what does our list reference outside the function now refer to? The list referred to by m has been modified because it is the selfsame list referred to by k inside the function. When we pass an object reference to a function, we're essentially assigning from an actual argument reference, in this case m, to the formal argument reference, in this case k. As we've seen, assignment causes the target reference to refer to the same object as the source reference. This is exactly what's going on here. If you want the function to modify a copy of an object, it's the responsibility of the function to do the copying.
Let's look at another instructive example. First we'll define a new list, then we'll define a function which replaces the list. Now we'll call replace with f as the argument. It prints the new list, which is much as we'd expect. However, what's the value of f after the call? The name f still refers to the original unmodified list. This time, the function did not modify the object that was passed in. What's going on? Well, the object reference named f was assigned to the formal argument named g, so g and f did indeed refer to the same object, just as in the previous example. However, on the first line of the function, we reassigned the reference g to point to a newly constructed list [17, 28, 45]. So within the function, the reference to the original list [14, 23, 37] was overwritten, although the original list was still pointed to by the f reference outside the function.
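The two functions described above might look like this. The contents of m and the appended value are assumptions; the lists [14, 23, 37] and [17, 28, 45] are the ones named in the text.

```python
def modify(k):
    # k refers to the same list object as the caller's argument.
    k.append(39)
    print("k =", k)


m = [9, 15, 24]
modify(m)               # prints: k = [9, 15, 24, 39]
print("m =", m)         # prints: m = [9, 15, 24, 39]


def replace(g):
    # Rebinding g affects only the local name, not the caller's list.
    g = [17, 28, 45]
    print("g =", g)


f = [14, 23, 37]
replace(f)              # prints: g = [17, 28, 45]
print("f =", f)         # prints: f = [14, 23, 37]
```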
So we've seen that it's quite possible to modify objects through function argument references, but also possible to rebind the argument reference to new values. If you wanted to change the contents of the list and have the changes seen outside the function, you could write a function that replaces each element of the list in place. Now we define f and pass it into replace_contents. And indeed, the contents of f have been modified by the function. Function arguments are transferred by what is called pass-by-object-reference. This means that the value of the reference is copied into the function argument, not the value of the referred-to object. No objects are copied. The return statement uses the same pass-by-object-reference semantics as function arguments. We can demonstrate this by writing a simple function that just returns its only argument. Create an object such as a list and pass it through this simple function. We see that it returns the very same object we passed in, showing that no copies of the list were made.
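A sketch of both ideas, under the assumption that the list has exactly three elements. The name passthrough and the list c are hypothetical; replace_contents is the name used in the text.

```python
def replace_contents(g):
    # Assigning to elements mutates the list object the caller also sees.
    g[0] = 17
    g[1] = 28
    g[2] = 45
    print("g =", g)


f = [14, 23, 37]
replace_contents(f)     # prints: g = [17, 28, 45]
print("f =", f)         # prints: f = [17, 28, 45]


def passthrough(d):
    # Returns the very object it was given; nothing is copied.
    return d


c = [6, 10, 16]
e = passthrough(c)
returned_same = c is e
print(returned_same)    # True
```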
Function Arguments
Now that we understand the distinction between object references and objects, we'll look at some more capabilities of function arguments. The formal function arguments specified when a function is defined with the def keyword are a comma-separated list of the argument names. These arguments can be made optional by providing default values. Consider a function which prints a simple banner to the console. This function takes two arguments, the second of which is provided with a default value, in this case a hyphen in a literal string. Since we've given it this default value, callers can choose whether they want to pass their own value for border or use the default. Note that when we define functions using default arguments, the parameters with default arguments must come after those without defaults. Otherwise, we will get a syntax error. Within the body of the function, we multiply our border string by the length of the message string. This shows, first, how we can determine the number of items in a Python collection using the built-in len function. Secondly, it shows how multiplying a string, in this case the single-character string border, by an integer results in a new string containing the original string repeated a number of times. We use that feature here to make a string equal in length to our message. We then print the full-width border, the message, and the border again. When we call our banner function, we don't need to supply the border string because we provided a default value. We can see that the default border of hyphens has been created. However, if we do provide an optional argument, it's used. In production code, this function call is not particularly self-documenting. We can improve that situation by naming the border argument at the call site. In this case, the message string is called a positional argument and the border string a keyword argument.
The actual positional arguments are matched up in sequence with the formal arguments, that is by position, whereas the keyword arguments are matched by name. If we use keyword arguments for both of our parameters, we have the freedom to supply them in any order. Remember though that all keyword arguments must be specified after the positional arguments.
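The banner function described above could be sketched as follows. The helper banner_text and the sample messages are assumptions, added so the formatted text can be inspected without capturing console output.

```python
def banner_text(message, border="-"):
    # Repeat the border string once for every character in the message.
    line = border * len(message)
    return "\n".join([line, message, line])


def banner(message, border="-"):
    print(banner_text(message, border))


banner("Norwegian Blue")                            # default border of hyphens
banner("Sun, Moon and Stars", "*")                  # border passed positionally
banner("Sun, Moon and Stars", border="*")           # border passed by keyword
banner(border=".", message="Hello from Earth")      # keywords in any order
```

Swapping the two definitions in the signature, so that border="-" came before message, would be rejected with a SyntaxError, since parameters with defaults must follow those without.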
It's crucial to have an appreciation of exactly when the expression provided as a default value is evaluated. This will help you avoid a common pitfall, which frequently ensnares newcomers to Python. Let's examine this question closely using the Python standard library time module. We can easily get the time as a readable string by using the ctime function of the time module. Let's write a function which uses a value retrieved from ctime as a default argument value. So far, so good. But notice what happens when you call show_default again a few seconds later, and again. The displayed time never progresses. Recall how we said that def is a statement that, when executed, binds a function definition to a function name? Well, the default argument expressions are evaluated only once, when the def statement is executed. Normally, when the default is a simple, immutable constant, such as an integer or a string, this causes no problems. But it can be a confusing trap for the unwary that usually shows up in the form of using mutable collections as argument defaults.
Let's take a closer look. Consider this function, which uses an empty list as a default argument. It accepts a menu, which will be a list of strings, appends the item spam to the list, and returns the modified menu. Let's create a simple breakfast of bacon and eggs. Naturally, we'll want to add spam to it. We'll do something similar for our lunch of baked beans. Nothing unexpected so far. But look what happens when you rely on the default argument by not passing an existing menu. When we append spam to the default value of the empty menu, we get just spam. Let's do that again. When we exercise the default argument value a second time, we get two spams, and then three and four. What's happening here is that the empty list used for the default argument is created exactly once, when the def statement is executed. The first time we fall back on the default, this list has spam added to it. When we use the default a second time, the list still contains that item, and a second instance of spam is added to it, making two, ad infinitum, or perhaps ad nauseam would be more appropriate. The solution to this is straightforward, but perhaps not obvious. Always use immutable objects such as integers or strings for default values.
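Both pitfalls can be reproduced with a sketch like the one below. show_default and the spam menus follow the narration; the name add_spam and the snapshot variables are assumptions.

```python
import time


def show_default(arg=time.ctime()):
    # time.ctime() was evaluated once, when this def statement executed.
    return arg


first = show_default()
second = show_default()
print(first == second)          # True, however long we wait between calls


def add_spam(menu=[]):
    # The same list object serves as the default on every call.
    menu.append("spam")
    return menu


breakfast = add_spam(["bacon", "eggs"])
lunch = add_spam(["baked beans"])
once = list(add_spam())         # snapshot after first use of the default
twice = list(add_spam())        # snapshot after second use of the default
print(breakfast, lunch, once, twice)
```

The snapshots are taken with list(...) because every call that relies on the default returns the same list object, which keeps growing.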
Following this advice, we can solve this particular case by using the immutable None object as a sentinel. Now our function needs to check whether menu is None and, if so, provide a newly constructed empty list. The rest of the function behaves as before. Now, if we call the function repeatedly with no arguments, we get the same menu, one order of spam, each time.
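A minimal sketch of the sentinel approach, again assuming the function is called add_spam:

```python
def add_spam(menu=None):
    # None is immutable, so it is safe as a default; build a fresh list per call.
    if menu is None:
        menu = []
    menu.append("spam")
    return menu


print(add_spam())                    # ['spam']
print(add_spam())                    # ['spam']
print(add_spam(["bacon", "eggs"]))   # ['bacon', 'eggs', 'spam']
```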
Python's Type System
Programming languages can be distinguished by several characteristics, but one of the most important is the nature of their type system. Python could be characterized as having a dynamic and strong type system. Let's investigate what that means. Dynamic typing means the type of an object reference isn't resolved until the program is running, and needn't be specified up front when the program is written. Take a look at this simple function for adding two objects. Nowhere in this definition do we mention any types. We can use add with integers, floats, strings, or indeed any type for which the addition operator has been defined. These examples illustrate the dynamism of the type system. The two arguments, a and b, of the add function can reference any type of object. The strength of the type system can be demonstrated by attempting to add types for which addition has not been defined, such as strings and floats. This produces a TypeError, because Python will not, in general, perform implicit conversions between object types, or otherwise attempt to coerce one type to another. The exception to this rule is the conversion of if statement and while loop predicates to bool.
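The add function and its uses might be sketched like this; the argument values are illustrative assumptions.

```python
def add(a, b):
    # No types are declared; a and b may refer to objects of any type.
    return a + b


print(add(5, 7))                    # 12
print(add(1.5, 2.25))               # 3.75
print(add("news", "paper"))         # newspaper
print(add([1, 6], [21, 107]))       # [1, 6, 21, 107]

# Strong typing: str + float is not defined, and Python will not coerce.
mixed_failed = False
try:
    add("The answer is", 42.0)
except TypeError as exc:
    mixed_failed = True
    print("TypeError:", exc)
```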
Scopes
As we've seen, no type declarations are necessary in Python, and variables are essentially just untyped name bindings to objects. As such, they can be rebound or reassigned as often as necessary, even to objects of different types. But when we bind a name to an object, where is that binding stored? To answer that question, we must look at scopes and scoping rules in Python. There are four types of scope in Python, arranged in a hierarchy. Each scope is a context in which names are stored and in which they can be looked up. The four scopes, from narrowest to broadest, are: local, names defined inside the current function; enclosing, names defined inside any and all enclosing functions (this scope isn't important for the contents of this Python fundamentals course); global, names defined at the top level of a module (each module brings with it a new global scope); and built-in, names built into the Python language through the special builtins module. Together, these scopes comprise the LEGB rule. Names are looked up in the narrowest relevant context. It's important to note that scopes in Python do not correspond to the source code blocks as demarcated by indentation. Constructs such as for loops do not introduce new nested scopes.
Consider our words.py module. It contains the following global names: main, bound by def main; sys, bound by import sys; __name__, provided by the Python runtime; urlopen, bound by from urllib.request import urlopen; fetch_words, bound by def fetch_words; and print_items, bound by def print_items. Module scope name bindings are typically introduced by import statements and function or class definitions. It's possible to bind other objects at module scope, and this is typically used for constants, although it can be used for variables.
Within the fetch_words function, we have six local names: word, bound by the inner for loop; line_words, bound by assignment; line, bound by the outer for loop; story_words, bound by assignment; url, bound by the formal function argument; and story, bound by assignment. Each of these is brought into existence at first use and continues to live within the function scope until the function completes, at which point the references will be destroyed.
Very occasionally, we need to rebind a global name, that is, one defined at module scope, from within a function. Consider the following simple module. It initializes count to 0 at module scope. The show_count function simply prints the value of count, and set_count binds the name count to a new value. When show_count is called, Python looks up the count name in the local namespace, doesn't find it, and so looks it up in the next most outer namespace, in this case the global module namespace, where it finds the name count and prints the referred-to object. Now we call set_count with a new value and show_count again. You might be surprised that show_count displays 0 after the call to set_count(5), so let's work through what's happening. When we call set_count, the assignment count = c binds the object referred to by the formal argument c to a new name, count, in the innermost namespace context, which is the scope of the current function. No lookup is performed for the global count at module scope. We've created a new variable which shadows, and thereby prevents access to, the global of the same name. To avoid this situation, we need to instruct Python to consider use of the count name in the set_count function to resolve to the count in the module namespace. We can do this by using the global keyword. Let's modify set_count to do so. The additional call to show_count still behaves as expected. Calling set_count, however, does now modify the count reference at module scope. This is the behavior we want.
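The module described above can be sketched as shown here, intended to run at module (top-level) scope. To show both behaviors side by side, the first version of set_count is given the hypothetical name set_count_shadowing; the snapshot variables are also assumptions.

```python
count = 0


def show_count():
    # count is not local, so Python finds it in the module (global) scope.
    print("count =", count)


def set_count_shadowing(c):
    # Binds a new local name that shadows the module-level count.
    count = c


def set_count(c):
    # The global statement makes count resolve to the module-level name.
    global count
    count = c


show_count()                # prints: count = 0
set_count_shadowing(5)
show_count()                # prints: count = 0
after_shadowing = count
set_count(5)
show_count()                # prints: count = 5
after_global = count
```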
Everything is an Object
Let's go back to our words module and experiment with it further at the REPL. On this occasion, we'll import just the module. The import statement binds a module object to the name words in the current namespace. We can determine the type of any object by using the type built-in function. If we want to see the attributes of an object, we can use the dir built-in function in a Python interactive session to introspect it. The dir function returns a sorted list of the module attributes, including the ones we defined, such as the function fetch_words, any imported names, such as sys and urlopen, and various special attributes delimited by double underscores, such as __name__ and __doc__, which reveal the inner workings of Python. We can use the type function on any of these attributes to learn more about them. For instance, we could see that fetch_words is a function object. We can in turn call dir on the function to reveal its attributes. We see that function objects have many special attributes to do with how Python functions are implemented behind the scenes. For now, we'll just look at a couple of simple attributes. As you might expect, words.fetch_words.__name__ is the name of the function object as a string. Likewise, words.fetch_words.__doc__ is the docstring we provided for the function. This gives us some clues as to how the built-in help function might be implemented.
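Since the words module itself is not reproduced here, the sketch below makes two assumptions: the standard library math module stands in for words, and a stub fetch_words with a made-up docstring stands in for the real function.

```python
import math  # stand-in for the words module, which is not shown in this text


def fetch_words(url):
    """Fetch a list of words from a URL (hypothetical stub)."""
    return []


module_type = type(math).__name__           # 'module'
module_attributes = dir(math)               # list of attribute names
function_type = type(fetch_words).__name__  # 'function'

print(module_type)
print("pi" in module_attributes, "__name__" in module_attributes)
print(function_type)
print(fetch_words.__name__)                 # fetch_words
print(fetch_words.__doc__)                  # the docstring above
```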
Built-in Collections
Python comes with a powerful suite of built-in collection types, several of which you've seen already in earlier modules. To be truly fluent in Python, you need to be familiar with all of these types and how to use them. So in this module, we'll take another, deeper look at the collection types you already know: str, list, and dict. We'll also introduce you to some new collection types. First, we'll look at tuple, an immutable sequence of objects. Then, we'll cover range, which represents arithmetic progressions of integers. Finally, we'll see set, a mutable collection of unique, immutable objects. We'll round off with an overview of the protocols that unite these collections, which allow them to be used in consistent and predictable ways. First up is tuple.
Tuples
Tuples in Python are immutable sequences of arbitrary objects. Once created, the objects within them cannot be replaced or removed, and new elements cannot be added. Tuples have a similar syntax to lists, except that they are delimited by parentheses rather than square brackets. Here's a literal tuple containing a string, a float, and an integer. We can access the elements of a tuple by 0-based index using square brackets. We can determine the number of elements in the tuple using the built-in len function, and we can iterate over it using a for loop. We can concatenate tuples using the plus operator and repeat them using the multiplication operator. Since tuples can contain any object, it's perfectly possible to have nested tuples. We use repeated application of the indexing operator to get to the inner elements of such nested collections.
Sometimes a single-element tuple is required. To write this, we can't just use a simple number in parentheses. This is because Python parses that as an integer enclosed in the precedence-controlling parentheses of a math expression. To create a single-element tuple, we make use of the trailing comma separator, which, you'll recall, we're allowed to use when specifying literal tuples, lists, and dictionaries. A single element with a trailing comma is parsed as a single-element tuple. This leaves us with the problem of how to specify an empty tuple. In actuality, the answer is simple. We just use empty parentheses. In many cases, the parentheses of literal tuples may be omitted. This feature is often used when returning multiple values from a function. Here we make a function to return the minimum and maximum values of a sequence, the hard work being done by two built-in functions, min and max. Returning multiple values as a tuple is often used in conjunction with a wonderful feature of Python called tuple unpacking. Tuple unpacking is a destructuring operation, which allows us to unpack data structures into named references.
For example, we can assign the result of our minmax function to two new references like this. When we print the two objects, we see that the references have indeed been unpacked from the tuple returned from the function. Unpacking also works with nested tuples. Here we assign from a triply nested tuple of integers to a triply nested tuple of references. As before, we can see that each of the references has been assigned to the corresponding value from the original tuple. This support for unpacking leads to the beautiful Python idiom for swapping two or more variables. First, we'll create two references, a and b, referring to the strings jelly and bean, respectively. Then we use the form a, b = b, a. This first packs a and b into a tuple on the right side of the assignment. It then unpacks the tuple on the left, reusing the names a and b. If we examine a and b, we can see that they have been swapped. Should you need to create a tuple from an existing collection object, such as a list, you can use the tuple constructor. You can also do this with a string, or indeed, any type over which you can iterate. Finally, as with most collection types in Python, we can test for containment using the in operator. Similarly, we can test for non-membership with the not in operator.
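The tuple operations walked through in this section can be sketched as follows. minmax and the jelly/bean swap come from the narration; all other names and values are illustrative assumptions.

```python
t = ("Norway", 4.953, 3)            # a string, a float, and an integer
first = t[0]                        # 0-based indexing
size = len(t)
for item in t:
    print(item)

longer = t + (338186.0, 265e9)      # concatenation creates a new tuple
repeated = ("Oslo", 6) * 2          # repetition
nested = ((220, 284), (1184, 1210))
inner = nested[1][0]                # repeated indexing reaches inner elements

not_a_tuple = (391)                 # just the integer 391 in parentheses
single = (391,)                     # the trailing comma makes a tuple
empty = ()


def minmax(items):
    # The parentheses around the returned tuple can be omitted.
    return min(items), max(items)


lower, upper = minmax([83, 33, 84, 32, 85, 31, 86])     # tuple unpacking
(w, (x, (y, z))) = (4, (3, (2, 1)))                     # nested unpacking

a = "jelly"
b = "bean"
a, b = b, a                         # swap by packing and unpacking

from_list = tuple([561, 1105, 1729, 2465])
from_string = tuple("Carmichael")
print(5 in (3, 5, 17), 5 not in (3, 5, 17))             # True False
```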
Strings
We covered strings at some length already, but we'll take time now to explore their capabilities in a little more depth. As with any Python sequence, we can determine the length of a string with the built-in len function. Here we see that the longest railway station name in the UK contains a whopping 58 characters. Concatenation of strings is supported using the plus (+) operator. We can create the string Newfoundland by concatenating the strings New, found, and land. We can also use the related augmented assignment operator. Here, starting with the string New, we incrementally add found and land. Remember that strings are immutable. So here the augmented assignment operator is binding a new string object to s on each use. The illusion of modifying s in place is achievable because s is a reference to an object, not an object itself. While the plus (+) operator is intuitive, you should prefer the join method for joining large numbers of strings because it is substantially more efficient. This is because concatenation using the addition (+) operator or its augmented assignment version can lead to the generation of large numbers of temporaries, with consequent costs for memory allocations and copies.
Join is a method on the string class, which takes a collection of strings as an argument and produces a new string by inserting a separator between each of them. An interesting aspect of join is how the separator is specified. It is the string on which join is called. As with many parts of Python, an example is the best explanation. To join a list of HTML color code strings into a semicolon-separated string, construct a string containing a semicolon and call join on it, passing in the list of color strings to be joined as an argument. We can then split the colors up again using the split method. We've already encountered this method, but this time we're going to provide its optional argument.
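A short sketch of concatenation, join, and split. The color codes are made-up sample values.

```python
whole = "New" + "found" + "land"

s = "New"
s += "found"            # binds a new string object to s
s += "land"             # and again; the original "New" is never modified

# The separator is the string on which join is called.
colors = ";".join(["#45ff23", "#2321fa", "#1298a3", "#a32312"])
print(colors)           # #45ff23;#2321fa;#1298a3;#a32312

# split accepts the separator as its optional argument.
parts = colors.split(";")
print(parts)
```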
A widespread and fast idiom for concatenating a collection of strings is to join using an empty string as the separator. The way may not be obvious at first. To concatenate, invoke join on empty text. Something from nothing. This use of join is often confusing to the uninitiated, but with use, the approach taken by Python will be appreciated as natural and elegant. Another very useful string method is partition, which divides a string into three sections: the part before the separator, the separator itself, and the part after the separator. Partition returns a tuple, so this is commonly used in conjunction with tuple unpacking. Here we partition the elements of a travel plan into its parts. Often we're not interested in capturing the separator value, so you might see the underscore variable name used. This is not treated in a special way by the Python language, but there's an unwritten convention that the underscore variable is for unused or dummy values. This convention is supported by many Python-aware development tools, which will suppress unused variable warnings for underscore.
One of the most interesting and frequently used string methods is format. This supersedes, although does not replace, the string interpolation technique used in older versions of Python, which we do not teach here. The format method can be usefully called on any string containing so-called replacement fields, which are surrounded by curly braces. The objects provided as arguments to format are converted to strings and used to populate these fields. Here's an example where the arguments to format, the string Jim and the integer 32, are inserted into the format string. The field names, in this case 0 and 1, are matched up with the positional arguments to format, and each argument is converted to a string. The field name may be used more than once. Here we use the first argument to format twice.
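These idioms might look like the following. The strings Jim and 32 come from the narration; the remaining sample strings are assumptions.

```python
joined = "".join(["high", "way", "man"])    # 'highwayman'

# partition returns a 3-tuple: before, separator, after.
departure, separator, arrival = "London:Edinburgh".partition(":")

# The underscore is a conventional name for a value we don't intend to use.
origin, _, destination = "Seattle-Boston".partition("-")

# Numbered replacement fields are matched to positional arguments.
ages = "The age of {0} is {1}".format("Jim", 32)

# A field name may be used more than once.
birthday = "The age of {0} is {1}. {0}'s birthday is on {2}.".format(
    "Fred", 24, "October 31")

print(joined, departure, arrival, origin, destination)
print(ages)
print(birthday)
```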
However, if the field names are used exactly once and in the same order as the arguments, the field number can be omitted. If keyword arguments are supplied to format, then named fields can be used instead of ordinals. Here the keywords latitude and longitude are inserted into the corresponding named replacement fields. It's possible to index into sequences using square brackets inside the replacement field. Here we index into a tuple in the replacement fields. You can even access object attributes. Here we pass the whole math module to format, using a keyword argument (remember, modules are objects), then access two of its attributes from within the replacement fields. Format strings also give us a lot of control over field alignment and floating point formatting. Here are the same values with the constants displayed using only three decimal places.
While the format method we've just covered is quite powerful and is generally preferable to its predecessors, it can be quite verbose, even for relatively simple cases. Consider this example. We assign 4 times 20 to the name value. We then interpolate value into a string with format, using the keyword argument matching feature. Here we have to mention the name value three times. Of course, this example could be made shorter by removing value from the brackets in the string and not using keyword arguments to format. But in larger, more complex interpolations, we would want to keep those elements in place for readability and maintainability. To address this, PEP 498, from which this example is directly drawn, introduces a new string formatting approach called literal string interpolation or, more commonly, f-strings. F-strings are available in Python 3.6 and later, and in the words of PEP 498, they provide a way to embed expressions inside literal strings using a minimal syntax. An f-string is like a normal string literal, except that it is prefixed with the letter f.
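A sketch of the remaining format features. The coordinate and position values are assumptions; the value example follows PEP 498 as described above.

```python
import math

# Field numbers can be omitted when used once each, in order.
progress = "Reticulating spline {} of {}.".format(4, 23)

# Named fields are matched to keyword arguments.
position = "Current position {latitude} {longitude}".format(
    latitude="60N", longitude="5E")

# Indexing into a sequence inside the replacement field.
galactic = "Galactic position x={pos[0]}, y={pos[1]}, z={pos[2]}".format(
    pos=(65.2, 23.1, 82.2))

# Attribute access on an object (here, the math module itself).
constants = "Math constants: pi={m.pi}, e={m.e}".format(m=math)

# A format specifier after the colon: three decimal places.
rounded = "Math constants: pi={m.pi:.3f}, e={m.e:.3f}".format(m=math)

# The verbose case: the name value appears three times.
value = 4 * 20
verbose = "The value is {value}.".format(value=value)

for text in (progress, position, galactic, constants, rounded, verbose):
    print(text)
```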
Inside the string literal, Python expressions can be embedded inside curly braces, and the results of these expressions will be inserted into the string at runtime. Let's rework our previous example using f-strings. We again assign 4 times 20 to the name value. We then use an f-string to insert value into a string. Here, instead of needing to pass value into a method, the f-string simply evaluates it as a normal Python expression, inserting the result into the resulting string. Because f-strings allow you to use any Python expression, you're not limited to using simple named references. You can, for example, call functions. First, let's import the datetime module. Now we use an f-string to report the current time, calling datetime.datetime.now to get the time, then formatting it with isoformat. We can rewrite the math constants example from the previous section by simply accessing math.pi and math.e from within the f-string. This then lets us demonstrate that, like format, f-strings also support floating point formatting. To print these constants with three places of precision, we can put a colon after the expression in the f-string followed by the format specifier. These are the essentials of f-strings, and this may be all that you ever need to use. There's quite a bit more to know about them, though, and we'll cover f-strings in greater depth in later courses in the Core Python series. We recommend you spend some time familiarizing yourself with the other string methods. Remember, you can find out what they are by simply passing str to help.
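The f-string examples described above can be sketched like this (Python 3.6 or later is assumed; the variable names holding the results are hypothetical).

```python
import datetime
import math

value = 4 * 20
simple = f"The value is {value}."

# Any expression can appear inside the braces, including method calls.
now_text = f"The current time is {datetime.datetime.now().isoformat()}."

# Attribute access, then the same with a format specifier after a colon.
constants = f"Math constants: pi={math.pi}, e={math.e}"
rounded = f"Math constants: pi={math.pi:.3f}, e={math.e:.3f}"

print(simple)       # The value is 80.
print(now_text)
print(constants)
print(rounded)      # Math constants: pi=3.142, e=2.718
```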
Ranges
Let's move on and look at range, which really is a collection rather than a container. A range is a type of sequence used for representing an arithmetic progression of integers. Ranges are created by calls to the range constructor, and there is no literal form. Most typically, we supply only the stop value, and Python defaults to a starting value of 0. Ranges are sometimes used to create consecutive integers for use as loop counters. Note that the stop value supplied to range is one past the end of the sequence, which is why the previous loop didn't print 5. We can also supply a starting value if we wish by passing two arguments to range. Wrapping this in a call to the list constructor is a handy way to force production of each item. This so-called half-open range convention, with the stop value not being included in the sequence, may seem strange at first, but it actually makes a lot of sense if you're dealing with consecutive ranges, because the end specified by one range is the start of the next one. Range also supports a step argument. Here we count from 0 to 9 by twos. Note that in order to use it, you must supply all three arguments. Range is curious in that it determines what its arguments mean by counting them. Providing only one argument means the argument is a stop value, two arguments are start and stop, and three arguments are start, stop, and step. Python's range works this way so the first argument, start, can be made optional, something which isn't normally possible. Furthermore, range doesn't support keyword arguments. You might almost describe it as un-Pythonic.
At this point, we're going to show you another example of poorly styled code, except this time it's one you can, and should, avoid. Here's a poor way to print the elements in a list, by constructing a range over the length of the list and then indexing into the list on each iteration. Although this works, it's most definitely un-Pythonic.
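A sketch of the range forms discussed above, ending with the index-based loop the text advises against. The list contents are illustrative assumptions.

```python
for i in range(5):                      # stop only: prints 0, 1, 2, 3, 4
    print(i)

first_five = list(range(5))             # [0, 1, 2, 3, 4]
five_to_nine = list(range(5, 10))       # start and stop: [5, 6, 7, 8, 9]
ten_to_fourteen = list(range(10, 15))   # starts where the previous range stopped
evens = list(range(0, 10, 2))           # start, stop, step: [0, 2, 4, 6, 8]

# Poor style, shown only for contrast: indexing through a range over len().
s = [0, 1, 4, 6, 13]
for i in range(len(s)):
    print(s[i])
```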
Instead, always prefer to use iteration over objects themselves. If, for some reason, you need a counter, you should use the built-in enumerate function, which returns an iterable series of pairs, each pair being a tuple. The first element of the pair is the index of the current item and the second element of the pair is the item itself. Here we construct a list, pass it to enumerate, and iterate over the result, giving us the elements of the list with their corresponding positions in the list. Even better, we can use tuple unpacking to avoid having to deal with the tuple directly.
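The preferred alternatives might be sketched as follows; the list contents are illustrative assumptions.

```python
s = [0, 1, 4, 6, 13]
for v in s:                     # iterate over the object itself
    print(v)

t = [6, 372, 8862, 148800, 2096886]
for p in enumerate(t):          # each p is an (index, item) tuple
    print(p)

for i, v in enumerate(t):       # tuple unpacking gives both names directly
    print(f"i = {i}, v = {v}")

pairs = list(enumerate(t))      # materialize the pairs for inspection
```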