Python error handling is effective: it raises clear messages that most of the time lead to quick fixes. Hard-to-detect bugs, though, can arise from misunderstandings or oversights related to how variable assignment works.
We want to apply a mask to a galaxy catalog. Given a list of galaxies ['gal1', 'gal2', 'gal3'], a mask is a list of Boolean elements used to extract the relevant galaxies. For example, the mask [False, False, True] selects only 'gal3'.
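Plain Python lists cannot be indexed directly with a Boolean list: catalog[mask] raises a TypeError. Below is a minimal sketch of applying such a mask using only the standard library. It uses itertools.compress, which the original code does not use, and an equivalent list comprehension.

```python
from itertools import compress

# Galaxy catalog and Boolean mask as plain Python lists.
catalog = ['gal1', 'gal2', 'gal3']
mask = [False, False, True]

# Keep only the catalog entries whose mask element is True.
selected = list(compress(catalog, mask))
print(selected)
# => ['gal3']

# Equivalent list comprehension pairing each galaxy with its mask flag.
selected_lc = [gal for gal, keep in zip(catalog, mask) if keep]
print(selected_lc)
# => ['gal3']
```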
The following class takes an initial mask that selects only a few galaxies. It defines a method to widen the selection based on galaxy magnitude (in this example, a lower magnitude means a fainter galaxy). The example is a simplified version of code extracted from a larger code base.
class Mask:
    """Modify an initial mask by further selecting faint galaxies."""

    def __init__(self, mask_init):
        # Define an instance attribute, initialized to mask_init.
        self.mask = mask_init

    def select_faint(self, mags):
        """Set mask element to True if a galaxy is fainter than a threshold."""
        # Set magnitude threshold.
        mag_limit = -20.0
        for i, mag in enumerate(mags):
            if mag < mag_limit:
                # If the magnitude is lower than the threshold, set
                # the corresponding mask element to True.
                self.mask[i] = True
Let's create a galaxy catalog and an initial mask. In practice, galaxy catalogs commonly include millions to billions of objects. NumPy arrays are therefore more efficient than Python lists, and they can also be indexed directly with a Boolean mask.
import numpy as np
# Galaxy catalog.
catalog = np.array(['gal1', 'gal2', 'gal3'])
# Initial mask.
mask_init = np.array([False, False, True])
# Only 'gal3' is selected.
print(catalog[mask_init])
# => ['gal3']
We define galaxy magnitudes in some band (let's call it band A) and widen the selection to also include galaxies fainter than the threshold.
# Set galaxy magnitudes: the first galaxy is fainter than the
# threshold.
mags_a = [-21.0, -18.0, -15.0]
# Create an instance of the class given the initial mask.
mask_a = Mask(mask_init)
# Select faint galaxies.
mask_a.select_faint(mags_a)
# Mask the catalog: 'gal1' is also selected.
print(catalog[mask_a.mask])
# => ['gal1' 'gal3']
So far the behavior is as expected: 'gal3' was already selected in the initial mask, and 'gal1' is fainter than the threshold (its magnitude is less than -20.0) in the A band.
We now define galaxy magnitudes in another band (let's call it band B) and widen the initial selection to include galaxies fainter than the threshold. This step is supposed to be independent of the selection based on the A band.
# Set galaxy magnitudes in a different band: now only the second
# galaxy is fainter than the threshold.
mags_b = [-19.0, -22.0, -15.0]
# Create a new instance of the class, different from the one
# previously created.
mask_b = Mask(mask_init)
# Select faint galaxies.
mask_b.select_faint(mags_b)
# Mask the catalog: 'gal2' is correctly selected. However, 'gal1' is
# also unexpectedly selected.
print(catalog[mask_b.mask])
# => ['gal1' 'gal2' 'gal3']
Here we notice a bug: in the B band 'gal1' is brighter than the threshold and should not be selected. The selection based on the B band is performed by creating a new instance of the Mask class, so we would expect it to be independent of the selection based on the A band.
This kind of bug can be difficult to track down, since the code is syntactically correct and no exception is raised. Moreover, testing the mask in the B band alone, without first applying the one in the A band, gives the correct result. Automated help may come from analyzing the class with pylint, which raises a warning:
W: [...] Redefining name 'mask_init' from outer scope [...] (redefined-outer-name)
Further inspection shows that the warning is indeed valuable, since the initial mask has unexpectedly been modified.
# The initial mask changed: unwanted behavior.
print(mask_init)
# => [ True  True  True]
Fluent Python starts Chapter 8, dedicated to object references, with the following quote from Lewis Carroll's Through the Looking-Glass.
'You are sad,' the Knight said in an anxious tone: 'let me sing you a song to comfort you. [...] The name of the song is called "Haddocks' Eyes".'
'Oh, that's the name of the song, is it?' Alice said, trying to feel interested.
'No, you don't understand,' the Knight said, looking a little vexed. 'That's what the name is called. The name really is "The Aged Aged Man".'
In Python an object is first created and then a variable is assigned to it. Many variables can be assigned to the same object, and a mutable object can be changed through any of its aliases. Hence, it is important to distinguish between variables (which are just labels) and the objects themselves.
In the following example we create an object (a list) and assign two aliases, a and b, to it.
# Create a new object. The variable 'a' is a label attached to this
# object.
a = [1, 2]
# Create an alias.
b = a
# Both 'a' and 'b' point to the same object.
print(a is b)
# => True
# Modify the first element of the object pointed to by the variable 'b'.
b[0] = 10
# The variable 'a' points to the same object whose first element has
# just been modified.
print(a)
# => [10, 2]
We continue the example by first creating a new object with the same content as an existing one, and then by assigning a variable, c, to the newly created object.
# Create a new object with the same content as the one pointed to by 'a'.
c = list(a)
# Now 'a' and 'c' point to distinct objects.
print(a is c)
# => False
# Modify the first element of the object pointed to by the variable 'c'.
c[0] = 20
# Modifying the object pointed to by 'c' does not change the object
# pointed to by 'a'.
print(a)
# => [10, 2]
Hence, whenever mutable objects such as lists or arrays are involved, one must decide whether an assignment should create an alias of an existing object, or whether the variable should instead be bound to a new object.
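With NumPy arrays the same decision appears in a slightly different form. Basic slicing returns a view: a distinct array object that shares the original data. Boolean or integer-array indexing, as in catalog[mask_init] above, returns a copy. The sketch below illustrates the difference with np.shares_memory. The array and variable names are illustrative only.

```python
import numpy as np

arr = np.array([1, 2, 3, 4])

# Basic slicing returns a view: a new array object sharing arr's data.
view = arr[:2]
# .copy() allocates a new buffer with the same content.
arr_copy = arr[:2].copy()

print(view is arr)                      # False: distinct array objects...
print(np.shares_memory(view, arr))      # True: ...but the same data
print(np.shares_memory(arr_copy, arr))  # False

# Writing through the view changes arr; writing to the copy does not.
view[0] = 100
arr_copy[1] = 200
print(arr.tolist())
# => [100, 2, 3, 4]
```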
The issue does not arise for immutable objects such as numbers or strings, which cannot be modified in place. Tuples are more subtle: they are only relatively immutable, since they can contain references to mutable objects.
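A short sketch of both cases follows. "Modifying" an integer rebinds the name to a new object, leaving other aliases untouched. A tuple cannot be changed itself, but a list stored inside it can. The variable names are illustrative only.

```python
# Integers are immutable: y += 1 binds y to a new object; x is unaffected.
x = 1
y = x
y += 1
print(x, y)
# => 1 2

# The tuple's own items cannot be reassigned, but its list item is mutable.
t = (1, [2, 3])
t[1].append(4)  # mutates the list referenced by the tuple, not the tuple
print(t)
# => (1, [2, 3, 4])

# Because it contains a mutable element, this tuple is not hashable.
try:
    hash(t)
except TypeError as exc:
    print(exc)
# => unhashable type: 'list'
```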
The copy created in the example above through the built-in constructor list(a) is a shallow copy: if the object contains references to other mutable objects, those nested objects are still shared between the copy and the original. Sometimes a deep copy is needed instead. Depending on the situation, the Python Standard Library functions copy.copy() and copy.deepcopy() can be used to create shallow and deep copies of mutable objects.
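The following sketch contrasts copy.copy() and copy.deepcopy() on a nested list. The nested list is an illustrative stand-in for any container holding mutable objects.

```python
import copy

original = [[1, 2], [3, 4]]

shallow = copy.copy(original)   # new outer list, same inner lists
deep = copy.deepcopy(original)  # new outer list and new inner lists

print(shallow is original, shallow[0] is original[0])
# => False True
print(deep is original, deep[0] is original[0])
# => False False

# Changing a nested list through the shallow copy affects the original...
shallow[0].append(99)
print(original)
# => [[1, 2, 99], [3, 4]]
# ...while the deep copy stays independent.
print(deep)
# => [[1, 2], [3, 4]]
```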
Finally, Python passes arguments to functions via call by sharing: each parameter inside the function becomes an alias of the corresponding object passed by the caller. Hence, care should be taken when passing mutable objects as function arguments. In particular, mutable objects should not be used as default parameter values (see Fluent Python for more examples).
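The sketch below illustrates both points. First, a function that mutates its argument also changes the caller's object. Second, a mutable default value is created once, when the function is defined, and is then shared across calls. The usual fix, shown last, uses None as a sentinel. The function names add_item, append_bad and append_good are hypothetical and chosen for illustration only.

```python
def add_item(items, item):
    """Append item in place: 'items' is an alias of the caller's list."""
    items.append(item)


basket = ['apple']
add_item(basket, 'pear')
print(basket)
# => ['apple', 'pear']


def append_bad(item, items=[]):
    # The default list is created once, at definition time, and is
    # shared by every call that omits 'items'.
    items.append(item)
    return items


print(append_bad(1))
# => [1]
print(append_bad(2))
# => [1, 2]


def append_good(item, items=None):
    # Create a fresh list on each call when none is passed.
    if items is None:
        items = []
    items.append(item)
    return items


print(append_good(1))
# => [1]
print(append_good(2))
# => [2]
```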
Given the considerations above, the bug in the initial example becomes easy to identify: the instance attribute self.mask is an alias for the array passed as a parameter when the instance is created.
# The variables mask_init and mask_a.mask are assigned to the same
# object.
print(mask_a.mask is mask_init)
# => True
This is not the expected behavior. The object referred to by self.mask should be modifiable, but the Mask class is not supposed to change the array referred to by mask_init. To fix this, we can create a new object with the same content as mask_init, rather than an alias for the object referred to by mask_init. To create a (shallow) copy we use the NumPy np.array() constructor, which copies its input by default.
import numpy as np

class Mask:
    """Modify an initial mask by further selecting faint galaxies."""

    def __init__(self, mask_init):
        # Create a new object with the same content as mask_init.
        self.mask = np.array(mask_init)

    # The rest of the class is unchanged.
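Putting the pieces together, here is a self-contained sketch of the corrected class, rerun on the two bands from the beginning of the page with the same illustrative magnitudes and threshold. Each band now starts from the untouched initial mask. In __init__, mask_init.copy() or np.copy(mask_init) would work equally well.

```python
import numpy as np


class Mask:
    """Modify an initial mask by further selecting faint galaxies."""

    def __init__(self, mask_init):
        # Work on a copy so that the caller's array is never modified.
        self.mask = np.array(mask_init)

    def select_faint(self, mags):
        """Set mask element to True if a galaxy is fainter than a threshold."""
        mag_limit = -20.0
        for i, mag in enumerate(mags):
            if mag < mag_limit:
                self.mask[i] = True


catalog = np.array(['gal1', 'gal2', 'gal3'])
mask_init = np.array([False, False, True])

# Band A: 'gal1' is added to the initial selection.
mask_a = Mask(mask_init)
mask_a.select_faint([-21.0, -18.0, -15.0])
print(catalog[mask_a.mask])
# => ['gal1' 'gal3']

# Band B: only 'gal2' is added; 'gal1' is no longer selected.
mask_b = Mask(mask_init)
mask_b.select_faint([-19.0, -22.0, -15.0])
print(catalog[mask_b.mask])
# => ['gal2' 'gal3']

# The initial mask is unchanged and is not aliased by the instances.
print(mask_init.tolist())
# => [False, False, True]
print(mask_a.mask is mask_init)
# => False
```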