Evolution of 'sort' in Python and the Role of 'functools.cmp_to_key'
Adarsh Divakaran
Co-founder at Digievo Labs | Building LinkHQ – A Super-app for Creators | Python Conference Speaker
In Python,?the?`sort`?method and the?`sorted`?callable are commonly used for sorting operations.?`sort`?is a list method that modifies the list in place,?whereas?`sorted`?takes an iterable as its first argument and returns a sorted list containing the iterable's elements.
Both of these use the?Timsort algorithm?under the hood to perform the sorting operation.
In this article,?we will explore the evolution of sorting in Python from Python 2 to 3 and look into the?`cmp_to_key`?function from?functools.
Background
Below are the type annotations for?`sort`?and?`sorted`?from the?Python typeshed library.
@overload
def sorted(
__iterable: Iterable[SupportsRichComparisonT],
*,
key: None = None,
reverse: bool = False
) -> list[SupportsRichComparisonT]:
...
@overload
def sorted(
__iterable: Iterable[_T],
*,
key: Callable[[_T], SupportsRichComparison],
reverse: bool = False
) -> list[_T]:
...
class list(MutableSequence[_T]):
@overload
def sort(
self: list[SupportsRichComparisonT], *, key: None = None, reverse: bool = False
) -> None:
...
@overload
def sort(
self, *, key: Callable[[_T], SupportsRichComparison], reverse: bool = False
) -> None:
...
Both?sort?and?sorted?accept an optional keyword-only argument?key.?It should be a callable and, if supplied,?will be used to calculate the comparison key.
From Python docs:
key?specifies a function of one argument that is used to extract a comparison key from each list element?( for example,?key=str.lower).?The key corresponding to each item in the list is calculated once and then used for the entire sorting process.
Now let's take a look at the?Python 2 docs for the sorted built-in:
sorted(iterable[, cmp[, key[, reverse]]])
Return a new sorted list from the items in iterable.
The optional arguments cmp, key, and reverse have the same meaning as those for the list.sort() method (described in section Mutable Sequence Types).
cmp specifies a custom comparison function of two arguments (iterable elements),
which should return a negative, zero or positive number depending on whether the first argument is considered smaller than, equal to, or larger than the second argument.
...
There is an additional argument?cmp?compared to what we have seen in Python 3.
The docs also mention:
In general,?the key and reverse conversion processes are much faster than specifying an equivalent cmp function.?This is because cmp is called multiple times for each list element while key and reverse touch each element only once.?Use functools.cmp_to_key()?to convert an old-style cmp function to a key function.
The?cmp?argument was removed in Python 3, and the?cmp_to_key helper?to convert comparison functions to new-style key functions has been added to the functools module.
领英推荐
Usage in the Wild
If we have a comparison function?(which takes two arguments and returns their comparison result),?this can be converted to a sort key by using?cmp_to_key?function?provided in the?functools?module.
The below example from the?pretty_midi,?a MIDI processing library, illustrates the use of?cmp_to_key.
class PrettyMIDI(object):
"""A container for MIDI data in an easily-manipulable format."""
...
def write(self, filename):
"""Write the MIDI data out to a .mid file."""
def event_compare(event1, event2): # <--- (1)
"""Compares two events for sorting.
Events are sorted by tick time ascending. Events with the same tick
time ares sorted by event type. Some events are sorted by
additional values. For example, Note On events are sorted by pitch
then velocity, ensuring that a Note Off (Note On with velocity 0)
will never follow a Note On with the same pitch.
Parameters
----------
event1, event2 : mido.Message
Two events to be compared.
"""
secondary_sort = {
"set_tempo": lambda e: (1 * 256 * 256),
"time_signature": lambda e: (2 * 256 * 256),
"key_signature": lambda e: (3 * 256 * 256),
"lyrics": lambda e: (4 * 256 * 256),
"text_events": lambda e: (5 * 256 * 256),
"program_change": lambda e: (6 * 256 * 256),
"pitchwheel": lambda e: ((7 * 256 * 256) + e.pitch),
"control_change": lambda e: (
(8 * 256 * 256) + (e.control * 256) + e.value
),
"note_off": lambda e: ((9 * 256 * 256) + (e.note * 256)),
"note_on": lambda e: ((10 * 256 * 256) + (e.note * 256) + e.velocity),
"end_of_track": lambda e: (11 * 256 * 256),
}
# If the events have the same tick, and both events have types
# which appear in the secondary_sort dictionary, use the dictionary
# to determine their ordering.
if (
event1.time == event2.time
and event1.type in secondary_sort
and event2.type in secondary_sort
):
return (secondary_sort[event1.type](event1) -
secondary_sort[event2.type](event2))
# Otherwise, just return the difference of their ticks.
return event1.time - event2.time
# Create track 0 with timing information
timing_track = mido.MidiTrack()
... # Code processing the timing_track
# Sort the (absolute-tick-timed) events.
timing_track.sort(key=functools.cmp_to_key(event_compare)) # <--- (2)
...
The?`event_compare`?function?(pointed out in comment?(1))?provides a specific comparison logic for MIDI events,?ensuring they are sorted correctly by their tick time and type,?along with additional criteria for certain event types.?Then,?functools.cmp_to_key?is used to convert this comparison function into a key function?(pointed out in comment?(2)).
Should You Use It
As?`cmp_to_key`?was aimed to help transition from Python 2 style?cmp?functions to?key?functions,?its usage is not very common in newer projects.?However,?it can come in handy when we already have a comparison function defined in our code?to compare two objects and need to sort an iterable of these objects.?In such a case,?cmp_to_key will allow us to reuse our comparison logic?(defined in our comparison function)?without defining a new key function.
Behind the Scenes
Below is the?source code of?cmp_to_key?from the functools module
def cmp_to_key(mycmp):
"""Convert a cmp= function into a key= function"""
class K(object):
__slots__ = ['obj']
def __init__(self, obj):
self.obj = obj
def __lt__(self, other):
return mycmp(self.obj, other.obj) < 0
def __gt__(self, other):
return mycmp(self.obj, other.obj) > 0
def __eq__(self, other):
return mycmp(self.obj, other.obj) == 0
def __le__(self, other):
return mycmp(self.obj, other.obj) <= 0
def __ge__(self, other):
return mycmp(self.obj, other.obj) >= 0
__hash__ = None
return K
`cmp_to_key`?intelligently maps the original comparison function to a key function suitable for sorting?by implementing all the comparison dunders.
It returns a new callable?(class K)?with all the comparison dunders, which will be called during the sorting process to calculate the sort key.?Since the sorting process compares each of the calculated keys,?the comparison dunders will get invoked,?and they will return based on the result from the original comparison function.
For example,?during sorting,?when?`K object`?<?`another K object`?needs to be checked,?Python will actually call the?`lt`?dunder method on K.?This method simply calls the original cmp function to compare the?obj?attributes of the two K instances.
I share interesting Python snippets????like this from open-source projects illustrating Python language features in my newsletter,?"Python in the Wild".
Ingenieria Industrial
1 年Thanks for sharing