List comprehensions in Python - traversing and filtering a file tree
Header image: OpenClipart-Vectors (pixabay.com)


List comprehensions in Python provide a concise way to generate a list. Comprehensions are not limited to lists: the same syntax can also build sets and dictionaries, known as set comprehensions and dictionary comprehensions respectively. For data scientists, this is useful when extracting, filtering and manipulating data. It is also useful for system administration in general, for example when reading all file names of a directory tree into a list, which is what we will do in this article.
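To give a quick taste of the set and dictionary variants, here is a small sketch. The list of words is only an illustrative assumption:

```python
words = ['list', 'set', 'dict', 'set']

# A set comprehension uses curly braces; duplicate values are dropped
lengths = {len(w) for w in words}

# A dictionary comprehension builds key: value pairs
word_lengths = {w: len(w) for w in words}

print(lengths)       # {3, 4}
print(word_lengths)  # {'list': 4, 'set': 3, 'dict': 4}
```

The repeated word 'set' appears only once in the dictionary, since a later key simply overwrites the earlier entry with the same value.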

Let us start with a simple example, in order to show the syntax:

>>> [i for i in range(1,10)]
[1, 2, 3, 4, 5, 6, 7, 8, 9]

The for loop in the above example is likely familiar. The square brackets indicate that a list will be generated, and the for loop iterates over the numbers 1 to 9 (range(1, 10) excludes its upper bound). Each value of the loop variable i becomes an element of the new list. To be honest, generating this list with an ordinary for loop is not difficult either, but list comprehensions really shine when the data also needs some processing. As a next step, let us produce a list of all even numbers in the previous sequence. To do this, we only need to add a condition at the end of the list comprehension:

>>> [i for i in range(1,10) if i % 2 == 0]
[2, 4, 6, 8]
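For comparison, the filtered comprehension corresponds roughly to the following explicit loop. This minimal sketch shows where each part of the comprehension ends up:

```python
# Explicit loop equivalent to [i for i in range(1, 10) if i % 2 == 0]
evens = []
for i in range(1, 10):
    if i % 2 == 0:       # the trailing "if" clause of the comprehension
        evens.append(i)  # the leading expression, here simply i

print(evens)  # [2, 4, 6, 8]
print(evens == [i for i in range(1, 10) if i % 2 == 0])  # True
```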

Now we are getting somewhere... With several for clauses, a list comprehension becomes even more powerful, especially when we want to compute something from the data and filter it at the same time. To show how this works, let us combine every letter of the string 'python' with every letter of the string 'rocks', keeping only the combinations where the letter from 'rocks' also appears in the string 'blocking':

>>> [i + j for i in 'python' for j in 'rocks' if j in 'blocking']
['po', 'pc', 'pk', 'yo', 'yc', 'yk', 'to', 'tc', 'tk', 'ho', 'hc', 'hk',
'oo', 'oc', 'ok', 'no', 'nc', 'nk']

Pretty nifty, hey? Note the details: the second for loop runs through all of its values for every iteration of the first, just like nested for loops. The condition at the end, if j in 'blocking', filters the data: if it evaluates to True, the item is added to the resulting list; if False, it is skipped. The leading expression, i + j, is evaluated for each item that passes the filter, and its value is appended to the generated list.
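The same nesting can be written out as ordinary loops. This sketch shows that the for clauses read left to right as outer to inner loops, with the if clause placed inside the innermost one:

```python
# Explicit loops equivalent to
# [i + j for i in 'python' for j in 'rocks' if j in 'blocking']
pairs = []
for i in 'python':               # first "for": the outer loop
    for j in 'rocks':            # second "for": runs fully for every i
        if j in 'blocking':      # filter applied inside the innermost loop
            pairs.append(i + j)  # expression evaluated only for kept items

print(pairs[:3])  # ['po', 'pc', 'pk']
print(pairs == [i + j for i in 'python' for j in 'rocks' if j in 'blocking'])  # True
```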

Moving beyond toy examples, let us now apply a list comprehension to the task of generating a list of file names, together with their paths, that match a certain criterion in a directory tree. For this purpose, Python's os.walk() is a fitting tool, and a list comprehension can handle the data it generates elegantly. Let us assume the folder 'mydir' has the following directory structure (as shown by the Linux command tree):

$ tree .
.
├── a2.txt
├── a.csv
├── a.txt
├── bdir
│   ├── b2.txt
│   ├── b.csv
│   ├── b.txt
│   └── cdir
│       ├── c2.txt
│       ├── c.csv
│       └── c.txt
└── ddir
    ├── d2.txt
    ├── d.csv
    └── d.txt

Suppose we want to find all files in this structure whose names end in '.txt'. Using os.walk() and a list comprehension, this can be done as follows:

>>> import os
>>> [os.path.join(path, name) for path, dirs, files in os.walk('.')
...  for name in files if name.endswith('.txt')]
['./a2.txt', './a.txt', './bdir/b.txt', './bdir/b2.txt', 
'./bdir/cdir/c.txt', './bdir/cdir/c2.txt', './ddir/d2.txt', './ddir/d.txt']

To understand what just happened, let us have a look at the output from os.walk(). For each directory (node) it traverses in the file tree, it yields a tuple of three items: 1) the path of the directory, 2) a list of the names of its subdirectories, and 3) a list of the names of the files in that directory. To get every file, we therefore need a second for loop that iterates over the file names in each tuple. Adding an if clause lets us test whether each item should be included in the result, in this case keeping only text files. Finally, the expression at the beginning of the list comprehension shapes each element as desired, in this case combining the path and the file name. Note that os.walk() does not return directories and files in a guaranteed alphabetical order, which is why the output above is not sorted.
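To see these tuples for yourself without setting up 'mydir' by hand, the following sketch recreates a tree like the one above in a temporary directory, prints what os.walk() yields for each directory, and then runs the same kind of comprehension. The FILES list and the use of tempfile are assumptions made for illustration. The result is sorted because os.walk() gives no ordering guarantee, and paths are made relative to the temporary root, so they lack the './' prefix:

```python
import os
import tempfile

# Hypothetical layout mirroring the 'mydir' tree shown above
FILES = [
    'a2.txt', 'a.csv', 'a.txt',
    'bdir/b2.txt', 'bdir/b.csv', 'bdir/b.txt',
    'bdir/cdir/c2.txt', 'bdir/cdir/c.csv', 'bdir/cdir/c.txt',
    'ddir/d2.txt', 'ddir/d.csv', 'ddir/d.txt',
]

with tempfile.TemporaryDirectory() as root:
    # Create the subdirectories and empty files
    for rel in FILES:
        full = os.path.join(root, rel)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, 'w').close()

    # Each item from os.walk() is a (dirpath, dirnames, filenames) tuple
    for dirpath, dirnames, filenames in os.walk(root):
        print(os.path.relpath(dirpath, root), sorted(dirnames), sorted(filenames))

    # The comprehension from the article, with paths relative to root
    txt_files = sorted([
        os.path.relpath(os.path.join(path, name), root)
        for path, dirs, files in os.walk(root)
        for name in files
        if name.endswith('.txt')
    ])

print(txt_files)
```

On Linux, txt_files should hold the eight .txt paths, starting with 'a.txt' and 'a2.txt'.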

I hope this article gave you some insight into the power of list comprehensions and how they can be used in Python for data extraction, filtering and manipulation. Now go ahead and give it a try yourself!


Article by Bengt Ljungquist
