If you type ls *.c at the prompt of a Linux Shell, it interprets the command and prints every file in the current directory whose name ends with the .c extension. That behavior can be intimidating if you don't know what is being triggered in the background, so here's a guided tour behind the scenes.
The Shell is the outermost layer that wraps and protects the Kernel.
The Kernel is the software that interacts directly with the hardware. So we're getting very deep into the structure that allows us to interact with computers.
Back in 1971, Ken Thompson wrote the first Unix Shell. The Bourne Shell appeared in 1979, and later development produced Bash, whose name stands for Bourne-Again Shell and which was first released in 1989. There's a lot of work behind that interface to perform the tasks we're going to talk about.
As with almost any program, the Shell goes through three main stages:
- Initialize: When the program starts, it reads the configuration files that determine the behavior of the Shell. One of these files is /etc/profile, which holds system-wide settings and variables such as PATH.
- Interpret: The program reads the commands and breaks them down to determine which actions to take. We explain the steps involved below.
- Terminate: When the program receives the instruction to finish, it ends the process and frees the memory and other resources it was using.
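The three stages can be sketched as a toy model in Python. This is only an illustration, not how any real Shell is implemented: the class and method names are hypothetical, and the configuration is assumed to be plain NAME=value lines, whereas a real /etc/profile is itself a Shell script.

```python
import io


class MiniShell:
    """Toy model of the three stages; every name here is hypothetical."""

    def __init__(self):
        self.variables = {}
        self.history = []
        self.running = False

    def initialize(self, config_text):
        # Stage 1: read configuration. Simplifying assumption: the config is
        # plain NAME=value lines (a real /etc/profile is a shell script).
        for line in io.StringIO(config_text):
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            name, _, value = line.partition("=")
            self.variables[name] = value
        self.running = True

    def interpret(self, command_line):
        # Stage 2: this sketch only records the command; a real shell would
        # tokenize, expand, and execute it.
        if command_line.strip() == "exit":
            self.running = False
        else:
            self.history.append(command_line)

    def terminate(self):
        # Stage 3: release whatever was acquired while running.
        self.variables.clear()
        self.history.clear()


shell = MiniShell()
shell.initialize("# sample config\nPATH=/usr/bin:/bin\n")
path_after_init = shell.variables["PATH"]
for line in ["ls *.c", "exit", "never reached"]:
    if not shell.running:
        break
    shell.interpret(line)
commands_seen = list(shell.history)
shell.terminate()
```

The third line is never interpreted because the exit instruction has already switched the toy Shell off, which mirrors the Terminate stage.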
The Shell can receive instructions from the command line or from a file. Of course, the file must be written so that the Shell reads its content as instructions rather than as ordinary text. When the instructions come from the command line, the Shell is running in interactive mode; when they are taken from a file, it is running in non-interactive mode.
Understanding this difference is important because the gears involved follow a distinct pattern in each case. Interactive mode works as an infinite loop that won't end until the Shell is instructed to stop, whereas non-interactive mode stops once the Shell reaches the end of the file.
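A small Python sketch can make the difference between the two modes concrete. It is a simplification under stated assumptions: run_shell is a hypothetical helper, the interactive flag is passed in by hand, and "executing" a command just means recording it.

```python
import io


def run_shell(stream, interactive, output):
    """Read commands from `stream` until 'exit' or end of input.

    Hypothetical helper: `interactive` only controls whether a prompt is
    written before each read.
    """
    executed = []
    while True:
        if interactive:
            output.write("$ ")      # the prompt appears only in interactive mode
        line = stream.readline()
        if line == "":              # end of input: the file is finished
            break
        command = line.strip()
        if not command:
            continue
        if command == "exit":
            break
        executed.append(command)    # stand-in for actually running the command
    return executed


# Non-interactive: commands come from a "file" and the loop ends with it.
script = io.StringIO("ls *.c\npwd\n")
script_out = io.StringIO()
script_commands = run_shell(script, interactive=False, output=script_out)

# Interactive (simulated): a prompt precedes every read until 'exit' arrives.
typed = io.StringIO("ls *.c\nexit\npwd\n")
typed_out = io.StringIO()
typed_commands = run_shell(typed, interactive=True, output=typed_out)
```

In a real program the flag could be derived from sys.stdin.isatty(), which reports whether standard input is attached to a terminal.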
The single line ls *.c is read by the interpreter and broken into smaller parts called tokens, which determine the next steps. The space is used to divide the line into words, and the Shell then works through the following steps:
- Read: The Shell receives the whole instruction from the user via standard input (the keyboard) and loads it into memory.
- Break: The Shell uses a list of predefined characters to split the instruction into words, or tokens. The first word should be a command that the Shell knows about, and the following tokens act as its arguments, which can in turn contain more commands. In our case ls is the command and *.c is the argument.
- Expand: Some characters have a special meaning for the Shell. The star * is one of them: it is a wildcard that matches any sequence of characters. The Shell replaces *.c with the names of all files in the current directory that end with .c, and it does so before ls ever runs. The star can appear anywhere in the pattern and more than once, so image.* finds all files named "image" with any extension after the dot, and image*.jpg finds any file that starts with image, continues with any characters, and ends with .jpg. Very useful.
- Search for alias: Commands that are used very often commonly receive an alias, so they can be called by a short nickname instead of their full text. Suppose you use grep -c 'user' login.txt to count the matching lines in a file and you assign the whole command to the alias lfc. The Shell looks through its list of stored aliases and tries to match the command token against them before continuing. As the example shows, the text of the alias then has to be broken into tokens again.
- Search for built-in: Some commands are built-ins, which means they are implemented inside the Shell itself and run without launching a separate program. If a built-in and a program on disk share a name, the built-in wins. The command token is compared against the list of built-ins to look for a match.
- Search in PATH: Lastly, the Shell looks for the command in the directories listed in the PATH variable (a list of places in the file system). If there is no match at this point, the Shell reports an error saying that it doesn't know the given command. In our case the program we are looking for is the file /bin/ls, and the /bin directory is listed in the PATH. Once the file is found, the Shell asks the Kernel to run it with the expanded arguments.
- Prompt: When the command finishes, whether by error or by normal termination, the prompt is printed on the screen again to show that the Shell is ready to receive a new command.
- Command: The loop starts all over again.
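The Break, Expand, alias, built-in and PATH steps above can be sketched in Python. This is a rough illustration under stated assumptions rather than the way Bash is actually written: all function names are hypothetical, the directory listing and the set of executables are passed in as plain data, and the list of built-ins is a small sample.

```python
import fnmatch
import shlex

ALIASES = {"lfc": "grep -c 'user' login.txt"}   # the alias from the example above
BUILTINS = {"cd", "exit", "alias"}              # small illustrative subset


def tokenize(line):
    # Break: split on whitespace while honoring quotes.
    return shlex.split(line)


def expand(tokens, filenames):
    # Expand: replace each wildcard token with the sorted names it matches.
    # `filenames` stands in for the current directory listing. Unlike a real
    # shell, this sketch does not hide names that start with a dot.
    result = []
    for token in tokens:
        if any(ch in token for ch in "*?["):
            matches = sorted(n for n in filenames if fnmatch.fnmatchcase(n, token))
            result.extend(matches or [token])   # no match: keep the pattern as typed
        else:
            result.append(token)
    return result


def resolve_alias(tokens):
    # Search for alias: swap the first word for its stored text, re-tokenized.
    if tokens and tokens[0] in ALIASES:
        return tokenize(ALIASES[tokens[0]]) + tokens[1:]
    return tokens


def locate(command, path_dirs, is_executable):
    # Search for built-in first, then walk the PATH directories in order.
    if command in BUILTINS:
        return "builtin"
    for directory in path_dirs:
        candidate = directory.rstrip("/") + "/" + command
        if is_executable(candidate):
            return candidate
    return None   # a real shell would report that the command was not found


files = ["main.c", "util.c", "README.md", "image1.jpg"]
tokens = resolve_alias(expand(tokenize("ls *.c"), files))
fake_fs = {"/bin/ls"}   # pretend this is the only executable on disk
found = locate(tokens[0], ["/usr/local/bin", "/bin"], fake_fs.__contains__)
```

Passing the file names and the executable check as arguments keeps the sketch independent of the machine it runs on. Against a real file system, is_executable could be a function that combines os.path.isfile with os.access(path, os.X_OK).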
As we have seen, the Shell works as an interpreter between the user and the Kernel, which in turn talks to the hardware. It is a very sophisticated chain of steps, a process of multiple translations, that allows humans to store and manipulate lots of information stored as bytes, ones and zeros, without even knowing how binary works.
Here are some links to deepen your understanding of this subject: