Talend Open Studio as Linux cron scheduler & command library
Those of you familiar with Linux shell scripting and cron scheduling will know all too well that using those commands, and the many variations of them, can be a daunting task. Most Linux administrators keep their commands in a library, catalog or notepad tucked away somewhere, or as a shell script (.sh) file stored in a directory, only to be forgotten over time. I have also found that memory fades, and one ends up 'Googling' the Linux command and its syntax again when it is needed.
There has to be an easier way, right? Well, there is! Imagine being able to store those commands and their logic in a database, together with an easy-to-use user interface (UI) to control specific tasks. For example, it is much easier to enter a schedule date and time via a UI than to use the crontab -e command or edit files within the /etc/cron.d directory.
During Melbourne and Sydney's world-famous lockdowns the internet became a necessity, alongside food and water. Now imagine all the kids doing home schooling, video conferencing, etc., with every member of the household scrambling for bandwidth at the same time. I thought, let's download my favorite YouTube videos during off-peak times so they can be watched offline and the precious internet is left available to the rest of the family.
The youtube-dl package is a Linux command-line tool that enables YouTube video downloads. The following /etc/cron.d entry, for example, is used to incrementally download the latest videos from 7 News.
25 * * * * muser youtube-dl --playlist-end 10 --download-archive "/home/muser/youtube-dl/archives/7-News.txt" -i --dateafter now-1days -o "/home/muser/Backup/Videos/7-News/(filedate) \%(title)s.\%(ext)s" https://www.youtube.com/c/7news/videos > /home/muser/youtube-dl/logs/7-News.log
The following cron entry ensures that we remove the older videos, e.g. we don't want to overload a mobile device's storage with too many watched videos. We also want to limit the number of videos downloaded at any given time, which is what the --playlist-end 10 option in the download entry takes care of.
25 * * * * muser find /home/muser/Backup/Videos/7-News* -mtime +1 -exec rm {} \;
As we can see, these commands are quite complex, and several changes are needed whenever we want to download a different channel, keep more of the older videos, or change the schedule. The good news is that all of this can be automated very easily by maintaining metadata in a database and dynamically generating the cron files.
The following UI shows a list of channels to be downloaded and some parameters associated with each download. A different Linux command and cron entry will be created based on these parameters. The UI has been developed using Joget.
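To illustrate how a few stored parameters could drive the cleanup entry shown above, here is a minimal Java sketch. The class name, the method name and the parameter names (schedule, user, dirName, keepDays) are hypothetical and are not taken from the actual database schema; only the shape of the find command comes from the example entry.

```java
class CleanupEntryBuilder {

    // Builds an /etc/cron.d style line that deletes files older than keepDays days.
    // All parameter names are illustrative assumptions, not the real metadata columns.
    static String build(String schedule, String user, String dirName, int keepDays) {
        return schedule + " " + user
                + " find /home/" + user + "/Backup/Videos/" + dirName + "*"
                + " -mtime +" + keepDays
                + " -exec rm {} \\;";
    }

    public static void main(String[] args) {
        // Reproduces the 7 News cleanup entry from the example above.
        System.out.println(build("25 * * * *", "muser", "7-News", 1));
    }
}
```

Keeping more of the older videos then becomes a matter of changing one number in the database rather than editing the cron file by hand.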
The parameters are listed below and explained as follows:
Talend Open Studio has been used to implement the end-to-end creation of the Linux cron table.
The Talend tSSH component allows us to access Linux via SSH. Virtually any Linux shell command can be executed using this component. The 'Reset cron' tSSH component in the workflow resets the cron file so we can recreate all entries according to the metadata in the database.
The next step is to read all the database records representing all the YouTube channels. We then want to apply some formatting rules, or transformation logic. The Talend tMap component is very useful for this purpose:
First, we want to replace the spaces within the YouTube friendly name with '-' so directory names are easier to navigate. We also want to add escape characters to the '*' so cron can successfully execute the Linux command once implemented. The youtube-dl command will also differ if we decide to download audio instead of video.
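The two string transformations described above could be expressed in plain Java roughly as follows. This is only a sketch of the kind of expression one might place in a tMap output column; the class and method names are hypothetical and it is not the actual tMap configuration.

```java
class ChannelTransform {

    // Replaces each space in the friendly name with '-' so the result
    // can be used as a directory name, e.g. "7 News" becomes "7-News".
    static String toDirectoryName(String friendlyName) {
        return friendlyName.trim().replace(' ', '-');
    }

    // Puts a backslash in front of every '*' in the given text.
    static String escapeAsterisks(String text) {
        return text.replace("*", "\\*");
    }

    public static void main(String[] args) {
        System.out.println(toDirectoryName("7 News"));      // prints 7-News
        System.out.println(escapeAsterisks("25 * * * *"));  // prints 25 \* \* \* \*
    }
}
```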
Once the transformations have been done we can create the youtube-dl cron entry. The Talend tJava component allows us to write Java code for this purpose:
The last step in the process is to create the Linux cron file as seen below, once again implemented via the Talend tSSH component:
As with most projects I've worked on in my career, there are always exceptions one needs to cater for. The following tSSH component illustrates the use of the Linux 'sed' command to add escape characters. Any % needs to be replaced with \% for a Linux command to be executed as part of a cron entry, because cron otherwise treats an unescaped % in the command as a newline.
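As a rough idea of what such tJava logic might look like, the sketch below assembles the download entry from a handful of parameters. It is not the author's actual code: the class, method and parameter names are hypothetical, the directory layout and the output template are copied as-is from the 7 News example entry, and using youtube-dl's -x (extract audio) option for the audio case is an assumption.

```java
class DownloadEntryBuilder {

    // Assembles one /etc/cron.d line for a channel.
    // Parameter names are illustrative; the real job reads them from the database.
    static String build(String schedule, String user, String dirName,
                        String channelUrl, int maxVideos, boolean audioOnly) {
        String home = "/home/" + user;
        StringBuilder entry = new StringBuilder();
        entry.append(schedule).append(' ').append(user).append(" youtube-dl");
        entry.append(" --playlist-end ").append(maxVideos);
        entry.append(" --download-archive \"").append(home)
             .append("/youtube-dl/archives/").append(dirName).append(".txt\"");
        entry.append(" -i --dateafter now-1days");
        if (audioOnly) {
            entry.append(" -x"); // assumption: audio downloads use youtube-dl's extract-audio flag
        }
        // Output template as in the example entry; % is already escaped as \% for cron.
        entry.append(" -o \"").append(home).append("/Backup/Videos/").append(dirName)
             .append("/(filedate) \\%(title)s.\\%(ext)s\"");
        entry.append(' ').append(channelUrl);
        entry.append(" > ").append(home).append("/youtube-dl/logs/").append(dirName).append(".log");
        return entry.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("25 * * * *", "muser", "7-News",
                "https://www.youtube.com/c/7news/videos", 10, false));
    }
}
```

With the values used in main, the result matches the 7 News download entry shown earlier.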
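The content of the cron file itself can be thought of as the generated entries joined one per line. The sketch below shows that assembly step in Java under that assumption; the class and method names are hypothetical, and how the text is actually written to /etc/cron.d over SSH is left to the tSSH component. Each entry is terminated with a newline, since cron expects the last line of the file to end with one.

```java
import java.util.Arrays;
import java.util.List;

class CronFileRenderer {

    // Joins the generated entries into the text of a cron file,
    // one entry per line, each terminated by a newline.
    static String render(List<String> entries) {
        StringBuilder content = new StringBuilder();
        for (String entry : entries) {
            content.append(entry.trim()).append('\n');
        }
        return content.toString();
    }

    public static void main(String[] args) {
        List<String> entries = Arrays.asList(
                "25 * * * * muser echo download",  // placeholder commands for illustration
                "25 * * * * muser echo cleanup");
        System.out.print(render(entries));
    }
}
```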
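The same % to \% replacement can also be done in Java before the entry ever reaches the server. The sketch below is an alternative to the sed approach, not a reproduction of it; the class and method names are hypothetical. It uses a negative lookbehind so that a % which is already escaped is left alone.

```java
class CronPercentEscaper {

    // Replaces every '%' that is not already preceded by a backslash with "\%".
    static String escape(String command) {
        // Regex (?<!\\)% : a '%' not preceded by a literal backslash.
        // Replacement \\% : a literal backslash followed by '%'.
        return command.replaceAll("(?<!\\\\)%", "\\\\%");
    }

    public static void main(String[] args) {
        System.out.println(escape("%(title)s.%(ext)s"));    // prints \%(title)s.\%(ext)s
        System.out.println(escape("\\%(title)s.%(ext)s"));  // prints \%(title)s.\%(ext)s
    }
}
```

Because already escaped characters are skipped, running the method twice on the same command gives the same result as running it once.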
Talend Open Studio can be used for any similar use case where a Linux command catalog and scheduling are required. Let me know if you have such a requirement; we are more than happy to help make it happen.