Introduction to Stream Editor (sed)
Following up on my previous sed story, I wrote this article to help you get started with sed and enjoy its speed.
Installation
On Unix-like operating systems (macOS, Linux, Unix), sed is already pre-installed. Windows users have four options: Windows Subsystem for Linux (WSL), a virtual machine such as VirtualBox, Git-Bash, or Cygwin. For simplicity, I would recommend Cygwin if you only want to run Unix tools in the Windows cmd terminal, or Git-Bash if you also need Git for version control.
Syntax
sed [OPTIONS] [COMMAND] [INPUT FILE]
[INPUT FILE] is the text file that we want to edit; it can be CSV, TXT, TSV, HTML, or any other text-based file. By default the output is printed on screen. If you want to save the output to another file, add > [OUTPUT FILE] at the end:
sed [OPTIONS] [COMMAND] [INPUT FILE] > [OUTPUT FILE]
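If you want to modify the existing INPUT FILE, you can save the changes back into it by adding the -i (in-place) option. Note that the sed built into macOS expects a backup suffix after -i (the suffix may be empty: -i '').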
sed -i [COMMAND] [INPUT FILE]
You can find very comprehensive documentation for sed elsewhere; here I just want to share my favorite command:
sed [OPTIONS] s/OLD/NEW/g [INPUT FILE]
"s" is the substitute command: it finds the OLD pattern/string and replaces it with NEW. "g" (global) is an optional flag that replaces every occurrence of OLD on each line; without g, only the first occurrence of OLD on each line is replaced by NEW. For advanced searching and pattern matching you can use regular expressions.
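To make the regular expression remark concrete, here is a minimal sketch. It assumes a Unix-like shell (Git-Bash, Cygwin, WSL, Linux or macOS), where single quotes can be used; the input line is made up for illustration.

```shell
# Hypothetical input line, not taken from SampleData.csv.
line='Record12,Data12'

# [0-9]* matches a run of digits and $ anchors the match to the end of the line,
# so "Data" followed by any number is replaced as a whole.
result=$(printf '%s\n' "$line" | sed 's/Data[0-9]*$/Value/')

printf '%s\n' "$result"   # Record12,Value
```

In Windows cmd the same command would be written with double quotes around the s/.../.../ part.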
Examples
As an example (in Windows cmd), we use SampleData.csv, whose content is printed to the screen with the Windows cmd "type" command:
C:\Users\setya\Documents\Demo>type SampleData.csv
RecordNo,Data
Record1,Data1
Record2,Data2
Record3,Data3
Record4,Data4
Record5,Data5
With the g (global) flag, every occurrence of the pattern on every line is replaced:
C:\Users\setya\Documents\Demo> sed "s/Data/NewData/g" SampleData.csv
RecordNo,NewData
Record1,NewData1
Record2,NewData2
Record3,NewData3
Record4,NewData4
Record5,NewData5
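Without the g flag, only the first occurrence on each line is replaced. Because every line of SampleData.csv contains Data only once, the output here is the same as above:
C:\Users\setya\Documents\Demo> sed "s/Data/NewData/" SampleData.csv
RecordNo,NewData
Record1,NewData1
Record2,NewData2
Record3,NewData3
Record4,NewData4
Record5,NewData5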
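The effect of the g flag only becomes visible on a line where the pattern occurs more than once. Here is a small sketch, assuming a Unix-like shell (Git-Bash, Cygwin, WSL, Linux or macOS) and a made-up input line in which Data appears twice:

```shell
# Hypothetical one-line input in which "Data" appears twice.
line='Data1,Data2'

# Without g: only the first match on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/Data/NewData/')

# With g: every match on the line is replaced.
all_matches=$(printf '%s\n' "$line" | sed 's/Data/NewData/g')

printf '%s\n' "$first_only"    # NewData1,Data2
printf '%s\n' "$all_matches"   # NewData1,NewData2
```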
Please remember that on Windows the whole command must be placed inside double quotes (" "); on Linux/macOS single quotes can be used as well. So far the output has only been printed on screen without being saved anywhere. Saving can be done as follows:
C:\Users\setya\Documents\Demo> sed "s/Data/NewData/g" SampleData.csv > NewFile.csv
C:\Users\setya\Documents\Demo>type NewFile.csv
RecordNo,NewData
Record1,NewData1
Record2,NewData2
Record3,NewData3
Record4,NewData4
Record5,NewData5
C:\Users\setya\Documents\Demo>type SampleData.csv
RecordNo,Data
Record1,Data1
Record2,Data2
Record3,Data3
Record4,Data4
Record5,Data5
In the example above we save the changes into NewFile.csv, where we can see the modified content. After execution, we can observe that the original SampleData.csv was not changed.
C:\Users\setya\Documents\Demo> sed -i "s/Data/NewData/g" SampleData.csv
C:\Users\setya\Documents\Demo>type SampleData.csv
RecordNo,NewData
Record1,NewData1
Record2,NewData2
Record3,NewData3
Record4,NewData4
Record5,NewData5
In the example above we use the -i option: instead of printing the output on screen, sed saves the changes by overwriting the original file, and we can observe the changes in SampleData.csv as shown by the last command.
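Since -i overwrites the original, it can be worth keeping a backup. The sketch below assumes a Unix-like shell and a sed that accepts a suffix attached directly to -i (GNU sed and the macOS sed both do); the scratch directory and the file name SampleCopy.csv are made up for the demonstration.

```shell
# Scratch directory so that no real file is touched.
workdir=$(mktemp -d)
copy="$workdir/SampleCopy.csv"   # hypothetical file name

# Small sample file in the same layout as SampleData.csv.
printf 'RecordNo,Data\nRecord1,Data1\n' > "$copy"

# -i.bak edits the file in place and keeps the original as SampleCopy.csv.bak.
sed -i.bak 's/Data/NewData/g' "$copy"

cat "$copy"       # edited content
cat "$copy.bak"   # untouched original
```

If the result is not what you expected, the .bak file can simply be copied back over the edited one.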
Specific Occurrence
By default sed processes the INPUT FILE line by line, so we can specify which lines the COMMAND applies to by providing (1) a line number or (2) a matching pattern before the COMMAND.
C:\Users\setya\Documents\Demo>type SampleData.csv
RecordNo,NewData
Record1,NewData1
Record2,NewData2
Record3,NewData3
Record4,NewData4
Record5,NewData5
C:\Users\setya\Documents\Demo>sed "s/NewData/Replaced/g" SampleData.csv
RecordNo,Replaced
Record1,Replaced1
Record2,Replaced2
Record3,Replaced3
Record4,Replaced4
Record5,Replaced5
In the example above, sed replaces every NewData with Replaced in the file from the previous example.
C:\Users\setya\Documents\Demo>sed "2 s/NewData/Replaced/g" SampleData.csv
RecordNo,NewData
Record1,Replaced1
Record2,NewData2
Record3,NewData3
Record4,NewData4
Record5,NewData5
In the example above we put 2 before the COMMAND, so it is executed only on that line number (the second line).
C:\Users\setya\Documents\Demo>sed "/Record2/ s/NewData/Replaced/g" SampleData.csv
RecordNo,NewData
Record1,NewData1
Record2,Replaced2
Record3,NewData3
Record4,NewData4
Record5,NewData5
In the example above we put /Record2/ (a search pattern inside / /) before the COMMAND, so sed executes the COMMAND only on lines where the pattern Record2 is found (line 3).
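Before substituting, it can help to check which lines a pattern actually selects. A minimal sketch, assuming a Unix-like shell and a shortened, made-up copy of the sample data; it uses sed's -n option together with the p (print) command:

```shell
# Hypothetical three-line sample in the same layout as SampleData.csv.
sample=$(printf 'RecordNo,NewData\nRecord1,NewData1\nRecord2,NewData2')

# -n suppresses the default output; p prints only the lines selected by /Record2/.
selected=$(printf '%s\n' "$sample" | sed -n '/Record2/p')

printf '%s\n' "$selected"   # Record2,NewData2
```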
We can also apply the command to a range of lines:
C:\Users\setya\Documents\Demo>sed "2,4 {s/NewData/Replaced/g}" SampleData.csv
RecordNo,NewData
Record1,Replaced1
Record2,Replaced2
Record3,Replaced3
Record4,NewData4
Record5,NewData5
Adding 2,4 before the command means it is executed from line 2 through line 4. The curly braces group commands; with a single command they are optional.
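For CSV files, a range is handy for skipping the header line. In a sed address, $ stands for the last line, so 2,$ means "line 2 to the end". A small sketch, assuming a Unix-like shell (single quotes keep the shell from touching the $) and a shortened, made-up sample:

```shell
# Hypothetical three-line sample in the same layout as SampleData.csv.
sample=$(printf 'RecordNo,NewData\nRecord1,NewData1\nRecord2,NewData2')

# 2,$ selects line 2 through the last line, so the header on line 1 is left alone.
result=$(printf '%s\n' "$sample" | sed '2,$ s/NewData/Replaced/g')

printf '%s\n' "$result"
# RecordNo,NewData
# Record1,Replaced1
# Record2,Replaced2
```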
C:\Users\setya\Documents\Demo>sed "/[24]/ {s/NewData/Replaced/g}" SampleData.csv
RecordNo,NewData
Record1,NewData1
Record2,Replaced2
Record3,NewData3
Record4,Replaced4
Record5,NewData5
Here /[24]/ is a regular expression (inside / /): [24] means the character 2 or the character 4, so the command is executed on every line that contains a 2 or a 4.
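A bracket expression matches any single character listed inside it, so a comma placed between the brackets is itself one of the candidates. In a CSV file that makes a big difference, as this sketch shows (assuming a Unix-like shell and a shortened, made-up sample):

```shell
# Hypothetical four-line sample in the same layout as SampleData.csv.
sample=$(printf 'RecordNo,NewData\nRecord1,NewData1\nRecord2,NewData2\nRecord4,NewData4')

# [24] selects only lines containing the character 2 or the character 4.
with_24=$(printf '%s\n' "$sample" | sed '/[24]/ s/NewData/Replaced/g')

# [2,4] also matches a comma, so every line of the CSV is selected.
with_comma=$(printf '%s\n' "$sample" | sed '/[2,4]/ s/NewData/Replaced/g')

printf '%s\n' "$with_24"
printf '%s\n' "$with_comma"
```

In the first result only the Record2 and Record4 lines change; in the second, all four lines change, including the header.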
Multiple File
We can execute a sed command on multiple INPUT FILEs using a glob (wildcard) pattern:
C:\Users\setya\Documents\Demo>dir/w File*.csv
 Volume in drive C is Windows
 Volume Serial Number is 9CB9-9E65
 Directory of C:\Users\setya\Documents\Demo
File1.csv     File10.csv    File100.csv   File11.csv    File12.csv    File13.csv
File14.csv    File15.csv    File16.csv    File17.csv    File18.csv    File19.csv
File2.csv     File20.csv    File21.csv    File22.csv    File23.csv    File24.csv
File25.csv    File26.csv    File27.csv    File28.csv    File29.csv    File3.csv
File30.csv    File31.csv    File32.csv    File33.csv    File34.csv    File35.csv
File36.csv    File37.csv    File38.csv    File39.csv    File4.csv     File40.csv
File41.csv    File42.csv    File43.csv    File44.csv    File45.csv    File46.csv
File47.csv    File48.csv    File49.csv    File5.csv     File50.csv    File51.csv
File52.csv    File53.csv    File54.csv    File55.csv    File56.csv    File57.csv
File58.csv    File59.csv    File6.csv     File60.csv    File61.csv    File62.csv
File63.csv    File64.csv    File65.csv    File66.csv    File67.csv    File68.csv
File69.csv    File7.csv     File70.csv    File71.csv    File72.csv    File73.csv
File74.csv    File75.csv    File76.csv    File77.csv    File78.csv    File79.csv
File8.csv     File80.csv    File81.csv    File82.csv    File83.csv    File84.csv
File85.csv    File86.csv    File87.csv    File88.csv    File89.csv    File9.csv
File90.csv    File91.csv    File92.csv    File93.csv    File94.csv    File95.csv
File96.csv    File97.csv    File98.csv    File99.csv
             100 File(s)         10,200 bytes
               0 Dir(s)  177,604,562,944 bytes free
C:\Users\setya\Documents\Demo>sed "s/NewData/Replaced/g" File*.csv
In the example above the sed command is executed on all files that match the glob pattern File*.csv (File1.csv - File100.csv).
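Without -i, the result for all matching files is printed to the screen one after another. To change each file itself, the glob can be combined with the in-place option. The sketch below assumes a Unix-like shell (where the shell expands File*.csv) and a sed that accepts -i.bak; the scratch directory and its three small files are made up for the demonstration.

```shell
# Scratch directory with three hypothetical files: File1.csv, File2.csv, File3.csv.
workdir=$(mktemp -d)
for n in 1 2 3; do
    printf 'RecordNo,NewData\nRecord%s,NewData%s\n' "$n" "$n" > "$workdir/File$n.csv"
done

# With -i, sed edits every matching file separately; .bak keeps a copy of each original.
sed -i.bak 's/NewData/Replaced/g' "$workdir"/File*.csv

cat "$workdir/File2.csv"
# RecordNo,Replaced
# Record2,Replaced2
```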
For further exploration you can refer to :