Substitution Ciphers
Ahlem Marzouk
Big Data Developer | Data Consultant || Certified PL-300: Power BI: Data Analyst Associate
To encrypt a text, we can use a substitution cipher, which replaces each character with another one. The technique relies on a map that relates each letter to a specific code; applying the map to a text yields the encoded text.
There are many ways to perform this encryption. In what follows I detail algorithms for three methods: the reverse alphabet, the Caesar cipher, and Bacon's code.
Each method follows three steps. First, we create a dictionary that maps each letter of the alphabet to its code (or value) using one of the three methods listed above.
Second, we write an algorithm that encrypts a text with that dictionary. Third, we scrape a text from a web site and test the code on it.
1. Reverse Alphabet:
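The original code for this step is not shown here, so the following is only a minimal sketch of how such a dictionary could be built. The helper name `build_reverse_alphabet` is an assumption; `alphabet_reverse` is the name used later in the article. Each letter is paired with its mirror image in the alphabet, and the space maps to itself.

```python
import string

def build_reverse_alphabet():
    """Map each lowercase letter to its mirror image (a <-> z, b <-> y, ...)."""
    letters = string.ascii_lowercase
    mapping = {" ": " "}  # spaces are kept unchanged
    mapping.update(zip(letters, reversed(letters)))
    return mapping

alphabet_reverse = build_reverse_alphabet()
print(alphabet_reverse)
```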
Output:
{' ': ' ',
'a': 'z',
'b': 'y',
'c': 'x',
'd': 'w',
'e': 'v',
'f': 'u',
'g': 't',
'h': 's',
'i': 'r',
'j': 'q',
'k': 'p',
'l': 'o',
'm': 'n',
'n': 'm',
'o': 'l',
'p': 'k',
'q': 'j',
'r': 'i',
's': 'h',
't': 'g',
'u': 'f',
'v': 'e',
'w': 'd',
'x': 'c',
'y': 'b',
'z': 'a'}
To apply this dictionary to a text, we first remove the trailing whitespace and convert the text to lowercase with "rstrip().lower()". We also need a regular expression (regex), that is, a sequence of characters that specifies a search pattern in the text.
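A possible sketch of this step follows. The function `encrypt_plaintext` and its two-argument signature are assumptions, not the article's original code. The regex `[a-z]+` is also an assumption, chosen because it is consistent with the sample outputs in this article, where digits and punctuation disappear and words are separated by single spaces.

```python
import re
import string

# Reverse-alphabet dictionary, rebuilt here so the example is self-contained.
letters = string.ascii_lowercase
alphabet_reverse = {" ": " ", **dict(zip(letters, reversed(letters)))}

def encrypt_plaintext(text, mapping):
    """Encrypt text with a substitution dictionary (hypothetical helper)."""
    cleaned = text.rstrip().lower()
    # Keep only runs of letters; digits and punctuation act as separators.
    words = re.findall(r"[a-z]+", cleaned)
    normalized = " ".join(words)
    return "".join(mapping[ch] for ch in normalized)

print(encrypt_plaintext("Ahlem  ", alphabet_reverse))  # zsovn
```

Because the reverse alphabet is its own inverse, calling the same function on the encrypted text recovers the normalized plaintext.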
Output:
'zsovn'
Output:
'Natural language processing (NLP) is a subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to process and analyze large amounts of natural language data. The goal is a computer capable of "understanding" the contents of documents, including the contextual nuances of the language within them. The technology can then accurately extract information and insights contained in the documents as well as categorize and organize the documents themselves.\nChallenges in natural language processing frequently involve speech recognition, natural-language understanding, and natural-language generation.\nNatural language processing has its roots in the 1950s. Already in 1950, Alan Turing published an article titled "Computing Machinery and Intelligence" which proposed what is now called the Turing test as a criterion of intelligence, though at the time that was not articulated as a problem separate from artificial intelligence. The proposed test includes a task that involves the automated interpretation and generation of natural language.\nThe premise of symbolic NLP is well-summarized by John Searle\'s Chinese room experiment: Given a collection of rules (e.g., a Chinese phrasebook, with questions and matching answers), the computer emulates natural language understanding (or other NLP tasks) by applying those rules to the data it confronts.\nUp to the 1980s, most natural language processing systems were based on complex sets of hand-written rules. Starting in the late 1980s, however, there was a revolution in natural language processing with the introduction of machine learning algorithms for language processing. This was due to both the steady increase in computational power (see Moore\'s law) and the gradual lessening of the dominance of Chomskyan theories of linguistics (e.g. 
transformational grammar), whose theoretical underpinnings discouraged the sort of corpus linguistics that underlies the machine-learning approach to language processing.[6]\nIn the 2010s, representation learning and deep neural network-style machine learning methods became widespread in natural language processing. That popularity was due partly to a flurry of results showing that such techniques[7][8] can achieve state-of-the-art results in many natural language tasks, e.g., in language modeling[9] and parsing.[10][11] This is increasingly important in medicine and healthcare, where NLP helps analyze notes and text in electronic health records that would otherwise be inaccessible for study when seeking to improve care.[12]\nIn the early days, many language-processing systems were designed by symbolic methods, i.e., the hand-coding of a set of rules, coupled with a dictionary lookup:[13][14] such as by writing grammars or devising heuristic rules for stemming....
The text above is the one we will use; we now encrypt it with our code.
encryptPlaintext(text,alphabet_reverse,_letters)
Output:
'mzgfizo ozmtfztv kilxvhhrmt mok rh z hfyurvow lu ormtfrhgrxh xlnkfgvi hxrvmxv zmw zigrurxrzo rmgvoortvmxv xlmxvimvw drgs gsv rmgvizxgrlmh yvgdvvm xlnkfgvih zmw sfnzm ozmtfztv rm kzigrxfozi sld gl kiltizn xlnkfgvih gl kilxvhh zmw zmzobav ozitv znlfmgh lu mzgfizo ozmtfztv wzgz gsv tlzo rh z xlnkfgvi xzkzyov lu fmwvihgzmwrmt gsv xlmgvmgh lu wlxfnvmgh rmxofwrmt gsv xlmgvcgfzo mfzmxvh lu gsv ozmtfztv drgsrm gsvn gsv gvxsmloltb xzm gsvm zxxfizgvob vcgizxg rmulinzgrlm zmw rmhrtsgh xlmgzrmvw rm gsv wlxfnvmgh zh dvoo zh xzgvtlirav zmw litzmrav gsv wlxfnvmgh gsvnhvoevh xszoovmtvh rm mzgfizo ozmtfztv kilxvhhrmt uivjfvmgob rmeloev hkvvxs ivxltmrgrlm mzgfizo ozmtfztv fmwvihgzmwrmt zmw mzgfizo ozmtfztv tvmvizgrlm mzgfizo ozmtfztv kilxvhhrmt szh rgh illgh rm gsv h zoivzwb rm zozm gfirmt kfyorhsvw zm zigrxov grgovw xlnkfgrmt nzxsrmvib zmw rmgvoortvmxv dsrxs kilklhvw dszg rh mld xzoovw gsv gfirmt gvhg zh z xirgvirlm lu rmgvoortvmxv gslfts zg gsv grnv gszg dzh mlg zigrxfozgvw zh z kilyovn hvkzizgv uiln zigrurxrzo rmgvoortvmxv gsv kilklhvw gvhg rmxofwvh z gzhp gszg rmeloevh gsv zfglnzgvw rmgvikivgzgrlm zmw tvmvizgrlm lu mzgfizo ozmtfztv gsv kivnrhv lu hbnylorx mok rh dvoo hfnnziravw yb qlsm hvziov h xsrmvhv illn vckvirnvmg trevm z xloovxgrlm lu ifovh v t z xsrmvhv ksizhvyllp drgs jfvhgrlmh zmw nzgxsrmt zmhdvih gsv xlnkfgvi vnfozgvh mzgfizo ozmtfztv fmwvihgzmwrmt li lgsvi mok gzhph yb zkkobrmt gslhv ifovh gl gsv wzgz rg xlmuilmgh fk gl gsv h nlhg mzgfizo ozmtfztv kilxvhhrmt hbhgvnh dviv yzhvw lm xlnkovc hvgh lu szmw dirggvm ifovh hgzigrmt rm gsv ozgv h sldvevi gsviv dzh z ivelofgrlm rm mzgfizo ozmtfztv kilxvhhrmt drgs gsv rmgilwfxgrlm lu nzxsrmv ovzimrmt zotlirgsnh uli ozmtfztv kilxvhhrmt gsrh dzh wfv gl ylgs gsv hgvzwb rmxivzhv rm xlnkfgzgrlmzo kldvi hvv nlliv h ozd zmw gsv tizwfzo ovhhvmrmt lu gsv wlnrmzmxv lu xslnhpbzm gsvlirvh lu ormtfrhgrxh v t gizmhulinzgrlmzo tiznnzi dslhv gsvlivgrxzo fmwvikrmmrmth wrhxlfiztvw gsv hlig lu xlikfh ormtfrhgrxh gszg fmwviorvh gsv nzxsrmv ovzimrmt 
zkkilzxs gl ozmtfztv kilxvhhrmt rm gsv h ivkivhvmgzgrlm ovzimrmt zmw wvvk mvfizo mvgdlip hgbov nzxsrmv ovzimrmt nvgslwh yvxznv drwvhkivzw rm mzgfizo ozmtfztv kilxvhhrmt gszg klkfozirgb dzh wfv kzigob gl z uofiib lu ivhfogh hsldrmt gszg hfxs gvxsmrjfvh xzm zxsrvev hgzgv lu gsv zig ivhfogh rm nzmb mzgfizo ozmtfztv gzhph v t rm ozmtfztv nlwvormt zmw kzihrmt gsrh rh rmxivzhrmtob rnkligzmg rm nvwrxrmv zmw svzogsxziv dsviv mok svokh zmzobav mlgvh zmw gvcg rm vovxgilmrx svzogs ivxliwh gszg dlfow lgsvidrhv yv rmzxxvhhryov uli hgfwb dsvm hvvprmt gl rnkilev xziv rm gsv vziob wzbh nzmb ozmtfztv kilxvhhrmt hbhgvnh dviv wvhrtmvw yb hbnylorx nvgslwh r v gsv szmw xlwrmt lu z hvg lu ifovh xlfkovw drgs z wrxgrlmzib ollpfk hfxs zh yb dirgrmt tiznnzih li wverhrmt svfirhgrx ifovh uli hgvnnrmt nliv ivxvmg hbhgvnh yzhvw lm nzxsrmv ovzimrmt zotlirgsnh szev nzmb zwezmgztvh levi szmw kilwfxvw ifovh wvhkrgv gsv klkfozirgb lu nzxsrmv ovzimrmt rm mok ivhvzixs hbnylorx nvgslwh ziv hgroo xlnnlmob fhvw hrmxv gsv hl xzoovw hgzgrhgrxzo ivelofgrlm rm gsv ozgv h zmw nrw h nfxs mzgfizo ozmtfztv kilxvhhrmt ivhvzixs szh ivorvw svzerob lm nzxsrmv ovzimrmt gsv nzxsrmv ovzimrmt kzizwrtn xzooh rmhgvzw uli fhrmt hgzgrhgrxzo rmuvivmxv gl zfglnzgrxzoob ovzim hfxs ifovh gsilfts gsv zmzobhrh lu ozitv xlikliz gsv kofizo ulin lu xlikfh rh z hvg lu wlxfnvmgh klhhryob drgs sfnzm li xlnkfgvi zmmlgzgrlmh lu gbkrxzo ivzo dliow vcznkovh nzmb wruuvivmg xozhhvh lu nzxsrmv ovzimrmt zotlirgsnh szev yvvm zkkorvw gl mzgfizo ozmtfztv kilxvhhrmt gzhph gsvhv zotlirgsnh gzpv zh rmkfg z ozitv hvg lu uvzgfivh gszg ziv tvmvizgvw uiln gsv rmkfg wzgz rmxivzhrmtob sldvevi ivhvzixs szh ulxfhvw lm hgzgrhgrxzo nlwvoh dsrxs nzpv hlug kilyzyrorhgrx wvxrhrlmh yzhvw lm zggzxsrmt ivzo ezofvw dvrtsgh gl vzxs rmkfg uvzgfiv xlnkovc ezofvw vnyvwwrmth zmw mvfizo mvgdliph rm tvmvizo szev zohl yvvm kilklhvw uli v t hkvvxs ...
2. Caesar Cipher:
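This method uses a shift "p": each letter of the alphabet is replaced by the letter that comes p positions before it, wrapping around from "a" back to "z". In the dictionary shown below, p = 4, so "e" becomes "a" and "a" becomes "w".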
First we create a dictionary containing each letter with its new value and then we will use it to encode a plain text.
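One way to build such a dictionary is sketched below. The name `build_caesar_alphabet` is hypothetical, and the shift of 4 is inferred from the sample output ("a" maps to "w"). The modulo operation handles the wrap-around at the start of the alphabet.

```python
import string

def build_caesar_alphabet(p):
    """Map each letter to the one p positions before it, wrapping around."""
    letters = string.ascii_lowercase
    return {ch: letters[(i - p) % 26] for i, ch in enumerate(letters)}

alphabet_caesar = build_caesar_alphabet(4)
print(alphabet_caesar)
```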
Output:
{'a': 'w',
'b': 'x',
'c': 'y',
'd': 'z',
'e': 'a',
'f': 'b',
'g': 'c',
'h': 'd',
'i': 'e',
'j': 'f',
'k': 'g',
'l': 'h',
'm': 'i',
'n': 'j',
'o': 'k',
'p': 'l',
'q': 'm',
'r': 'n',
's': 'o',
't': 'p',
'u': 'q',
'v': 'r',
'w': 's',
'x': 't',
'y': 'u',
'z': 'v'}
Let's now use this dictionary to encrypt the same text we used before.
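As an illustration on a short string rather than the full scraped text, the encryption could look like the sketch below. The helper names are assumptions. Unlike the cleaning step described earlier, this version simply passes through any character that has no entry in the dictionary, such as spaces.

```python
import string

def build_caesar_alphabet(p):
    """Map each letter to the one p positions before it, wrapping around."""
    letters = string.ascii_lowercase
    return {ch: letters[(i - p) % 26] for i, ch in enumerate(letters)}

def caesar_encrypt(text, mapping):
    """Substitute each character; characters without an entry pass through."""
    return "".join(mapping.get(ch, ch) for ch in text.lower())

alphabet_caesar = build_caesar_alphabet(4)
secret = caesar_encrypt("hello world", alphabet_caesar)
print(secret)  # dahhk sknhz

# Inverting the dictionary gives the decryption table.
decode = {v: k for k, v in alphabet_caesar.items()}
print(caesar_encrypt(secret, decode))  # hello world
```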
3. Bacon's code:
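Bacon’s Code replaces each letter of the English alphabet with a 5-letter sequence of "A"s and "B"s. The sequences start at "AAAAA" and count upward like 5-bit binary numbers, with "A" standing for 0 and "B" for 1. In the classic variant used here, "i" and "j" share a code, as do "u" and "v".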
So, in Bacon’s Code, "A = AAAAA", "B = AAAAB", "C = AAABA", "D = AAABB" and so on. Let's start, then, with the dictionary that will help us encrypt the text.
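A minimal sketch of such a dictionary follows; `build_bacon_alphabet` is a hypothetical name. It writes a counter as a 5-bit binary number, turns 0 into "A" and 1 into "B", and does not advance the counter after "i" and "u", so that "j" and "v" reuse the previous code as in the classic 24-code table.

```python
import string

def build_bacon_alphabet():
    """Build the 24-code Bacon table where i/j and u/v share a code."""
    mapping = {}
    index = 0
    for ch in string.ascii_lowercase:
        bits = format(index, "05b")  # e.g. 3 -> "00011"
        mapping[ch] = bits.replace("0", "A").replace("1", "B")
        # Do not advance after i and u, so j and v get the same code.
        if ch not in ("i", "u"):
            index += 1
    return mapping

alphabet_bacon = build_bacon_alphabet()
print(alphabet_bacon)
```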
Output:
{'a': 'AAAAA',
'b': 'AAAAB',
'c': 'AAABA',
'd': 'AAABB',
'e': 'AABAA',
'f': 'AABAB',
'g': 'AABBA',
'h': 'AABBB',
'i': 'ABAAA',
'j': 'ABAAA',
'k': 'ABAAB',
'l': 'ABABA',
'm': 'ABABB',
'n': 'ABBAA',
'o': 'ABBAB',
'p': 'ABBBA',
'q': 'ABBBB',
'r': 'BAAAA',
's': 'BAAAB',
't': 'BAABA',
'u': 'BAABB',
'v': 'BAABB',
'w': 'BABAA',
'x': 'BABAB',
'y': 'BABBA',
'z': 'BABBB'}
This is the code that encrypts a text.
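The original listing is not reproduced here; a sketch under stated assumptions could look like this. The names `build_bacon_alphabet` and `bacon_encrypt` are hypothetical. The trailing hyphen after each code, and the choice to skip characters that are not letters, are assumptions modeled on the sample output.

```python
import string

def build_bacon_alphabet():
    """Build the 24-code Bacon table where i/j and u/v share a code."""
    mapping, index = {}, 0
    for ch in string.ascii_lowercase:
        mapping[ch] = format(index, "05b").replace("0", "A").replace("1", "B")
        if ch not in ("i", "u"):  # j shares i's code, v shares u's code
            index += 1
    return mapping

def bacon_encrypt(text, mapping):
    """Encode letters only; each 5-letter code is followed by a hyphen."""
    return "".join(mapping[ch] + "-" for ch in text.lower() if ch in mapping)

print(bacon_encrypt("ahlem", build_bacon_alphabet()))
# AAAAA-AABBB-ABABA-AABAA-ABABB-
```

Since "i"/"j" and "u"/"v" share codes, decoding with this table cannot tell those letters apart; the reader has to resolve them from context.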
Output:
'AAAAA-AABBB-ABABA-AABAA-ABABB-'
Conclusion:
In this article we applied three substitution ciphers to a text: the reverse alphabet, the Caesar cipher, and Bacon's code.