Encoding
Computers can store and process only numbers, which they hold in registers (units of memory). How, then, are letters and the other special characters on the keyboard stored? Encoding is the answer: each character is mapped to a number. The mapping has to be standardized so that different computer manufacturers do not use different techniques.
Popular Encoding Standards:
ASCII (American Standard Code for Information Interchange):
This is a 7-bit encoding standard, usually stored one character per byte (8 bits), which came into existence in the early 1960s. Each character on the keyboard is assigned a number called its ASCII code. For example, B is assigned 66 (decimal) and the digit 1 is assigned 49.
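The mapping described above can be checked directly in Python. This is a minimal sketch using the built-in ord() and chr() functions; the variable names are only illustrative.

```python
# ord() gives the numeric code of a character; chr() goes the other way.
code_b = ord("B")        # ASCII code of the letter B
code_one = ord("1")      # ASCII code of the digit 1
char_back = chr(66)      # character whose code is 66

print(code_b, code_one, char_back)   # 66 49 B

# Every ASCII code fits in 7 bits, so it is always below 128.
bits_b = format(code_b, "07b")
print(bits_b)                        # 1000010
```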
Over time, keyboards had to support more languages, each bringing new character sets. ASCII, with only 128 codes, could not accommodate them, hence a new standard called the Unicode Standard.
Unicode Standard:
This standard can represent text in almost all languages of the world. It assigns every character a number (its code point), and there are several ways to store those numbers as bytes. Two common ones are:
UTF-16: Encodes most characters with 16 bits (two bytes); characters outside that basic range take 32 bits (four bytes).
UTF-8: A variable-length encoding that is backward compatible with ASCII. It uses 8 bits (one byte) for ASCII characters such as English letters, and two to four bytes (16 to 32 bits) for all other characters.
First case: for an English character, using UTF-8 is an advantage since it occupies less memory, 8 bits, whereas UTF-16 takes 16 bits to store it.
Second case: the Indian Rupee sign (₹, U+20B9), which is not in the ASCII set, takes 3 bytes (24 bits) in UTF-8 compared to 2 bytes (16 bits) in UTF-16.
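The storage sizes in these two cases can be measured rather than memorized. The sketch below is one way to do it in Python; the helper encoded_sizes is a hypothetical name introduced for illustration, and "utf-16-le" is used so that the 2-byte byte-order mark Python adds for plain "utf-16" does not inflate the count.

```python
def encoded_sizes(ch):
    """Return the number of bytes ch occupies in UTF-8 and UTF-16."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),  # no byte-order mark
    }

letter = encoded_sizes("B")        # first case: an English letter
rupee = encoded_sizes("\u20b9")    # second case: the Indian Rupee sign

print(letter)   # {'utf-8': 1, 'utf-16': 2}
print(rupee)    # {'utf-8': 3, 'utf-16': 2}

# The actual UTF-8 bytes of the Rupee sign, in hexadecimal.
rupee_hex = "\u20b9".encode("utf-8").hex()
print(rupee_hex)   # e282b9
```

So UTF-8 is the smaller choice for ASCII text, while UTF-16 can be smaller for characters such as ₹.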
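Popular programming languages like Python use the UTF-8 encoding standard: Python 3 reads source files as UTF-8 by default, and its string encode() method produces UTF-8 unless told otherwise.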
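A quick way to see Python's default, assuming Python 3, is sketched below. It uses only the standard sys module; the variable names are illustrative.

```python
import sys

# The default encoding Python 3 uses when converting str to bytes.
default_encoding = sys.getdefaultencoding()
print(default_encoding)   # utf-8

# Calling encode() with no argument gives the same bytes as asking for UTF-8.
implicit = "\u20b9".encode()
explicit = "\u20b9".encode("utf-8")
print(implicit == explicit)   # True
```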