登录查看更多内容

Creating Your Own Programming Language From Scratch: Part 2

Mahdad Kiyani

Software Developer | Data Analyst | Data Engineer

发布日期: 2023年1月24日

Many people have difficulties or frustrations with the programming languages they use every day. Some want things to be handled more abstractly, while others dislike implementing features they wish were 'standard'. Whether you are an IT professional or just a hobbyist, many times you may find yourself wanting to create a new programming language.

Introduction:

Making an 'interpreter' for a new programming language is the most technically rewarding project I’ve worked on. It pushed my understanding of fundamental software concepts and helped me comprehend how a computer works at a deeper level.

Programming languages and compilers / interpreters can be black boxes for programmers starting out. They’re tools taken for granted, and you can write good software without necessarily worrying about how the languages and primitive tools you’re using work under the hood. But writing a programming language involves understanding the most meaty parts of the computing stack, from the CPU and memory up through the operating system, in-memory data structures, processes, threads, the stack, all the way up to libraries and applications. It’s the floodgate to the most interesting mathematics in computing – category theory and type systems, algebras, lots of applications of graph theory, complexity analysis, and more.?Writing and studying programming languages, I found myself getting more familiar with both the mathematics and the mechanical underpinnings of how computers work. This project made me a better developer as a result, even if my day-to-day work rarely concerns programming language tooling or compiler writing.

Geen alternatieve tekst opgegeven voor deze afbeelding — With my inspirational mentor, who inspired me to write my own programming language. Xi'an 2017 @ Huobi Global Project.

Lets Start...

In this article I will improve some mathematical (expressions) evaluation for our programming language (also see?Part I ) using the 'Dijkstra's Two-Stack algorithm'. The essence of this algorithm is that any complex expression ultimately comes down to the simple expression. In the end, It will be a unary or binary operation on one/two operands.

How Dijkstra's Two-Stack algorithm works:

We iterate tokens expression
If our token is an operand (e.g. number), we push it into the operands stack
If we find an operator, we push into the operators stack
We pop operators with the higher priority than the current one and apply it on the top operand(s)
If an opening parenthesis is found, we push it into the operands stack.
If a closing parenthesis is found, pop all operators and apply it on the top operand(s) until the opening parenthesis will be reached, and finally pop the opening parenthesis.

Example

Let’s suppose we have the following expression:?2 * (3 + 4) - 5. When we calculate this expression first we summarize?3?and?4?in the parentheses. After that, we multiply the result by?2. And in the end we subtract?5?from the result. Down below is the expression evaluation using Dijkstra's Two-Stack algorithm:

Push?2?to the operands stack:

2. Push?*?to operators:

3. Push?(?to operands:

4. Push?3?to operands:

5. Push?+?to operators:

6. Push?4?to operands:

7. Push?)?to operators. Closing parenthesis pops an operation until a nearest opening parenthesis?(. Therefore, we pop?top?operator and apply it on the?two?top?operands:

8. Push?-?to operators. The operators stack's top element?*?has greater precedence than the current operator?-. Therefore, first we apply?topoperator on?two top?operands:

9. Push?5?to operands:

10. Pop left?-?operator:

2 Operators

Let’s dive into the code. First, we will update our?Operator?enum including operators precedence and introduce a few new operators:

@RequiredArgsConstructor
@Getter
public enum Operator {
    Not("!", NotOperator.class, 5),
    StructureValue("::", StructureValueOperator.class, 5),

    Addition("+", AdditionOperator.class, 3),
    Subtraction("-", SubtractionOperator.class, 3),

    LessThan("<", LessThanOperator.class, 2),
    GreaterThan(">", GreaterThanOperator.class, 2);

    private final String character;
    private final Class<? extends OperatorExpression> type;
    private final Integer precedence;

    Operator(String character, Integer precedence) {
        this(character, null, precedence);
    }

    public static Operator getType(String character) {
        return Arrays.stream(values())
                .filter(t -> Objects.equals(t.getCharacter(), character))
                .findAny().orElse(null);
    }

    public boolean greaterThan(Operator o) {
        return getPrecedence().compareTo(o.getPrecedence()) > 0;
    }
}

public class StatementParser {
    ...
    private Expression readExpression() {
        ...
        while (peek(TokenType.Operator)) {
            Token operation = next(TokenType.Operator);
            Operator operator = Operator.getType(operation.getValue());
            Class<? extends OperatorExpression> operatorType = operator.getType();
            ...
        }

        ...
    }
    ...
}

2.1 StructureInstance

We will no longer mark?new?as the keyword lexeme, it will be the operator which will help us to instantiate a structure object inside a more complex expression:

public enum TokenType {
	...
	Keyword("(if|then|end|print|input|struct|arg)"),
	Operator("(\\+|\\-|\\>|\\<|\\={1,2}|\\!|\\:{2}|new)");
    ...
}

We set it the highest priority:

public enum Operator {
	...
    StructureInstance("new", StructureInstanceOperator.class, 5),
	...
}

public class StructureInstanceOperator extends UnaryOperatorExpression {
    public StructureInstanceOperator(Expression value) {
        super(value);
    }

    @Override
    public Value<?> calc(Value<?> value) {
        return value; //will return toString() value
    }
}

2.2 Multiplication & Division

We will also include two more operators with higher priority than addition and subtraction. Don’t forget to add regex expression for each operator to be counted when we’re doing lexical analysis:

public enum TokenType {
	...
	Operator("(\\+|\\-|\\>|\\<|\\={1,2}|\\!|\\:{2}|new|\\*|\\/)");
    ...
}

public enum Operator {
    ...
    Multiplication("*", MultiplicationOperator.class, 4),
    Division("/", DivisionOperator.class, 4),
    ...
}

2.3 LeftParen & RightParen

These operators will be used to change operator’s precedence by surrounding expression in the parentheses. We will not create?OperatorExpression?implementations as we make an expression calculation using parentheses:

public enum TokenType {
	...
	Operator("(\\+|\\-|\\>|\\<|\\={1,2}|\\!|\\:{2}|new|\\*|\\/|\\(|\\))");
    ...
}

public enum Operator {
    ...
    LeftParen("(", 1),
    RightParen(")", 1),
    ...
}

2.4 Assignment

We also need to introduce a new assignment operator with a single equal?=sign. The variable value will be assigned to the variable only when we complete evaluation of the right expression. Therefore, this operator should have the lowest priority:

public enum Operator {
	...
    Assignment("=", AssignmentOperator.class, 6),
	...
}

public class AssignOperator extends BinaryOperatorExpression {
    public AssignOperator(Expression left, Expression right) {
        super(left, right);
    }

    @Override
    public Value<?> calc(Value<?> left, Value<?> right) {
        if (getLeft() instanceof VariableExpression) {
            ((VariableExpression) getLeft()).setValue(right);
        }
        return left;
    }
}

During the execution we expect to retrieve the left expression as a?VariableExpression?- the variable we’re assigning value to. Thus, we introduce a new method in our?VariableExpression?to set the variable value. As you may remember our variables Map is being stored in the?StatementParser. To set access it and set a new value we introduce a?BiConsumer?instance, passing variable name as a key and variable value as a new value:

public class VariableExpression implements Expression {
    ...
    private final BiConsumer<String, Value<?>> setter;
    ...

    public void setValue(Value<?> value) {
        setter.accept(name, value);
    }
}

public class StatementParser {
	...
    private Expression nextExpression() {
        ...
        return new VariableExpression(value, variables::get, variables::put);
    }
	...
}

Michael Spencer 2 年前

Why C Programming Still Matters in the Era of…

DataThick 5 个月前

Programming Languages: A Journey Through Time and…

Mohammad Rammal 5 个月前

2.5 Expression dividers

2.5.1 End of the row

Previously, to build a mathematical expression we expected to obtain an operator after each operand:

Expression left = nextExpression();

//recursively read an expression
while (peek(TokenType.Operator)) {
    Token operation = next(TokenType.Operator);
    Operator operator = Operator.getType(operation.getValue());
    Class<? extends OperatorExpression> operatorType = operator.getType();
    if (BinaryOperatorExpression.class.isAssignableFrom(operatorType)) {
        Expression right = nextExpression();
        left = operatorType
                .getConstructor(Expression.class, Expression.class)
                .newInstance(left, right);
    } else if (UnaryOperatorExpression.class.isAssignableFrom(operatorType)) {
        left = operatorType
                .getConstructor(Expression.class)
                .newInstance(left);
    }
}

With the current behavior we can expect two subsequent operators in the row, e.g. addition of structure instantiation as an right operand or two following opening parentheses:

value + new Car [“E30” “325i”] :: model

or

( ( 1 + 2) * 3) - 4

To read an entire expression with any sequence of operands and operators we will introduce the?LineBreak?lexeme which will help us to catch all expression tokens until we reach end of the row:

public class StatementParser {
    LineBreak("[\\n\\r]"),
    Whitespace("[\\s\\t]"),
    ...
}

2.5.2 Structure arguments

The second divider we need to count is when we instantiate a structure object:

new Car [ “E30” “325i” new Engine [“M20B25”] :: model ]

This structure instantiation is quite complex for the?StatementParser?to divide each passing argument into separate expression. To deal with it we can introduce new comma?GroupDivider:

public enum TokenType {
	...
   	GroupDivider("(\\[|\\]|\\,)"),
	...
}

new Car [ “E30”, “325i”, new Engine [“M20B25”] :: model ]

3 Expression evaluation

Now we can start refactoring our expression evaluation in the?StatementParser?using Dijkstra’s Two-Stack algorithm.

3.1 Next token

Firstly, we will introduce?skipLineBreaks()?method, that will increment tokens position while we reach?LineBreak?token:

private void skipLineBreaks() {
    while (tokens.get(position).getType() == TokenType.LineBreak 
            && ++position != tokens.size()) ;
}

It will be used on the top of each?next()?and?peek()?methods to escape catching?LineBreak?token:

private Token next(TokenType type, TokenType... types) {
    skipLineBreaks();
    ...
}

private Token next(TokenType type, String value) {
    skipLineBreaks();
    ...
}

private boolean peek(TokenType type, String content) {
    skipLineBreaks();
    ...
}

private boolean peek(TokenType type) {
    skipLineBreaks();
    ...
}

We will also override the?next()?method without any arguments just returning the next token:

private Token next() {
    skipLineBreaks();
    return tokens.get(position++);
}

3.2 ExpressionReader

To read and calculate an entire expression using Dijkstra's Two-Stack algorithm we will need two stacks, the operators stack, and the operands stack. The first stack of operators will contain the?Operator?enum types. The second stack of operands will have?Expression?type, it will help us to store every type of expression including values, variables and composite operators. It will be clearer if we perform expression reading in a separate inner class?ExpressionReader:

private class ExpressionReader {
    private final Stack<Expression> operands;
    private final Stack<Operator> operators;

    private ExpressionReader() {
        this.operands = new Stack<>();
        this.operators = new Stack<>();
    }
}

Then we create a?readExpression()?method with a while loop to read an entire expression. Our mathematical expression can begin with?Operator,?Variable,?Numeric?,?Logical?or?Text?token types. In this case we can’t exclude the?LineBreak?token during token retrieval because this token means the end of the expression. Therefore, we introduce an additional?peek()?method inside our inner class that will read an expression until we reach the end of the row:

private class ExpressionReader {
    ...
    private Expression readExpression() {
		while (peek(TokenType.Operator, TokenType.Variable, TokenType.Numeric, TokenType.Logical, TokenType.Text)) {
 	        Token token = next();
            switch (token.getType()) {
               case Operator:
			       ...
			    default:
			        ...
            }
        }
        ...
    }

    private boolean peek(TokenType type, TokenType... types) {
        TokenType[] tokenTypes = ArrayUtils.add(types, type);
        if (position < tokens.size()) {
            Token token = tokens.get(position);
            return Stream.of(tokenTypes).anyMatch(t -> t == token.getType());
        }
        return false;
    }
}

3.2.1 Operator

The first lexeme type we will process is?Operator. We retrieve the?Operator?enum by token’s value and use it in the switch block. There are 3 cases we will deal with:

LeftParen?- just put an operator into the operators stack
RightParen?- calculate previously pushed expressions until we reach the left parenthesis?LeftParen.
Before inserting other operators we need to be sure that the current operator has greater precedence than the top operator, in the negative case we pop the top operator and apply it on the top operand(s). In the end we push the less prioritized operator into the operators stack.

...
case Operator:
    Operator operator = Operator.getType(token.getValue());
    switch (operator) {
        case LeftParen:
            operators.push(operator);
            break;
        case RightParen:
            //until left parenthesis is not reached
            while (!operators.empty() && operators.peek() != Operator.LeftParen)
                applyTopOperator();
            operators.pop(); //pop left parenthesis
            break;
        default:
            //until top operator has greater precedence
            while (!operators.isEmpty() && operators.peek().greaterThan(operator))
                applyTopOperator();
            operators.push(operator); // finally, add less prioritized operator
    }
    break;
...

To apply the top operator on the top operand(s) we create an additional?applyTopOperator()?method. First we pop an operator and the first operand. Then if our operator is binary we pop the second operand. To create an?OperatorExpression?object we retrieve a suitable constructor for unary or binary operator implementation and make an instance with earlier popped operands. In the end we push the?OperatorExpression?instance to the operands stack. Thus, this pushed operand?Expression?can be used later by other operators inside a more composite expression:

@SneakyThrows
private void applyTopOperator() {
    // e.g. a + b
    Operator operator = operators.pop();
    Class<? extends OperatorExpression> operatorType = operator.getType();
    Expression left = operands.pop();
    if (BinaryOperatorExpression.class.isAssignableFrom(operatorType)) {
        Expression right = operands.pop();
        operands.push(operatorType
                .getConstructor(Expression.class, Expression.class)
                .newInstance(right, left));
    } else if (UnaryOperatorExpression.class.isAssignableFrom(operatorType)) {
        // e.g. new Instance []
        operands.push(operatorType
                .getConstructor(Expression.class)
                .newInstance(left));
    } else {
        throw new SyntaxException(String.format("Operator `%s` is not supported", operatorType));
    }
}

3.2.2 Value & VariableExpression

If our token is a variable or a literal, we transform it to the appropriate?Expression?object inside the following switch block and then push the result to the operands stack:

...
default:
    String value = token.getValue();
    Expression operand;
    switch (token.getType()) {
        case Numeric:
            operand = new NumericValue(Integer.parseInt(value));
            break;
        case Logical:
            operand = new LogicalValue(Boolean.valueOf(value));
            break;
        case Text:
            operand = new TextValue(value);
            break;
        case Variable:
        default:
            if (!operators.isEmpty() && operators.peek() == Operator.StructureInstance) {
                operand = readInstance(token);
            } else {
                operand = new VariableExpression(value, variables::get, variables::put);
            }
    }
    operands.push(operand);
...

A variable token can also indicate the start of the structure instantiation. In this case we read an entire structure expression with arguments inside square brackets and only then assign it to the?StructureExpressionvalue. We will move existent?readInstance()?method from?StatementParser?class into the inner?ExpressionReader?class with following changes:

...
private Expression readInstance(Token token) {
    StructureDefinition definition = structures.get(token.getValue());

    List<Expression> arguments = new ArrayList<>();
    if (StatementParser.this.peek(TokenType.GroupDivider, "[")) {

        next(TokenType.GroupDivider, "["); //skip open square bracket

        while (!StatementParser.this.peek(TokenType.GroupDivider, "]")) {
            Expression value = new ExpressionReader().readExpression();
            arguments.add(value);

            if (StatementParser.this.peek(TokenType.GroupDivider, ","))
                next();
        }

        next(TokenType.GroupDivider, "]"); //skip close square bracket
    }

    if (definition == null) {
        throw new SyntaxException(String.format("Structure is not defined: %s", token.getValue()));
    }
    return new StructureExpression(definition, arguments, variables::get);
}
...

3.2.3 Evaluation

After we finished processing expression tokens and reached the end of the expression we will need an extra while loop to pop remaining operators and apply it on the top operand(s). In the end we should have the only one resulting composite operand in the operands stack, we return it as a final expression result:

private Expression readExpression() {
    ...

    while (!operators.isEmpty()) {
        applyTopOperator();
    }

    return operands.pop();
}

3.3 AssignStatement

Now we can now build?AssignmentStatement?in the?parseExpression()?method. The?AssignStatement?can start with a Variable or an Operator token type (in case we have a?new?operator for the structure instantiation). When we read an expression we expect to read an entire?Assignment?expression accumulating left value as a variable name and right expression as a variable value. Therefore, we go to one tokens position back and start reading the expression using?ExpressionReaderstarting from variable’s name declaration:

private Statement parseExpression() {
    Token token = next(TokenType.Keyword, TokenType.Variable, TokenType.Operator);
    switch (token.getType()) {
        case Variable:
        case Operator:
            position--;
            Expression value = new ExpressionReader().readExpression();
		    ...
        case Keyword:
        default:
            ...
    }
}

In the end we expect to obtain the complete?AssignmentOperator?and we can create an?AssignmentStatement?instance:

...
case Variable:
case Operator:
    position--;
    Expression value = new ExpressionReader().readExpression();

    if (value instanceof AssignmentOperator) {
        VariableExpression variable = (VariableExpression) ((AssignmentOperator) value).getLeft();
        return new AssignmentStatement(variable.getName(), ((AssignmentOperator) value).getRight(), variables::put);
    } else {
        throw new SyntaxException(String.format("Unsupported statement: `%s`", value));
    }
...

The last thing we need to do is to update expressions reading for the?inputand?if?statements using our new inner?ExpressionReader?class. It will allow us to read complex expression inside?print?and?if?statements:

...
case Keyword:
default:
    switch (token.getValue()) {
        case "print":
            Expression expression = new ExpressionReader().readExpression();
            ...
        case "if":
            Expression condition = new ExpressionReader().readExpression();
		    ...
    }
...

4 Wrapping up

In this article, we improved our own programming language with mathematical expressions evaluation using Dijkstra's Two-Stack algorithm. We injected operators priority with an ability to change it using parentheses, including complex structure instantiations.?

要查看或添加评论，请登录

Mahdad Kiyani的更多文章

Mahdad Kiyani: Verified Expertise in Blockchain Development Backed by #Hashlock Audit and KYC

2024年11月19日

Mahdad Kiyani: Verified Expertise in Blockchain Development Backed by #Hashlock Audit and KYC

In the world of blockchain and Web3, trust is the cornerstone of success. As digital ecosystems expand, so does the…
Why Capgemini's AI Report Fails!!!

2024年11月18日

Why Capgemini's AI Report Fails!!!

Why Capgemini’s Generative AI Report Misses the Mark https://www.capgemini.
Implementing Quantum-Proof ZK-Cryptography by Mahdad Kiyani Cross-Language Implementation of RLWE-ZKP based on Python and C++(basic)

2024年11月4日

Implementing Quantum-Proof ZK-Cryptography by Mahdad Kiyani Cross-Language Implementation of RLWE-ZKP based on Python and C++(basic)

As quantum computing advances, traditional cryptographic systems, including those used in blockchain networks, face…
There is No Artificial Intelligence: What We Have is Algorithm Intelligence

2024年10月22日

There is No Artificial Intelligence: What We Have is Algorithm Intelligence

Introduction There is no Artificial Intelligence; what we have is Algorithm Intelligence. This challenges the common…
Cutting-Edge Security for SQL, Python, JavaScript, and the Cloud

2024年9月17日

Cutting-Edge Security for SQL, Python, JavaScript, and the Cloud

According to Statista, from 2005 to 2023, over 353 million individuals in the U.S.

3 条评论
Highly secure zkEmail accounts(Javascript/Rust)

2024年9月17日

Highly secure zkEmail accounts(Javascript/Rust)

Introduction Hi, I’m Mahdad Kiyani, and I’d like to explain zkEmail accounts, building on my previous article about…
Best Practices and Essential Tools for Robust Protection with Python!

2024年9月16日

Best Practices and Essential Tools for Robust Protection with Python!

Back in 2020, Twitter experienced a major security breach where high-profile accounts were compromised, leading to…
Security leaks in Zero Knowledge Proof and FinTech platforms!!!

2024年8月2日

Security leaks in Zero Knowledge Proof and FinTech platforms!!!

Back in December 2021 I started noticing some leaks in ZK-Dapps and issued it on Github, BitcoinTalk & Discord. First…
Building a Decentralized Blockchain (Payment Transaction) Application with ReactJS by Mahdad Kiyani

2023年9月2日

Building a Decentralized Blockchain (Payment Transaction) Application with ReactJS by Mahdad Kiyani

Introduction: Welcome to the first part of our two-part tutorial series on building a decentralized blockchain…

1 条评论
Developing a Virtual Reality App. A Comprehensive Guide by Mahdad Kiyani. part 1!

2023年8月31日

Developing a Virtual Reality App. A Comprehensive Guide by Mahdad Kiyani. part 1!

Introduction: Virtual Reality (VR) technology has paved the way for highly immersive and realistic simulations…

See all articles

Creating Your Own Programming Language From Scratch: Part 2

Mahdad Kiyani

Software Developer | Data Analyst | Data Engineer

Introduction:

Example

2 Operators

2.1 StructureInstance

2.2 Multiplication & Division

2.3 LeftParen & RightParen

2.4 Assignment

领英推荐

2.5 Expression dividers

2.5.1 End of the row

2.5.2 Structure arguments

3 Expression evaluation

3.1 Next token

3.2 ExpressionReader

3.2.1 Operator

3.2.2 Value & VariableExpression

3.2.3 Evaluation

3.3 AssignStatement

4 Wrapping up

Mahdad Kiyani的更多文章

社区洞察

其他会员也浏览了

Top 10 New Programming Languages in 2022

Functional Programming

Rust Compilation

Creating a Bosque Programming Language Dev Environment in 2021

Top 10 Programming Languages to Learn in 2023

Why Rust language will be the futer of programming?

We Speak Your Language - Walter Bright

Code with C

Decoding the 'Virtual' Function in C++ Programming Language

Mastering Different Programming Languages with Ease

Introduction:

Example

2 Operators

2.1 StructureInstance

2.2 Multiplication & Division

2.3 LeftParen & RightParen

2.4 Assignment

领英推荐

2.5 Expression dividers

2.5.1 End of the row

2.5.2 Structure arguments

3 Expression evaluation

3.1 Next token

3.2 ExpressionReader

3.2.1 Operator

3.2.2 Value & VariableExpression

3.2.3 Evaluation

3.3 AssignStatement

4 Wrapping up

Mahdad Kiyani的更多文章

Mahdad Kiyani: Verified Expertise in Blockchain Development Backed by #Hashlock Audit and KYC

Why Capgemini's AI Report Fails!!!

Implementing Quantum-Proof ZK-Cryptography by Mahdad Kiyani Cross-Language Implementation of RLWE-ZKP based on Python and C++(basic)

There is No Artificial Intelligence: What We Have is Algorithm Intelligence

Cutting-Edge Security for SQL, Python, JavaScript, and the Cloud

Highly secure zkEmail accounts(Javascript/Rust)

Best Practices and Essential Tools for Robust Protection with Python!

Security leaks in Zero Knowledge Proof and FinTech platforms!!!

Building a Decentralized Blockchain (Payment Transaction) Application with ReactJS by Mahdad Kiyani

Developing a Virtual Reality App. A Comprehensive Guide by Mahdad Kiyani. part 1!

社区洞察

其他会员也浏览了

Top 10 New Programming Languages in 2022

Functional Programming

Rust Compilation

Creating a Bosque Programming Language Dev Environment in 2021

Top 10 Programming Languages to Learn in 2023

Why Rust language will be the futer of programming?

We Speak Your Language - Walter Bright

Code with C

Decoding the 'Virtual' Function in C++ Programming Language

Mastering Different Programming Languages with Ease