Introduction to developing interpreters and programming languages

1. Learn to develop programming languages with interpreters
2. Develop LPP, the Lenguaje de Programación Platzi

Building the lexer (tokenizer)

3. What is lexical analysis? How the lexer and tokens work
4. Structure and definition of tokens in Python
5. Reading characters and tokens
6. Illegal tokens, single-character operators, and delimiters
7. Recognizing and differentiating letters and numbers
8. Declaring and executing functions
9. Extending the lexer: conditionals, operations, and booleans
10. Two-character operators
11. First version of the REPL with tokens

Building the parser (syntactic analyzer)

12. What are a parser and an AST?
13. Structure and definition of AST nodes in Python
14. Parsing the program (root node)
15. Parsing assignment statements
16. Parsing let statements
17. Parsing errors
18. Parsing the return statement
19. Parsing techniques and Pratt parsing
20. AST tests
21. Implementing the Pratt parser
22. Parsing identifiers: testing
23. Parsing identifiers: implementation
24. Parsing integers
25. Prefix operators: negation and negatives
26. Infix operators and order of operations: testing
27. Infix operators and order of operations: implementation
28. Parsing booleans
29. Challenge: testing infix operators and booleans
30. Parsing grouped expressions
31. Parsing conditionals: testing and AST
32. Parsing conditionals: implementation
33. Parsing function declarations: testing
34. Parsing function declarations: AST and implementation
35. Parsing function calls: testing and AST
36. Parsing function calls: implementation
37. Completing the lexer's TODOs
38. Second version of the REPL with AST

Evaluation (semantic analysis)

39. The meaning of symbols
40. Evaluation strategies for software interpreters
41. Object representation
42. Evaluating expressions: integers
43. Evaluating expressions: booleans and null
44. Evaluating expressions: prefix
45. Evaluating expressions: infix
46. Evaluating conditionals
47. Evaluating the return statement
48. Error handling
49. Environment
50. Bindings
51. Evaluating functions
52. Function calls

Improving the interpreter

53. Implementing strings
54. String operations
55. Built-in functions: object and tests
56. Built-in functions: evaluation

Next steps

57. Challenges to expand your interpreter
58. Continue with the Curso de Creación de Compiladores de Software


Extending the lexer: conditionals, operations, and booleans (Lesson 9 of 58)


How do we extend the lexer for conditionals and booleans?

Developing an interpreter for the Platzi programming language (LPP) means continually extending its capabilities. So far we have made good progress handling variables, basic operators, and functions. The focus now is on expanding the lexer to cover additional operators, conditionals, and booleans.

How are the additional operators implemented?

To begin, we write a set of tests and the corresponding changes to the system. The goal is to create a new test, test_control_statement, to verify that our conditionals produce the correct tokens.

  1. Necessary tests:

    • We define a source string and tokenize the LPP program: si (5 < 10) { regresa verdadero; } si_no { regresa falso; }.
    • This requires implementing the keywords si, si_no, regresa, verdadero, and falso, and defining the comparison operator <. A sketch of this test appears after this list.
  2. Change in the next_token method:

    • By modifying the next_token method in the lexer we decide where each of the new operators is handled and adjust our regular expressions.
  3. Update in lookup_token_type:

    • In the token.py module, inside the lookup_token_type function, we add the new keywords to recognize: falso, verdadero, si (equivalent to if), si_no (equivalent to else), and regresa (equivalent to return).
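Here is a rough sketch of what that test can look like, assuming the Lexer, Token, and TokenType names from the earlier lessons and an lpp package layout; your module paths and some token-type names (for example INT vs. NUMBER) may differ:

```python
# lexer_test.py -- illustrative sketch; module paths and some names are assumptions.
from typing import List
from unittest import TestCase

from lpp.lexer import Lexer              # assumed package layout
from lpp.token import Token, TokenType


class LexerTest(TestCase):

    def test_control_statement(self) -> None:
        source: str = '''
            si (5 < 10) {
                regresa verdadero;
            } si_no {
                regresa falso;
            }
        '''
        lexer: Lexer = Lexer(source)

        tokens: List[Token] = []
        for i in range(17):  # the snippet above produces 17 tokens
            tokens.append(lexer.next_token())

        expected_tokens: List[Token] = [
            Token(TokenType.IF, 'si'),
            Token(TokenType.LPAREN, '('),
            Token(TokenType.INT, '5'),
            Token(TokenType.LT, '<'),
            Token(TokenType.INT, '10'),
            Token(TokenType.RPAREN, ')'),
            Token(TokenType.LBRACE, '{'),
            Token(TokenType.RETURN, 'regresa'),
            Token(TokenType.TRUE, 'verdadero'),
            Token(TokenType.SEMICOLON, ';'),
            Token(TokenType.RBRACE, '}'),
            Token(TokenType.ELSE, 'si_no'),
            Token(TokenType.LBRACE, '{'),
            Token(TokenType.RETURN, 'regresa'),
            Token(TokenType.FALSE, 'falso'),
            Token(TokenType.SEMICOLON, ';'),
            Token(TokenType.RBRACE, '}'),
        ]

        self.assertEqual(tokens, expected_tokens)
```

Running this test first (and watching it fail) tells us exactly which token types and keywords are still missing from token.py and the lexer.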

What changes are made to the token file?

We modify the token.py file to define the extended token types.

  1. Definition of new token types:

    • We add ELSE, FALSE, IF, RETURN, and TRUE to our enum of token types, keeping it in alphabetical order to make future revisions easier.
  2. Implementation of the < operator:

    • We introduce a regular expression inside next_token that recognizes the < character and builds a token of the LT (less than) type. A sketch of these token.py changes follows this list.
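A minimal sketch of the token.py side, assuming the TokenType enum and the lookup_token_type helper from earlier lessons. Only the members relevant to this lesson are shown; the keywords added in previous lessons (for declaring variables and functions) also belong in the dictionary:

```python
# token.py -- illustrative sketch; only the additions discussed in this lesson are shown.
from enum import Enum, auto, unique
from typing import Dict


@unique
class TokenType(Enum):
    # ...members from earlier lessons (ASSIGN, COMMA, EOF, ...) also live here,
    # kept in alphabetical order...
    ELSE = auto()
    FALSE = auto()
    IDENT = auto()
    IF = auto()
    LT = auto()      # <
    RETURN = auto()
    TRUE = auto()


def lookup_token_type(literal: str) -> TokenType:
    # Keywords added in this lesson; earlier keywords go in this dict as well.
    keywords: Dict[str, TokenType] = {
        'falso': TokenType.FALSE,
        'regresa': TokenType.RETURN,
        'si': TokenType.IF,
        'si_no': TokenType.ELSE,
        'verdadero': TokenType.TRUE,
    }

    return keywords.get(literal, TokenType.IDENT)
```

Any literal not found in the dictionary falls back to IDENT, which is what lets user-defined names keep working alongside the new keywords.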

How do you create a test for one-character operators?

The last step is adjusting test_one_character_operator to verify that the new operators are recognized correctly.

  1. Operators to implement:

    • In addition to the existing assignment and addition operators, we add: subtraction, division, multiplication, less than, greater than, and negation.
  2. Modification and testing:

    • We adjust the tokens expected by the test and modify the code inside next_token to identify these characters; a sketch of the new branches follows this list.
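Inside next_token, each new single-character operator gets its own branch, in the same style as the existing = and + checks. This is only a fragment of the if/elif chain, assuming the self._character attribute and the re.match import used in earlier lessons; the token-type names (MINUS, DIVISION, MULTIPLICATION, LT, GT, NEGATION) are one possible spelling, so use whichever names you chose for your enum:

```python
# lexer.py -- new branches inside next_token (fragment, not a complete method).
        elif match(r'^-$', self._character):
            token = Token(TokenType.MINUS, self._character)
        elif match(r'^/$', self._character):
            token = Token(TokenType.DIVISION, self._character)
        elif match(r'^\*$', self._character):   # '*' must be escaped in a regex
            token = Token(TokenType.MULTIPLICATION, self._character)
        elif match(r'^<$', self._character):
            token = Token(TokenType.LT, self._character)
        elif match(r'^>$', self._character):
            token = Token(TokenType.GT, self._character)
        elif match(r'^!$', self._character):
            token = Token(TokenType.NEGATION, self._character)
```

With these branches in place, test_one_character_operator can tokenize the source string "=+-/*<>!" and expect one token per character.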

This series of changes ensures that our interpreter not only recognizes the basic elements but can also process fundamental conditionals and operators. Try out the new features and share your results in the comments, and keep up the enthusiasm for learning and growing as a programmer!

Contributions

Challenge solved!

First I write the test, expecting these tokens:

    def test_one_character_operator(self) -> None:

        source: str = "=+-/*<>!"
        lexer: Lexer = Lexer(source)

        tokens: List[Token] = []
        for i in range(len(source)):
            tokens.append(lexer.next_token())

        expected_tokens: List[Token] = [
            Token(TokenType.ASSIGN, "="),
            Token(TokenType.PLUS, "+"),
            Token(TokenType.MINUS, "-"),
            Token(TokenType.DIVISION, "/"),
            Token(TokenType.MULTIPLICATION, "*"),
            Token(TokenType.LT, "<"),
            Token(TokenType.GT, ">"),
            Token(TokenType.NEGATION, "!"),
        ]

        self.assertEquals(tokens, expected_tokens)

Then the new tokens are added to the list of token types:

DIVISION = auto()
GT = auto()  # Greater Than (>)
MINUS = auto()  # Minus (-)
MULTIPLICATION = auto()
NEGATION = auto()  # Negation (!)

Then the conditions are added to the lexer so it can return the correct tokens (the multiplication character has to be escaped):

        elif match(r"^-$", self._character):
            token = Token(TokenType.MINUS, self._character)

        elif match(r"^/$", self._character):
            token = Token(TokenType.DIVISION, self._character)

        elif match(r"^\*$", self._character):
            token = Token(TokenType.MULTIPLICATION, self._character)

        elif match(r"^<$", self._character):
            token = Token(TokenType.LT, self._character)

        elif match(r"^>$", self._character):
            token = Token(TokenType.GT, self._character)

        elif match(r"^!$", self._character):
            token = Token(TokenType.NEGATION, self._character)

And done!

https://github.com/RetaxMaster/lpp/commit/b935aa10e833565492fea8935a68e4dd87067c66

I wanted to include NOT, but we had added it as an invalid token at the beginning, so it wasn't possible and I had to fix it.

```python
def test_control_statement(self) -> None:
    source: str = '''
        si(5<10){
            regresa verdadero;
        }si_no{
            regresa falso;
        }
    '''
    lexer: Lexer = Lexer(source)
    tokens: List[Token]
```
ok

This is my solution in Deno with TypeScript:

test:

  await t.step('test one character operator', () => {
    const source = '=+-/*<>!'
    const lexer = new Lexer(source)

    const tokens: Token[] = []
    for (let i = 0; i < source.length; i++) {
      tokens.push(lexer.nextToken())
    }

    const expected_tokens = [
      new Token(TokenType.ASSIGN, '='),
      new Token(TokenType.PLUS, '+'),
      new Token(TokenType.MINUS, '-'),
      new Token(TokenType.DIVISION, '/'),
      new Token(TokenType.MULTIPLICATION, '*'),
      new Token(TokenType.LT, '<'),
      new Token(TokenType.RT, '>'),
      new Token(TokenType.NEGATION, '!'),
    ]

    assertEquals(tokens, expected_tokens)
  })

tokens:

export enum TokenType {
  ASSIGN,
  COMMA,
  EOF,
  FUNCTION,
  IDENT,
  ILLEGAL,
  NUMBER,
  LBRACE,
  LET,
  LPAREN,
  PLUS,
  RBRACE,
  RPAREN,
  SEMICOLON,
  RETURN,
  IF,
  LT,
  RT,
  TRUE,
  FALSE,
  ELSE,
  MINUS,
  DIVISION,
  MULTIPLICATION,
  NEGATION
}

export const validCharacters: { [key: string]: TokenType } = {
  ',': TokenType.COMMA,
  ';': TokenType.SEMICOLON,
  '{': TokenType.LBRACE,
  '}': TokenType.RBRACE,
  '(': TokenType.LPAREN,
  ')': TokenType.RPAREN,
  '': TokenType.EOF,
  
  //operators
  '+': TokenType.PLUS,
  '=': TokenType.ASSIGN,
  '<': TokenType.LT,
  '>': TokenType.RT,
  '*': TokenType.MULTIPLICATION,
  '-': TokenType.MINUS,
  '/': TokenType.DIVISION,
  '!': TokenType.NEGATION
}

lexer:

import { Token, TokenType, validCharacters, lookupTokenType } from "./token.ts"

export class Lexer {
    private source: string
    private character: string
    private readPosition: number
    private position: number

    constructor(source: string) {
        this.source = source
        this.character = ''
        this.readPosition = 0
        this.position = 0

        this.readCharacter()
    }

    nextToken(): Token {
        this.skipWhiteSpace()
        let tokenType: TokenType = TokenType.EOF
        let token: Token = new Token(tokenType, '')

        if (this.isValidCharacter(this.character)) {
            tokenType = validCharacters[this.character]
            token = new Token(tokenType, this.character)
        }
        else if (this.isLetter(this.character)) {
            const literal = this.readIdentifier()
            const tokenType = lookupTokenType(literal)
            return new Token(tokenType, literal)
        }
        else if (this.isNumber(this.character)) {
            const literal = this.readNumber()
            return new Token(TokenType.NUMBER, literal)
        }
        else {
            token = new Token(TokenType.ILLEGAL, this.character)
        }

        this.readCharacter()
        return token
    }

    private readIdentifier(): string {
        const initialPosition = this.position

        while (this.isLetter(this.character)) {
            this.readCharacter()
        }

        return this.source.slice(initialPosition, this.position)
    }

    private readCharacter(): void {
        if (this.readPosition >= this.source.length) this.character = ''
        else this.character = this.source[this.readPosition]

        this.position = this.readPosition
        this.readPosition += 1
    }

    private readNumber(): string {
        const initialPosition = this.position

        while (this.isNumber(this.character)) {
            this.readCharacter()
        }

        return this.source.slice(initialPosition, this.position)
    }

    private isLetter(character: string): boolean {
        return /^[a-zA-Z_&]$/.test(character)
    }

    private isValidCharacter(character: string): boolean {
        return Object.keys(validCharacters).includes(character)
    }

    private isNumber(character: string): boolean {
        return /^[0-9.]$/.test(character)
    }

    private skipWhiteSpace(): void {
        while (/^[\s\t]$/.test(this.character)) {
            this.readCharacter()
        }
    }
}

Definition of reserved words for comparators and booleans

Here is my solution:

    def test_one_character_operator(self) -> None:  # Test that recognizes the one-character operators
        source: str = '=+/-*<>!'
        lexer: Lexer = Lexer(source)  # Initialize the Lexer, passing the source as an argument
        tokens: List[Token] = []  # The list of tokens the lexer should return
        for i in range(len(source)):  # Loop over the source; on each iteration call next_token to append every Token to the list
            tokens.append(lexer.next_token())

        expected_tokens: List[Token] = [
            Token(TokenType.ASSIGN, '='),
            Token(TokenType.PLUS, '+'),
            Token(TokenType.DIV, '/'),
            Token(TokenType.MINUS, '-'),
            Token(TokenType.MULT, '*'),
            Token(TokenType.LT, '<'),
            Token(TokenType.GT, '>'),
            Token(TokenType.EXC, '!'),
        ]
        self.assertEquals(tokens, expected_tokens)
@unique  # The unique decorator is added so we know the tokens are unique; a decorator takes a function as a parameter, adds things to it, runs it, and returns that same function already modified, which is why it's said to return a different function
class TokenType(Enum):
    # It's good practice to keep them in alphabetical order
    # auto() means we don't care about the enum's value
    ASSIGN = auto()
    COMMA = auto()
    ELSE = auto()
    EOF = auto()
    EXC = auto()
    DIV = auto()
    FALSE = auto()
    FUNCTION = auto()
    GT = auto()
    IDENT = auto()
    IF = auto()
    ILLEGAL = auto()
    INT = auto()
    LBRACE = auto()
    LET = auto()
    LPAREN = auto()
    LT = auto()
    MINUS = auto()
    MULT = auto()
    PLUS = auto()
    RBRACE = auto()
    RETURN = auto()
    RPAREN = auto()
    SEMICOLON = auto()
    TRUE = auto()

And the implementation in the lexer:

    def next_token(self) -> Token:

        # This will be checked with regular expressions
        """
        In Python, regular expressions start with raw strings.
        Start at the beginning of the string, find an equals sign, and end there: we want a single '=' from start to end.
        """
        self._skip_whitespace()  # Whitespace is always skipped when starting a new token

        if match(r'^=$', self._character):
            token = Token(TokenType.ASSIGN, self._character)
        elif match(r'^\+$', self._character):  # Escaped because it has a special meaning in regular expressions
            token = Token(TokenType.PLUS, self._character)
        elif match(r'^$', self._character):
            token = Token(TokenType.EOF, self._character)
        elif match(r'^\($', self._character):  # Escaped because it has a special meaning in regular expressions
            token = Token(TokenType.LPAREN, self._character)
        elif match(r'^\)$', self._character):  # Escaped because it has a special meaning in regular expressions
            token = Token(TokenType.RPAREN, self._character)
        elif match(r'^{$', self._character):
            token = Token(TokenType.LBRACE, self._character)
        elif match(r'^}$', self._character):
            token = Token(TokenType.RBRACE, self._character)
        elif match(r'^,$', self._character):
            token = Token(TokenType.COMMA, self._character)
        elif match(r'^;$', self._character):
            token = Token(TokenType.SEMICOLON, self._character)
        elif match(r'^<$', self._character):
            token = Token(TokenType.LT, self._character)
        elif match(r'^/$', self._character):
            token = Token(TokenType.DIV, self._character)
        elif match(r'^-$', self._character):
            token = Token(TokenType.MINUS, self._character)
        elif match(r'^\*$', self._character):  # Escaped because it has a special meaning in regular expressions
            token = Token(TokenType.MULT, self._character)
        elif match(r'^>$', self._character):
            token = Token(TokenType.GT, self._character)
        elif match(r'^!$', self._character):
            token = Token(TokenType.EXC, self._character)
        # Helper functions: instead of writing a regular expression we use helper functions
        elif self._is_letter(self._character):  # If we are looking at a letter we want to build the literal: a helper reads it and then we find out what kind of Token it is
            literal = self._read_identifier()
            # Since it could be an identifier or a keyword, we use this function
            token_type = lookup_token_type(literal)  # This function is defined in token.py

            return Token(token_type, literal)  # Return whatever the lookup function gives us, plus the literal
        # We need to know whether we are looking at a number
        elif self._is_number(self._character):
            literal = self._read_number()

            return Token(TokenType.INT, literal)
        else:  # If the character is not recognized, it's an illegal Token
            token = Token(TokenType.ILLEGAL, self._character)

        """
        The plus character specifically has to be escaped because '+' means something specific in regular expressions (match at least one or more times); here we only care about the character itself, not that behavior, so it is escaped with a backslash.
        """
        # Escaping: making the character be taken as plain text instead of its default meaning in the regular expression.

        # This needs to run after the token is built and before it is returned
        self._read_character()

        return token  # Return the token

This course is really good!

Here I had forgotten to append the tokens to the list and spent half an hour trying to figure out what was going on.

# lexer_test.py

    def test_one_character_operator(self) -> None:
        source: str = '=+-/*<>!'
        lexer: Lexer = Lexer(source)

        tokens: List[Token] = []
        for i in range(len(source)):
            tokens.append(lexer.next_token())

        expected_tokens: List[Token] = [
            Token(TokenType.ASSIGN, '='),
            Token(TokenType.PLUS, '+'),
            Token(TokenType.MINUS, "-"),
            Token(TokenType.DIVISION, "/"),
            Token(TokenType.MULTIPLICATION, "*"),
            Token(TokenType.LT, "<"),
            Token(TokenType.GT, ">"),
            Token(TokenType.NEGATION, "!"),
        ]

        self.assertEquals(tokens, expected_tokens)
# lexer.py

        # Token '='
        if match(r'^=$', self._character):
            token = Token(TokenType.ASSIGN, self._character)
        # Token '+'
        elif match(r'^\+$', self._character):
            token = Token(TokenType.PLUS, self._character)
        # Token '-'
        elif match(r'^-$', self._character):
            token = Token(TokenType.MINUS, self._character)
        # Token '*'
        elif match(r'^\*$', self._character):
            token = Token(TokenType.MULTIPLICATION, self._character)
        # Token '/'
        elif match(r'^\/$', self._character):
            token = Token(TokenType.DIVISION, self._character)
        # Token '<'
        elif match(r'^<$', self._character):
            token = Token(TokenType.LT, self._character)
        # Token '>'
        elif match(r'^>$', self._character):
            token = Token(TokenType.GT, self._character)
        # Token '!'
        elif match(r'^!$', self._character):
            token = Token(TokenType.NEGATION, self._character)
DIVISION = auto() # /
GT = auto() # >
MINUS = auto() # -
MULTIPLICATION = auto() # *
NEGATION = auto() # !