Pretty-printing and parsing
As shown in the examples above, the internal representation (also called abstract syntax) of expressions quickly becomes hard to read and write as the expressions get larger. We need a printer and a parser to go back and forth between the abstract syntax and the concrete syntax, which in the case of expressions is the familiar algebraic notation (e.g. 2*x+1). For the printing function, we take into account the usual precedence rules (i.e. * binds tighter than +) to avoid printing unnecessary parentheses. To this end, we maintain the current operator precedence and print parentheses around an operator only if its precedence is less than the current precedence.
# let print_expr exp = # (* Local function definitions *) # let open_paren prec op_prec = # if prec > op_prec then print_string “(” in # let close_paren prec op_prec = # if prec > op_prec then print_string “)” in # let rec print prec exp = (* prec is the current precedence *) # match exp with # | Const c -> print_float c
# | Var v -> print_string v # | Sum(f, g) -> # open_paren prec 0; # print 0 f; print_string ” + “; print 0 g; # close_paren prec 0 # | Diff(f, g) -> # open_paren prec 0; # print 0 f; print_string ” – “; print 1 g; # close_paren prec 0 # | Prod(f, g) -> # open_paren prec 2; # print 2 f; print_string ” * “; print 2 g; # close_paren prec 2 # | Quot(f, g) -> # open_paren prec 2; # print 2 f; print_string ” / “; print 3 g; # close_paren prec 2 # in print 0 exp;; val print_expr : expression -> unit = <fun> # let e = Sum(Prod(Const 2.0, Var “x”), Const 1.0);; val e : expression = Sum (Prod (Const 2., Var “x”), Const 1.) # print_expr e; print_newline();; 2. * x + 1. – : unit = () # print_expr (deriv e “x”); print_newline();; 2. * 1. + 0. * x + 0. – : unit = () Parsing (transforming concrete syntax into abstract syntax) is usually more delicate. Caml oﬀers several tools to help write parsers: on the one hand, Caml versions of the lexer generator Lex and the parser generator Yacc (see chapter 12 of the reference manual), which handle LALR(1) languagesusingpush-downautomata; ontheotherhand, apredeﬁnedtypeofstreams(ofcharacters or tokens) and pattern-matching over streams, which facilitate the writing of recursive-descent parsers for LL(1) languages. An example using ocamllex and ocamlyacc is given in chapter 12 of the reference manual. Here, we will use stream parsers. The syntactic support for stream parsers is provided by the Camlp4 preprocessor, which can be loaded into the interactive toplevel via the #load directive below.
# #load “camlp4o.cma”;; Camlp4 Parsing version 3.08.3+4 (2005-06-21) # open Genlex;; # let lexer = make_lexer [“(“; “)”; “+”; “-“; “*”; “/”];; val lexer : char Stream.t -> Genlex.token Stream.t = <fun> For the lexical analysis phase (transformation of the input text into a stream of tokens), we use a “generic” lexer provided in the standard library module Genlex. The make_lexer function takes a list of keywords and returns a lexing function that “tokenizes” an input stream of characters. Tokens are either identiﬁers, keywords, or literals (integer, ﬂoats, characters, strings). Whitespace and comments are skipped.