If you take a college course on Programming Language Design or Compiler Construction, you will be taught that a programming language is defined in three parts. The syntax of the language is described using a formal (context-free) grammar, whose terminal symbols are called tokens or lexemes. The tokens typically have their own grammar over the input character set, which is usually either a regular language or a (very limited) context-free language. And the semantics, or meaning, of the language is defined in a quasi-legalistic prose description of the effect of each syntactic construct, often augmented with some mathematical or formal notation.

Ruby does not follow this model. The language grammar and the token grammar are mixed together, and the syntax is not a context-free grammar: token analysis depends on syntactic context, and some syntactic context depends on semantic information, namely the definition of local variables. It is not possible to separate the lexical grammar from the language grammar, nor is it possible to describe the language grammar separately from the semantics.
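As a small illustration of the local-variable dependence, here is a sketch in which the same characters, `value -1`, are parsed two different ways within one method body. The method names `value` and `local_variable_demo` are hypothetical, chosen only for this example.

```ruby
# Hypothetical helper method, defined only for this illustration.
def value(n = 0)
  n * 10
end

def local_variable_demo
  before = value -1   # `value` is not yet a local variable: parsed as value(-1)
  value = 5           # from this point on, `value` is a local variable
  after = value -1    # the same characters are now parsed as value - 1
  [before, after]
end

p local_variable_demo   # prints [-10, 4]
```

Run with warnings enabled, Ruby flags the first `value -1` as an ambiguous first argument, which is the interpreter acknowledging the same context dependence.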

So… I’ve been trying to select a notation for specifying Ruby’s syntax. I want a notation that is human-friendly, so that a casual Ruby user can look at a bit of syntax and see how to use it. I don’t want to have LPAREN or ASCII(40) as terminal symbols. But it would be very nice if (someday) we had a machine-processable version of the Ruby syntax, so we could run it against sample programs for validation. Since Ruby, like C/C++, makes such extensive use of operators and non-alphanumeric characters, it seems impossible to have a grammar notation that doesn’t confusingly overlap with Ruby itself.

Some authors [Wirth] have simply quoted literal symbols and characters – but then we find ourselves with escape sequences for our quote character(s), and how will we handle non-printing characters, which are an essential part of Ruby syntax? We can’t do the traditional computer science trick of prefacing the syntax section with a paragraph explaining whitespace and how it separates tokens – because Ruby syntax doesn’t work that way! If we want to use the traditional parentheses, vertical bars, equal signs, etc. as syntactic metanotation, then we need a way to refer to these symbols as literal (terminal) elements of Ruby.
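To see why whitespace cannot simply be dismissed as a token separator, consider this sketch, where a single space changes how a bracket is parsed. The methods `items` and `whitespace_demo` are hypothetical names used only here.

```ruby
# Hypothetical helper method, defined only for this illustration.
def items(extra = [])
  [10, 20, 30] + extra
end

def whitespace_demo
  indexed = items[1]    # no space: call items with no arguments, then index the result
  called  = items [1]   # with a space: call items with the array [1] as its argument
  [indexed, called]
end

p whitespace_demo   # prints [20, [10, 20, 30, 1]]
```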

I’m leaning toward a notation that integrates highlighting, color, font/face, subscripting and superscripting to convey part of the information. It’s harder to typeset, but we avoid conflicts with Ruby itself.

We define the syntax of Ruby by formally describing the sequences of characters that comprise a (syntactically) legal Ruby program. We do this by using productions of the form

S = T

which means: An S can be any of the sequence(s) of characters generated by T.

T can be:

A single character or character sequence, such as <=>, shown in a distinctive color or font, which generates that specific character or contiguous character sequence.

The symbol · which is an abbreviation for OptionalWhitespace.

An identifier N (capitalized), which means: The sequence of characters generated by N.

A sequence, in the form S1 S2 … which means: A sequence of characters generated by S1, followed immediately by a sequence of characters generated by S2, followed by … etc.

A grouping (S) which means: A sequence of characters generated by S.

An alternative, in the form S1 | S2 … which means: A sequence of characters generated by S1, or a sequence of characters generated by S2, or … etc.

An option, in the form [ S ] which means: Optionally, a sequence of characters generated by S.

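A repetition, in the form Sⁿ which means (S Sⁿ⁻¹) for n > 0. S⁰ means [ S¹ ]. S* is a synonym for S⁰. Unwinding these definitions, S⁰ generates zero or more repetitions of S, and Sⁿ generates n or more.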
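The forms above can be sketched as a handful of Ruby helpers that build regular-expression fragments. This is only a rough approximation under stated assumptions, not a proposed implementation: every helper name (`lit`, `seq`, `alt`, `opt`, `rep`, `generates?`) is hypothetical, OptionalWhitespace is assumed to mean spaces and tabs only, the repetition form is read as "n or more", and regular expressions cannot express recursive productions, so this covers only non-recursive fragments of a grammar.

```ruby
# Hypothetical helpers: each returns a Regexp source fragment for one form.
def lit(text)          # a literal character sequence
  Regexp.escape(text)
end

def seq(*parts)        # S1 S2 ... (each part immediately follows the previous one)
  parts.join
end

def alt(*choices)      # ( S1 | S2 ... ), grouped so it composes safely
  "(?:#{choices.join('|')})"
end

def opt(s)             # [ S ]
  "(?:#{s})?"
end

def rep(s, n = 0)      # repetition, read here as "n or more"
  "(?:#{s}){#{n},}"
end

# Assumption: OptionalWhitespace is spaces and tabs only.
OPTIONAL_WS = '[ \t]*'

# Example productions, invented for this sketch.
DIGIT      = alt(*("0".."9").map { |d| lit(d) })
INTEGER    = seq(opt(alt(lit("+"), lit("-"))), rep(DIGIT, 1))
COMPARISON = seq(INTEGER, OPTIONAL_WS, lit("<=>"), OPTIONAL_WS, INTEGER)

# True when the whole text is one of the sequences generated by the production.
def generates?(production, text)
  Regexp.new("\\A#{production}\\z").match?(text)
end

p generates?(COMPARISON, "12 <=> -3")   # prints true
p generates?(COMPARISON, "1 <= > 2")    # prints false
```

Writing the literal `<=>` through `lit` sidesteps the quoting problem in code, but it is exactly the kind of metanotation a typeset grammar would rather express with color or font.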
