Algebraic types are basically useless without some kind of corresponding pattern matching facility. The alternative would be explicit selection and projection functions that would in effect offer no more functionality for C++ than what is already provided by RTTI & dynamic casting. (This is not to say that app/applib is not useful without pattern matching, as app/applib support for the handle body idiom and reference counting can be considered useful.)
Match statements are the app way of doing pattern matching. When app encounters a match statement it is subjected to a translation process that produces corresponding C++ conditional statements. App just assumes that whenever such a statement is encountered, it appears in a context valid for C++ statements. If you specify such a statement in some other context, say in the middle of a declaration, app knows nothing of it and will happily generate conditional statements in the middle of the declaration anyway.
The app input grammar portion for match statements follows.
match-stmt
-> '$match' subjects pattern-cases
subjects
-> '(' balanced-items-list ')'
pattern-cases
-> '{' pattern-case* '}'
pattern-case
-> patterns guard? '=>' '{' balanced-item* '}'
patterns
-> '(' pattern-list ')'
pattern-list
-> pattern/','
pattern
-> positional-pattern
positional-pattern
-> inductive-pattern
-> binding-pattern
-> zero-pattern
inductive-pattern
-> pattern-decl subpatterns
binding-pattern
-> id
zero-pattern
-> '0'
pattern-decl
-> pattern-spec id?
pattern-spec
-> id alpha-type-params?
subpatterns
-> '(' subpattern-list? ')'
subpattern-list
-> subpattern/','
subpattern
-> positional-pattern
-> nonpositional-pattern
nonpositional-pattern
-> explicit-pattern
-> implicit-pattern
-> ellipsis-pattern
explicit-pattern
-> id ':' positional-pattern
implicit-pattern
-> id ':'
ellipsis-pattern
-> '...'
guard
-> '|' '(' balanced-item+ ')'
Here is a simple example taken from the beta.h library code.
template <class A>
int List<A>::length() const {
int result = 0;
for (List<A> xs = *this;;) {
$match (xs) {
(Cons<A>(_, xs1)) => {
result++;
xs = xs1;
}
(0) => {break;}
}
}
return result;
}
The corresponding "pretty" translation follows.
template <class A>
int List<A>::length() const {
int result = 0;
for (List<A> xs = *this;;) {
//$$ begin $match
{ bool _alpha_fail = true;
List<A> _alpha_norm_1 = xs;
if (dynamic_cast<Cons__<A> *>(_alpha_norm_1())) {
Cons<A> _alpha_cast_1(_alpha_norm_1());
List<A> xs1 = _alpha_cast_1->b();
_alpha_fail = false;
//$$ begin user statement code
result++;
xs = xs1;
//$$ end user statement code
}
if (_alpha_fail && _alpha_norm_1() == 0) {
_alpha_fail = false;
//$$ begin user statement code
break;
//$$ end user statement code
}
if (_alpha_fail) escape("match failed");
}
//$$ end $match
}
return result;
}
Each subject of a match statement is taken to be an arbitrary C++ expression. App generally normalizes subject expressions to identifier form - even if they are already in identifier form, by generating appropriate declarations as prolog code. One reason for doing this is that subjects generally appear more than once in the translation code, and so otherwise the same subject expression might be evaluated more than once (if not already in identifier form).
Another reason is for typing purposes. App attempts to infer subject types from pattern cases; it warns you if either no such type can be inferred or if an inferred type is alpha-type inconsistent, in which cases 'Top' becomes the type (which is technically correct but also a generalization). There are additional cases when 'Top' may (silently) become the type because there is no other choice - i.e. when '0' patterns occur in relative isolation, usually as a result of translating certain kinds of nested patterns ('0' patterns match when the associated subject value is null).
The type inferencing algorithm app uses is pretty simple. For the example above the pattern case:
(Cons<A>(_, xs1)) => {...}
directly implies a subject type List<A>. The general idea is that the sum components of an algebraic type directly imply the type. As the patterns of a given pattern case directly refer to sum components (e.g. Cons<A>) it is a simple matter to determine the corresponding type (e.g. List<A>). Type parameters appearing in a pattern are taken as instances of corresponding declaration parameters. For example a pattern case like:
(Cons<int>(_, xs1)) => {...}
implies type List<int> because "int" is taken as an instance of "A" in the declaration:
$data List<class A> = 0 | Cons(A a, List<A> b) {...};
It may be the case that an inferred type is inconsistent with contextual type. For example a subject expression of contextual type int but with pattern cases implying type List<int> results in an inconsistent type that app cannot detect (it knows nothing of context), and in such cases app lets the compiler complain about it (having no other choice).
Normalizing subject expressions may result in some code redundancy (which however an optimizing compiler ought to be able to handle), but it does allow pattern matching to be first class with respect to static typing.
Match statement pattern cases are translated in appearance order. Match statements translate to an ordered sequence of conditional 'if' statements corresponding to the pattern cases and wrapped as a single compound statement. If it is possible for a match statement to fail, epilog code is generated in the form of a final 'if' statement that checks for failure and which calls the alpha core function 'escape' in that case (see also alpha.h).
A patterns construct is an ordered sequence of pattern constructs corresponding to an ordered sequence of subjects. Such patterns are also called top-level patterns, particularly when it is useful to distinguish between them and nested i.e. subpattern constructs.
There are three general classes of patterns, inductive, binding, and zero. Inductive patterns refer to an alpha base, unit, or data type (via the id part of a pattern-spec construct), and may have subpatterns that inductively refer to such types (except for those involving base type references). Inductive patterns match corresponding subjects if the subjects are compatibly constructed and the subpatterns match. Binding patterns always match corresponding subjects, and zero patterns match null valued subjects.
Patterns can further be classified as positional or nonpositional. Positional patterns correspond with subjects according to sequence order. Nonpositional patterns are defined in terms of positional patterns, and are further discussed below; patterns are understood to be positional unless otherwise indicated. Top-level patterns can only be positional.
The general translation scheme for inductive patterns involves the generation of selection code as conjunctions of dynamic_cast expressions over polymorphic body class objects via operator () applied to subjects, casting code for body to handle class hierarchy mapping and cross/downcasting, and projection code to access member variables via operator ->. Zero patterns involve generated null-testing selection code, and binding patterns involve generated casting and projection code.
Patterns containing subpatterns are inductive patterns, and are inductively translated by synthesizing new subjects with values associated with non-binding subpatterns, taking the subpatterns as new (top-level) patterns, the whole being considered as an implicit nested match statement. When all such nesting is accounted for, and if there is a guard portion, taken as an arbitrary C++ expression, the guard becomes the conditional part of a final innermost 'if' statement. The balanced-item* construct associated with the pattern-case construct, taken as an arbitrary C++ statement or statement sequence, is then emitted as the then-part of the final 'if' statement as statement code. If there is no guard, the statement code is emitted directly.
The inductive-pattern construct has the same basic syntax used for applying a constructor. The syntax "Cons<int>" for example is used both for constructing an object of (alpha) type List<int>, as in the application "Cons<int>(x, y)", and for specifying a pattern intended to match such objects as subjects, as in the pattern "Cons<int>(x, y)". The above application results in an anonymous object, but in declarative contexts e.g. "Cons<int> xs(x, y)" the object is named (xs). The id part of the pattern-decl construct provides for analogous naming - it stands for the value of the corresponding subject.
The id part of a pattern-spec construct presumably mentions either a base type, unit type, or itor sum component of a data type; app complains otherwise. In particular app complains if a type type is mentioned. While it would be technically feasible for app to transitively dereference type types, thereby allowing type types to be so mentioned, app refuses to do so on the grounds that type types used in that manner could too easily undermine the underlying intent. Consider for example the following.
$type Point = Pair<int, int>; // a point on a plane defined by coordinates
$type Edge = Pair<int, int>; // an edge of a graph defined by vertices
...
$match (foo) {
(Point(_,_)) => {...}
(Edge(_,_)) => {...}
}
If alpha type types could be so mentioned, subject foo would be (alpha) type consistent, yet nevertheless there is clearly something wrong with the intent here. Alpha type types are renaming devices, primarily of convenience. If it is truly necessary to refer to the same underlying thing, as patterns, and in different ways, use unit or data types instead.
The utility of alpha base types for pattern matching purposes is limited. Any and all subpatterns of pattern constructs in which they are mentioned as a pattern-spec id must be binding patterns; app complains otherwise. This is because elements of a base-products construct can involve arbitrary C++ types and app has no idea how to match subjects of such types as anything other than binding patterns.
Base types introduced in order to make arbitrary C++ types appear as alpha types (via the "$user {} $user{}" construct) should probably not be referenced as a pattern-spec id at all, due to the various assumptions that generated selection, casting, and projection code requires. In short, only binding patterns should be used for matching such types. Examples:
// this is okay:
$base string;
string s = "okay";
$match(Cons<string>(s, 0)) {
(Cons<string>(t, _)) => {cout << t << endl;}
(_) => {}
}
// this is not - app accepts it but generates uncompilable code:
$base float(float x) $user {} $user{};
float y = -42;
$match (y) {
(float(z)) => {cout << z << endl;}
}
Binding patterns correspond to generated variables with the same id name, and when appearing as subpatterns, are bound to corresponding subject components in the translated code by means of generated casting and projection code that appears immediately after the generated selection code in the constraining conditional expression. Top-level binding patterns are bound directly.
Binding patterns with ids spelled '_' are taken as wildcard patterns - they match anything and in themselves result in no generated code.
Binding patterns must be unique for each pattern case (excepting wildcards); app complains otherwise. In a pattern like "Foo(x, x)" for example the binding pattern "x" is not unique. Supposing that binding patterns need not be unique, the implied effect of nonunique binding patterns would be application of operator ==. One way to get that effect with unique patterns is to use a guard involving operator == as in "(Foo(x1, x2)) | (x1 == x2)".
Zero patterns are intended for use with corresponding subjects that may be nominally null and alpha typed such that a '0' sum component appears in the associated alpha data type declaration; app warns you otherwise. The bool functions Top::null() and Top::not_null(), and Top__* Top::operator () may be utilized to test for null cases in general.
$data List<class A> = ...
and:
$match (ys) {
(Cons<int>(x, xs1)) => {...}
}
the generated code involves the substitution of int for A.
Parametric alpha base types cannot participate in such substitutions, as app doesn't know what to do about it, and complains if given a match statement that suggests otherwise. Inductive patterns therefore cannot involve parametric base types; you can always use accessor functions instead.
A pattern of the form "x:y" is a nonpositional (explicit) pattern, where x is an alpha unit or data type product-decl id and y is a positional pattern that may contain nonpositional subpatterns. The id "x" must be associated with the alpha unit or data type to which the inductive pattern that contains the given subpatterns refers. Only subpatterns can be nonpositional.
A nonpositional (implicit) pattern of the form "x:" is taken as shorthand for the nonpositional explicit pattern "x:x". A nonpositional (ellipsis) pattern of the form "..." is taken as "any other pattern not explicitly mentioned is '_' i.e. a wildcard".
If any subpattern appearing in a subpattern-list is nonpositional, all such subpatterns must be nonpositional. An ellipsis pattern should appear last in the list if the number of subpatterns is less than the number implied by the associated product-decl; app warns you otherwise.
App internally rewrites nonpositional patterns as equivalent positional patterns. Here's an example.
$data A = B(int c, Atom d, A e) | F();
...
A a = B(0, "ok", B(0, "ok", F()));
...
$match (a) {
(B(e:B(d:g, c:, ...), ...)) => {...c...g...} // internally rewritten as:
(B(_, _, B(c, g, _))) => {...c...g...}
...
}
By default app generates runtime type selection (stor) code that involves dynamic_cast. Considering again the List<A>::length() function illustrated above, the match statement:
$match (xs) {
(Cons<A>(_, xs1)) => {
...
}
(0) => {break;}
}
has generated code that involves dynamic_cast:
//$$ begin $match
{ bool _alpha_fail = true;
List<A> _alpha_norm_1 = xs;
if (dynamic_cast<Cons__<A> *>(_alpha_norm_1())) {
...
}
if (_alpha_fail) escape("match failed");
}
//$$ end $match
This behavior can be changed either globally, via a command line option, or locally on a match statement basis, via the alpha-nested-item option construct. The option in either case is '--stor_method=m' (or '-sm=m') where, if m=d the generated code involves dynamic_cast as before, and if m=r the code involves an internal alpha core function "raw_type_is" that essentially just does string comparisons between "raw" types - i.e. handle class names without parametric elaboration (see also alpha.h). When used locally the option construct applies to subsequent match statements as textually encountered by app, the option being effective until the next such construct is encountered; e.g.
$option "-sm=r"
$match (...) { // involves raw_type_is
(...) => {
...
$option "-sm=d"
$match (...) {...} // involves dynamic_cast
...
}
}
$option "-sm=r"
$match (...) {...} // involves raw_type_is
For the List<A>::length() example the generated code with '-sm=r' looks like:
//$$ begin $match
{ bool _alpha_fail = true;
List<A> _alpha_norm_1 = xs;
if (raw_type_is("Cons",_alpha_norm_1())) {
...
}
if (_alpha_fail) escape("match failed");
}
//$$ end $match
This code is generally faster than dynamic_cast code. Use of the raw_type_is function requires that associated app generated body classes define function "raw_type" that overrides virtual "Top__::raw_type" (alpha.h). These functions are automatically generated by app for unit and data types but not base types, and all just return a related string, e.g. "Cons" (pattern matching involving base types always involves generated dynamic_cast code, a non-issue when base type accessor methods are used instead of pattern matching).
This works in general because pattern matching is statically typed; for any given match statement involving a subject of alpha type T, all associated patterns corresponding to the same sum component of T have the same "raw" form, provided that T is consistent (app and the C++ compiler ensure this). In the example above only Cons<A> patterns can validly appear in the match statement as inductive patterns for any given A, so raw_type_is("Cons",...) suffices, and that is the case in general.
The use of the raw_type and raw_type_is functions is safe to the extent that static typing in C++ is safe. It is still possible to externally construct bad alpha typed instances e.g. via bad yacc code, but then there is a larger problem (in which cases option '-sm=d' might be useful).
The functions are not used in generated code by default; app generated applib code (e.g. _beta.h) as provided involves dynamic_cast; you will need to re-app the relevant files (e.g. beta.h) with option 'sm=r' if you want to use the applib code together with that option and option '-ori' (i.e. if you want to include the provided app generated applib headers with option 'sm=r' applied).
App employs a simple analysis algorithm to determine if all relevant sum components are mentioned in the pattern cases of a given match statement, and warns you if something is missing, unless the last pattern case patterns are all binding patterns (including wildcards '_'), as such pattern cases in effect constitute default cases.
Despite the simplicity of the algorithm, it is an important part of app, as it directly addresses one of the primary objections against the casing style itself - how do you deal with changes to the underlying case structure in a way that ensures the programmer is not left completely in the dark? The traditional casing style tends to induce severe maintenance related problems, especially for projects involving multiple programmers. Add a new case here, delete one there, and the next thing you know you have a lot of code not accounting for it, and no way to deal with it except by grepping and/or testing.
Cases in the present system correspond to sum components. If you delete or change the (itor) name of a sum component, app and/or the compiler will detect potential problems thereby introduced - case symbols are static entities that both app and the compiler track. If you add a new sum component, and fail to account for it in some match statement (without a default case that is), app lets you know about it.
In a sense default cases correspond to default virtual functions of a base class corresponding to the root class of some alpha data type, and non-default cases correspond to overrides of virtual functions defined in derived classes corresponding to the associated injectors. Traditional OO wisdom has it that defaulted virtual functions can be dangerous, as in effect they provide (possibly unexpected) behavior for cases not explicitly accounted for. The same wisdom applies here. If you have default cases, app has no idea whether that is what you intended or not; default cases effectively short-circuit the case coverage analysis algorithm.
Default cases however tend to be convenient, and also allow for more efficient generated code because without them app generates match statement epilog code that checks for match failure (omitted if a default case appears). To allow for the convenience and implied efficiency of default cases, while at the same time still providing for case coverage analysis, app lets you specify "safety cases". A safety case is any pattern case that follows another pattern case in which all patterns are binding patterns, i.e. any pattern case that follows a default case.
In effect safety cases prevent default cases from short-circuiting the analysis algorithm. Pattern cases following a default case are treated as irrelevant at run time and are excluded from the translation. App warns you if a safety case has associated guard or statement code. Here's an example.
$data A = B(int c, Atom d) | E(int f) | G(Atom h, int i);
...
A a = B(0, "ok");
...
$match (a) {
(B(x, y)) => {...}
(_) => {} // default case
(E(...)) => {} // safety case
(G(...)) => {} // safety case
}