App accepts as input arbitrary C++ code that may contain embedded constructs for specifying algebraic data types and associated pattern matching operations. The constructs for specifying algebraic types are alpha type declarations, and algebraic types are said to be alpha types herein. The term 'alpha' is taken as a kind of app specific synonym for 'algebraic'.
When app encounters an alpha type declaration, it translates the declaration, generally producing corresponding C++ class declarations. App is no smarter than it needs to be, however: it simply assumes that any such declaration appears in a context valid for C++ class declarations. If you place one in some other context, say in the middle of an expression, app will not notice and will happily generate class declarations in the middle of the expression anyway.
App formats generated class declarations largely according to a number of skeletons, a kind of macro definition facility. Skeletons can be user-defined, both at app build time and at run time (--skeletons=f option), which allows a significant degree of customization of the generated code; see the file skeletons.txt in the skel directory, and also the source, for additional information. For most purposes the default skeletons should suffice.
There are four kinds of alpha types: base types, unit types, data types, and type types. Correspondingly, there are four kinds of alpha type declarations. There are also forward type declarations for types that are defined mutually recursively.
Alpha base types form the base case for the alpha typing system as it were. Non-base alpha types are defined in terms of other alpha types. Alpha base types provide the means for introducing arbitrary C++ types into the system. The app input grammar portion for alpha base type declarations follows. For the complete grammar and related notation see "The App Input Grammar".
alpha-base-type-decl
-> '$base' 'explicit'? id alpha-type-head? base-products code? ';'
-> '$base' id alpha-type-head? ';'
base-products
-> '(' balanced-items-list? ')'
user
-> '{' balanced-item* '}'
user-code
-> '$user'? user
body-user-code
-> user-code
handle-user-code
-> user-code
code
-> body-user-code handle-user-code?
alpha-type-head
-> '<' alpha-type-head-param/',' '>'
alpha-type-head-param
-> 'class' id
App does not attempt to parse arbitrary C++ code. However, when arbitrary C++ code appears embedded within app specific constructs, app must still figure out which is which. App accomplishes this by assuming that such C++ code is "balanced" with respect to parentheses, braces, and brackets. This same assumption also allows app to sort out comma-separated lists of arbitrary C++ code. If said C++ code is syntactically incorrect with respect to balancing, app will notice and complain. (The various balanced-xxx grammar rules relate to this syntactic balancing act.)
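The balancing assumption can be illustrated with a small standalone sketch. This is not app's actual scanner; the function names is_balanced and split_top_level are hypothetical, and the sketch deliberately ignores string literals, character literals, and comments.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns true if every '(', '{' and '[' in text is closed by the matching
// delimiter in the right order. Simplification: string literals, character
// literals and comments are not treated specially.
bool is_balanced(const std::string& text) {
    std::vector<char> open;  // stack of expected closing delimiters
    for (char c : text) {
        switch (c) {
            case '(': open.push_back(')'); break;
            case '{': open.push_back('}'); break;
            case '[': open.push_back(']'); break;
            case ')': case '}': case ']':
                if (open.empty() || open.back() != c) return false;
                open.pop_back();
                break;
            default: break;
        }
    }
    return open.empty();
}

// Splits text at commas that are not nested inside any delimiter pair.
// Assumes is_balanced(text) holds.
std::vector<std::string> split_top_level(const std::string& text) {
    std::vector<std::string> items;
    std::string current;
    int depth = 0;
    for (char c : text) {
        if (c == '(' || c == '{' || c == '[') ++depth;
        if (c == ')' || c == '}' || c == ']') --depth;
        if (c == ',' && depth == 0) {
            items.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    items.push_back(current);
    return items;
}
```

Note that angle brackets are not among the balanced delimiters named above, so in this sketch a comma inside a template argument list would count as a top-level separator.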
Here is an example of an alpha base type declaration that includes most of the optional syntactic constructs.
$base Foo<class A>(A x, int y) $user {} $user {};
This states that "Foo" is a parametric base type with two constituent base products, A x and int y. The presence of "$user {} $user {}" says two things; first, that the body class Foo__ is user defined, corresponding to the first "$user {}" construct, and second, that the handle class Foo is also user defined, corresponding to the second construct. Both classes are template classes (parametric alpha types correspond to template classes). Note that "user defined" really just means "not app defined".
This declaration does not have corresponding generated classes. Such a situation may arise when the skeletal code for base classes insufficiently describes the desired generation format. App still needs to know that Foo is an alpha type however and so in such situations this method can be utilized. Another potential use for this method is when an independent C++ type already exists and it is desired to make it appear as an alpha type. The standard C++ string class can be made to appear as an alpha type in this way for example.
$base string() $user {} $user {};
The second alpha-base-type-decl grammar rule allows shorthand notation for when base-products is "()" and code is "$user {} $user {}":
alpha-base-type-decl
-> '$base' 'explicit'? id alpha-type-head? base-products code? ';'
-> '$base' id alpha-type-head? ';'
The string declaration above for example can also be stated as follows.
$base string;
Any named C++ type can be made to appear as an alpha base type in the same manner. As "$user {} $user {}" effectively states that the user ("not app") has full responsibility for the implementation, it is not necessary that such types adhere to any particular implementation idiom, including the handle/body idiom app employs (thus C++ native types can also appear as alpha base types). They may need to support stream output however (via operator <<), depending on usage particulars - alpha types in general support stream output. Whether a given C++ type should or needs to appear as an alpha type depends on if it is referenced from a context in which alpha typing is assumed, e.g. from another alpha type, or from a match statement (as part of a pattern).
Specifying "$user {...}" as a user-code construct means that the associated body (or handle) class is user defined and that whatever "..." is in the accompanying "{...}" is emitted as-is. When "$user" is specified usually there is no "..." as the definition is typically placed elsewhere. (This applies to all alpha types for which "$user" can be specified - not just base types).
The user-code construct allows for "{...}" instead of "$user {...}". This means that the corresponding body (or handle) class is generated, but also extended according to "...", taken as a replacement for the skeletal variable named $user_part$. (Skeletons generally have one or more variables $X$ that are replaced at generation time according to syntactic particulars).
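The idea of replacing skeletal variables of the form $X$ can be sketched as plain string substitution. This is only an illustration of the concept: the function expand is hypothetical, and app's real skeleton facility (described in skeletons.txt) also has directives that this sketch does not model.

```cpp
#include <cassert>
#include <map>
#include <string>

// Replaces each $name$ in skeleton with values[name]. Unknown names and a
// trailing unmatched '$' are copied through unchanged.
std::string expand(const std::string& skeleton,
                   const std::map<std::string, std::string>& values) {
    std::string out;
    std::string::size_type pos = 0;
    while (pos < skeleton.size()) {
        std::string::size_type start = skeleton.find('$', pos);
        if (start == std::string::npos) { out += skeleton.substr(pos); break; }
        std::string::size_type end = skeleton.find('$', start + 1);
        if (end == std::string::npos) { out += skeleton.substr(pos); break; }
        out += skeleton.substr(pos, start - pos);            // text before the variable
        std::string name = skeleton.substr(start + 1, end - start - 1);
        std::map<std::string, std::string>::const_iterator it = values.find(name);
        if (it != values.end()) {
            out += it->second;                               // known variable: substitute
        } else {
            out += skeleton.substr(start, end - start + 1);  // unknown: keep as written
        }
        pos = end + 1;
    }
    return out;
}
```

With values "id" = "Foo" and "user_part" = "int extra() const;", expanding "class $id$__ { $user_part$ };" yields "class Foo__ { int extra() const; };".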
Another way to specify a code construct is "$user {}" or "$user {} {...}" which says that, while the corresponding body class is user defined, the handle class is app generated, and possibly extended according to "...". The Atom alpha type in the beta.h library file is an example of a base type declaration with a "$user {} {...}" code construct.
class Atom__ : public Top__ {
char* data_;
public:
Atom__(const char* data) : data_(new_string(data)) {}
Atom__(const char* data, bool clone) :
data_(clone ? new_string(data) // safe but maybe less efficient
: const_cast<char*>(data)) // efficient but maybe less safe
{}
~Atom__() {delete[] data_;} // *assumes* safe ownership of new[]'d buffer
const char* data() {return data_;}
bool print(ostream& os) {write_string(os, data()); return true;}
};
$base Atom(const char* data) $user {} {
Atom(const char* data, bool clone) :
Top(new Atom__(data, clone), nocopy)
{}
};
Here there is just one base product "const char* data". If you look at the default skeletons, you will notice that there are no explicit provisions for handling anything other than "simple" base products regarding ctor member initializations, where a "simple" base product is something that involves a native type like "int x" or, in general, something that behaves like a native type. The dynamic allocation requirements for the data_ member above precludes use of the default skeleton in this case. Note also that the additional body ctor (with the 'clone' argument) requires a corresponding handle ctor.
The first alpha-base-type-decl grammar rule allows for optional specification of the 'explicit' keyword:
alpha-base-type-decl
-> '$base' 'explicit'? id alpha-type-head? base-products code? ';'
-> '$base' id alpha-type-head? ';'
The keyword has the effect of prefixing the generated handle class ctor that instantiates the associated body object with the C++ 'explicit' keyword. This can be useful for certain kinds of unary ctors. The keyword does not apply to the second grammar rule because in that case there are no generated classes.
The last component of each base product (a balanced-item-no-comma construct) is required to be an identifier X that implicitly names a corresponding member variable X_ (the preceding components presumably denote the type of X_). The default skeletons are defined such that all member variables corresponding to products are so named and private, and that access to such members is always by means of a public access function X(). For example given "const char* data" as a base product, the member variable is "data_", and "data()" is the access function. Access functions for base types are automatically generated unless "$user" is specified for the body class.
The Atom__ body class also defines a bool function 'print' that returns true after printing. In the alpha.h library file, the Top__ base class for the body hierarchy defines a non-virtual wrapper function 'show' that coordinates output by first calling a protected virtual function 'print' that by default returns false, and then (if print returns false) calling protected virtual function 'write' that by default just writes '?'; the write function is app generated for unit and data types but not base types. By defining 'print' accordingly in a base body class, you can arrange for that alpha type to generate output as desired (this holds for any alpha type - not just base types). The stream output operator:
ostream& operator << (ostream& os, Top& t)
is defined in alpha.h so as to apply to all alpha types. See also the comments in beta.h about formatting conventions.
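The show/print/write protocol described above can be sketched in isolation. The classes Body, Opaque, and Text and the helper shown are hypothetical stand-ins written for this illustration; they are not the alpha.h definitions, and the real Top__ class does more than this.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

class Body {                      // stands in for Top__
public:
    virtual ~Body() {}
    // Non-virtual wrapper: try print first, fall back to write.
    void show(std::ostream& os) {
        if (!print(os)) write(os);
    }
protected:
    virtual bool print(std::ostream&) { return false; }   // default: nothing printed
    virtual void write(std::ostream& os) { os << '?'; }   // default fallback output
};

class Opaque : public Body {};    // overrides nothing, so show() emits '?'

class Text : public Body {        // customizes output by overriding print
    std::string data_;
public:
    explicit Text(const std::string& data) : data_(data) {}
protected:
    bool print(std::ostream& os) { os << data_; return true; }
};

// Captures what show() writes, for inspection.
std::string shown(Body& body) {
    std::ostringstream os;
    body.show(os);
    return os.str();
}
```

In this sketch, a body class that overrides nothing produces "?", while one whose print returns true fully controls its own output.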
The handle class generated for the Atom base type declaration follows.
class Atom : public Top {
public:
Atom() {}
Atom(const char* data) :
Top(new Atom__(data), nocopy)
{}
Atom(Top__* handle, Copy docopy = copyok) : Top(handle, docopy) {}
Atom__* operator->() const {
return static_cast<Atom__*>(handle);
}
Atom__& operator*() const {
return static_cast<Atom__&>(*handle);
}
static Atom nil() {return Atom(0, nocopy);}
Atom(const char* data, bool clone) :
Top(new Atom__(data, clone), nocopy)
{}
};
The alpha base type mechanism allows for the introduction of arbitrary C++ types either by means of suitable specifications for base products or by using the $user construct. However, as the last component of each base product is required to be an identifier corresponding to a member variable, in order to introduce certain C++ types in this way, such as arrays or function pointers, you need to do so indirectly, using typedefs or the like. Alternatively, you can define base types as you wish via $user, and if necessary, use accessor functions instead of pattern matching (pattern matching with base types is limited and involves certain assumptions).
A general nonparametric algebraic type T has the abstract form:
T = S1T + ... + SmT
where S1T ... SmT are m >= 1 sum components such that for 1 <= i <= m each component SiT has the abstract form:
IiT (PiT1 * ... * PiTn)
where IiT is the SiT injector (or itor), and PiT1 ... PiTn are n >= 0 product components of SiT such that for 1 <= j <= n each product PiTj mentions an algebraic type, and where n is a particular function of i.
Parametric algebraic types introduce type parameters such that any of the PiTj may depend on said parameters in a manner analogous to the way members of a C++ template class may depend on template parameters. Indeed alpha parametric types are defined in terms of C++ template classes with template parameters serving as alpha type parameters in the obvious way.
App translates an alpha type T corresponding to a general nonparametric algebraic type T abstractly as follows.
class T : public Top {...};
class I1T__ : public Top__ {P1T1' ... P1Tn' ...};
...
class ImT__ : public Top__ {PmT1' ... PmTn' ...};
class I1T : public T {... I1T__* operator->() {return ...(handle);} ...};
...
class ImT : public T {... ImT__* operator->() {return ...(handle);} ...};
Here each PiTj' denotes a product translation, which in brief amounts to an appropriate C++ member variable declaration. All such declarations are private; corresponding public access functions are generated automatically by default, but may instead be user defined. The various public operator->()'s provide access to body classes IiT__ from handle classes IiT. For parametric types, template declarations and parameters are included in the translation in the obvious way, noting that neither Top nor Top__ is parameterized.
Thus in general the handle class hierarchy is three deep, with Top at the top level, classes T (also called root classes) at the middle level, and classes IiT at the bottom level; and the body class hierarchy is two deep, with Top__ at the top level, and classes IiT__ at the bottom level. The hierarchies are usage-connected by means of the various smart pointer operators, in effect forming a kind of bridge pattern.
Conspicuously absent in the translation is the equivalent of a T__ class. The essential reason for this is that such classes, while technically feasible, would be redundant if included. For dynamic purposes, the handle hierarchy primarily provides an interface that includes bridging services to the body hierarchy, which primarily provides implementations that manage object memory requirements and so forth; but an intermediate T__ class would have no real useful role in the hierarchy as the Top & Top__ base classes provide common services. For static purposes, the T classes essentially just enforce the algebraic typing constraints, which if also done by T__ classes would again be feasible but needless.
It should probably be mentioned however that there is no real reason to suppose that this is an absolute state of affairs; if it should turn out that such classes might be useful after all, then it would not be all that difficult to get app to accommodate them.
The default skeletons define a set of functions that generated handle classes have (excepting root classes) including operator->() and operator*(), thereby allowing access to body functions from handle objects via the usual smart pointer operations, e.g.
Atom x = "hello world";
cout << "The length of " << x << " is " << strlen(x->data()) << endl;
A static function nil() is also provided that returns a null alpha object of the appropriate type - a null alpha object is a handle class object with a null handle (i.e. one that references no body object). This is equivalent to handle class default construction for classes with non-nullary ctors (those having one or more arguments such as Atom), but the function is convenient for classes that do have nullary ctors (having no arguments) because in such cases the default handle class ctor instantiates a new body class object.
Generated handle classes have default ctors according to the default skeletons, but only those with non-nullary ctors generate null objects as a result of default construction. For example the handle class generated for the base type declaration
$base Empty();
has a nullary ctor, and default construction does not generate a null object:
class Empty : public Top {
public:
Empty() : Top(new Empty__(), nocopy) {}
Empty(Top__* handle, Copy docopy = copyok) : Top(handle, docopy) {}
Empty__* operator->() const {
return static_cast<Empty__*>(handle);
}
Empty__& operator*() const {
return static_cast<Empty__&>(*handle);
}
static Empty nil() {return Empty(0, nocopy);}
};
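The difference between default construction and nil() for the two kinds of handle class can be sketched with a simplified mock. Everything here is a hypothetical stand-in: Handle and Body play the roles of Top and Top__, std::shared_ptr replaces the library's own reference counting, and the Copy argument is omitted.

```cpp
#include <cassert>
#include <memory>
#include <string>

struct Body { virtual ~Body() {} };                  // stands in for Top__

class Handle {                                       // stands in for Top
    std::shared_ptr<Body> body_;                     // replaces the library's counting
protected:
    explicit Handle(Body* raw) { if (raw) body_.reset(raw); }
public:
    Handle() {}
    bool null() const { return !body_; }
    bool not_null() const { return !null(); }
};

struct NameBody : Body {
    std::string data_;
    explicit NameBody(const std::string& data) : data_(data) {}
};

// Non-nullary case (like Atom): the default ctor leaves the handle null.
class Name : public Handle {
public:
    Name() {}
    Name(const std::string& data) : Handle(new NameBody(data)) {}
    Name(Body* raw) : Handle(raw) {}
    static Name nil() { return Name(static_cast<Body*>(0)); }
};

struct EmptyBody : Body {};

// Nullary case (like Empty): the default ctor allocates a body, so only
// nil() yields a null object.
class Empty : public Handle {
public:
    Empty() : Handle(new EmptyBody()) {}
    Empty(Body* raw) : Handle(raw) {}
    static Empty nil() { return Empty(static_cast<Body*>(0)); }
};
```

In this mock, Name() and Name::nil() are both null, whereas Empty() is not null and only Empty::nil() is.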
Alpha data types are the app equivalent of general algebraic types. The grammar portion for alpha data type declarations follows.
alpha-data-type-decl
-> '$data' id alpha-type-head? '=' sums root-code? ';'
sums
-> sum/'|'
sum
-> 'explicit'? id products code?
-> '0'
products
-> '(' product-list? ')'
product-list
-> product/','
product
-> product-decl accessor
product-decl
-> 'const'? alpha-type-spec '&'? id
accessor
-> '=>' '(' ')'
-> '=>' 'const' '(' ')'
-> '=>' 'const'
->
root-code
-> ':' handle-user-code
alpha-type-spec
-> id alpha-type-params?
alpha-type-params
-> '<' alpha-type-param/',' '>'
alpha-type-param
-> id alpha-type-params?
Here is an example of an alpha data type declaration together with a couple of function definitions taken (incompletely) from the beta.h library code.
$data List<class A> = 0 | Cons(A a, List<A> b) {
A a(A new_a) {return a_ = new_a;}
List<A> b(List<A> new_b) {return b_ = new_b;}
bool print(ostream& os);
} : {
A head() const;
List<A> tail() const;
int length() const;
};
template <class A>
bool Cons__<A>::print(ostream& os) {
show_begin(os, form_using_rtti() ? typeid(*this).name() :
form_using_zero() ? 0 : "Cons", "[");
os << a();
for (List<A> xs = b(); xs(); xs = xs.tail()) {
show_delim(os);
os << xs.head();
}
show_end(os, "]");
return true;
}
template <class A>
int List<A>::length() const {
int result = 0;
for (List<A> xs = *this;;) {
$match (xs) {
(Cons<A>(_, xs1)) => {
result++;
xs = xs1;
}
(0) => {break;}
}
}
return result;
}
Any alpha object can be null as for example when default constructed or when resulting from nil() function application. The specification of '0' as a sum component in an alpha data type declaration conveys the notion that zero (null) is a nominal or expected value for at least some objects of a given alpha type.
Looking at this from an implementation point of view, the handle/body idiom does not necessarily remove pointers, it just abstracts them. As pointers in general can be null so too can alpha objects, and so must be accounted for. As at least some pointers can be null as nominal values, the grammar provides for the explication of such values as sum components, and that is why 0 is a sum component for List<A> for example.
There are also general means for null testing: Top__* Top::operator (), as illustrated by the expression "xs()" in the for-loop of the print function above, bool Top::null(), and bool Top::not_null().
Some alpha types are naturally associated with singleton sets, in that they may have one value that can associate with the empty set, and other values that can associate with a non-empty singleton set of such types. In such cases it may be tempting to use null to represent associated empty set values, and the value itself to represent the associated singleton set element. Using null in this way can be error-prone however and is also insufficient if null itself can be an element, as there is then no way to distinguish between the empty set and a non-empty set with a single null element.
One way to deal with such cases is to use an alpha data type with a '0' sum, although that can be inconvenient if done just for that purpose. The beta.h library code provides a parametric type Single that may be generally used for such purposes; it models singleton sets using a '0' sum for the empty case.
$data Single<class A> = 0 | Single_Element(A a) : {
bool empty() const {return null();} // true if zero constructed
bool not_empty() const {return not_null();} // false if zero constructed
A element() const; // a if Single_Element(A a) constructed
};
Note that Single<Top> is valid, and Single_Element<Top>(Top()) is a singleton set with a null element resulting from application of Top(). This in effect provides an alternate way to represent boolean values, with a non-empty set akin to "true", and the empty set akin to "false".
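The distinction between an empty set and a singleton set holding a null element can be shown with a plain C++ analogue. This is not the beta.h Single type (which uses a '0' sum for the empty case); the class below is a hypothetical stand-in that tracks emptiness with a flag, and raw pointers stand in for alpha objects that may be null.

```cpp
#include <cassert>

// A set holding at most one element. Emptiness is tracked separately from
// the element, so a null element does not make the set look empty.
template <class A>
class Single {
    bool has_element_;
    A element_;
public:
    Single() : has_element_(false), element_() {}
    explicit Single(const A& a) : has_element_(true), element_(a) {}
    bool empty() const { return !has_element_; }
    bool not_empty() const { return has_element_; }
    const A& element() const { return element_; }   // meaningful only if not_empty()
};
```

A Single<const char*> constructed from a null pointer is non-empty yet holds a null element, which a bare pointer used as its own "maybe empty" marker cannot express.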
Rather than using a '0' sum, it would be possible to define List<A> in the example above more traditionally.
$data List<class A> = Nil() | Cons(A a, List<A> b) ...
But then Nil and Nil__ would be classes in the translation and Nil() would be the handle class (nullary) ctor which according to the default skeletons would apply operator new when applied to yield a new body object. As body objects according to the provided runtime library require at a minimum space for a virtual function table pointer and reference counter, the net effect would be that applications of Nil<A>() for any type A would result not in some null object corresponding to the usual notion of "nil" but a real space consuming body object.
The Cons sum component for the List example has its own body-user-code in which two "setters" are defined and a custom print function. The setters provide update capabilities as by default the corresponding member variables in the body class Cons__ are not const qualified. There is a way to specify const qualification, by specifying a 'const' accessor construct. If it was desired for example that List<A> should be immutable rather than mutable, the declaration could be as follows.
$data List<class A> = 0 | Cons(A a => const, List<A> b => const) ...
Note that interface immutability can also be achieved simply by not providing interface update functions; using the const accessor construct serves to emphasize both interface and implementation immutability. If in addition custom access was desired for List<A>, the declaration could be as follows.
$data List<class A> = 0 | Cons(A a => const (), List<A> b => const ()) {
const A a() {return a_;} // user defined
const List<A> b() {return b_;} // user defined
...
} : {
...
};
The accessor construct provides the means for specifying various member variable access mechanisms.
accessor
-> '=>' '(' ')' -- mutable user defined access function
-> '=>' 'const' '(' ')' -- immutable user defined access function
-> '=>' 'const' -- immutable generated access function
-> -- mutable generated access function (default)
The product-decl construct provides the means for specifying various ctor parameter passing mechanisms.
product-decl -> 'const'? alpha-type-spec '&'? id
A product-decl translates directly to a ctor parameter declaration. For example a sum "Foo(const Bar& baz)" with product-decl "const Bar& baz" has a translated ctor that looks like "Foo(const Bar& baz) ...".
The product-decl construct does not apply to corresponding member variables. In particular there is no provision for specifying reference member variables either via the product-decl or accessor constructs. Alpha data types are in general instances of compound concrete data types - they contain other such types. Containment in this manner is an ownership rather than a referencing relation.
The sum grammar rule allows for optional specification of the 'explicit' keyword:
sum
-> 'explicit'? id products code?
-> '0'
The keyword has the same effect for generated itor handle class ctors that instantiate body objects as it has for generated base type ctors.
The root-code construct provides the means for extending (or defining) root classes. There is however a restriction as to what functions can be directly defined. The order in which translation classes are generated implies that there can be no direct references to itors from root classes, so functions are restricted accordingly. That is why List<A>::length() for example cannot be directly defined in the root class as part of the root-code construct (it is declared there but defined later).
The alpha-type-spec construct directly corresponds to the analogous C++ means for specifying references to named types that may have template arguments, noting that referenced types must be alpha types.
Alpha unit types are a special case of alpha data types, in that they have just one sum component, and are primarily devices of convenience because the name of the itor is the name of the type, eliminating the need to invent a new name for certain common cases. The grammar portion for alpha unit type declarations follows.
alpha-unit-type-decl
-> '$unit' 'explicit'? id alpha-type-head? products unit-code? ';'
unit-code
-> body-user-code? root-code?
The grammar rule allows for optional specification of the 'explicit' keyword, and has the same effect for generated handle class ctors that instantiate body objects as it has for generated base type ctors.
The beta.h Pair type provides a typical example.
$unit Pair<class A, class B>(A a, B b);
There is a special consideration regarding mutually recursive alpha typing contexts, further elaborated below.
Alpha type types are more or less the app equivalent of typedefs. They have essentially the same renaming role as typedefs but renamed types are also alpha types. The grammar portion for alpha type type declarations follows.
alpha-type-type-decl -> '$type' id alpha-type-head? '=' alpha-type-spec ';'
An example without the alpha-type-head construct follows.
$type Atoms = List<Atom>;
This translates directly to a typedef.
typedef List<Atom> Atoms;
The alpha-type-head construct is provided even though the translation is at the present time not valid C++ (it is there "just in case").
$type Xs<class X> = List<X>;
The translation follows.
template <class X> typedef List<X> Xs;
It is also possible to get an approximation of template typedefs using the --skeletons=f option when f contains something like the following.
$+ type_synonym
$temp_head$
struct $id$ : public $type_spec$ {
$id$() : $type_spec$() {}
$id$($type_spec$ arg) : $type_spec$(arg(), copyok) {}
};
$-
Given such a skeleton, the example above translates as follows.
template <class X>
struct Xs : public List<X> {
Xs() : List<X>() {}
Xs(List<X> arg) : List<X>(arg(), copyok) {}
};
Note that the --skeletons=f option is available via the $option syntactic construct, so it is possible to selectively generate appropriate translations through judicious use of the construct.
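For comparison, compilers that support C++11 or later accept an alias template, which names the same type rather than a derived one. The sketch below contrasts the two forms using a hypothetical stand-in for List<A>; it is not output produced by app, and the name XsStruct is introduced only for this illustration.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

template <class A>
struct List { std::vector<A> items; };   // hypothetical stand-in for the generated List<A>

// C++11 alias template: Xs<X> is the same type as List<X>.
template <class X>
using Xs = List<X>;

// Skeleton-style approximation: XsStruct<X> derives from List<X>, so it is a
// distinct type that merely converts to and from List<X>.
template <class X>
struct XsStruct : public List<X> {
    XsStruct() : List<X>() {}
    XsStruct(List<X> arg) : List<X>(arg) {}
};

static_assert(std::is_same<Xs<int>, List<int> >::value, "alias names the same type");
static_assert(!std::is_same<XsStruct<int>, List<int> >::value, "struct is a new type");
```

The practical difference is that overloads and template specializations written for List<X> apply to the alias unchanged, but see the struct form as a separate type.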
There is a little more to alpha type types than this apparent redundancy. In particular such types can be defined mutually recursive.
Alpha forward types provide for mutually recursive alpha types. The grammar portion for alpha forward type declarations follows.
alpha-forward-type-decl
-> '$data' id alpha-type-head? '$forward'? root-code? ';'
-> '$type' id alpha-type-head? '$forward'? ';'
Consider again the declaration for List<A>.
$data List<class A> = 0 | Cons(A a, List<A> b) ...
The List alpha type is self recursive, but not mutually recursive. Self recursive types are about as natural as non-recursive types insofar as alpha typing goes (they require no special considerations beyond those already in place). However mutually recursive types are beasts of a different sort.
Suppose it is desired to represent the parse of an alpha-type-params syntactic construct as an abstract syntax tree (ast), and further that the ast structure is to be defined as an alpha type. Here's the grammar for the construct again.
alpha-type-params
-> '<' alpha-type-param/',' '>'
alpha-type-param
-> id alpha-type-params?
A first attempt might look something like this.
$type Alpha_Type_Params $forward;
$unit Alpha_Type_Param(Leaf id, Alpha_Type_Params atps);
$type Alpha_Type_Params = List<Alpha_Type_Param>;
This won't work because as it happens unit types cannot be declared within the scope of a mutually recursive alpha typing context - i.e. the contextual scope defined by an initial forward declaration and its corresponding resolving declaration. The reason for this is that the translation for unit types takes advantage of the reduced space of app generated names.
Instead of generating a root class, an itor handle class, and corresponding body class, app generates only a body class and itor handle class that also serves as the root class. But this means that there can be no self recursive references (i.e. from the body class to the root class), as such references assume predefinition of the root class. For the sake of apparent consistency, app precludes the declaration of unit types within the scope of mutually recursive typing contexts. (It is an anomaly that the same restriction does not apply to base types - but then base types are anomalous by nature).
The remedy for this situation is to use an alpha data type rather than a unit type, because the generation order of translation classes for data types does allow for such recursive declarations. This requires an artificial name for the singleton sum component. A rule-of-thumb is to derive it from the type name, using 'a_' or 'an_' as a name prefix.
$type Alpha_Type_Params $forward;
$data Alpha_Type_Param =
an_Alpha_Type_Param(Leaf id, Alpha_Type_Params atps);
$type Alpha_Type_Params = List<Alpha_Type_Param>;
When an alpha data type or type type T1 is declared forward, and then another such type T2 is also declared forward but before the resolving declaration for T1, then T2 must be resolved prior to T1. In other words,
$data T1 $forward; ... $type T2 $forward; ... $type T2 = ...; ... $data T1 = ...;
is okay, but
$data T1 $forward; ... $type T2 $forward; ... $data T1 = ...; ... $type T2 = ...;
is not. The reason for this constraint is primarily aesthetic, and secondarily algorithmic.
App employs an algorithm for treating mutually recursive types that involves a "forward stack" that is pushed on forward declarations and popped on resolving declarations. It is best not to introduce any other (non-declaring) code that depends on alpha types still in the forward stack until after the stack empties as certain corresponding translation classes are not generated until after the stack becomes empty.
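The push/pop discipline and the resulting ordering rule can be sketched as follows. The class ForwardStack is hypothetical and only models the bookkeeping; rejecting an out-of-order resolving declaration is this sketch's reading of the T1/T2 rule above, not a description of app's internals.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Tracks forward declarations that still await their resolving declaration.
class ForwardStack {
    std::vector<std::string> pending_;
public:
    void declare_forward(const std::string& name) { pending_.push_back(name); }

    // Accepts a resolving declaration only for the most recent unresolved
    // forward declaration; returns false otherwise.
    bool resolve(const std::string& name) {
        if (pending_.empty() || pending_.back() != name) return false;
        pending_.pop_back();
        return true;
    }

    // True once every forward declaration has been resolved.
    bool empty() const { return pending_.empty(); }
};
```

After forward declarations of T1 and then T2, resolving T1 first is rejected, while resolving T2 and then T1 empties the stack.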
For mutually recursive alpha data types, the corresponding forward declaration allows a root-code construct that provides the means for extending (or defining) root classes. The same restrictions regarding function definitions described earlier for root classes apply here. The translation for forward data types includes root class generation for forward declarations but not for resolving declarations; app warns you if any resolving declaration has an associated root-code.
It should also be mentioned that there are no additional constraints or special considerations regarding parametric alpha types; they're first class with respect to recursion.
You may wish to examine the ast.h app source file as it contains the complete set of alpha type declarations for asts corresponding to the app input grammar.
Int is a base type for "boxed" native ints provided in beta.h:
$base Int(int data) {
bool print(ostream& os) {os << data(); return true;}
};
Notice the use of the constant '0' in the following expression.
Cons<Int>(0, 0)
The constant is being used in two ways here. First, for alpha base type Int app generates a corresponding handle class with a ctor argument of type int, and as the ctor is not explicit, 0 implicitly converts to Int(0). Second, for alpha type List<A> app generates a corresponding itor class Cons such that for any type A, the type of the second ctor argument is List<A>. For any alpha data type T, or base or unit type T without a nullary ctor, the default constructed object T() is defined to be the null object a.k.a. 0 (if T is a base or unit type that does have a nullary ctor, T() is not the null object). Hence the above expression example is equivalent to the following.
Cons<Int>(Int(0), List<Int>())
The default constructed object T() can be used instead of 0 in those cases when the use of 0 may either cause problems at compile time, or generate a possibly unexpected result at run time (provided that T() yields a null object). The latter case may arise for example when 0 is passed as an argument to a unary (single argument) ctor K, expecting 0 to denote a null argument of some alpha type A. But K(0) is equivalent to K(), making the K object itself null. Instead use K(A(0)), or equivalently (if A() yields the null object), K(A()). The static nil() function can be used instead of the default ctor for handle classes with nullary ctors.
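The K(0) pitfall follows from ordinary C++ overload resolution and can be reproduced with a simplified mock. The classes below are hypothetical stand-ins (Handle and Body for Top and Top__, std::shared_ptr for the library's reference counting, and no Copy argument); they are not the generated classes.

```cpp
#include <cassert>
#include <memory>

struct Body { virtual ~Body() {} };              // stands in for Top__

class Handle {                                   // stands in for Top
    std::shared_ptr<Body> body_;
public:
    Handle() {}
    Handle(Body* raw) { if (raw) body_.reset(raw); }
    bool null() const { return !body_; }
    Body* get() const { return body_.get(); }
};

struct A : Handle {                              // some alpha type with a non-nullary ctor
    A() {}
    A(Body* raw) : Handle(raw) {}
};

struct KBody : Body {
    A a_;
    explicit KBody(const A& a) : a_(a) {}
};

struct K : Handle {                              // unary itor taking an A
    K() {}
    K(A a) : Handle(new KBody(a)) {}
    K(Body* raw) : Handle(raw) {}                // the (Top__* handle, ...) form
    // Valid only when the K object is not null.
    const A& a() const { return static_cast<KBody*>(get())->a_; }
};
```

Here K(0) selects K(Body*), because converting 0 to a pointer is a standard conversion while converting 0 to A needs a user-defined one, so the K object itself is null. K(A(0)) and K(A()) both build a real K body whose a() is null.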
Generated handle classes for alpha types get a ctor with arguments of the following form (according to the default skeletons).
(Top__* handle, Copy docopy = ...)
The docopy argument is defaulted so the ctor can be unary and serve as an implicit conversion operator in certain cases, including those cases when 0 denotes the null object. This particular ctor form also has special uses relating to generated casting code, and to externally generated handles from e.g. yacc code.
The function Top::zero() provides another way to generate null alpha objects:
static Top__* Top::zero() {return 0;}
Thus the expression example above is also equivalent to the following.
Cons<Int>(Int(0), Top::zero())
Unary ctors can also cause potential problems given certain argument types. Consider again alpha type types. They are really just synonyms, so the following doesn't work (try it).
$type T1 $forward;
$type T2 = T1;
$type T1 = T2;
To break the circularity, one can introduce an intervening non-synonymous type, e.g.
$type T1 $forward;
$data T2 = U(T1 t1) | V(int x);
$type T1 = T2;
One might think then that U(U(U(V(42)))) ought to be well-defined (it is) and produce something like "U:{U:{U:{V:{42}}}}" as output. It doesn't - instead it produces "U:{V:{42}}".
The problem involves overload resolution of unary ctors like U. The expression U(V(42)), while of type T2 (or T1), is treated by the compiler as "const Top&" instead of T2 (T1) when used as an argument for the U ctor. The Top copy ctor is favored for overload resolution, so U(U(...(U(V(42)))...)) winds up being equivalent to U(V(42)). (Or at least that's what the compiler used seemed to indicate). The following does produce the expected result.
T2 a = U(V(42));
T2 b = U(a);
T2 c = U(b);
cout << c << endl; // "U:{U:{U:{V:{42}}}}"
The problem does not occur with non-unary ctors, because only unary ctors can serve as implicit conversion operators, and it does not occur when the type of a unary ctor argument is not identical to (or synonymous with) the associated class (accounting also for parameterization), because then copy construction cannot apply.
The problem can also be avoided by defining a constructor function as a static member of a problematic itor, e.g.
$type T1 $forward;
$data T2 = U(T1 t1) {} {static T2 u(T1 t1) {return U(t1);}} | V(int x);
$type T1 = T2;
...
cout << U::u(U::u(U::u(V(42)))) << endl; // expected result
The same method may also be useful for avoiding some of the problems related to implicit conversion mentioned above.
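The collapsing of nested unary ctor calls, and both workarounds, can be reproduced with plain structs. This is a sketch under simplifying assumptions: T2, U, and V here are hand-written stand-ins rather than generated classes, a depth counter replaces the printed output, and the copy in question is U's own implicit copy/move ctor rather than the Top copy ctor.

```cpp
#include <cassert>

struct T2 {                                   // stands in for the root class
    int depth;                                // number of U wrappers applied
    explicit T2(int d = 0) : depth(d) {}
};

struct V : T2 {
    explicit V(int) : T2(0) {}
};

struct U : T2 {
    U(const T2& inner) : T2(inner.depth + 1) {}
    // Factory returning the root type, so a nested call passes a T2, not a U.
    static T2 u(const T2& inner) { return U(inner); }
};

// U(U(x)) has an argument of type U, so overload resolution prefers U's
// implicit copy/move ctor (or the copy is elided) over U(const T2&).
inline int nested_depth()  { return U(U(U(V(42)))).depth; }

// Named variables of type T2 force U(const T2&) at every step.
inline int stepped_depth() { T2 a = U(V(42)); T2 b = U(a); T2 c = U(b); return c.depth; }

// The static factory has the same effect without intermediate variables.
inline int factory_depth() { return U::u(U::u(U::u(V(42)))).depth; }
```

In this mock the directly nested form ends with a depth of 1, matching the collapsed "U:{V:{42}}" output described above, while the stepped and factory forms both reach a depth of 3.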
To summarize, unary ctors can serve as implicit conversion operators, and copy ctors can serve as implicit (and possibly unexpected) identity operators, so as usual one should be a little careful with them. Identity operator problems can be avoided by using static constructor functions for problematic itors; these may also help with some of the problems related to implicit conversion (the 'explicit' keyword may also be useful).