|
Cat SpecificationCat has three different levels of specification: level-0, level-1, and level-2. This is so that tools can identify their degree of compliance with the specification. Each level of the specification specifies a number of primitive functions. These functions can be built-in, or can be defined in an implicitly linked standard library. Core Language (Level-0)The core Cat Language is known as Level-0 Cat. The core Cat language is a pure functional language. The core language consists of four types:
The core Cat primitive functions are:
Basic ConceptsNames (Identifiers)Cat names (a.k.a. identifiers) are similar to the Scheme language. They consist of a consecutive sequence of letters, numbers, and symbol characters ( ~ ` ! @ # $ % ^ & * - + = | : ; < > . ? / ). However, an implementation of Cat that is only level-0 compliant might restrict names to those that consist of lowercase alphanumeric english letters, the numbers 0..9, and the underscore (a total of 37 characters). This would be to reduce the burden on letter encoding where memory space is particularly precious. Cat is normally case-sensitive, but we can expect a level-0 compliant version to be case-insensitive. The following characters are also valid names: ( ) ,, as is the paranthesis pair enclosing nothing (). Note: ( ) is two names, whereas () is one. One restriction is that a name can not start with a number. Examples of valid names are: thisisaverylongname q n42 is-big? ++ <html> ( =^,,^= DefinitionsNew functions are defined in Cat using the define keyword. A function has the form define definition-name (: type-declaration)? metadata? { expression }Example of a simple function declaration: define succ { 1 + } Type SystemCat Type SystemCat has a strong static type system. Every expression in a valid Cat program has a single well-defined type that can be determined through static code analysis using a decidable algorithm. Cat also allows dynamic type-checking. The Cat type system consists of simple types (first-order types without quantifiers) enriched with type schemes (simple types with outer-most universal quantifiers), and row variables. Type-inference of Cat expressions is possible using a unification algorithm based on the Hindley-Milner type reconstruction algorithm, as extended by Damas to support polymorphism. A Cat type is either a simple built-in type, a function type, a type variable or a type vector variable. Function Type Declarations (Stack Effect Diagrams)Function type declarations (also called type annotations, type signatures, and stack-effect diagrams) are optional. All valid Cat functions and expressions can have the type inferred (i.e. reconstructed). A type-declaration is useful as a form of documentation or automated testing tool. If the declared type is not the same as the inferred type, then the Cat program is invalid. Known Issue Type checking is currently disabled in the beta implementation, but type-inference currently works. A type declaration indicates the consumption and production of a Cat function. The consumption indicates the configuration of types on the stack that a function requires on the stack to execute. The production is the configuration of types on the stack that a function promises to leave in place of the consumption after execution. The form of a pure function declarations is: : ( consumption -> production )If a function has side-effects the function declaration is written instead as: : ( consumption ~> production ) Type declarations consist of concrete types (e.g. int, bool, string, etc.), type variables which are identified by an apostrophe and a name starting with a lowercase letter (e.g. 'a, 'myvar) and type-vector variables which are are identified by an apostrophe and a name starting with an uppercase letter ('A, 'Myvar). Some examples of type-declarations: inc : (int -> int) eqz : (int -> bool) eq : ('a 'a -> bool) swap : ('a 'b -> 'b 'a) dup : ('a -> 'a 'a) pop : ('a -> ) if : ('A bool ('A -> 'B) ('A -> 'B) -> 'B) The consumption and production are expressed as a vector of types t0 t1 ... tn. Simple Built-In TypesA minimal Cat implementation is expected to provide the following primitive or built-in types: Level 0
Level 1
Level 2
Type VariablesA type variable corresponds to a single type on the stack or in a function signature. Type variables are identified by starting with a single back-quote character "'" followed by a lower-case letter, and zero or more letters of either case. Stack Variables: Type Vector VariablesA stack variable (or type vector variable) corresponds to zero or more types on the stack or in a function signature. Stack variables are identified by a single back-quote character "'" followed by an upper-case letter, and zero or more letters of either case. Implicit Stack Variables: Row VariablesA function type where either the left-most position of a consumption or production is not a stack variable, means that the type has an implicit stack variable (called either the row variable or "rho" variable) that represents the rest of the stack. Cat types use what could be termed a dishonest notation since every function has an implicit stack variable. For example the integer addition function is expressed as: For example the dup primitive function has the type written as ('a -> 'a 'a) but to be precise the type is: ('R 'a -> 'R 'a 'a). Where 'R is the row variable. Notice that the apply does not require a row variable: ('A ('A -> 'B) -> 'B) because the stack variables 'A and 'B refer to the entire stack. Universal QuantificationAll type variables and stack variables within a function type, are universally quantified at the outer-most level. In other words given a type signature such as: ('a ('a -> 'b) -> )an implicit forall operator occurs at the outer most level for all variables ('a, 'b and the row variables). So the full type is in actuality: forall R. forall S. forall a. forall b. ('R 'a ('S 'a -> 'S 'b) -> 'R) Implementing a Type Inference Engine for CatFor information on how to construct a type-inference engine for Cat see: Luca Cardelli's paper Basic Polymorphic Typechecking at http://research.microsoft.com/Users/luca/Papers/BasicTypechecking.pdf and Benjamin Pierce's book, Types and Programming Languages). Public domain source code of a type inference (i.e. type reconstruction) algorithm in C# for Cat can be found at http://code.google.com/p/cat-language/downloads/list. SemanticsTermA term in the Cat language is defined as follows: term ::== prim // primitive function def // defined function lit // literal [term] // quotation term term // juxtaposition empty // empty term Every term corresponds to a function on stacks that maps from an input stack to output stack. Cat terms can not refer to previous stack states, which allows an implementation to use a single shared stack. A copy of the current stack can be created through a closure or continuation. Every term pops zero or more values by the stack and pushes zero or more values from the stack. The types of the values popped from the stack is known as the consumption, and the types of the values pushed onto the stack is known as the production. Given a particular configuration of types on the input stack, the size and shape of the consumption and production of a function must always be the same. Primitive FunctionA primitive function is a predefined function. The set of primitive functions is defined by the level of compatibility with the Cat standard. Defined FunctionA defined function is a user-defined function. LiteralsLiterals are functions that consume no values and push a single value onto the stack. Examples of literals include numbers (e.g. 42), strings (e.g. "hello"), and characters (e.g. 'q'). QuotationsThe quotation of a term pushes an anonymous function consisting of those terms onto the stack. A quotation is a literal function object. JuxtapositionThe juxtaposition of terms (placing two terms side by side) has the result of composing the stack functions of the two terms. The juxtaposition of terms is sometimes called concatenation, hence the term concatenative language. QuotationsA quotation is an anonymous function pushed onto the stack without execution. There are three methods for constructing anonymous functions:
Quotations can be evaluated (or executed or dequoted depending on your terminology preference) using the apply function (among others). Functions that expect quotations as arguments are often called combinators. CombinatorsA combinator is a term that is used somewhat loosely to refer to any function that accepts a quotation as one of its arguments. The apply combinator evaluates a function on the top of the stack. The dip combinator executes a function on the top of the stack, but first removes the element below it, pushing the element on top upon completion. >> 1 2 3 [swap] dip stack: 2 1 3In effect dip provides an auxiliary storage mechanism that emulates a secondary stack. Other combinators, such as if and while are used to affect control flow. Conditional ExecutionConditional execution can be perfomed using the primitive function if. The if combinator consumes two functions and a boolean value. Depending on whether the boolean value is true or false it executes the second function or top function respectively. In other words it has the following type: if : ('A bool ('A -> 'B) ('A -> 'B) -> 'B) LoopsThere are several primitive combinators which behave as looping constructs: while, whilen, whilenz, whilene, for, for_each, rfor, repeat. RecursionRecursive DefinitionsCat allows recursive definitions. For example the factorial function can be expressed as: define rec_fac { dup 1 <= [pop 1] [dec rec_fac *] if } Binary RecursionCat provides a bin_rec primitive for performing binary recursive computations. Fixed Point Combinator (Y Combinator)Recursion can also be achieved using the y combinator. Side EffectsFunctions in Cat may have side-effects. Functions with side-effects have a different type than those without and are annotated using ~> instead of ->. Functions without side-effects are called pure functions, otherwise they are called impure. Impurity is polluting. Passing an impure function as an argument to a function will cause the type of that function to be impure. Examples of side-effects include:
A pure expression will always return the same result given the same arguments on the top of the stack. Well TypednessWell Typed ProgramsA Cat program is well-typed if and only if the program as a term is well-typed given an empty stack [and all type constraints are satisfied]. Well Typed TermsA term constructed by juxtaposing two sub-terms is well-typed if and only if ther exists a stack which when consumed by by the first term will produce a stack that can be consumed by the second term. A quotation is well-typed if and only the term enclosed by square brackets is well-typed. The empty term is well-typed. PrimitivesFor a full list of primitives, see primitives.html. MetaDataMetadata blocks are a standardized comment format for defined functions. Metadata blocks are delimited by "{{" and "}}". Metadata blocks cannot be nested. Metadata should not affect whether a Cat program be compiled or not, and can be reduced to whitespace by a parser. The metadata format allows Cat programmers and tools to standardize the layout of documentation, automated tests, and other similar kind of data. A metadata block has a labeled-node tree structure very similar to YAML. The start of a metadata block is indicated by two opening curly braces in succession (e.g. {{) which occur on a single line surrounded only by whitespace. The end of a metadata block is indicated by two closing curly braces in succession (e.g. }}) which also must occur on its own line. Node labels are followed by a colon character (e.g. "my-node:"). The node data immediately follows the colon chararacter and may occur on multiple lines. The node data is terminated when a new label is encountered. The indentation of node labels indicates the nesting structure. Example of a function declaration with metadata: define fact : (int -> int) {{ desc: A simple factorial function precondition: dup 0 gteq test: in: 5 fact out: 120 tags: demo,algorithms }} { dup eqz [pop 1] [dup dec fact mul_int] if } Metadata ConventionsMetadata is organized by convention. The convention used by the current standard library is to provide the following optional node-types:
SyntaxA Cat program consists of definitions, words, and quotations. A word is an identifier or literal which can either be defined in the program, or is a primitive definition. An identifier is a consecutive series of of letters, numbers, and symbols, that does not start with a number or a minus sign. The following are examples of valid identifiers: hello, _hello, h3ll0, +3+, =^,,^= A Cat literal can be a decimal integer (e.g. 234 or -12), binary integer (e.g. 0b100101110), hexadecimal integer (e.g. 0xD166E8), floating point number (e.g. 0.01), a character (e.g. 'q' or '\n') or a string (e.g. "Hello World!\n". A quotation is a one or more identifiers, quotations, and/or literals surrounded by square brackets: [ and ]. Any sequence of identifiers, quotations, and/or literals is called an expression or term. A definition equates a term with a new name. An example of a definition is: define f { 1 + 2 * } Definitions may have optional type signatures: define f : (int -> int) { 1 + 2 * } Cat GrammarBelow is the Cat grammar expressed in an informal parsing expression grammar (PEG) form: program ::== (instruction | definition)* definition ::== define identifier arguments (: type_decl)? (: meta_data)? block block ::== { operation* } instruction ::== identifier | literal literal ::== quotation | string | char | float | integer quotation ::== [ operation* ] arguments ::== ( arg* ) arg ::== alphanum+ type_decl ::== pure_fxn_type_decl | impure_fxn_type_decl pure_fxn_type_decl ::== ( type_element* -> type_element* ) impure_fxn_type_decl ::== ( type_element* ~> type_element* ) type_element ::== type_name | pure_fxn_type_decl | impure_fxn_type_decl | type_variable | type_vector_variable named_type ::== type_name=unnamed_type type_variable ::== ' lower_case_letter alphanum* type_vector_variable ::== ' upper_case_letter alphanum* identifer ::== alphanum+ | group_symbol* | single_symbol | escaped_char ::== \ not_newline not_newline ::== ^newline any_char string_char ::== escaped_char | ^" not_newline char_char ::== escaped_char | ^' not_newline string ::== " string_char* " char ::== ' char_char ' float ::== integer.integer integer ::== number+ number ::== [0..9] single_symbol ::== [,,(,)] group_symbol ::== [~,#,$,%,^,&,*,-,+,=,|,\,:,;,<,>,.,?,/] alphanum ::== letter | number letter ::== [a..z,A..Z,_] comment ::== single_line_comment | multi_line_comment single_line_comment ::== // anychar* \n multi_line_comment ::== /* anychar* */ meta_data ::== \n{{\n anychar* \n}}\n Grammar for Cat Meta-DataDefinitions may also contain a special kind of comment that contains meta-data. For example: define f : (int -> int) {{ desc: Adds one and then multiplies by two test: in: 3 f out: 8 test: in: 5 f out: 12 }} { 1 + 2 * } This meta-data can be used by tools but can be stripped out without affecting the meaning of program. The meta-data entries conform to a tree structure according to the level of indentation (i.e. number of space and tab characters) that proceeds the meta-data label for that entry. In other-words all meta-data entries that have the same amount of indentation are considered siblings in a meta-data tree. Spaces and tabs are considered equivalent for the purpose of establishing a meta-data entry's position in the tree. meta_data_block ::== \n{{\n meta_data_entry* \n}}\n meta_data_entry ::== meta_data_label meta_data_wspace* meta_data_content meta_data_wspace ::== space | tab meta_data_label ::== meta_data_wspace ident : Scheme EvaluatorA definition of the core Cat language is expressed by the following evaluation function written in Scheme: (define (cat-eval stk exp) (if (null? exp) stk (cat-eval (case (car exp) ((add_int)(cons (+ (car stk) (cadr)) (cddr stk))) ((and) (cons (and (car stk) (cadr stk)) (cddr stk))) ((apply) (cat-eval (cdr stk) (car stk))) ((compose)(cons (append (car stk) (cadr stk)) (cddr stk))) ((cons) (cons (list (cadr stk) (car stk)) (cddr stk))) ((dec) (cons (- (car stk) 1) (cdr stk))) ((dip) (cons (cadr stk) (cat-eval (cddr stk) (car stk)))) ((div_int)(cons (/ (car stk) (cadr)) (cddr stk))) ((dup) (cons (car stk) stk)) ((eq) (cons (= (cadr stk) (car stk)) (cddr stk))) ((eq) (cons (empty? (car stk)) (cdr stk))) ((false) (cons #f stk)) ((if) (if (caddr stk) (cat-eval (cdddr stk) (cadr stk)) (cat-eval (cdddr stk) (car stk)))) ((inc) (cons (+ (car stk) 1) (cdr stk))) ((list) (cons (list (car stk)) (cdr stk))) ((mod_int)(cons (% (cadr stk) (car)) (cddr stk))) ((mul_int)(cons (* (car stk) (cadr)) (cddr stk))) ((not) (cons (not (car stk)) (cdr stk))) ((or) (cons (or (car stk) (cadr stk)) (cddr stk))) ((papply) (cons (cons (cadr stk) (car stk)) (cddr stk))) ((pop) (cdr stk)) ((quote) (cons (list (car stk)) (cdr stk))) ((sub_int)(cons (- (car stk) (cadr)) (cddr stk))) ((swap) (cons (cadr stk) (cons (car stk) (cddr stk)))) ((true) (cons #t stk)) ((uncons) (cat-eval (cdr stk) (car stk))) (else (cons (car exp) stk))) (cdr exp)))) copyright Christopher Diggins, 2007 |