Cat Specification

Cat has three different levels of specification: level-0, level-1, and level-2. This is so that tools can identify their degree of compliance with the specification. Each level of the specification specifies a number of primitive functions. These functions can be built-in, or can be defined in an implicitly linked standard library.

Core Language (Level-0)

The core Cat Language is known as Level-0 Cat. The core Cat language is a pure functional language.

The core language consists of four types:

  • int - integer value
  • bool - boolean value
  • list - abstract collection
  • var - polymophic variant

The core Cat primitive functions are:

nametypedescription
add_int(int int -> int)Adds the top two values on the stack.
and(bool bool -> bool)Computes the boolean 'and' operator.
apply('A ('A -> 'B) -> 'B)Applies a function on the stack to the rest of the stack.
compose(('A -> 'B) ('B -> 'C) -> ('A -> 'C))Composes two functions.
cons(list 'a -> list)Add an item to the beginning of a list.
dec(int -> int)Decrements an integer.
dip('A 'b ('A -> 'C) -> 'C 'b)Executes a function, temporarily removing the next item on the stack.
div_int(int int -> int)Divides top value from the second value.
dup('a -> 'a 'a)Duplicates top item on the stack.
eq('a 'a -> bool)Compares two items for equality.
empty(list -> list bool)Returns true if the list is empty.
false( -> bool)Pushes the boolean value true onto the stack.
if('A bool ('A -> 'B) ('A -> 'B) -> 'B)Conditionally executes one of two functions.
inc(int -> int)Increments an integer.
list('a -> ( -> 'a))Quotes a value, creating a constant generating function.
mod_int(int int -> int)Computes the remainder of dividing the top value from the second value.
mul_int(int int -> int)Multiplies the top two values on the stack.
not(bool -> bool)Negates a boolean value.
or(bool bool -> bool)Computes the boolean 'or' of top value.
papply('a ('B 'a -> 'C) -> ('B -> 'C))Performs partial application.
pop('a -> )Removes top item from the stack.
quote('a -> ( -> 'a))Quotes a value, creating a constant generating function.
sub_int(int int -> int)Subtract the top value on the stack from the second.
swap('a 'b -> 'b 'a))Swaps the top two values on the stack.
true( -> bool)Pushes the boolean value true onto the stack.
uncons(list -> list var)Returns the first item of a list, and the rest of the list

Basic Concepts

Names (Identifiers)

Cat names (a.k.a. identifiers) are similar to the Scheme language. They consist of a consecutive sequence of letters, numbers, and symbol characters ( ~ ` ! @ # $ % ^ & * - + = | : ; < > . ? / ). However, an implementation of Cat that is only level-0 compliant might restrict names to those that consist of lowercase alphanumeric english letters, the numbers 0..9, and the underscore (a total of 37 characters). This would be to reduce the burden on letter encoding where memory space is particularly precious. Cat is normally case-sensitive, but we can expect a level-0 compliant version to be case-insensitive.

The following characters are also valid names: ( ) ,, as is the paranthesis pair enclosing nothing (). Note: ( ) is two names, whereas () is one. One restriction is that a name can not start with a number. Examples of valid names are:

      thisisaverylongname
      q
      n42
      is-big?
      ++
      <html>
      (
      =^,,^=

Definitions

New functions are defined in Cat using the define keyword. A function has the form

      define definition-name (: type-declaration)? metadata?
      { expression }
    
Example of a simple function declaration:
      define succ
      { 1 + }
    

Type System

Cat Type System

Cat has a strong static type system. Every expression in a valid Cat program has a single well-defined type that can be determined through static code analysis using a decidable algorithm. Cat also allows dynamic type-checking.

The Cat type system consists of simple types (first-order types without quantifiers) enriched with type schemes (simple types with outer-most universal quantifiers), and row variables. Type-inference of Cat expressions is possible using a unification algorithm based on the Hindley-Milner type reconstruction algorithm, as extended by Damas to support polymorphism.

A Cat type is either a simple built-in type, a function type, a type variable or a type vector variable.

Function Type Declarations (Stack Effect Diagrams)

Function type declarations (also called type annotations, type signatures, and stack-effect diagrams) are optional. All valid Cat functions and expressions can have the type inferred (i.e. reconstructed). A type-declaration is useful as a form of documentation or automated testing tool. If the declared type is not the same as the inferred type, then the Cat program is invalid.

Known Issue Type checking is currently disabled in the beta implementation, but type-inference currently works.

A type declaration indicates the consumption and production of a Cat function. The consumption indicates the configuration of types on the stack that a function requires on the stack to execute. The production is the configuration of types on the stack that a function promises to leave in place of the consumption after execution.

The form of a pure function declarations is:

      : ( consumption -> production )
If a function has side-effects the function declaration is written instead as:
      : ( consumption ~> production )

Type declarations consist of concrete types (e.g. int, bool, string, etc.), type variables which are identified by an apostrophe and a name starting with a lowercase letter (e.g. 'a, 'myvar) and type-vector variables which are are identified by an apostrophe and a name starting with an uppercase letter ('A, 'Myvar).

Some examples of type-declarations:

      inc  : (int -> int)
      eqz  : (int -> bool)
      eq   : ('a 'a -> bool)
      swap : ('a 'b -> 'b 'a)
      dup  : ('a -> 'a 'a)
      pop  : ('a -> )
      if   : ('A bool ('A -> 'B) ('A -> 'B) -> 'B)

The consumption and production are expressed as a vector of types t0 t1 ... tn.

Simple Built-In Types

A minimal Cat implementation is expected to provide the following primitive or built-in types:

Level 0
  • bool - a boolean value containing True or False
  • int - a signed integer of implementation defined size (greater than or equal to 32 bits)
  • var - a dynamically checked variant
  • list - a collection of variants
Level 1
  • byte - an unsigned integer with a size of precisely 8 bits
  • char - an ascii character repesentation of implementation defined size
  • string - a sequence of ascii characters (note: not a subtype of list, but this may change)
  • dbl - a double precision floating point value of implementation defined size (greater than or equal to 32 bits)
  • bit - a single binary value
  • function - a dynamically checked function type
  • type - a value that represents the dynanmic
Level 2
  • hash_list - an associative collection type
  • istream - a handle to an input stream (such as a file, pipe, or console handle)
  • ostream - a handle to an output stream (such as a file, pipe, or console handle)

Type Variables

A type variable corresponds to a single type on the stack or in a function signature. Type variables are identified by starting with a single back-quote character "'" followed by a lower-case letter, and zero or more letters of either case.

Stack Variables: Type Vector Variables

A stack variable (or type vector variable) corresponds to zero or more types on the stack or in a function signature. Stack variables are identified by a single back-quote character "'" followed by an upper-case letter, and zero or more letters of either case.

Implicit Stack Variables: Row Variables

A function type where either the left-most position of a consumption or production is not a stack variable, means that the type has an implicit stack variable (called either the row variable or "rho" variable) that represents the rest of the stack.

Cat types use what could be termed a dishonest notation since every function has an implicit stack variable. For example the integer addition function is expressed as: For example the dup primitive function has the type written as ('a -> 'a 'a) but to be precise the type is: ('R 'a -> 'R 'a 'a). Where 'R is the row variable. Notice that the apply does not require a row variable: ('A ('A -> 'B) -> 'B) because the stack variables 'A and 'B refer to the entire stack.

Universal Quantification

All type variables and stack variables within a function type, are universally quantified at the outer-most level. In other words given a type signature such as:

      ('a ('a -> 'b) -> )
an implicit forall operator occurs at the outer most level for all variables ('a, 'b and the row variables). So the full type is in actuality:
      forall R. forall S. forall a. forall b. ('R 'a ('S 'a -> 'S 'b) -> 'R)

Implementing a Type Inference Engine for Cat

For information on how to construct a type-inference engine for Cat see: Luca Cardelli's paper Basic Polymorphic Typechecking at http://research.microsoft.com/Users/luca/Papers/BasicTypechecking.pdf and Benjamin Pierce's book, Types and Programming Languages). Public domain source code of a type inference (i.e. type reconstruction) algorithm in C# for Cat can be found at http://code.google.com/p/cat-language/downloads/list.

Semantics

Term

A term in the Cat language is defined as follows:

      term ::==
        prim      // primitive function
        def       // defined function
        lit       // literal
        [term]    // quotation
        term term // juxtaposition
        empty     // empty term
    

Every term corresponds to a function on stacks that maps from an input stack to output stack. Cat terms can not refer to previous stack states, which allows an implementation to use a single shared stack. A copy of the current stack can be created through a closure or continuation.

Every term pops zero or more values by the stack and pushes zero or more values from the stack. The types of the values popped from the stack is known as the consumption, and the types of the values pushed onto the stack is known as the production.

Given a particular configuration of types on the input stack, the size and shape of the consumption and production of a function must always be the same.

Primitive Function

A primitive function is a predefined function. The set of primitive functions is defined by the level of compatibility with the Cat standard.

Defined Function

A defined function is a user-defined function.

Literals

Literals are functions that consume no values and push a single value onto the stack. Examples of literals include numbers (e.g. 42), strings (e.g. "hello"), and characters (e.g. 'q').

Quotations

The quotation of a term pushes an anonymous function consisting of those terms onto the stack. A quotation is a literal function object.

Juxtaposition

The juxtaposition of terms (placing two terms side by side) has the result of composing the stack functions of the two terms. The juxtaposition of terms is sometimes called concatenation, hence the term concatenative language.

Quotations

A quotation is an anonymous function pushed onto the stack without execution.

There are three methods for constructing anonymous functions:

  • using the quotation constructs "[" and "]".
  • using compose function to concatenate two functions on the stack to create a new one
  • using qv to create a nullary function that returns a constant value.

Quotations can be evaluated (or executed or dequoted depending on your terminology preference) using the apply function (among others). Functions that expect quotations as arguments are often called combinators.

Combinators

A combinator is a term that is used somewhat loosely to refer to any function that accepts a quotation as one of its arguments.

The apply combinator evaluates a function on the top of the stack.

The dip combinator executes a function on the top of the stack, but first removes the element below it, pushing the element on top upon completion.

      >> 1 2 3 [swap] dip
      stack: 2 1 3
In effect dip provides an auxiliary storage mechanism that emulates a secondary stack.

Other combinators, such as if and while are used to affect control flow.

Conditional Execution

Conditional execution can be perfomed using the primitive function if. The if combinator consumes two functions and a boolean value. Depending on whether the boolean value is true or false it executes the second function or top function respectively. In other words it has the following type:

      if : ('A bool ('A -> 'B) ('A -> 'B) -> 'B)

Loops

There are several primitive combinators which behave as looping constructs: while, whilen, whilenz, whilene, for, for_each, rfor, repeat.

Recursion

Recursive Definitions

Cat allows recursive definitions. For example the factorial function can be expressed as:

      define rec_fac
      { dup 1 <= [pop 1] [dec rec_fac *] if }
    

Binary Recursion

Cat provides a bin_rec primitive for performing binary recursive computations.

Fixed Point Combinator (Y Combinator)

Recursion can also be achieved using the y combinator.

Side Effects

Functions in Cat may have side-effects. Functions with side-effects have a different type than those without and are annotated using ~> instead of ->. Functions without side-effects are called pure functions, otherwise they are called impure.

Impurity is polluting. Passing an impure function as an argument to a function will cause the type of that function to be impure.

Examples of side-effects include:

  • file i/o
  • connecting to networks or devices
  • accessing RAM
  • generating random numbers
  • configuring the interpreter
  • throwing an exception
In Cat 'non-termination' is not considered an effect. This means that a function can be converted from a non-terminating function in to a terminating one without problem.

A pure expression will always return the same result given the same arguments on the top of the stack.

Well Typedness

Well Typed Programs

A Cat program is well-typed if and only if the program as a term is well-typed given an empty stack [and all type constraints are satisfied].

Well Typed Terms

A term constructed by juxtaposing two sub-terms is well-typed if and only if ther exists a stack which when consumed by by the first term will produce a stack that can be consumed by the second term.

A quotation is well-typed if and only the term enclosed by square brackets is well-typed.

The empty term is well-typed.

Primitives

For a full list of primitives, see primitives.html.

MetaData

Metadata blocks are a standardized comment format for defined functions. Metadata blocks are delimited by "{{" and "}}". Metadata blocks cannot be nested. Metadata should not affect whether a Cat program be compiled or not, and can be reduced to whitespace by a parser. The metadata format allows Cat programmers and tools to standardize the layout of documentation, automated tests, and other similar kind of data.

A metadata block has a labeled-node tree structure very similar to YAML. The start of a metadata block is indicated by two opening curly braces in succession (e.g. {{) which occur on a single line surrounded only by whitespace. The end of a metadata block is indicated by two closing curly braces in succession (e.g. }}) which also must occur on its own line. Node labels are followed by a colon character (e.g. "my-node:"). The node data immediately follows the colon chararacter and may occur on multiple lines. The node data is terminated when a new label is encountered. The indentation of node labels indicates the nesting structure.

Example of a function declaration with metadata:

      define fact : (int -> int)
      {{
        desc:
          A simple factorial function
        precondition:
          dup 0 gteq
        test:
          in: 5 fact
          out: 120
        tags:
          demo,algorithms
      }}
      {
        dup eqz
        [pop 1]
        [dup dec fact mul_int]
        if
      }
    

Metadata Conventions

Metadata is organized by convention. The convention used by the current standard library is to provide the following optional node-types:

  • desc - A short plain-text description of the function
  • precondition - An expression that if evaluated before a function is executed is expected to always leave true on the stack (not implemented)
  • postcondition - An expression that if evaluated after a function is executed is expected to always leave true on the stack (not implemented)
  • test - Contains to sub-nodes 'in' and 'out' that should returns return the same values if the definition is correct
  • tags - A comma delimited list of categories that the function belongs to
  • notes -A plain-text tag that provides extra information for developers
  • bugs - A plain-text tag that describes known issues with the function

Syntax

A Cat program consists of definitions, words, and quotations. A word is an identifier or literal which can either be defined in the program, or is a primitive definition.

An identifier is a consecutive series of of letters, numbers, and symbols, that does not start with a number or a minus sign. The following are examples of valid identifiers: hello, _hello, h3ll0, +3+, =^,,^=

A Cat literal can be a decimal integer (e.g. 234 or -12), binary integer (e.g. 0b100101110), hexadecimal integer (e.g. 0xD166E8), floating point number (e.g. 0.01), a character (e.g. 'q' or '\n') or a string (e.g. "Hello World!\n".

A quotation is a one or more identifiers, quotations, and/or literals surrounded by square brackets: [ and ].

Any sequence of identifiers, quotations, and/or literals is called an expression or term.

A definition equates a term with a new name. An example of a definition is:

  define f { 1 + 2 * }

Definitions may have optional type signatures:

  define f : (int -> int) { 1 + 2 * }

Cat Grammar

Below is the Cat grammar expressed in an informal parsing expression grammar (PEG) form:

    program ::==
      (instruction | definition)*

    definition ::==
      define identifier arguments (: type_decl)? (: meta_data)? block

    block ::==
      { operation* }

    instruction ::==
      identifier |
      literal

    literal ::==
      quotation |
      string |
      char |
      float |
      integer

    quotation ::==
      [ operation* ]

    arguments ::==
      ( arg* )

    arg ::==
      alphanum+

    type_decl ::==
      pure_fxn_type_decl |
      impure_fxn_type_decl

    pure_fxn_type_decl ::==
      ( type_element* -> type_element* )

    impure_fxn_type_decl ::==
      ( type_element* ~> type_element* )

    type_element ::==
      type_name |
      pure_fxn_type_decl |
      impure_fxn_type_decl |
      type_variable |
      type_vector_variable

    named_type ::==
      type_name=unnamed_type

    type_variable ::==
      ' lower_case_letter alphanum*

    type_vector_variable ::==
      ' upper_case_letter alphanum*

    identifer ::==
      alphanum+ |
      group_symbol* |
      single_symbol |

    escaped_char ::==
      \ not_newline

    not_newline ::==
      ^newline any_char

    string_char ::==
      escaped_char |
      ^" not_newline

    char_char ::==
      escaped_char |
      ^' not_newline

    string ::==
      " string_char* "

    char ::==
      ' char_char '

    float ::==
      integer.integer

    integer ::==
      number+

    number ::==
      [0..9]

    single_symbol ::==
      [,,(,)]

    group_symbol ::==
      [~,#,$,%,^,&,*,-,+,=,|,\,:,;,<,>,.,?,/]

    alphanum ::==
      letter |
      number

    letter ::==
      [a..z,A..Z,_]

    comment ::==
       single_line_comment |
       multi_line_comment

    single_line_comment ::==
        // anychar* \n

    multi_line_comment ::==
        /* anychar* */

    meta_data ::==
        \n{{\n anychar*  \n}}\n
  

Grammar for Cat Meta-Data

Definitions may also contain a special kind of comment that contains meta-data. For example:

  define f : (int -> int)
  {{
    desc:
      Adds one and then multiplies by two
    test:
      in: 3 f
      out: 8
    test:
      in: 5 f
      out: 12
  }}
  { 1 + 2 * }

This meta-data can be used by tools but can be stripped out without affecting the meaning of program.

The meta-data entries conform to a tree structure according to the level of indentation (i.e. number of space and tab characters) that proceeds the meta-data label for that entry. In other-words all meta-data entries that have the same amount of indentation are considered siblings in a meta-data tree. Spaces and tabs are considered equivalent for the purpose of establishing a meta-data entry's position in the tree.

    meta_data_block ::==
      \n{{\n meta_data_entry*  \n}}\n

    meta_data_entry ::==
      meta_data_label meta_data_wspace* meta_data_content

    meta_data_wspace ::==
      space | tab

    meta_data_label ::==
      meta_data_wspace ident :

Scheme Evaluator

A definition of the core Cat language is expressed by the following evaluation function written in Scheme:

    (define (cat-eval stk exp)
      (if (null? exp)
          stk
          (cat-eval
           (case (car exp)
             ((add_int)(cons (+ (car stk) (cadr)) (cddr stk)))
             ((and)    (cons (and (car stk) (cadr stk)) (cddr stk)))
             ((apply)  (cat-eval (cdr stk) (car stk)))
             ((compose)(cons (append (car stk) (cadr stk)) (cddr stk)))
             ((cons)   (cons (list (cadr stk) (car stk)) (cddr stk)))
             ((dec)    (cons (- (car stk) 1) (cdr stk)))
             ((dip)    (cons (cadr stk) (cat-eval (cddr stk) (car stk))))
             ((div_int)(cons (/ (car stk) (cadr)) (cddr stk)))
             ((dup)    (cons (car stk) stk))
             ((eq)     (cons (= (cadr stk) (car stk)) (cddr stk)))
             ((eq)     (cons (empty? (car stk)) (cdr stk)))
             ((false)  (cons #f stk))
             ((if)     (if (caddr stk)
                          (cat-eval (cdddr stk) (cadr stk))
                          (cat-eval (cdddr stk) (car stk))))
             ((inc)    (cons (+ (car stk) 1) (cdr stk)))
             ((list)   (cons (list (car stk)) (cdr stk)))
             ((mod_int)(cons (% (cadr stk) (car)) (cddr stk)))
             ((mul_int)(cons (* (car stk) (cadr)) (cddr stk)))
             ((not)    (cons (not (car stk)) (cdr stk)))
             ((or)     (cons (or (car stk) (cadr stk)) (cddr stk)))
             ((papply) (cons (cons (cadr stk) (car stk)) (cddr stk)))
             ((pop)    (cdr stk))
             ((quote)  (cons (list (car stk)) (cdr stk)))
             ((sub_int)(cons (- (car stk) (cadr)) (cddr stk)))
             ((swap)   (cons (cadr stk) (cons (car stk) (cddr stk))))
             ((true)   (cons #t stk))
             ((uncons) (cat-eval (cdr stk) (car stk)))
           (else
             (cons (car exp) stk)))
           (cdr exp))))

back to the Cat Programming Language home page
copyright Christopher Diggins, 2007