CGOL - an Algebraic Notation For MACLISP users V. R. Pratt 1/27/77 Common-Lisp annotations by G.J. Carrette 9/14/82 ABSTRACT MACLISP programmers who feel comfortable with ALGOL-like notation, that is, an algebraic style in which one might write a matrix multiply routine as for i in 1 to n do for k in 1 to n do (ac := 0; for j in 1 to n do ac := ac + a(i,j)*b(j,k); c(i,k) := ac) can now write LISP programs in just such a notation. This notation is essentially transparent to the MACLISP system, and files containing CGOL code (possibly mixed in with standard code) can be read by the interpreter and compiled by the compiler just as though they were written in straight LISP notation. 1. PHILOSOPHY 1.1 Representational Issues LISP S-expressions ("abstract" objects in a domain that contains atoms and is closed under the pairing function CONS) require an internal and an external representation ("concrete" objects). The former is for the convenience of the programmer, the latter for that of the machine. The functions (READ) and (PRINT) provide maps between the two concrete representations. We use the term "LISP" principally to denote the abstract objects; out of respect for usage, however, we shall also use it to denote "the" standard external representation, which varies mainly in detail from one implementation to the next. We rely on context to disambiguate these usages. We use "INT" to denote whatever internal representation obtains in a particular implementation; this may vary considerably between installations. 1.2 CGOL as an alternative external representation. This document describes an alternative external representation, CGOL, for those LISP S-expressions that encode procedural information. The representation is modeled on McCarthy's M-expression notation; as such it has Smith and Enea's MLISP language as a predecessor. See section 8 for a discussion of similarities and differences between MLISP and CGOL. In an environment that supports both CGOL and LISP external representation, we envisage facilities for mapping between representations as diagrammed below: READ CGOLREAD ------ ========> ----- <========= ------ | LISP | | INT | | CGOL | ------ <======== ----- =========> ------ PRIN1 CGOLPRIN1 This diagram of course generalizes to any set of external representations. The beauty of it is that the entire system described in this manual is no more than the implementation of CGOLREAD and CGOLPRINT. Thus the MACLISP user wishing to experiment with CGOL can immmediately begin by typing (CGOLREAD) to LISP and then typing in an expression to see what it returns. The results of such an exercise are given at the start of section 2. 1.3 Usage If LISP's top-level, a cut-down version of which might be (PRINT (EVAL (READ))) , is replaced by (CGOLPRINT (EVAL (CGOLREAD))) the top level should now expect CGOL notation, and reply in like notation. If you prefer standard notation for output (the default if you type (CGOL) as described below) then clearly you need (PRINT (EVAL (CGOLREAD))) . In MACLISP, the function (CGOL) will set the variable READ to CGOLREAD, achieving the above effect on MACLISP's top, break and edit levels. Alternatively, for users who want to use only CGOL notation, LISP can be invoked from top level via ":L CGOL;" and the LISP so loaded will be already set up to use CGOL notation. Both (CGOLREAD) and (CGOL) (but not CGOLPRINT) are autoloading. Thus if at any time while talking LISP you suddenly want or need to use CGOL notation, you need merely type (CGOL) at LISP and LISP will then expect CGOL expressions. To revert to standard notation, type EXIT$ . (Throughout this document we shall use $ to denote both altmode and dollar which may be used interchangeably in CGOL, except that when typing directly to LISP as opposed to creating a file of CGOL programs altmode should be used to ensure immediate response of the system, which treats only altmode as a "force-feed" character.) This implies that you may have a mixture of CGOL and LISP programs in a single file, provided that each block of CGOL code is preceded by (CGOL)$ (the $ is redundant but useful if for any reason (CGOL) is taken to be in CGOL notation - it is in fact legal CGOL notation provided the $ follows it) and is followed by =EXIT$ (the = forces the exit to be executed even if read but not eval'd by, say, the compiler). In Common-Lisp the function (CGOL) will set the readtable. Since this needs to happen at readtime an entire file of CGOL expressions should start with #.(CGOL) and end with =EXIT$. A more local way of entering CGOL expressions is to use the readmacro "#$" for example: #$ define fact(x) if x<1 then 1 else x*fact(x-1) $ 1.4 Design Considerations. The two principles that serve respectively as lower and upper bounds on the choice of CGOL notation are: (i) The notation should fairly match what the non-LISP community regards as a reasonable notation. In particular, users of ALGOL, FORTRAN, PL/I etc, should not experience difficulty in guessing the meanings of those constructs common to those languages, e.g. assignment, application, arithmetic and relational operations, and the more familiar control constructs. This requirement need not apply to constructs peculiar to LISP, such as LIST, CONS, LAMBDA, EVAL, QUOTE and so on. (ii) The notation should restrict itself to being a notation for LISP abstract objects, and not try to be a full-blown programming language with its own useful set of constructs. (This represents a point of departure from the MLISP philosophy.) Useful constructs should be added to LISP via the same channels (modulo choice of notation) that are used regularly by all LISP users to augment LISP, e.g. (DEFUN ...) or its CGOL equivalent. There is a tension between these two requirements that is at present not appreciated by the bulk of the computer science community. That tension is brought about by the two very different techniques for specifying the syntax of ALGOL-like languages and the semantics of LISP programs. The former is phrase oriented, the latter is token oriented. ("Phrase" and "token" may be replaced respectively by "non-terminal" and "terminal", to use the formal language terminology.) A typical syntax specification uses BNF, i.e. a context-free grammar, whereas LISP programs are specified in terms of the meaning of atoms. The rich non-terminal structure of, say, the syntax of ALGOL is replaced by a trivial non-terminal structure for LISP, namely all non-atomic programs are of the form (FUNCTION ARGUMENTS). Not all ALGOL-like languages are specified by their phrase structure. From time to time attempts are made to use the powerful macro facilities of assembly languages to implement "higher level" languages. Since macros are identified by single tokens, they constitute a token oriented specification. A limitation of the approach is that the macro identifier must usually appear before its arguments. Also a macro interpreter that allows nested calls must be used. The CGOL notation uses a token oriented approach to fit comfortably with constraint (ii). Unlike macro based approaches, a given token may have either zero or one argument preceding it, in addition to any number following it (suitably delimited). This gives rise to the familiar association problem, which we deal with using Floyd's notion of precedence functions. Each argument position of a token has an associated "binding power"; association is resolved in favor of the higher binding power. The binding power idea is due to Floyd, who called the binding powers "operator functions"; the term we use appears to have originated with CPL. The parsing algorithm we use differs from Floyd's in that ours is left-corner whereas Floyd's is pure bottom-up. The original version of the CGOL parser was implemented at Stanford in July 1970; its use of binding powers was copied soon after by Smith for the MLISP system. We discuss the details of the association problem in section 3. 2. EXAMPLES OF CGOL EXPRESSIONS The following examples of CGOL expressions, together with the effect of doing (PRINT (CGOLREAD)) on them (i.e. their LISP translations), are given below. To aid the eye we shall use upper case for LISP and lower case for CGOL. Note that CGOLREAD demands an ALTMODE or DOLLAR (not shown) after each CGOL expression. ALTMODE is better for TTY input since it causes immediate response (force-feed). In files, where force-feeding is irrelevant, DOLLAR is more convenient to insert using TECO. 1+1 (PLUS 1 1) [1, '2+2', sin(.37*x+1)] (LIST 1 '(PLUS 2 2) (SIN (PLUS (TIMES .37 X) 1))) \x,y; 1/sqrt(x**2 + y**2) (LAMBDA (X Y) (QUOTIENT 1 (SQRT (PLUS (EXPT X 2) (EXPT Y 2))))) toplevel := 'print *; eval read' (SSTATUS TOPLEVEL '(PROG2 (PRINT *) (EVAL (READ)))) car m & car m := cdr m (PROG2 NIL (CAR M) (RPLACA M (CDR M))) 'father' of x := 'brother' of relative of y (PUTPROP X (GET (GET Y RELATIVE) 'BROTHER) 'FATHER) father ofq x := brother ofq relative of y % analog of SETQ % (PUTPROP X (GET (GET Y RELATIVE) 'BROTHER) 'FATHER) a(i,j) := 3 (STORE (A I J) 3) if numberp i and -jb then a.((a+1) to b) (DEFUN TO (A B) (COND ((NOT (GREATERP A B)) (CONS A (TO (PLUS A 1) B))))) 3. GRAMMAR A CGOL expression is a string of sub-expressions and tokens. For example the expression "if a=b then 0 else i+1" has six constituents corresponding to the six items in the "construct" if a then b else c In this example those constituents are: the token "if" the sub-expression "a=b" the token "then" the sub-expression "0" the token "else" and the sub-expression "i+1". In turn, the sub-expression "a=b" has three constituents: the sub-expression "a" the token "=" and the sub-expression "b". And the sub-expression "a" has one constituent, the token "a". For those accustomed to BNF, the grammar of CGOL might look like: ::= if then [else ] ::= = ::= () ::= ; and so on. Since will be the only non-terminal, the left hand side of productions may be omitted without loss of information. Substituting variables for , we can then write CGOL rules as: if a then b [else c] a = b (a) a ; b and so on. As they stand, such rules are ambiguous. Does a=b;c mean (a=b);c or a=(b;c) ? The problem is that each of "=" and ";" could take b as its argument. We say that b associates with the one that takes it as argument. Thus if "print a+b" means "print(a+b)", "a" has associated with "+" in preference to "print". In CGOL, all such disputes are resolved by binding powers, a sort of syntactic version of atomic valences. Thus if the right binding power (rbp) of "=" is 10 and the left binding power (lbp) of ";" is 1, then a=b;c is read as (a=b);c because the right hand argument slot of "=" pulls harder on b than does the left hand argument slot of ";" . (Ties are broken by associating to the left.) Left and right binding powers are completely independent. Each is relevant when the expression can have a sub-expression at the left and right hand ends respectively. Thus the left binding power of "car" is irrelevant because "car a" does not have a sub-expression at the left. However, it has one at the right, so its right binding power is relevant. The opposite obtains for the suffix operator "isatom", which has a left binding power but no right binding power. In an expression like "if a then b else c", the right binding power applies to all three arguments even though only the last may actually be fought over by another operator to its right. In addition to the left-right association problem there is the "dangling else" problem, named after an instance of the problem: does "if a then if b then c else d" mean "if a then (if b then c else d)" or "if a then (if b then c) else d" ? This problem is just a variant of the left-right association problem; the argument "else d" could associate with either the first or the second "if". In CGOL, all such disputes are resolved by associating to the closer operator. (For those who liked the atomic-valence analogy, imagine an inverse (square?) law for distance holds.) The above two rules deal with most of the ambiguities that might arise in the CGOL grammar given below. The following table lists the explicitly defined CGOL constructs together with their translation into standard notation. Except where otherwise noted, a,b,...,z denote CGOL expressions and A,B,...,Z their corresponding standard forms. The table has three columns, the CGOL form, the left and right binding powers when relevant (only given once when they are the same, or when only one is relevant), and the translation. ---------BRACKETING OPERATORS--------- (a) 0 A f(a,b,...,z) 25 0 (F A B ... Z) [a,b,...,z] 0 (LIST A B ... C) a[b,c,...,z] 0 (MAPCAR (FUNCTION A) B C ... Z) a{b} 0 (APPLY (FUNCTION A) B) a[{b}] 0 (APPLY (FUNCTION MAPCAR) (CONS (FUNCTION A) B)) ---------QUOTING OPERATORS------- 'a' 0 'A "a" (where a is a string) '|a| ?a (where a is a character) /a #a (where a is any CGOL token) A !A (where A is a LISP S-expr) A ---------DECLARATIVE OPERATORS--------- \ a,b,..,p; q;r;..;z 0 (LAMBDA (A B ... P) Q R ... Z) let a=x,b=y,..,c=z;p;q;..;s 0 ((LAMBDA (A B ... C) P Q ... S) X Y ... Z) let x,y,..,z={a}; b;c..;h 0 (APPLY '(LAMBDA (X Y .. Z) B C .. H) A) (Think of {a} as "unpacking" a. This is consistent with its use above as f{a}. "let x,y={a}, u=b, v,w={c};..." will do what you would expect, translating into (APPLY '(LAMBDA (X Y U V W) ...) (APPEND A (LIST B) C)) .) prog a,b,..,p; q;r;..;z 0 (PROG (A B ... P) Q R ... Z) new a,b,..,p; q;r;..;z 0 (PROG (A B ... P) Q R ... (RETURN Z)) special a,b,...,z (DECLARE (SPECIAL A B ... Z)) ---------CONTROL OPERATORS-------- eval a 1 (EVAL A) a;b;...;z 1 0 (PROGN A B ... Z) a&b 1 0 (PROG2 NIL A B) if a then b [else c] 2 (COND (A B) [(C)]) return a 1 (RETURN A) while a do b 2 (DO NIL ((NOT A)) B) for i in l, j in m do f 2 (MAPC (FUNCTION (LAMBDA (I J) F)) L M) for i in a to b do f 2 (DO ((I A (ADD1 I))) ((GREATERP I B)) F) iter [for i := a step b] 2 (DO ((I A B) (J C D)) (E G) F) [for j := c step d] [until e] [do f] [return g] (The [...] in the above all indicate optional items. Order is immaterial.) --------STORAGE OPERATORS------- a of b := c 25 1 (PUTPROP B C A) a ofq b := c 25 1 (PUTPROP B C 'A) a:=b (a is atomic) 25 1 (SETQ A B) x(a,b,...,z):=y 25 1 (STORE (X A B ... Z) Y) a{b} := c 25 1 (STORE (APPLY (FUNCTION A) B) C) (useful if a is an array whose dimensions are not fixed at compile time) car a := b 25 1 (RPLACA A B) (similarly for cdr) plist a := b 25 1 (SETPLIST A B) arg a := b 25 1 (SETARG A B) ttyread := a 25 1 (SSTATUS TTYREAD A) (etc) a of b 25 24 (GET B A) ---------LIST OPERATORS--------- a.b 14 13 (CONS A B) a@b 14 13 (APPEND A B) ---------RELATIONAL OPERATORS-------- a=b 10 (EQUAL A B) a ne b 10 (NOT (EQUAL A B)) a eq b 10 (EQ A B) ab>...>z 10 (GREATERP A B ... Z) a<=b 10 (NOT (GREATERP A B)) a>=b 10 (NOT (LESSP A B)) (fixed point:) a=#b 10 (= A B) a<#b<#...<#z 10 (< A B ... Z) a>#b>#...>#z 10 (> A B ... Z) (floating point:) a=$b 10 (= A B) (just in case...) a<$b<$...<$z 10 (<$ A B ... Z) a>$b>$...>$z 10 (>$ A B ... Z) a|b 10 (ZEROP (REMAINDER A B)) a isin b 10 (MEMBER A B) a exists 10 (SETQ IT A) (use with caution) ---------LOGICAL OPERATORS------- not a 9 (NOT A) a and b 8 (AND A B) a or b 7 (OR A B) -------STRING OPERATORS---------- a^b^...^z 18 (CAT A B ... Z) ---------ARITHMETIC OPERATORS------- |a| 0 (ABS A) +a 20 A a+b 20 (PLUS A B) -a 20 (MINUS A) a-b 20 (DIFFERENCE A B) a*b 21 (TIMES A B) a/b 21 (QUOTIENT A B) a rem b 21 (REMAINDER A B) a mod b 21 (MOD A B) (courtesy of CGOL) a**b 22 (EXPT A B) (fixed point:) a+#b 20 (+ A B) a-#b 20 (- A B) a*#b 21 (* A B) a/#b 21 (// A B) (floating point:) a+$b 20 (+$ A B) a-$b 20 (-$ A B) a*$b 21 (*$ A B) a/$b 21 (//$ A B) ---------BOOLEAN OPERATORS-------- :n: a not 21 (BOOLE 12 0 a) a :a: b and 21 (BOOLE 1 a b) a :v: b or 20 (BOOLE 7 a b) a :x: b xor 20 (BOOLE 6 a b) a :^: b shift 22 (LSH a b) ---------I/O OPERATORS------- print a 2 (PRINT A) princ a 2 (PRINC A) write a 2 (PROG2 (TERPRI) (PRINC A)) uread a b ... z (a-z are tokens) (UREAD A B ... Z) uwrite a b ... z (UWRITE A B ... Z) ufile a b ... z (UFILE A B ... Z) load a b ... z (a-z are tokens) (FASLOAD A B ... Z) newline (TERPRI) --------MISCELLANEOUS-------- oct(a) (parens mandatory) parses a with IBASE=8 =a 25 Translation is the value of the expression at parse time. Useful in code that would not otherwise be evaluated, e.g. when compiling code. In addition to the above, CGOL "knows" about all the unary functions in LISP. Thus although "car" does not appear in the above table, CGOL knows that CAR is a LISP function with one argument. CGOL treats all such functions f as though they were defined as f a 25 (F A) Note that f and a in the above must have some formatting delimiter between them (space, tab or carriage return); thus car* is not acceptable for car *, which translates as (CAR *). In general CGOL permits atoms to have both values (i.e. to serve as variables) and functional properties. Thus although "last [1,2,3]" will return (3), you can set "last:=2" and find that the variable last now has value 2. The advantage of this is that you do not have to memorize the entire vocabulary of LISP before choosing a variable name. Make sure you don't try to use any of the above-defined operators as variables, or attempt to write them in the "f(x,y,z)" format; for example, + and AND should not be used as variables, while +(x,y) and -(a,b) will not parse because CGOL will not be able to distinguish them from the legal +(x) and -(x) at the time it is reading the + or -. Actually, these are about the only trouble spots, and provided you stick to the algebraic use of + and - you won't encounter this sort of problem. When in doubt you can always drop back into LISP by using "!". However, that should rarely be necessary - it is intended mainly for non-procedural items such as lists for doing MEMBER and ASSOC in. If you can't recall the CGOL form of an expression (F A B ... Z) where F does not itself have any special meaning to CGOL, you won't go wrong by writing f(a,b,...,z). Thus if you forget the form "1+1", you can write "PLUS(1,1)" and it will translate correctly. By the same token, any LISP function not catered for in the above table can be written as "f(a,b,...,z)", e.g. "assoc(a,b)". Care has been taken in the definition of CGOL to avoid using LISP functions as CGOL operators with special syntactic significance as far as possible, to permit PLUS(1,1) and the like. However, some LISP functions remain in CGOL, in particular +, -, * and /, along with PROG, EQ, NOT, AND and OR. While these can to an extent be used as variables, this is not recommended without preceding them with #, the syntax-removing operator. However, except for + and -, all these operators can be used without # as the function f in f{a} or f[a]. When you define functions of your own using DEFINE, you should always invoke the function just as you defined it. If you defined it as define "F" a "ON" b; a+b$ then you should not call it with f(2,3) but only with f 2 on 3 unless you precede the f with #, as in #f(2,3). The only exception is when you define it with define "F"(a,b); a+b$ which attaches no syntactic significance to f, but has the effect of (DEFUN F (A B) (PLUS A B)) . At first sight the binding powers may seem a lot to learn. However, they have been chosen on the basis of the data types their operators take as arguments and return as results in order to minimize the need for parenthesization. If you want to use CGOL notation but don't want to have anything to do with binding powers, simply parenthesize every CGOL expression as though you were writing in LISP. However, if you omit all parentheses (apart from those needed in constructs of the form f(a,b,...,z)) you will not often go wrong. Most often you will want parentheses for grouping in arithmetic expressions when the default priority ranking (|| + - * / mod **) gives the wrong grouping, and when procedural expressions occur as arguments to non-procedural expressions, e.g. "if a then (b;c)", "(print a) + b", "a:=(b:=read)+1" and the like. Some operators have a right binding power one less than their left binding power. This is to make those operators right associative. For example, "a of b of c" is parsed as "a of (b of c)" (since oftener than not that's what was intended), and "a@b@c" as "a@(b@c)" (for efficiency). An interesting pair of right associative operators is ";" and "&". These are duals of each other. They both evaluate their arguments in the same order, but differ in the value they return: a&b returns a, a;b returns b. By making both of them right associative and giving them the same priority, they interact in an elegant way. Suppose you want to evaluate a,b,...,z and to return the value of k. Then the expression a;b;...k&...;z has the desired effect. That is, follow every argument but the last with ";", except for the one you want the value of, which if it is not the last should be followed by "&". A common use for "&" is in tidying up after computing some value, e.g. "x & x:=2" will set x to 2 and return the old value, "a:=(b & b:=a)" will swap the contents of a and b without using a temporary variable, and so on. Note that all this is really a feature of LISP rather than of CGOL; however, the notation makes it easier to see at a glance the intent of what would be relatively difficult to follow in LISP. 4. THE DEFINE FACILITY The CGOL analog of DEFUN is "define". In addition to allowing the user to attach a lambda expression to some functional property of an atom, it gives him some syntactic capabilities as well. If you don't want to take advantage of CGOL's syntactic extensibility, just say define "F"(x,y); x**2 + y**2 $ or whatever it is you want to define. This is equivalent to LISP's (DEFUN F(X Y) (PLUS (EXPT X 2) (EXPT Y 2))) In calling F, the arguments should always be parenthesized, e.g. f(1,2,3) . In this way CGOL can figure out that f is being applied to (1,2,3) even though no syntax has been specified for F, because of the way "(" is treated as an infix operator. If F is supposed to be a fexpr, lexpr or macro then the appropriate word should follow define, e.g. define lexpr "F"(x); arg(1) $ %Note that x is parenthesized, unlike in LISP% If you want syntax for f as well, then the basic form is define [fexpr|lexpr|macro] [,bp [,bp]]; body For example, the following could serve as a definition of "@": define a "@" b, 14, 13; if a then car a . (cdr a @ b) else b $ In this example, the pattern is a "@" b , and gives the rule for this operator, and the two numbers give the two binding powers. The body is what would normally appear as the body of a DEFUN. At present the allowable forms of patterns are few in number. You may write a sequence of variables separated by one or more tokens in string quotes (letters inside string quotes must be capitalized - this is the one place where a distinction is drawn between upper and lower case, in the sense that the reader maps all lower-case letters not in string quotes to upper-case before thinking about what they mean). The variables stand for CGOL expressions and the strings for tokens (recall that a CGOL expression is a sequence of sub-expressions and tokens). The sequence may start and end with either tokens or variables. The first token in the pattern is called the operator, and the remaining tokens are called delimiters. The operator is said to have been defined. An operator may be defined twice only if it takes a left argument in one case and no left argument in the other, this being a criterion CGOL uses when deciding what a particular token in a program means. In the above table, the operators "(", "+" and "-" all have such dual meanings. The delimiters may appear in arbitrarily many definitions, and arbitrarily often in each. A delimiter's binding power is set to 0; thus in general delimiters cannot also be used as LEDs, though they can serve as NUDs (e.g. |, which is both the NUD for ABS and a delimiter). When you specify what delimiter is to be used, you must stick to it; thus you cannot say define "LOG" a "BASE" b; ...$ and invoke it later with "log 3, 2"; you must say "log 3 base 2". The default binding power is 25 if none is specified, and applies to both left and right binding powers. If one binding power is given it is the left and right binding powers. If both are given, they are respectively the left and right ones. Like LISP, CGOL is a one-pass system. This is so that a user can type in a definition and have it take effect immediately. This conflicts with the requirement in any system offering sophisticated syntax that it be possible to use the syntax of an operator before it has been defined. This requirement is nice in general, and essential for mutually recursive function definitions. To get around this problem, you may define the syntax of an operator at any time without giving its semantics, simply by omitting the body of the define command. This is not an elegant solution, and a later version of CGOL may deal with this. (A possible solution is to keep around pieces of unparsable text until they become parsable, and then parse them. Even more dramatic is the solution of not parsing anything until it is to be evaluated, i.e. dropping the unparsability condition from the previous solution.) 5. EXTENDING CGOL It is possible to add to or change the definitions of the rules of CGOL (the ones in Section 2). To see examples of such definitions, look under the heading BASE COMPONENT in the file AI:PRATT;CGOL > . Given the above table, you should be able to get a rough idea of how to write such definitions. The meaning of infix "+" 20 is "PLUS" is as follows. The infix operator "+" is being defined with left and right binding power 20. (Had I said "infixr" it would make "+" right associative by making its right binding power 19, one less than its left.) The translation of "a+b" is then (PLUS A B) . Since the argument positions are "standard" for infix operators, the only information you really need to supply is what the LISP function name is, so you say "is "PLUS"" . You don't have to use "is" - if you want you can spell it out by saying ["PLUS", left, right] , which is a CGOL program to build a list whose three elements are the atom "PLUS", the left hand argument and the right hand argument. Notice the use of this technique when defining infixr "&" 1 ["PROG2", nil, left, right] We can't say 'is "PROG2"' because that would mean '["PROG2", left, right]'. 6. HOW THE *FIX OPERATORS WORK (This treatment assumes an understanding of the basic mechanism of the parser used for CGOL, which treats tokens as either nuds, leds or delimiters. A nud (NUll Denotation) is a lambda-expression of no arguments which is applied to NIL as soon as a token denoting it is encountered by the parser without a left hand argument. A led (LEft Denotation) is the same but with a left hand argument. A delimiter is a token the parser does not access either denotation of (though it may access its left binding power, lbp) because some other denotation caused it to be skipped over (using ADVANCE, often called by CHECK).) The *fix operators take three arguments with no delimiters between them (except for nilfix which takes two and infixd which takes four). arg1 is the name of the defined operator. It need not be string-quoted, but I string-quote it anyway for visibility. It is accessed as the value of token, which in the case of strings is the string itself without quotes. Strings are the one place in CGOL where lower-case is recognized, so if you do stringify this arg, as I do, don't use lower-case. arg2 is the precedence, which is used for both lbp and rbp, except that in infixr the rbp is taken to be one less than arg1. In infixd both lbp and rbp are given. arg3 is the denotation, which is run at the time the parser bumps into the operator. At that moment, token is bound to the token immediately following that operator. Calling advance at that point will rebind token to the next token. Advance returns the new value of token. In the case of LEDs (LEft Denotations, namely infix, infixr, infixd, infixm, suffix), the variable LEFT is lambda-bound (on entry to the denotation) to the translation of the immediately preceding expression. RIGHT is not a variable but a CGOL expression whose translation is (PARSE r) where r is the right binding power as determined by arg2. Thus each mention of RIGHT represents an invocation of the parser. The parser assumes that the input pointer is pointing to the first token of the expression to be parsed (that token being the value of TOKEN). The parser returns the translation, leaving the pointer at the token following the just-translated expression. The IS operator is shorthand for ["FOO", left, right] (in the case of infix) or its analogue for other *fixes, e.g. for nilfix it is ["FOO"], for prefix it is ["FOO",right], for suffix it is ["FOO",left], and for infixm it is a hairy beastie that results in 'a foo b foo c foo d' translating as (FOO A B C D). To some extent you can figure some of this out by reading the definitions of the *fix operators. Deffix takes 5 args: the denotation type (NUD or LED, or XNUD or XLED if you are LEARNing language X) The appropriate IS operator The name of the operator being defined The left binding power The right binding power The formal parameters of DEFFIX are referenced from outside DEFFIX in a couple of cases, which makes it hard to figure out what's going on. The comments in the list of special variables should alleviate that somewhat. Incidentally, for development purposes, the LEARN-SPEAK-FORGET cluster of functions makes definition of a new language quite painless in CGOL. To learn definitions in language X, say LEARN "X" and everything defined from then on using *FIX or DEFINE will be invisible to CGOL. LEARN "" restores the standard effect of *FIX and DEFINE, which is to specify CGOL denotations. When you say SPEAK "X" the definitions learnt while LEARN "X" was in force take effect, over-riding any prior CGOL definitions of the defined tokens. Both LEARN and SPEAK work by rebinding variables. Thus the user can achieve their effect by doing his own binding. If he uses lambda- or prog-binding (CGOL makes lambda-binding more convenient than prog-binding) then the effect of the binding will be undone on exit from the expression doing the binding, whether the exit was normal or abnormal, which has obvious advantages over the side-effect-oriented LEARN and SPEAK functions. The conventions are: LEARN "X" binds CNUD to XNUD, CLED to XLED and CLBP to XLBP. Standard CGOL takes X as the empty string. SPEAK "X" binds NUDL, LEDL and LBPL to lists whose cdr is the previous value before the SPEAK and whose car is XNUD, XLED or XLBP respectively. (Hence SPEAKs may nest notations; thus if you say SPEAK "X"; SPEAK "Y" then notation Y takes priority over notation X which takes priority over standard CGOL notation.) FORGET undoes the effect of the last SPEAK; it is careful not to forget CGOL itself. If the user wishes to bind NUDL etc himself he does not have to nest notations in the way SPEAK does; in particular, he can eliminate CGOL notation altogether, say by binding NUDL to (XNUD), etc. However, care should be exercised here to avoid permanently cutting off LISP from the user in this way, e.g. by ending up with a language not powerful enough to return to CGOL or LISP. These operations have several applications. One is for those defining a whole new language. Another is for when a small and temporary notation change is required; for example, in writing an ALGOL parser, the infix ":" needs a different denotation inside array declarations than elsewhere; thus just before parsing its argument, an array declaration may say SPEAK "ARRAY" which will activate a definition of ":" made when LEARN "ARRAY" was in force. At the end, the array declaration should say FORGET. (Actually, in this example, it would be more appropriate to lambda-bind LEDL and LBPL to (ARRAYLED LED) and (ARRAYLBP LBP) respectively, so that the old bindings will be restored automatically whether exiting normally or abnormally from the array definition.) Other uses of LEARN, SPEAK and FORGET may be found in the files TC >, EG > and MSYN >, all on the directory PRATT; at MIT-AI. 8. LEXICAL EXTENSIONS The general CGOL rules for recognizing a token are that any alphanumeric string forms a token (whence two alphanumeric strings juxtaposed will look to CGOL like a single token), any string of digits with an optional period forms a token, any "..." forms a token, non-printing characters (space, tab, cr, lf) and comments (%...%) act as delimiters between consecutive tokens, and non-alphanumeric printing symbols (glyphs) form tokens by themselves. It is possible to extend the class of tokens to include arbitrary strings using the construct "newtok a,b,..." where a,b,... is a list of such strings, inside string quotes. CGOL itself uses the following: newtok ":=", "<=", ">=", "=#", "=?$", "<#", ">#";, "?$", "<=", ">=", "+#", "-#", "*#", "/#", "\\", "+?$", "-?$", "*?$", "/?$", ":N:", ":A:", ":V:", ":X:", ":^:" $ 7. COMPILATION Compilation of a CGOL file is done by invoking NCOMPL in the same way as for a file in standard notation. That is, at DDT you say :NCOMPL The LISP compiler knows about the incantation (CGOL)$ and evaluates it instead of outputting it. The autoload property of CGOL means that if CGOL is not already loaded into NCOMPL it will be. When =EXIT$ is encountered, the = forces CGOL to evaluate the EXIT immediately, returning NCOMPL to LISP notation. The upshot of all this is that provided each block of CGOL code in your file is preceded with (CGOL)$ and followed by =EXIT$ , NCOMPL will have no trouble understanding CGOL notation, even when mixed in with standard notation. Since you need these commands in the file to interpret it anyway, you should find that compiling a CGOL file requires no more massaging of the file than if you had been writing in LISP notation (e.g. declaring special variables, specifying types of variables). 8. CGOLPRINT All of the above deals with CGOLREAD. CGOLPRINT is also available, and supplies a good way to automatically convert standard notation to CGOL notation. Thus the LISP program (DO () (()) (CGOLPRINT (READ))) will read a sequence of LISP expressions and print them (on whatever output device is selected) in CGOL notation. CGOLPRINT, unlike PRINT, makes an effort to prettyprint the output. It uses an extremely efficient but simple-minded pretty-printing algorithm, avoiding the overhead of the GRIND program. 9. COMPARISON WITH MLISP The similarities between CGOL and Smith and Enea's MLISP are: (i) the use of ALGOL-like notation for LISP S-expressions (ii) the use of numeric operator precedence functions to resolve association problems. (iii) the ability to export LISP translations of MLISP/CGOL programs to sites supporting LISP but not MLISP/CGOL. The differences are: (i) MLISP is a sophisticated programming language offering many facilities not appearing in LISP. These facilities are only visible to the speaker of MLISP, and vanish if he wants to use them while speaking LISP. (Due to the ubiquity of LISP's oblist, the user can get at them from LISP, though they are undocumented and have names starting with & to identify them as sytem names not for general consumption.) Assignment to S-expressions is a particularly complex example. In contrast, CGOL offers nothing but an alternative notation for things already meant for consumption by LISP users. This enables CGOL to be very small, both with respect to its implementation and its manual. (ii) MLISP is a system that the user must call from the monitor, whereas CGOL is a package that can be loaded into LISP when the need arises. Hence a non-CGOL user can read a CGOL file without having to commit himself to a CGOL-oriented system when he loads LISP. In fact, when the I/O details are worked out as in Section 1.3 he may never know that he was reading a CGOL file. (iii) CGOL is considerably more extensible syntactically than MLISP I and considerably more efficient than MLISP II, which though extensible uses backup to an excessive degree. 10. EDITING CGOL PROGRAMS CGOL programs can be edited using the TECO editor just like LISP programs. For ITS users, a recent development in LISP and TECO has allowed LISP to have a copy of TECO as a subjob. The effect of this is as though LISP had its own resident editor with all the power of the TECO editor but with the added advantage that LISP can be more selective about what it reads out of a file after the file has been modified. In particular, TECO users with the appropriate macros (either the EMACS macros or for non-EMACS users the LISPT MACROS file on the .TECO.; directory of any ITS machine) can send a selected fragment of their file to LISP by saying m,nM.ZLISP$ where m and n specify the start and end of the buffer fragment to be sent to LISP. CGOL users can take advantage of this scheme in exactly the same way that LISP users can. Provided LISP's toplevel is expecting CGOL notation at the time the user calls TECO()$ (which invokes TECO if LISPT FASL DSK LIBLSP has been loaded), when the user returns from TECO with a string of CGOL expressions (as determined by the m,n argument to M.Z) LISP will correctly interpret those expressions. (Pending repair of a bug, CGOL users must type a space when they return to LISP, or LISP will do nothing. This should be the only noticeable difference between LISP and CGOL for LISPT users.) 11. MISCELLANEOUS Errors are dealt with exactly as in LISP, with the exception of syntactic errors, which CGOL tries to patch, reporting on the problem and the patch when it does so. Syntactic errors do not cause a breakpoint to be entered, but errors that would do so in LISP do so when using CGOL. While $P is one way (besides ^G) of exiting from a break loop, the CGOL notation requires that the "$" be "querified", the CGOL analog of slashifying. Also you need $ at the end rather than space. Thus you would type ?$p$ *********************************************************************** SAMPLE PROGRAMS The following programs were written by the author for a variety of purposes. Most of them deal with one or another computational complexity problem. (cgol)$ % Sample CGOL programs for matrix multiplication % define a "MM" b; % Mat multiply - square matrices % array a; array b; let d := car dim a; array c(d,d); for i in 1 to d do for k in 1 to d do (let ac = 0; for j in 1 to d do ac := ac + a(i,j)*b(j,k); c(i,k) := ac); c $ define a "MMR" b; % Mat multiply - rectangular matrices % array a; array b; let d1,d2 := {dim a}, d2b,d3 = {dim b}; if d2 ne d2b then "Mismatch" else (array c(d1,d3); for i in 1 to d1 do for k in 1 to d3 do (let ac = 0; for j in 1 to d2 do ac := ac + a(i,j)*b(j,k); c(i,k) := ac); c) $ =exit$ (cgol)$ % Maze-running program. Works for any number of dimensions. Instructions for use: Get a CGOL by saying :L CGOL; to DDT. uread maze > ai pratt Q init [3,3] try source survey Here is what the above commands mean. The "uread" selects the file containing the maze program, and the control Q causes CGOL to read the selected file. This has the effect of loading the maze program. Then "init [3,3]" initializes the maze to a 3 by 3 maze in which ALL paths are legal. "try source" will search for a path from source (defined by init to be (0,0)) to destination (defined by init to be (2,2), the opposite corner). Finally, "survey" causes a sequence of positions to be printed out, forming a path from source to destination. To vary this, it is possible to do, say, init [4,2,5] which will initialize "maze" to a 4 by 2 by 5 array. After doing "init" but before doing "try source", "maze" can be changed. To forbid a path through, say, point (1,2), say to CGOL: maze(1,2):="+" If you want to set up all the elements of maze rapidly to the maze -++ --+ +-- (where + means forbidden and - means allowed), say maze := !'(- + + - - + + - -) where the list of +'s and -'s is what you get by reading the maze left to right and then down. To see what maze looks like, say listarray 'maze' This will print out a list of the contents of the maze array, row by row but without delimiting one row from the next, in the same format as was used above. If you want to save a particular setting of the maze array for future use, just say y := listarray 'maze' To reset maze to that value, say maze := y You can also simply set y (or any other variable) to a maze by saying y := !'(- + + - - + + - -) A particular maze -++- ---+ -+-+ -+-- can be explored simply by saying whynot % '=array(maze,t,1)' $ define "TRY" here; % the maze-searching algorithm % here = destination or not(or{minusp[here]}) and and{lessp[here, dims]} % outside boundary? % and maze{here} = "-" % is this square new and legal? % and (maze{here}:="*"; % mark square to avoid revisiting % iter for one := stepsource step cdr one do % various directions % (for j in [plus[here,one], difference[here,one]] do % "up" or "down" % if try j then (maze{here}:=j; return t); % success: mark maze % if cadr one = 1 then return nil) % failure: return nil %) $ define "INIT" d; % initializes maze as per dimensions d. All moves legal. % dims := d; array{'maze'.t.dims}; stepsource := (\x;0)[dims]; % source of increment vector 1,0,0,... % car stepsource := 1; cdr last stepsource := stepsource; source := (\x;0)[dims]; destination := sub1[dims] $ define "SURVEY"; % prints path from source to destination % let x = source; while x ne destination do print x := maze{x} $ define "WHYNOT"; % sample non-trivial maze % init [4,4]; maze := !'(- + + - - - - + - + - + - + - -); try source; survey $ =exit$ (cgol)$ % Program to solve a system of linear Diophantine equations. To use, say solve(A,b,n) altmode where A is an mxn matrix, b is an m-vector, and n is an integer. Here m is the number of equations and n the number of variables. The result will be an nxp matrix N such that Nv is a solution in x for Ax=b for any integer v with unit first element. Morevover, all solutions are of the form Nv. The basic idea is to use Gauss-Jordan elimination, but to use Euclid's algorithm to guide the process. Euc(a) takes an integer vector a and returns a two-list (M g) where g is the gcd of the elements of a and a"M is a vector with g in the first coordinate and 0's elsewhere. (Here a" denotes the row vector a.) Euc is called once per equation, and the resulting matrices from each call are more or less multiplied together to give the final answer. Matrices are represented as lists of rows, rows in turn being represented as lists of numbers. % define "EUC"(a); % Euclid's algorithm % if null a then [nil,0] % should not be needed below % else if null cdr a then [[[1]],car a] else let M,g = {euc(cdr a)}, h = car a; M := (\i;0)[1 to length cdr a] . M; let c1 = 1 . (\i;0)[1 to length cdr a], c2 = car[M]; let ng, nc1, nc2 = {euc1(h,g,c1,c2)}; [cons[nc1,cons[nc2,cdr[M]]],ng] $ define "EUC1"(a,b,u,v); if b=0 then [a,u,v] else let d = a/:b; euc1(b, a-b*d, v, difference[u,(\x;x*d)[v]]) $ define "SOLVE"(Av, Iv, n); if null Av then (\x;0 . x)[Id(n)] else let a = car Av, i = car Iv; let m,g = {euc(a)}; if g = 0 then if i = 0 then solve(cdr Av, cdr Iv, n) else throw nil else if not g|i then throw nil else (m := scale1(m, i/:g); matmul(m, (1 . (\x;0)[2 to n]) . solve(cdr matmul(Av,cdr[m]), cdr(difference[Iv,car[matmul(Av,list[car[m]])]]), n-1))) $ define "SCALE1"(m, x); % Scale up col 1 of m by factor x % cons[(\y;y*x)[car[m]], cdr[m]] $ define "SCMUL"(a,b); % multiply matrix a by scalar b % (\q;(\p;p*b)[q])[a] $ define "MATMUL"(a,b); (\i;(\j;plus{times[i,j]})[list[{b}]])[a] $ define "ID"(n); % construct the nxn identity matrix % (\i;(\j;if i=j then 1 else 0)[1 to n])[1 to n] $ =exit$ (cgol)$ %FuperFonic Tranfport algorithm - a real treafure. Does FFT multiplication of integers. Though this algorithm is no faster than the Strassen-Schoenhage n log n log log n FFT integer multiplication, it is considerably simpler. % special n, ord, w, wi, g $ newtok "+:", "*:", "**:" $ infixm "+:" 20 is "SUM" $ infixm "*:" 21 is "BY" $ infixr "**:" 22 is "TOTHE" $ define lexpr "SUM"(argno); plus{arg[1 to argno]} rem n $ define lexpr "BY"(argno); times{arg[1 to argno]} rem n $ define "TOTHE"(a,b); if b = 0 then 1 else if oddp b then a *: (a**:(b-1)) else (a*:a)**:(b/:2) $ define "TOPOLY"(m,b,len); if len ne 0 then m mod b . topoly(m/:b, b, len-1) $ define "FROMPOLY"(p,b); if p then car p + b*frompoly(cdr p, b) else 0 $ define "EVENS" x; x and car x . evens cddr x $ % even half of vector % define "ODDS" x; evens cdr x $ % odd half of vector % define "FFT"(a,pw); % FFT of vector a; pw is vector of powers of w % if null cdr a then a else let efft = fft(evens a, evens pw), offt = fft(odds a, evens pw); sum[efft@efft, #mult[pw, offt@offt]] $ define "IFFT"(a,pwi); (\x;g*:x)[fft(a,pwi)] $ define a "MULT" b, 21; % n log n log log n integer multiplication % let m := max(a, b); if not bigp m then a*b else (let x = fix(log(35 * length cdr m)/log(2)); let n = 2**2**(x-1)+1, ord = 2**x, w = 2; let wi = w**:(ord-1), g = (n-1)/:ord*:(n-1); let m = wi; let pw = for i in 1 to ord collect m:=m*:w; let pwi = car pw . reverse cdr pw, dig = ord/:x; frompoly(ifft(#mult[fft(topoly(a,dig,ord),pw), fft(topoly(b,dig,ord),pw)], pwi), dig)) $ =exit$ (cgol)$ % This procedure tests the function defined as a bit string in array A to see if it has the (PISH,TUSH) property, namely that given any PISH variables, one can find at least TUSH functions of the remaining variables by appropriately diddling the PISH variables.% special mask, llv, nb, curr, token, left, pish, tush, n, nw $ % MASK is a 32-bit mask corresponding to a particular choice of variables LLV is a partition of the possible functions for a given choice of variables. Initially discrete, it is progressively refined as evidence arises for distinguishing functions. The partition is represented as a list of the non-singleton blocks, each of which is represented as a list of variables, each of which is represented as (ostensively) 1, 2, 4, 8, ... NB The number of blocks in LLV.% infix "^" 22 is "LSH" $ array(mp, fixnum, 5) $ % array of 32-bit masks % =let ibase = 2 in cgolread()$ % interprets following expr in binary % mp(0) := 10101010101010101010101010101010; mp(1) := 11001100110011001100110011001100; mp(2) := 11110000111100001111000011110000; mp(3) := 11111111000000001111111100000000; mp(4) := 11111111111111110000000000000000$ % Note convenience of not losing meaning of 2,3,4 in above % % Hack: starting from right, read each column as a 5-bit number. Recognize the sequence? These vectors are the columns arising in the method of truth tables for testing propositional tautologies. % define "TEST"; enum(pish, n-1, nil, [0], (-1)^(-4)) $ define "ENUM" (nv, ulim, vl, sl, mask); if nv=0 then (llv := [sl]; nb := 1; prin1(ulim+1); chek(vl, 0, 1^(n-5)); if nb < tush then (write "No." ; tyo(7); err())) else iter for i := ulim step i-1 until i = nv-2 do enum(nv-1, i-1, if i>4 then vl @ [1^(i-5)] else vl, sl @ mapcar('\x;x+1^i', sl), if i>4 then mask else mp(i) :A: mask) $ define "PARTITION"; llv := mapcan('makeq', llv) $ define "CHEK"(vl, ll, ul); if vl then iter for i := ll step i+car vl + car vl while i
    (B -> A)) . No promises are made about the behavior of the algorithm for expressions without well-defined types, as in type '\x;x(x)' . This algorithm is believed to have a bug. However, it works pretty well in general. % #prin1 := "PRINC" $ define "GENTYPE"; ascii(typecnt:=typecnt+1) $ define "PRETTY" x; if not car x then pretty cdr x else if not cdr x then princ(car x) else (princ("("); pretty car x; princ("->"); pretty cadr(x); princ(")")) $ define "UNIFY"(ta,tb); if null car ta then unify(cdr ta, tb) else if null car tb then unify(ta, cdr tb) else if cdr ta and cdr tb then ([unify(car ta, car tb), unify(cadr(ta), cadr(tb))]) else if cdr tb then unify(tb, ta) else (rplaca(tb, nil); rplacd(tb, ta)) $ define "LUNIFY"(ta,tb); if cdr ta then unify(car ta, tb) else (rplaca(ta, tb); rplacd(ta, [[gentype]])) $ define "CHAIN" tx; if car tx then tx else chain cdr tx $ define "RTYPE" lx; if lx isatom then eval lx else if car lx = "LAMBDA" then eval [["LAMBDA", cadr(lx), ["XCONS", ["LIST", 'rtype caddr(lx)'], chain caadr(lx)]], ["QUOTE", [gentype]]] else (new qq; lunify(qq := rtype car lx, rtype cadr(lx)); chain cadr(qq)) $ define "TYPE" lx; typecnt:=64; newline; pretty rtype lx; " " $ =exit$ (cgol)$ %Galil and Seiferas' (G&S) linear-time palstar finder. Here a palstar is a string of palindromes of length at least two (to avoid trivializing the problem). Thus test "ABBACCDEFED" would return ((A B B A) (C C) (D E F E D)) whereas test "ABBACCDEFE" would return NIL because there is no way to parse the input as a string of non-unit-length palindromes. % special n $ % Following takes advantage of CGOL to get around LISP's insistence that arrays start from 0 and index by 1. In the following, A and M start from -1 and R increments by 0.5 (the 0.5 is needed because R is the centroid of a palindrome, and if the palindrome is of even length the centroid will fall between two characters). All arrays can accept real arguments, again not offered by LISP. % prefix "A" 25 subst(right, "X", '#a(fix(x+1.1))') $ prefix "R" 25 subst(right, "X", '#r(fix(2*x+1.01))') $ prefix "L" 25 subst(right, "X", '#l(fix(x+.1))') $ prefix "U" 25 subst(right, "X", '#u(fix(x+.1))') $ prefix "V" 25 subst(right, "X", '#v(fix(x+.1))') $ prefix "M" 25 subst(right, "X", '#m(fix(x+1.1))') $ prefix "LINK" 25 subst(right, "X", '#link(fix(x+.1))') $ define "PALFIND" x, 1; % Manacher's on-line palindrome finder - used by G&S % x:=explodec x; array(#a, t, (n:=length x)+2); fillarray("A", nil.x@[[nil]]); array(#r, t, 1+2*length x); n := float n; r(-.5):= -.5; r 0 := 0.0; let c=.5, lt=0.0, rt=1.0; iter( while lt>-1 and rt 1 then let p = i + l i -1; if p 1 and m(i-1) then ( let x = i + l i - 1; if not m x then (m x := t; link x := fix i-1); if u i then (x := i+2*(l i)-2; if not m x then (m x := t; link x := fix i-1)); if v i then (x := i+2*(l i) ; if not m x then (m x := t; link x := fix i-1)))) $ define "PALSTAR" x, 10; % if x is palstar, returns its parse, else nil % palfind x; setl; setuv; mark; if m(n-1) then let i=n-1, z=nil; while i>=0 do (z := implode((\x;a x)[1+link i to i]) . z; i := link i); z $ % Following is useful in coming up with random test inputs % define "RANDSTR" n "ALPHABET" s, 19; % random string, length n, alphabet size s % implode((\x;ascii(65+random s))[1 to n]) $ =exit$ (cgol)$ %Salamin's elliptic integral algorithm for computation of pi. To get n digits of pi, say PI n . The answer will be an integer which is pi times some power of ten. The last digit may be off by 1. The algorithm is not coded for efficiency - it serves only to exhibit the principle of Salamin's algorithm, and to churn out a few paltry digits of pi for those who don't trust tables. As implemented below it is an O(n**2) algorithm, and takes about a minute on the AI machine to get 400 decimal digits. % %Auxiliary routine for integer square root % define a "ISQ" b, 19; new x; x := (b+a/b)/2; if |b-x| < 2 then x else a isq x $ % Integer square root of a - silly LISP's sqrt demands floating point! % define "ISQRT" a; a isq 2**(\base;flatsize(a))(4) $ % Summation of 2**j * c[j+1] ; also computes agm % define a "SIG" b, 19; if |a-b| < 2 then (agm := a; 0) else ((a-b)/2)**2 + 2*((a+b)/2 sig isqrt(a*b)) $ % print pi to n significant digits % define "PI" n, 1; new sum, agm; sum := 10**n sig isqrt(10**(2*n)/2); 1 + agm**2*10**(n-1)/(10**(2*n)/4-sum) $ =exit$ (cgol)$ % Rabin's Composite Detector. You can set k to whatever you want, and the probability that PRIME will return T erroneously is 1/2**k. It will never return NIL erroneously. Running time is cubic in the length of the input. % special q,prod,tw,a,olda,qp,left,n,k,sw,w,hw,stopwatch $ ?*nopoint t; stopwatch:=0; qp := 821; k := 20; w := 2**35; sw:=w-1; hw := w/2 $ alloc(!'(list (10000 12000 0.25) fixnum (10000 12000 0.90) bignum (10000 12000 0.9))) $ gc?-overflow := '\x;nil' $ define "TIM"; -stopwatch + stopwatch := runtime()/1000000.0 $ define a "BY" b, 21; a*b mod n $ define a "TOTHE" b, 22; % a**b mod n % if zerop b then 1 else if oddp b then a by (a tothe (b-1)) else (a by a) tothe (b/2) $ define a "TOTHEL" b, 22; % The list [a**b, a**(b/2), ..., a**(2x+1)] % if a=1 or b=0 then [1] else if oddp b then [a by (a tothe (b-1))] else new x; x := a tothel (b/2); if car x = 1 then x else car x by car x . x $ define "BRND" x; % Auxiliary routine for computing big random numbers% if bigp x then random(sw) + w*brnd(x/hw) else random(x) $ define "BIGRANDOM" x, 25; brnd(x) mod x $ define "CARMICHAEL" x ,12; if x then (car(x)-1 gcd n) ne 1 or Carmichael cdr x $ define "PRIME" n, 12; % Returns T if n is prime % if n<30 then n isin !'(2 3 5 7 11 13 17 19 23 29) else (n gcd 6469693230) = 1 and new kk,looksprime; kk:=0; looksprime:=t; while looksprime and kk prod of primes < x+1 % new prod; prod := 1; for i in 2 to x do if prime i then prod := prod*i; new tw,q; tw:=prod+qp; q:=1; tim; while t do (if testwin tw then (write "Prod(" cat x cat ")*" cat q cat "+" cat qp; if prime tw+6 then (princ " *"; if prime tw+8 then princ " *"); write tim cat " secs"); tw := tw+prod; q:=q+1) $ define "DOWN" n,1; % Searches down from 2**n for primes % new nn; nn:=2**n-1; while not prime nn do nn:=nn-2; write nn cat " = 2**" cat n cat "-" cat 2**n-nn $ =exit$ (cgol)$ %Linear time Sieve of Eratosthenes program. % % Due to Ross Gale and Vaughan Pratt. % % The sieve is in the array sv, each word of which is 36 bits, although only the rightmost 32 bits hold information. If 2*i+1 is composite then a 1 appears in position (i mod 32) of sv(i/32). (Position 0 is the least significant bit.) % % :A: is bitwise and, :V: is bitwise or, :^: is leftshift % define "PRIME" x,10; % predicate that looks up sieve % x=2 or oddp x and sv(x:^:-6):^:(35-(x:^:-1 :A: 31)) >= 0$ define "SIEVE" n,1; % construct a sieve for primes < n % array(sv,#fixnum,n/64+1); % the sieve, initially all 0's % sv(0):=1; % 1 is composite by convention % let s=[1]; % singleton set of numbers sieved so far % for i in 3 to n/2 by 2 do % enumerate those i % if prime i then % that are prime % (let ts=nil; % temporary accumulator for new s % for j in s do % j has been sieved, so j*i is composite for j1 % (let k=j; % k = j*i**z for some z % while k % % The following solves a problem raised by DVM shortly before I became almost unable to type this. A man thinks of two distinct numbers between 2 and 100. He tells the product to A and the sum to B, and asks them to reconstruct the original numbers. (i) A says he doesn't know. (ii) B then announces that he knew A didn't know. (iii) A retorts that now he knows. (iv) B's comeback is that now he knows too. What was the bus-driver's name? % % Here's the WWI ace practically unable to type this. What is he thinking? % % Answer: what indeed % % (Thinks). What numbers could B have been given? At least 5. 52,3, so not (ii). 62,4 so again not (ii). 72,5 or 3,4 , but 2,5 would give it away to A straight away. Hence B seeing n=7 couldn't be sure A didn't know. So we are looking for a number n no binary partition of which is into two distinct primes. 8=3+5, 9=2+7, 10=3+7, 11=???, 12=5+7, 13=2+11, 14=3+11, 15=2+13, 16=3+13, 17=???... In fact if n is odd and n-2 is not prime, we are clearly have such a number. Conversely, if n is odd and n-2 is prime we have failed. Now consider just even n. 18=5+13, 20=3+17, 22=3+19, 24=5+19, 26=3+23, 28=5+23, 30=7+23, 32=3+29,... We observe that 3-5-7 form a solid block, the only way to beat which is to find three consecutive odd non-primes. Weelll,... 91 115 117 119 121 141 143 183 185 are all that are in this category. But 94=11+83, 118=11+107, 120=11+109, 122=13+109, 124=17+107, 144=13+131, 146=19+127, 186=13+173, 188=31+157 (a sort of local success for Goldbach's conjecture). So this rules out the possibility of even n. Hence B must have seen odd n such that n-2 is not prime, or equivalently n such that n-2 is an odd composite. Thus n is one of 11 17 23 27 29 35 37 41 47 53 ... Now if A now knows the answer, it could have been (2,9), (3,8), (4,7),... which we note have the property that their products are each associated only with one n in the above list. (E.g., (5,6) is excluded because its product 30 is associated with not only 11=5+6 but 17=2+15.) But now B knows the answer, which would be impossible if it were any of (2,9), (3,8) or (4,7), since B's knowing the sum is 11 doesn't help here. More generally, we are looking for an n with exactly one binary partition whose product is not the product of the binary partition of some other n (always n is 11 17 23 27 ... as above.) So the following should work: for each n such that n-2 is odd composite do if there exist none or more than one partition of n such that no other factorizations of product of this partition have their sum-2 odd composite then reject n. Checking this for the first few values of n, we find: n=11: eliminated by argument above about (2,9), (3,8) and (4,7). n=17: 2*15=5*6, 3*14=2*21, 5*12=3*20, 6*11=2*33, 7*10=2*35, 8*9=3*24. This leaves only (4,13) for n=17, so (4,13) has to be one answer. We now check that there are no further answers as follows. % fasload:=nil $ load rab$ % prime number package % delim "BY" $ array(g,t,200); array(awful,fixnum,9901) $ define "SEARCH"; for n in 11 to 199 by 2 do if not prime(n-2) then (g(n) := nil; for j in 2 to n/2 do (jtnmj := j*(n-j); g(n) := jtnmj . g(n); awful(jtnmj):=awful(jtnmj)+1)); for n in 11 to 199 by 2 do if not prime(n-2) then (p := 0; for j in 2 to n/2 do if awful(j*(n-j))=1 then (p:=p+1; x:=j); if p=1 then (print n; princ x)) $ % Running this program yielded only n=17, x=4, so the pair of numbers had to be (4,13) % =exit$ (cgol)$ % Repetition finding algorithm; Pratt's variant of Weiner's algorithm % % The following paragraph defines some slick syntax which allows this code to be read in conjunction with my paper on this variant of Weiner's algorithm, available from the author at the MIT AI Lab. % "TAILS" of ":" := nil; newtok ":=" $ % so w:a is not confused with :a: % infixr "." 26 [["GET", left, ["QUOTE", "ONE"]], right] $ % the w.a property % infixr ":" 26 [["GET", left, ["QUOTE", "TWO"]], right] $ % the w:a property % prefix "*" 26 [["GET", xx:=right; check ":";right, ["QUOTE", "ABOVE"]], xx] $ % the *a:w property % literal suc, loc, len $ % Avoids having to write "SUC" etc % define "GENERATE"; %creates a new node in the graph% let x=ncons nil; "ONE" of x := array(nil,t,6); % the w.a property % "TWO" of x := array(nil,t,6); % the w:a property % "ABOVE" of x := array(nil,t,6); % the *a:w property % x $ define "CREATEVERTEX"; % what to do when a node to be visited isn't there % wa := generate; len of wa := 1 + len of w; loc of wa := w.a; w:a := wa; % connect w to wa via w's :a link % u := if w = eps then eps % u is to be wa's longest proper suffix in the graph % else (tt := suc of w; while null tt:a and tt ne eps do tt := suc of tt; tt:a or eps); c := a(loc of wa - len of u - 1); x := *c:u; suc of wa := u; % link wa to u % *c:u := wa; % link u to wa % if x then (suc of x := wa; % if x exists link it to wa % d := a(loc of x - len of wa - 1); *d:wa := x); if null x then wa.a(w.a) := w.a + 1 else for b in 0 to 5 do wa.b := x.b $ define "SCAN" x; % Main routine. X is a list of numbers in range [0,5] % n := length x; array(a,t,1+n); fillarray("A", cons(0, x)); % input % eps := generate; len of eps := 0; loc of eps := 1; suc of eps := eps; w := eps; i := 1; % initialization % while i <= n do % scan whole string % (a := a(i); % a abbreviates a(i) % if null w.a then % seen wa before? % (w.a := i+1; % no, record its position % if w ne eps then w := suc of w % then push [ pointer % else i := i+1) % or push both [ and ] % else (if null w:a then createvertex; % new node if only 2nd wa % w := w:a; i := i+1); % push ] pointer % print i; princ len of wi cat " " cat loc of w) $ % trace i,w % =exit$ (((((((((((((((((((((((((((((((((FIN)))))))))))))))))))))))))))))))))