From: ···········@mac.com
Subject: crossing FORMAT with Java
Date: 
Message-ID: <16a22b93-4d5d-46f5-87d3-30e8373a6026@x41g2000hsb.googlegroups.com>
One of the interesting aspects of CLforJava development is choosing
what to build in Lisp and what in Java. The results occasionally
provide a different look at an old function. The problem is the
processing of FORMAT strings - a dirty job, but someone has to do it.

The Hyperspec has descriptions of the format operations, but it does
not provide the basic components of any programming language - the
lexing and parsing rules. Here's some Java code that's mostly about
lexing a format string. In the comments, I sketched a possible BNF for
format strings. It's likely that the Java lexing code will go into
CLforJava, and the parsing component will be written in Lisp.

Aside from the drubbing the post will get (since it's not just Lisp),
I'd appreciate folks looking at the regex (it's a bit complex) and the
BNF. Thanks!

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
/******************************************************************************
// Basic Output //
~{(:)(@)}C -- Character 1, (mod 4) (4 5-6) or (7 8-9)
~{n}% -- newline 10, (mod 11)
~{n}& -- fresh-line 12, (mod 13)
~{n}| -- new page 14, (mod 15)
~{n}~ -- tilda 16, (mod 17)

// Radix Control //
~{(radix(,mincol(,padchar(,commachar(,comma-interval)))))(:)(@)}R
~{(mincol(,padchar(,commachar(,comma-interval))))(:)(@)}D
~{(mincol(,padchar(,commachar(,comma-interval))))(:)(@)}B
~{(mincol(,padchar(,commachar(,comma-interval))))(:)(@)}O
~{(mincol(,padchar(,commachar(,comma-interval))))(:)(@)}X

// Floating Point Printers //
~{(width(,digits-after(,scale-factor(,overflowchar(,padchar)))))(:)(@)}F
~{(width(,digits-after(,scale-factor(,overflowchar(,padchar,(exponentchar))))))(:)(@)}E
~{(width(,digits-after(,scale-factor(,overflowchar(,padchar,(exponentchar))))))(:)(@)}G
~{(d(,n(,w(,padchar(,sign)))))(:)(@)}$

// Printer Operations //
~{(mincol(,colinc(,minpad(,padchar))))((:)(@))}A
~{(mincol(,colinc(,minpad(,padchar))))((:)(@))}S
~{((:)(@))}W

// Pretty Printer Operations //
~{((:)(@))}_
--> logical block handled with ~< ~:>
~{n(:)}I
~/ --> closed by a /

// Layout Control //
~{(colnum(,colinc))((:)(@))}T
~{(colnum(,colinc(,minpad(,padchar))))((:)(@))}<
~>

// Control Flow Operations //
~{n((:)(@))}*
~{(:)(@)}[
~]
~{((:)(@))}{
~{(:)}}

// Miscellaneous Operations //
~{((:)(@))}(
~)
~{((:)(@))}P

// Miscellaneous Pseudo-Operations //
~;
~{(n(,m(,o)))((:)(@))}^
~{((:)(@))}\n

// and for the rest
 (.*)

*************************************************************************
 * BNF for the format directives
 *
 * FormatDirective := SimpleDirective | CompoundDirective | Text
 * SimpleDirective := Character | Newline | FreshLine | NewPage | Tilda |
 *                    RadixDirective | FloatingPointDirective |
 *                    PrinterOpDirective |
 *                    Tabulate | IgnoredNewline | Recursion | GoTo |
 *                    ConditionalNewline
 * RadixDirective := GeneralRadix | DecimalRadix | OctalRadix |
 *                   BinaryRadix | HexRadix
 * FloatingPointDirective := FixedFloat | ExponentFloat | GeneralFloat |
 *                           CurrencyFloat
 *
 * PrinterOpDirective := Aesthetic | Standard | Write
 *
 * CompoundDirective := Justification | Conditional | Iteration |
 *                      LogicalBlock
 *
 * Justification := TildaLessThan JustificationSegments TildaGreaterThan
 * TildaLessThan := '~<'
 * TildaGreaterThan := '~>'
 * JustificationSegments := RegularJustificationSegments |
 *                          SpecialJustificationSegments
 * RegularJustificationSegments := FormatDirective
 *                                 ( TildaColonOption FormatDirective )*
 * SpecialJustificationSegments := TildaColonSemiColon
 *                                 RegularJustificationSegments
 * TildaSemiColon := '~;'
 * TildaColonSemiColon := '~:;'
 * TildaColonOption := TildaSemiColon | TildaColonSemiColon
 *
 * LogicalBlock := TildaLessThan PPrintLogicalBlock TildaColonGreaterThan
 * TildaColonGreaterThan := '~:>'
 *
 * Conditional := ((ConditionalStartN ConditionalClauses OptionalClause?) |
 *                 (ConditionalStart2 ConditionalClause TildaSemiColon
 *                                    ConditionalClause) |
 *                 (ConditionalStart1 ConditionalClause)) ConditionalEnd
 * ConditionalStartN  := '~['
 * ConditionalStart2  := '~:['
 * ConditionalStart1  := '~@['
 * ConditionalClauses := (ConditionalClause TildaSemiColon)*
 *                       ConditionalClause
 * ConditionalClause  := FormatDirective
 * ConditionalEnd     := '~]'
 *
 * Iteration := TildaOpenBrace FormatDirective TildaCloseBrace
 * TildaOpenBrace  := '~{' // note: This may have prefixes, : @
 * TildaCloseBrace := TildaSimpleCloseBrace | TildaColonCloseBrace
 * TildaSimpleCloseBrace := '~}'
 * TildaColonCloseBrace  := '~:}'
 *
 * Text := [^~]*
 *
*************************************************************************/

import java.util.regex.Pattern;
import java.util.regex.Matcher;

import java.util.HashMap;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.IOException;

public class FormatPatterns {
    // Parameterized args
    static final String FromArglist = "(V)";
    static final String ArglistCount = "(#)";
    static final String VSharpParameter = FromArglist + '|' +
ArglistCount;

    // unsigned number
    static final String UnsignedNumberString = "\\p{javaDigit}+";
    static final String SignedNumberString = "(\\+|-)?" +
UnsignedNumberString;
    static final String DecimalNumberString = "(" + SignedNumberString
+ ")?";
    static final String ParameterizedNumberString = "(" +
DecimalNumberString + '|' + VSharpParameter + ')';
    static final String OptionalDecimalNumberString = "(," +
ParameterizedNumberString + ")?";

    static final String ColonString = "(:)?";
    static final String AtSignString = "(@)?";
    static final String EitherColonAtSign = "((:)|(@))?";
    static final String AtSignColonString = "(" + ColonString + "|" +
            AtSignString + "|(:@|@:)?)";
    static final String Comma = ",";
    static final String Tilda = "~";
    static final String CharacterString = "(,((\'(.))|" +
ParameterizedNumberString + ")?)?";

    static final String RadixString         =
ParameterizedNumberString;
    static final String MincolString        =
ParameterizedNumberString;
    static final String InnerMincolString   =
OptionalDecimalNumberString;
    static final String ColincString        =
OptionalDecimalNumberString;
    static final String MinpadString        =
OptionalDecimalNumberString;
    static final String CommaIntervalString =
OptionalDecimalNumberString;
    static final String PadCharString       = CharacterString;
    static final String CommaCharString     = CharacterString;
    static final String PPrintNString       =
ParameterizedNumberString;
    static final String ColnumString        =
ParameterizedNumberString;

    static final String WidthString        =
OptionalDecimalNumberString;
    static final String InnerDigitsAfterString  =
OptionalDecimalNumberString;
    static final String ScaleFactorString  =
OptionalDecimalNumberString;
    static final String OverFlowCharString = CharacterString;
    static final String ExponentCharString = CharacterString;
    static final String DigitsAfterString  =
ParameterizedNumberString;
    static final String DigitsBeforeString =
OptionalDecimalNumberString;
    static final String InnerWidthString   =
OptionalDecimalNumberString;
    static final String SignCharString     = CharacterString; // an extension


    // Basic Output
    //~{(:)(@)}C -- Character
    //~{n}% -- newline
    //~{n}& -- fresh-line
    //~{n}| -- new page
    //~{n}~ -- tilda
    static final String CharacterFormatString = "(" + Tilda +
ParameterizedNumberString + AtSignColonString + "C" + ")";
    static final String NewlineString = "(" + Tilda +
ParameterizedNumberString + "%" + ")";
    static final String FreshLineString = "(" + Tilda +
ParameterizedNumberString + "&" + ")";
    static final String NewPageString = "(" + Tilda +
ParameterizedNumberString + "\\|" + ")";
    static final String TildaString = "(" + Tilda +
ParameterizedNumberString + "\\~" + ")";

    // Radix Control
    //~{(radix(,mincol(,padchar(,commachar(,comma-interval)))))(:)(@)}R
    //~{(mincol(,padchar(,commachar(,comma-interval))))(:)(@)}D
    //~{(mincol(,padchar(,commachar(,comma-interval))))(:)(@)}B
    //~{(mincol(,padchar(,commachar(,comma-interval))))(:)(@)}O
    //~{(mincol(,padchar(,commachar(,comma-interval))))(:)(@)}X
    static final String RadixFormatString =
            "(" + Tilda + RadixString + InnerMincolString +
PadCharString + CommaCharString +
            CommaIntervalString + AtSignColonString + "R" + ")";
    static final String DecimalFormatString =
            "(" + Tilda + MincolString + PadCharString +
CommaCharString + CommaIntervalString + AtSignColonString + "D" + ")";
    static final String BinaryFormatString =
            "(" + Tilda + MincolString + PadCharString +
CommaCharString + CommaIntervalString + AtSignColonString + "B" + ")";
    static final String OctalFormatString =
            "(" + Tilda + MincolString + PadCharString +
CommaCharString + CommaIntervalString + AtSignColonString + "O" + ")";
    static final String HexFormatString =
            "(" + Tilda + MincolString + PadCharString +
CommaCharString + CommaIntervalString + AtSignColonString + "X" + ")";

    // Floating Point Printers //
    //~{(width(,digits-after(,scale-factor(,overflowchar(,padchar)))))(:)(@)}F
    //~{(width(,digits-after(,scale-factor(,overflowchar(,padchar,(exponentchar))))))(:)(@)}E
    //~{(width(,digits-after(,scale-factor(,overflowchar(,padchar,(exponentchar))))))(:)(@)}G
    //~{(d(,n(,w(,padchar(,sign)))))(:)(@)}$
    static final String FloatingPointFString =
            "(" + Tilda + WidthString + InnerDigitsAfterString +
ScaleFactorString +
            OverFlowCharString + PadCharString + AtSignColonString +
"F" + ")";
    static final String FloatingPointEString =
            "(" + Tilda + WidthString + InnerDigitsAfterString +
ScaleFactorString +
            OverFlowCharString + PadCharString + AtSignColonString +
ExponentCharString + "E" + ")";
    static final String FloatingPointGString =
            "(" + Tilda + WidthString + InnerDigitsAfterString +
ScaleFactorString +
            OverFlowCharString + PadCharString + AtSignColonString +
ExponentCharString + "G" + ")";
    static final String FloatingPointCurrencyString =
            "(" + Tilda + DigitsAfterString + DigitsBeforeString +
InnerWidthString +
            PadCharString + SignCharString + AtSignColonString + "\\$"
+ ")";

    // Printer Operations //
    //~{(mincol(,colinc(,minpad(,padchar))))((:)(@))}A
    //~{(mincol(,colinc(,minpad(,padchar))))((:)(@))}S
    //~{((:)(@))}W
    static final String PrinterAString =
            "(" + Tilda + MincolString + ColincString + MinpadString +
PadCharString + AtSignColonString + "A" + ")";
    static final String PrinterSString =
            "(" + Tilda + MincolString + ColincString + MinpadString +
PadCharString + AtSignColonString + "S" + ")";
    static final String PrinterWString = "(" + Tilda +
AtSignColonString + "W" + ")";

    // Pretty Printer Operations //
    //~{((:)(@))}_
    //--> logical block handled with ~< ~:>
    //~{n(:)}I
    //~/ --> closed by a /
    static final String ConditionalNewlineString = "(" + Tilda +
AtSignColonString + "_" + ")";
    static final String IndentString = "(" + Tilda + PPrintNString +
ColonString + "I" + ")";
    static final String CallFunctionString = "(" + Tilda + "/(.+)/" +
")";

    // Layout Control //
    //~{(colnum(,colinc))((:)(@))}T
    //~{(colnum(,colinc(,minpad(,padchar))))((:)(@))}<
    //~>
    static final String TabulateString = "(" + Tilda + ColnumString +
ColincString + AtSignColonString + "T" + ")";
    static final String JustificationString =
            "(" + Tilda + ColnumString + ColincString + MinpadString +
PadCharString + AtSignColonString + "<" + ")";
    static final String EndJustificationString = "(" + Tilda + ">" +
")";


    // Control Flow Operations //
    //~{n((:)(@))}*
    //~{(:)(@)}[
    //~]
    //~{((:)(@))}{
    //~{(:)}}
    static final String GoToString               = "(" + Tilda +
ParameterizedNumberString + AtSignColonString + "\\*" + ")";
    static final String ConditionalExprString    = "(" + Tilda +
EitherColonAtSign + "\\[" + ")";
    static final String EndConditionalExprString = "(" + Tilda + "\\]"
+ ")";
    static final String IterationString          = "(" + Tilda +
AtSignColonString + "\\{" + ")";
    static final String EndIterationString       = "(" + Tilda +
ColonString + "\\}" + ")";

    // Miscellaneous Operations //
    //~{((:)(@))}(
    //~)
    //~{((:)(@))}P
    static final String CaseConversionStartString = "(" + Tilda +
AtSignColonString + "\\(" + ")";
    static final String CaseConversionEndString   = "(" + Tilda + "\\)" + ")";
    static final String PluralString              = "(" + Tilda +
AtSignColonString + "P" + ")";

    // Miscellaneous Pseudo-Operations //
    //~;
    //~{(n(,m(,o)))((:)(@))}^
    //~{((:)(@))}\n
    static final String ClauseSeparatorString = "(" + Tilda + ";" +
")";
    static final String EscapeUpwardString = "(" + Tilda + "\\^" +
")"; // has other options ??
    static final String EndOfLine = "(" + Tilda + "$" + ")";

    // Error directives
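    // Order must follow the ParseToken enum (and FormatRegExString) below,
    // since CALIBRATION is indexed by position in this string.
    static final String AllDirectiveLetters =
            "C%&|~RDBOXFEG$ASW_IT<>*[]{}()P;^\n";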
    static final String ErrorTildas = "(" + Tilda + AtSignColonString
+
            "[^" + "C%&|~RDBOXFEG\\$ASW_IT/*<>\\[\\]{}()P;\\^\\n" +
"]" + ")";

    // And all of the rest of the text
    static final String TextString = "([^~]+)";

    // Now to 'string' everything together
    static final String FormatRegExString =
            CharacterFormatString + "|" +
            NewlineString + "|" +
            FreshLineString + "|" +
            NewPageString + "|" +
            TildaString + "|" +
            RadixFormatString + "|" +
            DecimalFormatString + "|" +
            BinaryFormatString + "|" +
            OctalFormatString + "|" +
            HexFormatString + "|" +
            FloatingPointFString + "|" +
            FloatingPointEString + "|" +
            FloatingPointGString + "|" +
            FloatingPointCurrencyString + "|" +
            PrinterAString + "|" +
            PrinterSString + "|" +
            PrinterWString + "|" +
            ConditionalNewlineString + "|" +
            IndentString + "|" +
            TabulateString + "|" +
            JustificationString + "|" +
            EndJustificationString + "|" +
            GoToString + "|" +
            ConditionalExprString + "|" +
            EndConditionalExprString + "|" +
            IterationString + "|" +
            EndIterationString + "|" +
            CaseConversionStartString + "|" +
            CaseConversionEndString + "|" +
            PluralString + "|" +
            ClauseSeparatorString + "|" +
            EscapeUpwardString + "|" +
            EndOfLine + "|" +
            CallFunctionString + "|" +
            ErrorTildas + "|" +
            TextString;

    private static final HashMap<Integer, ParseToken> map = new
HashMap<Integer, ParseToken>();

    private static Pattern pattern =
        Pattern.compile(FormatRegExString, Pattern.UNICODE_CASE +
Pattern.CASE_INSENSITIVE + Pattern.CANON_EQ);

    private static final int CALIBRATION[];
    static {
        int[] calibration = new int[AllDirectiveLetters.length() + 3];
        int index = -1;
        for (index = 0; index < AllDirectiveLetters.length(); index++)
{
            Matcher matcher = pattern.matcher("~" +
AllDirectiveLetters.charAt(index));
            matcher.find();
            int groupCount = matcher.groupCount();
            for (int group = 1; group <= groupCount; group++) {
                if (matcher.group(group) != null) {
                    calibration[index] = group;
                    break;
                }
            }
        }
        // fill in 3 non-singular ~m
        Matcher matcher = pattern.matcher("~" + "/x/");
        matcher.find();
        int groupCount = matcher.groupCount();
        for (int group = 1; group <= groupCount; group++) {
            if (matcher.group(group) != null) {
                calibration[index++] = group;
                break;
            }
        }
        matcher = pattern.matcher("~" + "v");
        matcher.find();
        groupCount = matcher.groupCount();
        for (int group = 1; group <= groupCount; group++) {
            if (matcher.group(group) != null) {
                calibration[index++] = group;
                break;
            }
        }
        matcher = pattern.matcher("some text");
        matcher.find();
        groupCount = matcher.groupCount();
        for (int group = 1; group <= groupCount; group++) {
            if (matcher.group(group) != null) {
                calibration[index++] = group;
                break;
            }
        }

        CALIBRATION = calibration;
    }

    // Defining a set of enums for naming the parsed tokens
    public enum ParseToken {
        CharacterFormat         (CALIBRATION[0]), // ·······@,
        Newline                 (CALIBRATION[1]),
        FreshLine               (CALIBRATION[2]),
        NewPage                 (CALIBRATION[3]),
        Tilda                   (CALIBRATION[4]),
        RadixFormat             (CALIBRATION[5]),
        DecimalFormat           (CALIBRATION[6]),
        BinaryFormat            (CALIBRATION[7]),
        OctalFormat             (CALIBRATION[8]),
        HexFormat               (CALIBRATION[9]),
        FloatingPointF          (CALIBRATION[10]),
        FloatingPointE          (CALIBRATION[11]),
        FloatingPointG          (CALIBRATION[12]),
        FloatingPointCurrency   (CALIBRATION[13]),
        PrinterA                (CALIBRATION[14]),
        PrinterS                (CALIBRATION[15]),
        PrinterW                (CALIBRATION[16]),
        ConditionalNewline      (CALIBRATION[17]),
        Indent                  (CALIBRATION[18]),
        Tabulate                (CALIBRATION[19]),
        Justification           (CALIBRATION[20]),
        EndJustification        (CALIBRATION[21]),
        GoTo                    (CALIBRATION[22]),
        ConditionalExpr         (CALIBRATION[23]),
        EndConditionalExpr      (CALIBRATION[24]),
        Iteration               (CALIBRATION[25]),
        EndIteration            (CALIBRATION[26]),
        CaseConversionStart     (CALIBRATION[27]),
        CaseConversionEnd       (CALIBRATION[28]),
        Plural                  (CALIBRATION[29]),
        ClauseSeparator         (CALIBRATION[30]),
        EscapeUpward            (CALIBRATION[31]),
        EndOfLine               (CALIBRATION[32]),
        CallFunction            (CALIBRATION[33]),
        Error                   (CALIBRATION[34]),
        Text                    (CALIBRATION[35]);

        private final int group;

        ParseToken(int group) {
            this.group = group;
            FormatPatterns.map.put(group, this);
        }

        public int getGroup() {
            return group;
        }
    }

    public static ParseToken getByGroupNumber(int number) {
        return FormatPatterns.map.get(number);
    }

    private static ParseToken setup = ParseToken.Text;

    public static void printRegex(String input) {
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            int groupCount = matcher.groupCount();
            System.out.println("---->");
            for (int group = 1; group <= groupCount; group++) {
                if (matcher.group(group) != null) {
                    System.out.println("* " +
getByGroupNumber(group));
                }
                String content = matcher.group(group);
                if (content != null) {
                    System.out.println("Group number: " + group);
                    System.out.println("Content: " + content);
                }
            }
        }
    }

    public static void main(String[] args) {
        if (args.length > 0) {
        } else {
            BufferedReader br = null;
            try {
                br = new BufferedReader(new
InputStreamReader(System.in));
                String test = br.readLine();
                // readLine() returns null at end of input
                while (test != null && test.length() > 0) {
                    printRegex(test);
                    test = br.readLine();
                }
            } catch (IOException ex) {
                System.out.println("Oops: " + ex);
            }
        }
    }
}
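
A side note on the CALIBRATION table above: it exists only to map a
capturing-group number back to a token. A minimal sketch of the same idea
using named capturing groups follows. Named groups need Java 7 or later, so
this is not what the posted code does. The class name, the token names and the
tiny directive subset (~D, ~A, ~%) are all hypothetical, chosen only for
illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: a cut-down FORMAT lexer that identifies tokens by
// group name instead of a calibrated group number.
class NamedGroupLexerSketch {
    // Token names double as regex group names (letters and digits only).
    static final String[] TOKEN_NAMES =
        {"DECIMAL", "AESTHETIC", "NEWLINE", "ERROR", "TEXT"};

    // One optional prefix parameter: a signed number, V, or #.
    static final String PARAM = "(?:[+-]?\\p{javaDigit}+|V|#)?";
    // Up to two modifier characters; deliberately loose (it accepts "::").
    static final String MODS = "(?:[:@]{0,2})";

    static final Pattern LEXER = Pattern.compile(
          "(?<DECIMAL>~" + PARAM + MODS + "D)"
        + "|(?<AESTHETIC>~" + PARAM + MODS + "A)"
        + "|(?<NEWLINE>~" + PARAM + "%)"
        + "|(?<ERROR>~.?)"          // any tilde sequence not matched above
        + "|(?<TEXT>[^~]+)",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns tokens as "NAME:lexeme" strings, in input order.
    static List<String> lex(String control) {
        List<String> tokens = new ArrayList<>();
        Matcher m = LEXER.matcher(control);
        while (m.find()) {
            for (String name : TOKEN_NAMES) {
                if (m.group(name) != null) {
                    tokens.add(name + ":" + m.group(name));
                    break;
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(lex("x = ~5D~%"));
    }
}
```

The order of the alternatives still matters, since the ERROR branch has to
come after every real directive, but nothing depends on counting groups.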

From: ···········@mac.com
Subject: Re: crossing FORMAT with Java
Date: 
Message-ID: <3c80a345-5a1d-4623-a170-9248e5dc5c5a@x35g2000hsb.googlegroups.com>
One thing to note about the regex: it uses the Java 5 regex package
(java.util.regex), which can deal with Unicode. In particular, where
the regex expects a number it uses \p{javaDigit}, i.e. the Java isDigit
function, which recognizes digits in Unicode blocks other than ASCII.
So it's possible to specify widths in, for example, Hindi (Devanagari)
digits.
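
A small sketch of what that means in practice, assuming a cut-down pattern
that only recognizes ~nD (the class and method names are hypothetical, not
part of the posted code). Character.digit converts each matched digit to its
numeric value whatever its Unicode block.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: read the mincol parameter of a ~nD directive,
// accepting any character for which Character.isDigit is true.
class UnicodeDigitSketch {
    static final Pattern MINCOL = Pattern.compile("~(\\p{javaDigit}+)D");

    // Returns the numeric mincol, or -1 if the input has no ~nD directive.
    static int mincol(String directive) {
        Matcher m = MINCOL.matcher(directive);
        if (!m.find()) {
            return -1;
        }
        int value = 0;
        for (char ch : m.group(1).toCharArray()) {
            value = value * 10 + Character.digit(ch, 10);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(mincol("~12D"));
        // U+0967 and U+0968 are DEVANAGARI DIGIT ONE and TWO.
        System.out.println(mincol("~\u0967\u0968D"));
    }
}
```

Both calls in main yield the same mincol, 12.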
From: Kent M Pitman
Subject: Re: crossing FORMAT with Java
Date: 
Message-ID: <uod6bay1m.fsf@nhplace.com>
···········@mac.com writes:

> The Hyperspec has

(The HyperSpec is just a private/proprietary webbing of the ANSI
standard.  ANSI CL is the thing that contains the information you're
commenting on with any sense of authority.)

> descriptions of the format operations, but it does
> not provide the basic components of any programming language - the
> lexing and parsing rules.

Lisp generally may not do lexing and parsing in the way you imagine,
which is why it doesn't publish such rules.  Personally, I would not
implement FORMAT as you have described.  It's fine to offer code that
chooses as an optional matter to view the rules of FORMAT in this way,
but it's not fine to suggest that they are essential and that their
omission is some kind of obvious gaffe.  

I personally don't like the notion that something can be called a basic
component when it's not even required at all.

The Lisp Reader doesn't have a lex grammar either, and that's
because it is not parsed in the classical sense; it is parsed by READ.

Indeed, I might be misremembering (it's been a while since I looked and I
was too lazy to look it up, so I'll let someone correct me if I'm wrong),
but I don't think there's anything in the language that forbids you
from putting in syntactically invalid expressions that are unreachable.
For example, I would be careful about the parsing approach in format 
strings because it seems to suggest that there is a parse-phase that
must succeed before execution, and I don't recall any such requirement.
e.g., I think

(format t "~:[foo: ~A~;bar: ~`~]" nil 3)

should succeed even though ~` is undefined.  So should

(format t "~:[foo: ~A~;bar: ~,,,,,,,,,,,,,,,,,,,,,,,,,$~]" nil 3)

I have seen implementations warn about such things as style warnings.
And I think that's ok as long as when it comes time to implement the
code, the right thing happens.
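
To make the distinction concrete, here is a minimal Java sketch of an
interpreter that processes directives only as it reaches them, with no
up-front parse. It is an illustration under stated assumptions, not how any
real implementation works: the class name is hypothetical, Java null stands
in for NIL, and only plain text, ~A and un-nested ~:[...~;...~] are handled.

```java
// Hypothetical sketch: directives are examined only when execution reaches
// them, so a bad directive in an unselected clause is never noticed.
class LazyFormatSketch {
    private final Object[] args;
    private int next = 0;
    private final StringBuilder out = new StringBuilder();

    private LazyFormatSketch(Object[] args) {
        this.args = args;
    }

    static String format(String control, Object... args) {
        LazyFormatSketch f = new LazyFormatSketch(args);
        f.run(control);
        return f.out.toString();
    }

    private void run(String s) {
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c != '~') {
                out.append(c);
                i++;
            } else if (s.startsWith("~:[", i)) {
                // Assumption: no nesting, so plain searches find the
                // separator and the terminator.
                int sep = s.indexOf("~;", i);
                int end = s.indexOf("~]", i);
                if (sep < 0 || end < sep) {
                    throw new IllegalArgumentException("malformed ~:[");
                }
                Object test = args[next++];
                // null (NIL) selects the first clause, anything else the
                // second; the other clause is skipped without being examined.
                run(test == null ? s.substring(i + 3, sep)
                                 : s.substring(sep + 2, end));
                i = end + 2;
            } else if (s.startsWith("~A", i) || s.startsWith("~a", i)) {
                out.append(args[next++]);
                i += 2;
            } else {
                throw new IllegalArgumentException(
                    "unknown directive: " + s.substring(i));
            }
        }
    }

    public static void main(String[] unused) {
        // The undefined ~` sits in the clause that is not selected.
        System.out.println(format("~:[foo: ~A~;bar: ~`~]", null, 3)); // prints "foo: 3"
    }
}
```

Passing a non-null first argument selects the second clause, and only then
does the undefined ~` raise an error. A lexer that tokenizes the whole string
first can get the same behavior by emitting an error token and deferring the
complaint until that token is actually executed.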

> Here's some Java code that's mostly about
> lexing a format string. In the comments, I sketched a possible BNF for
> format strings. It's likely that the Java lexing code will go into
> CLforJava, and the parsing component will be written in Lisp.
> 
> Aside from the drubbing the post will get (since it's not just Lisp),

It doesn't bother me that it's not Lisp.  It bothers me that the code
you offer, even if you'd offered it in Lisp syntax, doesn't implement
what I think of as the computational model that Lisp has for this.
(Of course, I'm doing this from memory as I head off to bed--I didn't 
research this, recently at least. So maybe I'm wrong.)

Of course, you could say you don't care and that you're going to just 
define it this way anyway.  And that might be ok for your purpose.
What caught my eye was just the criticism that something basic had been
left out.  Probably basic things were left out of the standard.  I just
don't think this is one of them.
From: vanekl
Subject: Re: crossing FORMAT with Java
Date: 
Message-ID: <g2j85r$pur$1@aioe.org>
···········@mac.com wrote:
> One of the interesting aspects of CLforJava development is choosing
> what to build in Lisp and what in Java. The results occasionally
> provide a different look at an old function. The problem is the
> processing of FORMAT strings - a dirty job, but someone has to do it.
> 
> The Hyperspec has descriptions of the format operations, but it does
> not provide the basic components of any programming language - the
> lexing and parsing rules. Here's some Java code that's mostly about
> lexing a format string. In the comments, I sketched a possible BNF for
> format strings. It's likely that the Java lexing code will go into
> CLforJava, and the parsing component will be written in Lisp.
> 
> Aside from the drubbing the post will get (since it's not just Lisp),
> I'd appreciate folks looking at the regex (it's a bit complex) and the
> BNF. Thanks!
snip

if you want to see how it's done, why not look at the code?
you can check out sbcl source and see how FORMAT's parsed in
   src/code/target-format.lisp