What Is a Legal Java Identifier?


A Java identifier may start with a letter a-z or A-Z, an underscore '_' or a dollar sign '$', and may contain the digits 0-9 in any position after the first. These are not the only legal characters, though: Java also accepts a large number of Unicode characters that are illegal in identifiers in many other programming languages.
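As a minimal sketch of that rule, the check below tests the first code point with Character.isJavaIdentifierStart and every later one with Character.isJavaIdentifierPart. The class name IdentifierCheck and the method isLegalIdentifier are made up for this illustration, and the sketch deliberately does not reject reserved words such as "class".

```java
final class IdentifierCheck {

    // Returns true if s is non-empty, its first code point may start a Java
    // identifier, and every later code point may continue one.
    // Assumption: reserved words such as "class" are NOT rejected here.
    static boolean isLegalIdentifier(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) {
            return false;
        }
        // Step by code point, not by char, so supplementary characters count once.
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        // The fourth sample is written with Unicode escapes for o-umlaut and sharp s.
        String[] samples = { "count", "_tmp", "$price", "gr\u00f6\u00dfe", "2fast", "a-b", "" };
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + isLegalIdentifier(s));
        }
    }
}
```

The first four samples pass and the last three fail: a digit cannot come first, '-' is never part of an identifier, and an empty string has no first character.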

The following program prints every range of characters that is legal as the first character of an identifier, followed by every range that is legal in the remaining positions:

public class TestUnicode {

    public static void main(String[] args) {
        System.out.println("JavaIdentifierStart:");
        printRanges(true);
        System.out.println("JavaIdentifierPart:");
        printRanges(false);
    }

    // Prints every contiguous range of code points in the Basic Multilingual
    // Plane (U+0000 to U+FFFF) that is legal as the first character of an
    // identifier (start == true) or as a later character (start == false).
    private static void printRanges(boolean start) {
        int nStartCodePoint = -1; // -1: not currently inside a legal range
        // Run one step past U+FFFF so a range still open at the end is printed too.
        for (int nCodePoint = 0; nCodePoint <= Character.MAX_VALUE + 1; ++nCodePoint) {
            boolean legal = nCodePoint <= Character.MAX_VALUE && isLegal(nCodePoint, start);
            if (legal && nStartCodePoint < 0) {
                nStartCodePoint = nCodePoint;       // a range opens here
            } else if (!legal && nStartCodePoint >= 0) {
                int nEndCodePoint = nCodePoint - 1; // the range closed on the previous code point
                System.out.println("   \\u" + toHex(nStartCodePoint) + " " + (char) nStartCodePoint
                        + " ... \\u" + toHex(nEndCodePoint) + " " + (char) nEndCodePoint);
                nStartCodePoint = -1;
            }
        }
    }

    private static boolean isLegal(int nCodePoint, boolean start) {
        return start ? Character.isJavaIdentifierStart(nCodePoint)
                     : Character.isJavaIdentifierPart(nCodePoint);
    }

    // Formats a code point as at least four lowercase hex digits, e.g. 36 -> "0024".
    public static String toHex(int nCodePoint) {
        return String.format("%04x", nCodePoint);
    }
}

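That produces output like the following excerpt. The exact ranges depend on the Unicode version supported by the Java runtime, and control characters have no visible glyph, so some entries look blank: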

JavaIdentifierStart:
   \u0024 $ ... \u0024 $
   \u0041 A ... \u005a Z
   \u005f _ ... \u005f _
   \u0061 a ... \u007a z
   \u00a2 ¢ ... \u00a5 ¥
   \u00aa ª ... \u00aa ª
   \u00b5 µ ... \u00b5 µ
   \u00ba º ... \u00ba º
   \u00c0 À ... \u00d6 Ö
   \u00d8 Ø ... \u00f6 ö
   \u00f8 ø ... \u02c1 ˁ
   \u02c6 ˆ ... \u02d1 ˑ
   \u02e0 ˠ ... \u02e4 ˤ
   \u02ec ˬ ... \u02ec ˬ
   \u02ee ˮ ... \u02ee ˮ
   \u0370 Ͱ ... \u0374 ʹ
   \u0376 Ͷ ... \u0377 ͷ
   \u037a ͺ ... \u037d ͽ
   \u0386 Ά ... \u0386 Ά
...
JavaIdentifierPart:
   \u0000  ... \u0008 
   \u000e  ... \u001b 
   \u0024 $ ... \u0024 $
   \u0030 0 ... \u0039 9
   \u0041 A ... \u005a Z
   \u005f _ ... \u005f _
   \u0061 a ... \u007a z
   \u007f  ... \u009f Ÿ
   \u00a2 ¢ ... \u00a5 ¥
   \u00aa ª ... \u00aa ª
   \u00ad ­ ... \u00ad ­
   \u00b5 µ ... \u00b5 µ
   \u00ba º ... \u00ba º
   \u00c0 À ... \u00d6 Ö
   \u00d8 Ø ... \u00f6 ö
   \u00f8 ø ... \u02c1 ˁ
   \u02c6 ˆ ... \u02d1 ˑ
   \u02e0 ˠ ... \u02e4 ˤ
   \u02ec ˬ ... \u02ec ˬ
   \u02ee ˮ ... \u02ee ˮ
   \u0300 ̀ ... \u0374 ʹ
   \u0376 Ͷ ... \u0377 ͷ
   \u037a ͺ ... \u037d ͽ
   \u0386 Ά ... \u0386 Ά
   \u0388 Έ ... \u038a Ί
...
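
Entries such as \u0000 and \u00ad appear in the JavaIdentifierPart list because Java treats them as ignorable characters: they are accepted inside an identifier but cannot start one. The sketch below, with the made-up names IgnorableDemo and classify, queries the three relevant methods of Character for a few code points so the difference is visible.

```java
final class IgnorableDemo {

    // Describes how Java classifies one code point for identifier purposes.
    static String classify(int codePoint) {
        return String.format("U+%04X start=%b part=%b ignorable=%b",
                codePoint,
                Character.isJavaIdentifierStart(codePoint),
                Character.isJavaIdentifierPart(codePoint),
                Character.isIdentifierIgnorable(codePoint));
    }

    public static void main(String[] args) {
        // NUL, ESC, soft hyphen, the digit 0 and the letter A.
        int[] samples = { 0x0000, 0x001B, 0x00AD, 0x0030, 0x0041 };
        for (int cp : samples) {
            System.out.println(classify(cp));
        }
    }
}
```

The three control and format characters report part=true and ignorable=true, the digit is a plain part, and only the letter may start an identifier.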

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License