XGC Ada supports the standard character sets defined in Ada 95 as well as several other non-standard character sets for use in localized versions of the compiler.
The basic character set is Latin-1. This character set is defined by ISO standard 8859, part 1. The lower half (character codes 16#00# ... 16#7F#) is identical to standard ASCII coding, but the upper half is used to represent additional characters. These include extended letters used by European languages, such as the vowels with umlauts used in German and the extra letter A-ring used in Swedish.
For a complete list of Latin-1 codes and their encodings, see the source of library unit Ada.Characters.Latin_1. You may use any of these extended characters freely in character or string literals. In addition, the extended characters that represent letters can be used in identifiers.
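As a quick illustration of the layout described above, the following Python sketch inspects Latin-1 byte values. Python is used here only because its standard codecs make the codes easy to check; it is not part of XGC Ada, and the helper name latin1_code is hypothetical.

```python
def latin1_code(ch: str) -> int:
    """Return the Latin-1 code of one character (hypothetical helper).

    Raises UnicodeEncodeError if the character is not in Latin-1.
    """
    return ch.encode("latin-1")[0]


# The lower half (16#00# ... 16#7F#) coincides with ASCII.
lower_half = bytes(range(0x00, 0x80))
assert lower_half.decode("latin-1") == lower_half.decode("ascii")

# Upper-half letters mentioned in the text: German vowels with umlauts
# and the Swedish A-ring, written here as Unicode escapes.
for name, ch in [("a-umlaut", "\u00e4"), ("o-umlaut", "\u00f6"),
                 ("u-umlaut", "\u00fc"), ("a-ring", "\u00e5")]:
    print(f"{name}: 16#{latin1_code(ch):02X}#")
```

Each of these letters occupies a single byte at or above 16#80#, which is why a pure ASCII tool cannot represent them.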
XGC Ada also supports several other eight-bit coding schemes:
Latin-2 letters allowed in identifiers, with uppercase and lowercase equivalence.
Latin-3 letters allowed in identifiers, with uppercase and lowercase equivalence.
Latin-4 letters allowed in identifiers, with uppercase and lowercase equivalence.
IBM PC code page 437 is the normal default for PCs in the USA and corresponds to the original IBM PC character set. It has some, but not all, of the extended Latin-1 letters, and those letters do not have the same encoding as in Latin-1. In this mode, these letters are allowed in identifiers, with uppercase and lowercase equivalence.
IBM PC code page 850 is a modification of code page 437, extended to include all the Latin-1 letters, though still not with the usual Latin-1 encoding. In this mode, all these letters are allowed in identifiers, with uppercase and lowercase equivalence.
Any character in the range 16#80# ... 16#FF# is allowed in identifiers, and all are considered distinct. In other words, there are no uppercase and lowercase equivalences in this range. This is useful in conjunction with certain encoding schemes used for some foreign character sets (for example, the typical method of representing Chinese characters on the PC).
No characters in the upper half (16#80# ... 16#FF#) are allowed in identifiers. This gives Ada 83 compatibility for identifier names.
For precise data on the encodings permitted, and the uppercase and lowercase equivalences that are recognized, see the file csets.adb in the XGC Ada compiler sources. This file is available only in a full source release of XGC Ada.
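To make the difference between the IBM PC code pages and Latin-1 concrete, the sketch below compares the byte value of a letter in each set. It relies on Python's standard latin-1, cp437 and cp850 codecs rather than on csets.adb, so it illustrates the character sets themselves, not the compiler's tables. The function name encodings_of is hypothetical.

```python
def encodings_of(ch: str) -> dict:
    """Map each character set name to the byte code of ch, or None if absent."""
    result = {}
    for name in ("latin-1", "cp437", "cp850"):
        try:
            result[name] = ch.encode(name)[0]
        except UnicodeEncodeError:
            result[name] = None  # the letter does not exist in this set
    return result


# u-umlaut is present in all three sets, but at a different code on the PC.
print(encodings_of("\u00fc"))

# A-tilde is a Latin-1 letter that code page 437 lacks and code page 850 adds.
print(encodings_of("\u00c3"))
```

A source file saved in one of these code pages therefore has to be compiled in the matching mode, since the same byte denotes different letters in different sets.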
XGC Ada allows wide character codes to appear in character and string literals, and also optionally in identifiers, using the following possible encoding schemes:
In the brackets encoding, a wide character is represented by the following eight-character sequence:
["abcd"]
where a, b, c, d are the four hexadecimal digits (using uppercase letters) of the wide character code. For example, ["A345"] represents the wide character with code 16#A345#. This scheme is compatible with use of the full Wide_Character set, and is also the method used for wide character encoding in the standard ACVC (Ada Compiler Validation Capability) test suite distributions.
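The eight-character form can be sketched in a few lines of Python. This is a minimal illustration of the notation only; the function names encode_brackets and decode_brackets are hypothetical and are not part of XGC Ada.

```python
HEX_UPPER = "0123456789ABCDEF"


def encode_brackets(code: int) -> str:
    """Return the eight-character brackets form of a 16-bit character code."""
    if not 0 <= code <= 0xFFFF:
        raise ValueError("code must fit in 16 bits")
    return '["%04X"]' % code


def decode_brackets(text: str) -> int:
    """Parse an eight-character brackets sequence back to its code."""
    if len(text) != 8 or text[:2] != '["' or text[6:] != '"]':
        raise ValueError("not a brackets sequence")
    digits = text[2:6]
    if any(d not in HEX_UPPER for d in digits):
        raise ValueError("digits must be uppercase hexadecimal")
    return int(digits, 16)


print(encode_brackets(0xA345))  # prints ["A345"]
```

Because the sequence uses only ASCII characters, it survives any editor or transfer tool that handles plain ASCII text.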
In the hex encoding, a wide character is represented by the following five-character sequence:
ESC a b c d
where a, b, c, d are the four hexadecimal digits (using uppercase letters) of the wide character code. For example, ESC A345 represents the wide character with code 16#A345#. This scheme is compatible with use of the full Wide_Character set.
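A minimal Python sketch of the five-character form follows. ESC is the ASCII escape character, code 16#1B#; the function names are hypothetical and serve only to illustrate the notation.

```python
ESC = "\x1b"  # ASCII escape character, 16#1B#


def encode_hex_esc(code: int) -> str:
    """Return ESC followed by four uppercase hexadecimal digits."""
    if not 0 <= code <= 0xFFFF:
        raise ValueError("code must fit in 16 bits")
    return ESC + "%04X" % code


def decode_hex_esc(text: str) -> int:
    """Parse a five-character ESC sequence back to its code."""
    if len(text) != 5 or text[0] != ESC:
        raise ValueError("not an ESC sequence")
    digits = text[1:]
    if any(d not in "0123456789ABCDEF" for d in digits):
        raise ValueError("digits must be uppercase hexadecimal")
    return int(digits, 16)


print(repr(encode_hex_esc(0xA345)))  # prints '\x1bA345'
```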
In the upper-half encoding, a wide character with code 16#abcd# whose upper bit is set (in other words, a is in the range 8 to F) is represented as two bytes, 16#ab# and 16#cd#. The second byte may never be a format control character, but it is not required to be in the upper half. This method can also be used for Shift-JIS or EUC where the internal coding matches the external coding.
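The two-byte rule can be sketched as follows. The text does not list which codes count as format control characters; this sketch assumes they are the Ada format effectors (HT, LF, VT, FF, CR), and that assumption, like the function names, is not taken from XGC Ada.

```python
# ASSUMPTION: "format control character" means an Ada format effector.
FORMAT_EFFECTORS = {0x09, 0x0A, 0x0B, 0x0C, 0x0D}  # HT, LF, VT, FF, CR


def encode_upper_half(code: int) -> bytes:
    """Split a 16-bit code with its top bit set into two bytes."""
    if not 0x8000 <= code <= 0xFFFF:
        raise ValueError("upper bit of the code must be set")
    first, second = code >> 8, code & 0xFF
    if second in FORMAT_EFFECTORS:
        raise ValueError("second byte may not be a format control character")
    return bytes([first, second])


def decode_upper_half(pair: bytes) -> int:
    """Recombine two bytes; the first must lie in the upper half."""
    first, second = pair
    if first < 0x80:
        raise ValueError("first byte must be in the upper half")
    return (first << 8) | second


print(encode_upper_half(0xA345).hex())  # prints a345
```

Note that the second byte here, 16#45#, is in the lower half, which the scheme permits.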
In the Shift-JIS encoding, a wide character is represented by a two-character sequence, 16#ab# and 16#cd#, subject to the restrictions described above for the upper-half encoding. The internal character code is the corresponding JIS character according to the standard algorithm for Shift-JIS conversion. Only characters defined in the JIS code set table can be used with this encoding method.
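One widely used formulation of the Shift-JIS to JIS conversion is sketched below. Whether XGC Ada performs exactly these steps is an assumption; the sketch also assumes the internal code is the two-byte JIS X 0208 code, and the function name sjis_to_jis is hypothetical.

```python
def sjis_to_jis(b1: int, b2: int) -> int:
    """Convert a Shift-JIS byte pair to a two-byte JIS X 0208 code."""
    if not (0x81 <= b1 <= 0x9F or 0xE0 <= b1 <= 0xEF):
        raise ValueError("not a Shift-JIS lead byte")
    if not 0x40 <= b2 <= 0xFC or b2 == 0x7F:
        raise ValueError("not a Shift-JIS trail byte")
    # Each lead byte covers two JIS rows; the trail byte selects which one.
    adjust = 1 if b2 < 0x9F else 0
    row_offset = 0x70 if b1 < 0xA0 else 0xB0
    if adjust:
        cell_offset = 0x20 if b2 > 0x7F else 0x1F
    else:
        cell_offset = 0x7E
    j1 = ((b1 - row_offset) << 1) - adjust
    j2 = b2 - cell_offset
    return (j1 << 8) | j2


# Hiragana letter A is 16#82# 16#A0# in Shift-JIS.
print("16#%04X#" % sjis_to_jis(0x82, 0xA0))  # prints 16#2422#
```

The result can be cross-checked against Python's own shift_jis and euc_jp codecs, since clearing the top bit of each EUC byte yields the same JIS code.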
In the EUC encoding, a wide character is represented by a two-character sequence, 16#ab# and 16#cd#, with both characters in the upper half. The internal character code is the corresponding JIS character according to the EUC encoding algorithm. Only characters defined in the JIS code set table can be used with this encoding method.
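For two-byte EUC sequences, the usual relationship to JIS is that each byte is the JIS byte with its top bit set. The sketch below assumes that relationship and restricts both bytes to 16#A1# ... 16#FE#, the range used for JIS X 0208 characters; both points are assumptions about the algorithm, and the function name euc_to_jis is hypothetical.

```python
def euc_to_jis(b1: int, b2: int) -> int:
    """Convert a two-byte EUC sequence to a two-byte JIS X 0208 code."""
    if not (0xA1 <= b1 <= 0xFE and 0xA1 <= b2 <= 0xFE):
        raise ValueError("both bytes must be in 16#A1# ... 16#FE#")
    # Clearing the top bit of each byte yields the JIS row and cell bytes.
    return ((b1 & 0x7F) << 8) | (b2 & 0x7F)


# Hiragana letter A is 16#A4# 16#A2# in EUC.
print("16#%04X#" % euc_to_jis(0xA4, 0xA2))  # prints 16#2422#
```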
Note: Some of these coding schemes do not permit the full use of the Ada 95 character set. For example, neither Shift-JIS nor EUC allows the use of the upper half of the Latin-1 set.