Variables: Symbols representing a charset.
Each of the following symbols represents a predefined charset.
MSymbol Mcharset_ascii
Symbol representing the charset ASCII.
MSymbol Mcharset_iso_8859_1
Symbol representing the charset ISO/IEC 8859/1.
MSymbol Mcharset_unicode
Symbol representing the charset Unicode.
MSymbol Mcharset_m17n
Symbol representing the largest charset.
MSymbol Mcharset_binary
Symbol representing the charset for ill-decoded characters.
Variables: Parameter keys for mchar_define_charset().
These are the predefined symbols to use as parameter keys for the function mchar_define_charset() (which see).
MSymbol Mmethod
MSymbol Mdimension
MSymbol Mmin_range
MSymbol Mmax_range
MSymbol Mmin_code
MSymbol Mmax_code
MSymbol Mascii_compatible
MSymbol Mfinal_byte
MSymbol Mrevision
MSymbol Mmin_char
MSymbol Mmapfile
MSymbol Mparents
MSymbol Msubset_offset
MSymbol Mdefine_coding
MSymbol Maliases
Variables: Symbols representing charset methods.
These are the predefined symbols that can be a value of the Mmethod parameter of a charset used in an argument to the mchar_define_charset() function.
A method specifies how code-points and character codes are converted. See the documentation of the mchar_define_charset() function for the details.
MSymbol Moffset
Symbol for the offset type method of charset.
MSymbol Mmap
Symbol for the map type method of charset.
MSymbol Munify
Symbol for the unify type method of charset.
MSymbol Msubset
Symbol for the subset type method of charset.
MSymbol Msuperset
Symbol for the superset type method of charset.
Defines
#define MCHAR_INVALID_CODE
Invalid code-point.
Functions
MSymbol mchar_define_charset (const char *name, MPlist *plist)
Define a charset.
MSymbol mchar_resolve_charset (MSymbol symbol)
Resolve charset name.
int mchar_list_charset (MSymbol **symbols)
List symbols representing charsets.
int mchar_decode (MSymbol charset_name, unsigned code)
Decode a code-point.
unsigned mchar_encode (MSymbol charset_name, int c)
Encode a character code.
int mchar_map_charset (MSymbol charset_name, void (*func)(int from, int to, void *arg), void *func_arg)
Call a function for all the characters in a specified charset.
Variables
MSymbol Mcharset
The symbol Mcharset.
The m17n library distinguishes the following three concepts:
The type unsigned is used to represent a code-point. An invalid code-point is represented by the macro MCHAR_INVALID_CODE.
#define MCHAR_INVALID_CODE
The macro MCHAR_INVALID_CODE gives the invalid code-point.
The mchar_define_charset() function defines a new charset and makes it accessible via a symbol whose name is name. plist specifies parameters of the charset as below:
Mmethod
The value specifies the method for decoding/encoding code-points in the charset. It must be Moffset, Mmap (default), Munify, Msubset, or Msuperset.
Mdimension
The value specifies the dimension of code-points of the charset. It must be 1 (default), 2, 3, or 4.
Mmin_range
The value specifies the minimum range of a code-point, which means that the Nth byte of the value is the minimum Nth byte of code-points of the charset. The default value is 0.
Mmax_range
The value specifies the maximum range of a code-point, which means that the Nth byte of the value is the maximum Nth byte of code-points of the charset. The default value is 0xFF, 0xFFFF, 0xFFFFFF, or 0xFFFFFFFF if the dimension is 1, 2, 3, or 4 respectively.
Mmin_code
The value specifies the minimum code-point of the charset. The default value is the minimum range.
Mmax_code
The value specifies the maximum code-point of the charset. The default value is the maximum range.
Mascii_compatible
The value specifies whether the charset is ASCII compatible or not. If the value is Mnil (default), the charset is not ASCII compatible; otherwise it is.
Mfinal_byte
The value specifies the final byte of the charset registered in The International Registry. It must be 0 (default) or 32..127. The value 0 means that the charset is not in the registry.
Mrevision
The value specifies the revision number of the charset registered in The International Registry. It must be 0..127. If the charset is not in The International Registry, the value is ignored. The value 0 means that the charset has no revision number.
Mmin_char
The value specifies the minimum character code of the charset. The default value is 0.
Mmapfile
If the method is Mmap or Munify, data containing mapping information is added to the m17n database by calling the function mdatabase_define() with the value as the argument extra_info; that is, the value is used as the file name of the data.
Otherwise, this parameter is ignored.
Mparents
If the method is Msubset, the value must be a plist of length 1, and the value of the plist must be a symbol representing a parent charset.
If the method is Msuperset, the value must be a plist of length less than 9, and the values of the plist must be symbols representing subset charsets.
Otherwise, this parameter is ignored.
Mdefine_coding
If the dimension of the charset is 1, the value specifies whether or not to define a coding system of the same name whose type is Mcharset. A coding system is defined if the value is not Mnil.
Otherwise, this parameter is ignored.
Errors: MERROR_CHARSET
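The byte-wise reading of the range parameters can be sketched in plain C. This is only an illustration of the description above, not m17n code: the function names default_max_range and code_in_range are hypothetical, and the sketch assumes that unsigned is at least 32 bits wide.

```c
#include <stdbool.h>

/* Default maximum range for a dimension of 1..4, as listed above.
   Returning 0 for any other dimension is a choice made for this sketch. */
unsigned default_max_range(int dimension)
{
    switch (dimension) {
    case 1:  return 0xFFu;
    case 2:  return 0xFFFFu;
    case 3:  return 0xFFFFFFu;
    case 4:  return 0xFFFFFFFFu;
    default: return 0u;
    }
}

/* True if every byte of `code` lies between the byte at the same
   position in `min_range` and the byte at the same position in
   `max_range`. */
bool code_in_range(unsigned code, unsigned min_range, unsigned max_range)
{
    for (int shift = 0; shift < 32; shift += 8) {
        unsigned b  = (code >> shift) & 0xFFu;
        unsigned lo = (min_range >> shift) & 0xFFu;
        unsigned hi = (max_range >> shift) & 0xFFu;
        if (b < lo || b > hi)
            return false;
    }
    return true;
}
```

With a minimum range of 0x2121 and a maximum range of 0x7E7E, the code-point 0x3021 passes this check, while 0x307F fails it even though it lies numerically between the two values, because its low byte 0x7F exceeds 0x7E.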
The mchar_resolve_charset() function returns symbol if it represents a charset. Otherwise, it canonicalizes symbol as a charset name and, if the canonicalized name represents a charset, returns that name. Otherwise, it returns Mnil.
int mchar_list_charset (MSymbol **symbols)
The mchar_list_charset() function makes an array of symbols, each representing a charset, stores a pointer to the array in the place pointed to by symbols, and returns the length of the array.
int mchar_decode (MSymbol charset_name, unsigned code)
The mchar_decode() function decodes code-point code in the charset represented by the symbol charset_name to get a character code.
unsigned mchar_encode (MSymbol charset_name, int c)
The mchar_encode() function encodes character code c to get a code-point in the charset represented by the symbol charset_name.
int mchar_map_charset (MSymbol charset_name, void (*func)(int from, int to, void *arg), void *func_arg)
The mchar_map_charset() function calls func for all the characters in the charset named charset_name. Each call covers a chunk of consecutive characters rather than a single character.
func receives three arguments: from, to, and arg. from and to specify the range of character codes in the charset. arg is the same as func_arg.
Errors: MERROR_CHARSET
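The chunked calling convention can be illustrated without the library. The sketch below is a stand-alone model, not the m17n implementation: for_each_chunk, count_chunk, count_chunks, and struct chunk_stats are hypothetical names, the input is assumed to be a sorted array of distinct character codes, and to is assumed to be inclusive. Only the callback shape (int from, int to, void *arg) is taken from the signature above.

```c
#include <stddef.h>

typedef void (*chunk_fn)(int from, int to, void *arg);

/* Calls func once per run of consecutive character codes in the
   sorted array chars[0..n-1], passing the first and last code of
   the run and the caller's func_arg. */
void for_each_chunk(const int *chars, size_t n, chunk_fn func, void *func_arg)
{
    size_t i = 0;
    while (i < n) {
        size_t j = i;
        while (j + 1 < n && chars[j + 1] == chars[j] + 1)
            j++;                      /* extend the current run */
        func(chars[i], chars[j], func_arg);
        i = j + 1;
    }
}

/* Example callback state: number of calls and characters seen. */
struct chunk_stats { int calls; int chars; };

/* Example callback with the same shape as func above. */
void count_chunk(int from, int to, void *arg)
{
    struct chunk_stats *stats = arg;
    stats->calls++;
    stats->chars += to - from + 1;
}

/* Returns how many callback invocations the array produces. */
int count_chunks(const int *chars, size_t n)
{
    struct chunk_stats stats = {0, 0};
    for_each_chunk(chars, n, count_chunk, &stats);
    return stats.calls;
}
```

For the codes 0x41, 0x42, 0x43, 0x61, 0x62, 0x7A, the callback in this model runs three times, once for each run, rather than six times.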
Any decoded M-text has a text property whose key is the predefined symbol Mcharset. The name of Mcharset is "charset".
The symbol Mcharset_ascii has name "ascii" and represents the charset ISO 646, USA Version X3.4-1968 (ISO-IR-6).
The symbol Mcharset_iso_8859_1 has name "iso-8859-1" and represents the charset ISO/IEC 8859-1:1998.
The symbol Mcharset_unicode has name "unicode" and represents the charset Unicode.
The symbol Mcharset_m17n has name "m17n" and represents the charset that contains all characters supported by the m17n library.
The symbol Mcharset_binary has name "binary" and represents the fake charset that the decoding functions attach to an M-text as a text property when they encounter an invalid byte (sequence).
See Code Conversion for more details.
Each of Mmethod, Mdimension, Mmin_range, Mmax_range, Mmin_code, Mmax_code, Mascii_compatible, Mfinal_byte, Mrevision, Mmin_char, Mmapfile, Mparents, Msubset_offset, Mdefine_coding, and Maliases is a parameter key for mchar_define_charset() (which see).
The symbol Moffset has the name "offset". When used as the value of the Mmethod parameter of a charset, it means that code-points and character codes of the charset are converted by this calculation:
CHARACTER-CODE = CODE-POINT - MIN-CODE + MIN-CHAR
where MIN-CODE is the value of the Mmin_code parameter of the charset, and MIN-CHAR is the value of the Mmin_char parameter.
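The offset calculation is simple enough to write out directly. The following is a minimal sketch in plain C, not the library's code: offset_decode, offset_encode, and SKETCH_INVALID_CODE are hypothetical names, the bounds check against the minimum and maximum code-point and the -1 return value are choices made for the sketch, and the encoding direction is simply the formula solved for CODE-POINT.

```c
/* Stand-in for an invalid code-point; the value is a choice made for
   this sketch, not taken from the m17n headers. */
#define SKETCH_INVALID_CODE 0xFFFFFFFFu

/* CHARACTER-CODE = CODE-POINT - MIN-CODE + MIN-CHAR
   Returns -1 if code lies outside [min_code, max_code]. */
int offset_decode(unsigned code, unsigned min_code, unsigned max_code,
                  int min_char)
{
    if (code < min_code || code > max_code)
        return -1;
    return (int)(code - min_code) + min_char;
}

/* Inverse direction: CODE-POINT = CHARACTER-CODE - MIN-CHAR + MIN-CODE
   Returns SKETCH_INVALID_CODE if c has no code-point in the charset. */
unsigned offset_encode(int c, unsigned min_code, unsigned max_code,
                       int min_char)
{
    if (c < min_char)
        return SKETCH_INVALID_CODE;
    unsigned delta = (unsigned)(c - min_char);
    if (delta > max_code - min_code)
        return SKETCH_INVALID_CODE;
    return min_code + delta;
}
```

For a hypothetical charset with MIN-CODE 0x20, a maximum code-point of 0x7E, and MIN-CHAR 0x1000, the code-point 0x41 maps to the character code 0x1021, and encoding 0x1021 gives back 0x41.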
The symbol Munify has the name "unify". When used as the value of the Mmethod parameter of a charset, it means that code-points and character codes of the charset are converted by map lookup and offsetting. The map must be given by the Mmapfile parameter. For this kind of charset, a unique contiguous character code space is assigned for all its characters.
If the map has an entry for a code-point, the conversion is done by looking up the map. Otherwise, the conversion is done by this calculation:
CHARACTER-CODE = CODE-POINT - MIN-CODE + LOWEST-CHAR-CODE
where MIN-CODE is the value of the Mmin_code parameter of the charset, and LOWEST-CHAR-CODE is the lowest character code of the assigned code space.
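The lookup-then-offset order can be sketched as follows. This is an illustrative model only: the map is a small in-memory table rather than data loaded from a map file, and struct map_entry, sample_map, unify_decode, and all the numeric values are hypothetical. Returning -1 for a code-point below MIN-CODE is also a choice made for the sketch.

```c
#include <stddef.h>

/* One mapping from a code-point to a character code. */
struct map_entry { unsigned code; int ch; };

/* Made-up two-entry map used only for illustration. */
static const struct map_entry sample_map[] = {
    { 0x21u, 0x3000 },
    { 0x22u, 0x3001 },
};

/* Looks up code in the map first. If there is no entry, falls back to
   CHARACTER-CODE = CODE-POINT - MIN-CODE + LOWEST-CHAR-CODE. */
int unify_decode(unsigned code, const struct map_entry *map, size_t map_len,
                 unsigned min_code, int lowest_char_code)
{
    if (code < min_code)
        return -1;
    for (size_t i = 0; i < map_len; i++) {
        if (map[i].code == code)
            return map[i].ch;
    }
    return (int)(code - min_code) + lowest_char_code;
}
```

With MIN-CODE 0x21 and an assumed LOWEST-CHAR-CODE of 0x110000, the code-point 0x21 resolves through the map to 0x3000, while 0x23, which has no map entry, falls back to 0x110002.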
The symbol Msubset has the name "subset". When used as the value of the Mmethod parameter of a charset, it means that the charset is a subset of a parent charset. The parent charset must be given by the Mparents parameter. Code-points and character codes of the charset are conceptually converted by this calculation:
CHARACTER-CODE = PARENT-CODE (CODE-POINT) + SUBSET-OFFSET
where PARENT-CODE is a pseudo function that returns the character code of CODE-POINT in the parent charset, and SUBSET-OFFSET is the value given by the Msubset_offset parameter.
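The conceptual subset calculation can be modelled by passing the parent's decoder as a function pointer. This is a sketch under stated assumptions rather than the library's code: parent_decode_fn, sample_parent_decode, and subset_decode are hypothetical names, the sample parent is a made-up 8-bit charset whose character codes equal its code-points, and -1 is used here to signal a code-point the parent cannot decode.

```c
/* Stand-in for PARENT-CODE: decodes a code-point in the parent charset,
   or returns -1 if the parent has no such code-point. */
typedef int (*parent_decode_fn)(unsigned code);

/* Made-up parent charset: code-points 0x00..0xFF map to themselves. */
int sample_parent_decode(unsigned code)
{
    return code <= 0xFFu ? (int)code : -1;
}

/* CHARACTER-CODE = PARENT-CODE (CODE-POINT) + SUBSET-OFFSET */
int subset_decode(unsigned code, parent_decode_fn parent_code,
                  int subset_offset)
{
    int c = parent_code(code);
    if (c < 0)
        return -1;
    return c + subset_offset;
}
```

With the sample parent and a SUBSET-OFFSET of 0x80, the code-point 0x41 yields the character code 0xC1.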