|
ActiveTcl User Guide |
|
- NAME
- encoding - Manipulate encodings
- SYNOPSIS
- encoding option ?arg arg ...?
- INTRODUCTION
- DESCRIPTION
- encoding
convertfrom ?encoding? data
- encoding convertto
?encoding? string
- encoding
names
- encoding system
?encoding?
- EXAMPLE
- SEE ALSO
- KEYWORDS
encoding - Manipulate encodings
encoding option ?arg arg ...?
Strings in Tcl are encoded using 16-bit Unicode characters.
Different operating system interfaces or applications may generate
strings in other encodings such as Shift-JIS. The encoding
command helps to bridge the gap between Unicode and these other
formats.
Performs one of several encoding related operations, depending on
option. The legal options are:
- encoding convertfrom ?encoding?
data
- Convert data to Unicode from the specified
encoding. The characters in data are treated as
binary data where the lower 8-bits of each character is taken as a
single byte. The resulting sequence of bytes is treated as a string
in the specified encoding. If encoding is not
specified, the current system encoding is used.
- encoding convertto ?encoding?
string
- Convert string from Unicode to the specified
encoding. The result is a sequence of bytes that represents
the converted string. Each byte is stored in the lower 8-bits of a
Unicode character. If encoding is not specified, the current
system encoding is used.
- encoding names
- Returns a list containing the names of all of the encodings
that are currently available.
- encoding system ?encoding?
- Set the system encoding to encoding. If encoding
is omitted then the command returns the current system encoding.
The system encoding is used whenever Tcl passes strings to system
calls.
It is common practice to write script files using a text editor
that produces output in the euc-jp encoding, which represents the
ASCII characters as singe bytes and Japanese characters as two
bytes. This makes it easy to embed literal strings that correspond
to non-ASCII characters by simply typing the strings in place in
the script. However, because the source command always reads files
using the current system encoding, Tcl will only source such files
correctly when the encoding used to write the file is the same.
This tends not to be true in an internationalized setting. For
example, if such a file was sourced in North America (where the
ISO8859-1 is normally used), each byte in the file would be treated
as a separate character that maps to the 00 page in Unicode. The
resulting Tcl strings will not contain the expected Japanese
characters. Instead, they will contain a sequence of Latin-1
characters that correspond to the bytes of the original string. The
encoding command can be used to convert this string to the
expected Japanese Unicode characters. For example,
set s [encoding convertfrom euc-jp "\xA4\xCF"]
would return the Unicode string "\u306F", which is the Hiragana
letter HA.
Tcl_GetEncoding
encoding
Copyright © 1998 by Scriptics Corporation.
Copyright © 1995-1997 Roger E. Critchlow Jr.