trans130 Introduction


Copyright (c) 1993-2000 by Kostis Netzwerkberatung
Talstr. 25, D-63322 Rödermark, Tel. +49 6074 881056, FAX 881058
kosta@kostis.net (Kosta Kostis), http://www.kostis.net/

This information may be used free of charge at your own risk.


trans Character Encoding Converter Generator Package

This is a snapshot of a work in progress. Things may change, and other things may be added or removed. Please wait for the release. Sorry, no date has been scheduled yet.

You may use this package free of charge at your own risk but you may not sell this package.

Should you be interested in using any component of this package in a commercial package, you must contact me first to agree on terms. Don't worry, it won't cost you an arm and a leg; I just want to make sure that if money is made out of this, it doesn't pass me by completely. ;)

Currently there are 80 different Character Encoding Description Files supplied with this package, not counting the following files:

iso6429, iso646 (which are included by many other files)
iso10646, iso10646.mes (these are here for reference purposes)

trans covers 7-bit encodings such as ISO 646, 8-bit encodings such as many MS-DOS Codepages (also for IBM OS/2), Microsoft Windows Codepages, ISO 8859, HP, Adobe, Apple Macintosh, Atari, NeXTSTEP Character Encodings, some EBCDIC Encodings, koi8-r and a few more...

Should your favourite Character Encoding be missing, please contribute!

Where to get updates

The latest version of this package should be available at:

http://www.kostis.net/freeware/

How to create a Character Encoding Converter

To create translators, first use make to compile, link and install all trans tools. Installing means copying the executables into a directory included in your command search path. Before you do that, check the Makefile: it uses "/usr/local/bin" as the default directory for installing binaries, which you will probably want to change.

Example: U*IX (e. g. Linux)

#
# use gcc
#
make # compile trans executables

The Makefile offers a couple of further targets which you may want to use:

make install # this copies executables to /usr/local/bin
make clean # this deletes objects and executables - never mind warnings

make check # check cedf files (create error.log)

make html # create HTML tables from cedf files (check destination in Makefile)
make list # create list of cedf files (create encoding.lis)

make date # for my personal use only ;)
make pack # for my personal use only ;)
make uni # for my personal use only :) (unierror.lis)

SunOS using gcc seems to require

make COPTS='-DFILENAME_MAX=200 -DNO_STRUPR -O6'

After that, please change your working directory to the "bin" directory.

Please set an environment variable TRANS that points to the directory where this package resides on your computer, *including* the trailing directory separator character

e. g.: TRANS="/usr/local/src/charsets/trans130/"

All Character Encoding Description Files reside in the cedf subdir.

If you don't set the variable TRANS, the default location "/usr/local/lib/trans/" will be assumed (see file "tab.h", DIR_TRANS).
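The reason for the trailing separator can be illustrated with a small sketch. This is not the code from gettrans.c; the function name build_cedf_path and the idea that file names are formed by plain string concatenation are assumptions made for illustration only. The default directory is the one named above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Default taken from the text above (DIR_TRANS in tab.h). */
#define DEFAULT_TRANS_DIR "/usr/local/lib/trans/"

/*
 * Hypothetical helper: build "<TRANS>cedf/<name>" in buf.
 * env_value is what getenv("TRANS") returned and may be NULL.
 * Returns 0 on success, -1 if the result would not fit into buf.
 */
int build_cedf_path(const char *env_value, const char *name,
                    char *buf, size_t buflen)
{
    /* Fall back to the default if TRANS is unset or empty. */
    const char *dir = (env_value != NULL && env_value[0] != '\0')
                          ? env_value : DEFAULT_TRANS_DIR;

    assert(name != NULL && buf != NULL);

    if (strlen(dir) + strlen("cedf/") + strlen(name) + 1 > buflen)
        return -1;

    /* Plain concatenation: dir must already end in a separator. */
    strcpy(buf, dir);
    strcat(buf, "cedf/");
    strcat(buf, name);
    return 0;
}
```

A caller would pass getenv("TRANS") as the first argument. With TRANS set to "/opt/trans130" (no trailing separator) such a concatenation would yield "/opt/trans130cedf/...", which is why the separator has to be part of the variable.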

To test the translator generator after you have run at least "make" and "make install", type

cd "$TRANS"bin
one

This should generate two translators between ISO 8859-1 and MS-DOS Codepage 850. Each translator consists of three files (e.g.):

isox850.c   the main program
isox850.h   the header file
isox850.tab   the translation table file

Each translator will #include the files

trans.c   the main invariant program
trans.h   the main invariant header file

You should be able to compile and link isox850.c and 850xiso.c easily. Read transtab.man to learn more about the syntax for transtab.
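To give an idea of what an 8-bit translator does at its core, here is a minimal sketch of a table lookup. It is not the code found in trans.c or in a generated .tab file; the names translate_buffer and init_identity are made up for illustration, and a real translator also has to deal with files, flags and characters that have no counterpart in the target encoding.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: start from a table that maps every byte to itself. */
void init_identity(unsigned char table[256])
{
    int i;

    for (i = 0; i < 256; i++)
        table[i] = (unsigned char)i;
}

/* Hypothetical: translate len bytes in place through a 256-entry table. */
void translate_buffer(unsigned char *data, size_t len,
                      const unsigned char table[256])
{
    size_t i;

    assert(data != NULL && table != NULL);

    for (i = 0; i < len; i++)
        data[i] = table[data[i]];
}
```

As an example, a-umlaut is 0xE4 in ISO 8859-1 and 0x84 in Codepage 850, so a table for that direction would contain table[0xE4] = 0x84, and the table for the opposite direction would contain table[0x84] = 0xE4.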

Have a look at the maketabs script to get some inspiration for program names.

This package is written in ANSI-C using the two non-ANSI functions strdup () and strupr (). Sources for these functions are supplied should your compiler/library not contain them. Should you encounter any problems while trying to compile this package, your compiler is very likely not ANSI-C compliant. Should your compiler be ANSI-C compliant and still report warnings and/or errors, please let me know. I'll need the following data in order to help you:

Directory tree for this package

The directory tree for this utility should look like this:

directory   file   description
./       contains the complete package
         
    index.htm   this file
    Makefile   sample makefile for U*IX using gcc
    encoding.lis   list of Character Encoding Description Files
    error.log   output created by checkall
    unierror.log   diffs between cedf and selected Unicode files
         
src/       contains the translation table generator source
         
    Makefile   makefile for gcc (e. g. Linux)
         
    comptran.c   compute translation table and output
    comptran.h   header file for comptran.c
    datatype.h   handy data types
    gettrans.c   get TRANS directory
    gettrans.h   header file for gettrans.c
    head_c.h   generic translator main program
    head_h.h   generic translator header file
    head_tab.h   generic translator table file header
    head_u.h   generic translator Unicode FormatA file header
    loadtab.c   read xlt binary table and Unicode FormatA
    loadtab.h   header file for loadtab.c
    os-stuff.h   OS/compiler dependent definitions
    readtab.c   read character encoding description file
    readtab.h   header file for readtab.c
    scanflag.c   parse program parameters and flags
    scanflag.h   header file for scanflag.c
         
    strdup.c   in case your compiler doesn't have it
    strdup.h   header file for strdup.c
    strupr.c   in case your compiler doesn't have it
    strupr.h   header file for strupr.c
         
    tab.h   table constants
    taberr.h   trans error codes and messages
         
    checkiso.c   checks character encoding description names
    checkiso.h   header file for above program
    checkiso.man   man page for above program
    checkuni.c   compares cedf file with Unicode Format A table
    checkuni.h   header file for above program
    checkuni.man   man page for above program - for internal use
    transiso.c   translator generator to ISO 10646 main program
    transiso.h   header file for above program
    transiso.man   man page for above program
    transtab.c   translator generator main program
    transtab.h   header file for above program
    transtab.man   man page for above program
    transce8.c   translator program (8-bit) main program
    transce8.h   header file for above program
    transce8.man   man page for above program
    transhtm.c   program that displays HTML tables
    transhtm.h   header file for above program
    transhtm.man   man page for above program
         
    checkall   check all tables
    chkuni   for internal use only
    mklist   create list of all tables
    mkhtml   create HTML table (mkxlt may be required before running this one)
    mkxlt   create XLT files (binary translation files)
         
bin/       contains the translator main program (invariant part) and a few scripts to create translators
         
    compile   compile one program
    makeall   compile all programs
    maketabs   create many translator sources
    one   create one translator
         
    trans.c   invariant main translator program
    trans.h   invariant main translator header file
         
    utf.c   convert from/to plain 16-bit Unicode/UTF
    utf.h   header for utf.c
         
    utimbuf.h   helps to keep file date stamps
         
htm/       contains information in HTML format about the description files and other more general information
         
cedf/       contains Character Encoding Description Files
         
    adobeiso   Adobe ISOLatin1Encoding Encoding Vector
    adobestd   Adobe StandardEncoding Encoding Vector
    adobesym   Adobe Symbol Encoding Vector
         
    applecro   Apple Macintosh Croatian
    applegk2   Apple ][ Greek extended for Macintosh
    applegrk   Apple Macintosh Greek
    appleice   Apple Macintosh Icelandic
    applerom   Apple Macintosh Roman
    applerum   Apple Macintosh Romanian
    appletur   Apple Macintosh Turkish
         
    atarist   Atari ST/TT
         
    cp1250   Microsoft Windows Codepage 1250 (EE)
    cp1251   Microsoft Windows Codepage 1251 (Cyrl)
    cp1252   Microsoft Windows Codepage 1252 (ANSI)
    cp1253   Microsoft Windows Codepage 1253 (Greek)
    cp1254   Microsoft Windows Codepage 1254 (Turk)
    cp1255   Microsoft Windows Codepage 1255 (Hebr)
    cp1256   Microsoft Windows Codepage 1256 (Arab)
    cp1257   Microsoft Windows Codepage 1257 (BaltRim)
    cp1258   Microsoft Windows Codepage 1258 (Viet)
         
    mslinedr   Microsoft Windows MS LineDraw
    symbol   Microsoft Windows Symbol Encoding Vector
    wingding   Microsoft Windows Wingdings Encoding Vector
         
    cp437   IBM Codepage 437 (US)
    cp737   IBM Codepage 737 (Greek de facto standard)
    cp775   IBM Codepage 775 (BaltRim)
    cp850   IBM Codepage 850 (Multilingual Latin 1)
    cp851   IBM Codepage 851 (Greece) - obsolete
    cp852   IBM Codepage 852 (Multilingual Latin 2)
    cp853   IBM Codepage 853 (Multilingual Latin 3)
    cp855   IBM Codepage 855 (Russia) - obsolete
    cp857   IBM Codepage 857 (Multilingual Latin 5)
    cp860   IBM Codepage 860 (Portugal)
    cp861   IBM Codepage 861 (Iceland)
    cp862   IBM Codepage 862 (Israel)
    cp863   IBM Codepage 863 (Canada (French))
    cp864   IBM Codepage 864 (Arabic)
    cp865   IBM Codepage 865 (Norway)
    cp866   IBM Codepage 866 (Russia)
    cp869   IBM Codepage 869 (Greece)
    cp874   IBM Codepage 874 (Thai)
    cp895   IBM Codepage 895 (Czech Kamenicky)
         
    decmcs   DEC Multinational Character Set (DEC MCS)
         
    ebc037   EBCDIC Codepage 037
    ebc500   EBCDIC Codepage 500
    ebc875   EBCDIC Codepage 875 (Greek)
    ebc1026   EBCDIC Codepage 1026 (Turkish)
    ebc1047   EBCDIC Codepage 1047
         
    hp48   HP 48 Character Set
    hproman8   HP Roman-8
         
    iso10646   ISO 10646 (sorted by name, 16-bit)
    iso10646.mes    
         
    iso6429   ISO 6429 Control Characters (00-1F, 7F)
    iso646   ISO 646 (common character base)
         
    iso646.ca   ISO 646 (French Canadian)
    iso646.ch   ISO 646 (Swiss)
    iso646.de   ISO 646 (German)
    iso646.es   ISO 646 (Spanish)
    iso646.fi   ISO 646 (Finnish)
    iso646.fr   ISO 646 (French)
    iso646.gb   ISO 646 (United Kingdom)
    iso646.irv   ISO 646 (International Reference Version)
    iso646.it   ISO 646 (Italian)
    iso646.nl   ISO 646 (Dutch)
    iso646.no   ISO 646 (Norwegian/Danish)
    iso646.pt   ISO 646 (Portuguese)
    iso646.se   ISO 646 (Swedish)
         
    iso8859.1   ISO 8859-1 (Latin 1)
    iso8859.2   ISO 8859-2 (Latin 2)
    iso8859.3   ISO 8859-3 (Latin 3)
    iso8859.4   ISO 8859-4 (Latin 4)
    iso8859.5   ISO 8859-5 (Latin/Cyrillic)
    iso8859.6   ISO 8859-6 (Latin/Arabic)
    iso8859.7   ISO 8859-7 (Latin/Greek)
    iso8859.8   ISO 8859-8 (Latin/Hebrew)
    iso8859.9   ISO 8859-9 (Latin 5)
    iso8859.10   ISO 8859-10 (Latin 6)
    iso8859.13   ISO 8859-13 (Latin 7 - Baltic Rim)
    iso8859.14   ISO 8859-14 (Latin 8 - Celtic)
    iso8859.15   ISO 8859-15 (Latin 9)
         
    koi8-r   Cyrillic encoding as defined in RFC-1489
         
    nextstep   NeXTSTEP Encoding Vector
         
    tex-dcr.in   TeX dcr input (contains non-ISO 10646 names)
    tex-dcr.out   TeX dcr output (contains non-ISO 10646 names)
         
xlt/       contains binary conversion tables (default is little endian)
        all files listed in cedf/ should be here, except for iso6429, iso646, iso10646, iso10646.mes

Should your CPU not be "little endian" (little endian CPUs include the Intel i386, i486, Pentium and many other brands), please do a "make bintab" to create the very same tables using your native byte order. This will most likely only work on U*IX (like) systems.
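Why byte order matters for binary tables can be shown with a short sketch. The layout of the xlt files is not described here, so the assumption that they hold 16-bit values is only a guess, and the function names are hypothetical. The point is merely that the same two bytes give different numbers depending on the order in which they are read.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 on a little endian host, 0 otherwise. */
int host_is_little_endian(void)
{
    unsigned short probe = 1;

    /* On a little endian host the low byte is stored first. */
    return *(unsigned char *)&probe == 1;
}

/* Decode two bytes as a 16-bit value stored low byte first. */
unsigned int read_u16_le(const unsigned char *p)
{
    assert(p != NULL);
    return (unsigned int)p[0] | ((unsigned int)p[1] << 8);
}

/* Decode two bytes as a 16-bit value stored high byte first. */
unsigned int read_u16_be(const unsigned char *p)
{
    assert(p != NULL);
    return ((unsigned int)p[0] << 8) | (unsigned int)p[1];
}
```

The bytes 0x34 0x12 decode to 0x1234 when read as little endian and to 0x3412 when read as big endian. A program that loads a table as raw 16-bit words sees the first value on a little endian CPU and the second on a big endian one, which is the situation "make bintab" avoids by writing the tables in the native byte order.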