The easiest way to write portable software is to do it in a language that is itself portable. That way, the language designers & implementors do all the portability work for you.
C & C++ aren't such languages. Languages where the implementors have done the portability work for you include Java, Perl, Python, Tcl & Tk, Lisp, vanilla SQL without stored procedures, & probably some others.
Having said that C & C++ aren't portable, I'm now going to take back that assertion & replace it with another, more precise claim: Used properly, C & C++, as languages, can be portable, but in practice, you run into many portability problems due to non-portable libraries & header files. (Mostly, it's the header files.)
C & C++ aren't bad for writing portable software, & since you are reading the documentation for Phil, a C/C++ portability library, I presume you are writing software in C/C++, so there's little point in suggesting other languages. I made my non-portability claim about C & C++ mostly to be complete, & partly to ruffle feathers which were so ready & waiting to be ruffled.
Not all compilers understand function prototypes.
Function prototypes, when combined with pointers to functions, cause many amazing, annoying, & non-portable things to happen. Compilers differ in how much detail they allow in prototypes of functions to which pointers point. fixme: add detail, examples
Programmers are often concerned that the built-in data types they use are portable. It's a valid concern. The obvious solution is to create types like int16, int32, uint32, & others, but that's not the cure-all that it appears. In fact, I say that in most cases it has the opposite effect; it makes your code less portable.
What does ``int16'' mean? That an integer of this type is 16 bits wide? Do you care that an integer is 16 bits wide? Probably not. You care about the range of the integer: that it's at least the range of a 16-bit, two's complement, signed integer; that range is -32,768 to 32,767 inclusive. This range is covered by ``16-bit'' integers only if your system implements them as two's complement & signed, but you don't know how your system implements integers. We're talking about writing portable software, so you want to assume as little as possible; you don't know that integers are implemented with two's complement. You don't know how many bits are in them, either. Hell, you don't even know that bytes are 8 bits.
The same goes for a data type like ``uint32''. You don't care that it's 32 bits wide. What you really care about is that its range is at least 0 through 4,294,967,295 inclusive.
So type names like ``int16'' and ``uint32'' are misnomers because you don't care about their sizes; you care about their ranges.
There are other problems with type names like ``int16''.
Does ``int32'' guarantee exactly the range of a 32-bit, two's complement, signed integer? Exactly the range? If so, then what happens when you port your software to a system on which the native, natural integral type is 64 bits? You have to write extra code to limit the range of natural ints on that system to the range of an int32. Extra code.
To avoid writing that extra code, maybe ``int32'' means an integer with a range at least that of a 32-bit, two's complement, signed integer, but then when you port your software to a 64-bit architecture, will you have problems with sign extension? Will you have problems with wrapping from lowest value to highest, or vice versa? Will you have problems when you assign your ``int32'' (which is now 64 bits) to an int or a short? Will you be able to assign a long to an ``int32''? These conversion issues will matter if you try to mix built-in types with your ``portable'', width-specific types, which you will do if you use functions from a library that uses built-in types. What hypothetical library? How about the Standard C library's labs? How about scanf? What format specifier will you use to get scanf to read a value into your ``int32''? Is it an int, which requires a ``%d'' specifier, or is it a long, which requires a ``%ld'' specifier? Wait, we might be on a 64-bit system, in which case an ``int32'' might be a short (``%hd''). If you try to read the value into a short, int, or long & then assign that value to your ``int32'', you're assuming that the fundamental type you chose can store the value you read, & that undermines the whole point of using the bit-specific types in the first place.
How will your ``int16'' & related types interact with another library's ``portable'' types such as ``Int'' and ``Short''? Or what if another library has its own ``int16'' & related types? And what if it decided that ``int16'' means exactly 16 bits, whereas your ``int16'' means at least 16 bits? I'm thinking of a word. I can't see it clearly. It starts with a C, ends with an S, & the letters in between are H-A-O.
So what to do about this? How do we achieve portable built-in data types, at least with the integral ones?
First, let's take a look at the C language. In fact, let's look at an old version of the C language, since older compilers are more likely to implement it than they are the ANSI/ISO Standard C.
K&R, section 2.2 Data Types and Sizes, page 36, tells us that shorts & ints are at least 16 bits wide, that longs are at least 32 bits wide, & that short is no longer than int, which is no longer than long.
So in the worst case, ints have a 16-bit, two's complement range, & we need to use longs when we want a 32-bit, two's complement range.
Notice that most uses for integers, like loops & such, aren't going to use all the range of even a short, not nearly. So in these cases, why not use an int? It has the range you need, & it's portable.
In these cases, where even a worst-case, minimum-range (16 bits of two's complement power) int is sure to be adequate, there's clearly no need for a bit-specific type. Look at it this way: If I can write my code so that it relies on int (or short or another built-in type) instead of my bit-specific types, then I've definitely written some really portable code; my code depends only on types that are built into all C/C++ compilers.
In cases where a worst-case int isn't guaranteed to do, consider using a long. Its worst case is 32 bits of two's complement computing power. Again, if I can get away with writing code that depends on long, which is built into all C compilers, then I've written some damn portable code.
If I really, truly, definitely, inescapably need to write code that requires my bit-specific types, then I'm in the unfortunate situation of writing code that isn't as portable as code that uses just int and long. It may be necessary, but it ain't portable.
Sometimes people use bit-specific types such as ``uint16'' and ``uint32'' for bit masks. This is non-portable in the same way that using those types to guarantee ranges is non-portable, but it's also a bad idea because those types aren't being used as integers; they're being used as bit vectors (or bit masks, whichever you prefer).
A better, more readily understandable, way to do this is to create bit vector types. In C, instead of ``uint16'' and ``uint32'', you might create BitVector16 & BitVector32 types. In C++ with the STL, you have the portable & flexible luxury of using std::vector<bool>.
Don't write data to files or sockets in binary, native form unless you're sure they're going to be read on the same system. Not just the same type of system; make sure it's the same system.
If they might be read on another system, specify the format of your data independent of your system. Text files are great, but your circumstances might dictate binary. If you have to write them in a binary form, then specify the width & byte order for all types.
You read that right, I said ``width & byte order''. Width & byte order is appropriate for external data representation, whereas I maintain that it's inappropriate for guaranteeing ranges.
``Text'' does not mean ``ASCII''.
There are character encoding standards other than ASCII, & in the future, they will become more common than ASCII. So for your software to be portable, you should not assume ASCII.
It's easy to make your programs reasonably independent of ASCII or any other character encoding standard which uses one octet per character. The main thing you have to do is avoid hard-coding values for characters. Instead, use character constants in your code, like 'a'. It's also good to avoid assumptions about character ordering; I'd say it's reasonable to assume that strcmp & other string comparison functions do the right thing when it comes to character ordering. Don't assume that an end-of-line is a single character outside of your program, though when a Standard C function reads it into your C/C++ program, it will translate it to a single newline ('\n') character.
Multi-byte & wide character encoding standards are another matter. From lack of experience, I don't have good solutions for this. It looks to me like the Standard C multi-byte character & string functions are inadequate, & the C++ STL wstring class looks like a nice try but not quite what's needed. I'd say the world needs a language-independent, encoding-independent character & string library.
Some functions are common enough to tempt us into treating them as portable, but in the end, they turn out to be non-portable. Sometimes, the non-portability is subtle & shows its ugly head under special circumstances only, which makes things worse because by the time that happens, you've coded the function into your application all over the place.
So here we have a list of specific C functions which taunt us to treat them as portable but which are not.
The basename function is not part of Standard C, but it is common enough in Unix & Unix-like C libraries that I find myself thinking it's portable. In fact, it's not.
Different implementations treat certain special cases differently. For example, in response to basename ("."), some systems return ".", while others return the empty string.
The most important non-portable special-case behaviour I've seen to date is a core dump on Sun Sparc Solaris 2.8 in response to calling basename with any argument ending in "/". Yup, that's a crash in response to a valid input, or at least to input that I believe should be valid. Phil's test program test0013 demonstrates this behaviour if you un-comment the {"/", "/"}, {"ends-with-slash/", "ends-with-slash"}, lines in the S_special array.
So what's the big picture about basename? Can you consider it portable, use it freely in your code, or not?
Phil includes its own implementation of basename & links it into the Phil library if the host system does not have a basename. So you'd think it'd be a portable function, but I recommend avoiding it. That's because of the different run-time behaviours (including at least one crash). Instead, write your own version of basename & give it a new name so it won't conflict with the host system's.
If I were going to write a full-tilt replacement for basename, I'd make it more portable than basename. After all, basename pretty much assumes you're on a Unix file system, with slash (/) characters separating directory components & with different devices represented by their own pseudo-files in a single file system. That's a nice way to organize a file system, methinks, but not all systems do that.
A replacement for basename that wanted to be really portable might provide some kind of filename object that allowed for host names, device or drive names, path names, file names, file name extensions, & versions. It would provide services to
Should it provide a service to save & load the portable, internal path name objects? That might not be necessary because the path names are not necessarily portable. In this case, I mean ``portable'' in the least-technical sense. For example, on my computer, I have a file called /home/gene/library/budget/mpg/003. Do you have such a file on your computer? If not, how could a path name object that was portable with respect to operating systems be meaningfully or usefully portable when it comes down to finding a file?
Wrapper libraries are a good way to bundle already-implemented functionality into a portable interface. Even if your implementation of your wrapper library needs conditional compilation to select an implementation, possibly from various system-dependent libraries under it, at least the programs that use your wrapper library can be written without conditional compilation. If you do it right, those programs won't need to be aware of any kind of system-specific details around your library.
You have to ask yourself ``Is gdbm a non-portable implementation that's been ported to many platforms, or is it a portable interface that hasn't yet been ported to all platforms?''
If gdbm is a non-portable implementation that happens to be available on many platforms, you'll want to create your own wrapper library around gdbm, ndbm, dbm, or whatever other indexing system is available on the host system.
If gdbm is a portable interface, then write your programs so that they assume gdbm is available. Don't put any conditional compilation around your gdbm-related code. Code directly for gdbm & specify that having gdbm installed on the system is a prerequisite to installing your program.
So is gdbm a widely available, non-portable implementation, or is it a portable interface? I don't know the answer to that, but I hope it's a portable interface. People sure don't treat it that way, though. Many applications, including sendmail, specifically require Berkeley's dbm. I really wish people would pick the interface from one of those dbm-like libraries & declare it the standard, portable interface. Then all the implementations, whether Gnu's, Berkeley's, or yours, would be implementations of the same interface, & all programs would be written without worrying about which one of those indexing libraries was installed.
The tips for portable C apply to C++ except for function prototypes.
Function prototypes in C++ pose an inherent portability obstacle. What it comes down to is this: In C++, all functions must be prototyped.
For functions that you create, that's fine because you provide the definition & the prototype, but what about system calls & functions in the Standard C library? On older operating systems, the header files often made use of the C compiler's assumption that any function not declared & not prototyped returned an int. Now in C++, you need to prototype those functions (not just declare them). If the operating system's headers declare or prototype them, your prototype could disagree with the operating system's, which will give you a compiler or linker error, & you'll be sunk.
The Java language is (or can be, if used properly) portable, but how to run the Java compiler & the Java VM differs from installation to installation.
Phil will figure out how to run the Java compiler & the VM. It'll stuff the complete pathnames of those programs into some make macros.
Use Bourne Shell (/bin/sh), not ksh, not bash, not csh, for shell programs.
``You're crazy! My favorite shell is ksh. It has a lot of features Bourne doesn't have, & I'm not going back!''
I don't suggest using Bourne as an interactive shell. I don't use Bourne for my interactive shell, & I wouldn't want to, either. Bourne is missing way too many convenience features to make it good for a modern interactive shell, but you don't need those features in a shell program. I'm suggesting that you use Bourne for shell programs.
If you write your shell programs in Bourne, they'll be portable all the way back to version 7 or so. Bourne lacks only a few features that modern shells have, & most of those features exist to reduce typing. You can obtain their functionality in Bourne shell with some extra code. It's a small (insignificant!) price to pay to make your shell programs run on virtually any Unix-like system.
bash & ksh will run Bourne shell programs, so if your Unix system has bash or ksh, don't worry. Just write your shell programs in Bourne, & they'll run just fine with your bash or ksh. (The opposite is not true.)
To write programs in Bourne, do these things.
The first line of your Bourne program should be ``#! /bin/sh''. Notice the space character between the ! and the /bin/sh. That's suggested by the Gnu ``standards'' in info.
Use the test program for conditionals. Don't use ``[[ ... ]]''. That's not portable. You're not losing any functionality; ``[[ expr ]]'' translates to ``test expr'', anyway, & ``test'' is easy to type.
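A short sketch of the same conditionals written portably with test, using only Bourne features:

```shell
#! /bin/sh
# Portable conditionals: the test program, not the [[ ]] built-in.
count=3
if test "$count" -gt 2; then
    echo "count is big"
fi
# File tests work the same way:
if test -d /tmp; then
    echo "/tmp is a directory"
fi
```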
Don't use the home-of (tilde, ~) character. Bourne shell doesn't have it. Instead, use the homeof program that comes with Phil. Embed a homeof inside the single-back-quote operator, like this: ``homeof joe``.
Note: Verify that Bourne shell supports the single-back-quote operator. In general, determine when the single-back-quote operator entered shell (of any kind), & record that here. Also log the source of the information.
Don't use functions; old releases of Bourne shell didn't support them. Instead, place the code for what would be your shell function in a file by itself to make it a program. That'll promote code re-use, too.
Note: Verify that Bourne shell didn't support functions. Learn & document when functions entered shell.
Awesome as they are, the pattern rules that Gnu make allows aren't portable. Also, you can't assume that any but the most common implicit rules have been defined (such as .c.o).
So for maximum portability, you have to use explicit rules. Yup, it makes for potentially huge makefiles, but the rules are straightforward, & if you use the upcoming CyberTiggyrmaker program, which creates & maintains the makefile for your project, it's not much of an inconvenience.
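A sketch of what explicit rules look like for a tiny, hypothetical project with two sources (all file names are made up):

```make
# Explicit rules only: no pattern rules, & no reliance on implicit
# rules beyond the classic .c.o suffix rule.
prog: main.o util.o
	cc -o prog main.o util.o

main.o: main.c util.h
	cc -c main.c

util.o: util.c util.h
	cc -c util.c
```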
Use LATEX, *.info, *.html, or some other open standard, text-based, old-&-stable, system-independent format for your documentation, for christ's sake. Don't use Microsoft Word, Wordpad or any other proprietary, binary, system-dependent format that could change at the whim of a manufacturer.
You'll probably find that any of the open standard documentation systems, in which documents are stored as text files (though they don't have to display as only text) that can be edited with your favorite text editor (with which you are intimately acquainted because you are a programmer), is easier to use than any word processor.