The magazine of the Melbourne PC User Group

Linux: The Big Picture
Lars Wirzenius
 


Lars Wirzenius was at university with Linus Torvalds in 1991 when Linux was born, and he ran an early copy on his computer. In this article Lars shares some of his knowledge of the history and development of Linux during the past 12 years, a period that has seen a young man’s idea become a significant international trend...

The history of computer operating systems begins in the 1950s, with simple schemes for running batch programs efficiently by minimizing idle time between programs. A batch program is one that requires no interaction with a user. It reads all its input from a file (possibly a stack of punch cards) and sends all its output to another file (possibly to a printer). This is how all computers used to work.

Then in the early 1960s interactive use started to gain ground. Not only interactive use, but several people using the same computer at the same time from different terminals. These were known as time-sharing systems and compared to the batch systems they were quite a challenge to implement.

During the 1960s there were many attempts at building good time-sharing systems. Some of these were university research projects, others were commercial. One such project was Multics, which was innovative at the time. For example, it had a hierarchical file system, something we take for granted in modern operating systems. However, the Multics project did not progress very well. It took years longer than anticipated to complete and never got a significant share of the operating system market. One of the participants, Bell Laboratories, withdrew from the project.

The Bell Labs people who had been involved then created their own operating system and named it UNIX.

Initially UNIX was distributed free and gained much popularity in universities. Later, it was given an implementation of the TCP/IP protocol stack and was adopted as the operating system of choice on many early workstations.

By 1990, UNIX had a strong position in the server market and was especially strong in universities. Most universities had UNIX systems and computer science students were exposed to them. Many of them wanted to run UNIX on their own computers as well. Unfortunately by that time UNIX had become commercial and rather expensive. About the only cheap option was Minix, a limited UNIX-like system written for teaching purposes by Andrew Tanenbaum. There was also 386BSD, a precursor to NetBSD, FreeBSD and OpenBSD, but that wasn't mature yet and required hardware more formidable than many had available at home.

Into this scene came Linux, in October 1991. The author, Linus Torvalds, had used UNIX at the University of Helsinki, and wanted something similar on his PC at home. Since the commercial alternatives were far too expensive, he started out with Minix, but wanted something better and soon began to write his own operating system. After its first release it attracted the attention of several other hackers. While initially Linux was not really useful except as a toy, it soon gathered enough features to attract the interest of many people, even those generally uninterested in operating system development.

Linux itself is only the kernel of an operating system. The kernel is the part that makes all other programs run. It implements multitasking, manages hardware devices and generally enables applications to do their thing. All the programs that the user (or system administrator) actually interacts with are run on top of the kernel. Some of these are essential: for example, a command line interpreter (or shell), which is used both interactively and to run shell scripts, i.e. files corresponding to batch (.BAT) files.

Linus did not write those programs himself; he used existing free versions instead. This greatly reduced the amount of work required to get a working environment. In fact, he often changed the kernel to make it easier to get the existing programs to run on Linux, instead of the other way around.

Most of the critically important system software, including the C compiler, came from the Free Software Foundation's GNU project. Started in 1984, the GNU project aims to develop an entire UNIX-like operating system that is completely free. To credit them, many people like to refer to a Linux system as a GNU/Linux system. (GNU has its own kernel as well.)

During 1992 and 1993, the Linux kernel gathered all the features it needed to work as a replacement for UNIX workstations, including TCP/IP networking and a graphical windowing system (the X Window System). Linux also received plenty of industry attention, and several small companies were started to develop and distribute Linux. Dozens of user groups were founded and the Linux Journal magazine appeared in early 1994.

Version 1.0 of the Linux kernel was released in March, 1994. Since then the kernel has gone through many development cycles, each culminating in a stable version. Each development cycle has taken a year or three, and has involved redesigning and rewriting large parts of the kernel to deal with changes in hardware (for example, new ways to connect peripherals, such as USB) and to meet increased speed requirements as people apply Linux to larger and larger systems (or smaller and smaller ones: embedded Linux is becoming a hot topic).

From a marketing and political point of view, after the 1.0 release the next huge step occurred in 1998 when Netscape decided to release its Web browser as free software (the term "open source" was created for this). This was the occasion that first brought free software to the attention of the whole computing world. It has taken years of work since then, but free software (or "open source") has not only become generally accepted, it is often the preferred choice for many applications.

The Social Phenomenon

Apart from being a technological feat, Linux is also an interesting social phenomenon. Largely through Linux, the free software movement has gained attention and recognition. On the way there it got an informal marketing department and brand, "open source". It is baffling to many outsiders that something as successful as Linux could be developed by a bunch of unorganised people in their free time.

The major factor here is the availability of all the source code of the system, plus a copyright license that allows modifications to be made and distributed. When the system has many programmers among its users, if they find a problem they can fairly easily fix it. Additionally if they think a feature is missing they can add it themselves. For some reason that is a challenge many programmers like to take on, even when they're not paid for it: they have an itch (a need), so they scratch (write the code to fill the need).

It is necessary to have at least one committed developer who puts in a lot of effort. After a while, once there are enough programmer-users sending in small changes and improvements, you get a snowball effect. Lots of small changes add up to a rapid overall pace of development. This attracts more users, some of whom will be programmers. The wheel spins faster.

For operating system development specifically, the input from this large group of programmer-users results in two important types of improvements: bug fixes and device drivers. Operating system code often has bugs that occur only rarely and it can be difficult for the developers to reproduce them. When there are thousands or more users who are also programmers, the result is a very effective testing and debugging army.

Most of the code volume in Linux is device drivers. The core, which implements multitasking and multiuser functionality, is small in comparison. Most device drivers are independent from each other and interact only with the operating system core via well defined interfaces. Thus it is fairly easy to write a new device driver without having to understand the whole complexity of the operating system. This also allows the main developers to concentrate on the core functionality and leave the device drivers to the people who actually have the devices.

It would be awkward for the main developers just to store the thousands of different sound cards, Ethernet cards, IDE controllers, motherboards, digital cameras, printers, and so on that Linux supports. The Linux development model is distributed and spreads the work around quite effectively.
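
To make the idea of a well defined driver interface concrete, here is a toy sketch in Python. Real Linux drivers are written in C against the kernel's own interfaces; everything named here (BlockDriver, RamDiskDriver, Core) is hypothetical and exists only to illustrate how a core can use drivers it knows nothing else about.

```python
from abc import ABC, abstractmethod


class BlockDriver(ABC):
    """Hypothetical interface that the core expects every block driver to implement."""

    @abstractmethod
    def read_block(self, number):
        """Return the contents of block `number` as bytes."""

    @abstractmethod
    def write_block(self, number, data):
        """Store `data` (bytes) as block `number`."""


class RamDiskDriver(BlockDriver):
    """Toy driver that keeps its blocks in memory. It only has to satisfy
    the interface; it needs no knowledge of how the core works."""

    BLOCK_SIZE = 512

    def __init__(self, blocks):
        self._blocks = [bytes(self.BLOCK_SIZE) for _ in range(blocks)]

    def read_block(self, number):
        return self._blocks[number]

    def write_block(self, number, data):
        if len(data) != self.BLOCK_SIZE:
            raise ValueError("data must be exactly one block long")
        self._blocks[number] = data


class Core:
    """Stand-in for the operating system core: it only ever calls the
    BlockDriver methods, never anything device specific."""

    def __init__(self):
        self._drivers = {}  # maps a device name to its driver object

    def register(self, name, driver):
        self._drivers[name] = driver

    def copy_block(self, src, dst, number):
        data = self._drivers[src].read_block(number)
        self._drivers[dst].write_block(number, data)
```

Someone who owns a new kind of device would write only a new BlockDriver subclass and register it; the Core class stays untouched, which is the division of labour described above.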

The Linux model is not without problems. When a new device comes onto the market it can take a few months before a Linux programmer somewhere is interested enough to write a device driver. Also, some device manufacturers for their own reasons do not wish to release programming information for their devices. This can prevent a Linux device driver from being written at all. Fortunately with the growing global interest in Linux such companies are becoming fewer in number.

So What Is Linux?

Linux is a UNIX-like multitasking, multiuser 32 and 64 bit operating system for a variety of hardware platforms and licensed under an open source arrangement. This is a somewhat brief description and I'll spend the rest of this article expounding on it.

Being UNIX-like means emulating the UNIX operating system interfaces so that programs written for UNIX will work on Linux merely by recompiling the code. It follows that Linux uses mostly the same abstractions as the UNIX system. For example, the way processes are created and controlled is the same in UNIX and Linux.
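
As a small illustration of that shared process model, the sketch below uses Python's os module, which exposes the traditional UNIX calls fork, exec and wait. The helper name run_in_child is made up for this example; the same code should behave the same way on Linux and on other UNIX-like systems.

```python
import os
import sys


def run_in_child(argv):
    """Run the program named by argv[0] in a new process and return its
    exit status (or -1 if it did not exit normally). UNIX-like systems only."""
    pid = os.fork()  # duplicate the calling process
    if pid == 0:
        # Child: replace this process image with the requested program.
        try:
            os.execvp(argv[0], argv)
        finally:
            # Only reached if exec failed, e.g. the program does not exist.
            os._exit(127)
    # Parent: wait for the child to finish and decode its exit status.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


if __name__ == "__main__":
    code = run_in_child([sys.executable, "-c", "print('hello from the child')"])
    print("child exited with status", code)
```

The parent and the child are separate processes from the moment fork returns; the parent controls the child only through calls such as waitpid.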

There are a number of other operating systems in active use: from Microsoft's family of Windows versions, through Apple's MacOS to OpenVMS. Linus Torvalds chose UNIX as the model for Linux partly for its aesthetic appeal to system programmers and partly because of all the operating systems with which he was familiar, it was the one he knew best.

The UNIX heritage also gives Linux the two most important features: multitasking and multiuser capabilities. Linux, like UNIX, was designed from the start to run multiple processes independently of each other. Implementing multitasking well requires attention at every level of the operating system. It is hard to add multitasking to an operating system afterwards. That's why the Windows 95 series and MacOS (before MacOS X) did multitasking somewhat poorly: multitasking was added to an existing operating system, not designed into a new one. That's also why the Windows NT series, MacOS X, and Linux do multitasking so much better.

A good implementation of multitasking requires, among other things, proper memory management. The operating system must use memory protection support in the processor to protect (running) programs from interfering with each other. Otherwise a buggy program (that is, almost any program) can easily corrupt the memory area of another program, or of the operating system itself, causing anything from weird behaviour to a total system crash with likely loss of data and unsaved work.
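
The effect of this protection can be observed from an ordinary program. In the sketch below (the function name is made up for this example), a child process overwrites a variable and the parent then checks its own copy. It demonstrates only the visible result, separate address spaces per process, not the processor mechanism underneath.

```python
import os

counter = [0]  # lives in this process's private address space


def child_cannot_touch_parent_memory():
    """Fork a child that overwrites `counter`, then report what the parent
    still sees. Returns (parent_value, child_exit_status)."""
    pid = os.fork()
    if pid == 0:
        counter[0] = 999  # changes the child's own copy only
        os._exit(0 if counter[0] == 999 else 1)
    _, status = os.waitpid(pid, 0)
    return counter[0], os.WEXITSTATUS(status)


if __name__ == "__main__":
    print(child_cannot_touch_parent_memory())
```

On a UNIX-like system the function returns (0, 0): the child saw its own change and exited cleanly, while the parent's value is untouched.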

Supporting many concurrent users is easy after multitasking works. You label each instance of a running program with a particular user and prevent the program from tampering with other users' files.
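
Both labels are visible from user space. The following sketch, with a made-up helper name, reads the user ID attached to the running process and the owner and permission bits recorded for a file; the kernel compares these when deciding whether an access is allowed.

```python
import os
import stat
import tempfile


def describe_access(path):
    """Return (process_uid, file_owner_uid, owner_only) for `path`."""
    info = os.stat(path)
    # True when neither the group nor other users have any permission bits set.
    owner_only = (info.st_mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
    return os.getuid(), info.st_uid, owner_only


if __name__ == "__main__":
    # mkstemp creates a file readable and writable only by the creating user.
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        print(describe_access(path))
    finally:
        os.remove(path)
```

For a file created this way, the process and the file carry the same user ID and nobody else is granted access; a process running as a different ordinary user would be refused.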

Portable And Scalable

Linux was originally written for the Intel 386 processor, and naturally works on all its successors. After about three years of development, work began to adapt (or port) Linux to other processor families as well. The first one was the Alpha processor, then developed and sold by the Digital Equipment Corporation. The Alpha was chosen because Digital graciously donated a system to Linus. Soon after that, other porting efforts followed. Today, Linux also runs on Sun SPARC and UltraSPARC, Motorola 68000, PowerPC, PowerPC64, ARM, Hitachi SuperH, IBM S/390, MIPS, HP PA-RISC, Intel IA-64, DEC VAX, AMD x86-64 and CRIS processors. See http://kernel.org for details.

Most of those processors are not very common on people's desks. For example, S/390 is IBM's big mainframe architecture. Here mainframe means the type of computer inside which you could put your desk, rather than the type that fits on your desk.

Some of those processors are 32 bit, like the Intel 386. Others are 64 bit, such as the Alpha. Supporting such different processors has been good for Linux. It has required designing the system to use proper modularity and good abstractions and this has improved code quality.
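
Portable programs should not assume either width. As a minimal sketch, the snippet below asks the running Python interpreter how wide a native pointer is; note that this reports the width the interpreter was built for, which on a 64 bit machine running 32 bit software would be 32.

```python
import platform
import struct


def pointer_bits():
    """Width of a native pointer, in bits, for the running Python interpreter."""
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print(platform.machine(), "with", pointer_bits(), "bit pointers")
```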

The large range of supported processors also shows off Linux's scalability: it works on everything from very small systems, such as embedded computers, handheld devices, and mobile phones, to very large systems such as the IBM mainframes.

Using clustering technology, such as Beowulf http://www.beowulf.org, Linux even runs on supercomputers. For example, the US Lawrence Livermore National Laboratory bought a cluster with 1920 processors, resulting in one of the five fastest supercomputers in the world, with a theoretical peak performance of 9.2 teraFLOPS, or 9.2 trillion calculations per second (see http://lwn.net/Articles/4759/).
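
As a quick check on what those two figures imply, the short calculation below divides the quoted peak by the quoted processor count. It uses only the numbers given above and says nothing about sustained, real-world performance.

```python
PROCESSORS = 1920   # processor count quoted above
PEAK_FLOPS = 9.2e12  # 9.2 teraFLOPS, theoretical peak quoted above


def peak_per_processor(total_flops, processors):
    """Theoretical peak per processor, in floating point operations per second."""
    return total_flops / processors


if __name__ == "__main__":
    per_cpu = peak_per_processor(PEAK_FLOPS, PROCESSORS)
    print(f"{per_cpu / 1e9:.2f} GFLOPS per processor")
```

That works out to a little under 5 billion calculations per second for each processor.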

Using Linux

The operating system itself is pretty boring to most people. Applications are necessary to get things done. Traditionally, Linux applications were the same types of applications used with UNIX: scientific software, databases, and network services. Also of course, all the tools programmers want for their craft.

Much of that software seems rather old-fashioned by today's desktop standards. User interfaces are text based, or they might not exist at all. Indeed most software has usually been non-interactive and has been of the command line, batch processing variety. Since most users have been experts in the application domain, this has been good enough.

Thus, Linux first found corporate employment as a file server, mail server, Web server, or firewall. It was a good platform for running a database, with support from all major commercial database manufacturers.

In the past few years Linux has also become an interesting option with user-friendly desktops. The KDE http://www.kde.org and Gnome http://www.gnome.org projects both develop desktop environments and applications that are easy to learn and effective to use. There are now many desktop applications that people with Windows or MacOS experience will have no difficulty using.

There is even a professional-grade office software package. OpenOffice http://www.openoffice.org, based on Sun's StarOffice, is free, fully featured, and file-compatible with Microsoft Office. It includes a word processor, spreadsheet, and presentation program, competing with Microsoft's Word, Excel, and PowerPoint.

Linux Distributions

Before installing Linux you must choose a Linux distribution. A distribution is the Linux kernel plus an installation program plus a set of applications to run on top of it. There are hundreds of Linux distributions, all serving different needs.

All distributions use pretty much the same actual software, but they differ in which software they include, which versions they pick (a stable version known to work well, or the latest version with all the bells, whistles and bugs), how the software is pre-configured, and how the system is installed and managed. For example, OpenOffice, Mozilla (Web browser), KDE and Gnome (desktop environments), and Apache (Web server) will all work on all distributions.

Some distributions aim to be general purpose, but most of them are task specific: they are meant for running a firewall, a Web kiosk, or meant for users within a particular university or country. Those looking for their first Linux experience can choose from the three biggest, general purpose distributions - Red Hat, SuSE, and Debian.

The Red Hat and SuSE distributions are produced by companies of the same names. They aim to provide an easy installation procedure and a pleasant desktop experience. They are also good as servers. Both are sold in boxes with an installation CD and printed manual. Both can also be downloaded via the Internet.

The Debian distribution is produced by a volunteer organization. Its installation is less easy: you have to answer some questions during the installation, questions about things the other distributions deduce automatically. Nothing complicated as such, but it requires an understanding of, and information about, hardware that many PC users usually don't want to be concerned with. On the other hand, after installation Debian can be upgraded to each new release without reinstalling anything.

The easiest way to try out Linux is to use a distribution that works completely off a CD-ROM. This way, you don't have to install anything. You merely download the CD-ROM image (.iso file) from the Net and burn it to a disc, or buy a mass-produced one via the Net. Insert the disc in the drive, then reboot. Not having to install anything on the hard disk means you can easily switch between Linux and Windows. Also, since all the Linux files are on a read-only CD-ROM, you can't accidentally break anything while you're learning.
 

About the Author
Lars Wirzenius http://liw.iki.fi/liw/ designs and implements embedded telematic software for Oliotalo http://www.oliotalo.fi at work, and develops Debian at home.


Reprinted from the May 2003 issue of PC Update, the magazine of Melbourne PC User Group, Australia
