The magazine of the Melbourne PC User Group

A Comparison of Two Programming Languages
Ross Hall

If you are at all interested in learning computer programming then now is a great time. The Internet offers the budding programmer vast resources from free language compilers to sample code, home-spun tutorials and of course the ubiquitous newsgroups where one can usually find an answer to most questions. Also magazine cover CDs have helped distribute numerous programming tools and compilers including some of the Borland products. More about that later. 

This article is intended neither as a tutorial nor as an in-depth analysis of a couple of popular programming languages, but rather as an overview of how a simple example problem can be addressed by different languages. I'm going to deal with two compiled languages, C and Java, but before we do that, if you're unfamiliar with programming, have a brief look at some of the introductory terminology in Figure 1. 

What is a compiled language? Well, the program source code always begins life as a plain text file. The compiler program reads the source code and, if everything in the program is correct, it outputs an object file. In this simple example the compiler (with the help of its linker) goes on to produce an executable file. If there is a syntax error in your source code, the compiler will print an error message giving the line number where the error is located and the error type. Once the program compiles with no syntax errors, the errors left to worry about are logic errors. 

Logic Errors 

You might write some beautifully constructed program code, with perfect syntax and no errors whatsoever, but the program does not do what you wanted it to. This falls under the heading of logic errors. Two aspects of every computer program must always be correct: the logic and the syntax. They rank in that order of priority. When designing a program, you plan it to perform specific tasks and make sure the logic is correct; then, as you write the program code, you make sure the syntax is right. An interesting aspect of this is that while the syntax must always be absolutely correct, the logic can vary. In other words, there can be several, sometimes many, different ways of solving a given problem. 
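As a small illustration (a hypothetical example, not taken from the programs in this article), both C functions below compile without complaint, yet the first contains a classic logic error: dividing one int by another throws away the fraction before the result is turned into a double. The function names are made up for this sketch.

```c
/* Hypothetical example: both functions are syntactically correct C. */

/* Logic error: sum / count is whole-number (integer) division,
   so 7 / 2 gives 3, and only then is 3 converted to 3.0. */
double average_wrong(int sum, int count)
{
    return sum / count;
}

/* Correct logic: converting sum to double first makes the
   division floating point, so 7 / 2 gives 3.5. */
double average_right(int sum, int count)
{
    return (double)sum / count;
}
```

Only testing with known data, here 7 divided by 2, shows which of the two is wrong; the compiler cannot tell.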

Some special terminology we will encounter in the first few paragraphs.
  • Source
  • Compiler
  • Object code
  • Syntax
Source - is the program code, written in a language you and I can learn, read and understand. See Figures 2 and 3.

Compiler - A compiler is a program that reads the source code and converts it into object code. 

Object code - a series of binary instructions, mostly obscure to human readers, that tell the computer to perform certain operations. Often referred to as machine code.

Syntax - is the grammar required to be used when writing in a particular programming language. Similar to sentence construction and spelling when writing in English, or Dutch, or French etc. If your spelling in a letter is incorrect, humans may not understand what you have written or they might be misled, but they might also be able to work it out - human intelligence enables us to do this. If the syntax in your computer program is wrong, the compiler definitely will not understand your source code and will not be able to work it out either. The syntax must be absolutely correct.

Figure 1. Some of the introductory terminology

The Task 

We wish to read a text file containing a collection of numbers and then output some simple statistics. The text file is of the form:

45.7 
23.7 
23.6 
.... etc 

Each number is on a separate line. We will use standard input and standard output for input and output. What this cryptic statement means is that the program does not open any files itself; instead we use the command line redirection operators to do all the file reading and writing. For example, if our program was named STATS.EXE and our number file was named NUMBERS.TXT, we would type at the DOS command line:

STATS < NUMBERS.TXT 

This feeds the contents of the numbers file into our program, and the output is displayed on the screen. The statistics this program will output are:

how many numbers there are,
their average,
their sum.

Now I know you can easily do this in Microsoft Excel, but an Excel worksheet holds at most 65,536 rows, whereas this program will run just as well with 100 lines of data or 100 thousand lines. 

The C Language 

The C programming language evolved from two previous languages, B and BCPL. BCPL was developed in 1967 as a language for writing operating systems software and compilers. B was used to create early versions of the UNIX operating system in 1970. C was developed by Dennis Ritchie at Bell Laboratories in 1972, and initially it was known as the development language for the UNIX operating system. Today most major operating systems are written largely in C or its object oriented successor, C++. 

Because of C's widespread acceptance, C programs are relatively easy to port between operating systems and different hardware. As an operating system programming language, it has access to the hardware at the device level and is extremely flexible in its methods of viewing and manipulating data. This flexibility does have its drawbacks, in that obscure bugs can appear in the code. C++ has gone part way to alleviating this problem, but for safety-critical work, for example nuclear power stations and military applications, C is still regarded as not having the required robustness. 

The first rendition of our sample program is written in the C language.


Figure 2. The program written in C.

You will recall that we discussed above how the program is read and processed by a compiler. There are many different compilers, but apart from usually minor differences they all perform essentially the same task. A compiler works through a source code file from top to bottom, one line at a time. 

This is what each line of our sample C program does: 

The first lines between /* and the */ are comments. In the C language, anything between these character pairs is ignored by the compiler. The comments can be on one line or a continuous block of multiple lines, and are used for general record keeping and descriptions of the operation of the programming code.

#include <stdio.h> 

This instructs the compiler to include the text from the file STDIO.H in the program. In effect, the compiler sees the included text as part of the program. The file STDIO.H contains the declarations for the standard input and output functions, such as printf(), scanf() and feof(), along with the definitions they rely on, such as stdin and stdout. We include this header file whenever we do any file reading or writing, and that includes writing to the screen. 

main() 

Every C program includes main(). The parentheses after main indicate that it is a function; if the program takes any command line parameters, they are passed to main inside the parentheses. In this case there are no command line parameters, so the parentheses are empty. Every C program begins executing at main().

int number_of_lines;

This tells the compiler that we want to reserve a location in memory for an integer, or whole number, of size "int". On today's PCs an int is 4 bytes long (32 bits) and can store any whole number from -2,147,483,648 to 2,147,483,647. This variable, which I've named number_of_lines, will hold the number of lines that our program reads from the input file.
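If you want to check these figures on your own compiler, the standard header limits.h defines them as INT_MIN and INT_MAX. The small sketch below (the function name show_int_range is hypothetical) prints them. The values quoted above assume a compiler with a 32-bit int, which is usual on PCs but is not guaranteed by the C standard.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: print the size and range of int for the
   compiler in use. The cast and %lu keep it working on pre-C99 compilers. */
void show_int_range(void)
{
    printf("int is %lu bytes, range %d to %d\n",
           (unsigned long)sizeof(int), INT_MIN, INT_MAX);
}
```

With a 32-bit int this prints: int is 4 bytes, range -2147483648 to 2147483647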

double this_number;
double sum;
double average;

These three statements reserve locations in memory for three double-precision numbers. A double-precision number is a floating point number that is accurate to about 15 significant digits, for numbers ranging up to roughly 18 followed by 307 zeros. A double takes up 8 bytes of memory and is used for floating point quantities such as an average or an interest rate. As the numbers we are reading in have decimal points, we will store each one as it is read in "this_number" and add it to our running total "sum". Finally, the result of the average calculation will be stored in the variable named "average".
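The matching header for floating point types is float.h, where DBL_DIG gives the number of decimal digits a double is guaranteed to hold and DBL_MAX gives its largest value. A minimal sketch (the function name show_double_range is hypothetical):

```c
#include <float.h>
#include <stdio.h>

/* Hypothetical helper: print the size, guaranteed decimal precision
   and largest value of double for the compiler in use. */
void show_double_range(void)
{
    printf("double is %lu bytes, %d significant digits, max %g\n",
           (unsigned long)sizeof(double), DBL_DIG, DBL_MAX);
}
```

On a PC using standard IEEE 754 doubles this prints: double is 8 bytes, 15 significant digits, max 1.79769e+308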

number_of_lines = 0;
sum = 0;

Now that the variables have been declared, the program can actually start to do something. The first thing we do is set the line counter variable "number_of_lines" to zero and set sum to zero. If this was not done, the memory locations that these variables refer to could contain any random collection of bits. C does not prepare the memory used by a function's variables in any way: a variable that has not been given a value simply holds whatever bytes were last left in that location. Not initialising variables is one of the major causes of errors.

while (!feof(stdin)) 

The "while()" statement repeats a group of statements for as long as the expression inside the parentheses is true. The expression in this case is a bit complex, but it could be something like "while counter is greater than one", written as
while (counter > 1),
or "while hour is less than 12", written as while (hour < 12), and so on.
A while loop might repeat many thousands, perhaps millions, of times during the running of a program. 
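A tiny worked example of the first of those conditions, as a sketch (the function name is hypothetical): the loop keeps running while counter > 1, so starting from 5 its body runs four times, with counter holding 5, 4, 3 and 2 on the way in.

```c
/* Hypothetical helper: count how many times the body of
   while (counter > 1) runs when counter starts at the given
   value and drops by one on each pass. */
int passes_while_greater_than_one(int counter)
{
    int passes = 0;          /* initialise before counting */

    while (counter > 1) {    /* the test happens before every pass */
        counter--;           /* move towards the exit condition */
        passes++;
    }
    return passes;           /* 5 gives 4 passes; 1 or less gives 0 */
}
```

Because the condition is tested before each pass, a loop whose condition is false at the start never runs its body at all.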

If the expression is true, the statements between the following curly braces {} are executed; otherwise they are skipped and the loop ends. In this case our expression uses feof(stdin). This function is part of the library that comes with the C compiler to assist with file handling. It checks the input coming into the program (from the file redirection, stdin) and returns true once a read has run into the end of the file, meaning there is no more input. Note that feof() does not look ahead; it only reports the end of the file after a read has actually tried to go past it. Now, we want our loop to run only while there remains input to read, so we must flip the result of feof(stdin) to its opposite. We place the not symbol, an exclamation mark !, in front of it. So the loop body will execute only if the end of the file has not yet been encountered.
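To see that feof() only reports the end of the file after a read has run into it, here is a small sketch. The function name eof_after_read is hypothetical, and it reads from any FILE *, not just stdin. After reading 45.7 from input that ends in a newline, feof() is still false, because the read stopped at the newline; if the same number is the very last thing in the file with no newline after it, feof() is already true.

```c
#include <stdio.h>

/* Hypothetical demo: read one number from 'in' and report the
   end-of-file indicator. Returns -1 if no number could be read,
   otherwise 1 if feof() is set straight after the successful read,
   or 0 if it is not. */
int eof_after_read(FILE *in)
{
    double value;

    if (fscanf(in, "%lf", &value) != 1)
        return -1;           /* nothing converted: end of file or bad data */
    return feof(in) != 0;    /* did reading the number run into end of file? */
}
```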

scanf("%lf", &this_number);

This statement reads the next number from the input, converts it to a floating point number and stores it in the variable "this_number". The reason for all this complexity for what seems like a simple operation is that the lines of the file do not contain numbers but collections of digit characters and decimal point characters. The scanf function converts this string of characters to the data type given by the "%lf" specifier. In this case we are telling it that the characters on each line should be converted to a long floating point number (that is, a double). If the characters on the line read "FRED", scanf would not be able to convert them to a number. It reports this through its return value, which is the number of items it converted successfully (0 in this case), and it leaves FRED unread in the input. Another chance for a logic error. You rightly assume that your data file is a long list of numbers, but if there are a few alphabetical characters in it, the program will give unexpected results, because it never checks whether scanf succeeded. In fact, the program as described would keep failing on the same word without ever reaching the end of the file, and so would loop forever. The & symbol in front of the this_number variable gives scanf the variable's address in memory, so that it can store the converted number there.
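The return value is how a C program can tell a good line from a bad one. In this sketch (the function name parse_number is hypothetical), sscanf, which works like scanf but reads from a string rather than from the input, converts "45.7" successfully and returns 1, but returns 0 for "FRED".

```c
#include <stdio.h>

/* Hypothetical helper: try to convert one line of text to a double.
   Returns 1 and stores the number in *out on success, 0 otherwise. */
int parse_number(const char *line, double *out)
{
    /* sscanf returns how many items it converted: 1 for "45.7", 0 for "FRED" */
    return sscanf(line, "%lf", out) == 1;
}
```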

if (feof(stdin)) break;

After the scanf statement, the variable this_number should hold the number from the current line of the file. If the scanf we have just executed reached the end of the file, feof(stdin) will be true and the break statement will be executed. This breaks us out of the while loop and transfers control to the first statement after the loop's closing brace. If we haven't reached the end of the file, we continue on in the while loop. One catch: if the last number in the file is not followed by a newline, the scanf that reads it also runs into the end of the file, so that final number is read but never counted.
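A common alternative, sketched below under the assumption that the input is passed in as a FILE * (stdin for redirected input), is to loop on scanf's return value instead of on feof(). The loop then stops at the end of the file or at the first item that is not a number, and the last number is counted whether or not the file ends with a newline. The function name read_numbers is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: read numbers from 'in' until end of file or a
   non-number, storing the count and sum through the pointers.
   Returns 1 if the whole input was read, 0 if reading stopped early
   on bad data (or an input error). */
int read_numbers(FILE *in, int *number_of_lines, double *sum)
{
    double this_number;

    *number_of_lines = 0;
    *sum = 0;
    while (fscanf(in, "%lf", &this_number) == 1) {   /* 1 = one number converted */
        (*number_of_lines)++;
        *sum += this_number;
    }
    return feof(in) != 0;    /* true only if we stopped at end of file */
}
```

In the article's program this would be called as read_numbers(stdin, &number_of_lines, &sum).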

number_of_lines++;

We haven't reached the end of the file so the ++ following the variable name tells the program to increment the "number_of_lines" variable by one.

printf("%f\n", this_number);

Print the number we have read from the file to the screen. The "%f" part of "%f\n" tells printf to format the number as a floating point value, and the \n tells it to write a newline after the number. 
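One detail worth knowing: with no precision given, %f always prints six digits after the decimal point, so 45.7 from the data file is echoed as 45.700000, while a precision such as %.1f gives 45.7. A quick way to check what a format produces is to format into a string with snprintf (a C99 function; older compilers may only offer a variant such as _snprintf). The helper name formats_as is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return 1 if formatting 'value' with the
   printf-style 'format' produces exactly the text 'expected'. */
int formats_as(double value, const char *format, const char *expected)
{
    char text[64];    /* plenty for a single formatted number */

    snprintf(text, sizeof text, format, value);   /* same rules as printf */
    return strcmp(text, expected) == 0;
}
```

For example, formats_as(45.7, "%f", "45.700000") and formats_as(45.7, "%.1f", "45.7") both return 1.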

sum += this_number;

This is a shortcut way of saying sum = sum + this_number: it increases sum by the amount in this_number. 

The while loop continues until all the data has been read.
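The shorthand operators are interchangeable with their longhand forms. This sketch (hypothetical function name) counts and totals an array of numbers the way the loop in Figure 2 counts and totals the numbers it reads, with the longhand equivalents given in the comments.

```c
/* Hypothetical helper: count and total 'n' numbers from an array,
   storing the count through number_of_lines and returning the sum. */
double count_and_total(const double *values, int n, int *number_of_lines)
{
    double sum = 0;          /* initialise before accumulating */
    int i = 0;

    *number_of_lines = 0;
    while (i < n) {
        (*number_of_lines)++;   /* same as *number_of_lines = *number_of_lines + 1 */
        sum += values[i];       /* same as sum = sum + values[i] */
        i++;
    }
    return sum;
}
```

The parentheses in (*number_of_lines)++ matter: without them, ++ would move the pointer instead of increasing the count it points to.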

average = sum/number_of_lines;

The first statement after the while loop calculates the average of all the data by dividing the sum of the data by the number of lines. A logic error could occur here if the file was empty. The number of lines would then be zero, and dividing by zero is an error: depending on the compiler and hardware, the program will either stop with a floating point error or quietly produce a meaningless "not a number" result. A well written program (good logic) would detect a file with no data and gracefully stop the program with an error message. 

For simplicity I've left out that bit.
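That missing check might look something like the sketch below. The function name and message text are hypothetical. It writes the error to stderr, the standard error stream, which still reaches the screen even when normal output is redirected to a file, and returns EXIT_FAILURE from stdlib.h, the portable name for a non-zero exit status.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: print the average, or an error if there was
   no data. Returns a status suitable for passing to exit(). */
int report_average(double sum, int number_of_lines)
{
    if (number_of_lines == 0) {
        fprintf(stderr, "No numbers were read; cannot calculate an average.\n");
        return EXIT_FAILURE;       /* non-zero: something went wrong */
    }
    printf("Average of the numbers: %f\n", sum / number_of_lines);
    return EXIT_SUCCESS;           /* zero: all is well */
}
```

The end of main() could then read exit(report_average(sum, number_of_lines));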

printf("Number of lines: %d\n", number_of_lines);
printf("Sum of the numbers: %f\n", sum);
printf("Average of the numbers: %f\n", average);

These three statements output the values of the collated statistics. The sum and average are both floating point numbers so they are formatted with the "f" specifier. The number_of_lines variable is an integer so we use the "d" specifier to format it. Each printf statement also prints a newline after the number.

exit(0);

We have reached the end of the program. Although it is not essential, the exit(0) statement passes the value zero back to the operating system when the program finishes. (The exit() function is declared in the header file STDLIB.H.) If we were to halt the program because of an error, for example because the file was empty, we could call exit() with a value other than zero to tell the operating system that the program stopped because of an error. A batch file or script can then check this value (in DOS, through ERRORLEVEL) and take action when the program has failed. 

The program ends with a closing curly brace and another comment line. 

Java 

The other example is the same program written in Java. Java was first released in 1995 by Sun Microsystems. It is an object oriented programming language with strong typing and advanced features such as garbage collection. What does all this mean?

First of all, an object oriented language combines data with functions to make objects, which are described by classes. A function is a small subprogram that does a specific task, such as drawing a window or drawing a character on the screen. The data might be the size of the window, or the font and the character to draw. Combining these two entities does not make the program run any faster, but the design becomes more logical and, in large software projects, reusable components are easily utilised.

Strong data typing ensures that data of one type is not mistakenly treated as another. For example, in memory a byte might represent a letter of the alphabet or it could be a number from 0 to 255. A strongly typed language will not, for instance, let you assign an int variable to a char variable without an explicit conversion. If that was attempted, the compiler would flag an error. This catches a lot of subtle bugs at the compiling stage.

Garbage collection is a feature normally seen in very high level programming languages such as Smalltalk. Normally the programmer is responsible for freeing the system memory that a program uses. The program might use a megabyte of system memory to (say) load an image and manipulate it. When that megabyte of memory is no longer required, it is the programmer's responsibility to free it so that it is available for use again. Not releasing memory causes what are known as memory leaks: the longer the program runs, the more memory it holds on to, until eventually the program, or even the whole system, runs out of memory and fails. Garbage collection keeps track of which memory the program is using, and when there are no longer any references to a piece of memory, it is reclaimed automatically for reuse. Memory management is controlled by the Java runtime environment, not by the programmer. 
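For comparison, this is what that responsibility looks like in C, where memory obtained with malloc() must be handed back with free(). In this sketch (the function name total_via_buffer is hypothetical), leaving out the free() call would leak the buffer every time the function is called.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy 'count' numbers into a temporary buffer,
   total them, then release the buffer. Returns -1 if the memory could
   not be obtained, otherwise 0, storing the total in *total. */
int total_via_buffer(const double *values, size_t count, double *total)
{
    double *buffer = malloc(count * sizeof *buffer);   /* ask for memory */
    size_t i;

    if (buffer == NULL)
        return -1;
    memcpy(buffer, values, count * sizeof *buffer);

    *total = 0;
    for (i = 0; i < count; i++)
        *total += buffer[i];

    free(buffer);      /* hand it back; without this the memory leaks */
    return 0;
}
```

In Java, an equivalent array would simply be left for the garbage collector once nothing refers to it.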

Apart from these advanced programming features, Java's real forte is as a cross platform programming language. A Java program runs inside the Java virtual machine (JVM), and there is a JVM written for each computer system, whether it be Macintosh, Linux, Windows or whatever. When a Java program is compiled, the object code produced is called byte code. The JVM translates this byte code into native instructions for that particular architecture. What this means is that a program compiled on a Windows PC can be copied to a disk, transferred to a Macintosh and run there as though it were a Macintosh application. 

The Java version of our program is shown in Figure 3.


Figure 3. The program written in Java

Comments in a Java program can be marked with //, which turns the rest of a line into a comment, or with C's /* .. */ characters to cover multiple lines. Java also has a third comment form, /** */, which is read by the javadoc tool to produce linked HTML pages of program documentation.

import java.io.*;

This means that all the classes in the java.io package are available for use in this program. As with STDIO.H in the C program, the io (input/output) package provides existing routines for reading from and writing to files.

class stats

In Java all code lives inside a class, even the program itself. The source code file is named after the class, so this program is saved as stats.java; for a class declared public, the compiler insists on this.

public static void main(String args[]) 

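Public means that this method can be called from anywhere, including by the Java runtime when it starts the program. Static means that main belongs to the class itself rather than to an individual object created from the class, so it can run before any objects exist. Void means that main does not return a value. String args[] is an array, or list, of strings holding the command line parameters. This part of the Java syntax is very similar to C.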

int number_of_lines;
double this_number;
double sum;
double average;

These four variables are declared just as in C, with int being a four byte whole number and double an eight byte floating point number. Unlike C, Java fixes these sizes on every platform.

String thisline;

This states that the variable named thisline is of type String. The String class is used to store a series of characters such as "Hello" or "45.67". The class has methods that allow strings to be compared and manipulated. For example, to compare a string s1 with a string s2 we would write:

if (s1.equals(s2)) { ..... } 

The equals method of the String s1 compares its characters with those of the String s2. Writing s1 == s2 instead would only test whether s1 and s2 refer to the same String object, not whether they contain the same text.
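C has no String class; a C string is simply an array of characters ending in a zero byte. The nearest C equivalent of equals is strcmp from the standard header string.h, which returns 0 when two strings match. A minimal sketch (the function name same_text is hypothetical):

```c
#include <string.h>

/* Hypothetical helper: return 1 if the two C strings contain the same
   characters, 0 otherwise. Comparing s1 == s2 directly would only
   compare the two addresses, much like == on Java String objects. */
int same_text(const char *s1, const char *s2)
{
    return strcmp(s1, s2) == 0;
}
```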

BufferedReader br;
FileReader fr;

The FileReader class translates the bytes of a file into a stream of characters. The BufferedReader class reads that stream in larger blocks and lets the program take the characters a line at a time.

number_of_lines = 0;
sum = 0;

As with C, initialise the counting and summing variables to zero. 

try { 

This is one of the features of Java: exception handling. The next section of code, within the curly braces, reads the input data. The "try" keyword signifies that if there is an error reading the input data, an exception will be thrown and the program will pass control to the "catch" section further down the code. The statements in the catch section will then be executed; these normally include a message describing the type of error that has occurred.

fr = new FileReader(FileDescriptor.in);

Create an instance of the FileReader class with standard input (the redirected number file) as the file from which to read a stream of characters.

br = new BufferedReader(fr);

Create a new instance of the BufferedReader class using our new FileReader class instance as the source of the characters to buffer.

thisline = br.readLine();

The BufferedReader class has a method readLine() which reads a line of characters. We assign the line of characters to the String variable named "thisline".

while (thisline != null ) { 

If we have reached the end of the file, readLine() returns null, so "thisline" will equal null. Null is a special value meaning that a variable does not refer to any object at all. So, in effect, if the BufferedReader cannot supply a string of characters (the end of the file has been reached and there are none left), "thisline" is set to null. The while statement says: keep executing the group of statements between the braces while thisline refers to a proper String.

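this_number = Double.parseDouble(thisline);

The C program used scanf() to read from the input and convert the characters to a floating point number in one single statement. This Java program takes two steps: readLine() fetches the characters, and the parseDouble method converts that String of characters to a double-precision number. The result is assigned to "this_number". If a line held "FRED", parseDouble would throw a NumberFormatException; as this is not an IOException, the catch section would not handle it, and the program would stop with an error message.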
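For comparison, C can also do the job in the same two steps as the Java version: fgets() reads a line (returning NULL at the end of the file, much as readLine() returns null), and strtod() from stdlib.h converts the text to a double. This is a sketch only; the function name total_by_lines is hypothetical, and it assumes no line is longer than 255 characters.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read 'in' a line at a time and total the numbers,
   mirroring Java's readLine()/parseDouble() steps. Lines that do not
   start with a number are skipped. Returns how many values were added. */
int total_by_lines(FILE *in, double *sum)
{
    char line[256];          /* assumes lines shorter than 256 characters */
    int number_of_lines = 0;

    *sum = 0;
    while (fgets(line, sizeof line, in) != NULL) {   /* NULL means end of file */
        char *end;
        double this_number = strtod(line, &end);     /* step two: text to double */

        if (end == line)     /* nothing converted, e.g. "FRED" or a blank line */
            continue;
        number_of_lines++;
        *sum += this_number;
    }
    return number_of_lines;
}
```

Unlike the Java version, which would stop with an exception on a bad line, this sketch simply skips such lines; that is one design choice among several.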

number_of_lines++;

Same as in C, increment the counter variable named number_of_lines.

System.out.println(this_number);

This line matches the printf() in C. Here, System is a standard Java class that gives access to system facilities, and out is its standard output stream (that is, the screen). We don't need a format specifier, as println() knows how to convert a double into text by itself.

sum += this_number;

Same as C again. Add this_number to the sum.

thisline = br.readLine(); 

Get the next line from the input stream. If it is not null, then we continue through the loop again.

catch (IOException ioe) 
{ System.out.println("Input/Output error:" + ioe); } 

This is the catch section, and it is activated if an error occurs during the data reading operation. The "try" section "throws" an exception and this part "catches" it. If there is an error, a message is written to the screen together with a description of the exception.
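C has no exceptions. The nearest C equivalent of this catch section is to check, once the reading loop has finished, whether it stopped because of an input/output error rather than the end of the file, using ferror() from stdio.h. A minimal sketch (the function name check_input is hypothetical):

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: after a reading loop has finished, report
   whether it ended because of an input/output error. Returns
   EXIT_FAILURE in that case, otherwise EXIT_SUCCESS. */
int check_input(FILE *in)
{
    if (ferror(in)) {
        perror("Input/Output error");   /* adds the system's reason for the failure */
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
```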

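average = sum/number_of_lines;

The average of all the numbers in the file is the sum divided by the number of lines, the same as in the C program. Note that if the file is empty, Java's floating point division of 0.0 by zero does not stop the program; it simply gives NaN ("not a number"), so the same check for an empty file would be worthwhile here.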

System.out.println("Number of lines: " + number_of_lines);
System.out.println("Sum of the numbers: " + sum);
System.out.println("Average of the numbers: " + average);

Use the println() method to output the values of the three variables. Note the use of the "+" sign to join character strings together into one larger string; Java automatically converts the numbers to text before joining them.

System.exit(0);

Same purpose as in the C program. The program has completed successfully, return 0 (zero) to the operating system. 

The program ends with two curly braces, one to signify the end of the main() method and the other to end the "stats" class. Finally there is a comment line. 

Resources 

Compilers for C and Java are freely available on the Net and are also available on magazine cover CDs. Borland has a free downloadable version of its C++ Builder compiler at: http://www.borland.com/bcppbuilder/freecompiler/

Also early versions of the standard C++Builder Integrated Development Environment (IDE) come up regularly on magazine cover disks. This is an entire package which includes an editor, compiler, debugger and a form builder similar to that of Visual Basic. A similar product called JBuilder is available for Java. Borland is to be commended for this initiative as it allows users to work with industrial quality software building environments without the enormous cost outlay. The only drawback is that the software is the base edition and usually a few versions behind the current. Unless you are using very advanced features this is not a concern. 

If you wish to use something a bit simpler there is the DJGPP compiler, which runs from the DOS command line: http://www.delorie.com/djgpp/. This compiler has also appeared on magazine cover disks from time to time. 

For a Java compiler it is best to go direct to the source: http://java.sun.com/j2se/1.3/. Here they have the SDK (Software Development Kit) for Windows, Linux and Solaris. Often Java books will include a CD with the SDK included. They can be a little out of date but it saves a large download from the Net. There is also a Sun product called Forte which is an integrated development environment for Java. The community version of Forte is free. This is an excellent product but it is extremely memory intensive. The recommended system memory is 256 megabytes. 

Although the compilers may be free, it can become expensive if you start to buy reference books. With C, I suggest trying the academic secondhand bookshops. C programming has been a standard subject at most universities for many years and there are numerous textbooks available. These texts are usually aimed at a UNIX environment, which is another reason to consider Linux (see below). New Java versions are appearing fairly rapidly, so there are usually some older reference books about at reduced prices. When looking for Java references I suggest looking for Java version 1.1 or later. Version 1.1 included some major changes to graphics handling, which made much of version 1.0 redundant. Currently Java is at version 1.3. 

If you are interested in programming, my own opinion is that Linux offers the best environment. Linux comes standard with a C and C++ compiler and there are numerous editors and programming utilities. With some of the larger distributions the full Java SDK is supplied and some include Forte for Linux. 

In conclusion, Java has a number of advantages over C: it was developed as an object oriented language, it includes advanced memory management techniques and, because of its strong typing, it tends to be more stable. Java is also more Internet friendly, with a specific networking package (java.net) to make network connections and use common Web protocols. Its cross platform ability is also important for software that needs to be executed on different architectures. The major drawback of Java is its performance, due to it being partially interpreted code. For normal applications on modern processors with lots of RAM this is not a problem, but graphics programs, especially those using 3D graphics like games, are hampered by its performance characteristics. 

C's history as an operating system language means that it produces fast, lean code. It originates from a time when memory and processor speed were worth many times what they are today. By using C++ you can have the advantages of the object oriented methodology while keeping the performance. C++ also offers some of the features Java is praised for, such as stronger data typing and exception handling. This explains why much software is now written in C++. 

Reprinted from the August 2001 issue of PC Update, the magazine of Melbourne PC User Group, Australia