The magazine of the Melbourne PC User Group

"I Lost Sleep Rather Than Lose My Pride"
Alistair Lloyd

Many members are familiar with the wide range of services offered by the Group's BBS. There are now well over 3000 users registered on the board, which currently runs under the Maximus BBS Software. The Maximus programs enable flexible and powerful customisations to be made to a system setup. It has become one of the leading applications in its field.

All important information regarding the Board's users is stored in a single structured file, USER.BBS. This is a data table of great complexity. It manages to squeeze an amazing 46 fields plus seven constant values into each 240-byte record.

The First Problem

Finding software that was capable of extracting meaningful information from such an intricate table proved to be difficult. One night in early April saw me writing (please forgive me!) a QBasic program which did a sequential byte-by-byte read. From that I was able to extract names and passwords. But obviously there were deeper things afoot. Melb PC has a current project to redesign the way membership information is stored and processed. With a rapidly growing membership, accurately identifying individual members is a necessity.

Humans are not designed as Ethernet cards and the like - we do not have serial numbers scribbled across our foreheads nor bar codes adorning our noses. If that were the case then using current computer technology to prove one's identity would be all too easy.

Alas, some things are beyond what society will accept. To overcome this, Melb PC members are given a unique membership number. That is the best way we have of differentiating between our two Graeme Harts, or three Robert McKenzies.

Meanwhile, back in BBS land it was decided that all users would have their membership number inserted in the alias field that is included in the Maximus UserBase. This field is often used on boards to allow the user to log on under a pseudonym. However, as only real names are permitted on the Melb PC board, we had a spare field to use. This would provide an accurate cross-reference between the membership database and the userbase on the BBS.

The Next Problem

Who was going to insert over 3000 membership numbers into the Maximus UserBase? Not this little black duck, that was for sure... Wait! Barry! I'll do it! Put down that knife...!

The Maximus User Base Script Language

My bacon was saved when Sysop Barry McMenomy spotted an upload containing the Maximus User Base Script Interpreter Language, aka MUL. Here was a package designed specifically to extract and process the information kept in such a UserBase. It was written by CodeLand - a Perth-based software team headed by Colin Wheat of the Perth Library BBS (Fidonet address 3:690/613). Colin is the author of other useful Maximus utilities such as the external User Editor UEdit.

I took this home and skimmed through the documentation. MUL has been written as a subset of the standard C library. Much of the hard work in writing meaningful code has been eliminated by a wide range of useful functions. These are tailored to reading, writing, sorting and processing records within the data table. I liberated the disc space containing my QBasic program, and sat down to learn the C programming language.

One week later - I'm tired. My hands are shaking. I'm running low on coffee. A woman shows up on my doorstep and insists she is my girlfriend. I've been teaching myself C.

It is easier than it looks, you know. Having had some exposure to JCL code in a previous job, I find any other language easy. I sat down with some good references. However, the best clues I had came from the no-nonsense documentation and sample scripts that arrived with the MUL package. These had been compiled by a Perth group, Crossroads Computer Documentation. They provided the best indicators of MUL's abilities.

The Third Problem

The next stage was to prove much more heart-rending than I expected. After consultation with other members, plans for a haphazard seek-and-destroy program set were thrown to the winds. I started from scratch and built a set of algorithms to extract, sort, match, update and report on the entire process.

I had a large amount of information to play with: on one hand the Userbase, with its 3000-odd users (well, some of them are odd... <grin>), and on the other an extract, in text format, of the entire membership table, which included members' names and their unique ID numbers. The task was to get one merged into the other.

It was a case of ensuring the data being compared was in the same format. This meant a fair amount of manipulation was necessary to reformat the input data.

The first file tackled was the membership list. Being a straightforward text file, it was processed by doing a sequential read from top to bottom and separating the wheat from the chaff. I learnt the hard way that names starting with a prefix such as "de" must have the space after the prefix removed, to stop the system thinking it had a third name. This was done by having the program delete a space if it was detected as the third byte. When the program came across a valid line it would take the name and membership number, convert them to upper-case and output them to a second file with the surname first, then the given name, an asterisk delimiter and the membership number. An interim step was to add leading zeroes if the number was less than five bytes long. Thus:

Name             Number  Suburb
John Member      16000   South Melbourne
Jill de Member   10101   Middle Park
Jane Alsomember  690     J East Melbourne

would become:

MEMBER JOHN*16000 
DEMEMBER JILL*10101 
ALSOMEMBER JANE*00690
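
The reformatting rules described above can be sketched in a few lines. This is a minimal illustration in Python, not the original MUL script; the function name normalise_member_line and the assumed line layout (given name, surname, number, then suburb) are hypothetical and chosen only to fit the sample lines shown.

```python
import re


def normalise_member_line(line):
    """Return 'SURNAME GIVEN*NNNNN' for a valid member line, else None."""
    # Assumed layout: given name, surname (maybe with a prefix), number, suburb.
    match = re.match(r"\s*(\S+)\s+(.+?)\s+(\d+)\b", line)
    if match is None:
        return None  # chaff: headings, blank lines and the like
    given, surname, number = match.groups()
    # The article's rule: a space in the third byte marks a two-letter prefix
    # such as "de", so delete that space to keep the surname in one piece.
    if len(surname) > 2 and surname[2] == " ":
        surname = surname[:2] + surname[3:]
    # Pad the number with leading zeroes to five digits, then upper-case.
    return f"{surname} {given}*{number.zfill(5)}".upper()
```

Lines without a membership number, such as the column heading, come back as None and can simply be skipped by the caller.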

A similar trick was done with the Maximus Userbase. Using the BaseRead() function in MUL enabled me to run a sequential scan through the file, stripping out the names and real record numbers of the users. This information was output to yet another file as:

MEMBER JOHN*3010 
DEMEMBER JILL*2020 
ALSOMEMBER JANE*42
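
A rough idea of the Userbase extract, again as a Python sketch rather than MUL. The BaseRead() call itself is not reproduced, since its signature is not given here. The 240-byte record size comes from the article; the position and width of the name field, the zero-based record numbering and the function name extract_userbase are all assumptions made purely for illustration.

```python
RECORD_SIZE = 240  # record length quoted in the article
NAME_OFFSET = 0    # assumption: where the name field starts in a record
NAME_WIDTH = 36    # assumption: width of the NUL-padded name field


def extract_userbase(data):
    """Return 'SURNAME GIVEN*record_no' lines from raw USER.BBS bytes."""
    lines = []
    for record_no in range(len(data) // RECORD_SIZE):
        start = record_no * RECORD_SIZE + NAME_OFFSET
        raw = data[start:start + NAME_WIDTH]
        # Keep the text up to the first NUL byte.
        name = raw.split(b"\0", 1)[0].decode("ascii", errors="replace").strip()
        if not name:
            continue  # empty slot, nothing to extract
        given, _, surname = name.partition(" ")
        # Same prefix rule as for the membership list: drop a space found
        # in the third byte of the surname.
        if len(surname) > 2 and surname[2] == " ":
            surname = surname[:2] + surname[3:]
        key = f"{surname} {given}".strip().upper()
        lines.append(f"{key}*{record_no}")
    return lines
```

The point of the sketch is that both extracts end up with the same key format, so the later comparison is a plain string match.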

Next, both files had to be sorted in ascending order. This done, the files were then ready to be compared by matching the strings that were followed by the asterisk (*).

The Matching Process

This used an algorithm that was by no means perfect. I'm told there are countless lines of code available that will professionally and accurately match data. That's reassuring to know, isn't it? Pushing my DOS Window in OS/2 to its limits, I finally got the program to do what I wanted. I backed everything up twice, hid the discs and began.

Opening the sorted ID and Userbase extract files, the program started to scan through them. This was done by starting at the first UserBase extract record and searching down through the ID extract file. If a match was found the program would strip off the real record number, read the information from the UserBase, clear the alias field, write the membership number into it and then rewrite the record. A hit counter was updated and the process continued.

The program would test regularly whether or not it had overstepped the mark in searching for the required string. If this had occurred it would consider the string to be missing and would rewind the ID extract back to the last successful hit. The missing string would be recorded elsewhere and the miss counter updated.
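
The scan described above can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the program that was actually run: the function name match_extracts is hypothetical, both extracts are held in memory as lists of "NAME*number" lines, and the update of the UserBase record is left out. The "rewind" is modelled by restarting each search from the position of the last successful hit.

```python
def match_extracts(user_lines, member_lines):
    """Match two 'NAME*number' extracts by name; return (hits, misses)."""
    users = sorted(line.strip().split("*", 1) for line in user_lines)
    members = sorted(line.strip().split("*", 1) for line in member_lines)
    hits, misses = [], []
    last_hit = 0  # position in the ID extract of the last successful match
    for name, record_no in users:
        pos = last_hit  # rewind to the last hit before each new search
        # Search down the ID extract until we reach or overstep the name.
        while pos < len(members) and members[pos][0] < name:
            pos += 1
        if pos < len(members) and members[pos][0] == name:
            # Hit: pair the real record number with the membership number.
            hits.append((int(record_no), members[pos][1]))
            last_hit = pos
        else:
            # Overstepped or ran off the end: record the name as missing.
            misses.append(name)
    return hits, misses
```

Note that matching on names alone cannot tell two members with identical names apart, so such cases would still need checking by hand.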

And so the process continued. With each comparison an audit record was kept, so that if anything disastrous occurred it could be traced to the source. I set up a series of batch files to control operations and sat back with a large beer.

Several crashes and one or two (?!) beers later everything looked good. I had set up a reporting job either side of the process that dumped the status of the Maximus User Base. This gave a printout of the user's name, alias, suburb and telephone number. At the end of the process 2217 users out of just over 3000 had been successfully matched. It was 4 am. I went to bed and slept for sixteen hours.

Finally

Through the use of the Maximus UserBase script language and some fairly primal C code, the actual jobs took only a short time to run. Or so I would have thought. When the time came for the live implementation, the only suitable machine to install and run from was a 386SX16. Run time? I'm not telling. (see Aside)

This project taught me a great deal about the importance of ensuring that data tables contain consistent information at all times. Keep a close eye on such information and never let it become irregular or fall too far out of date. A good plan is a necessity when working through a matching system such as this.

Keep several copies of test data and remember that the output results should be the same every single run.

Reprinted from the July 1994 issue of PC Update, the magazine of Melbourne PC User Group, Australia
