John Stumbles' Work page
Until spring 2002 I worked at BlueArc (a company developing a new generation of network file servers) where I was a senior validation engineer (i.e. I tested things).
During March 2001 I worked at PruTech, the technology side of the Prudential insurance company, managing their network. This job ended abruptly (it's a long story :-()
Before that I worked at the I.T. Services Centre of the University of Reading, where I was officially called a Computer Programmer.
I have hacked around in a variety of programming languages though I trained in Electrical and Electronic Engineering (at University College Wales, Swansea) so long ago that, after a mandatory Fortran course in the first year, I wasn't allowed near the computers afterwards! Since then I've progressed(?) from analogue through digital electronics to computer networking (from cabling, connectors and interface circuits such as RS232, through to X.25, IP, TCP/IP, PC-NFS, SNMP, H.323 ... etc).
I was at the University from 1994, during which time I learned more than I really wanted to know about configuring PCs through setting up PC-NFS on users' PCs (and showing them how to use email etc., and dealing with "it was OK before you touched it" type problems). During that period I wrote a lot of DOS .BATch files to speed up the installation and configuration process, and some utilities to supplement existing freeware and DOS utilities. (If I'd known then what I know now, however, I'd have done it in Perl!)
I tended to call myself a Network Manager as my job was mostly managing the network (natch :-))
When I started at the University the idea was that I should get the Netcomm NMS3000 (which they'd bought before I arrived) working. This was a sort of me-too HPOV-type node monitor with discovery, MIB-variable monitoring and alarms/alerting etc. and it ran (like cold molasses :-) on a SPARCstation 1. Apart from the (lack of) speed, I could never quite get my brain round how it monitored MIB variables and its alert mechanism, and found its habit of reporting as an alert the disappearance of any host which it had ever discovered (most of which were users' PCs being switched off when they went home at the end of the day) quite un-useful. Also at this time, although I had (as mentioned) learned quite a bit about DOS, I was severely unclueful about Unix, and about SNMP, and also about much to do with Ethernet and IP etc. (I like to think that's all different now :-).
Along the line since then I got some way up the learning curves and played with a few monitoring and management tools. Some (BTNG, Beholder, DNPAP, javasnmp, advent and some commercial packages including HPOV) I looked at but passed by, and there were some home-brew tools we used to use but no longer do: one checked parts of the network by pinging groups of hosts known (from more-or-less hand-generated lists) to be in one area so that if none responded one could surmise that there might be a network problem (this was when we had practically no managed devices besides our core routers). Another tool used an snmpwalk script to gather interface utilisation stats from our routers and plot graphs (as .gif images) of each one (as done by MRTG, but without the by-week and by-year, etc, plots of that tool).
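The area check described above can be sketched roughly as follows. This is a minimal illustration in Python, not the original tool: the function names, the area and host names, and the Linux-style `ping` flags are all assumptions. An area is flagged only when every listed host fails to answer, and even then the result is no more than a hint, since hosts may simply be switched off.

```python
import subprocess
from typing import Callable, Dict, Iterable, List


def ping(host: str, timeout_s: int = 1) -> bool:
    """Send one ICMP echo via the system ping command.

    Assumes Linux-style flags (-c count, -W timeout in seconds);
    other platforms spell these differently.
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def suspect_areas(
    areas: Dict[str, Iterable[str]],
    probe: Callable[[str], bool] = ping,
) -> List[str]:
    """Return the names of areas in which no listed host responded."""
    suspects = []
    for area, hosts in areas.items():
        hosts = list(hosts)
        # An empty list tells us nothing, so it is never flagged.
        if hosts and not any(probe(h) for h in hosts):
            suspects.append(area)
    return suspects


if __name__ == "__main__":
    # Hypothetical hand-maintained lists of hosts per area.
    example = {
        "library": ["lib-pc1.example.ac.uk", "lib-pc2.example.ac.uk"],
        "physics": ["phys-pc1.example.ac.uk"],
    }
    for area in suspect_areas(example):
        print(f"No response from any host in {area}: possible network problem")
```

Passing the probe in as a parameter keeps the decision logic separate from the ping itself, so it can be exercised without touching the network.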
By the time I left the University we had a set of tools which, while by no means complete or integrated, gave us useful information:
- NET-SNMP (formerly ucd-snmp, from University of California, Davis) is an SNMP library, a development of CMU SNMP. Apart from the standalone snmpwalk, snmpget etc programmes, I used the perl SNMP.pm module interface to UCD-SNMP with myNMS.
- Scotty is an extension to the Tcl/Tk language and graphical toolkit, for writing network applications. It includes (yet another!) IP discovery, monitoring and mapping tool (Tkined) which is sadly underdeveloped, another Tk-based graphical tool for MIB tree display and investigation, and allows one to write scripts in Tcl for doing e.g. snmp get/walk/set etc functions (it also has extensions to lookup DNS, rpcinfo etc). I used the MIBtree application and the standalone snmpwalk programme for investigating devices' MIBs via SNMP, and initially used a version of the snmpwalk script within a Perl script for gathering SNMP info for the NOCinfo (as I called it then: now myNMS) database. There was still a Tcl script using scotty to gather SNMP info from our Accelar routing switches, which I'd intended to port to Perl/SNMP.pm/net-snmp, partly to make our NMS tools dependent on fewer external applications (only NET-SNMP instead of Scotty + Tcl) and partly to free myself from coding in Tcl, which I did too little of to do well.
- MRTG We ran the Multi-Router Traffic Grapher, in a rather old version which someone else installed and set up and which I never took the time to understand how to manage. Much of this set of tools would be useful to integrate with myNMS.
- NOCOL is a tool for checking (using ping and other tools) the status of various devices and services, reporting via a console or web page. We used this to provide a rough health-check of our network, for ourselves (NOC) and for support and helpdesk staff as it gives an at-a-glance display (ideally "No Problems", otherwise some info on what is down). The latest version uses an RRD database to hold historical data, like MRTG, and has some other enhancements at present or under development (such as dependency checking).
- Argus We used this intrusion-detection system to monitor traffic between our network and the outside world.
- HP Top Tools for Hubs and Switches comes with the HP Switches which we used almost exclusively. It requires an NT server to run on and a web browser running Java and JavaScript to view. Amongst other functions it discovers SNMP devices on a network, builds an inventory, and monitors HP devices, presenting an instantaneous dial-type display of the segments with highest traffic, broadcast, multicast and error rates amongst all switch segments on the entire network: this is very useful for pinpointing problems. Using the instrumentation built into HP network devices (a sort of statistical sampling giving an RMON2-like view of traffic) Top Tools can show top talking stations and pairs (conversations) and protocol type distributions. This has on one occasion helped us locate a serious network problem: a station emitting a constant stream of (unsolicited) ARP responses which was killing a routing switch at the core of the network.
This application does however have some extremely frustrating 'features' (which, HPTT being closed source, we are at the mercy of HP to fix when and if they ever feel like doing so). Examples:
- the 'speedometer' dial-type displays take up an inordinate amount of screen space and although they have a high-water mark the display area would be better occupied with graphs showing the last, say, 5 minutes' data.
- Worse, in the version (4.5) which we are running, devices are labelled, seemingly arbitrarily, with their IP address, IP hostname, even MAC address, but never with their SNMP sysName - even though the application must be using SNMP to poll the devices for other data!
- It stores data in an Access database (it runs only on an NT station) which resisted attempts of a colleague who managed it to update the naming.
IMHO it is a pity that the code is closed (proprietary): it simply adds value to HP switches and if they'd made it open-source it could have added further value to their network devices, as well as kudos to HP themselves.
- MUTiny
At Networkshop 28 (the annual gathering of .ac.uk network professionals) in March 2000 at Heriot-Watt University in Edinburgh, Dave McClenaghan and George Neisser of Manchester Computing Centre presented a paper on MUTiny NEM: Manchester University Tiny Network Element Monitor. Dave (not sure about George) had, I believe, previously set up HPOV NNM to monitor the UK academic network where I gather he was impressed (negatively) by the amount of effort required for the return, and was inspired to build a simpler, public-domain network management system. MUTiny is written in Perl, with Perl/Tk for graphical display functions, and using UCD-SNMP. It runs on a PC running Lignux, FreeBSD or Solaris. Dave's presentation showed MUTiny's topology display (a graphical tree structure, like that familiar in file explorer applications etc), sample displays of device status ( Pingable | SNMP OK | CPU load | Memory Util | Disk Util | Critical Process status ), and GUI screens for adding nodes to the system, setting polling parameters, event and alarm configuration, data collection, traffic reporting and MUTiny's self monitoring.
The system, though unfinished, was said to be in production use at the National Janet Web Caching Service in Manchester. Many delegates to the conference (myself included) expressed interest in obtaining MUTiny, and a mailing list was set up. The last I heard from Dave was towards the end of May 2000 that "we are setting up a licencing mechanism so that we can release MUTiny without compromising copyright/IPR". An enquiry in September drew a blank, but I now gather that 'they' (whether Dave or the University I am not clear) are attempting to sell the system as a commercial product. (It reminds me a bit of that US University of Wherever-it-was where Mosaic was developed. Sad.)
- myNMS is the set of tools which I developed for collecting data from various sources and correlating it to produce useful reports for various audiences. It is also available at SourceForge.
We were all set on buying HPOV - at least NNM and possibly NetMetrix - a year or so ago. It's the industry standard, it's supposed to be able to do anything that's capable of being done, and it looks good on a CV! We had an eval version running on a Solaris box which I was supposed to be working on. At the same time a couple of us were going to HPOV users' group meetings, and I was on the OV users' mailing list, and what I was hearing was that setting up OV and running it were a major chore. Also, whilst it would discover networks at layer 3 it didn't seem to have the ability to map layer 2 topology, which is a major component of our network (the layer 3 map is a simple star which any of us can see in our mind's eye: the layer 2 topology is far harder to remember).
Some of the things it did, such as reporting 2 or more hosts (with different MAC addresses) using the same IP address, I already had home-brewed tools to do which I saw the possibility (now realised) of extending in useful ways (e.g. to report the names of the users of the machines involved). Other Network-Management information which we wanted and found useful was more in the user domain which OV - at least the components we'd been looking at - did nothing to provide.
We were also looking for a cable management system, to keep track of connectivity at the physical level: wall ports, installed cables, patch panels etc. It seemed sensible to find something which would integrate with HPOV, sharing overlapping information so that one could correlate, say, the network identity of a host with its wall port, through the patch panel to the hub port, to the VLAN, subnet or network. We asked speakers from HP at a user group meeting, and elsewhere, but it didn't seem that this sort of integration was available.
Also we were running MeterWare, which is an application for use with RMON probes, but which has a network node discovery and monitor function built in. This gave us an NNM-like picture of our network, with icons for devices which went red when the device was down, and this gave us an at-a-glance view of the health of our network. We had started to use HP Top Tools for Hubs and Switches, which is bundled on a CD-ROM with the HP switches we were buying: this application discovers devices on the network and shows at-a-glance the worst segment on any of the HP switches in terms of utilisation, broadcasts, errors, etc. A colleague had set up MRTG which graphed medium and long-term utilisation on various segments of our network. Between these and our home-grown tools the value to be gained from OV seemed not to be worth the effort - not to mention cost - involved.
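The duplicate-IP report mentioned above (two or more MAC addresses seen using one IP address, with the users' names attached) might look something like this in outline. It is a hedged sketch in Python, not the home-brewed tool itself: the function name, the shape of the input pairs, and the MAC-to-user table are assumptions, as is the idea that the pairs would come from something like router ARP tables.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def find_duplicate_ips(
    observations: Iterable[Tuple[str, str]],
    owners: Optional[Dict[str, str]] = None,
) -> Dict[str, List[Tuple[str, str]]]:
    """Report IP addresses seen with more than one MAC address.

    observations: (ip, mac) pairs, e.g. gathered from ARP tables
                  (the data source is an assumption, not from the text).
    owners:       hypothetical lookup of lower-case MAC -> user name.
    Returns {ip: [(mac, user), ...]} for conflicting addresses only.
    """
    owners = owners or {}
    macs_by_ip = defaultdict(set)
    for ip, mac in observations:
        # Normalise case so the same card is not counted twice.
        macs_by_ip[ip].add(mac.lower())

    report = {}
    for ip, macs in macs_by_ip.items():
        if len(macs) > 1:
            report[ip] = [(mac, owners.get(mac, "unknown")) for mac in sorted(macs)]
    return report


if __name__ == "__main__":
    # Made-up sample data using documentation-range addresses.
    seen = [
        ("192.0.2.5", "00:A0:C9:00:00:01"),
        ("192.0.2.5", "00:a0:c9:00:00:02"),
        ("192.0.2.6", "00:a0:c9:00:00:03"),
    ]
    users = {"00:a0:c9:00:00:01": "alice"}
    for ip, hosts in find_duplicate_ips(seen, users).items():
        print(ip, hosts)
```

Keeping the user lookup as a separate table is what lets the same report be extended into the "user domain", since any other per-host information could be joined in the same way.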