Thursday, 1 November 2012

Spreadsheet processing

The actual spreadsheet processing is rather tedious.

I create a temporary sheet and populate it with all the entries from each sheet, each line of the temporary sheet is amended to make sure there is a date, either already there or inherited from the last known date.

For missing take-off and landing times I have had to make assumptions, so I tend to start at 9am, and land at 5pm since that seems to be their normal day. I suspect that isn't right in all cases, but hopefully good enough.
The really frustrating bit is the fact that Google Earth needs to have co-ordinates to work with, and the entries in the spreadsheet can be pretty varied in their spelling and/ or existence - there have been many places that I just haven't been able to find, which is something I can't really do anything about.

The way the spreadsheet lists all the planes means that I can keep track of which spitfire (for instance) is doing what. The Lanc, Hurricane and Dakota are less of an issue since they are singular.

Anyway for each location I try and find the most likely location in Google Earth and then copy the eastings and northings to a worksheet that contains the name and location of each place in the spreadsheets.
This is a time-consuming task, especially since spelling mistakes are rife - there are 9 different spellings and abbreviations for 'Coningsby' and they are based there!
There are so far 849 locations - though of course some are duplicates.

For the actual 'programming' for ease of 'portability' I decided to do all this in an Excel macro since I wasn't too sure how things were going to develop and didn't want whoever took this thing over having to install ASP/Perl/Python or whatever else might be other valid ways of doing things,

Essentially the Excel macro creates a file and fills it with the standard information (declaration, styles etc.) and the for each plane/month it loops through a filtered copy of each sheet and populates two strings that hold the date/times and co-ordinates respectively.
Once each plane/month is outputted to the file it moves onto the next plane in the month.

The use of the temporary sheet means that I have one source to deal with, which makes the eventual output a lot easier, though of course at the end of the process I delete the sheet.

All of this worked well enough and produced a KML file that I felt was rather useful when opened in Google Earth.

I also took the opportunity to list out all the errors that I could find - missing dates, times, un-found places etc. and produce an errors file - the original intention was to feed this back to the BBMF so that they could improve their data, but it also came in handy in error checking as I went along.
The error file has loads of the following data:

Take off but no time - Sheet: Apr, Line: 39. 
Venue but no slot time - Sheet: Apr, Line: 39. 
Landing but no time - Sheet: Apr, Line: 39. 
Venue but no slot time - Sheet: Apr, Line: 42. 
A/C mentioned but not specified - Sheet: Apr, Line: 42. 
Take off but no time - Sheet: May, Line: 5. 
Venue but no slot time - Sheet: May, Line: 5. 
Landing but no time - Sheet: May, Line: 5. 


Most of the output is expected - mainly due to blank lies or red text - which I take to be unconfirmed events.

No comments:

Post a Comment