Some numbers in the PNNL case

I’ve been working for several days on the log files requested in the interrogatories for the PNNL lawsuit (no, I’m not converting them to EBCDIC, although I gave it a few seconds of thought just for the amusement value). Only a subset of the logs they wanted was reasonably well organized. They wanted more than what I had organized, and some of it I didn’t have at all. Lots of the logs weren’t in my preferred organization, many of the websites were hosted on two different servers at the same time for a while, and the logs came from at least four different servers. It’s been a lot of work, but I’m nearly done with that part of it.

I obtained more log files from my old web-hosting provider last night and, although there will be a few tweaks to the numbers, the following is pretty close:

  • Websites: ~30
  • Directories: ~170
  • Files: ~170,000
  • Size: ~6 Gbytes

I realize they asked for this mostly to cause me pain. Almost certainly they won’t bother to do their own analysis, and their own internal email and testimony will make this irrelevant. But I have been wanting to organize this stuff for several years anyway. I want to finish my web log analysis program (NoDooce), and this data set will be very useful for both development and testing. And besides, being the Aspergers type, I actually enjoy this type of work. I hope they find some expert witness to look at everything, pay him outrageous amounts of money, and that he has as much fun as I have had with the data.

This is just the log files. There are also hundreds of pages of notes, documents I have obtained, and emails. I can’t wait to start digging into the stuff they have to supply to me.


2 thoughts on “Some numbers in the PNNL case”

  1. If you have a big ol’ box of floppies kicking around, you could archive across about 4200 of ’em… ?

    Or at least use CDs rather than DVDs. About 9 CDs vs. 2 DVDs. Oh, and make sure that they all have to be mounted by archiving the files across them, rather than making drag-and-drop copies of all of the individual files, which they could index and actually “divide and conquer” the pile of data.

    Make sure that you include every possible view of pages and all of the diagnostic/error 404 type pages as well. They are useless to their work, but they will have to spend time looking at them.

    I am a stinker…
