Saturday, May 5, 2007

Census 2000 Summary File 3 now online, Summary File 1 completed

With the new server online (see previous post), I finally have the disk space I need to really start building out the available data sets - I'm going from about 55GB on my old machine to just about 1100GB on the new one. (Well, technically, the "old" machine, my desktop, is newer than the "new" one, but since it was the first gCensus server, let's keep calling it "old".)

The first beneficiary of this storage largesse has been Summary File 1. I've expanded the coverage from California, Oregon, and Pennsylvania to cover all 50 states plus the District of Columbia.

The second major step was to add Summary File 3, which covers a lot of interesting economic and housing statistics such as median and aggregate income, housing prices, and housing facilities. Fortunately, the file structures of SF1 and SF3 are very similar, so I was able to re-use most of the import code that I had already written for SF1. The coverage for SF3 is the same as for SF1 - all 50 states plus DC.

It turns out in the end that my estimates of disk consumption were off - way off. I had originally believed that it would take 750-1000GB to store all of Summary File 1. Instead, it's taking about 410GB to store both SF1 and SF3. While my friends in the theory group would call that "a small constant factor", I prefer to think of it as "a whole lot of space". Consequently, if anyone has ideas for large (nationwide or even worldwide) data sets that would be cool to import, I'd like to hear about them.

New gCensus server online!

I have a big update here, so I'm breaking it into several pieces. The first part: the new gCensus hardware is finally up and running! I got the replacement motherboard from Intel and, luckily enough, everything came up on the first try.

Thanks to a generous donation from Ken Schmidt of Steel in the Air, I now have a fourth 400GB hard drive in the gCensus server, for a total of 1.2TB of RAID5 storage. That brings the current specs of the machine up to the following:

- Intel D955XBK motherboard
- Intel Pentium EE 955 CPU (dual-core 3.46GHz with Hyperthreading)
- Zalman CNPS9500 cooler
- 4x400GB Seagate HDD (3xSATA, 1xPATA)
- PC Power and Cooling Turbo-Cool 510ATX-SLI power supply
- Diamond Stealth 64 VRAM graphics

Everything except the PATA hard drive and the video card was a donation. I'd like to thank everyone who's helped me out with this equipment - I couldn't have done it without you!
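
For anyone wondering how four 400GB drives come out to 1.2TB rather than 1.6TB: RAID5 gives up one drive's worth of space to parity. Here's a rough sketch of that arithmetic in Python. The function name is made up for illustration, and it ignores filesystem overhead and the GB vs. GiB difference, so treat the result as a nominal figure only.

```python
def raid5_usable_gb(drive_sizes_gb):
    """Nominal usable capacity of a RAID5 array, in GB (hypothetical helper)."""
    if len(drive_sizes_gb) < 3:
        raise ValueError("RAID5 needs at least three drives")
    # Every member contributes only as much as the smallest drive,
    # and one drive's worth of space is consumed by parity.
    return (len(drive_sizes_gb) - 1) * min(drive_sizes_gb)


# The four 400GB drives listed in the specs above.
drives_gb = [400, 400, 400, 400]
print(raid5_usable_gb(drives_gb))  # 1200
```

With only the first three drives the same calculation gives 800GB, so the fourth drive is what makes 1.2TB possible.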