It is almost zen-like, a "There is no spoon" sort of thing. Read on if you are interested in details of a technical nature.
This has been a very busy week at work. We had some unintended and unexplained outages late last week, and this week has been filled with trying to explain at least the network side of some of the outages and interruptions. The short version is that when we went looking for an explanation for the outages, we saw network traffic that did not make sense. So I spent the week trying to make sense of what we found.
What we saw was network switches not keeping the hardware addresses of the devices directly connected to them. The switches should have that information, in case you are curious. The behavior did not change when we changed switches, IP addresses, or hardware addresses. My boss was starting to get very concerned, as the only things left to change were the server and the primary router on campus. Rebooting the router affects every network connection on the campus and as such is pretty much the very last thing you want to do. And the time when we can reboot the router without affecting students is quickly going away, as students return this weekend.
So I have been working with the company that makes the switches, sending them configuration files, collecting samples of all the network traffic going in and out of the ports in question, and all of the other fun and games of troubleshooting. And after a week, and escalating within their service, I managed to figure out what was happening. I wish it was one of those "ah ha" moments, but it was more that one explanation kept coming up, and no matter what I did I could not disprove it. And last night I finally found a way to prove this possible explanation was in fact the correct explanation. Know what it was? Everything was working correctly on both the switches and the server. We were just seeing some very unusual, but not technically wrong, behavior based on how the server was configured.
No one believes me when I say it's not the network.
Mostly this is a reminder that I really need to get some traffic analysis going for our network. At least I got the syslog problem fixed and we are finally sending our switch logs to a syslog server. Small steps in the right direction after a week of running all over the place.
And half of our group has been sick all week, mostly with variations on the flu/cold type of thing. My boss let me know he had gone through three-quarters of a box of tissues in one day. I have had a very sore throat that is not strep and not a cold or flu since last week. I am living off throat spray and cough drops. *blah* It is no fun to be in pain but otherwise functional.