In the wee hours of Thursday, there came a bump in the night at our server host. The details of the outage are explained below.
In short, Cox Communications, an upstream internet provider for our hosting company, did something stupid by allocating a block of IP addresses to themselves. This effectively redirected traffic for those addresses into Cox's network, preventing it from reaching the hosting company.
=====
At approximately 3:20 AM EDT, CARI.net internal monitoring began reporting problems with DNS resolution. The problem was immediately escalated to our on-call senior network admins. Due to the nature of the problem, it could not be resolved remotely; onsite access was required. Once onsite, we established that two of our upstream bandwidth providers (Level3 and COX) were not passing traffic, although the connections themselves were functional. Both providers were contacted, and tickets were opened with Tier 1 support. Working with Level3, we were able to jointly identify that the problem was originating from the COX network.
COX was apparently routing 3 of CARI.net's 5 IP allocations incorrectly, causing traffic to be dropped within the COX network.
At 5:30 AM, COX's on-call Hi-Cap engineer contacted us. Because this was a routing problem, he had to transfer the issue to the routing group. At 6:06 AM, the on-call COX routing engineer contacted us, confirmed what we already knew, and stated that he would work on the problem and call us back. At 6:40 AM, CARI.net internal monitoring indicated that DNS was functioning again and some traffic was once again flowing to Level3. At 7:05 AM, COX called back to report that the problem was fixed.
COX will be preparing a full report of the incident. We will not be using the COX service until we receive this report. Throughout the outage, all of CARI.net's services continued to function normally internally.
=====