Should you ever encounter a messed-up domain controller in your environment where the usual fixes do not help, and you don’t have a virtualization platform to simply deploy a new one, this guide is for you.
It shows you how to restore the server to full working order in the shortest possible time and with minimal downtime for users, even if it is a remote machine and an on-site visit is not an option.
Note: This guide applies to forests with multiple domain controllers only.
First of all, I’ll list some common Event Viewer event IDs that indicate a domain controller is out of order and that you may observe in similar scenarios.
Event ID 4
The Kerberos client received a KRB_AP_ERR_MODIFIED error from the server #####. The target name used was ldap/#####/#####.local@#####.LOCAL. This indicates that the target server failed to decrypt the ticket provided by the client. This can occur when the target server principal name (SPN) is registered on an account other than the account the target service is using. Please ensure that the target SPN is registered on, and only registered on, the account used by the server. This error can also happen when the target service is using a different password for the target service account than what the Kerberos Key Distribution Center (KDC) has for the target service account. Please ensure that the service on the server and the KDC are both updated to use the current password. If the server name is not fully qualified, and the target domain (#####.LOCAL) is different from the client domain (#####.LOCAL), check if there are identically named server accounts in these two domains, or use the fully-qualified name to identify the server.
Event ID 1311
The Knowledge Consistency Checker (KCC) has detected problems with the following directory partition.
There is insufficient site connectivity information for the KCC to create a spanning tree replication topology. Or, one or more directory servers with this directory partition are unable to replicate the directory partition information. This is probably due to inaccessible directory servers.
Perform one of the following actions:
– Publish sufficient site connectivity information so that the KCC can determine a route by which this directory partition can reach this site. This is the preferred option.
– Add a Connection object to a directory service that contains the directory partition in this site from a directory service that contains the same directory partition in another site.
If neither of the tasks correct this condition, see previous events logged by the KCC that identify the inaccessible directory servers.
Event ID 4000
The DNS server was unable to open Active Directory.
Event ID 5719
This computer was not able to set up a secure session with a domain controller in domain ##### due to the following:
There are currently no logon servers available to service the logon request.
This may lead to authentication problems. Make sure that this computer is connected to the network. If the problem persists, please contact your domain administrator.
If this computer is a domain controller for the specified domain, it sets up the secure session to the primary domain controller emulator in the specified domain. Otherwise, this computer sets up the secure session to any domain controller in the specified domain.
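If you want to check quickly whether a server is logging these events, something like the following can help. It is only a sketch: the log names are my assumption of where each event usually lands (System for the Kerberos and NETLOGON events, Directory Service for the KCC, DNS Server for event 4000), so adjust them if your events show up elsewhere. Event ID 4 is used by several sources in the System log, so check the ProviderName column in the output.

```powershell
# Look for the event IDs listed above over the last 24 hours.
# The log names below are assumptions; adjust them for your environment.
$since = (Get-Date).AddDays(-1)

$queries = @(
    @{ LogName = 'System';            Id = 4, 5719 }   # Kerberos client, NETLOGON
    @{ LogName = 'Directory Service'; Id = 1311 }      # KCC
    @{ LogName = 'DNS Server';        Id = 4000 }      # DNS server
)

foreach ($q in $queries) {
    # Merge the time window into each filter.
    $filter = $q + @{ StartTime = $since }

    # Get-WinEvent raises an error when nothing matches, so silence that case.
    Get-WinEvent -FilterHashtable $filter -ErrorAction SilentlyContinue |
        Select-Object TimeCreated, Id, ProviderName, LogName
}
```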
To resolve this issue, instead of wasting a crazy amount of time digging through the AD database and inspecting every DS connection, the replication status and the overall health of the server, try the following.
First, create a VPN tunnel to a working domain controller in your forest if you don’t already have one. More on optimal VPN tunnels and configs to come in future blog posts.
After that, make sure to update the DNS settings on your client computers so they use the live, healthy DNS servers in your environment, in case they were using the corrupted DC as their DNS server.
You can use this simple cmdlet to do that across machines. Just make sure to use the correct interface index and DNS server addresses.
Set-DnsClientServerAddress -InterfaceIndex 12 -ServerAddresses ("10.0.0.1","10.0.0.2")
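The interface index differs from machine to machine, so when pushing the change to several clients it can be easier to look the index up on each one. Here is a rough sketch of that idea. The computer names are hypothetical, the DNS addresses are the ones from the example above, and it assumes PowerShell remoting (WinRM) is enabled on the clients.

```powershell
# Hypothetical list of client machines; replace with your own.
$computers  = 'PC-01', 'PC-02'
# Healthy DNS servers, taken from the example above.
$dnsServers = '10.0.0.1', '10.0.0.2'

Invoke-Command -ComputerName $computers -ScriptBlock {
    # Copy the caller's variable into the remote session.
    $dns = $using:dnsServers

    # Apply the DNS servers to every connected physical adapter
    # instead of hard-coding an interface index.
    Get-NetAdapter -Physical | Where-Object Status -eq 'Up' | ForEach-Object {
        Set-DnsClientServerAddress -InterfaceIndex $_.ifIndex -ServerAddresses $dns
    }

    # Show the result so the change can be confirmed.
    Get-DnsClientServerAddress -AddressFamily IPv4 |
        Select-Object InterfaceAlias, ServerAddresses
}
```

If your clients get their DNS servers from DHCP, changing the DHCP scope option is probably the cleaner route.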
The next step is to forcefully decommission the corrupted DC. On Server 2012 and later you can do it through Server Manager. You need to start removing the AD DS role to get the option to demote the server.
For Server 2008 R2 or earlier, run the command “dcpromo /forceremoval”.
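On Server 2012 and later the forced demotion can also be done from PowerShell with the ADDSDeployment module instead of Server Manager. A minimal sketch, assuming you run it in an elevated session on the affected DC:

```powershell
# Run on the affected DC (Server 2012 or later) in an elevated session.
Import-Module ADDSDeployment

# Password for the local Administrator account the server gets after demotion.
$localAdminPassword = Read-Host -AsSecureString -Prompt 'New local Administrator password'

# -ForceRemoval demotes the server without contacting the rest of the domain.
# If the server still holds operations master roles, adding
# -DemoteOperationMasterRole may also be needed.
Uninstall-ADDSDomainController `
    -ForceRemoval `
    -LocalAdministratorPassword $localAdminPassword `
    -Force
```

By default the cmdlet restarts the server when it finishes.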
Once the demotion completes, reboot the server and rejoin it to the domain, followed by another reboot.
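The rejoin can be done from PowerShell as well. In the sketch below the domain name example.local is a placeholder of mine, not something from your environment. Keep in mind that a forced removal leaves the old DC’s metadata behind in the directory, so if the join complains about the existing account, that leftover may need to be cleaned up from a healthy DC first.

```powershell
# Run on the demoted server. 'example.local' is a placeholder domain name.
# Prompts for domain credentials, joins the domain, then restarts the server.
Add-Computer -DomainName 'example.local' -Credential (Get-Credential) -Restart
```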
On your PDC emulator, delete the AD DS site that contained the affected DC. Create a new site with a slightly different name.
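If you prefer the ActiveDirectory module over Active Directory Sites and Services, the site swap could look roughly like this. Everything named here is an assumption for illustration: the site names, the subnet, and the site link DEFAULTIPSITELINK (the default link name, which may differ in your forest). I create the new site first so the subnet can be moved over before the old site goes away.

```powershell
Import-Module ActiveDirectory

# Hypothetical names; replace with your own.
$oldSite  = 'Branch-Office'
$newSite  = 'Branch-Office-2'
$subnet   = '10.0.1.0/24'
$siteLink = 'DEFAULTIPSITELINK'

# Confirm the old site exists before touching anything.
Get-ADReplicationSite -Identity $oldSite

# Create the replacement site and add it to a site link,
# otherwise the KCC has no route to it.
New-ADReplicationSite -Name $newSite
Set-ADReplicationSiteLink -Identity $siteLink -SitesIncluded @{ Add = $newSite }

# Point the existing subnet at the new site.
Set-ADReplicationSubnet -Identity $subnet -Site $newSite

# Remove the old site (prompts for confirmation).
Remove-ADReplicationSite -Identity $oldSite
```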
Promote the affected DC again, using Server Manager or the dcpromo command. Tick advanced mode if you use dcpromo.
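On Server 2012 and later the promotion can be scripted too. This is a sketch under assumptions: example.local and Branch-Office-2 are placeholder names for your domain and the newly created site, and it installs DNS on the DC, which you may not want.

```powershell
# Run on the rejoined server in an elevated session.
# Make sure the AD DS role is present (it may have been removed during demotion).
Install-WindowsFeature AD-Domain-Services -IncludeManagementTools

Import-Module ADDSDeployment

# Placeholder domain and site names; replace with your own.
Install-ADDSDomainController `
    -DomainName 'example.local' `
    -SiteName 'Branch-Office-2' `
    -InstallDns `
    -Credential (Get-Credential) `
    -SafeModeAdministratorPassword (Read-Host -AsSecureString -Prompt 'DSRM password')
```

The server restarts on its own once the promotion finishes.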
Within about 20 minutes you should have a healthy domain controller again.
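To confirm the DC really is healthy, a few standard checks can be run from any domain controller. This is just the set I would start with, not an exhaustive health check.

```powershell
# Summary of replication health across all DCs.
repadmin /replsummary

# Inbound replication status for the DC you run this on.
repadmin /showrepl

# Run the standard DC diagnostics and print only errors.
dcdiag /q

# List the DCs the domain knows about, with their sites.
Import-Module ActiveDirectory
Get-ADDomainController -Filter * |
    Select-Object HostName, Site, IsGlobalCatalog
```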