First day back at work I already had the chance to get my hands dirty with an ADFS issue at a customer. The customer had an INTERNAL.contoso.com domain and an EXTERNAL.contoso.com domain. Both were connected with a two-way forest trust. The INTERNAL domain also had an ADFS farm. Now they wanted both users from INTERNAL and EXTERNAL to be authenticated by that ADFS. Technically this is possible through the AD trust. Nothing special there, the catch was that they wanted both INTERNAL and EXTERAL users to authenticate using @contoso.com usernames. Active Directory has no problems authenticating users with an UPN different with that from the domain. You can even share the UPN suffix namespace in more than one domain, but… you cannot route shared suffixes cross the forest trust! In our case that would mean the ADFS instance would be able to authenticate firstname.lastname@example.org but not email@example.com as there would be no way to locate that user in the other domain.
Alternate Login ID to the rescue! Alternate Login ID is a feature on ADFS that allows you to specify an additional attribute to be used for user lookups. Most commonly “mail” is used for this. This allows people to leave the UPN, commonly a non public domain (e.g. contoso.local), untouched. Although I’m mostly advising to change the UPN to something public (e.g. contoso.com). The cool thing about Alternate Login ID is that you can specify one or more LookupForests! In our case the command looked like:
Some more information about Alternate Login ID: TechNet: Configuring Alternate Login ID
Remark: When alternate login ID feature is enabled, AD FS will try to authenticate the end user with alternate login ID first and then fall back to use UPN if it cannot find an account that can be identified by the alternate login ID. You should make sure there are no clashes between the alternate login ID and the UPN if you want to still support the UPN login. For example, setting one’s mail attribute with the other’s UPN will block the other user from signing in with his UPN.
Now where’s the issue? We could authenticate INTERNAL users just fine, but EXTERNAL users were getting an error:
The Federation Service failed to issue a token as a result of an error during processing of the WS-Trust request.
Activity ID: 00000000-0000-0000-5e95-0080000000f1
Request type: http://schemas.microsoft.com/idfx/requesttype/issue
System.Security.Principal.IdentityNotMappedException: Some or all identity references could not be translated.
at System.Security.Principal.SecurityIdentifier.Translate(IdentityReferenceCollection sourceSids, Type targetType, Boolean forceSuccess)
at System.Security.Principal.SecurityIdentifier.Translate(Type targetType)
at Microsoft.IdentityServer.Service.Tokens.MSISWindowsUserNameSecurityTokenHandler.AddClaimsInWindowsIdentity(UserNameSecurityToken usernameToken, WindowsClaimsIdentity windowsIdentity, DateTime PasswordMustChange)
at Microsoft.IdentityServer.Service.Tokens.MSISWindowsUserNameSecurityTokenHandler.ValidateTokenInternal(SecurityToken token)
at Microsoft.IdentityServer.Service.Tokens.MSISWindowsUserNameSecurityTokenHandler.ValidateToken(SecurityToken token)
at Microsoft.IdentityModel.Tokens.SecurityTokenHandlerCollection.ValidateToken(SecurityToken token)
at Microsoft.IdentityServer.Web.WSTrust.SecurityTokenServiceManager.GetEffectivePrincipal(SecurityTokenElement securityTokenElement, SecurityTokenHandlerCollection securityTokenHandlerCollection)
at Microsoft.IdentityServer.Web.WSTrust.SecurityTokenServiceManager.Issue(RequestSecurityToken request, IList`1& identityClaimSet)
Now the weird part: just before the error I was seeing a successful login for that particular user:
I decided to start my search with this part: System.Security.Principal.IdentityNotMappedException: Some or all identity references could not be translated. That led me to all kind of blogs/posts where people were having issue with typo’s in scripts or with users that didn’t exist in AD. But that wasn’t the case with me, after all, I just had a successful authentication! Using the first line of the stack trace: at System.Security.Principal.SecurityIdentifier.Translate(IdentityReferenceCollection sourceSids, Type targetType, Boolean forceSuccess) I took an educated guess of what the ADFS service was trying to do. And I was able to do the same using PowerShell
And yes I got the same error!:
At first sight this gave me nothing. But this was actually quite powerful: I was now able to reproduce the issue as many times as I liked, no need to go through the logon pages and most importantly: I could now take this PowerShell code and execute it on other servers! This way I could determine whether it was OS related, AD related, trust related,… I found out the following:
- Command fails on ADFS-SRV-01
- Command fails on ADFS-SRV-02
- Command fails on WEB-SRV-01
- Command runs on HyperV-SRV-01
- Command runs on DC-INTERNAL-01
Now what did this learned me:
- The command is fine and should work
- The command runs fine on other 2012 R2 servers
- The command runs fine on a member server (the Hyper-V server)
As I was getting nowhere with this I decided to take a Network Trace on the ADFS server while executing the PowerShell command. I expected to see one of the typical SID translation methods (TechNet: How SIDs and Account Names Can Be Mapped in Windows) to appear. However absolutely nothing appeared?! No outgoing traffic related to this code. Now wtf? I had found this article: ASKDS: Troubleshooting SID translation failures from the obvious to the not so obvious but that wouldn’t help me if there was no traffic to begin with.
Suddenly an idea popped up in my head. What if the network traffic wasn’t showing any SID resolving because the machine looked locally? And why would the machine look locally? Perhaps if the domain portion of the machine SID is the same as that of the user we were looking up? But they’re in different domains… However, there’s also the machine’s local SID! The one that is typically never encountered or seen! Here’s some info on it: Mark Russinovich: The Machine SID Duplication Myth (and Why Sysprep Matters)
I didn’t took the time to find out whether I could retrieve it’s value with PowerShell or so, but I just took PsGetsid.exe from SysInternals. This is what the command showed me for the ADFS server:
Bazinga! It seemed the local SID of all the machines that were failing the command were the same as the domain portion of the EXTERNAL domain SIDs! Now I asked to customer if he could deploy a new test server so I could reproduce the issue one more time. Indeed the issue appeared again. The local SID was again identical. Running sysprep on the server changed the local SID and after joining the server again to the domain we were able to succesfully execute the PowerShell commands!
The customer had been copying the same VHD over and over again without actually running sysprep on it… As the EXTERNAL domain was also created on a VM from that image the Domain Controller promotion process choose that local SID as base for the EXTERNAL domain SID. My customer choose to resolve this issue by destroying the EXTERNAL domain and setting it up again. Obviously this does not solve the fact that several servers were not sysprepped, and in the future this might cause other issues…
For a template you can run sysprep with generalize and the shutdown option:
Each time you boot a copy of your template it will run the sysprep process at first boot.
P.S. Don’t run sysprep on a machine with software/services installed. It might have a nasty outcome…