The purpose of this tutorial is to configure Apache NiFI to use Kerberos authentication against a Microsoft SQL Server, query the database, convert the output to JSON, and output that data in syslog format. NiFi is capable of doing all of this with minimal configuration.
Overview
I'd like to start by apologizing for any of the SQL, Active Directory, or Kerberos terminology that I'll eventually get incorrect. My knowledge of those technologies is minimal at best, e.g. part of the setup was learning how to log into MS-SQL, create a dummy database, and then get authentication working with Kerberos. This process should be much faster if you've got a team that already supports that.
There is a troubleshooting section at the end that walks through all the troubleshooting steps that were used, along with links to the corresponding sites.
The overall purpose of this tutorial is to get Apache NiFi pulling from an MS-SQL database using Kerberos authentication, converting the output to JSON format, and streaming that to a remote log collection platform.
Active Directory and SQL Server Setup
Disclaimer
Please note that most of this is for testing purposes. If you already have a "real" SQL Server connected to Active Directory and using Kerberos authentication, then you can likely skip most of this section. I'm using the steps below since I needed to create a Windows lab for this guide.
Configuring the Windows Environment
On the Microsoft side, I deployed Windows Server 2019 for Active Directory services and another Windows Server 2019 for SQL Server 2019. Both are on the domain YOURDOMAIN.CORP
for example purposes. The setup looks like this:
The SQL-00
and NIFI-DEV
servers should be pointed to AD-00
for DNS resolution. You can check the Kerberos settings with a simple nslookup
query:
The Active Directory User Account
I then created a service account callednifilogin
on the YOURDOMAIN.CORP
domain. This is the account that will be used by NiFi to authenticate via Kerberos into SQL Server 2019. For simplicity, I would remove the requirement that the user must change the password at login. Obviously not the best security practice, but there is no easy way to change the password from NiFi. Otherwise, you'll need to log into a workstation with the account and log back out after you've changed the password.
Microsoft SQL Server 2019 Setup
Please note that this was only necessary since I didn't have any sample data to work with. If you've already got SQL Server up and running with data, then there is no reason to load the sample database below. You can skip down to the Configuring Apache NiFi section.
I ended up installing Microsoft SQL Server Management Studio and downloading the AdventureWorks database for testing purposes. I opted for AdventureWorks2017.bak
and saved the file in the directory C:\Restore
. Simply open SSMS, log in with default credentials, right-click on Databases
, and select Restore Database
. Click on Device
, add the C:\Restore
directory, and then add the database backup.
The next step is to create a user account that has access to the database. We'll be using "Windows authentication" for the user authentication type. In SSMS, go to Security -> Logins
, right click, and select New Login
. You'll then use the account you'd created about for the Login name
, e.g. "nifilogin".
As mentioned above, I have zero expertise in Microsoft SQL, so I'm not 100% sure if the following step is necessary. I went to User Mapping
, clicked on AdventureWorks2017
, and mapped it to the user YOURDOMAIN\nifilogin
. Again, if you already have expertise on staff, they should be able to ensure the account has read access to the database you'll be querying.
Configuring Apache NiFi
Background
NiFi can be configured using my previous instructions. Keep in mind it's a Java application and can be deployed on various platforms, e.g. Microsoft Window. In my case, I'm using Ubuntu 20.04 and OpenJDK 11 for the NiFi environment. The NiFi server does not need to be part of the Active Directory environment.
Kerberos Basics
You'll first want to ensure the NiFi server is using the same DNS server as your Active Directory domain. As mentioned above, the NiFi server does not need to join the domain: it just needs to be using the same DNS server. In my lab scenario, DNS is running on the Active Directory server, so I've got everything pointed to AD-00
as the DNS resolver. This allows the Ubuntu server running NiFi to resolve everything associated with the YOURDOMAIN.CORP
domain.
Adding the Microsoft SQL Drivers to NiFi
The first step is to add the Microsoft SQL drivers to NiFi. There are multiple versions of the JAR file depending on the Java version that you are using, e.g. if you're using Java 11, then use the Java 11 version of the driver. The drivers can be downloaded from here:
You can also download the driver directly to your NiFi server using wget
or curl
. I'm using the preview version of the driver, but you'll likely want the stable version if you're running this in a production environment. Pay attention to the name of the file you're downloading since you'll be using it later.
Configuring Kerberos Settings on NiFi
The first step is to create the Kerberos configuration directory:
The next step is to create the configuration files. This is where having an actual Active Directory administrator is useful: they should be able to provide you with the correct host names where needed. You'll want to swap out the YOURDOMAIN.CORP
references with your actual domain.
Please note that debug=true
can be removed from the configuration if you're not doing any troubleshooting. We'll then need to create another configuration file with the Kerberos settings. I also learned the hard way that Kerberos likes everything to be in upper-case, so you'll likely want to keep it that way unless your Active Directory administrator says otherwise.
The next step is to get these files loaded into the NiFi configuration. Please note the numeric value in java.arg
, e.g. java.arg.20
. These should always be sequential. The numbers below should work with the default configuration unless you've already changed the related values in the configuration.
The java.arg.20=-Dsun.security.krb5.debug=true
line is obviously there to assist with debugging. You can remove it if necessary. Next we'll restart NiFi and tail the log file to ensure there are not any errors:
Create the NiFi Flow
We'll start by logging into the NiFi interface. This is listening on port 8080
by default, e.g. http://nifi-dev.otherdomain.internal:8080
.
The first component we'll add is the DBCPConnectionPool Controller Service
. This provides the NiFi flow with the details around the MS-SQL connection, including the authentication components. Click anywhere on the canvas so nothing is selected. You'll then click the cog icon inthe lower-left portion of the canvas, go to Controller Services
, click on the +
symbol to add a new service, select the DBCPConnectionPool
service, and ADD
.
This will then add the controller service named DBCPConnectionPool
by default. You'll then click on the cog icon to configure it.
I'll list out the settings for DBCPConnectionPool
below since it's a lot of copying and pasting. We'll also be renaming DBCPConnectionPool
to MS-SQL
in the settings too. Keep in mind that you'll want to swap out the host name, database name, driver location, and credentials with your own values. The Database Connection URL
should be changed to reflect your SQL server and the database name you'll be querying.
In addition, the Database Driver Location
option will include the MS-SQL JAR filename that you downloaded in the previous steps. Be sure to change the Database User
and Password
too. Click Apply
when done followed by the "Enable" lightning-bolt icon. Another window will pop up. You can leave this Service only
and click Enable
.
The next step and first processor we'll add is the QueryDatabaseTable
processor:
Once the processor is on the canvas, double-click on it to open the settings. You'll want to adapt the following values to your environment.
The next processor we'll add is the ConvertAvroToJSON
processor. This tells NiFi to take the results from the database query and convert them into JSON format. This works great with a SQL query since you're dealing with name:value pairs. There isn't muich configuration to this processor other than checking the box Failure
under Automatically Terminate Relationships
.
The next processor we'll add is SplitJson
. By default, the query output will be one giant blob of data. We'll want to split these into individual events that will be converted over to syslog. Drag a SplitJson
processor onto the canvas. You'll only need to change a few values here. Under Settings
, check Failure
and Original
under Automatically Terminate Relationships
. Then, under Properties
, add $.*
for the JsonPath Expression
. This tells it to split every line into a new event.
The last processor we'll add is the PutUDP
processor. This takes those individual events from above and sends them out in native syslog format. Drag the processor onto the canvas and put in the following settings: under Automatically Terminate Relationships
, check the Failure
and Success
options. This processor is the "last stop" for the flow, so we'll be terminating everything here. Next, you'll want to go into Properties
and plug in the Hostname
and Port
you'll be sending to, e.g. rsyslog.yourdomain.corp
and port 514
.
You should now have 4 processors on your canvas. Next week connect them up. We've already specified the terminations, so there shouldn't be anything beyond connecting them up:
You'll end up with this:
The last step is to click on a blank spot in the canvas so nothing is selected, right click on the screen, and select Start
. If all goes well, NiFi should start pulling the events from the database, converting them to JSON, and streaming them out.
Example Output in Elastic
This took quite a bit of trail and error, but I was able to get it working in three different lab environments using the instructions above.
Good luck and happy log collecting!
Troubleshooting Steps
As mentioned above, this took a lot of troubleshooting before I was able to get everything running as expected. I've listed out as much as possible, including all of the sites that were used in the setup and troubleshooting process. I'd really like to thank the individual that wrote this guide which got me started:
SQL Server
Start with the usual question: is port 1433/TCP open on the SQL server? I was burned by this a few times, mainly because (1) the Windows firewall was enabled and (2) SQL Server does not appear to listen on port 1433/TCP by default. You can start with nmap to determine if the port is open:
If the host is listed as down or the port as filtered
, then you'll want to get that resolved first so NiFi can connect. I first configured SQL Server to listen on 1433/TCP
using the following steps:
Go to Start
-> Microsoft SQL Server 2019
-> SQL Server 2019 Configuration Management
. Click on SQL Server Network Configuration
-> Protocols for MSSQLSERVER
. On the right, ensure that TCP/IP
is set to Enabled
. You'll also need to restart the service if it wasn't enabled.
...but I was still getting filtered
as the response. My next guess was the Windows firewall blocking the connection. Sure enough, I disabled the firewall and was able to connect to the port. This is a horrible security practice though, so you'll want to open the port to just the NiFi host if it isn't already. You should see something like this if everything is working as expected at the network layer:
Kerberos with Service Principal Name
I'm not going to pretend that I understand how SPNs are supposed to function. Fortunately SQL Server seems to enable one automatically after it's configured:
Notice the MSSQLSvc/SQL-00.YOURDOMAIN.CORP
line: that means SQL is registered. If you're unfortunate enough that one doesn't exist, here are some extended reading material and troubleshooting commands on the topic that were used:
- https://petri.com/how-to-use-setspn-to-set-active-directory-service-principal-names-2
- https://docs.microsoft.com/en-us/sql/connect/jdbc/using-kerberos-integrated-authentication-to-connect-to-sql-server?view=sql-server-ver15
- https://www.sqlshack.com/overview-of-service-principal-name-and-kerberos-authentication-sql-server/
- https://www.red-gate.com/simple-talk/sql/database-administration/questions-about-kerberos-and-sql-server-that-you-were-too-shy-to-ask/
- https://docs.microsoft.com/en-us/sql/database-engine/configure-windows/register-a-service-principal-name-for-kerberos-connections?view=sql-server-ver15
Verifying That Kerberos Is Being Used in SQL Server
Here are instructions that can be used to determine if any of the accounts are using Kerberos instead of NTLM, along with the associated query:
You're good to go if it says KERBEROS
as the output for any of the existing session. Here's another query that provides more details than above:
You'll probably see a mix of NTLM and Kerberos in the output. I didn't have a session listing KERBEROS
until I got the authentication from NiFi working.
Troubleshooting NiFi
Google is your best friend here. Kerberos spits out a lot of cryptic errors. I tracked most of them down along with recommended solutions by just searching for them. However, to get the Java Kerberos errors, you'll first need to enable the debugging options that were included in the instructions. Specifically, the debug=true
and java.arg.20=-Dsun.security.krb5.debug=true
options that were included in the instructions above.
Once those are enabled, you can view the logs in two different places:
Here is some sample output from the nifi-bootstrap.log
file:
Cryptic, but I realized that I had references to the wrong domain in there.
Here is output from the nifi-app.log
file:
In this case it was easy to determine that port 1433/TCP was not responding.