The purpose of this tutorial is to configure Apache NiFI to use Kerberos authentication against a Microsoft SQL Server, query the database, convert the output to JSON, and output that data in syslog format. NiFi is capable of doing all of this with minimal configuration.
I'd like to start by apologizing for any of the SQL, Active Directory, or Kerberos terminology that I'll eventually get incorrect. My knowledge of those technologies is minimal at best, e.g. part of the setup was learning how to log into MS-SQL, create a dummy database, and then get authentication working with Kerberos. This process should be much faster if you've got a team that already supports that.
There is a troubleshooting section at the end that walks through all the troubleshooting steps that were used, along with links to the corresponding sites.
The overall purpose of this tutorial is to get Apache NiFi pulling from an MS-SQL database using Kerberos authentication, converting the output to JSON format, and streaming that to a remote log collection platform.
Active Directory and SQL Server Setup
Please note that most of this is for testing purposes. If you already have a "real" SQL Server connected to Active Directory and using Kerberos authentication, then you can likely skip most of this section. I'm using the steps below since I needed to create a Windows lab for this guide.
Configuring the Windows Environment
On the Microsoft side, I deployed Windows Server 2019 for Active Directory services and another Windows Server 2019 for SQL Server 2019. Both are on the domain
YOURDOMAIN.CORP for example purposes. The setup looks like this:
NIFI-DEV servers should be pointed to
AD-00 for DNS resolution. You can check the Kerberos settings with a simple
The Active Directory User Account
I then created a service account called
nifilogin on the
YOURDOMAIN.CORP domain. This is the account that will be used by NiFi to authenticate via Kerberos into SQL Server 2019. For simplicity, I would remove the requirement that the user must change the password at login. Obviously not the best security practice, but there is no easy way to change the password from NiFi. Otherwise, you'll need to log into a workstation with the account and log back out after you've changed the password.
Microsoft SQL Server 2019 Setup
Please note that this was only necessary since I didn't have any sample data to work with. If you've already got SQL Server up and running with data, then there is no reason to load the sample database below. You can skip down to the Configuring Apache NiFi section.
I ended up installing Microsoft SQL Server Management Studio and downloading the AdventureWorks database for testing purposes. I opted for
AdventureWorks2017.bak and saved the file in the directory
C:\Restore. Simply open SSMS, log in with default credentials, right-click on
Databases, and select
Restore Database. Click on
Device, add the
C:\Restore directory, and then add the database backup.
The next step is to create a user account that has access to the database. We'll be using "Windows authentication" for the user authentication type. In SSMS, go to
Security -> Logins, right click, and select
New Login. You'll then use the account you'd created about for the
Login name, e.g. "nifilogin".
As mentioned above, I have zero expertise in Microsoft SQL, so I'm not 100% sure if the following step is necessary. I went to
User Mapping, clicked on
AdventureWorks2017, and mapped it to the user
YOURDOMAIN\nifilogin. Again, if you already have expertise on staff, they should be able to ensure the account has read access to the database you'll be querying.
Configuring Apache NiFi
NiFi can be configured using my previous instructions. Keep in mind it's a Java application and can be deployed on various platforms, e.g. Microsoft Window. In my case, I'm using Ubuntu 20.04 and OpenJDK 11 for the NiFi environment. The NiFi server does not need to be part of the Active Directory environment.
You'll first want to ensure the NiFi server is using the same DNS server as your Active Directory domain. As mentioned above, the NiFi server does not need to join the domain: it just needs to be using the same DNS server. In my lab scenario, DNS is running on the Active Directory server, so I've got everything pointed to
AD-00 as the DNS resolver. This allows the Ubuntu server running NiFi to resolve everything associated with the
Adding the Microsoft SQL Drivers to NiFi
The first step is to add the Microsoft SQL drivers to NiFi. There are multiple versions of the JAR file depending on the Java version that you are using, e.g. if you're using Java 11, then use the Java 11 version of the driver. The drivers can be downloaded from here:
You can also download the driver directly to your NiFi server using
curl. I'm using the preview version of the driver, but you'll likely want the stable version if you're running this in a production environment. Pay attention to the name of the file you're downloading since you'll be using it later.
Configuring Kerberos Settings on NiFi
The first step is to create the Kerberos configuration directory:
The next step is to create the configuration files. This is where having an actual Active Directory administrator is useful: they should be able to provide you with the correct host names where needed. You'll want to swap out the
YOURDOMAIN.CORP references with your actual domain.
Please note that
debug=true can be removed from the configuration if you're not doing any troubleshooting. We'll then need to create another configuration file with the Kerberos settings. I also learned the hard way that Kerberos likes everything to be in upper-case, so you'll likely want to keep it that way unless your Active Directory administrator says otherwise.
The next step is to get these files loaded into the NiFi configuration. Please note the numeric value in
java.arg.20. These should always be sequential. The numbers below should work with the default configuration unless you've already changed the related values in the configuration.
java.arg.20=-Dsun.security.krb5.debug=true line is obviously there to assist with debugging. You can remove it if necessary. Next we'll restart NiFi and tail the log file to ensure there are not any errors:
Create the NiFi Flow
We'll start by logging into the NiFi interface. This is listening on port
8080 by default, e.g.
The first component we'll add is the
DBCPConnectionPool Controller Service. This provides the NiFi flow with the details around the MS-SQL connection, including the authentication components. Click anywhere on the canvas so nothing is selected. You'll then click the cog icon inthe lower-left portion of the canvas, go to
Controller Services, click on the
+ symbol to add a new service, select the
DBCPConnectionPool service, and
This will then add the controller service named
DBCPConnectionPool by default. You'll then click on the cog icon to configure it.
I'll list out the settings for
DBCPConnectionPool below since it's a lot of copying and pasting. We'll also be renaming
MS-SQL in the settings too. Keep in mind that you'll want to swap out the host name, database name, driver location, and credentials with your own values. The
Database Connection URL should be changed to reflect your SQL server and the database name you'll be querying.
In addition, the
Database Driver Location option will include the MS-SQL JAR filename that you downloaded in the previous steps. Be sure to change the
Database User and
Password too. Click
Apply when done followed by the "Enable" lightning-bolt icon. Another window will pop up. You can leave this
Service only and click
The next step and first processor we'll add is the
Once the processor is on the canvas, double-click on it to open the settings. You'll want to adapt the following values to your environment.
The next processor we'll add is the
ConvertAvroToJSON processor. This tells NiFi to take the results from the database query and convert them into JSON format. This works great with a SQL query since you're dealing with name:value pairs. There isn't muich configuration to this processor other than checking the box
Automatically Terminate Relationships.
The next processor we'll add is
SplitJson. By default, the query output will be one giant blob of data. We'll want to split these into individual events that will be converted over to syslog. Drag a
SplitJson processor onto the canvas. You'll only need to change a few values here. Under
Automatically Terminate Relationships. Then, under
$.* for the
JsonPath Expression. This tells it to split every line into a new event.
The last processor we'll add is the
PutUDP processor. This takes those individual events from above and sends them out in native syslog format. Drag the processor onto the canvas and put in the following settings: under
Automatically Terminate Relationships, check the
Success options. This processor is the "last stop" for the flow, so we'll be terminating everything here. Next, you'll want to go into
Properties and plug in the
Port you'll be sending to, e.g.
rsyslog.yourdomain.corp and port
You should now have 4 processors on your canvas. Next week connect them up. We've already specified the terminations, so there shouldn't be anything beyond connecting them up:
You'll end up with this:
The last step is to click on a blank spot in the canvas so nothing is selected, right click on the screen, and select
Start. If all goes well, NiFi should start pulling the events from the database, converting them to JSON, and streaming them out.
Example Output in Elastic
This took quite a bit of trail and error, but I was able to get it working in three different lab environments using the instructions above.
Good luck and happy log collecting!
As mentioned above, this took a lot of troubleshooting before I was able to get everything running as expected. I've listed out as much as possible, including all of the sites that were used in the setup and troubleshooting process. I'd really like to thank the individual that wrote this guide which got me started:
Start with the usual question: is port 1433/TCP open on the SQL server? I was burned by this a few times, mainly because (1) the Windows firewall was enabled and (2) SQL Server does not appear to listen on port 1433/TCP by default. You can start with nmap to determine if the port is open:
If the host is listed as down or the port as
filtered, then you'll want to get that resolved first so NiFi can connect. I first configured SQL Server to listen on
1433/TCP using the following steps:
Microsoft SQL Server 2019 ->
SQL Server 2019 Configuration Management. Click on
SQL Server Network Configuration ->
Protocols for MSSQLSERVER. On the right, ensure that
TCP/IP is set to
Enabled. You'll also need to restart the service if it wasn't enabled.
...but I was still getting
filtered as the response. My next guess was the Windows firewall blocking the connection. Sure enough, I disabled the firewall and was able to connect to the port. This is a horrible security practice though, so you'll want to open the port to just the NiFi host if it isn't already. You should see something like this if everything is working as expected at the network layer:
Kerberos with Service Principal Name
I'm not going to pretend that I understand how SPNs are supposed to function. Fortunately SQL Server seems to enable one automatically after it's configured:
MSSQLSvc/SQL-00.YOURDOMAIN.CORP line: that means SQL is registered. If you're unfortunate enough that one doesn't exist, here are some extended reading material and troubleshooting commands on the topic that were used:
Verifying That Kerberos Is Being Used in SQL Server
Here are instructions that can be used to determine if any of the accounts are using Kerberos instead of NTLM, along with the associated query:
You're good to go if it says
KERBEROS as the output for any of the existing session. Here's another query that provides more details than above:
You'll probably see a mix of NTLM and Kerberos in the output. I didn't have a session listing
KERBEROS until I got the authentication from NiFi working.
Google is your best friend here. Kerberos spits out a lot of cryptic errors. I tracked most of them down along with recommended solutions by just searching for them. However, to get the Java Kerberos errors, you'll first need to enable the debugging options that were included in the instructions. Specifically, the
java.arg.20=-Dsun.security.krb5.debug=true options that were included in the instructions above.
Once those are enabled, you can view the logs in two different places:
Here is some sample output from the
Cryptic, but I realized that I had references to the wrong domain in there.
Here is output from the
In this case it was easy to determine that port 1433/TCP was not responding.