Log ingestion has traditionally been a problem in large environments. Between different log formats, configuration options, and numerous input and output methods, what seems like a simple endeavor can turn into a massive time sink. Behold Apache NiFi: it’s designed specifically to process and distribute data. Best of all, NiFi is fairly simple to get up and running.
I’m currently running NiFi on Ubuntu 19.10 Eoan Ermine. I installed NiFi using the official documentation, along with a secondary guide that was a bit clearer. Once it’s up and running, you should be able to connect through your web browser on port
My first use case was simple: leverage NiFi as a basic syslog load balancer. Send data to NiFi, do not modify the data, and then redistribute it to different destinations in a round-robin format. Let’s start with the first step of creating a UDP listener to receive syslog events. There is a syslog processor that will check RFC compliance, but we really don’t care about the log format and just want to pass it on to the destination nodes. In that case we’ll use the ListenUDP processor with a few simple settings:
Basically you just need to define the port to listen on, with 514 being the default for syslog.
Next we’ll want to create a Process Group. The process group is a group of objects that are combined to form a single logical object. We can then route data to the object and modify the load balancing method of a single connection to that group. Once you’ve created the process group, double-click on it so you can add new processors inside the group.
Once in the “Load Balancer” process group, we’ll first add the syslog destinations. We’ll use the PutUDP processor for this. Open up the settings for the processor and select Automatically Terminate Relationships. This tells the processor to flush the syslog event after it’s been routed. Next, under properties, put in the hostname and port. The hostname is the syslog destination and the port will likely be
Once this is done, copy and paste the processor until you have one for each syslog destination in the load balancer. In this example we’ll have 3 of them, so copy and paste the original until there are 3 in total. Then go into each one and plug in a new hostname. You’ll ultimately end up with 3 PutUDP processors, each with a different syslog destination.
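If you don’t have three spare syslog servers handy, a stand-in receiver on each test host is enough to see what the PutUDP processors deliver. The following is a minimal sketch using only Python’s standard `socket` module. The function names `open_receiver` and `collect` are my own, and the port in the usage note is an arbitrary unprivileged choice, not anything NiFi requires.

```python
import socket
from typing import List


def open_receiver(host: str, port: int) -> socket.socket:
    """Bind a UDP socket that stands in for a syslog destination."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def collect(sock: socket.socket, expected: int, timeout: float = 5.0) -> List[bytes]:
    """Read up to `expected` datagrams, giving up once `timeout` seconds pass with no data."""
    sock.settimeout(timeout)
    received: List[bytes] = []
    try:
        while len(received) < expected:
            data, _sender = sock.recvfrom(65535)  # 65535 covers any UDP payload
            received.append(data)
    except socket.timeout:
        pass  # nothing more arrived in time; return what we have
    return received
```

Something like `collect(open_receiver("0.0.0.0", 5514), 10)` on each test host would then show which events that destination was handed. Port 5514 is a hypothetical choice here, since binding to ports below 1024 normally needs root.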
Next we’ll create an Input Port. This acts as the interface to the process group. There’s no real configuration on the input port other than giving it a name. Once you’ve done that, connect it to the PutUDP processors that you previously created.
Finally, click on Apache NiFi in the bottom-left corner of your web browser. This will take you out of the process group and back onto the main canvas. From here we’ll connect the original ListenUDP processor to the “Load Balancer” process group that we created. The settings on the connection between the two will allow you to specify the load balancer settings.
In my example, I’ve increased the Back Pressure Object Threshold to 1,000,000 objects. This will allow 1M UDP packets to remain in the queue. I’ve also selected Round Robin as the Load Balance Strategy and FirstInFirstOut as the Prioritizer.
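To make those three settings a bit more concrete, here is a toy model in Python of what they describe conceptually: a queue with an object threshold, oldest-first ordering, and a hand-off that rotates through the destinations. This is not NiFi’s code or API. The class and method names are hypothetical and only meant to illustrate the idea.

```python
from collections import deque
from itertools import cycle
from typing import Deque, Dict, List


class BoundedFifoQueue:
    """Toy FIFO queue that signals back pressure once it holds `object_threshold` items."""

    def __init__(self, object_threshold: int) -> None:
        self.object_threshold = object_threshold
        self._items: Deque[bytes] = deque()

    def offer(self, item: bytes) -> bool:
        """Queue the item, or return False when the threshold has been reached."""
        if len(self._items) >= self.object_threshold:
            return False  # back pressure: the sender should hold off
        self._items.append(item)
        return True

    def drain_round_robin(self, destinations: List[str]) -> Dict[str, List[bytes]]:
        """Hand out queued items oldest-first, rotating through the destinations."""
        assigned: Dict[str, List[bytes]] = {dest: [] for dest in destinations}
        rotation = cycle(destinations)
        while self._items:
            assigned[next(rotation)].append(self._items.popleft())
        return assigned
```

With a threshold of 5 and three destinations, a sixth `offer` is refused, and the five queued events are split 2/2/1 in arrival order. A larger threshold simply means more events can wait in the queue before the sender is told to back off.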
The next step is to start everything. Simply right-click on any open space on the canvas and select Start. You should see everything turn from red to green. If everything is working correctly, the syslog events should be received by NiFi and then routed back out to the 3 syslog destinations in round-robin fashion. To test this, I ran tcpdump to look at what was being sent:
Success! We can see it is being load balanced as expected. Hopefully this is a good starting point for what can be done with Apache NiFi.
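If you want to reproduce the test without pointing real devices at NiFi, a short script can generate the traffic. The sketch below sends numbered, roughly BSD-style syslog lines over UDP using Python’s standard library. The function names, the `testhost` and `nifi-test` strings, and the default facility and severity are all placeholders I made up for illustration.

```python
import socket
import time


def build_syslog_line(seq: int, facility: int = 1, severity: int = 6) -> bytes:
    """Build a simple syslog-style line; the priority value is facility * 8 + severity."""
    pri = facility * 8 + severity
    timestamp = time.strftime("%b %d %H:%M:%S")
    return f"<{pri}>{timestamp} testhost nifi-test: message {seq}".encode("utf-8")


def send_test_messages(host: str, port: int, count: int = 10) -> int:
    """Send `count` numbered datagrams to host:port and return how many were sent."""
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for seq in range(count):
            sock.sendto(build_syslog_line(seq), (host, port))
            sent += 1
    return sent
```

Calling `send_test_messages` with the NiFi host and the port configured on the ListenUDP processor should produce numbered events, which makes it easy to see in tcpdump which destination each one landed on.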