Summary
Splunk has a fairly robust API. However, you'll occasionally get into situations where you need the data exported out of Splunk as syslog. This guide walks you through connecting to Splunk with Apache NiFi, pulling data in batches from Splunk via the API, and sending it out as syslog from NiFi. Similar to the Elasticsearch tutorial, the data is near real-time. In this case, we'll have NiFi querying Splunk for 1 minute of data from 5 minutes prior, with the query executing every minute.
Configuration
NiFi Setup
This guide assumes you already have NiFi up and running. If not, I'd previously written a guide on how to get it installed. You can also do a quick and easy install with Docker.
GetSplunk Processor
The first step is to configure the GetSplunk processor. Click on Processor in the top-left corner of the UI and drag it over to the canvas.
You'll then select GetSplunk from the list. Once the processor is on the canvas, double-click on it to access the settings. Keep in mind that we'll be running this every minute, asking for 1 minute of data from 5 minutes prior. For example, if it's 12:10, then we'd be asking for the data from 12:04 - 12:05. The next minute we'd be asking for 12:05 - 12:06, then 12:06 - 12:07, etc.
Under the Scheduling tab, put in 1 min for the Run Schedule.
Next, click on the Properties tab. You'll want to plug in the following values. Keep in mind that my query is fairly simple: it just asks for everything in the syslog index on Splunk. You'll want to modify this for your environment.
Scheme: https
Hostname: searchhead.splunk.company.internal
Port: 8089
Query: search index="syslog"
Time Field Strategy: Event Time
Time Range Strategy: Provided
Earliest Time: -6m@m
Latest Time: -5m@m
Time Zone: UTC
Application:
Owner:
Token:
Username: yoursplunkusername
Password: yoursplunkpassword
Security Protocol: TLSv1_2
Output Mode: RAW
Click Apply at the bottom once you're finished. By default, this will come out as one giant blob of text. We'll split this up with the next processor.
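To make the properties above a bit more concrete, here is a rough Python sketch of the kind of REST call they describe. It assumes the query goes to Splunk's search/jobs/export endpoint on the management port with basic authentication; that is my reading of how an export search is normally issued, not a statement about GetSplunk's internals. The build_export_request helper is a hypothetical name, and the request is only built, never sent.

```python
import base64
import urllib.parse
import urllib.request


def build_export_request(host, port, query, earliest, latest, username, password):
    """Build (but do not send) a POST to Splunk's search export endpoint.

    Assumption: the export endpoint lives at /services/search/jobs/export
    and accepts form-encoded search, earliest_time, latest_time, output_mode.
    """
    url = f"https://{host}:{port}/services/search/jobs/export"
    body = urllib.parse.urlencode({
        "search": query,
        "earliest_time": earliest,
        "latest_time": latest,
        "output_mode": "raw",
    }).encode("ascii")
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(url, data=body, method="POST")
    request.add_header("Authorization", f"Basic {token}")
    return request


# Values copied from the property list above; replace them for your environment.
request = build_export_request(
    host="searchhead.splunk.company.internal",
    port=8089,
    query='search index="syslog"',
    earliest="-6m@m",
    latest="-5m@m",
    username="yoursplunkusername",
    password="yoursplunkpassword",
)
print(request.get_method(), request.full_url)
print(urllib.parse.parse_qs(request.data.decode("ascii")))
```

Passing the request to urllib.request.urlopen would actually run the search. It's handy for checking credentials and the query from a shell before wiring up NiFi.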
For reference: if you wanted to pull an hour of logs every hour, e.g. batched logs, then you'd change the Earliest Time to -2h@h and the Latest Time to -1h@h. You'd then set the scheduling above so it runs every hour instead of every minute.
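If the relative time modifiers feel opaque, the arithmetic can be sketched in a few lines of Python. This assumes -6m@m means "go back six minutes, then snap down to the start of that minute" (and likewise for -2h@h with hours). The function names and the sample date are made up for illustration; only the 12:10 clock time comes from the example earlier.

```python
from datetime import datetime, timedelta, timezone


def snap_to_minute(moment):
    """Round a datetime down to the start of its minute (the @m part)."""
    return moment.replace(second=0, microsecond=0)


def snap_to_hour(moment):
    """Round a datetime down to the start of its hour (the @h part)."""
    return moment.replace(minute=0, second=0, microsecond=0)


def minute_window(now):
    """Window for Earliest Time -6m@m and Latest Time -5m@m."""
    earliest = snap_to_minute(now - timedelta(minutes=6))
    latest = snap_to_minute(now - timedelta(minutes=5))
    return earliest, latest


def hour_window(now):
    """Window for Earliest Time -2h@h and Latest Time -1h@h."""
    earliest = snap_to_hour(now - timedelta(hours=2))
    latest = snap_to_hour(now - timedelta(hours=1))
    return earliest, latest


# Arbitrary sample date; 12:10:30 UTC stands in for "it's 12:10".
now = datetime(2024, 1, 1, 12, 10, 30, tzinfo=timezone.utc)

for earliest, latest in (minute_window(now), hour_window(now)):
    print(f"{earliest:%H:%M} - {latest:%H:%M}")
# prints:
# 12:04 - 12:05
# 10:00 - 11:00
```

Because each run's Latest Time lines up with the next run's Earliest Time, consecutive windows sit back to back as long as the schedule matches the window size.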
SplitText Processor
Next we'll use the SplitText processor to chop up the previous blob of data into individual events. Drag a SplitText processor onto the canvas and double-click it to access the settings.
First, click on the Settings tab. Check failure and original under Automatically Terminate Relationships. Why? We're going to drop everything except for the split data. We don't need anything else.
Next, click on the Properties tab and enter the following:
Line Split Count: 1
The rest of the values can remain at the defaults. This tells NiFi to split the text so that each line becomes its own event. Click Apply when done. We'll now connect the GetSplunk processor to the SplitText processor. Draw a line between the two and click Add at the bottom of the pop-up. You'll end up with this:
Now that we have the individual events, we're going to send them to the PutUDP processor and redirect the events back out of NiFi.
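For intuition, the one-line-per-event split that SplitText performs with Line Split Count set to 1 can be sketched in Python as below. This is only an illustration of the idea, not NiFi's implementation. The split_events name and the sample events are hypothetical, and skipping blank lines is a choice made in this sketch rather than a claim about how SplitText treats them.

```python
def split_events(blob):
    """Return one event per non-empty line of the raw blob."""
    return [line for line in blob.splitlines() if line.strip()]


# Hypothetical raw output: several syslog-style events in a single blob.
blob = (
    "<13>Jan  1 12:04:01 host1 app: first event\n"
    "<13>Jan  1 12:04:02 host1 app: second event\n"
    "\n"
    "<13>Jan  1 12:04:03 host2 app: third event\n"
)

events = split_events(blob)
print(len(events))
for event in events:
    print(event)
```

Each element of the resulting list corresponds to one flowfile coming out of the split relationship.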
PutUDP Processor
Drag a PutUDP processor to the canvas. Double-click again to open up the processor settings. Under the Settings tab, check the boxes next to failure and success under Automatically Terminate Relationships. This is the last processor in the flow, so we'll be dropping everything after it hits this processor.
Next, click on the Properties tab. There isn't much to change here.
Hostname: syslogdestination.company.internal
Port: 514
Click Apply when done. Similar to above, you'll then connect the SplitText processor to the PutUDP processor. Select splits under the For Relationships section when it pops up.
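What PutUDP does with each event amounts to "one event, one datagram". A minimal Python sketch of that idea follows, assuming UTF-8 payloads. The send_events helper and the sample events are hypothetical. So that it runs anywhere without privileges, the demo sends to a throwaway listener on 127.0.0.1 instead of syslogdestination.company.internal on port 514.

```python
import socket


def send_events(events, host, port):
    """Send each event as its own UDP datagram; return how many were sent."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for event in events:
            sock.sendto(event.encode("utf-8"), (host, port))
    return len(events)


# Local stand-in for the syslog destination, bound to an ephemeral port.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2.0)
listen_port = receiver.getsockname()[1]

# Hypothetical events, as they might look after the split step.
sample_events = [
    "<13>Jan  1 12:04:01 host1 app: first event",
    "<13>Jan  1 12:04:02 host1 app: second event",
]

sent = send_events(sample_events, "127.0.0.1", listen_port)
received = [receiver.recvfrom(65535)[0].decode("utf-8") for _ in range(sent)]
receiver.close()

for line in received:
    print(line)
```

UDP gives no delivery guarantee, so on a real network some events can be lost silently. The same caveat applies to the NiFi flow itself.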
Enabling the Flow
Finally, we'll need to enable everything we just created. Right-click anywhere on the blank canvas and select Start. Assuming there aren't any mistakes, the flow should fire up, start pulling data from Splunk, and stream it back out over UDP.
Conclusion
The final product should look something like this:
If we watch the output with tcpdump, we should see something like this: