Splunk has a fairly robust API. However, you'll occasionally get into situations where you need the data exported out of Splunk as syslog. This guide walks you through connecting to Splunk with Apache NiFi, pulling data in batches from Splunk via the API, and sending it out as syslog from NiFi. Similar to the Elasticsearch tutorial, the data is near real-time. In this case, we'll have NiFi querying Splunk for 1 minute of data from 5 minutes prior, with the query executing every minute.
The first step is to configure the GetSplunk processor. Click on Processor in the top-left corner of the UI and drag it over to the canvas. You'll then select GetSplunk from the list. Once the processor is on the canvas, double-click on it to access the settings. Keep in mind that we'll be running this every minute, asking for 1 minute of data from 5 minutes prior. For example, if it's 12:10, then we'd be asking for the data from 12:04 - 12:05. The next minute we'd be asking for 12:05 - 12:06, then 12:06 - 12:07, etc.
Under the Scheduling tab, put in 1 min for the Run Schedule. Next, click on the Properties tab. You'll want to plug in the following values. Keep in mind that my query is fairly simple: it just asks for everything in the syslog index on Splunk. You'll want to modify this for your environment.
Scheme: https
Hostname: searchhead.splunk.company.internal
Port: 8089
Query: search index="syslog"
Time Field Strategy: Event Time
Time Range Strategy: Provided
Earliest Time: -6m@m
Latest Time: -5m@m
Time Zone: UTC
Application: (blank)
Owner: (blank)
Token: (blank)
Username: yoursplunkusername
Password: yoursplunkpassword
Security Protocol: TLSv1_2
Output Mode: RAW
Click Apply at the bottom once you're finished. By default, the results will come out as one giant blob of text. We'll split this up with the next processor.
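If you want to sanity-check the query and time range outside of NiFi first, here is a minimal Python sketch that builds the equivalent request against Splunk's REST search export endpoint (/services/search/jobs/export). Treat it as an illustration only: whether GetSplunk uses this exact endpoint internally is an assumption on my part, the helper name build_export_request is made up for this example, and the hostname and credentials are the same placeholders used above.

```python
import base64
import urllib.parse
import urllib.request


def build_export_request(host, username, password, query,
                         earliest="-6m@m", latest="-5m@m", port=8089):
    """Build (but do not send) a POST to Splunk's search export endpoint.

    Hypothetical helper for illustration; it mirrors the GetSplunk
    properties above (query, earliest/latest time, raw output).
    """
    url = f"https://{host}:{port}/services/search/jobs/export"
    body = urllib.parse.urlencode({
        "search": query,
        "earliest_time": earliest,
        "latest_time": latest,
        "output_mode": "raw",
    }).encode("ascii")
    # HTTP basic auth with the Splunk username and password
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(url, data=body, method="POST")
    request.add_header("Authorization", f"Basic {token}")
    return request


request = build_export_request(
    "searchhead.splunk.company.internal",
    "yoursplunkusername",
    "yoursplunkpassword",
    'search index="syslog"',
)
print(request.full_url)

# Sending is left commented out because it needs a reachable search head
# (and possibly a custom SSL context if the certificate is self-signed):
# with urllib.request.urlopen(request, timeout=60) as response:
#     blob = response.read().decode("utf-8", errors="replace")
```

If the commented-out call returns the events you expect, the same query and time values should be safe to plug into the processor.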
For reference: if you wanted to pull an hour of logs every hour, e.g. batched logs, then you'd change the Earliest Time to -2h@h and the Latest Time to -1h@h. You'd then set the scheduling above so it runs every hour instead of every minute.
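To make the time modifiers a little more concrete, here is a rough Python sketch of how a relative value like -6m@m resolves: subtract the offset from the current time, then round down ("snap") to the start of the unit. The function name snapped_window is hypothetical, and it only handles the minute and hour cases used in this guide.

```python
from datetime import datetime, timedelta


def snapped_window(now, unit, earliest_back, latest_back):
    """Return (earliest, latest) for modifiers like -6m@m / -5m@m.

    unit is "m" (minutes) or "h" (hours). Each bound is `now` minus
    N units, rounded down to the start of that unit.
    """
    if unit == "m":
        step = timedelta(minutes=1)

        def snap(t):
            return t.replace(second=0, microsecond=0)
    elif unit == "h":
        step = timedelta(hours=1)

        def snap(t):
            return t.replace(minute=0, second=0, microsecond=0)
    else:
        raise ValueError("unit must be 'm' or 'h'")
    earliest = snap(now - earliest_back * step)
    latest = snap(now - latest_back * step)
    return earliest, latest


now = datetime(2024, 1, 1, 12, 10, 30)  # arbitrary example timestamp

# -6m@m to -5m@m: the per-minute window used in this guide
earliest, latest = snapped_window(now, "m", 6, 5)
print(earliest.strftime("%H:%M"), "-", latest.strftime("%H:%M"))  # 12:04 - 12:05

# -2h@h to -1h@h: the hourly batch variant
earliest, latest = snapped_window(now, "h", 2, 1)
print(earliest.strftime("%H:%M"), "-", latest.strftime("%H:%M"))  # 10:00 - 11:00
```

Because both ends snap to a unit boundary, consecutive runs line up edge to edge, as long as the schedule interval matches the window size.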
Next we'll use the SplitText processor to chop up the previous blob of data into individual events. Drag a SplitText processor onto the canvas and double-click it to access the settings.
First, click on the Settings tab. Under Automatically Terminate Relationships, check every relationship except splits. Why? We're going to drop everything except for the split data. We don't need anything else.
Next, click on the Properties tab and enter the following:
Line Split Count: 1
The rest of the values can remain at the defaults. This tells NiFi to split the data so that each line becomes an individual event. Click Apply when done. We'll now connect the GetSplunk processor to the SplitText processor. Draw a line between the two and click Add at the bottom of the pop-up. You'll end up with this:
Now that we have the individual events, we're going to send them to the PutUDP processor and redirect the events back out of NiFi. Drag a PutUDP processor to the canvas. Double-click again to open up the processor settings. Under the Settings tab, check the boxes next to Automatically Terminate Relationships. This is the last processor in the flow, so we'll be dropping everything after it hits this processor.
Next, click on the Properties tab. There isn't much to change here.
Hostname: syslogdestination.company.internal
Port: 514
Click Apply when done. Similar to above, you'll then connect the SplitText processor to the PutUDP processor. Select Splits under the For Relationships section when it pops up.
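Conceptually, the SplitText and PutUDP steps boil down to "one line in, one UDP datagram out". The Python sketch below illustrates that idea under a few assumptions: the function names are made up, skipping blank lines is a choice made for this sketch rather than something taken from NiFi, and the demo sends to a throwaway listener on 127.0.0.1 instead of the real syslog destination on port 514.

```python
import socket


def split_events(blob):
    """Split a blob of text into one event per non-empty line."""
    return [line for line in blob.splitlines() if line.strip()]


def send_events(events, host, port=514):
    """Send each event as its own UDP datagram; return how many were sent."""
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for event in events:
            sock.sendto(event.encode("utf-8"), (host, port))
            sent += 1
    return sent


# Local demo: a throwaway UDP listener stands in for the syslog destination.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))          # port 0 lets the OS pick a free port
receiver.settimeout(2)
demo_port = receiver.getsockname()[1]

events = split_events("first event\nsecond event\n")   # made-up sample data
sent_count = send_events(events, "127.0.0.1", demo_port)
received = [receiver.recvfrom(65535)[0].decode("utf-8") for _ in events]
receiver.close()
print(sent_count, received)
```

UDP gives no delivery guarantee, which is worth keeping in mind if the destination sits across a lossy or congested network.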
Enabling the Flow
Finally, we'll need to enable everything we just created. Right-click anywhere on the blank canvas and select Start. Assuming there aren't any mistakes, the flow should fire up, start pulling data from Splunk, and stream it back out over UDP.
The final product should look something like this:
If we watch the output with tcpdump, we should see something like this: