A large, enterprise-class application involves many servers. As the number of servers increases, troubleshooting or investigating an application failure becomes many times harder.

Thus, there is a need to integrate the different sources of data into a consolidated view, both to improve the productivity of developers and testers and to resolve application and environment issues quickly. Moreover, the solution needs a browser-based interface, so that users do not have to be given administrative privileges or remote access to the servers. Splunk captures, indexes and correlates real-time data in a searchable repository from which it can generate graphs, reports, alerts, dashboards and visualizations.

Our Usage of Splunk

For enterprise customers, we have a private cloud setup to manage various environments/servers. The applications are quite complex, with many subsystems/REST services that depend on each other.

We have configured Splunk to monitor service logs (text files on disk), Event Viewer data, etc. on each of the servers. It pulls the data from the different servers and displays it in a thin, browser-based client. Splunk also includes numerous useful analyses out of the box that help us understand how often issues recur and what patterns they follow.

If an application error occurs in a clustered environment with multiple servers, it is not immediately clear on which node the error occurred. Developers and testers therefore have to depend on build engineers to log in to each server remotely, inspect the Event Viewer and service logs, and troubleshoot issues on a trial-and-error basis. With Splunk, we are able to pull the data from different servers into a single view and identify which server caused the issue. This removes the dependency on administrators by providing real-time and historical data, all accessible over the network.

Custom Dashboard Using Splunk

Apart from the general application error logs, system logs and Event Viewer data, we also need to monitor application-specific information. So we developed an ASP.NET web dashboard application, built on the Splunk SDK, to monitor REST service status, database connectivity, app settings and environment/server information for better troubleshooting.

The main components of this implementation are (refer to Fig. 1):

  • Agent
  • Splunk Universal Forwarder
  • Splunk Central Receiver
  • Custom Web Application
Fig. 1: Data Flow from App Server/Workflow server to Splunk web UI/Custom Dashboard App

Agent

The Agent is a PowerShell script that runs as a scheduled job in each environment and collects information such as service status, assembly versions, database status and machine configuration. It writes the collected information as XML to a location on the file system that is monitored by the Splunk forwarder.

Splunk Universal Forwarder

The Splunk Universal Forwarder is installed in every environment to be monitored. The forwarder monitors a folder (configured during installation), filters the files by any configured patterns (such as *.Log, *.Xml or *.txt), and then forwards the matching files to the Splunk Central Receiver on a specified port.
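
As an illustration of the forwarder setup described above, a minimal sketch of the two configuration files involved might look like the following. The monitored path, the whitelist pattern, the receiver host name and port 9997 are assumptions made for this example, not values from our environment. The sourcetype name simply mirrors the one used in the sample search query.

```ini
# inputs.conf on the monitored server (hypothetical path and filter)
[monitor://D:\Logs\AgentOutput]
# Regular expression matched against the full file path;
# (?i) makes it case-insensitive so .Log, .Xml and .txt all match.
whitelist = (?i)\.(log|xml|txt)$
sourcetype = xml
disabled = false

# outputs.conf on the monitored server (hypothetical receiver host and port)
[tcpout]
defaultGroup = central_receiver

[tcpout:central_receiver]
server = splunk-receiver.example.com:9997
```

The whitelist is a regular expression rather than a file glob, which is why the *.Log style patterns are expressed as an alternation on the file extension.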

Splunk Central Receiver

The Splunk Central Receiver is configured on a server to receive the information on a particular port and index it as Splunk events. These events can be queried using the Splunk web client or from any custom application that uses the Splunk SDK.
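
On the receiving side, listening for forwarder traffic can be sketched as a single stanza in the receiver's inputs.conf. Port 9997 is an assumption here (it is the port commonly used in Splunk examples); it has to match whatever port the forwarders are configured to send to.

```ini
# inputs.conf on the Splunk Central Receiver (port is an assumption)
[splunktcp://9997]
disabled = false
```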

Fig. 2: Splunk search application (web)

Custom Web Application

The custom web application uses the Splunk SDK to connect to the Splunk Central Receiver host, query the required information and display it on the dashboard UI.

Please refer to the following:

  • Code snippet to access the Splunk SDK (Fig. 3)
  • Dashboard snapshot of system information (Fig. 4)
  • Dashboard snapshot of service information (Fig. 5)
Host = "localhost" // server name
Port = 8080
userName = "admin"
password = "changeMe"
searchKey = "Sample-XML-FileName.xml"
searchQuery = "sourcetype=xml | sort _time | tail 1"

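       // Requires "using System;" and "using System.Xml;" plus a reference to the Splunk SDK for C# (Splunk namespace).
       // Returns the first well-formed XML event that matches the search, an empty document if none is found,
       // or null if the query itself fails.
       public XmlDocument GetXmlFromSplunk(string host, int port, string userName, string password, string searchKey, string searchQuery)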
       {
            var retVal = new XmlDocument();

            try
            {
                var source = searchKey + "-LogFile.xml";
                source = source.ToLower().Replace(" ", string.Empty);

                var searchString = string.Format("search source=*{0}* {1}", source, searchQuery);
                this.Log("Search String: " + searchString, null);

                var svcArgs = new Splunk.ServiceArgs
                    {
                        Host = host,
                        Port = port
                    };
                var service = new Splunk.Service(svcArgs);

                service = service.Login(userName, password);

                // A oneshot search blocks until the search completes and returns the results stream directly.
                using (var queryResultStream = service.Oneshot(searchString))
                {
                    using (var resultsReader = new Splunk.ResultsReaderXml(queryResultStream))
                    {
                        foreach (var result in resultsReader)
                        {
                            // _raw holds the original event text, here the XML written by the Agent.
                            var rawXml = result["_raw"].ToString();

                            try
                            {
                                retVal.LoadXml(rawXml);
                                break;
                            }
                            catch (Exception exception)
                            {
                                // if the xml is malformed, keep trying prior entries.
                                this.Log("Invalid XML from Splunk : " + rawXml, exception);
                            }
                        }
                    }
                }

                return retVal;
            }
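            catch (Exception exception)
            {
                // Log the failure instead of silently swallowing it, then signal it to the caller with null.
                this.Log("Splunk query failed", exception);
                return null;
            }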
        }

Fig. 3: Code snippet for the Splunk SDK

Fig. 4: Dashboard snapshot of System information
Fig. 5: Dashboard snapshot of Service Information