Scheduler

SonarGateway includes a scheduler that can be used to deliver reports and CSV files generated using the Gateway. Jobs are scheduled from within JSON Studio, the scheduler itself runs as part of the Gateway, and delivery is performed by the sonar_dispatcher.py utility.

Scheduler Setup and Internals

Scheduling information is maintained in a MongoDB/SonarW database - therefore to use the scheduler you need to configure the Studio and the Gateway to be able to read and write job data. This is done with the following parameter in the web.xml file:

<context-param>
 <param-name>sonarg.schedUrl</param-name>
 <param-value>mongodb://qa:qa@qasys1.jsonar.com:27117</param-value>
</context-param>

Scheduling data will be saved to a database called lmrm__scheduler. The qa user in this case should have read/write privileges to this database.

A job schedule is maintained within a collection called lmrm__scheduled_jobs and takes the form of:

{
   "_id" : ObjectId("56129e3794361b9dadf68016"),
   "cron" : "0 0 0 29 * ? *",
   "header" : "some header",
   "footer" : "some footer",
   "subject" : "some subject",
   "emails" : "[\"qa1@jsonar.com\", \"qa2@jsonar.com\"]",
   "copies" : "[\"host1\", \"host2\"]",
   "type" : "PDF",
   "url" : "https://localhost:8443/Gateway?name=_b9&col=bar&type=find&output=csv&bind.a=&bind.b=&bind.c=&bind.d=&limit=10&findlimit=10&graphPoints=10&prettyPrint=1&host=127.0.0.1&port=47017&db=bh&sdb=test",
   "sendIfEmpty" : true,
   "user" : "qa",
   "name" : "failures"
}

All job schedules follow the cron conventions, and since the Gateway uses the Quartz library, the precise format of the cron strings is defined in http://www.quartz-scheduler.org/documentation/quartz-1.x/tutorials/crontrigger.
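As a quick reference, Quartz cron expressions have six mandatory fields plus an optional year field. The small Python sketch below (purely illustrative, not part of the product) labels each field of the example schedule from the job document above:

```python
# Quartz cron fields, in order; the 7th (year) field is optional.
FIELDS = ["seconds", "minutes", "hours", "day_of_month",
          "month", "day_of_week", "year"]

def describe_cron(expr):
    """Map each field of a Quartz cron expression to its name."""
    parts = expr.split()
    if len(parts) not in (6, 7):
        raise ValueError("Quartz cron expressions have 6 or 7 fields")
    return dict(zip(FIELDS, parts))

# The schedule from the example job document:
# midnight on the 29th of every month, any year.
print(describe_cron("0 0 0 29 * ? *"))
```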

Header and footer are used only when PDFs are generated. Subject is used for the email subject line. Emails is a string containing a JSON array of strings defining the email addresses to send the data to. Copies is a string containing a JSON array of targets to which the dispatcher should copy the data.

URL is the Gateway URL that is used to get the CSV data being sent or used in producing the PDF report.

sendIfEmpty, when set to false, prevents the data from being sent if the result of the Gateway call is empty; this is useful when building alerts. For example, when you want to send data about faults, build a query (e.g. an aggregation pipeline) that produces a list of all faults. If the list is empty, nothing will be sent.

User denotes the user logged in to the Studio when creating the scheduled job. Each user can only see the scheduled jobs that they created.
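To make the field encoding concrete, here is a small Python sketch that builds a document in the shape stored in lmrm__scheduled_jobs. The helper name is invented for illustration (in practice the Studio writes these documents); note in particular that emails and copies are JSON-encoded strings rather than native arrays, matching the example document above:

```python
import json

def make_scheduled_job(name, user, cron, url, emails, copies,
                       subject="", header="", footer="",
                       type_="PDF", send_if_empty=True):
    """Build a document in the shape stored in lmrm__scheduled_jobs.

    "emails" and "copies" are JSON-encoded *strings*, not arrays,
    matching the example job document.
    """
    return {
        "cron": cron,
        "header": header,
        "footer": footer,
        "subject": subject,
        "emails": json.dumps(emails),
        "copies": json.dumps(copies),
        "type": type_,
        "url": url,
        "sendIfEmpty": send_if_empty,
        "user": user,
        "name": name,
    }

job = make_scheduled_job(
    name="failures", user="qa", cron="0 0 0 29 * ? *",
    url="https://localhost:8443/Gateway?name=_b9&col=bar&type=find&output=csv",
    emails=["qa1@jsonar.com", "qa2@jsonar.com"],
    copies=["host1", "host2"],
    subject="some subject")

# In a real deployment this document would be inserted into the
# lmrm__scheduler database, e.g. with pymongo:
#   MongoClient(sched_url).lmrm__scheduler.lmrm__scheduled_jobs.insert_one(job)
```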

Scheduling within JSON Studio

Scheduling is done within JSON Studio on the Publish URL page. When you compute the URL that defines the Web service endpoint for the data/report you can schedule this as a job. Click on the Schedule button to open a calendar display of your current jobs. Click on any open day to add a new job or on an existing job to edit it.

All scheduled jobs are scheduled using a cron string as documented in http://www.quartz-scheduler.org/documentation/quartz-1.x/tutorials/crontrigger. The calendar view shows the job at its next scheduled time - not at all of its scheduled times. For example, if you schedule a job to run every night at 1am, it will appear on your calendar as occurring tomorrow at 1am; it will not show on all days. When you open the scheduler view two days from now, it will show as the next day at 1am.

All cron times are in UTC, but when you see a job on your calendar it is displayed according to your browser's timezone. The jobs themselves, however, fire at times local to where the Gateway is running.

For example, a cron schedule of “0 0 0 * * ? *” means every day at midnight. If the Gateway server is running in US Eastern time, then every night at midnight Eastern time the job will fire. But when you are looking at the calendar from a browser in Eastern time it will show 4am, and in a browser in Pacific time it will show 7am.
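To illustrate how a single UTC instant is rendered differently by browsers in different timezones, here is a minimal Python sketch using the standard-library zoneinfo module (Python 3.9+; unrelated to the product code, and the timestamp is taken from the example documents in this page):

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+

# A fire timestamp stored in UTC, as all scheduler timestamps are.
fire_utc = datetime(2015, 10, 5, 13, 48, 11, tzinfo=ZoneInfo("UTC"))

# The same instant as a browser in each timezone would display it.
eastern = fire_utc.astimezone(ZoneInfo("America/New_York"))
pacific = fire_utc.astimezone(ZoneInfo("America/Los_Angeles"))

print(eastern.strftime("%H:%M"))  # 09:48 (EDT, UTC-4 on this date)
print(pacific.strftime("%H:%M"))  # 06:48 (PDT, UTC-7 on this date)
```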

Firing Jobs

When a scheduled job fires, the Gateway does not actually perform the action, it merely adds a document to a collection in lmrm__scheduler called lmrm__dispatched_jobs. These directives are then used by the dispatcher to generate and deliver the content. Documents take the form of:

{
   "_id" : ObjectId("56127f9b3004b4949ead7da1"),
   "name" : "j6",
   "timestamp" : ISODate("2015-10-05T13:48:11.010Z"),
   "header" : "some header",
   "footer" : "some footer",
   "subject" : "some subject",
   "emails" : "[\"qa1@jsonar.com\", \"qa2@jsonar.com\"]",
   "copies" : "[\"host1\", \"host2\"]",
   "type" : "PDF",
   "url" : "https://localhost:8443/Gateway?name=_b9&col=bar&type=find&output=csv&bind.a=&bind.b=&bind.c=&bind.d=&limit=10&findlimit=10&graphPoints=10&prettyPrint=1&host=127.0.0.1&port=47017&db=bh&sdb=test",
   "sendIfEmpty" : true
}

Since the Gateway performs scheduling and all scheduling is based on cron strings, the Gateway must be running for scheduling to occur. If the Gateway is not running when a fire time passes, that firing will not happen retroactively. However, once a firing document has been inserted into lmrm__dispatched_jobs it will be delivered even if the dispatcher is not running at the time (delivery happens when the dispatcher comes up).
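The decoupling between scheduler and dispatcher can be sketched as follows. This is plain illustrative Python with in-memory stand-ins: the `jobs` list stands in for the lmrm__dispatched_jobs collection (read via pymongo in the real dispatcher) and `deliver` for the fetch/convert/send step; none of these names come from the product:

```python
def dispatch_pending(jobs, deliver):
    """Process dispatched-job documents in timestamp order.

    Because the firing documents persist in the collection, jobs that
    fired while the dispatcher was down are still delivered when it
    comes back up.
    """
    delivered = []
    for job in sorted(jobs, key=lambda j: j["timestamp"]):
        deliver(job)          # fetch the URL, build PDF/CSV, email or scp
        delivered.append(job["name"])
    return delivered

pending = [
    {"name": "j6", "timestamp": "2015-10-05T13:48:11.010Z"},
    {"name": "j2", "timestamp": "2015-10-05T09:00:00.000Z"},
]
print(dispatch_pending(pending, deliver=lambda job: None))
# ['j2', 'j6']
```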

Dispatcher

The dispatcher is a Python utility called dispatcher.py and comes with the Sonar Gateway and SonarG packages. It reads the documents produced by the scheduler from the lmrm__dispatched_jobs collection, gets the data by calling the Gateway, produces a PDF (if asked for), and emails or copies the CSV or PDF.

The Dispatcher configuration parameters are defined in dispatcher.conf, by default located at /opt/sonarfinder/sonarFinder/dispatcher.conf. The file includes:

  • A dispatch section that defines the URI used to connect to the lmrm__scheduler database
  • A section per database from which data is retrieved, specifying the username and password used to connect. The URLs published through the scheduler do not include credentials, so the database name is used to look up the credentials in the config file.
  • SMTP parameters to allow the Dispatcher to send emails.
  • A section per SCP target to allow the Dispatcher to copy the files to other hosts.
  • The dispatcher can send syslog events per line in a sent CSV. To enable this, include a [syslog] section specifying the priority and facility to use and configure a local syslogd daemon.
  • dispatcher.conf supports a ‘key_file’ entry next to each ‘database_password’. If an RSA private key is provided instead of the ‘database_password’, the dispatcher will use the binary blob encrypted in the provided key_file: it decrypts the key with an empty passphrase and uses the base64 encoding of the resulting blob as the password.
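Putting the pieces above together, a dispatcher.conf might look like the following sketch. Aside from the [syslog] section and the database_password/key_file entries described above, the section and option names shown here are assumptions for illustration; check them against the file shipped with your installation:

```ini
[dispatch]
; URI used to connect to the lmrm__scheduler database
uri = mongodb://qa:qa@qasys1.jsonar.com:27117/lmrm__scheduler

[sonargd]
; credentials for a database referenced by scheduled URLs
username = qa
database_password = secret
; alternatively, point key_file at an RSA private key
; key_file = /path/to/sonargd.pem

[smtp]
; SMTP parameters used to send emails
server = smtp.example.com
port = 25

[host1]
; an SCP target named in a job's "copies" list
user = copyuser
host = host1.example.com

[syslog]
; optional: emit a syslog event per CSV line
priority = info
facility = local0
```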

If you are running SonarG the Dispatcher will already be running as a service. If you are deploying Sonar Gateway yourself you should make sure that the Dispatcher is running as a service.

IMPORTANT: The Dispatcher is only tested and certified to run on Linux. It requires Python 2.7 and the following packages must be installed for you to use the dispatcher:

pip install pymongo==2.7.2
pip install requests
pip install paramiko
pip install scpclient
pip install pdfkit
pip install validate_email

sudo apt-get install -y openssl build-essential xorg libssl-dev wkhtmltox

Dispatcher LDAP integration

The system can be set to use LDAP (for example, Active Directory) to manage authentication as well as to handle authorization. Once set up, users can use their LDAP credentials to log in to Sonar.

A section needs to be set in the dispatcher.conf for defining the integration. It includes information about how to connect to the LDAP server and what to search for, mapping between LDAP groups and Sonar roles and a set of rules on how to perform the synchronization.

A scheduled job is required to perform the synchronization process. There can be multiple jobs using different LDAP sections (each section must have a unique name). The job(s) are added to the scheduler like any other job.

Note: only a single external authentication authority is supported by Sonar.

This mechanism allows full control over users and their privileges in Sonar to be managed through an LDAP server.

Bind Variables and Jobs

Service APIs that generate a report or CSV are often parameterized and you often want to send the same report with different parameters to different people. The bind variable collection feature allows you to do so - defining a single schedule with a single (parameterized) query and having the scheduler/dispatcher send multiple emails/file copies, each with only the appropriate data, to the appropriate person.

To do this you:

  • Define a parameterized query
  • Populate a collection with a set of values, one per combination of bind values plus the target location/email.
  • Associate this collection when you schedule the job using the bind collection fields.

Note that the collection should reside in the lmrm__scheduler database.

Also note that the Studio has a helper utility that creates a template Excel spreadsheet with the right format, which you can fill in and then import into the database using the Spreadsheets application of the Studio.

For example, assume that the API generating the CSV is:

"url" : "https://sonarg.jsonar.com:8443/Gateway?name=__lmrm_predef_fromto_session&col=session&type=find&output=report&bind.From=&
   bind.To=&bind.Server_IP=&bind.Client_IP=&bind.DB_User=&bind.OS_User=&bind.Source Program=&bind.Service_Name=&
   bind.Server_Type=&bind.Login_Succeeded=&prettyPrint=1&host=localhost&port=27117&db=sonargd&sdb=lmrm__sonarg",

This has a long list of bind variables. The one we will focus on is DB_User.

If for example we fill in a collection with the following two documents:

> db.examplebv.find().pretty()
{
   "_id" : 1,
   "Source Program" : ".*",
   "Server_IP" : ".*",
   "Server_Type" : ".*",
   "To" : ISODate("2014-02-05T00:00:00Z"),
   "Client_IP" : ".*",
   "Login_Succeeded" : 1,
   "name" : "test1",
   "copy_list" : [ ],
   "OS_User" : ".*",
   "From" : ISODate("2014-02-01T00:00:00Z"),
   "email_list" : [
      "jane@jsonar.com"
   ],
   "DB_User" : "DB_USER_906",
   "Service_Name" : ".*"
}
{
   "_id" : 2,
   "Source Program" : ".*",
   "Server_IP" : ".*",
   "Server_Type" : ".*",
   "To" : ISODate("2014-02-05T00:00:00Z"),
   "Client_IP" : ".*",
   "Login_Succeeded" : 1,
   "name" : "test1",
   "copy_list" : [ ],
   "OS_User" : ".*",
   "From" : ISODate("2014-02-01T00:00:00Z"),
   "email_list" : [
      "john@jsonar.com"
   ],
   "DB_User" : "DB_USER_1540",
   "Service_Name" : ".*"
}

then Jane and John will each get a CSV but Jane will get all the records relevant to DB_USER_906 and John will get all records relevant to DB_USER_1540.

Data in the header, footer and subject will also be bound. For example, if you want the subject to include the value for DB_User include “$$DB_User” like:

This is a report about user "$$DB_User"
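The substitution itself can be sketched in a few lines of Python; this function is illustrative only (it is not the dispatcher's actual code), showing one way such $$-placeholders can be expanded from a bind document like the ones above:

```python
import re

def bind_template(template, bind_values):
    """Replace each $$Name placeholder with its value from a bind document.

    Placeholders with no matching key are left untouched.
    """
    def repl(match):
        key = match.group(1)
        if key in bind_values:
            return str(bind_values[key])
        return match.group(0)
    return re.sub(r"\$\$(\w+)", repl, template)

doc = {"DB_User": "DB_USER_906", "email_list": ["jane@jsonar.com"]}
print(bind_template('This is a report about user "$$DB_User"', doc))
# This is a report about user "DB_USER_906"
```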


Copyright © 2013-2016 jSonar, Inc
MongoDB is a registered trademark of MongoDB Inc. Excel is a trademark of Microsoft Inc. JSON Studio is a registered trademark of jSonar Inc. All trademarks and service marks are the property of their respective owners.