Sonar Resource Manager

SonarResourceManager (SRM) is a utility that monitors all commands issued against a set of MongoDB databases and enforces quotas on runtime and concurrent operations. It can be used to ensure that certain clients do not issue operations that take too long to complete, that certain clients do not issue too many operations, and so on. Its primary purpose in the context of the Studio is to ensure that Studio users do not overwhelm the database with many and/or slow ad-hoc queries. However, SRM can also be used as a stand-alone utility to enforce quota policies for any database regardless of who the clients are.

SRM can be configured to enforce constraints on any number of databases. It can run on any machine so long as it is able to connect to the databases it needs to control. SRM operates using the currentOp and killOp commands - i.e. it monitors what is going on using currentOp calls, and when it decides that a quota has been exceeded it uses killOp. The accounts used by SRM to connect to the database need to have dbAdmin and clusterAdmin privileges if running a MongoDB version prior to 2.6; starting with 2.6 you can use custom access control to configure an appropriate role. The conf section must specify the admin database since these operations are instance-wide and not per-database. (The Log section can be any database to which the account has write permission - this is where the report entries get logged.)
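
For example, on MongoDB 2.6 or later a custom role limited to the privileges SRM needs could be created roughly as follows. This is a sketch only - the role name srmMonitor is illustrative, and the account SRM uses also needs write permission on the database named in the Log section:

use admin
db.createRole({
   "role" : "srmMonitor",
   "privileges" : [
      // inprog allows running currentOp, killop allows running killOp
      { "resource" : { "cluster" : true }, "actions" : [ "inprog", "killop" ] }
   ],
   "roles" : []
})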

SRM is configured using a set of rules that match certain connections and determine what to enforce on them. These rules are defined in a JSON configuration file called SonarRM.json by default. This is in addition to the data source configuration, so a run of SRM looks like:

./sonarrm.py -c ./SonarResourceManager.conf -r SonarRM.json

There are three categories of rules - fallthrough rules, enforcement rules and white lists. White lists are used to exempt certain clients and connections from any quota limits - e.g. you would list all your application servers in the white lists to ensure that, whatever happens and whatever rules you add (even if you make mistakes), no operation coming from the application servers will ever be killed. Enforcement rules define a set of match criteria on connection attributes, a set of actions and a set of constraints. A rule fires when all the attributes match and the listed constraints are exceeded; the actions are then executed. Fallthrough rules allow you to set up catch-all scenarios; they are evaluated only when a connection was not handled by a white list or an enforcement rule.

From a JSON perspective, all three rule types can include the same fields, except that white list rules have neither actions nor constraints.

Whitelists are specified using the whitelist array as follows:

"whitelist" : [
   {
   "client" : [ "admins" ],
   "collection" : [ "ALL" ] ,
   "operation" : [ "ALL" ] ,
   "database" : [ "staging" , "real" ],
   "instance" : [ "production" , "demo" ];
   },
   ...
]

Clients are specified using groups of hostnames. For example, admins above is the name of a group of hosts, specified in the JSON configuration file as follows:

"admins" : [
   "admin1.jsonar.com",
   "admin2.jsonar.com",
   "ec2-44-207-112-213.compute-1.amazonaws.com",
   "192.168.0.1/24",
   "192.168.5.5/255.255.255.0"
],

The collection array is an array of strings or regular expressions used to match the collection of the operation. The database array matches the MongoDB database name. The instance array matches the section entries in the conf file and allows you to specify which instance to act upon.

In all rule fields such as collection, database and others, you can use strings or regular expressions - e.g. use "col1" to match one collection, {$regex: "col[4-9]"} to match all collections such as col4, col5, etc., or {$regex: "col[4-9]", $options: "i"} for the same match done case-insensitively.

To ensure that an app server will never be touched, use “ALL” for collections, databases, etc.

The operation array lists which operations should always be allowed. Use “ALL” to specify that the client should be allowed to do anything. These operations can be any of the values that appear in currentOp, plus a few breakdowns that allow more granular control (e.g. distinct can be treated separately from query):

'ALL'
'distinct'
'group'
'orderby'
'$eval'
'aggregate'
'mapreduce'
'copydb'
'clonecollection'
'cloneCollectionAsCapped'
'search'
'insert'
'query'
'update'
'remove'
'getmore'
'command'

When you specify operations in any of the rule types you can use an array of these strings, or you can use the name of an operation group. As an example, rather than using:

operation: ["aggregate", "mapreduce", "group"]

embedded inside a rule you can specify an operation group:

"operations" : {
   "blocked" : [ "aggregate" , "distinct" , "insert"] ,
   "analytics" : [ "aggregate", "mapreduce", "group" ]
}

and then within the rule just write:

operation: "analytics",

The instance array specifies data source names defined in SonarResourceManager.conf.

Firing rules are specified using the rules array as follows:

"rules" : [
   {
   "client" : [ "finder_users" ],
   "collection" : [ "customers" , {$regex: "inventory[1-6]"} ] ,
   "operation" : [ "query" , "delete" ] ,
   "database" : [ "production" , "demo" ] ,
   "appserver_hostname": [ "user-laptop" ], # fine-grained rules - see below
   "db_username": [ "user1" ], # fine-grained rules - see below
   "whatmyuri": ["192.168.56.23"], # fine-grained rules - see below
   "endpoint_ip": ["192.168.56.111"], # fine-grained rules - see below
   "time" : [ "working_hours", "sunday_downtime" ] ,
   "constraints" : {
      "max_run_time" : 2,
      "max_concurrent" : 5
      },
   "action" : "terminate"
   },
   ...
]

In addition to the fields already mentioned for white list rules, firing rules can have a time field that is an array of periods. Each period specification defines a range of weekdays and a range of hours. For example, the working_hours time period is defined below as Monday through Friday between 9am and 5pm:

"working_hours" : {
   "day" : [ "Mon", "Fri" ] ,
   "hours" : [ 9,17 ]
}

Hours are numbers; if you want to specify 4:30 pm use 16.30. The time is interpreted based on the time zone of the machine on which SRM is running. Specify a single day using:

"day" : [ "Sun"  ]

Constraints can limit how long an operation is allowed to run (specified in seconds) or how many concurrent operations that client is allowed to run at any point in time. If the number of concurrent operations exceeds the quota, the most recent one(s) are killed.

Actions include “terminate” (kill) or “report”. Report logs the details of the exceeded quota in a collection. It is recommended that you run your rules in reporting mode at first to ensure you are getting the desired behavior. Any termination also logs a report.

When an operation is interrupted using killOp you will get an exception of the following type:

error: { "$err" : "operation was interrupted", "code" : 11601 }

Fallthrough rules are used when an observed connection does not match any enforcement or white list rule. Fallthrough rules look like:

"fallthrough" : [
{
   "collection" :  "ALL",
   "operation" : "ALL" ,
   "database" :  "test",
   "constraints" : {
         "max_run_time" : 2,
         "max_concurrent" : 1
   },
   "action"  : "report"
}]

Fallthrough rules are useful when you want catch-all conditions - e.g. when you want to report on anything without affecting termination conditions, or when you cannot enumerate all possible conditions and want to specify what is allowed for specific connections while taking a default action for any other connection type.

SRM_Log

All actions taken by SRM are logged into the database specified in the log section of the config file. A log entry is written per action taken, regardless of whether it was a report or a terminate. A log entry includes information explaining why the rule fired, which rule section the firing rule belongs to, etc.:

{
    "_id" : ObjectId("531a7eb939466f211799583c"),
    "reasons" : [
        [
            "Database test is in database list [u'test']",
            "Rule matches all collections",
            "Client 127.0.0.1:49388 (127.0.0.1) matched ip/subnet 127.0.0.1/255.255.255.255",
            "Time 18:21 is between 9:30 and 18:59 and day Fri is between Mon and Fri",
            "Operation query  is not allowed (ALL operation selected)",
            "Query running for 3 seconds, more than 2 seconds allowed"
        ]
    ],
    "action" : "terminate",
    "shard" : false,
    "ruleset": "rules",
    "op" : {
        "numYields" : 3,
        "lockStats" : {
            "timeAcquiringMicros" : {
                "r" : NumberLong(3419136),
                "w" : NumberLong(0)
            },
            "timeLockedMicros" : {
                "r" : NumberLong(6837819),
                "w" : NumberLong(0)
            }
        },
        "waitingForLock" : false,
        "desc" : "conn103",
        "connectionId" : 103,
        "locks" : {
            "^test" : "R",
            "^" : "r"
        },
        "client" : "127.0.0.1:49388",
        "threadId" : "0x7f4ff1d00700",
        "active" : true,
        "query" : {
            "$where" : "sleep(200)"
        },
        "opid" : 2049,
        "ns" : "test.foo",
        "secs_running" : 3,
        "op" : "query"
    }
}

Fine-grained Rules and Instrumentation

In addition to the client data contained in currentOp documents, SRM allows you to base your rules on fine-grained access data. An example is an application server using a functional account - you may want to base your rules on the user signed into the application itself rather than on the fact that the query is coming from the application server. Because MongoDB clients do not send this data on their own, you need to instrument your application to send it.

The fields available to you in the rules to perform such fine-grained resource management are:

"appserver_hostname": [ "user-laptop" ],
"db_username": [ "user1" ],
"whatmyuri": ["app_servers"],
"endpoint_ip": ["finder_users"],
"app_name": ["tools"]

This data is not in currentOp by default. You need to use the $comment feature of MongoDB to pass this information along with the queries and operations. This is available in each one of the drivers. For example, to add this data to a find using the Java driver, add the following call using the cursor you get back from the find call:

dbCur.addSpecial("$comment", <instrumentation collection data>);
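
As a fuller sketch using the legacy Java driver's DBCursor API - the collection and query variables and the field values here are assumptions and should reflect your own application:

BasicDBObject instrumentation = new BasicDBObject()
   .append("db_username", "user1")
   .append("appserver_hostname", "qa3.jsonar.com")
   .append("endpoint_ip", "169.254.178.20")
   .append("app_name", "myapp")
   .append("whatmyuri", "169.254.178.48:63151");

// attach the instrumentation data so it appears under $comment in currentOp
DBCursor dbCur = collection.find(query);
dbCur.addSpecial("$comment", instrumentation);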

The fields to use for the instrumentation data are db_username, appserver_hostname, endpoint_ip, whatmyuri and app_name. When you add this instrumentation and look at currentOp, a query will look like:

"query" : {
            "$comment" : {
               "db_username" : "user1",
               "appserver_hostname" : "qa3.jsonar.com",
               "endpoint_ip" : "169.254.178.20",
               "app_name" : "jsonstudio",
               "whatmyuri" : "169.254.178.48:63151"
            },
            "$query" : {
               "user.id_str" : {
                  "$gt" : "724173240"
               },
               "user.friends_count" : {
                  "$gt" : 65
               },
               "user.profile_link_color" : {
                  "$gt" : "A5AFB3"
               },
               "user.profile_text_color" : {
                  "$gt" : "333333"
               },
               "user.id" : {
                  "$gt" : 724173240
               },
               "user.favourites_count" : {
                  "$gt" : 596
               },
               "user.protected" : {
                  "$gt" : false
               }
            }
         },

The whatmyuri value should be the string that you would get as the value of the “you” field when running the “whatsmyuri” command. Consult your driver documentation on how to run this command and how to add the $comment information.
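
For example, from the mongo shell the command can be run as db.runCommand({ whatsmyuri: 1 }), which returns a document such as the following (the address is illustrative); the value of “you” is what should be passed in the instrumentation data:

{ "you" : "192.168.56.23:53460", "ok" : 1 }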

JSON Studio does send this additional instrumentation data by default so that you can configure fine-grained rules even when using a shared Studio for many users.

Note: If you use Credentials Mapping then the username logged by JSON Studio is the real user name used to log in to JSON Studio and not the functional ID used to log in to the database - thus providing more granular control in the SRM.

Test Mode

Because rule sets are notoriously hard to build and test, SRM can be run in “test mode”. For example, if you run the following:

 ./sonarRM.py -c ./SonarResourceManager.conf -r SonarRM.json  -t
'testdb, coll1, query, 192.168.0.1 , {"age":33} , 0, 15, 30 , 17'

The text string is in the following format:

'DB name, collection name, operation, client_ip,
 query, day, hour, minutes, seconds running'

SRM will run and simply tell you what it would have done. In this case SonarResourceManager.conf is the configuration file holding the database connection information and SonarRM.json is the configuration file with the rules.

You should also run the SRM in report mode for a period of time before you run in terminate mode.

Sharding

SRM supports sharded environments. To configure SRM for sharding, SRM needs to connect only to the mongos instances; it will discover the mongod instances through the mongos instances and will automatically determine what needs to be killed on them. Note that in sharded environments MongoDB does not populate client information. Therefore, if you want to do fine-grained control (see above) you need to instrument your applications/tools, and the instrumentation only carries over in queries.

Limitations and Compensating Controls

There are a few limitations when running in MongoDB versions prior to 2.6 and especially when using sharding. Most of these are remedied or will be remedied over time with new releases of MongoDB and new releases of MongoDB drivers. This section details the limitations and suggested compensating controls.

  • Instrumentation limitations - Instrumentation can only be added to queries. Operations such as aggregation, distinct and count (computing the size of a cursor’s result set, for example) cannot be instrumented prior to version 2.6. In general, use rules that do not rely on instrumentation for these operations (e.g. match on the remote IP address). When running within the Studio, use preferences to avoid computing result set sizes and running distinct queries on large collections.
  • Sharding - Sharded environments do not log the IP from which the connection is made. You therefore need to rely on instrumentation for queries and use the already mentioned compensating controls for other operations.
  • Failsafe - you can instrument your code so that some operations are never evaluated by the SRM, ensuring that in any environment (sharded or not) and any version of MongoDB, the SRM will never act upon your application. If the term lmrm__SRM_ignore is found anywhere in your operations then the SRM will ignore them. For example, you can add a section such as {“lmrm__SRM_ignore”: {$exists: false}} to any query, aggregation pipeline, distinct etc. and then write rules that handle all other operations with time limits. The exists clause is just an example since it will not affect any outcome - you can place this string anywhere as long as it is part of the data shown in the query field of currentOp (see the example after this list).
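
For example, a find filter carrying the failsafe marker could look like the following (the status field is just an illustrative part of the query); the same marker can be embedded in a $match stage of an aggregation pipeline:

{ "status" : "open" , "lmrm__SRM_ignore" : { "$exists" : false } }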

