Working with SonarProfiler

MongoDB has an excellent profiling mechanism described in MongoDB Profiling. MongoDB profiling adds documents to a capped collection called system.profile. Each document in this collection takes the form of:

{
  "op": "query",
  "ns": "test2.companies",
  "query": { "name": { "$regex": "jsonar", "$options": "i" } },
  "ntoreturn": 0,
  "ntoskip": 0,
  "keyUpdates": 0,
  "numYield": 0,
  "lockStats": {
    "timeLockedMicros": { "r": 78560, "w": 0 },
    "timeAcquiringMicros": { "r": 6, "w": 3 }
  },
  "nreturned": 1,
  "responseLength": 5916,
  "millis": 78,
  "ts": { "$date": "2014-01-13T21:18:25.596Z" },
  "client": "127.0.0.1",
  "allUsers": [],
  "user": ""
}
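
For instance, this built-in mechanism can be enabled and inspected directly with pymongo; the sketch below is illustrative only (the host, database name, and 100 ms threshold are assumptions, not part of SonarProfiler):

from pymongo import MongoClient, DESCENDING

client = MongoClient("mongodb://localhost:27017/")
db = client["test2"]

# Level 1: profile operations slower than 100 ms.
db.command("profile", 1, slowms=100)

# Read the most recent profiling documents from the capped collection.
for doc in db["system.profile"].find().sort("ts", DESCENDING).limit(5):
    print(doc["op"], doc["ns"], doc["millis"])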

SonarProfiler extends this built-in mechanism with long-term profiling and analysis capabilities. For example, because system.profile is a capped collection, documents are overwritten if you monitor many events over a lengthy period. SonarProfiler lets you copy documents from system.profile into another collection (usually in a different database) so that you can collect profiling data for longer periods, centralize profiling data from various databases (e.g. various shards) in one location, and run complex aggregations on this data without placing an additional burden on your main database.

SonarProfiler is a Python script that connects to your database and copies documents from system.profile into another location. While copying, it also enriches each document with data that is useful when performing analysis.

This Python script has only been tested and certified on Linux. It may, however, work on any platform that has the appropriate Python packages and modules.

You can choose to run the script manually or as a daemon. A single script can pull from multiple system.profile collections in different databases and always inserts into a collection in a single database. Configure these in SonarProfiler.conf, for example:

[Main]
# Database where the output goes
db_name = prof
port = 27017
host =  analytics.jsonar.com
username =
password =
auth_method = none
kerberos_service_name = mongodb
output_collection = jsonar.profile
write_concern_w =
throttle = False

[test]
# Profiling source of timing data. You can specify multiple hosts in the case of shards:
# host = host1, host2, ...
db_name = test
port = 27017
host = web.jsonar.com
username =
password =
auth_method = none
kerberos_service_name = mongodb
include_ns = [a-z].*
# init_as: -1 - whatever the database is doing now - don't change anything
# init_as: 1.100 - change it to 1 for slow queries over 100 ms
# init_as: 2 - change it to 2 upon startup
init_as = -1
max_per_minute = 60000

In the above example, documents are pulled out of system.profile in the test database running on web.jsonar.com and inserted into jsonar.profile in the prof database on analytics.jsonar.com. Only documents whose ns entries start with a lowercase letter are inserted into jsonar.profile (see the include_ns parameter).

To use a specific config file (rather than the one in the local directory), use the -c flag.

The default for init_as is -1, which means it is up to you to set the profiling level within the MongoDB database. If you set it to 1.<milliseconds defining what is slow> or 2, the script calls db.setProfilingLevel. It remembers what the original level was and restores it before it exits, but if the script is killed or crashes, the database remains at whatever level the script set.
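
To illustrate this behavior, a minimal pymongo sketch of setting and restoring the profiling level might look like the following (this is not the SonarProfiler source; db is assumed to be a pymongo Database handle for the profiled database):

previous = db.command("profile", -1)        # read the current level without changing it
db.command("profile", 2)                    # equivalent of init_as = 2: profile everything
try:
    pass                                    # ... tail system.profile here ...
finally:
    db.command("profile", previous["was"])  # restore the original level on a clean exit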

After setting the profiling level (or using whatever level is already set), SonarProfiler opens a tailable cursor on system.profile and uses the include_ns parameter to decide what to copy over. Because documents are copied as they arrive, data is not lost when the capped collection starts overwriting itself.
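
A rough sketch of such a copy loop, assuming pymongo with db pointing at the source database and target_coll at the output collection (both placeholders), might look like:

import re
import time
from pymongo import CursorType

include_ns = re.compile(r"[a-z].*")          # mirrors the include_ns setting

cursor = db["system.profile"].find({}, cursor_type=CursorType.TAILABLE_AWAIT)
while cursor.alive:
    for doc in cursor:
        if include_ns.match(doc.get("ns", "")):
            target_coll.insert_one(doc)      # enrichment of the document would happen here
    time.sleep(1)                            # no new data yet; wait and keep tailing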

The user(s) pulling the profiling data from system.profile should have the dbAdmin role, and the user inserting into the SonarProfiler collection should have the readWrite role on that database.
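
For reference, a hedged pymongo sketch of creating such users follows (user names, passwords, and connection strings are placeholders; your authentication setup may differ):

from pymongo import MongoClient

source_db = MongoClient("mongodb://web.jsonar.com:27017/")["test"]
source_db.command("createUser", "profile_reader", pwd="changeme", roles=["dbAdmin"])

output_db = MongoClient("mongodb://analytics.jsonar.com:27017/")["prof"]
output_db.command("createUser", "profile_writer", pwd="changeme", roles=["readWrite"])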

Each document inserted into the target database takes the form of:

{
  "_id": { "$oid": "52d687d3a92f4a179a1571b5" },
  "lockStats": {
    "timeAcquiringMicros": { "r": 1970525, "w": 9 },
    "timeLockedMicros": { "r": 4097810, "w": 0 }
  },
  "allUsers": [],
  "nreturned": 0,
  "query": { "retweeted_status.in_reply_to_screen_name": "Marina" },
  "ops": [ { "field": "retweeted_status.in_reply_to_screen_name", "seq": 0, "val": "\"Marina\"", "op": "$query" } ],
  "millis": 2132,
  "ts": { "$date": "2014-01-15T13:06:27.610Z" },
  "source": "169.254.178.48:27017",
  "profile_group": "test",
  "ns": "tweets.t1",
  "cmd_struct_str": "query:{\"retweeted_status.in_reply_to_screen_name\": \"$$\"}",
  "cmd_str": "query:{\"retweeted_status.in_reply_to_screen_name\": \"Marina\"}",
  "responseLength": 20,
  "numYield": 16,
  "user": "",
  "millis_since_epoch": 1389791187610,
  "client": "169.254.178.48",
  "ntoreturn": 0,
  "keyUpdates": 0,
  "ntoskip": 0,
  "nscanned": 33745,
  "op": "query"
}

Fields added include the ops array, which breaks down the sequence of operations that comprise the command (e.g. in aggregations); cmd_str and cmd_struct_str, which provide string representations of the entire command (cmd_struct_str replaces literal values with $$ placeholders); and millis_since_epoch, which provides the ts value as milliseconds since the epoch.

The SonarProfiler collection has two indexes: one for queries based on runtime and one for queries based on collections. Queries using a prefix of these indexes will run fast:

{ns: 1, ts: -1, op: 1}
{millis: -1, ts: -1}
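
For example, the following pymongo queries use a prefix of these indexes (the collection name and the client/profile_coll handles are illustrative):

profile_coll = client["prof"]["jsonar.profile"]

# Uses a prefix of {ns: 1, ts: -1, op: 1}: all profiled operations on one collection, newest first.
recent = profile_coll.find({"ns": "tweets.t1"}).sort("ts", -1)

# Uses a prefix of {millis: -1, ts: -1}: the slowest operations overall.
slowest = profile_coll.find().sort("millis", -1).limit(10)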

SonarProfiler is a single-threaded program and will pull data from the source databases as fast as it can. While precise rates depend on hardware and resource allocations, a single thread can pull on the order of 1,000 documents per second from a system.profile cursor. If you wish to pull more data you can run multiple SonarProfiler processes and give each one a different database or, if all traffic is going into a single database, split the ns between the different scripts, as shown below.
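
For example, two SonarProfiler processes could split a single database's namespace with different include_ns regular expressions (the patterns below are illustrative; all other settings in each source section stay as shown in the [test] example above):

# Config file for the first process
include_ns = [a-m].*

# Config file for the second process
include_ns = [n-z].*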

It is important to size the profiling environment: system.profile is a capped collection, so if you are collecting a lot of profiling data and SonarProfiler is not fast enough, system.profile will overrun itself and you will lose profiling data. When you perform such sizing, consider the following:

  • You can change the size of the capped collection (see the sketch after this list). This helps only if traffic is bursty, not if it has a sustained high rate that a single SonarProfiler process cannot handle.
  • Consider using 1.50 (or some other slow-query threshold rather than 2) as your profiling level.
  • Consider using multiple SonarProfiler processes and splitting the namespace as explained above.
  • Since SonarProfiler does a lot of writing, you can change the write concern used by the process, which may help you achieve higher throughput. Set write_concern_w=0 in the config file; the value can be any string or integer listed at http://docs.mongodb.org/manual/reference/write-concern/, i.e. -1, 0, 1, any number greater than 1, or “majority”. By default, SonarProfiler uses MongoDB’s current default (which is “1”).
  • If you want SonarProfiler to use less CPU and are using profiling for long-term sampling, set throttle=true in the Main section of the config file. Instead of writing every entry it receives, SonarProfiler writes a single batch every second containing only the first 100 entries received; any remaining entries are dropped.
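
The capped-collection resize mentioned in the first bullet can be done along the lines of the following hedged pymongo sketch (db is assumed to be a handle to the profiled database; the 64 MB size is an arbitrary example, and profiling must be off while system.profile is recreated):

db.command("profile", 0)                  # turn profiling off
db["system.profile"].drop()               # drop the existing capped collection
db.create_collection("system.profile", capped=True, size=64 * 1024 * 1024)
db.command("profile", 1, slowms=100)      # re-enable profiling (level/slowms as desired)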

The default config file has a regular expression (as a comment) causing SonarProfiler to drop any operation done on any system collection and any lmrm__ collection; this is a recommended filter. Additionally, the following are always silently dropped regardless of the filter:

  • If a profiling source reads from the same database as the output (specified in the Main section), entries that originated from output_collection are ignored.
  • The profiler will silently ignore data from system.indexes.

Pipelines:

In addition to the profiler script, JSON Studio comes with a few sample pipelines for analyzing profile information. These are pipelines that can be loaded into the Aggregation Builder and used for producing reports and graphs. The pipelines usually have placeholders which you can use to customize your analysis - such as which collections to look at in terms of runtime information and which timeframes to look at.

The pipelines are provided in the sonarprofiler/Pipelines directory in your installation and can be used both to analyze your data and as starting points for producing new pipelines specific to your needs. For example, most of the pipelines compute only the average response time. If you also require the maximal response time, clone the provided pipeline and add an element to the appropriate pipeline stage.
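
For instance, a $group stage that reports the maximum in addition to the average might look like the following sketch (expressed as a pymongo pipeline stage; the _max_millis field name is an assumption, and the shipped pipelines' exact stage layout may differ):

group_stage = {
    "$group": {
        "_id": {"op": "$op", "ns": "$ns"},
        "_avg_millis": {"$avg": "$millis"},
        "_max_millis": {"$max": "$millis"},   # the added accumulator
    }
}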

Slow avg by day time:

This pipeline aggregates by collection, operation, day and hour and provides the average operation time. It is useful for discovering days and times when operations are running longer on average. A resulting document in the output collection looks like:

{
  "_id": {
    "op": "update",
    "ns": "tweets.t1",
    "day": 4,
    "hour": 13
  },
  "_avg_millis": 1163.6666666666667
}
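
A rough pymongo equivalent of this grouping is sketched below; it is not the shipped pipeline text, db is a placeholder database handle, and computing the day with $dayOfWeek (rather than, say, $dayOfMonth) is an assumption:

pipeline = [
    {"$group": {
        "_id": {
            "op": "$op",
            "ns": "$ns",
            "day": {"$dayOfWeek": "$ts"},
            "hour": {"$hour": "$ts"},
        },
        "_avg_millis": {"$avg": "$millis"},
    }},
]
results = db["system.profile"].aggregate(pipeline)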

This pipeline can be applied both to system.profile and to the collection produced by the SonarProfiler. You can customize the first step in the pipeline to filter only specific operations and collections using regular expressions (by default the regexes are empty and no filtering is performed).

To use this pipeline import slow_avg_by_day_time.txt included in the sonarprofiler/Pipelines directory in your installation using the Aggregation Builder.

Import chart_slow_avg_by_day_time.txt in the Visualizer to plot this data as a scatter plot. The size of each bubble represents the average run time.

Slow avg by day time per query:

This pipeline aggregates by collection, query, day and hour and provides the average operation time. It is useful for discovering days and times when operations are running longer on average. The difference between this pipeline and the previous pipeline is that it is more granular and combines data from the same query. For example, if the system has 100 finds with 2 fields and 200 finds with 3 fields then the previous pipeline will aggregate data from all 300 invocations together while this pipeline will keep them separate. A resulting document in the output collection looks like:

{
  "_id": {
    "ns": "tweets.t1",
    "day": 4,
    "hour": 13,
    "op": "query:{\"retweeted_status.lang\": \"$$\"}"
  },
  "_avg_millis": 40614.625
}

Note that it does not matter whether the find used “retweeted_status.lang”: “en” or “retweeted_status.lang”: “it”.

To use this pipeline import slow_avg_by_day_time_and_query.txt included in the sonarprofiler/Pipelines directory in your installation using the Aggregation Builder. Import chart_slow_avg_by_day_time.txt in the Visualizer to plot this data as a scatter plot. The size of each bubble represents the average runtime.

Slow avg by time range, op and collection:

This pipeline first filters by a time range, an operation, and collection names (the last two defined using regexes), then groups by operation and collection, and then sorts descending by the average runtime. A resulting document has the form:

{
  "_id": {
    "op": "update",
    "ns": "tweets.t1"
  },
  "count": 3,
  "_avg_millis": 1163.6666666666667,
  "_avg_responseLength": 0.0
}
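
The filter-group-sort structure described above can be sketched as follows (a pymongo sketch, not the shipped pipeline text; the dates and empty regexes stand in for the pipeline's customizable placeholders):

from datetime import datetime

pipeline = [
    {"$match": {
        "ts": {"$gte": datetime(2014, 1, 1), "$lt": datetime(2015, 1, 1)},
        "op": {"$regex": ""},                 # operation filter placeholder
        "ns": {"$regex": ""},                 # collection filter placeholder
    }},
    {"$group": {
        "_id": {"op": "$op", "ns": "$ns"},
        "count": {"$sum": 1},
        "_avg_millis": {"$avg": "$millis"},
        "_avg_responseLength": {"$avg": "$responseLength"},
    }},
    {"$sort": {"_avg_millis": -1}},
]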

This pipeline can be applied both to system.profile and to the collection produced by the SonarProfiler. You can customize the first step in the pipeline to filter only specific operations and collections using regular expressions (by default the regexes are empty and no filtering is performed). You must customize the timeframes or you will always get data for 1/1/2014 - 12/31/2014.

To use this pipeline import slow_avg_by_time_range_and_col.txt included in the sonarprofiler/Pipelines directory in your installation using the Aggregation Builder.

Import op_ns_treemap.txt in the Visualizer to plot this data as a treemap.

Slow avg by time range, query and collection:

This pipeline first filters by a time range, a query, and collection names (the last two defined using regexes), then groups by query and collection, and then sorts descending by the average runtime.

A resulting document has the form:

{
  "_id": {
    "op": "query:{\"retweeted_status.in_reply_to_screen_name\": \"$$\"}",
    "ns": "tweets.GEICO"
  },
  "count": 2,
  "_avg_millis": 1096.0,
  "_avg_responseLength": 20.0
}

This pipeline can only be used on the collection produced by the SonarProfiler. As before, it does not matter which literal value the query used (e.g. “Marina” or any other screen name).

You can customize the first step in the pipeline to filter only specific queries and collections using regular expressions (by default the regexes are empty and no filtering is performed). You must customize the timeframes or you will always get data for 1/1/2014 - 12/31/2014.

To use this pipeline import slow_avg_by_time_range_query_and_col.txt included in the sonarprofiler/Pipelines directory in your installation using the Aggregation Builder.

Import op_ns_treemap.txt in the Visualizer to plot this data as a treemap. Depending on how your data looks, it may be beneficial to reverse the roles in the treemap, i.e. set the child to _id.ns and the parent to _id.op. Which chart is more valuable depends on your data and on how many collections there are vs. how many queries.

