Working with the Aggregation Builder

Use the Aggregation Builder application to slice-and-dice information stored in JSON documents. The Aggregation Builder lets you aggregate data, group data, filter data, sort data, and more. Under the covers the Aggregation Builder uses MongoDB’s aggregation framework but rather than asking you to write JSON code it gives you a visual interface by which you can do your aggregations. If you come from the SQL world the Aggregation Builder lets you do operations involving GROUP BY, HAVING, ORDER BY and more – all on any document and without writing code.

The two main entities you will work with in the Aggregation Builder are pipelines and stages. A stage is an operation performed on a collection of documents that yields another collection of documents. Different stages perform different operations – some group data together, some filter data out, some project only certain fields per document and some sort the documents. A pipeline is a collection of stages that are invoked one after the other to transform an original collection of documents to a resulting collection of documents. You use the AnAggregationalytics Builder to build stages and pipelines and see what the results of each stage and of the entire pipeline are.

The Aggregation Builder has four panes as shown below. Use pane #1 to work on the pipeline. In the image below, the pipeline is made up of three stages – the flatten stage, the group stage and the sort stage. When you work on a pipeline you will be working on one stage or another - that’s where panes 2,3, and 4 come in. When you want to work on a pipeline stage you edit the stage you want to work on by clicking the pencil icon under the relevant stage. The stage details are then opened in Pane #3 which is the stage editor. As you are building the stage you will want to test it. Each such stage transforms the collection shown by Pane #2 into another collection that is shown in Pane #4.

_images/dicer1.jpg

When you first open the Aggregation Builder all panes will be empty. If you have already created some pipelines and saved them then they will appear in the Pipeline drop-down (pipelines are stored per user in an lmrm__ collection in the Studio database or in your main database if you did not specify a Studio database). If you want to create a new pipeline enter a new pipeline name into the drop-down/edit box and hit enter. A new pipeline is created with no stages. Next, click the “+” button to add a new stage. You need to specify the type of stage and give it a name. The stage will then be added to the pipeline.

_images/dicer2.jpg

Stage types and when to use them:

  • Group: Use group stages when you want to aggregate multiple documents from the source collection to a single document in the target collection and apply some aggregation function to the documents being aggregated to the one target document (e.g. count, sum, etc).
  • Filter: Use filter stages to decide which documents from the source collection should make it to the target collection. This stage is also sometimes called a match stage. Note that as of release 1.2 finder/match phases support subqueries.
  • Sort: Use sort stages to determine the order that the documents should appear in the target collection. The sort stage also allows you to specify skip and limit values - e.g. to skip the first 1000 documents in the source collection or limit the number of documents to a certain number.
  • Fields: Use fields stages to control how fields from the source documents should make it into documents in the target collection. You can for example determine that only some fields from the source are required in the target or you may want to rename the fields as the target documents are formed from the source documents. This is also sometimes called a projection stage. Additionally, you can unwind an array – i.e. create multiple target documents from a single source document that has an array-type field by making one target document per value in the array. This flattening is sometimes called unwinding.
  • Redact: Supported only in MongoDB 2.6 and up, this stage allows you to specify a redaction operation on source documents.
  • Custom: Use a custom stage to write a JSON object as you would if you were writing code using the MongoDB aggregate command and include it in this pipeline.
  • Out: Supported only in MongoDB 2.6 and up, use this stage to write the results into a collection of your choosing rather than returning a result.

The data grid in Pane #1 displays the entire pipeline. If there are too many stages to fit on a single page you can cursor through the pipeline. The order of the stages matters since each stage of the pipeline operate on the result of the previous stage. You can re-order the stages by dragging a stage and dropping it before or after another stage. As you start dragging a stage, blue drop-slots appear to show you where you can drop the stage as shown below.

_images/dicer3.jpg

Other buttons available at a pipeline level in the top toolbar of Pane #1 are:

  • Save a pipeline
  • Run an entire pipeline
  • View the pipeline code generated by the Aggregation Builder
  • Delete the pipeline
  • Clone a pipeline

Use the buttons underneath each stage in the pipeline data grid to:

  • Delete the stage.
  • Copy a stage (you can then paste the copy using the paste button to the right of the “+” add stage button.
  • View the stage’s code as generated by the Aggregation Builder.
  • Edit this stage (opens the stage in edit mode in Pane #3).
  • Execute this stage and only this stage.
  • Execute all stages up to and including this one.

You can also click directly on the stage name if you want to edit/change it. Whenever you execute a pipeline the run-time is displayed on the lower-right pane and the Run-time Viewer logs the run-time. Use the clock icon to open the Run-time Viewer.

When you are working on a stage you execute it on a collection. When you work on the first stage you specify a starting collection by selecting it from the Starting Collection drop down in Pane #1. This opens up the collection viewer in Pane #2. The operations of this collection viewer is almost identical to the operation of the collection viewer in the Finder. The only difference between the two collection viewers is the operation that happens upon you double-clicking on an item. When you double-click on a JSON element, what happens depends on the stage you are currently editing. In almost all cases that element is added to the definitions of the stage. For example, if you are editing a group stage the entity is either added to define what documents are grouped together or added as an aggregation element.

Most stage editors have a data table layout with some common components as shown below.

_images/dicer4.jpg

Use the “+” button to add an entry to the table manually (entries are usually added when you double click on the collection viewer of Pane #2 but the manual option is always available). Use the “x” button to delete the line. Use the pencil button to edit the entire line of the table and customize it or modify it. For example, when you double click on an element from the collection viewer and the Aggregation Builder adds a line to the table, it creates a name for the new entity automatically. However, you can edit it and modify it to whatever you want it to be.

Let’s look at an example. Suppose that you are editing a group stage shown below and want to add an average of the populations of all the documents being grouped together.

_images/dicer5.jpg

Click on Aggregate values and select the AVG operator. Double click on a field in the collection viewer and a new entry is added to your stage as shown below.

_images/dicer6.jpg

The Aggregation Builder called this new entry _avg_pop. If you do not like this name click the pencil button and change the name as shown below. You can change anything you want for that line - not just the field name.

_images/dicer7.jpg

Additional specific comments for some editor types:

  • Group: Add items either to grouping fields or aggregation values. If you want to use aggregation functions not represented by a radio box edit the text directly.
  • Filter: Add filter expressions. If you want to use a function that is not represented by a radio box edit the text directly.
  • Sort: Add items to order by. You can add more than one item; the first will be the primary sort field and the next will sort within. You can sort ascending or descending. Skip and limit are numeric.
  • Fields: Specify fields to project. You can use dot notation to create sub-documents in the target collection (see an example below)
_images/dicer8.jpg

When you run a stage (or run all stages until a certain point) the results are shown in the collection viewer of Pane #4 (on the right-hand-side). For the most part this collection viewer behaves like any other collection viewer with a few notable exceptions:

  • When you do not specify a $out stage the cursor will not show how many documents in total there are. The cursor running on the aggregation results does not support this.
  • Double-click on the JSON area in the right-hand-side collection viewer (Pane #4) also adds an entry to the stage editor. This is convenient when you are already working on the next stage.
  • The aggregation runtime is shown by the collection viewer. Note that this is the time spent in the database and is sometimes much shorter than elapsed time (especially when an $out stage is not used) since the elapsed time also includes the time spend materializing the documents by the Studio and saving resulting documents to a temporary collection.
  • If you use an $out stage (supported in MongoDB 2.6 and up) then the results are directly inserted by MongoDB into a collection and the collection viewer simply opens a cursor to that collection.
  • While you can always specify a $limit stage as part of the pipeline, for convenience, you can set a limit on the GUI itself. This limits the number of documents that will be materialized into the collection that is shown in the right-hand-side collection viewer. This limit is introduced in all pipelines so long as the last stage is not a $out stage and when the explicit limit is set to any value other than -1. The default value is settable in a Preference setting.

Subqueries

Subqueries are supported as describe in Subqueries .Additionally, the aggregation builder supports a generic value replacement mechanism using subqueries. This mechanism allows you to run an arbitrary find query or pipeline aggregation query, retrieve a result from the resulting document(s) and use it within your query. You can use the syntax in any stage - a $match, a $project and more. The subquery runs first to produce a value and then the aggregation pipeline constructed. For example:

s: {$lt: {$getSubQValue: {$ns: "other_collection", $q: {a: 1}, $p:
"field_name"}}}

or

s: {$lt: {$getSubQValue: {ns: "other_collection", pipeline: "a_saved_pipeline",
p: "field_name"}}}

Formatters

When you produce a report as the output of the aggregation builder, you have the option of adding formatters to the report. Formatters provide visual cues to columns (mostly to numbers). They work on the report whether you are viewing the report within the aggregation builder, as a Gateway call, or even within dashboards. Formatter definitions are part of the aggregation metadata and saved in the studio database, part of export files, etc.

A formatter definition is a JSON document that you enter by clicking on the Table Formatters button in the aggregation builder and editing the JSON. The data entered must be valid JSON or the formatters may be totally ignored and report results omitted (e.g. remember to include double quotes for key values on the left-hand-side of the document). The JSON document comprises of keys which are string representation of the column positions (e.g. “3” for the fourth column of “19” for the twentieth column) and values which specify the formatter to be used. Column positions start with 0. For example, to add a bar formatter to the sixth column use a JSON document:

{ "5": {"BarFormat": {}}}

To add an arrow formatter to the fourth column where any number under 30 will have a down arrow and any number over 30 will have an up arrow as well as a bar formatter to the sixth column use:

{"3": {"ArrowFormat" : {"base": 30}}, "5": {"BarFormat": {}}}

Supported formatters:

  • ArrowFormat - Adds a green up arrow or a red down arrow to numeric fields based on a base number or zero if omitted.

  • BarFormat - Adds a bar chart to numeric fields which a bar. You have control over ranges numbers, base and colors. For example, to add a bar with numbers over 1000 shown as positive/blue and numbers under 1000 shown as negative/red with a range between -5 and 5000 use:

    "6": {"BarFormat": {"base": 1000, "min": -5, "max": 5000}}
    
  • NumberFormat - Allows you to add a prefix, coloring and brackets to numeric fields most commonly used for currency displays. For example to format a number as a dollar quantity and display negative numbers in red with brackets, use:

    "16": {"NumberFormat": {"prefix": "$", "negativeColor": "red", "negativeParens":
    true}}
    
  • PatternFormat - Allows you to format the way a column is displayed by using a template and place holders for field values. This can be used for producing formatted text, hyperlinks, and more. For example, to use the value of column 3 as the target of a hyperlink and a display as what is shown in column 4 use (note that you should not embed single quotes anywhere in the double-quoted string as that will break the parsing of the page):

    {"7": { "PatternFormat": {"pattern": "<a href=http://jsonar.com/{0}>{1}</a>",
    "array": [3,4]  } }  }
    
  • ColorFormat - Allows you to add foreground and background colors to numeric fields based on ranges. Note that ranges must be nonzero in size - for example, even if you have only two values such as 0 and 1 you cannot have a range of [0,0] and [1,1] but rather use [-1,0] and [1,2]. For example to create two ranges where numbers between 0 and 1000 will have red text on a blue background and numbers between 1001 and 10000 will have green text on a yellow background use:

    {"6": {"ColorFormat": [[0, 1000, "red", "blue"], [1001, 10000, "green", "yellow"  ]
    ] }
    

For more information on the JSON subdocuments per formatter type described above see https://developers.google.com/chart/interactive/docs/reference#formatters

Note that DateFormatter is not currently supported; manage date formatting using $project and $dateToString within the pipeline.

In addition to cell formatters, additional formatters provide controls that allow you to highlight embedded/hierarchical structures. These formatters allow you to omit values that repeat in order to highlight when a new category begins and to add a blank line when the main value changes:

  • omitDepth - Specify a column index (starting at 0) - all values up to that column (inclusive) that are the same as the previous row will be made blank. E.g.:

    {"omitDepth": 3}
    
  • spaceAfter - Specify a column index (starting at 0) - if the value for that column is different than the value in that column in the previous row then an empty separator line will be added prior to the row. E.g.:

    {"spaceAfter": 1}
    

IMPORTANT KNOWN ISSUES:

1. Your formatter string must be on a single line or the report will not display.

  1. Do not include single quotes in your formatter strings.

3. When using URLs for patterns (e.g. for a drill down) you must encode double quotes as %22. For example, use:

{"0": {"BarFormat": {}}, "1": { "PatternFormat": {"pattern": "<a
href=/Gateway?name=fl2&col=flights&type=agg&output=report&bind.car=%22{0}%22&limit=100>{0}</a>",
"array": [1]  } }  }

instead of

{"0": {"BarFormat": {}}, "1": { "PatternFormat": {"pattern": "<a
href=/Gateway?name=fl2&col=flights&type=agg&output=report&bind.car=\"{0}\"&limit=100>{0}</a>",
"array": [1]  } }  }
  1. A BarFormat will display a red negative bar in empty lines.

Using Pattern Formatters for Implementing Report Drill-Down in Dashboards

Using pattern formatters you can implement header-details drill-down functionality where clicking on one line of a header report runs a detailed report in another frame in the dashboard.

To implement header-details drill-down:

1. Build the aggregation pipeline for the details query. Typically the query will get one or more values through bind variables when you click on a line in the header report.

2. Compute the URL in the Publish URL window and copy/paste the non-obfuscated URL (you will use this in the PatternFormat). Copy only the URL starting with /Gateway and without the user or database credentials since the drill-down is performed within the dashboard and does not need to re-authenticate.

  1. Build the aggregation pipeline for the header query.

4. Enter a PatternFormat for one of the columns in the formatter string using the URL you copy/pasted in step 2. Enter the values for the bind variables using the PatternFormat placeholders. Use a target to specify the appropriate frame name from the dashboard.

Let’s look at an example using a flight data collection. In this example we have a dashboard with two reports - a header/details report. The header report lists how many flights exist in the corpus per carrier. The second report show all flights for the selected carrier:

1. Create a report with a bind variable for carrier that displays flight data for that bind variable. The pipeline might look like:

{
  $match: {
    $and: [
      {
        "UNIQUE_CARRIER": "$$car"
        }
      ]
    }
  }
  1. Click on “Publish URL” and then on “Compute URL”. The initial URL looks like:

    https://localhost:8443/Gateway?name=fl2&col=flights&type=agg&output=report&bind.car=&limit=100&findlimit=100&graphPoints=100&prettyPrint=1&username=qa&host=127.0.0.1&port=47017&db=test&sdb=test

3. Of this URL keep only (assuming our pipeline is called fl2; copy paste and save it for step 5):

/Gateway?name=fl2&col=flights&type=agg&output=report&bind.car=&limit=1000&keepSession=1

Note: Adding keepSession=1 is required if you want the report to include a “Get More” button when there are more results than the value you specify in the limit.

4. Create a report that groups by carrier and counts how many flights there are per carrier. The pipeline might look like:

{
  $group: {
    _id: "$CARRIER",
      count: {
      $sum: 1
      }
    }
  },
{
  "$project": {
    CARRIER: "$_id.CARRIER",
      "count": 1,
      "_id": 0
    }
  }

5. Create a table formatter that will surround the carrier name with a hyperlink. Use the URL we kept in step #3 and enter {0} to denote the value we pass into the bind variable. In our dashboard frame0 will be the name of the iframe containing the header report and frame1 will be the name of the iframe containing the details report. For example:

{"0": {"BarFormat": {}}, "1": { "PatternFormat": {"pattern": "<a
href=/Gateway?name=fl2&col=flights&type=agg&output=report&bind.car=%22{0}%22&limit=1000
target=frame1>{0}</a>", "array": [1]  } }  }

When you now open the dashboard and click on a carrier the details report (in frame1) will show all flights per that carrier.

Here is a full example with multiple pattern formatters, an arrow and two bars. Note that the pattern formatter changes element {0} hence we added a duplicate column and hid it:

{"0": {"PatternFormat": {"pattern": "<a href=/Gateway?name=__lmrm_predef_outlier_sessions&col=session&type=agg&output=report&published_by=lmrm__ae&bind.dbuser=%22{0}%22&bind.datastore=%22{1}%22&limit=10000 target=_blank>{0}</a>", "array": [0,1]}},
"1": {"PatternFormat": {"pattern": "<a href=/Gateway?name=__lmrm_predef_outlier_raw1&col=instance&type=agg&output=report&published_by=lmrm__ae&bind.datastore=%22{1}%22&bind.dbuser=%22{0}%22&limit=10000 target=_blank>{1}</a>", "array": [12,1]}},
"5": {"BarFormat":{}},
"6": {"BarFormat": {}},
"8": {"ArrowFormat": {"Base": 0}}}

Using Pattern Formatters for Implementing Drill-Down that Change a Bind Variable in Dashboards

When you have bind variables that control reports and charts output, you may want a link on a report to modify the value of a bind variable. This is similar to a drill down except you are not opening a new dashboard / frame. Instead, you are modifying the value of a bind variable on which all other reports and charts depend and causing a screen refresh. You do this using the Gateway requesy type dashbind, i.e. by setting the href as below, and causing a reload of the entire page:

{"0": {"PatternFormat": {"pattern": "<a onClick=window.top.location.reload();
href=/Gateway?name=q&col=q&type=dashbind&bind.collector=%22{0}%22&>{0}</a>",
"array": [0] }}}

Formatters in Offline Reports

When JSON Studio is used in the context of SonarG, offline reports may sometimes be needed rather than Google viz reports (the default). In this case existing formatters cannot be used as these are Google viz features.

When using offline reports you can control formats using CSS. Your reports need to have an additional field in the output that generated a field called rowStyleClass. This value (a string) is then used as a CSS class. There are four built in classes you can use - redback, greenback and yellowback which create a background color and separateline which creates a bold divider between the line and the one below (and is equivalent to the spaceAfter used in online reports). You can add any CSS definition using the customer.css file and you can reference one or more CSS directive using projections for the rowStyleClass field. Using this, you can customize the styles in any offline report.

For example, suppose you have a report that has a field called “% CPU Sniffer” and you want the row to be colored red when this number is over 50 or green if less. Add a projection stage and create a field, and rows will be colored accordingly:

rowStyleClass: {$cond: [ {$gte: ["$% CPU Sniffer", 50]}, "redback", "greenback"   ]}

where redback and greenback are defined in sonar.css to be:

.greenback {
   background-color: #86C67C !important;
   color: #000000 !important;
}

 .redback {
   background-color: #FA5858 !important;
   color: #000000 !important;
}

Note: Offline reports are limited to 250 fields per line.

Table Of Contents

Previous topic

Working with the Finder

Next topic

Working with the Spreadsheet Bridge

Copyright © 2013-2016 jSonar, Inc
MongoDB is a registered trademark of MongoDB Inc. Excel is a trademark of Microsoft Inc. JSON Studio is a registered trademark of jSonar Inc. All trademarks and service marks are the property of their respective owners.