Using the Gateway (External API)

Once you build queries and aggregation pipelines, you may wish to open them up for others to use. JSON Studio includes an external gateway through which users can invoke a saved query/pipeline and get the results. This external gateway creates a Web service for each such query/pipeline and is invoked using HTTP.

The Gateway allows you to invoke a saved query or a saved pipeline using an HTTP call. You can pass in values for bind variables and you can get the results back in json format, as a CSV, or you may choose to insert or upsert the results into a collection.

API calls are made over HTTPS (always encrypted since you need to supply credentials for the system to authenticate with the database). A call provides a set of parameters as name-value pairs and can be made as a GET or a POST request. You can use your browser, curl or any application that can access a URL.

Results are either returned as the response to the call (over HTTPS) or saved to a collection of your choice. If you choose to receive the results on the response itself, they can be returned as an array of JSON documents or as CSV. When choosing a CSV it is the first document of the result set that determines which columns are created. If you choose a response over HTTPS, the size of the response is limited by a preference value (more on that later).

Two examples of an invocation are shown below - one as a browser URL and one using curl:


curl --ssl -k https://localhost:8443/Gateway -d name=two+phase
   -d col=world_bank_mod1 -d host=localhost -d db=tweets -d sdb=lmrm -d username=user1
   -d pwd=fbfj438h -d type=agg -d bind.sec="Technology" -d output=json

In the first example we are connecting to the database tweets on localhost using the user user1. We are also connecting to the database lmrm in the same instance. From lmrm we load the saved query called g3 and we run it on the GEICO collection. The query has a bind variable called loc (“$$loc”) and we bind the value “Chicago” to that variable before running the find. The results are returned as a CSV file.

The second example is similar but in this case we run the aggregation pipeline called “two phase”.

The quickest and simplest way to build Gateway URLs is within the Studio (if you are running version 2.6 and up). In the Finder, Analytics and Visualization applications you have a “Publish URL” button. When you click this a dialog as shown below appears.


This dialog already embeds your saved name and more. It will also present the bind variables with which you parameterized the query or pipeline. Normally you would leave those empty for the end-user to fill in but if you want the API to embody hard-coded values fill this here. You can also choose to let the API run within your account or have the end-user specify their own username/password. When you are ready click the Publish URL button and two URLs will be computed for you in the lower part of the screen:


The first URL has the name-value pairs as a GET request and the second encrypts all the parameters in a single q=<value> pair. Both URLs will act as a Gateway call. There are two advantages to using the encrypted URL: 1. Values are not exposed, and 2. If values such as username, password and bind values are (intentionally) omitted then the end-user will be challenged for the missing values using a simple HTML form. Note that you can also mix the two - e.g. have a set of name value pairs in addition to the q name-value. In this case the additional name-value pairs override what’s in the q name-value.

If you want to experiment with API calls through the gateway until you have the right name-value pairs, use the Gateway submission form (replacing localhost for your gateway’s location where you installed it):


This form includes the most common Gateway parameters which you can use rather than building URLs. Once you are pleased with the results you can click on the “Show URL” button to get the name-value pairs.

The parameter output can take one of three values: json, csv or a string of the form collection.<col name>. For example, if you specify output=collection.out1 then all results are inserted or upserted as new documents into the out1 collection in the database where you are running your query. You must have write privileges to do this. This pattern allows you to run any query or pipeline and then access the results directly from MongoDB. When you choose this option you should supply another parameter to the Gateway called desc (e.g. desc=startsA). Every document inserted into such a collection has two additional field which help you identify which documents are part of an API call. An example of such a document is:

"_id": { "$oid": "52f98316da0631f9624d7851" },
"Country Name": "Andorra",
"Country Code": "AND",
"lmrm__name": "startsA",
"lmrm__start_time": 1392083734291

lmrm__name will have the value you pass in for desc and lmrm__start_time the number of milliseconds since epoch at which you submitted this call to the Gateway.

The full set of parameters you can use in a Gateway API call are:

  • type - find, agg, list, insert, update or eval. The default is find. Use agg for aggregation pipelines. List returns all the available saved queries, pipelines and graphs that can be used in invocations. Use insert or update to stream JSON data into a collection (not from the result of the query - e.g. in an Internet of Things (IoT) type application). See Write Operations below for more information. Use eval to run Javascript code on the server using db.eval.

  • col - Name of the collection on which to run the query or pipeline. Mandatory.

  • name - Name of the saved query or pipeline. Note that the system searches the database specified by the sdb parameter for these definitions and if sdb is omitted it will look for them in the main database (specified by the db parameter). Mandatory

  • output - this parameter controls the output and has many options:

    • json - produces a JSON array as the result. This is the default.

    • csv - outputs the data in comma delimited format. When generating a CSV you can also use the filename=<file name> parameter to control the name of the generated CSV file.

    • perl - outputs the data in a Perl dictionary format (that can be passed to an eval function).

    • collection.<collection name> - instead of outputting the data, the result documents are inserted or upserted into a collection in the database.

    • report - produces a tabular report with the results.

    • graph.<graph name> - produces a Web page with the saved graph. Note - graphs are not consistently produced from the Gateway for IE9. IE11, Chrome, Firefox and Safari are all reliable in producing graphs through the gateway. Graph pages can then be converted into an image on the client - see

    • schema - produces a report for every field in the output of a query or an aggregation pipeline. Each line in the report lists the name of the full path, how many documents contained this field, what percentage, as well as statistics on types. Note that when you have arrays the percentages and counts can be larger than the total size of the collection.

Normally when you specify output=collection.<collection name> then each result document is inserted into the collection. If you wish an upsert to be performed instead you need to specify the fields that will be used for the match condition. For example, if you want to update the output collection using fields called first_name and last_name then you need to specify two additional named value pairs such as:


If the field name contains spaces or other such characters enclose the field name in single or double quotes. If there are no update.x=<field name> specifications then the Gateway will do an insert. When an update is performed it is always an upsert with multi=false.

Important note: Do not use curl with the -L flag to get a graph or a report. Curl will get the HTML page but not the graph data and thus will not give back the connection thread until 30 minutes pass and thus will consume a concurrent connection unnecessarily. You must let the call to get the data go through if you are retrieving graph data as that call does give back the connection.

  • dropTarget=force - When you use output=collection.<collection name>, this will cause the target collection to be dropped before insertion starts.
  • writeSdb - If the output is collection.<collection name> and results are written as documents into a collection, then normally it is written to the same database where the source collection is. If you set writeSdb (any value) then the collection is written into the Studio database.
  • desc - A string used for tagging documents inserted into the output collection when using output=collection.<collection name>. This value will populate a lmrm__desc field (along with a lmrm__start_time filed) in every document inserted. If not specified, then neither lmrm__desc nor lmrm__start_time are added and you will not know which documents came from this run. Sometimes you do not care - e.g. when chaining multiple calls or when you delete the collection every time before running the API.
  • limit - A number limiting how many documents are returned on an HTTP response. This is to avoid too large responses. There is a second limit that is controlled not by the API caller, but by the API definer. Within the Studio, there is a preference value that can be set in the Aggregation Builder preferences for limiting these responses. The minimum of these two limits controls the size of the response. The system will read this preference (from the sdb database). You should always specify a limit when running a report. Graphs have a separate parameter controlling them called graphPoints.
  • prettyPrint - Any value will cause JSON output to be indented.
  • trace - Any value will cause query information to be logged to lmrm__json_studio_runner_times not just for the main query (which is always logged) but also for any queries done as part of a join. This is useful for debugging joins. Trace also adds progress messages in the warnings array and can help jSonar support with diagnostics.
  • findlimit - An integer. If added and if you are invoking a find then a limit clause will be added at the end.
  • graphPoints - An integer. If output=graph.<name>, specify how many points to plot.
  • dropSource=force - Any value means that after running the API the source collection will be dropped! Take care to only use this when you chain APIs and use interim collections.
  • secondaryPref - Any value will allow connecting to a secondary of a replica set and prefer a secondary if the host value passed in is a replica set.
  • sessionTimeout - In seconds; tells the Gateway not to use the default expiry limit (default is 1800 seconds - 30 minutes). Use this when an API call may take a very long time and you do not want it to fail in the middle (e.g. creating a join).
  • published_by - A user name that published the API. This is an optional parameter and you will usually not use it. If only one user published a query/pipeline.graph by that name then it will be used. But if more than one user published such an entity under the same name then to disambiguate the API you need to specify which user’s definition you want to load.
  • factor - used only in the case of graph output and only for bars, columns, pies, lines and scatter plots. Allows resizing of the generated d3 graph. The default is 1 and does not need to be supplied. Use 1.3 to make 30% larger or 0.9 to make 10% smaller.
  • date_format - A string used to format items of type ISODate. Default if missing is yyyy-MM-dd HH:mm which is recognizable by Excel.
  • autoCancel - A number in seconds to auto cancel the query if it runs over that number of seconds. Set to 0 for no auto cancel. Omit to let the user’s preference determine the value.
  • showExceptions - By default if there is an error or an exception while processing the request the reply will be “No Data Found”; add showExceptions=1 to see details on why the request failed.

Additionally, you need to supply the login credentials to the database. These are exactly the same format as used by the login screen. The parameter names are:

  • username
  • pwd
  • host (Mandatory)
  • port (27017 is the default)
  • db (Mandatory)
  • sdb
  • ssl (true|false. false is the default)
  • anyCert (true|false. false is the default)
  • krb (true|false. false is the default)
  • ldap (true|false. false is the default)
  • keepSession - this flag is used when you plan on issuing many Gateway calls and don’t want to login every time. In this case issue a single Gateway call with login credentials and keepSession=1. Subsequent calls to the Gateway can then be made without supplying any credentials. Note that subsequent calls CANNOT include any data pertaining to the above - i.e. subsequent calls can only be made to the same login setup. If you supply any value at all for any of the above 9 parameters then a new session will be created and the earlier one expired. Also note that a session that is inactive for 30 minutes (or whatever the timeout parameter set for Tomcat) will be expired and you will need to re-login and that while you have not expired the session (through a timeout or a new value for any of the 9 parameters) you will be consuming one of the Gateway’s concurrent licenses. You can specify keepSession=1 or any value. If you wish to exit a session that you have marked to be kept simply issue new login credentials.

NOTE: keepSession plays an additional role in aggregation-based reports. Normally, when you use an API call that generates a report, there is a limit on how many rows are returned (typically 100, 1000, 10,000 etc.). Built-in cursoring exists in each such report - e.g. if your limit is 10K you will see 100 rows ata time and thus have 100 pages. But the cursoring is all running versus a local cache and no cursor is maintained on the database. When you use the same API call for the report but add keepSession=1 the cursor on the database will be preserved and a Get More button added to the report page - so you can still cursor through the local cache but then can retrieve another chunk of documents from the server into the local cache. This is only available for aggregation-based queries.

  • closeSession - this parameter forces the Gateway to close the session and discard any session information. Use this alone - i.e. no other parameters should be used with this one and the call should look like https://<host>:8443/Gateway?closeSession=1. This is only relevant when you have previously used keepSession and want to release this session to the counted pool.
  • When you use keepSession within a chart, the default is to create a “View Report” button that shows the underlying data for the chart in a report. This however runs the query a second time to create the cursor. To prevent this button and additional query while maintaining the session (e.g. for a single sign-on), use skipRepButton=1.
  • mapCredentials - any value will cause the Gateway to try to map the passed credentials if a mapping exists, or uses the passed values if it does not exist. This is useful when mapping user credentials to a functional ID used by the database. See Credentials Mapping for more information.

Finally, for each bind variable in the saved query/pipeline you need to bind a value using the form bind.<var name>=<var value>. Don’t forget to include double quotes to a value that is a string. For example, bind.myvar=7 will bind the numeric value 7 while bind.myvar=”7” will bind the string value “7”.

Latency and throughput: A call to the gateway takes approximately 20-30ms. The call is of course dependent on the query time and the time it takes to connect to the database. With a simple query and a connection over a LAN each call takes 20-30ms. Throughput depends on the hardware on which the Studio is running. On a dual-core i5 there was no issue sustaining 500 calls per second.

Published APIs

By default you can only invoke and use queries/pipelines/graphs that you construct. The credentials you pass into the Gateway must match the ones under which you save the entities in the Studio.

To allow a power-user to build an API that can then be used by other users/roles you can publish your saved entities using the Studio. When saving a query, pipeline or graph definition check the “Publish” checkbox. When doing so you grant any user the ability to use this entity. Checking this checkbox in the Studio also controls the delete option - if you delete a saved query for example and the Publish checkbox is checked then the public API definition is also deleted.

Entities are published under a name and the user name that published the entity. This means that if you use the same Studio DB for multiple applications/instances that have different users but they share a name, then the published data (and the saved data for that matter) may have collisions among different users.

When you make an API call through the gateway the order by which a definition is loaded is as follows (in the example below, assume the user name is u1 who is invoking a pipeline named p1):

1. First the entity is searched in the private definitions. I.e. if u1 has a saved pipeline by the name of p1 then it will be used.

2. If one does not exist, the Gateway checks whether a pipeline by the name of p1 has been published. If the API call specifies a publish_by flag then the Gateway loads the definition published by that user (e.g. if published_by=u2 it will try to load the definition by u2). If one does not exist it will return an error.

3. If a published_by was not specified and there is only one entity published under the name p1 then it will load that entity and use it.

4. If no entity p1 was published or more than one was published and no published_by value specified then the Gateway will return an error.

Write Operations

As of version 2.2 of the Gateway, write operations are supported. Using the Gateway you can update and insert to a collection. Write operations use JSON documents. The Gateway does not perform any validation of these documents and returns error messages (if they exist) as returned by the underlying database.

Parameters available for inserts (type=insert):

  • insertJson - a JSON document to insert

Parameters available for update (type=update):

  • updateJson - a JSON document to use in the update (e.g. using a $set).
  • updateQuery - a JSON document used to determine which documents in the collection to update.
  • upsert - any value will do an upsert; omit for upsert=false.
  • updateMulti - any value will set a multi update; omit for multi=false;

You can transform incoming documents before they are inserted or used for an update. You do this by writing a Java user-defined transformer. A transformer is a class that implements the following Java interface:

package com.jsonar.util;

public interface JSONTransformer {

    * Returns string or DBObject to be used in the various write operations.
    * Allows application-specific transformations.
    * Return null if should not continue.

   public static enum WriteType { INSERT, UPDATE, UPSERT };

   // Used for inserts
   public String transform(String input, String collectionName, WriteType wt, HttpServletRequest req);
   public Object transform(Object input, String collectionName, WriteType wt, HttpServletRequest req);

   // Used for updates
   public String[] transform(String input, String query, boolean upsert, boolean multi, String collectionName, WriteType wt, HttpServletRequest req);
   public Object[] transform(Object input, Object query, boolean upsert, boolean multi, String collectionName, WriteType wt, HttpServletRequest req);


The default implementation is a trivial one that performs no modification:

package com.jsonar.util;

public class DefaultJSONTransformer implements JSONTransformer {

   // Used for inserts
   public String transform(String input, String collectionName, WriteType wt, HttpServletRequest req) {
      return input;

   // Used for inserts
   public Object transform(Object input, String collectionName, WriteType wt, HttpServletRequest req) {
      return input;

   // Used for updates
   public String[] transform(String input, String query, boolean upsert, boolean multi, String collectionName, WriteType wt, HttpServletRequest req) {
      String[] ret = {input, query};
      return ret;

   // Used for updates
   public Object[] transform(Object input, Object query, boolean upsert, boolean multi, String collectionName, WriteType wt, HttpServletRequest req) {
      Object[] ret = {input, query};
      return ret;


You can implement your own transformer for any manipulation you may need to do on the documents before returning the document (or string) that is later used for insertion/update. These transform implementations can do much more than just change the document - for example, they can be used to implement streaming analytics functionality where values are computed based on documents coming in over the wire.

Note that in your class you should implement all four methods. Even if you want nothing special to occur, copy/paste the trivial implementation from above to return the input. Also note that the transform method receiving the string is always called first and then the transform method that receives the DBObject is called.

Once you implement the class, compile it and place it under jsonar/sonarFinder/WEB-INF/classes, edit jsonar/sonarFinder/WEB-INF/web.xml and replace your fully-qualified class name in:


The flow of the Gateway processing and your transformation class for insert is as follows:

  • The insert JSON string is retrieved from the HTTP request.
  • The string is passed to the first transform method above (in your class).
  • If you return null, no more processing will occur.
  • If you return a string, it is parsed and a JSON object created from the string.
  • The JSON document is then passed to the second transform method above (in your class).
  • If you return a null no more processing will occur.
  • If you return a document it is inserted into the collection.

Note that in an insert you can also pass in an array of documents/objects and they will be inserted using bulk insert in MongoDB version 2.6 and up.

The flow of the Gateway processing and your transformation class for update is as follows:

  • The update JSON string and the query JSON string are retrieved from the HTTP request.
  • The strings are passed to the third transform method above (in your class) along with all other parameters of the call (e.g. whether to upsert or not).
  • If you return null, no more processing will occur.
  • If you return a value it should be an array fo two strings - the first one being the new string to be used for update and the second is the string to be used for the query of what to upfate. These strings are parsed and JSON objects created from the strings.
  • The JSON documents are then passed to the fourth transform method above (in your class).
  • If you return a null no more processing will occur.
  • Otherwise, you should return two DBObjects and they will be used - the first for the update and the second for the query of what documents to update.

Note that for updates the document must be a JSON document and cannot be an array.


JSON Studio supports joins through the Gateway. You can build and design queries that do joins with additional collections (or the same collection). Running a query that contains joins is only possible through the Gateway - not through the interactive Studio.

Note: MongoDB does not support joins. The implementation of joins in JSON Studio therefore makes multiple find calls to build the result set. Be aware that after applying the match elements, the Studio will make additional find calls for every document in the starting collection’s result set to create the final documents that will include data from the joined collections.

JSON Studio support for joins is useful when you want to combine data from more than one collection. You may need data from more than one collection in order to query or compute something, but apart from writing code you have no way to iterate through two collections. The pattern Mongo asks you to follow is that of containment or embedding (sometimes also called de-normalization). Even if you did not design your collections with this in mind you can still use JSON Studio to put this together. The steps involved are:

  • Design the query that will give you the document(s) you want to add as subdocuments from the secondary collection, test it and then parameterize it.
  • Design the main query on the main collection and add a JOIN section to the projection clause.
  • Use the Gateway to run the join. If you just want the result use the output of the Gateway. If you need the results as a new collection use the Gateway’s option to insert the results into a new collection.

Let’s look at a full example. Start with a collection called projects that holds data about projects funded by the world bank. Each document takes the following form:

"_id": { "$oid": "52b213b38594d8a2be17c780" },
"closingdate": "2018-07-07T00:00:00Z",
"countrycode": "ET",
"countryname": "Federal Democratic Republic of Ethiopia",
"countryshortname": "Ethiopia",
"totalamt": 1.3E8

In a separate collection called dp1 we have gdp data by country. Each document in this collection takes the form:

"_id": { "$oid": "531927d642b13294d2d21c89" },
"1994": 17342.471939574487,
"1995": 16441.38125963514,
"1996": 16586.192297090263,
"1997": 17927.412026390564,
"1998": 19081.570288413997,
"1999": 19356.42086514723,
"2000": 20619.565849638784,
"2001": 20671.545738501973,
"2002": 20433.654108816663,
"2003": 20834.93971012373,
"2004": 22566.682157632404,
"2005": 23302.831988005506,
"2006": 24015.420612270114,
"2007": 25921.282140539228,
"2008": 27549.34573191061,
"2009": 24639.935326015646,
"2010": 24289.141516132615,
"2011": 25354.7824741086,
"Country Code": "ABW",
"Country Name": "Aruba"

Because they are in separate documents, we cannot query and aggregate this data together. For example, if we wanted to use the Aggregation Builder to group project information only for countries over a certain GDP threshold we would not be able to. We would need the data to be embedded, e.g. we would need each document to look something like:

 "_id": {
  "$oid": "52b213b38594d8a2be17c780"
  "closingdate": "2018-07-07T00:00:00Z",
  "countrycode": "ET",
  "countryname": "Federal Democratic Republic of Ethiopia",
  "countryshortname": "Ethiopia",
  "impagency": "MINISTRY OF EDUCATION",
  "totalamt": 1.3E8,
  "gdp data": {
       "_id": {
      "$oid": "531927d742b13294d2d21cd2"
      "1994": 124.55028984006057,
      "1995": 133.3778456526254,
      "1996": 144.22964451211635,
      "1997": 146.75139689788517,
      "1998": 129.41342202476415,
      "1999": 121.99507364700685,
      "2000": 122.65703782705778,
      "2001": 119.00835379315724,
      "2002": 110.27641437726285,
      "2003": 117.69763222093768,
      "2004": 134.39487997738743,
      "2005": 159.96867809189058,
      "2006": 191.7685451102841,
      "2007": 236.0982162942025,
      "2008": 309.95228313748186,
      "2009": 332.33055631420217,
      "2010": 302.098004655256,
      "2011": 335.0028859670898,
      "2012": 453.56899273923506,
      "Country Code": "ETH",
      "Country Name": "Ethiopia"

Normally you would have to write code to do this - but starting from release 1.3 of JSON Studio you can do this using the Gateway. What the Gateway does when invoked is iterate over the projects collection and for each document retrieved it runs an additional find on the dp1 collection (passing in the country name from the document retrieved), updates the document with the result of the second find, and then uses this new larger document for the output. The steps required to build this are:

  • Build a query on dp1 that just filters based on the country name. Rather than hard-code the country name, parameterize it. The Additional where condition is:

    "Country Name" : "$$country"
  • Call this query dp1_by_country and save it.

  • Now open the projects collection and build a new query. In this query there is just an Additional select clause that you can get by selecting the JOIN radio button and double clicking on countryshortname. Then edit the placeholders until it looks like:

    "gdp data" : {ns: "dp1", q: "dp1_by_country", "$$country": "countryshortname"}

    What you are saying is that you want a new field called “gdp data” to be created that is the result of running the dp1_by_country query on the dp1 collection and that you want the $$country bind variable to be bound to the value of countryname from each document in the main collection.

  • Call this query projects_join and save it.

  • Using the Gateway run the query, e.g. using:


    You will get the joined documents. If you want these documents saved to a new collection add output=collection.<new collection name> to the Gateway call.

Joins in JSON Studio are very flexible. Specifically, you can:

  • Create more than one join using a number of JOIN clauses in the projection area.
  • Use any number of fields for the join conditions
  • Use various data types and operators for creating the join conditions
  • Use any depth (dot notation) to map the join condition

Note: Joins can only be output as JSON data, CSV data or inserted into a collection. Use output=json, output=csv or output=collection.<name> only for joins.

Join time depends on a multitude of factors such as the speed of the mongodb database, the complexity of the joins, the network speed and more. As an example, a single Gateway thread on an Intel i7 doing a join with 3 collections where the main document has a few tens of fields and the two joined collections have a few fields will process at a rate of around 500 documents per second. A single thread on an Intel i7 doing a join with 4 collections where the main document has 20 Million documents each document taking ~500 bytes and each joined collection has between 3000 to 5000 documents each and each resulting join document takes ~1000 bytes will process at a rate of around 300 documents per second.

Table Of Contents

Previous topic

Binding Variables

Next topic


Copyright © 2013-2016 jSonar, Inc
MongoDB is a registered trademark of MongoDB Inc. Excel is a trademark of Microsoft Inc. JSON Studio is a registered trademark of jSonar Inc. All trademarks and service marks are the property of their respective owners.