Working with the Schema Analyzer

Use the Schema Analyzer to explore a collection’s schema or even to explore the schema for results of a find query.

While JSON databases are often called schema-less, the more appropriate terminology is flexible schema. Each document can have a different structure. However, in a majority of use-cases most documents in a collection have a consistent schema (or small number of schemas in case of schema evolution). The Schema Analyzer uses data created by the sampler (the Schema Analyzer) in order to show you an approximation of the collection’s documents’ schemas.

Navigate to the Schema Analyzer either using a navigation link or from one of the collection viewers.

When you navigate to the Schema Analyzer from one of the collection viewers you are able to review the schema for the entire collection. If you navigate to the Schema Analyzer from the result set panel in the Finder then you can view the schema (fields and types) for the results of a find query. You can also select any saved query (saved search) from the pull down and compute the schema for that result.

The Schema Analyzer has two panes as shown below.

_images/schemer1.jpg

The main pane shows a tree with the fields in the documents. These come from a sampling of documents and may be a superset of a particular document and a subset of the union of all fields in all documents. For each field the table shows the distribution among the various types. By clicking on the blue box with the percentage value you can issue a query against that collection to return values of that specific type; the result is shown in the Typed Values tab in the right tab view. You can also use the drop down to view distinct values for a field (when viewing collection schema details) or to see which document IDs do not have a value for a specific field - this is useful for data cleansing tasks.

Other buttons on the Schema Analyzer (from top-left to bottom right):

  • Select the collection for which to display the schema from the pull-down menu. When viewing result set schemas you can select a saved query.
  • Build or refresh the selected collection’s metadata using the quick method or the more comprehensive method.
  • Ping the database server.
  • Cancel a long running operation.
  • Expand all sub-documents in the schema super-set.
  • Collapse all non-top-level sub-documents in the schema super-set.
  • Refresh the main panel.
  • Create a stacked bar chart for the schema fields.
  • Create a report for the field names and types.
  • When values are shown in the lower-right hand pane, they are presented as a hyperlink. If you navigated to the Schema Analyzer from the Finder or from the Aggregation Builder, clicking on such a value has a similar effect to that of point-and-click on a collection viewer and an item will be added to the Finder’s additional where or to a Aggregation Builder stage’s data table.

Note that the main analysis of schemas should be done using the Python Sampler. The Java sampler is an approximation that computes valid metadata but does not handle incremental updates, throttling, comprehensive scans, or updates based on partial queries. It can be used during development and is good when docuemnts are relatively uniform or static - but for best results the Python Sonar Sampler utility should be used.

Casting using the Schema Analyzer

If you are using JSON Studio version 1.2 and up, you have a casting feature that allows you to convert data stored as strings to numbers or dates. For example, if a document has a string such as “Fri Apr 26 19:20:23 EDT 2013” then more likely than not this should be included as a datetime in the document. It is typical for such mishaps to occur when loading data and the Schema Analyzer gives you a simple way to cast the string data to the right type. Click the toggle casting button at the top right of the schema panel and a set of cast pull-downs will appear in the string columns. Select which type to convert to and casting will start.

A few notes to remember:

  • The documents will be updated so take care when doing this.

  • Casting may fail in which case the original data stays in the document.

  • Only immediate objects are converted. For example, if you have an object such as:

    {
    "_id" : ObjectId("52e7e1d718b249c832674b06"),
    "a" : "55",
    "b" : 2,
    "cc" : [
          [
       "5",
       "6"
    ],
    "7",
    "a"
    ]
    }
    

    and you choose to convert cc to Int then only the “7” will be converted. On the other hand, if you have an object such as:

    {
    "_id" : ObjectId("52e7e1d718b249c832674b06"),
    "a" : "55",
    "b" : 2,
    "cc" : {
       "ccc":
          [
       "5",
       "6"
     ],
    "cca": "7",
    "ccb: "a"
    ]
    

    and you choose to convert cc.ccc then both the 5 and the 6 will be converted.

  • Casting to a date requires you to select a date format. Various data formats are provided in the Preferences dialog. If you need a format that does not appear in the Preferences contact support@jsonar.com. Note also that the data format preference is not saved and is good for a single session only.

Adding Bind Variables using the Schema Analyzer

After you run a distinct query (by clicking on the flag icon), the Distinct values tab will show all distinct values. Click on the $$ icon at the top of the tab view to add a bind variable with all distinct values as options. After clicking the $$ icon click on the Bind link at the very top to name the variable.

See Binding Variables for more on bind variables.

Adding Aliases using the Schema Analyzer

Click on the comment icon next to each field to add an alias for this field. By default the alias is created for this collection only and the value initially given is identical to to the name. Click on the Aliases link at the very top to give the alias a meaningful value and, if you want the alias to apply to all collections, delete the collection component from the alias name.

See Working with Aliases for more on aliases.

Table Of Contents

Previous topic

Working with the Visualizer

Next topic

Schema Analyzer/Sampler

Copyright © 2013-2016 jSonar, Inc
MongoDB is a registered trademark of MongoDB Inc. Excel is a trademark of Microsoft Inc. JSON Studio is a registered trademark of jSonar Inc. All trademarks and service marks are the property of their respective owners.