Hitachi Vantara Lumada and Pentaho Documentation

Validator utility


The Lumada Data Catalog Validator utility finds and fixes inconsistent or invalid entries in the Solr repository, such as entries introduced by software defects or by unanticipated conditions in the field.

This utility can verify and fix tag domains, tags, users, roles, virtual folders, and data sources. It is available as a self-contained JAR file in the contrib directory of the Data Catalog installation.

Starting with version 4.4, the Validator also requires one of the following Solr JAR files on the classpath, placed before the Validator JAR:

  • waterlinedata-metadata-solr5-.jar
  • waterlinedata-metadata-solr7-.jar

We recommend invoking the Validator through the validator.sh script in the <LDC App-Server Dir>/bin directory. The script automatically adds the JAR files that the Validator requires to the classpath.

To display usage information and the supported options, run the help command:

<LDC App-Server Dir> $ bin/validator.sh -help

The following options are displayed:

Usage: -help | -verify | -verifyAndFix
           [-dataSource <dataSourceName>   ]
           [-verifyAction <verifyActionName>      ]

<verifyActionName> can be one of the following:
       checkfor-invalid-tag-associations
       checkfor-orphan-tag-associations
       checkfor-tags-immutable
       checkfor-data-source-by-key
       checkfor-invalid-tag-names
       checkfor-invalid-dataset-names
       checkfor-invalid-user-names
       checkfor-invalid-tag-domains
       checkfor-invalid-resource-folder-maps [-dataResourceKey <resourceKey> ]
       checkfor-orphan-resource-folder-maps
       checkfor-duplicate-resource-folder-maps
       checkfor-duplicate-case-sensitive-tags
       checkfor-duplicate-resource-fields
       checkfor-invalid-audit-description
       checkfor-invalid-virtualfolder-definition
       checkfor-incorrectly-created-hive-views
       checkfor-duplicate-entities-with-same-key-attributes [-entityToFix <TagDomain/Tag/User/Role/Source/VirtualFolder> ]
       # Duplicate detection will be done for : TagDomains, Tags, Roles, users, DataSources and VirtualFolders
       checkfor-virtual-folder-run-records [-vfRunRecordsFile <vfRunRecordsFileName> ]
       checkfor-custom-props-in-use [-customPropertyName <customPropertyName> ]

If -verify is specified and -verifyAction is not specified, actions will be picked up from verify.actions file from classpath
If -verifyAndFix is specified -verifyAction is required

Where:

  • -verify looks for inconsistencies.
  • -verifyAndFix indicates an intent to take corrective action on inconsistencies and must be followed by the -verifyAction flag.
  • -verifyAction runs the check named by one of the following actions:
    • checkfor-orphan-tag-associations checks and lists orphan tag associations.
    • checkfor-tags-immutable checks for immutable tags.
    • checkfor-data-source-by-key checks for data sources by key.
    • checkfor-invalid-tag-names verifies if all tag names are valid.
    • checkfor-invalid-dataset-names verifies if all dataset names are valid.
    • checkfor-invalid-user-names verifies if all user names are valid.
    • checkfor-invalid-tag-domains verifies if all tag domain names are valid.
    • checkfor-invalid-resource-folder-maps checks each resource for invalid or incomplete folder maps. It can optionally be followed by -dataResourceKey <resourceKey>.
    • checkfor-orphan-resource-folder-maps checks for folder maps that exist but have no corresponding resource.
    • checkfor-duplicate-resource-folder-maps checks for duplicate folder maps.
    • checkfor-duplicate-case-sensitive-tags checks for duplicate case-sensitive tags.
    • checkfor-duplicate-resource-fields checks for duplicate resource fields.
    • checkfor-invalid-virtualfolder-definition checks for invalid virtual folder definitions.
    • checkfor-incorrectly-created-hive-views checks for Hive views that were created incorrectly and appear as Hive tables.
    • checkfor-duplicate-entities-with-same-key-attributes checks for duplicate tag domains, tags, roles, users, data sources, and virtual folders that share the same key attributes.

      To verify and fix duplicates of a specific entity type, use the -entityToFix flag:

      $ validator.sh -verifyAndFix \
                     -verifyAction checkfor-duplicate-entities-with-same-key-attributes \
                     -entityToFix <TagDomain/Tag/User/Role/Source/VirtualFolder>
    • checkfor-virtual-folder-run-records can optionally be followed by -vfRunRecordsFile <vfRunRecordsFileName>.
    • checkfor-custom-props-in-use checks for resources in which the specified custom property is set. It must be followed by -customPropertyName <customPropertyName>, which names the custom property whose usage is to be verified or fixed.

      With -verify, the Validator finds the resources in which the custom property is set. You can confirm the results in wd-ui.log under <LDC Log Dir> (typically /var/log/waterlinedata/).

      With -verifyAndFix, the Validator resets the value of the custom property in all resources found by the -verify run.

      Important: Run -verify before -verifyAndFix so that custom property values are reset correctly.
  • -entityToFix specifies the type of entity (TagDomain, Tag, User, Role, Source, or VirtualFolder) on which to perform the fix action.
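Because checkfor-custom-props-in-use must be run with -verify before -verifyAndFix, the two runs can be chained so that the fix pass never runs on its own. The following is a minimal bash sketch, not part of the product: reset_custom_property and VALIDATOR are hypothetical names, and skipping the fix when the verify pass fails assumes that validator.sh exits with a non-zero status on failure, which this page does not document.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reset a custom property in the required verify-then-fix order.
# VALIDATOR is an assumption: it defaults to bin/validator.sh under <LDC App-Server Dir>.
VALIDATOR="${VALIDATOR:-bin/validator.sh}"

reset_custom_property() {
  local prop="${1:-}"
  if [[ -z "$prop" ]]; then
    echo "usage: reset_custom_property <customPropertyName>" >&2
    return 2
  fi

  # Pass 1: find the resources in which the property is set (results go to wd-ui.log).
  # Assumes validator.sh exits non-zero on failure; in that case the fix pass is skipped.
  "$VALIDATOR" -verify -verifyAction checkfor-custom-props-in-use \
      -customPropertyName "$prop" || return

  # Pass 2: reset the property on the resources found by the verify pass.
  "$VALIDATOR" -verifyAndFix -verifyAction checkfor-custom-props-in-use \
      -customPropertyName "$prop"
}
```

For example, run `reset_custom_property myProperty` from <LDC App-Server Dir>, where myProperty is a placeholder for your custom property name. You may still want to review wd-ui.log between the two passes; this sketch does not pause for that.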

Note: If -verify is specified without -verifyAction, the actions are read from the verify.actions file on the classpath.

If -verifyAndFix is specified, -verifyAction is required.
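The argument rules in the note (an action is optional with -verify and mandatory with -verifyAndFix) can be enforced in a small wrapper before anything reaches the Validator. This is a minimal bash sketch under stated assumptions: run_validator and VALIDATOR are hypothetical names, and the wrapper is meant to be run from <LDC App-Server Dir>.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around validator.sh; run it from <LDC App-Server Dir>.
# VALIDATOR is an assumption: it defaults to bin/validator.sh and can be overridden.
VALIDATOR="${VALIDATOR:-bin/validator.sh}"

# Usage: run_validator <verify|verifyAndFix> [verifyActionName] [extra validator options...]
run_validator() {
  local mode="${1:-}"
  local action="${2:-}"

  case "$mode" in
    verify|verifyAndFix) ;;
    *) echo "mode must be 'verify' or 'verifyAndFix'" >&2; return 2 ;;
  esac

  # -verifyAndFix is only valid together with -verifyAction.
  if [[ "$mode" == "verifyAndFix" && -z "$action" ]]; then
    echo "-verifyAndFix requires a verify action" >&2
    return 2
  fi

  if [[ -z "$action" ]]; then
    # No action given: the Validator reads its actions from verify.actions on the classpath.
    "$VALIDATOR" "-$mode"
  else
    # Pass any remaining arguments (for example -dataResourceKey <key>) through unchanged.
    "$VALIDATOR" "-$mode" -verifyAction "$action" "${@:3}"
  fi
}
```

For example, `run_validator verify checkfor-invalid-resource-folder-maps -dataResourceKey <resourceKey>` passes the optional flag through, while `run_validator verifyAndFix` is rejected before validator.sh is called.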

The following are some sample commands:

  • Verify duplicate entity existence:

    <LDC App-Server Dir> $ bin/validator.sh -verify \
                                            -verifyAction checkfor-duplicate-entities-with-same-key-attributes
  • Verify and fix all duplicate entities:

    <LDC App-Server Dir> $ bin/validator.sh -verifyAndFix \
                                            -verifyAction checkfor-duplicate-entities-with-same-key-attributes
  • Verify and fix duplicates of a specific entity type:

    <LDC App-Server Dir> $ bin/validator.sh -verifyAndFix \
                                            -verifyAction checkfor-duplicate-entities-with-same-key-attributes \
                                            -entityToFix <TagDomain/Tag/User/Role/Source/VirtualFolder>
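If you prefer to fix duplicates one entity type at a time rather than all at once, the last sample command can be repeated for each -entityToFix value listed in the help output. This is a bash sketch only: fix_duplicates_by_entity, ENTITIES, and VALIDATOR are hypothetical names, and stopping at the first failure assumes that validator.sh exits with a non-zero status on error.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: fix duplicate entities one type at a time.
# VALIDATOR is an assumption: it defaults to bin/validator.sh under <LDC App-Server Dir>.
VALIDATOR="${VALIDATOR:-bin/validator.sh}"

# Entity types accepted by -entityToFix, as listed in the validator help output.
ENTITIES=(TagDomain Tag User Role Source VirtualFolder)

fix_duplicates_by_entity() {
  local entity
  for entity in "${ENTITIES[@]}"; do
    echo "Fixing duplicate ${entity} entities..." >&2
    # Stop at the first failing run (assumes validator.sh exits non-zero on failure).
    "$VALIDATOR" -verifyAndFix \
        -verifyAction checkfor-duplicate-entities-with-same-key-attributes \
        -entityToFix "$entity" || return
  done
}
```

Processing one type per run makes it easier to see which entity type a failure belongs to; running the command without -entityToFix remains the simpler option when that is not needed.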

Note: In Kerberized environments, make sure that the Kerberos ticket is valid before running the Validator.
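In a Kerberized environment, a quick pre-flight check can stop the run early when no valid ticket is cached. This is a minimal bash sketch: require_kerberos_ticket is a hypothetical name, and it relies on `klist -s` from MIT Kerberos, which produces no output and exits with a non-zero status when no credentials cache is found or its tickets have expired.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for Kerberized clusters.
# `klist -s` (MIT Kerberos) prints nothing and exits non-zero when there is
# no credentials cache or the tickets in it have expired.
require_kerberos_ticket() {
  if ! klist -s 2>/dev/null; then
    echo "No valid Kerberos ticket found; obtain one with kinit first." >&2
    return 1
  fi
}
```

For example, `require_kerberos_ticket && bin/validator.sh -verify` runs the Validator only when a valid ticket is present.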