Migration utility

Lumada Data Catalog's Migration Utility was primarily designed to export glossary definitions from a test environment to a production environment, eliminating duplicate curation work in production, assuming the seeded resources are common across the two environments.

Today, this Migration Utility functions more like a data model mapper, where you can export/import:

  • The basic definitions of the Data Catalog assets (the data model) between environments with different data mappings.
  • Most of the asset metadata between environments with matched data mappings.
Note
While exporting metadata has no sequence restrictions, importing metadata requires a specific import sequence due to internal dependencies, which is described in the Metadata files section.

The following sections describe the metadata that can be exported and imported.

Export metadata

You can export metadata from Lumada Data Catalog 2019.3; the exported metadata is saved in CSV format. The Migration tool performs file-level access checks when exporting metadata; that is, the user defined in the config file requires access to all the files being exported for the export utility to work as intended.

The export utility is part of ldc-tools-x.x.x.jar, and a typical workflow to export metadata looks like the following:

Procedure

  1. From the command prompt, cd into the Data Catalog app-server installation directory (typically /opt/ldc/app-server).

  2. Run the following command to encrypt the password of the user exporting the data (the same user that will be specified in config.properties).

    Make sure that the encrypt command has execute permissions for the user running this command:

    <App-Server Dir>$ bin/ldc-util encrypt

    The utility will prompt for the text password that needs to be encrypted.

    A sample looks like the following:

    <App-Server Dir>$ bin/ldc-util encrypt
    Enter text:
    enc(rO0ABXcIAAAMX+AYIVOACAAB4cAAAABBEE7Z8Cn8miwpTRz7IgD35cHQAFEFFUy9FQ0IvUEtDUzVQYWRkaW5n)

    Note

    Use the entire string enc(rO0ABXcIAAAMX+AYIVOACAAB4cAAAABBEE7Z8Cn8miwpTRz7IgD35cHQAFEFFUy9FQ0IvUEtDUzVQYWRkaW5n) as the encrypted password.

    If your password contains special characters ($, #, and so on), enclose the password in single quotes, as below:

    <App-Server Dir>$ bin/ldc-util encrypt

    Enter text: 'water$LinData'
  3. Edit config.properties under the contrib directory (<WLD_AppServer>/contrib/config.properties) and update the following fields to reflect your environment values:

    #Modify this configuration to match your environment.
    url=<Data Catalog URL> (e.g. http://<hostname>:8082/api/v2)
    user=<Data Catalog Administrator username> (e.g. waterlinesvc)
    #Encrypt the password using the ldc-util encrypt command and put it here.
    password=<Data Catalog Administrator encrypted password obtained using the encrypt command in the ldc-util script>
    sourceMapping=false
    importVersion=40
    exportReviewsAndFavorites=true
    confPath=<WLD_AppServer>/conf/configuration.json
  4. Save and exit.

  5. In the command window, navigate to the Data Catalog Installation app-server directory.

  6. Run one of the following commands, providing the path where the newly exported metadata files will be saved (typically /opt/ldc/to_import/):

    • For HTTP

      <APP-SERVER-HOME>$ java -cp contrib/ldc-tools-6.0.1.jar:/opt/ldc/app-server/conf/ \ 
                              com.hitachivantara.datacatalog.tools.ie.Utils \
                              -c contrib/config.properties \
                              -fp <name prefix string for saved exports> \
                              -a export \
                              -S -T 60
    • For HTTPS

      <APP-SERVER-HOME>$ java -Djavax.net.ssl.trustStore=<Path to Java key store file> \
                              -cp contrib/ldc-tools-6.0.1.jar:/opt/ldc/app-server/conf/ \ 
                              com.hitachivantara.datacatalog.tools.ie.Utils \
                              -c contrib/config.properties \
                              -fp <name prefix string for saved exports> \ 
                              -a export \
                              -S -T 60
    Where:

    • -fp: (required) File prefix flag. The string passed with -fp is used as the prefix for all files exported by the export tool.

    • -c: (required) Path to the config.properties file used by the migration tool.

    • -a: (required) Action to perform: export or import.

    • -S: Includes all suggested tag associations in the exported CSVs.

    • -S -T 60: Includes only suggested associations with greater than 60% confidence.

    If -S and -T are omitted, only accepted associations are included in the exported CSVs.

    Note
    -T 60 without -S generates an error.

    The utility then prompts for a choice to export different entities as shown in the sample output below:

    $ log4j:WARN No such property [datePattern] in org.apache.log4j.RollingFileAppender.
      INFO  | 2020-04-05 20:46:14,362 | KeyStoreManager [main]  - KeyStore file path : /opt/ldc/app-server/jetty-distribution-9.4.18.v20190429/ldc-base/etc/keystore
      INFO  | 2020-04-05 20:46:14,401 | KeyStoreUtility [main]  - Loading the existing keystore...
      INFO  | 2020-04-05 20:46:14,913 | Utils [main]  - File name is contrib/60xExport
      INFO  | 2020-04-05 20:46:14,913 | Utils [main]  - Server is http://hdp265.ldc.com:8082/api/v2
      INFO  | 2020-04-05 20:46:14,913 | Utils [main]  - User is sam_admin
      INFO  | 2020-04-05 20:46:14,914 | Utils [main]  - Action is export
      INFO  | 2020-04-05 20:46:14,914 | Utils [main]  - Source mapping is false
      INFO  | 2020-04-05 20:46:14,927 | Utils [main]  - Suggested Tags is not set
      Choose from the options below:
        1) Export Sources
        2) Export Folders
        3) Export Domains
        4) Export Roles
        5) Export Users
        6) Export Tags
        7) Export Associations
        8) Export Datasets
        9) Export Custom Properties
       10) Export Data Objects
       11) Export Comments/Description
       12) Export All the above
    
      For multiple options give comma separated values. e.x. 1,4,6
      Enter input[12]: 3,6,7,9,11
      INFO  | 2020-04-05 20:47:28,434 | Export [main]  - Exporting domains...
      INFO  | 2020-04-05 20:47:31,407 | Export [main]  - Number of domains exported 10 in PT2.971S
      INFO  | 2020-04-05 20:47:31,407 | Export [main]  - Exporting tags...
      INFO  | 2020-04-05 20:47:31,842 | Export [main]  - Number of tags exported 112 in PT0.435S
      INFO  | 2020-04-05 20:47:31,842 | Export [main]  - Exporting associations...
      INFO  | 2020-04-05 20:47:32,265 | Export [main]  - Fetched 0 associations for Built-in_Tags:3-Letter_Country_Code
      INFO  | 2020-04-05 20:47:32,414 | Export [main]  - Fetched 0 associations for Built-in_Tags:Country
      INFO  | 2020-04-05 20:47:32,617 | Export [main]  - Fetched 1 associations for Built-in_Tags:Email
      INFO  | 2020-04-05 20:47:33,864 | Export [main]  - Fetched 6 associations for Built-in_Tags:First_Name
      INFO  | 2020-04-05 20:47:34,576 | Export [main]  - Fetched 0 associations for Built-in_Tags:Global_City
      INFO  | 2020-04-05 20:47:34,619 | Export [main]  - Fetched 0 associations for Built-in_Tags:IP_Address
      INFO  | 2020-04-05 20:47:35,005 | Export [main]  - Fetched 5 associations for Built-in_Tags:Last_Name

    You can provide multiple options as a comma-separated series of choices.

    The entities are exported in CSV format with the file naming convention exported_<entity>.csv.

    For example:

    exported_associations.csv
    exported_comments.csv
    exported_customproperties.csv
    exported_datasets.csv
    exported_domains.csv
    exported_folders.csv
    exported_roles.csv
    exported_sources.csv
    exported_tags.csv
    exported_users.csv

    The metadata is exported and saved in the CSV files.

Import metadata

You can import metadata only from files that use the pre-defined Lumada Data Catalog CSV file format. Refer to the Metadata files section for details.

To import users, roles, data sources, tag associations, and user reviews, use only the CSV files of metadata that were exported earlier from Data Catalog using the same version of the Migration tool.

The import utility is part of ldc-tools-x.x.x.jar, and a typical workflow to import metadata looks like the following:

Procedure

  1. From the command prompt, change to the Data Catalog Install App-Server directory, /opt/ldc/app-server by default.

  2. Run the following command to encrypt the password of the Data Catalog Administrator:

    Make sure that the encrypt command has execute permissions for the user running this command.

    <APP-SERVER-HOME>$ bin/ldc-util encrypt

    The utility will prompt for the text password that needs to be encrypted.

    A sample looks like the following:

    <APP-SERVER-HOME>$ bin/ldc-util encrypt
    Enter text:
    enc(rO0ABXcIAAAMX+AYIVOACAAB4cAAAABBEE7Z8Cn8miwpTRz7IgD35cHQAFEFFUy9FQ0IvUEtDUzVQYWRkaW5n)

    Use the entire string enc(rO0ABXcIAAAMX+AYIVOACAAB4cAAAABBEE7Z8Cn8miwpTRz7IgD35cHQAFEFFUy9FQ0IvUEtDUzVQYWRkaW5n) as the encrypted password.

    Note

    If your password contains special characters ($, #, and so on), enclose the password in single quotes, as below:

    <APP-SERVER-HOME>$ bin/ldc-util encrypt

    Enter text: 'water$LinData'
  3. Edit config.properties under the contrib directory (<APP-SERVER-HOME>/contrib/config.properties) and update the following fields to reflect your environment values:

    #Modify this configuration to match your environment.
    url=<Data Catalog URL> (e.g. http://<hostname>:8082/api/v2)
    user=<Data Catalog Administrator username> (e.g. waterlinesvc)
    #Encrypt the password using the ldc-util encrypt command and put it here.
    password=<Data Catalog Administrator encrypted password obtained using the encrypt command in the ldc-util script>
    sourceMapping=false
    importVersion=40
    exportReviewsAndFavorites=true
    confPath=<APP-SERVER-HOME>/conf/configuration.json
  4. Save and exit.

  5. Navigate to the app-server folder.

  6. Run one of the following import commands, providing the path to the previously exported metadata file (for example, <WLD_Install_Dir>/migration-utility-export/2019-3Export_sources.csv):

    • For HTTP

      <APP-SERVER-HOME>$ java -cp contrib/ldc-tools-6.0.1.jar:/opt/ldc/app-server/conf/ \
                              com.hitachivantara.datacatalog.tools.ie.Utils \
                              -c contrib/config.properties \
                              -fp /home/waterlinesvc/migration-utility-export/60xExport_sources.csv \
                              -a import
    • For HTTPS

      <APP-SERVER-HOME>$ java -Djavax.net.ssl.trustStore=<Path to Java key store file> \
                              -cp contrib/ldc-tools-6.0.1.jar:/opt/ldc/app-server/conf/ \
                              com.hitachivantara.datacatalog.tools.ie.Utils \
                              -c contrib/config.properties \
                              -fp /home/waterlinesvc/migration-utility-export/60xExport_sources.csv \
                              -a import

    The metadata is imported into Data Catalog.

Metadata files

This section explains how the data is represented inside the CSV files.

Make sure the required tag domains are available in Lumada Data Catalog. If not, import tag domains first and then import tags.

Data source

Column sequence in the data source import file is as below:

  1. SourceName
  2. SourceDescription
  3. SourceType
  4. SourcePath
  5. Source_HDFS_URL
  6. Source_HIVE_URL
  7. Source_JDBC_URL
  8. Source_JDBC_USER
  9. Source_JDBC_PASSWD
  10. Source_JDBC_DRIVER

Data Source import will succeed only when the path definitions match in the target environment.

Sample JDBC data source import file:

1. "OracleHR","Corporate Oracle
      HR","jdbc","/HR",,,"jdbc:oracle:thin:@172.31.38.71:1521:XE","h
      r","rO0ABXcIAAABYv7KAz5zcgAZamF2YXguY3J5cHRvLlNlYWxlZE9iamVjdD42PabDt1RwAgAEWwANZW5jb2RlZFB
      hcmFtc3QAAltCWwAQZW5jcnlwdGVkQ29udGVudHEAfgABTAAJcGFyYW1zQWxndAASTGphdmEvbGFuZy9TdHJpbmc7TA
      AHc2VhbEFsZ3EAfgACeHBwdXIAAltCrPMXAYIVOACAAB4cAAAABDFgkyIzH3cWC5fAYIVOACAAB4cAAAABDFgkyIzH3
      cWC5fr32fVXHycHQAFEFFUy9FQ0IvUEtDUzVQYWRkaW5nr32fVXHycHQAFEFFUy9FQ0IvUEtDUzVQYWRkaW5n","ora
      cle.jdbc.OracleDriver"

Sample HIVE data source import file:

1. "demoHive","Claims HIVE database",source_type_hive,"/default",,"jdbc:hive2:// ip-172-31-31
   -141.ec2.internal:10000",,,,"org.apache.hive.jdbc.HiveDriver"

Sample HDFS data source import file:

1. "sfo-airport2","Local HDFS sfo",source_type_hdfs,"/sfo-airport3","hdfs:// ip-172-31-31-141
   .ec2.internal:8020",,,,,
Note

After importing Data Sources, make sure to:

  • Create and connect agents.
  • Profile this data before importing any tag associations or Data Objects.
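
The empty Hive/JDBC columns and quoting in the samples above can be tedious to produce by hand. The following is a minimal sketch, not part of the product tooling, showing how Python's standard csv module could generate a correctly delimited HDFS source row; the output file name import_sources.csv and all values are illustrative. csv.writer quotes only where needed, and explicitly quoted fields as in the samples are equally valid CSV.

import csv

# Column order per the data source sequence above:
# SourceName, SourceDescription, SourceType, SourcePath, Source_HDFS_URL,
# Source_HIVE_URL, Source_JDBC_URL, Source_JDBC_USER, Source_JDBC_PASSWD,
# Source_JDBC_DRIVER
row = [
    "sfo-airport2",                               # SourceName
    "Local HDFS sfo",                             # SourceDescription
    "source_type_hdfs",                           # SourceType
    "/sfo-airport3",                              # SourcePath
    "hdfs://ip-172-31-31-141.ec2.internal:8020",  # Source_HDFS_URL
    "", "", "", "", "",                           # Hive/JDBC columns stay empty
]

with open("import_sources.csv", "w", newline="") as f:
    csv.writer(f).writerow(row)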

Roles

Column sequence in the role import file

  1. RoleName
  2. RoleDescription
  3. RoleAccessLevel
  4. RoleVirtualFolders
  5. RoleDomains
  6. MetadataAccess
  7. DataAccess
  8. MakeDefaultRole
  9. AllowJobExecution

Sample role import file

1. "Analyst","Analyst Role",Analyst,"[]","[]",”FALSE”, “METADATA”, “FALSE”, “TRUE”
2. "SFOAnalyst","Curates tags and discovers data in the SFO Open Dataset.",Analyst,"[SFOAirpo
   rt]","[Aviation]", ”TRUE”, “NATIVE”, “FALSE”, “TRUE”

User

Column sequence in the user import file is as follows:

  1. USER_NAME
  2. USER_DESCRIPTION
  3. USER_ROLE
  4. USER_FAVORITES (an array of favorites; each favorite has the below parts separated by "::")
    1. ReviewSource
    2. ReviewResource
  5. USER_REVIEWS (an array of reviews; each review has the below parts separated by "::")
    1. ReviewSource
    2. ReviewResource
    3. ReviewRating
    4. ReviewTitle
    5. ReviewDescription
  6. USER_ATTRIBUTES [the following attributes are passed as a comma-separated list in '{}']
    1. firstname
    2. lastname
    3. email
    4. user_last_login [Timestamp]

Sample user import file:

1. "lara_analyst","","[SFOAnalyst]","[]","[]","{firstName=Lara,lastName=Analyst,email=lara@
   xyz.com}"
2. "waterlinesvc","","[Guest]","[]","[]","{TimeStamp}"

Before importing any users, make sure you have imported any custom role that will be assigned to the users.

Tag domain

Column sequence in the domain import file:

  1. DomainName
  2. DomainDescription
  3. DomainColor

Sample domain import file:

1. "Aviation","SFO open data. Air traffic landing and passenger statistics, tail (aircrafts) 
   numbers and models","#50AAE7"
2. "Built-in_Tags","Built-in_Tags","#30BAE7"

Tags

Column sequence in the tag import file

  1. TAG_DOMAIN
  2. TAG_NAME
  3. TAG_DESCRIPTION
  4. TAG_METHOD
  5. TAG_REGEX
  6. TAG_MINSCORE
  7. TAG_MINLEN
  8. TAG_MAXLEN
  9. TAG_ENABLED
  10. TAG_LEARNING_ENABLED
  11. EXTERNAL_ID
  12. EXTERNAL_SOURCE_NAME
  13. SYNONYMS

Sample REGEX tag import file

1. "Built-in_Tags","Email","Email",REGEX,"[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+(?:.[a-zA-Z0-9!#$%&'
   *+/=?^_`{|}~-]+)*@(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?.)+[a-zA-Z0-9](?:[a-zA-Z0-9-]
   *[a-zA-Z0-9])?",0.4,5,64,true,true,,,""

Sample Value tag import file

1. "Built-in_Tags","First_Name","First_Name",VALUE,,0.4,0,0,true,true,,,"" 
2. "Sales","Customers","Tag for Customers",VALUE,,0.6,0,0,true,true,,,""   
3. "Sales","Location",,VALUE,,0.8,0,0,true,true,,,""   
4. "Sales","Orders","Tag for Orders",VALUE,,0.0,0,0,true,true,,,""   
5. "Sales","Regions","Tag for Regions",VALUE,,0.8,0,0,true,true,,,""   
6. "Sales","Sales","Tag for Sales",VALUE,,0.6,0,0,true,true,,,""  
7. "Sales","Stores",,VALUE,,0.8,0,0,true,true,,,

Hierarchy in tags can be indicated in the tag name itself by using dot notation, as follows:

1. "Sales","Orders.Jan.Week1","Tag for Orders",VALUE,,0.0,0,0,true,true,,,""

This will identify a tag Week1 with parent tag Jan and grandparent tag Orders under the Sales domain.

Note

Before importing any tags, make sure you have imported the corresponding tag domains.
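
Because the TAG_REGEX column carries a raw pattern, a malformed regex is easy to ship to the target environment unnoticed. As a hedged pre-flight check, assuming the pattern dialect is compatible with Python's re module (which is not guaranteed for Data Catalog's matching engine), you could scan an exported tag file like this:

import csv
import re

# Check TAG_REGEX (column 5) for every row whose TAG_METHOD (column 4) is REGEX.
with open("exported_tags.csv", newline="") as f:
    for lineno, row in enumerate(csv.reader(f), start=1):
        if len(row) >= 5 and row[3] == "REGEX":
            try:
                re.compile(row[4])
            except re.error as exc:
                print(f"line {lineno}: bad TAG_REGEX for {row[0]}:{row[1]}: {exc}")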

Tag associations

Column sequence in the tag associations import file is as follows:

  1. TagDomain
  2. TagName
  3. DataSourceName
  4. ResourceName
  5. FieldName
  6. TagPropagation [Possible values: SEED or OTHER; SEED is used for Discovery tags (marked with a dot (.) in front of the tag name)]
  7. TagAssociationState

Sample tag associations import file:

1. "Sales","COGS","HIVE","/foodmart.sales_fact_dec_1998","store_cost",SAMPLE,ACCEPTED
2. "Global","Customer_Master_GDPR","HIVE","/foodmart.customer","customer_id",OTHER,REJECTED
3. "Sales","Sales.Sales_location","HDFS","/user/waterlinesvc/Pass1/raw/nyc_open/r at_sightings_nyc_2014.csv","LOCATION_TYPE",SAMPLE,ACCEPTED   
4. "Sales","Sales.Sales_location","MySQL","/HR.employee","location",SAMPLE,ACCEPTED</div>
Note

Before importing any tag associations, make sure you have imported the corresponding tags and tag domains.
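
If you want to migrate only accepted curation decisions, one option is to filter the exported associations on the TagAssociationState column (column 7) before importing. A minimal sketch, assuming the exported_associations.csv naming shown earlier:

import csv

# Copy only rows whose TagAssociationState (column 7) is ACCEPTED.
with open("exported_associations.csv", newline="") as src, \
     open("accepted_associations.csv", "w", newline="") as dst:
    writer = csv.writer(dst)
    for row in csv.reader(src):
        if len(row) >= 7 and row[6] == "ACCEPTED":
            writer.writerow(row)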

Reviews

Column sequence in the Review import file.

  1. USER_NAME
  2. USER_DESCRIPTION
  3. USER_ROLE (enclose roles in an array of strings)
  4. USER_FAVORITES (an array of favorites; each favorite has the below parts separated by "::")
    1. Review/Source
    2. Review/Resource
  5. USER_REVIEWS (an array of reviews; each review has the below parts separated by "::")
    1. Review/Source
    2. Review/Resource
    3. Review/Rating
    4. Review/Title
    5. Review/Description
  6. USER_ATTRIBUTES [the following attributes are passed as a comma-separated list in '{}']
    1. firstname
    2. lastname
    3. email
    4. user_last_login [Timestamp]

    Sample CSV for user reviews:

    1. waterlinesvc,Predefined administrator user,"[Administrator]","[]","[MyHIVE::/default.table3::4::Database_review::This is one of the nicest databases,MyHIVE::/default.table3::4::::]","{firstName=Waterline,lastName=Service,email=build+waterlinesvc@waterlinedata.com}"
Note

Before importing any Reviews, make sure you have imported Roles and Users.
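
Hand-assembling the nested USER_REVIEWS field (an array whose entries are "::"-joined) is error-prone. The sketch below reconstructs the sample row above in Python; it assumes entries inside the brackets are comma-separated, as the sample suggests, and is illustrative only.

import csv

def review_entry(source, resource, rating, title, description):
    # Each review's parts are joined by "::" per the column sequence above.
    return "::".join([source, resource, str(rating), title, description])

reviews = [
    review_entry("MyHIVE", "/default.table3", 4, "Database_review",
                 "This is one of the nicest databases"),
    review_entry("MyHIVE", "/default.table3", 4, "", ""),
]

row = [
    "waterlinesvc",                    # USER_NAME
    "Predefined administrator user",   # USER_DESCRIPTION
    "[Administrator]",                 # USER_ROLE
    "[]",                              # USER_FAVORITES
    "[" + ",".join(reviews) + "]",     # USER_REVIEWS: comma-separated entries
    "{firstName=Waterline,lastName=Service,email=build+waterlinesvc@waterlinedata.com}",
]

with open("import_reviews.csv", "w", newline="") as f:
    csv.writer(f).writerow(row)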

Field comments or descriptions

The Export Comments/Description option exports the custom field comments, labels, and descriptions as a _comments.csv file. The column sequence in this comments.csv file is as follows:

  1. VFName
  2. Resource Path
  3. Field name
  4. Field Label
  5. User Comment

Sample comments import file:

DO_Test,/user/EmpSkill_Analytics/Employee.csv,employee_id,EMPID,This is a sample comment to test re-profiling effects of user comments

Virtual folder

Column sequence in the Virtual Folder import file

  1. VFName
  2. VFDescription
  3. VFParent
  4. VFPath
  5. VFPathInclude
  6. VFPathExclude
  7. VF_IsRoot

Sample virtual folder import file

"foodmart4","foodmart4","foodMart","/foodmart","(.customer_s1.*|.sales.*)",,false
      "foodMartChild","food mart","foodMart","/foodmart",".customer.*",,false

Data objects

Column sequence in the Data Object import file is a multi-level nested sequence, as explained below:

  1. Data Object Name
  2. Description
  3. Join Conditions: Multiple join conditions separated by ':#;', as in join_condition1:#;join_condition2:#;join_condition3
    • Where each join condition in turn is constructed as
      1. left_field_info
      2. right_field_info
      3. join_info

      with each entity in join_condition separated as left_field_info;;right_field_info##join_info

      • Each join_info in turn is built as follows
        1. Left Column info
        2. Right Column info
        3. Join Cardinality
        4. Join Order
        5. Join Operation

        with each entity in join_info separated as

        left_col_info;;right_col_info##Join_Cardinality,Join_Order,Join_Operation

        • Each column info in turn is built as follows
          1. Column Origin
          2. Column Data Source Name
          3. Column path
          4. Column resolved name
          5. Column name

          with each entity in col_info separated with a ',' as

          Left_col_origin,Left_col_source_name,Left_col_path,Left_col_res_name,Left_col_name

1. ProjectDO,Lara's Projects,HdfsDS::HdfsDS::/data/DocData/Lara/DO/Data/ProjectReq.csv::ProjectReq.csv::man_id;;HdfsDS::HdfsDS::/data/DocData/Joe/DO_Demo/UseCase1/Data/Employ.csv::Employ.csv::emp_id##MANY_MANY::ALTERNATE::JoinOpEQ
Note

Before importing any Data Objects, make sure you have:

  • Imported the corresponding Data Sources and/or virtual folders
  • Profiled the resources
  • Imported the tag domains, tags, and tag associations
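
The join-condition encoding is the most deeply nested format in these files. The sketch below rebuilds the sample row above from its parts; note that it follows the "::" separators used in the sample line for column info (the prose above mentions ',' for col_info, so verify against your own exported files before relying on either convention).

def col_info(origin, source_name, path, resolved_name, name):
    # The sample row joins column-info parts with "::".
    return "::".join([origin, source_name, path, resolved_name, name])

def join_condition(left_col, right_col, cardinality, order, operation):
    # left;;right##cardinality::order::operation, per the sample row above.
    return (left_col + ";;" + right_col + "##"
            + "::".join([cardinality, order, operation]))

left = col_info("HdfsDS", "HdfsDS",
                "/data/DocData/Lara/DO/Data/ProjectReq.csv",
                "ProjectReq.csv", "man_id")
right = col_info("HdfsDS", "HdfsDS",
                 "/data/DocData/Joe/DO_Demo/UseCase1/Data/Employ.csv",
                 "Employ.csv", "emp_id")

# Multiple join conditions would be joined with ":#;".
conditions = join_condition(left, right, "MANY_MANY", "ALTERNATE", "JoinOpEQ")
print(",".join(["ProjectDO", "Lara's Projects", conditions]))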

Datasets

Column sequence in the Dataset import file

  1. Dataset Name
  2. Description
  3. ID
  4. Schema Version
  5. Virtual Folder
  6. Path Specifications, in the following format:
    1. Exclude Regex Pattern
    2. Include Regex Pattern
    3. Source Path

      With each entity separated by :: (two colons) and multiple Path Specifications separated by ;; (two semicolons) as:

      <Exclude Regex1>::<Include Regex1>::<Source Path1>;;<Exclude Regex2>::<Include Regex2>::<Source Path2>

  7. Fields
  8. Origin

Sample Dataset import file

1. Dset1,,Dset1,10,HdfsDS,::.*::/data/data_NYPD;;,,HdfsDS
2. EmpDataSet,,EmpDataSet01,10,HdfsDS,::.*::/data/DocData;;,,HdfsDS
3. Sample,Sample Description,ID:01,9,DocDataHdfs,.*json::.*csv::/data/DocData/Joe;;.*xml::.*json::/data/DocData/Lara;;,,HdfsDS
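
The Path Specifications field packs several regex/path triplets into a single column. As a minimal illustrative sketch, the third sample row above can be rebuilt like this:

def path_spec(exclude_regex, include_regex, source_path):
    # <Exclude Regex>::<Include Regex>::<Source Path>
    return "::".join([exclude_regex, include_regex, source_path])

# Specifications are joined by ";;"; the sample rows above also end the
# field with a trailing ";;".
specs = ";;".join([
    path_spec(".*json", ".*csv", "/data/DocData/Joe"),
    path_spec(".*xml", ".*json", "/data/DocData/Lara"),
]) + ";;"

row = ["Sample", "Sample Description", "ID:01", "9", "DocDataHdfs",
       specs, "", "HdfsDS"]
print(",".join(row))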

Custom properties

Column sequence in the Custom Properties import file is as follows:

  1. Name
  2. Display Name
  3. Data Type
  4. Description
  5. Property Group
  6. Property Group Description
  7. Is property a custom property
  8. Access Level, where:
    • pg1 = View: Everyone/Set: Analyst & Higher
    • pg2 = View: Everyone/Set: Steward & Higher
    • pg3 = View: Everyone/Set: Admin Only
    • pg4 = View: Everyone/Set: Nobody
    • pg5 = View: Analyst & Higher/Set: Admin Only
    • pg6 = View: Steward & Higher/Set: Admin Only
    • pg7 = View: Admin Only/Set: Admin Only
  9. Is property case sensitive
  10. Searchable
  11. Facetable

Sample custom property import file

1. DataGrp,Data Group,string,Data Group,,,true,pg1,false,true,true
2. Department,Department,string,,,,true,pg1,false,true,true
3. test_property,Test Property,string,Test Description,TestGroup,,true,pg7,false,false,true
4. test_property2,Test Property 2,string,Test Property 2 Description,TestGroup,,true,pg4,false,true,false
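
When auditing a custom properties file, the compact pgN codes are easier to read through a lookup table. A small illustrative sketch (the input file name assumes the exported_customproperties.csv naming shown earlier):

import csv

# View/Set access implied by each Access Level code (column 8), per the list above.
ACCESS_LEVELS = {
    "pg1": ("Everyone", "Analyst & Higher"),
    "pg2": ("Everyone", "Steward & Higher"),
    "pg3": ("Everyone", "Admin Only"),
    "pg4": ("Everyone", "Nobody"),
    "pg5": ("Analyst & Higher", "Admin Only"),
    "pg6": ("Steward & Higher", "Admin Only"),
    "pg7": ("Admin Only", "Admin Only"),
}

with open("exported_customproperties.csv", newline="") as f:
    for row in csv.reader(f):
        if len(row) >= 8:
            view, set_by = ACCESS_LEVELS.get(row[7], ("?", "?"))
            print(f"{row[0]}: view={view}, set={set_by}")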