Migration utility
The Lumada Data Catalog's migration utility functions like a data model mapper that you can use to export and import the following items:
- In environments with different data mappings, you can export or import the basic definitions of Data Catalog assets or the data model.
- In environments with matched data mappings, you can export or import most of the asset metadata.
The following metadata can be exported and imported:
- Data source
- Roles
- User
- Tag domain
- Tags
- Tag associations
- Reviews
- Field comments or descriptions
- Virtual folder
- Data objects
- Datasets
- Custom properties
Migration utility role-based access
The migration utility checks file-level access and role permissions when exporting or importing metadata. For the export utility to work as intended, the user that is defined in the configuration file requires the following permissions and access:
For exporting metadata
- Read access to the data for the resources being exported
- View access for assets to be exported, for example:
- Manage Datasets permission to export a dataset
- View Tag Domain permission to export the tag domain
For importing metadata
- View Tag Domain permission for the domain into which tags are being imported
- Manage permissions for Data Catalog assets to be imported, for example:
- Manage Tag Domain permission to import the tag domain
- Manage Tags permission to import tags
Export metadata
Procedure
From the command prompt, navigate to the location of the LDC Application Server installation, which is typically /opt/ldc/app-server.
Run the following command to encrypt the password of the user that is exporting the data, which should be the same user that is specified in the config.properties file.
Make sure that the user running the utility has execute permissions for the encrypt command.
Enter the following command:
<APP-SERVER-HOME>$ bin/ldc-util encrypt
The utility prompts for the password. Enter the text password to be encrypted.
APP-SERVER-HOME$ bin/ldc-util encrypt
Enter text:
enc(rO0ABXcIAAAMX+AYIVOACAAB4cAAAABBEE7Z8Cn8miwpTRz7IgD35cHQAFEFFUy9FQ0IvUEtDUzVQYWRkaW5n)
Note: Use the entire string as the encrypted password. If your password contains special characters (such as $ or #), enclose the password in single quotation marks. Example:
enc(rO0ABXcIAAAMX+AYIVOACAAB4cAAAABBEE7Z8Cn8miwpTRz7IgD35cHQAFEFFUy9FQ0IvUEtDUzVQYWRkaW5n)
APP-SERVER-HOME$ bin/ldc-util encrypt
Enter text: <'ldc$LinData'>
Edit the APP-SERVER-HOME/contrib/config.properties file and update the following fields to reflect your environment values:

# Modify this configuration to match your environment.
url=<Data Catalog URL> (e.g. http://<hostname>:8082/api/v2)
user=<Data Catalog Administrator username> (e.g. ldcuser)
# Encrypt the password using the encrypt command of the ldc-util script and enter it here.
password=<Data Catalog Administrator encrypted password obtained using the encrypt command in the ldc-util script>
sourceMapping=false
importVersion=40
exportReviewsAndFavorites=true
confPath=APP-SERVER-HOME/conf/configuration.json
Save and exit.
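For reference, a filled-in config.properties might look like the following. The hostname, username, and truncated encrypted password are illustrative placeholders, not values from a real installation.

```properties
# Illustrative values only -- replace with your environment's settings.
url=http://ldc-host.example.com:8082/api/v2
user=ldcuser
# Output of bin/ldc-util encrypt (placeholder, truncated)
password=enc(rO0ABXcIAAAMX+...)
sourceMapping=false
importVersion=40
exportReviewsAndFavorites=true
confPath=/opt/ldc/app-server/conf/configuration.json
```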
Make sure you are in the LDC Application Server installation directory.
Run one of the following export commands, providing the path to the newly exported metadata file (for example, /opt/ldc/to_import/):
For HTTP
<APP-SERVER-HOME>$ java -cp contrib/ldc-tools-6.0.1.jar:/opt/ldc/app-server/conf/ \
com.hitachivantara.datacatalog.tools.ie.Utils \
-c contrib/config.properties \
-fp <name prefix string for saved exports> \
-a export \
-S -T 60
For HTTPS
<APP-SERVER-HOME>$ java -Djavax.net.ssl.trustStore=<Path to Java key store file> \
-cp contrib/ldc-tools-6.0.1.jar:/opt/ldc/app-server/conf/ \
com.hitachivantara.datacatalog.tools.ie.Utils \
-c contrib/config.properties \
-fp <name prefix string for saved exports> \
-a export \
-S -T 60
The commands accept the following parameters:
- -fp <name prefix string>: File prefix. The string passed in this parameter is used as the prefix for all the files exported using the export tool. (Required)
- -c: Configuration file location. Path to the config.properties file used by the migration tool. (Required)
- -a: Action. Specify export or import. (Required)
- -S: Suggested associations. All suggested tag associations are included in the exported CSV files.
- -T <value>: An option for -S that filters suggested associations based on tag association weight for increased accuracy. For example, -S -T 60 means that suggested associations with greater than 60% confidence are included. If you omit -S and -T, only accepted associations are included in the exported CSV file.
Note: The parameter -T is valid only in the context of -S; if you provide -T without -S, an error occurs.
The utility prompts for a choice of entities to export, as shown in the following sample output:
$ log4j:WARN No such property [datePattern] in org.apache.log4j.RollingFileAppender.
INFO | 2020-04-05 20:46:14,362 | KeyStoreManager [main] - KeyStore file path : /opt/ldc/app-server/jetty-distribution-9.4.18.v20190429/ldc-base/etc/keystore
INFO | 2020-04-05 20:46:14,401 | KeyStoreUtility [main] - Loading the existing keystore...
INFO | 2020-04-05 20:46:14,913 | Utils [main] - File name is contrib/60xExport
INFO | 2020-04-05 20:46:14,913 | Utils [main] - Server is http://hdp265.ldc.com:8082/api/v2
INFO | 2020-04-05 20:46:14,913 | Utils [main] - User is sam_admin
INFO | 2020-04-05 20:46:14,914 | Utils [main] - Action is export
INFO | 2020-04-05 20:46:14,914 | Utils [main] - Source mapping is false
INFO | 2020-04-05 20:46:14,927 | Utils [main] - Suggested Tags is not set
Choose from the options below:
1) Export Sources
2) Export Folders
3) Export Domains
4) Export Roles
5) Export Users
6) Export Tags
7) Export Associations
8) Export Datasets
9) Export Custom Properties
10) Export Data Objects
11) Export Comments/Description
12) Export All the above
For multiple options give comma separated values. e.x. 1,4,6
Enter input[12]: 3,6,7,9,11
INFO | 2020-04-05 20:47:28,434 | Export [main] - Exporting domains...
INFO | 2020-04-05 20:47:31,407 | Export [main] - Number of domains exported 10 in PT2.971S
INFO | 2020-04-05 20:47:31,407 | Export [main] - Exporting tags...
INFO | 2020-04-05 20:47:31,842 | Export [main] - Number of tags exported 112 in PT0.435S
INFO | 2020-04-05 20:47:31,842 | Export [main] - Exporting associations...
INFO | 2020-04-05 20:47:32,265 | Export [main] - Fetched 0 associations for Built-in_Tags:3-Letter_Country_Code
INFO | 2020-04-05 20:47:32,414 | Export [main] - Fetched 0 associations for Built-in_Tags:Country
INFO | 2020-04-05 20:47:32,617 | Export [main] - Fetched 1 associations for Built-in_Tags:Email
INFO | 2020-04-05 20:47:33,864 | Export [main] - Fetched 6 associations for Built-in_Tags:First_Name
INFO | 2020-04-05 20:47:34,576 | Export [main] - Fetched 0 associations for Built-in_Tags:Global_City
INFO | 2020-04-05 20:47:34,619 | Export [main] - Fetched 0 associations for Built-in_Tags:IP_Address
INFO | 2020-04-05 20:47:35,005 | Export [main] - Fetched 5 associations for Built-in_Tags:Last_Name
Enter the number(s) corresponding to the items you want to export. You can provide multiple options as a comma-separated series of choices, as shown in the previous sample output.
Results
The exported metadata is saved as CSV files named exported_<entity>.csv.
For example:
exported_associations.csv
exported_comments.csv
exported_customproperties.csv
exported_datasets.csv
exported_domains.csv
exported_folders.csv
exported_roles.csv
exported_sources.csv
exported_tags.csv
exported_users.csv
Import metadata
To import users, roles, data sources, tag associations, and user reviews, you must maintain import-export compatibility:
- Use only the CSV metadata files that were exported earlier from Data Catalog.
- Use the same version of the Migration utility that was used for exporting the metadata files.
The import utility is part of the ldc-tools-xxxx.x.jar file. A typical workflow to import metadata is similar to the following steps:
Procedure
From the command prompt, navigate to the LDC Application Server installation directory, typically /opt/ldc/app-server.
Run the following command to encrypt the password of the Data Catalog Administrator:
Make sure that the user running the utility has execute permissions for the encrypt command.
Enter the following command:
<APP-SERVER-HOME>$ bin/ldc-util encrypt
The utility prompts for the password. Enter the text password to be encrypted.
<APP-SERVER-HOME>$ bin/ldc-util encrypt
Enter text:
enc(rO0ABXcIAAAMX+AYIVOACAAB4cAAAABBEE7Z8Cn8miwpTRz7IgD35cHQAFEFFUy9FQ0IvUEtDUzVQYWRkaW5n)
Note: Use the entire string enc(rO0ABXcIAAAMX+AYIVOACAAB4cAAAABBEE7Z8Cn8miwpTRz7IgD35cHQAFEFFUy9FQ0IvUEtDUzVQYWRkaW5n) as the encrypted password. If your password contains special characters (such as $ or #), enclose the password in single quotation marks.
APP-SERVER-HOME$ bin/ldc-util encrypt
Enter text: <'ldc$LinData'>
Edit the APP-SERVER-HOME/contrib/config.properties file and update the following fields to reflect your environment values:

# Modify this configuration to match your environment.
url=<Data Catalog URL> (e.g. http://<hostname>:8082/api/v2)
user=<Data Catalog Administrator username> (e.g. ldcuser)
# Encrypt the password using the encrypt command of the ldc-util script and enter it here.
password=<Data Catalog Administrator encrypted password obtained using the encrypt command in the ldc-util script>
sourceMapping=false
importVersion=40
exportReviewsAndFavorites=true
confPath=APP-SERVER-HOME/conf/configuration.json
Save and exit.
Make sure you are in the LDC Application Server installation directory.
Run one of the following import commands, providing the path to the previously exported metadata file (for example, APP-SERVER-HOME/migration-utility-export/2019-3Export_sources.csv) in the command:
For HTTP
<APP-SERVER-HOME>$ java -cp contrib/ldc-tools-6.0.1.jar:/opt/ldc/app-server/conf/ \
com.hitachivantara.datacatalog.tools.ie.Utils \
-c contrib/config.properties \
-fp /home/ldcuser/migration-utility-export/60xExport_sources.csv \
-a import
For HTTPS
<APP-SERVER-HOME>$ java -Djavax.net.ssl.trustStore=<Path to Java key store file> \
-cp contrib/ldc-tools-6.0.1.jar:/opt/ldc/app-server/conf/ \
com.hitachivantara.datacatalog.tools.ie.Utils \
-c contrib/config.properties \
-fp /home/ldcuser/migration-utility-export/60xExport_sources.csv \
-a import
Results
The metadata is imported into Data Catalog.
Metadata files
The metadata is formatted as CSV files. Before importing tags, make sure the required tag domains are available in Data Catalog; if they are not, import the tag domains first, and then import the tags.
Data source
The data source import file should have its columns in the following order:
- SourceName
- SourceDescription
- SourceType
- SourcePath
- Source_HDFS_URL
- Source_HIVE_URL
- Source_JDBC_URL
- Source_JDBC_USER
- Source_JDBC_PASSWD
- Source_JDBC_DRIVER
Use the following samples as a guide.
Sample JDBC data source import file:
"OracleHR","Corporate Oracle HR","jdbc","/HR",,,"jdbc:oracle:thin:@172.31.38.71:1521:XE","hr","rO0ABXcIAAABYv7KAz5zcgAZamF2YXguY3J5cHRvLlNlYWxlZE9iamVjdD42PabDt1RwAgAEWwANZW5jb2RlZFBhcmFtc3QAAltCWwAQZW5jcnlwdGVkQ29udGVudHEAfgABTAAJcGFyYW1zQWxndAASTGphdmEvbGFuZy9TdHJpbmc7TAAHc2VhbEFsZ3EAfgACeHBwdXIAAltCrPMXAYIVOACAAB4cAAAABDFgkyIzH3cWC5fAYIVOACAAB4cAAAABDFgkyIzH3cWC5fr32fVXHycHQAFEFFUy9FQ0IvUEtDUzVQYWRkaW5nr32fVXHycHQAFEFFUy9FQ0IvUEtDUzVQYWRkaW5n","oracle.jdbc.OracleDriver"
Sample HIVE data source import file:
"demoHive","Claims HIVE database",source_type_hive,"/default",,"jdbc:hive2://ip-172-31-31-141.ec2.internal:10000",,,,"org.apache.hive.jdbc.HiveDriver"
Sample HDFS data source import file:
"sfo-airport2","Local HDFS sfo",source_type_hdfs,"/sfo-airport3","hdfs://ip-172-31-31-141.ec2.internal:8020",,,,,
After importing data sources, make sure to:
- Create and connect agents.
- Profile the data before importing any tag associations or data objects.
Roles
The columns for the roles metadata import file are in the following order:
- RoleName
- RoleDescription
- RoleAccessLevel
- RoleVirtualFolders
- RoleDomains
- MetadataAccess
- DataAccess
- MakeDefaultRole
- AllowJobExecution
Sample role import file:
"Analyst","Analyst Role",Analyst,"[]","[]","FALSE","METADATA","FALSE","TRUE"
"SFOAnalyst","Curates tags and discovers data in the SFO Open Dataset.",Analyst,"[SFOAirport]","[Aviation]","TRUE","NATIVE","FALSE","TRUE"
User
Before importing any users, make sure you have imported all custom roles that are assigned to the users.
The columns for the user metadata import file are in the following order:
- USER_NAME
- USER_DESCRIPTION
- USER_ROLE
- USER_FAVORITES: Include favorites in an array with each favorite separated with "::".
- ReviewSource
- ReviewResource
- USER_REVIEWS: Include reviews in an array with each review separated with "::".
- ReviewSource
- ReviewResource
- ReviewRating
- ReviewTitle
- ReviewDescription
- USER_ATTRIBUTES: The following attributes are to be passed as a comma-separated list enclosed in '{}'.
- firstname
- lastname
- user_last_login [Timestamp]
Sample user import file:
"lara_analyst","","[SFOAnalyst]","[]","[]","{firstName=Lara,lastName=Analyst,email=lara@xyz.com}"
"ldcuser","","[Guest]","[]","[]","{TimeStamp}"
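The USER_ATTRIBUTES column can be composed from a key-value mapping using the '{key=value,...}' shape described above. This is a minimal sketch; the attribute values are illustrative.

```python
def format_user_attributes(attrs: dict) -> str:
    """Render attributes as the '{key=value,...}' string used in the import file."""
    body = ",".join(f"{k}={v}" for k, v in attrs.items())
    return "{" + body + "}"

# Illustrative values, modeled on the sample row above.
cell = format_user_attributes(
    {"firstName": "Lara", "lastName": "Analyst", "email": "lara@xyz.com"}
)
```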
Tag domain
The columns for the tag domain metadata import file are in the following order:
- DomainName
- DomainDescription
- DomainColor
Sample domain import file:
"Aviation","SFO open data. Air traffic landing and passenger statistics, tail (aircrafts) numbers and models","#50AAE7"
"Built-in_Tags","Built-in_Tags","#30BAE7"
Tags
The columns for the tags metadata import file are in the following order:
- TAG_DOMAIN
- TAG_NAME
- TAG_DESCRIPTION
- TAG_METHOD
- TAG_REGEX
- TAG_MINSCORE
- TAG_MINLEN
- TAG_MAXLEN
- TAG_ENABLED
- TAG_LEARNING_ENABLED
- EXTERNAL_ID
- EXTERNAL_SOURCE_NAME
- SYNONYMS
Use the following samples as guides.
Sample REGEX tag import file
"Built-in_Tags","Email","Email",REGEX,"[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+(?:.[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?.)+[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?",0.4,5,64,true,true,,,""
Sample Value tag import file
"Built-in_Tags","First_Name","First_Name",VALUE,,0.4,0,0,true,true,,,""
"Sales","Customers","Tag for Customers",VALUE,,0.6,0,0,true,true,,,""
"Sales","Location",,VALUE,,0.8,0,0,true,true,,,""
"Sales","Orders","Tag for Orders",VALUE,,0.0,0,0,true,true,,,""
"Sales","Regions","Tag for Regions",VALUE,,0.8,0,0,true,true,,,""
"Sales","Sales","Tag for Sales",VALUE,,0.6,0,0,true,true,,,""
"Sales","Stores",,VALUE,,0.8,0,0,true,true,,,
You can use dot notation in the tag name to express tag hierarchy, as in the following example:
"Sales","Orders.Jan.Week1","Tag for Orders",VALUE,,0.0,0,0,true,true,,,""
This example identifies a tag Week1 with parent tag Jan and grandparent tag Orders under the Sales domain.
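Because dot notation encodes the hierarchy directly in the tag name, splitting on '.' recovers the ancestry. A small sketch:

```python
# Dot notation in a tag name encodes its hierarchy: the last element is the
# tag itself, and the preceding elements are its ancestors, outermost first.
tag_name = "Orders.Jan.Week1"
hierarchy = tag_name.split(".")
ancestors, tag = hierarchy[:-1], hierarchy[-1]
```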
Tag associations
The columns for the tag associations metadata import file are in the following order:
- TagDomain
- TagName
- DataSourceName
- ResourceName
- FieldName
- TagPropagation: Possible values are SEED or OTHER. SEED is used for discovery tags (marked with a dot (.) in front of the tag name).
- TagAssociationState
Sample tag associations import file
"Sales","COGS","HIVE","/foodmart.sales_fact_dec_1998","store_cost",SAMPLE,ACCEPTED
"Global","Customer_Master_GDPR","HIVE","/foodmart.customer","customer_id",OTHER,REJECTED
"Sales","Sales.Sales_location","HDFS","/user/ldcuser/Pass1/raw/nyc_open/rat_sightings_nyc_2014.csv","LOCATION_TYPE",SAMPLE,ACCEPTED
"Sales","Sales.Sales_location","MySQL","/HR.employee","location",SAMPLE,ACCEPTED
Reviews
The columns for the reviews metadata import file are in the following order:
- USER_NAME
- USER_DESCRIPTION
- USER_ROLE (roles enclosed in an array of strings)
- USER_FAVORITES: Include favorites in an array, with the following parts of each favorite separated with "::".
- Review/Source
- Review/Resource
- USER_REVIEWS: Include reviews in an array, with the following parts of each review separated with "::".
- Review/Source
- Review/Resource
- Review/Rating
- Review/Title
- Review/Description
- USER_ATTRIBUTES: Pass the following attributes as a comma-separated list enclosed in '{}'.
- firstname
- lastname
- user_last_login [Timestamp]
Use the following sample as a guide.
Sample CSV file for user reviews:
ldcuser,Predefined administrator user,"[Administrator]","[]","[MyHIVE::/default.table3::4::Database_review::This is one of the nicest database,
MyHIVE::/default.table3::4::::]","{firstName=LDC, lastName=Service, email=build+ldcuser@wldata.com}"
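A single USER_REVIEWS entry can be taken apart on the "::" separator into the parts listed above (source, resource, rating, title, description). A minimal sketch; the entry value is illustrative, modeled on the sample row:

```python
# Split one review entry on '::' into its five documented parts.
entry = "MyHIVE::/default.table3::4::Database_review::A very nice database"
source, resource, rating, title, description = entry.split("::")
```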
Field comments or descriptions
The Export Comments/Description option can export the custom field comments, labels, and descriptions as a _comments.csv file.
The columns of the field comments or descriptions metadata import file are in the following order:
- VFName
- Resource Path
- Field name
- Field Label
- User Comment
Use the following sample as a guide.
Sample comments import file
DO_Test,/user/EmpSkill_Analytics/Employee.csv,employee_id,EMPID,This is a sample comment to test re-profiling effects of user comments
Virtual folder
The columns for the virtual folder metadata import file are in the following order:
- VFName
- VFDescription
- VFParent
- VFPath
- VFPInclude
- VFPathExclude
- VF_IsRoot
Use the following sample as a guide.
Sample virtual folder import file
"foodmart4","foodmart4","foodMart","/foodmart","(.customer_s1.*|.sales.*)",,false
"foodMartChild","food mart","foodMart","/foodmart",".customer.*",,false
Data objects
Before importing data objects, make sure you have:
- Imported the corresponding data sources and virtual folders
- Profiled the resources
- Imported the tag domains, tags, and tag associations
The column sequence in the data object metadata import file is a multi-level nested sequence as follows:
- Data Object Name
- Description
- Join Conditions: Multiple join conditions are separated with ':#;' as in:
  join_condition1:#;join_condition2:#;join_condition3
  - Each join condition is constructed with:
    - left_field_info
    - right_field_info
    - join_info
    with each entity in the join condition separated as:
    left_field_info;;right_field_info##join_info
  - Each join_info is built as follows:
    - Left Column info
    - Right Column info
    - Join Cardinality
    - Join Order
    - Join Operation
    with each entity in join_info separated as follows:
    left_col_info;;right_col_info##Join_Cardinality,Join_Order,Join_Operation
  - Each column info is built as follows:
    - Column Origin
    - Column Data Source Name
    - Column path
    - Column resolved name
    - Column name
    with each entity in col_info separated with a comma (,):
    Left_col_origin,Left_col_source_name,Left_col_path,Left_col_res_name,Left_col_name
Use the following sample as a guide.
Sample data objects import file
ProjectDO,Lara's Projects,HdfsDS::HdfsDS::/data/DocData/Lara/DO/Data/ProjectReq.csv::ProjectReq.csv::man_id;;HdfsDS::HdfsDS::/data/DocData/Joe/DO_Demo/UseCase1/Data/Employ.csv::Employ.csv::emp_id##MANY_MANY::ALTERNATE::JoinOpEQ
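The top-level separators described above (':#;' between join conditions, ';;' and '##' within a condition) can be applied mechanically. This is a rough sketch of that top-level split only, on a simplified illustrative string, not a full parser for real export files:

```python
def parse_join_conditions(cell: str):
    """Split a Join Conditions cell using the documented top-level separators."""
    conditions = []
    for cond in cell.split(":#;"):            # ':#;' separates join conditions
        fields, join_info = cond.split("##")  # '##' separates field info from join info
        left_field, right_field = fields.split(";;")  # ';;' separates left/right
        conditions.append({"left": left_field, "right": right_field, "join": join_info})
    return conditions

# Simplified illustrative cell with two join conditions.
parsed = parse_join_conditions("L1;;R1##INFO1:#;L2;;R2##INFO2")
```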
Datasets
The columns for the datasets metadata import file are in the following order:
- Dataset Name
- Description
- ID
- Schema Version
- Virtual Folder
- Path Specifications, in the following format:
  - Exclude Regex Pattern
  - Include Regex Pattern
  - Source Path
  with each entity separated by :: (two colons) and multiple path specifications separated by ;; (two semicolons). For example:
  <Exclude Regex1>::<Include Regex1>::<Source Path1>;;<Exclude Regex2>::<Include Regex2>::<Source Path2>
- Fields
- Origin
Use the following sample as a guide.
Sample dataset import file
Dset1,,Dset1,10,HdfsDS,::.*::/data/data_NYPD;;,,HdfsDS
EmpDataSet,,EmpDataSet01,10,HdfsDS,::.*::/data/DocData;;,,HdfsDS
Sample,Sample Description,ID:01,9,DocDataHdfs,.*json::.*csv::/data/DocData/Joe;;.*xml::.*json::/data/DocData/Lara;;,,HdfsDS
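The path specification format above can be decomposed with the documented separators ('::' within a specification, ';;' between specifications). A minimal sketch; the cell value is modeled on the sample rows:

```python
def parse_path_specs(cell: str):
    """Split a Path Specifications cell into (exclude, include, path) entries."""
    specs = []
    for spec in cell.split(";;"):
        if not spec:  # a trailing ';;' leaves an empty entry behind
            continue
        exclude, include, path = spec.split("::")
        specs.append({"exclude": exclude, "include": include, "path": path})
    return specs

specs = parse_path_specs(".*json::.*csv::/data/DocData/Joe;;.*xml::.*json::/data/DocData/Lara;;")
```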
Custom properties
The columns for the custom properties metadata import file are in the following order:
- Name
- Display Name
- Data Type
- Description
- Property Group
- Property Group Description
- Is property custom property
- Access Level, where each level code grants the following view and set permissions:
- pg1 = View: Everyone/Set: Analyst & Higher
- pg2 = View: Everyone/Set: Steward & Higher
- pg3 = View: Everyone/Set: Admin Only
- pg4 = View: Everyone/Set: Nobody
- pg5 = View: Analyst & Higher/Set: Admin Only
- pg6 = View: Steward & Higher/Set: Admin Only
- pg7 = View: Admin Only/Set: Admin Only
- Is property case sensitive
- Searchable
- Facetable
Use the following sample custom property import file as a guide.
DataGrp,Data Group,string,Data Group,,,true,pg1,false,true,true
Department,Department,string,,,,true,pg1,false,true,true
test_property,Test Property,string,Test Description,TestGroup,,true,pg7,false,false,true
test_property2,Test Property 2,string,Test Property 2 Description,TestGroup,,true,pg4,false,true,false
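The Access Level codes above can be kept as a simple lookup table when generating or validating custom property rows. The (view, set) pairs are transcribed from the list in this section:

```python
# Access Level codes mapped to (view permission, set permission),
# as listed in the Custom properties section.
ACCESS_LEVELS = {
    "pg1": ("Everyone", "Analyst & Higher"),
    "pg2": ("Everyone", "Steward & Higher"),
    "pg3": ("Everyone", "Admin Only"),
    "pg4": ("Everyone", "Nobody"),
    "pg5": ("Analyst & Higher", "Admin Only"),
    "pg6": ("Steward & Higher", "Admin Only"),
    "pg7": ("Admin Only", "Admin Only"),
}

view, set_level = ACCESS_LEVELS["pg4"]
```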