Hitachi Vantara Lumada and Pentaho Documentation

Manage data sources

With Pentaho Data Catalog, you can process data from file systems and relational databases.

To process data from these systems, Data Catalog establishes a data source definition. This data source stores the connection information to your sources of data, including their access URLs and credentials for the service user.

Note: For the latest supported versions, refer to the release notes.

The following data sources are supported:

Type: Data source
File System
Relational Databases
NoSQL Databases
Data Platforms
Object Stores

Adding a data source

If your role has the Manage Data Sources privilege, you can perform the following steps to create data source definitions.

AWS S3 data source

Perform the following steps to identify your data source within Data Catalog:

Procedure

  1. Click Management in the left toolbar.

    The Manage Your Environment page opens.
  2. In the Resources tile, click Add Data Source.

    The Create Data Source page opens.
  3. Specify the following basic information for the connection to your data source.

    Note: Data Catalog encrypts your data source connection details, such as user name and password, before storing them.
    Field: Description
    Data Source Name: Specify the name of your data source. This name is used in the Data Catalog interface. It should be something your Data Catalog users recognize.
    Note: Names must start with a letter and must contain only letters, digits, and underscores. Spaces in names are not supported.
    Data Source ID (Optional): Specify a permanent identifier for your data source. If you leave this field blank, Data Catalog generates a permanent identifier for you.
    Note: You cannot modify the Data Source ID for this data source after you specify or generate it.
    Description (Optional): Specify a description of your data source.
    Data Source Type: Select the type of your data source. Data Catalog then prompts you to specify additional connection information based on the file system or database type you are trying to access.
  4. After you have specified the basic information, specify the following additional connection information based on the file system or database type you are trying to access.

    Field: Description
    Affinity: This default setting specifies which agents should be associated with the data source in a multi-agent deployment.
    Region: Geographical location where AWS maintains a cluster of data centers.
    Endpoint: Location of the bucket. For example, s3.<region containing S3 bucket>.amazonaws.com
    Access Key: User credential to access data on the bucket.
    Secret Key: Password credential to access data on the bucket.
    Bucket Name: The name of the S3 bucket in which the data resides. For S3 access from non-EMR file systems, Data Catalog uses the AWS command line interface to access S3 data. These commands send requests using access keys, which consist of an access key ID and a secret access key. You must specify the logical name for the cluster root. This value is defined by dfs.nameservices in the hdfs-site.xml configuration file. For S3 access from AWS S3 and MapR file systems, you must identify the root of the MapR file system with maprfs:///.
    Path: Directory where this data source is included.
  5. Click Test Connection to test your connection to the specified data source.

  6. (Optional) Enter a Note for any information you need to share with others who might access this data source.

  7. Click Create Data Source to establish your data source connection.

  8. Click Scan Files. This process loads files and folders to the system.

    You can monitor the status of the file scan on the Workers page.
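
As an illustration of the Endpoint format described in step 4, the following minimal Python sketch builds the endpoint host name from a region. The function name s3_endpoint is hypothetical and is not part of Data Catalog; the host pattern is taken from the example in the Endpoint field, and the region value used here is only an example.

```python
def s3_endpoint(region: str) -> str:
    """Return an S3 endpoint host following the s3.<region>.amazonaws.com pattern."""
    region = region.strip()
    if not region:
        raise ValueError("region must not be empty")
    return f"s3.{region}.amazonaws.com"


# Example region; substitute the region that contains your bucket.
print(s3_endpoint("us-east-1"))
```

The result is the value you would type into the Endpoint field; the Region field takes the region name itself.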

Denodo data source

Perform the following steps to identify your data source within Data Catalog:

Procedure

  1. Click Management in the left toolbar.

    The Manage Your Environment page opens.
  2. In the Resources tile, click Add Data Source.

    The Create Data Source page opens.
  3. Specify the following basic information for the connection to your data source.

    Note: Data Catalog encrypts your data source connection details, such as user name and password, before storing them.
    Field: Description
    Data Source Name: Specify the name of your data source. This name is used in the Data Catalog interface. It should be something your Data Catalog users recognize.
    Note: Names must start with a letter and must contain only letters, digits, and underscores. Spaces in names are not supported.
    Data Source ID (Optional): Specify a permanent identifier for your data source. If you leave this field blank, Data Catalog generates a permanent identifier for you.
    Note: You cannot modify the Data Source ID for this data source after you specify or generate it.
    Description (Optional): Specify a description of your data source.
    Data Source Type: Select the type of your data source. Data Catalog then prompts you to specify additional connection information based on the file system or database type you are trying to access.
  4. After you have specified the basic information, specify the following additional connection information based on the file system or database type you are trying to access.

    Field: Description
    Affinity: This default setting specifies which agents should be associated with the data source in a multi-agent deployment.
    Configuration method: URI by default, because the connection is configured using a URL.
    • Username and Password: Credentials associated with your Denodo account to log in and access the Denodo environment.
    • URI: URIs are used to access and manage various objects and services within the Denodo environment. For example, the URI would look like vdp://denodo-server:9999/data-sources/MyDatabase
    • Database Name: The name of the data sources within the Denodo environment that contain the data you want to access.
    Driver: Select an existing driver or upload a new driver to ensure that the communication between the application and the database is efficient, secure, and follows the required standards.

    To upload a new driver, click Manage Drivers, click Add New, upload the driver, and then click Add Driver.

  5. Click Test Connection to test your connection to the specified data source.

    Note: Before you finalize and save your new data source configuration, you need to perform a process called Ingest Schemas. This process loads fundamental information about the database schema and related metadata into the system.
  6. Click Ingest Schema, select the schemas, and then click Ingest Schemas.

    Note: Although you can select all schemas, it is a best practice to avoid selecting certain system-related schemas that are unnecessary for your needs.
  7. (Optional) Enter a Note for any information you need to share with others who might access this data source.

  8. Click Create Data Source to establish your data source connection.
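
To check a URI before entering it in step 4, it can help to split it into its parts. The following is a minimal sketch that uses Python's standard urllib.parse module on the example URI from the URI field. The helper name split_uri is hypothetical, and the sketch only inspects the string; it does not contact Denodo or confirm that Data Catalog accepts the value.

```python
from urllib.parse import urlparse


def split_uri(uri: str) -> dict:
    """Break a URI into scheme, host, port, and path for a quick sanity check."""
    parts = urlparse(uri)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
    }


# The example URI from the URI field description.
info = split_uri("vdp://denodo-server:9999/data-sources/MyDatabase")
print(info)
```

A missing host or port in the output usually points to a typo, such as a missing // after the scheme.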

DynamoDB data source

Perform the following steps to identify your data source within Data Catalog:

Procedure

  1. Click Management in the left toolbar.

    The Manage Your Environment page opens.
  2. In the Resources tile, click Add Data Source.

    The Create Data Source page opens.
  3. Specify the following basic information for the connection to your data source.

    Note: Data Catalog encrypts your data source connection details, such as user name and password, before storing them.
    Field: Description
    Data Source Name: Specify the name of your data source. This name is used in the Data Catalog interface. It should be something your Data Catalog users recognize.
    Note: Names must start with a letter and must contain only letters, digits, and underscores. Spaces in names are not supported.
    Data Source ID (Optional): Specify a permanent identifier for your data source. If you leave this field blank, Data Catalog generates a permanent identifier for you.
    Note: You cannot modify the Data Source ID for this data source after you specify or generate it.
    Description (Optional): Specify a description of your data source.
    Data Source Type: Select the type of your data source. Data Catalog then prompts you to specify additional connection information based on the file system or database type you are trying to access.
  4. After you have specified the basic information, specify the following additional connection information based on the file system or database type you are trying to access.

    Field: Description
    Affinity: This default setting specifies which agents should be associated with the data source in a multi-agent deployment.
    Region: Geographical location where AWS maintains a cluster of data centers.
    Access Key and Secret Key: AWS Access Key ID and Secret Access Key that are used for authentication and authorization when interacting with DynamoDB.
  5. Click Test Connection to test your connection to the specified data source.

  6. (Optional) Enter a Note for any information you need to share with others who might access this data source.

  7. Click Create Data Source to establish your data source connection.

  8. Click Scan Files. This process loads files and folders to the system.

    You can monitor the status of the file scan on the Workers page.
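
The naming rule in step 3 (names start with a letter and contain only letters, digits, and underscores) can be checked before you open the Create Data Source page. The following is a minimal sketch, assuming that "letters" means the ASCII letters A to Z in either case; the source does not say whether other alphabets are accepted. The names NAME_PATTERN and is_valid_name are hypothetical and are not part of Data Catalog.

```python
import re

# Assumption: only ASCII letters count as letters.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_name(name: str) -> bool:
    """Return True if the name starts with a letter and uses only letters, digits, and underscores."""
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ["sales_db_01", "1sales", "sales db"]:
    print(candidate, is_valid_name(candidate))
```

Using fullmatch rather than match means that a trailing space or punctuation mark also fails the check.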

HCP data source

You can add data to Data Catalog from Hitachi Content Platform (HCP) by adding HCP as a data source.

Perform the following steps to identify your data source within Data Catalog:

Procedure

  1. Click Management in the left toolbar.

    The Manage Your Environment page opens.
  2. In the Resources tile, click Add Data Source.

    The Create Data Source page opens.
  3. Specify the following basic information for the connection to your data source.

    Note: Data Catalog encrypts your data source connection details, such as user name and password, before storing them.
    Field: Description
    Data Source Name: Specify the name of your data source. This name is used in the Data Catalog interface. It should be something your Data Catalog users recognize.
    Note: Names must start with a letter and must contain only letters, digits, and underscores. Spaces in names are not supported.
    Data Source ID (Optional): Specify a permanent identifier for your data source. If you leave this field blank, Data Catalog generates a permanent identifier for you.
    Note: You cannot modify the Data Source ID for this data source after you specify or generate it.
    Description (Optional): Specify a description of your data source.
    Data Source Type: Select the type of your data source. Data Catalog then prompts you to specify additional connection information based on the file system or database type you are trying to access.
  4. After you have specified the basic information, specify the following additional connection information based on the file system or database type you are trying to access.

    Field: Description
    Affinity: This default setting specifies which agents should be associated with the data source in a multi-agent deployment.
    Region: Geographical location where HCP maintains data centers.
    Endpoint: Location of the bucket, specified as a hostname or IP address.
    Access Key: The access key of the S3 credentials to access the bucket.
    Secret Key: The secret key of the S3 credentials to access the bucket.
    Bucket Name: The name of the S3 bucket in which the data resides.
    Path: Directory where this data source is included.
  5. Click Test Connection to test your connection to the specified data source.

  6. (Optional) Enter a Note for any information you need to share with others who might access this data source.

  7. Click Create Data Source to establish your data source connection.

  8. Click Scan Files. This process loads files and folders to the system.

    You can monitor the status of the file scan on the Workers page.

Local File System data source

You can add data to Data Catalog from your local file system by adding Local File System as a data source.

Before you begin

To access files on your local system, make the following changes to the vendor/docker-compose.yml file so that the files are accessible to the ws_default container.

  1. Open the vendor/docker-compose.yml file and add the following lines under the ws_default service.
    services:
      ws_default:
        volumes:
          - /my/path/to/file:/tmp/my-path

    You can also include a remote file share as a Local File System. As an example, refer to the following code snippet for adding cifs-share to the Local File System.

    services:
      ws_default:
        volumes:
          # Optional: mount a remote CIFS file share into the container
          - cifs-share:/cifs-share
    volumes:
      cifs-share:
        driver_opts:
          type: cifs
          o: "username=<user1>,password=<password>,file_mode=0777,dir_mode=0777"
          device: "<IP Address>"
    
  2. Save changes.
  3. Restart the ws_default container for the changes to take effect.
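
A volume entry such as /my/path/to/file:/tmp/my-path makes the host directory on the left visible inside the container at the path on the right. The following minimal Python sketch translates a host file path into the path that the ws_default container would see. The function name to_container_path and the sample file name are hypothetical, and whether the Path field expects the container-side path is an assumption you should confirm in your environment.

```python
from pathlib import PurePosixPath


def to_container_path(host_path: str, volume: str) -> str:
    """Map a host path to its container path for a 'host:container' volume entry."""
    # Ignore an optional third part such as ':ro'.
    host_root, container_root = volume.split(":")[:2]
    # relative_to raises ValueError if host_path is outside the mounted directory.
    relative = PurePosixPath(host_path).relative_to(host_root)
    return str(PurePosixPath(container_root) / relative)


print(to_container_path("/my/path/to/file/reports/q1.csv",
                        "/my/path/to/file:/tmp/my-path"))
```

A ValueError from this function means the file sits outside the mounted directory, so the container cannot see it until the volume entry is widened.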

Perform the following steps to identify your data source within Data Catalog:

Procedure

  1. Click Management in the left toolbar.

    The Manage Your Environment page opens.
  2. In the Resources tile, click Add Data Source.

    The Create Data Source page opens.
  3. Specify the following basic information for the connection to your data source.

    Note: Data Catalog encrypts your data source connection details, such as user name and password, before storing them.
    Field: Description
    Data Source Name: Specify the name of your data source. This name is used in the Data Catalog interface. It should be something your Data Catalog users recognize.
    Note: Names must start with a letter and must contain only letters, digits, and underscores. Spaces in names are not supported.
    Data Source ID (Optional): Specify a permanent identifier for your data source. If you leave this field blank, Data Catalog generates a permanent identifier for you.
    Note: You cannot modify the Data Source ID for this data source after you specify or generate it.
    Description (Optional): Specify a description of your data source.
    Data Source Type: Select the type of your data source. Data Catalog then prompts you to specify additional connection information based on the file system or database type you are trying to access.
  4. In the Path field, specify the path to your local file system.

  5. Click Test Connection to test your connection to the specified data source.

  6. (Optional) Enter a Note for any information you need to share with others who might access this data source.

  7. Click Create Data Source to establish your data source connection.

  8. Click Scan Files. This process loads files and folders to the system.

    You can monitor the status of the file scan on the Workers page.

Microsoft Access data source

Perform the following steps to identify your data source within Data Catalog:

Procedure

  1. Click Management in the left toolbar.

    The Manage Your Environment page opens.
  2. In the Resources tile, click Add Data Source.

    The Create Data Source page opens.
  3. Specify the following basic information for the connection to your data source.

    Note: Data Catalog encrypts your data source connection details, such as user name and password, before storing them.
    Field: Description
    Data Source Name: Specify the name of your data source. This name is used in the Data Catalog interface. It should be something your Data Catalog users recognize.
    Note: Names must start with a letter and must contain only letters, digits, and underscores. Spaces in names are not supported.
    Data Source ID (Optional): Specify a permanent identifier for your data source. If you leave this field blank, Data Catalog generates a permanent identifier for you.
    Note: You cannot modify the Data Source ID for this data source after you specify or generate it.
    Description (Optional): Specify a description of your data source.
    Data Source Type: Select the type of your data source. Data Catalog then prompts you to specify additional connection information based on the file system or database type you are trying to access.
  4. After you have specified the basic information, specify the following additional connection information based on the file system or database type you are trying to access.

    Field: Description
    Affinity: This default setting specifies which agents should be associated with the data source in a multi-agent deployment.
    Configuration Method: Select Credentials or URI as a configuration method.
    Configuration Method: Credentials
    • Username/Password: Credentials that provide access to the specified database.
    • Database File:
    Configuration Method: URI
    • Username/Password: Credentials that provide access to the specified database.
    • URI: For example, the URI would look like jdbc:postgresql://localhost:<port_no>/.
  5. Click Test Connection to test your connection to the specified data source.

    Note: Before you finalize and save your new data source configuration, you need to perform a process called Ingest Schemas. This process loads fundamental information about the database schema and related metadata into the system.
  6. Click Ingest Schema, select the schemas, and then click Ingest Schemas.

    Note: Although you can select all schemas, it is a best practice to avoid selecting certain system-related schemas that are unnecessary for your needs.
  7. (Optional) Enter a Note for any information you need to share with others who might access this data source.

  8. Click Create Data Source to establish your data source connection.

Microsoft SQL data source

Perform the following steps to identify your data source within Data Catalog:

Procedure

  1. Click Management in the left toolbar.

    The Manage Your Environment page opens.
  2. In the Resources tile, click Add Data Source.

    The Create Data Source page opens.
  3. Specify the following basic information for the connection to your data source.

    Note: Data Catalog encrypts your data source connection details, such as user name and password, before storing them.
    Field: Description
    Data Source Name: Specify the name of your data source. This name is used in the Data Catalog interface. It should be something your Data Catalog users recognize.
    Note: Names must start with a letter and must contain only letters, digits, and underscores. Spaces in names are not supported.
    Data Source ID (Optional): Specify a permanent identifier for your data source. If you leave this field blank, Data Catalog generates a permanent identifier for you.
    Note: You cannot modify the Data Source ID for this data source after you specify or generate it.
    Description (Optional): Specify a description of your data source.
    Data Source Type: Select the type of your data source. Data Catalog then prompts you to specify additional connection information based on the file system or database type you are trying to access.
  4. After you have specified the basic information, specify the following additional connection information based on the file system or database type you are trying to access.

    Field: Description
    Affinity: This default setting specifies which agents should be associated with the data source in a multi-agent deployment.
    Configuration Method: Select Credentials or URI as a configuration method.
    Configuration Method: Credentials
    • Username/Password: Credentials that provide access to the specified database.
    • Host: The address of the machine where the Microsoft SQL database server is running. It can be an IP address or a domain name.
    • Port: The port number on which the Microsoft SQL server is listening for incoming connections. The default port is 1433.
    Configuration Method: URI
    • Username/Password: Credentials that provide access to the specified database.
    • Service URI: For example, the URI would look like Server=myServerAddress;Database=myDatabase;User Id=myUsername;Password=myPassword;Port=1433;Integrated Security=False;Connection Timeout=30;.
    Driver: Select an existing driver or upload a new driver to ensure that the communication between the application and the database is efficient, secure, and follows the required standards.
    Database Name: The name of the database within the Microsoft SQL server that you want to connect with.
  5. Click Test Connection to test your connection to the specified data source.

    Note: Before you finalize and save your new data source configuration, you need to perform a process called Ingest Schemas. This process loads fundamental information about the database schema and related metadata into the system.
  6. Click Ingest Schema, select the schemas, and then click Ingest Schemas.

    Note: Although you can select all schemas, it is a best practice to avoid selecting certain system-related schemas that are unnecessary for your needs.
  7. (Optional) Enter a Note for any information you need to share with others who might access this data source.

  8. Click Create Data Source to establish your data source connection.
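
The Service URI example in step 4 is a list of key=value pairs separated by semicolons. The following minimal Python sketch splits such a string into a dictionary so that individual settings, such as the port, can be checked before you paste the value into Data Catalog. The function name parse_connection_string is hypothetical, and the sketch assumes that values contain no semicolons.

```python
def parse_connection_string(conn: str) -> dict:
    """Split a 'Key=Value;Key=Value;' string into a dictionary of settings."""
    settings = {}
    for item in conn.split(";"):
        if not item.strip():
            continue  # skip the empty piece after a trailing semicolon
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in {item!r}")
        settings[key.strip()] = value.strip()
    return settings


example = ("Server=myServerAddress;Database=myDatabase;User Id=myUsername;"
           "Password=myPassword;Port=1433;Integrated Security=False;"
           "Connection Timeout=30;")
settings = parse_connection_string(example)
print(settings["Server"], settings["Port"])
```

All values come back as strings, so convert Port and Connection Timeout with int() if you need to compare them numerically.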

MySQL data source

Perform the following steps to identify your data source within Data Catalog:

Procedure

  1. Click Management in the left toolbar.

    The Manage Your Environment page opens.
  2. In the Resources tile, click Add Data Source.

    The Create Data Source page opens.
  3. Specify the following basic information for the connection to your data source.

    Note: Data Catalog encrypts your data source connection details, such as user name and password, before storing them.
    Field: Description
    Data Source Name: Specify the name of your data source. This name is used in the Data Catalog interface. It should be something your Data Catalog users recognize.
    Note: Names must start with a letter and must contain only letters, digits, and underscores. Spaces in names are not supported.
    Data Source ID (Optional): Specify a permanent identifier for your data source. If you leave this field blank, Data Catalog generates a permanent identifier for you.
    Note: You cannot modify the Data Source ID for this data source after you specify or generate it.
    Description (Optional): Specify a description of your data source.
    Data Source Type: Select the type of your data source. Data Catalog then prompts you to specify additional connection information based on the file system or database type you are trying to access.
  4. Click Test Connection to test your connection to the specified data source.

    Note: Before you finalize and save your new data source configuration, you need to perform a process called Ingest Schemas. This process loads fundamental information about the database schema and related metadata into the system.
  5. Click Ingest Schema, select the schemas, and then click Ingest Schemas.

    Note: Although you can select all schemas, it is a best practice to avoid selecting certain system-related schemas that are unnecessary for your needs.
  6. (Optional) Enter a Note for any information you need to share with others who might access this data source.

  7. Click Create Data Source to establish your data source connection.

NFS data source

Network File System (NFS) is a distributed file system protocol that enables remote file access over Unix and Linux networks. You can create a data source using NFS with the local file system path by mounting data as a local file system to either the remote or local agent. You can also add data to Data Catalog from Hitachi Network Attached Storage (HNAS) and NetApp data storage.
This protocol uses a client-server model where the server provides the shared file system and the client mounts the file system to access the shared files as if they were on a local disk. You can add data to Data Catalog from any file-sharing network system if it is transferable via NFS.

Perform the following steps to add NFS as a data source:

Procedure

  1. Click Management in the left toolbar.

    The Manage Your Environment page opens.
  2. In the Resources tile, click Add Data Source.

    The Create Data Source page opens.
  3. Specify the following basic information for the connection to your data source.

    Note: Data Catalog encrypts your data source connection details, such as user name and password, before storing them.
    Field: Description
    Data Source Name: Specify the name of your data source. This name is used in the Data Catalog interface. It should be something your Data Catalog users recognize.
    Note: Names must start with a letter and must contain only letters, digits, and underscores. Spaces in names are not supported.
    Data Source ID (Optional): Specify a permanent identifier for your data source. If you leave this field blank, Data Catalog generates a permanent identifier for you.
    Note: You cannot modify the Data Source ID for this data source after you specify or generate it.
    Description (Optional): Specify a description of your data source.
    Data Source Type: Select the type of your data source. Data Catalog then prompts you to specify additional connection information based on the file system or database type you are trying to access.
  4. After you have specified the basic information, specify the following additional connection information based on the file system or database type you are trying to access.

    Field: Description
    Affinity: This default setting specifies which agents should be associated with the data source in a multi-agent deployment.
    Configuration method: URI by default.
    • URI: URIs are used to identify and locate resources on the internet or within a network. For example, the URI would look like nfs://server.example.com
    • Path: NFS path to access the data source. For example, the path would look like nfs:/share/data
  5. Click Test Connection to test your connection to the specified data source.

  6. (Optional) Enter a Note for any information you need to share with others who might access this data source.

  7. Click Create Data Source to establish your data source connection.

  8. Click Scan Files. This process loads files and folders to the system.

    You can monitor the status of the file scan on the Workers page.

Oracle data source

Perform the following steps to identify your data source within Data Catalog:

Procedure

  1. Click Management in the left toolbar.

    The Manage Your Environment page opens.
  2. In the Resources tile, click Add Data Source.

    The Create Data Source page opens.
  3. Specify the following basic information for the connection to your data source.

    Note: Data Catalog encrypts your data source connection details, such as user name and password, before storing them.
    Field: Description
    Data Source Name: Specify the name of your data source. This name is used in the Data Catalog interface. It should be something your Data Catalog users recognize.
    Note: Names must start with a letter and must contain only letters, digits, and underscores. Spaces in names are not supported.
    Data Source ID (Optional): Specify a permanent identifier for your data source. If you leave this field blank, Data Catalog generates a permanent identifier for you.
    Note: You cannot modify the Data Source ID for this data source after you specify or generate it.
    Description (Optional): Specify a description of your data source.
    Data Source Type: Select the type of your data source. Data Catalog then prompts you to specify additional connection information based on the file system or database type you are trying to access.
  4. After you have specified the basic information, specify the following additional connection information based on the file system or database type you are trying to access.

    Field: Description
    Affinity: This default setting specifies which agents should be associated with the data source in a multi-agent deployment.
    Configuration Method: Select Credentials, URI, or SSL as a configuration method.
    Configuration Method: Credentials
    • Username/Password: Credentials that provide access to the specified database.
    • Host: The address of the machine where the Oracle database server is running. It can be an IP address or a domain name.
    • Port: The port number on which the Oracle server is listening for incoming connections.
    Configuration Method: URI
    • Username/Password: Credentials that provide access to the specified database.
    • Service URI: For example, the URI would look like jdbc:oracle:thin:@oracle.example.com:1521/mydb.
    Configuration Method: SSL
    Select one of the following options, and enter the required encryption details.
    • Encryption only
    • Encryption with Server and Client Authentication
    Driver: If you select Credentials or URI as the configuration method, you must use a driver. Select an existing driver or upload a new driver to ensure that the communication between the application and the database is efficient, secure, and follows the required standards.
  5. Click Test Connection to test your connection to the specified data source.

    Note: Before you finalize and save your new data source configuration, you need to perform a process called Ingest Schemas. This process loads fundamental information about the database schema and related metadata into the system.
  6. Click Ingest Schema, select the schemas, and then click Ingest Schemas.

    Note: Although you can select all schemas, it is a best practice to avoid selecting certain system-related schemas that are unnecessary for your needs.
  7. (Optional) Enter a Note for any information you need to share with others who might access this data source.

  8. Click Create Data Source to establish your data source connection.
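
The Service URI example in step 4 packs the host, port, and service name into one string. The following minimal Python sketch pulls those three parts out of a URI of exactly that shape, which can help when you switch between the Credentials and URI configuration methods. The names THIN_URL and parse_thin_url are hypothetical, and other Oracle URL forms (for example SID-based or descriptor-based URLs) are deliberately not handled.

```python
import re

# Matches only the jdbc:oracle:thin:@host:port/service form shown in the example.
THIN_URL = re.compile(
    r"jdbc:oracle:thin:@(?P<host>[^:/]+):(?P<port>\d+)/(?P<service>.+)"
)


def parse_thin_url(url: str) -> tuple:
    """Return (host, port, service) from a jdbc:oracle:thin:@host:port/service URL."""
    match = THIN_URL.fullmatch(url)
    if match is None:
        raise ValueError(f"unsupported URL form: {url!r}")
    return match["host"], int(match["port"]), match["service"]


print(parse_thin_url("jdbc:oracle:thin:@oracle.example.com:1521/mydb"))
```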

PostgreSQL data source

Perform the following steps to identify your data source within Data Catalog:

Procedure

  1. Click Management in the left toolbar.

    The Manage Your Environment page opens.
  2. In the Resources tile, click Add Data Source.

    The Create Data Source page opens.
  3. Specify the following basic information for the connection to your data source.

    Note: Data Catalog encrypts your data source connection details, such as user name and password, before storing them.
    Field: Description
    Data Source Name: Specify the name of your data source. This name is used in the Data Catalog interface. It should be something your Data Catalog users recognize.
    Note: Names must start with a letter and must contain only letters, digits, and underscores. Spaces in names are not supported.
    Data Source ID (Optional): Specify a permanent identifier for your data source. If you leave this field blank, Data Catalog generates a permanent identifier for you.
    Note: You cannot modify the Data Source ID for this data source after you specify or generate it.
    Description (Optional): Specify a description of your data source.
    Data Source Type: Select the type of your data source. Data Catalog then prompts you to specify additional connection information based on the file system or database type you are trying to access.
  4. After you have specified the basic information, specify the following additional connection information based on the file system or database type you are trying to access.

    Field: Description
    Affinity: This default setting specifies which agents should be associated with the data source in a multi-agent deployment.
    Configuration Method: Select Credentials or URI as a configuration method.
    Configuration Method: Credentials
    • Username/Password: Credentials that provide access to the specified database.
    • Host: The address of the machine where the PostgreSQL database server is running. It can be an IP address or a domain name.
    • Port: The port number on which the PostgreSQL server is listening for incoming connections. The default port is 5432.
    Configuration Method: URI
    • Username/Password: Credentials that provide access to the specified database.
    • Service URI: For example, the URI would look like jdbc:postgresql://localhost:<port_no>/.
    Driver: Select an existing driver or upload a new driver to ensure that the communication between the application and the database is efficient, secure, and follows the required standards.
    Database Name: The name of the database within the PostgreSQL server that you want to connect with.
  5. Click Test Connection to test your connection to the specified data source.

    Note: Before you finalize and save your new data source configuration, you need to perform a process called Ingest Schemas. This process loads fundamental information about the database schema and related metadata into the system.
  6. Click Ingest Schema, select the schemas, and then click Ingest Schemas.

    Note: Although you can select all schemas, it is a best practice to skip system-related schemas that you do not need.
  7. (Optional) Enter a Note for any information you need to share with others who might access this data source.

  8. Click Create Data Source to establish your data source connection.
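
The Service URI in step 4 combines the same host, port, and database name that the Credentials method asks for separately. The following Python sketch shows one way to assemble such a URI before pasting it into the field. The function name and the sample values are hypothetical and are not part of Data Catalog; only the URL shape and the default port 5432 come from the table above.

```python
from urllib.parse import quote


def build_postgresql_jdbc_url(host: str, port: int = 5432, database: str = "") -> str:
    """Return a URL shaped like jdbc:postgresql://<host>:<port>/<database>.

    Hypothetical helper for illustration; it only formats a string and
    does not contact the database.
    """
    if not host:
        raise ValueError("host must not be empty")
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    # Percent-encode the database name so unusual characters stay inside the path.
    return f"jdbc:postgresql://{host}:{port}/{quote(database, safe='')}"


print(build_postgresql_jdbc_url("localhost"))
# prints: jdbc:postgresql://localhost:5432/
print(build_postgresql_jdbc_url("db.example.com", 5433, "sales"))
# prints: jdbc:postgresql://db.example.com:5433/sales
```

The user name and password are left out of the URL here because the URI method collects them in its own Username/Password field.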

SAP HANA data source

Perform the following steps to identify your data source within Data Catalog:

Procedure

  1. Click Management in the left toolbar.

    The Manage Your Environment page opens.
  2. In the Resources tile, click Add Data Source.

    The Create Data Source page opens.
  3. Specify the following basic information for the connection to your data source.

    Note: Data Catalog encrypts your data source connection details, such as user name and password, before storing them.
    Field: Description
    Data Source Name: Specify the name of your data source. This name is used in the Data Catalog interface. It should be something your Data Catalog users recognize.
    Note: Names must start with a letter, and must contain only letters, digits, and underscores. Spaces in names are not supported.
    Data Source ID (Optional): Specify a permanent identifier for your data source. If you leave this field blank, Data Catalog generates a permanent identifier for you.
    Note: You cannot modify Data Source ID for this data source after you specify or generate it.
    Description (Optional): Specify a description of your data source.
    Data Source Type: Select the type of your data source. Data Catalog then prompts you to specify additional connection information based on the file system or database type you are trying to access.
  4. After you have specified the basic information, specify the following additional connection information based on the file system or database type you are trying to access.

    Field: Description
    Affinity: This default setting specifies which agents should be associated with the data source in a multi-agent deployment.
    Configuration Method: Select Credentials or URI as the configuration method.
    Configuration Method: Credentials
    • Username/Password: Credentials that provide access to the specified database.
    • Host: The address of the physical or virtual machine (server) where an instance of SAP HANA is installed and running. It can be an IP address or a domain name.
    • Port: The port number on which the SAP HANA database server is listening for incoming connections.
    Configuration Method: URI
    • Username/Password: Credentials that provide access to the specified database.
    • URI: The JDBC URL of the database. For example, jdbc:sap://localhost:<port_no>/<database_name>?user=<user>&password=<password>
    Driver: Select an existing driver or upload a new driver to ensure that the communication between the application and the database is efficient, secure, and follows the required standards.
    Database Name: The name of the database on the SAP HANA server that you want to connect to.
  5. Click Test Connection to test your connection to the specified data source.

    Note: Before you finalize and save your new data source configuration, you need to perform a process called Ingest Schemas. This process loads fundamental information about the database schema and related metadata into the system.
  6. Click Ingest Schema, select the schemas, and then click Ingest Schemas.

    Note: Although you can select all schemas, it is a best practice to skip system-related schemas that you do not need.
  7. (Optional) Enter a Note for any information you need to share with others who might access this data source.

  8. Click Create Data Source to establish your data source connection.
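
The example URI in step 4 carries the user name and password as query parameters, so characters such as `&`, `@`, or spaces in a password could break the URL if pasted in raw. The following Python sketch assembles the URI with those values percent-encoded. The function name and sample values are hypothetical, and whether the SAP HANA driver decodes percent-encoded values is an assumption to verify against the driver documentation.

```python
from urllib.parse import quote, urlencode


def build_sap_hana_jdbc_url(host: str, port: int, database: str,
                            user: str, password: str) -> str:
    """Return jdbc:sap://<host>:<port>/<database>?user=<user>&password=<password>.

    Hypothetical helper for illustration; it only formats a string.
    """
    # quote_via=quote encodes a space as %20 instead of '+'.
    query = urlencode({"user": user, "password": password}, quote_via=quote)
    return f"jdbc:sap://{host}:{port}/{quote(database, safe='')}?{query}"


# All values below are placeholders, not real credentials.
url = build_sap_hana_jdbc_url("localhost", 30015, "HXE", "catalog_svc", "p@ss word&1")
print(url)
```

A URL built this way contains the password in readable form, so treat it like any other secret and avoid storing it in logs or shell history.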

SMB/CIFS data source

Server Message Block (SMB) and Common Internet File System (CIFS) are Windows file-sharing protocols used in storage systems. You can add data to Data Catalog from an SMB or CIFS share through a remote agent or a local agent, which lets you create an SMB or CIFS data source that uses the local file system path.

These protocols use a client-server model in which the server provides the shared file system and the client mounts it to access the shared files as if they were on a local disk. You can add data to Data Catalog from any network file-sharing system that can transfer it over SMB or CIFS.

Perform the following steps to add SMB/CIFS as a data source:

Procedure

  1. Click Management in the left toolbar.

    The Manage Your Environment page opens.
  2. In the Resources tile, click Add Data Source.

    The Create Data Source page opens.
  3. Specify the following basic information for the connection to your data source.

    Note: Data Catalog encrypts your data source connection details, such as user name and password, before storing them.
    Field: Description
    Data Source Name: Specify the name of your data source. This name is used in the Data Catalog interface. It should be something your Data Catalog users recognize.
    Note: Names must start with a letter, and must contain only letters, digits, and underscores. Spaces in names are not supported.
    Data Source ID (Optional): Specify a permanent identifier for your data source. If you leave this field blank, Data Catalog generates a permanent identifier for you.
    Note: You cannot modify Data Source ID for this data source after you specify or generate it.
    Description (Optional): Specify a description of your data source.
    Data Source Type: Select the type of your data source. Data Catalog then prompts you to specify additional connection information based on the file system or database type you are trying to access.
  4. After you have specified the basic information, specify the following additional connection information based on the file system or database type you are trying to access.

    Field: Description
    Affinity: This default setting specifies which agents should be associated with the data source in a multi-agent deployment.
    Configuration method: The default is URI.
    • URI: Identifies and locates the shared resource on the network. For example, smb/cifs://server.example.com
    • Path: The path used to access the data source. For example, smb/cifs://server:/path/to/resource
  5. Click Test Connection to test your connection to the specified data source.

  6. (Optional) Enter a Note for any information you need to share with others who might access this data source.

  7. Click Create Data Source to establish your data source connection.

  8. Click Scan Files. This process loads files and folders to the system.

    You can monitor the status of the file scan on the Workers page.
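
The URI and Path fields in step 4 both identify a server and, optionally, a location on it. The following Python sketch splits such a value into its parts so you can check it before testing the connection. It assumes the `smb/cifs://` prefix in the examples stands for either `smb://` or `cifs://`; the function name and sample values are hypothetical and are not part of Data Catalog.

```python
from urllib.parse import urlsplit

# Assumption: the share is addressed with one of these two schemes.
ALLOWED_SCHEMES = {"smb", "cifs"}


def parse_share_uri(uri: str) -> dict:
    """Split an SMB/CIFS URI into scheme, server, and path."""
    parts = urlsplit(uri)
    if parts.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"expected smb:// or cifs://, got {uri!r}")
    if not parts.hostname:
        raise ValueError(f"no server name in {uri!r}")
    return {
        "scheme": parts.scheme,
        "server": parts.hostname,
        "path": parts.path or "/",  # a bare server address maps to the root
    }


print(parse_share_uri("smb://server.example.com/path/to/resource"))
# prints: {'scheme': 'smb', 'server': 'server.example.com', 'path': '/path/to/resource'}
```

The check is purely syntactic. It does not confirm that the server is reachable or that the service user can read the share; Test Connection covers that.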

Snowflake data source

Perform the following steps to identify your data source within Data Catalog:

Procedure

  1. Click Management in the left toolbar.

    The Manage Your Environment page opens.
  2. In the Resources tile, click Add Data Source.

    The Create Data Source page opens.
  3. Specify the following basic information for the connection to your data source.

    Note: Data Catalog encrypts your data source connection details, such as user name and password, before storing them.
    Field: Description
    Data Source Name: Specify the name of your data source. This name is used in the Data Catalog interface. It should be something your Data Catalog users recognize.
    Note: Names must start with a letter, and must contain only letters, digits, and underscores. Spaces in names are not supported.
    Data Source ID (Optional): Specify a permanent identifier for your data source. If you leave this field blank, Data Catalog generates a permanent identifier for you.
    Note: You cannot modify Data Source ID for this data source after you specify or generate it.
    Description (Optional): Specify a description of your data source.
    Data Source Type: Select the type of your data source. Data Catalog then prompts you to specify additional connection information based on the file system or database type you are trying to access.
  4. After you have specified the basic information, specify the following additional connection information based on the file system or database type you are trying to access.

    Field: Description
    Affinity: This default setting specifies which agents should be associated with the data source in a multi-agent deployment.
    Configuration method: The default configuration method is Credentials.
    • Username/Password: Credentials that provide access to the specified database.
    • Host: The address of the machine where the Snowflake database server is running. It can be an IP address or a domain name.
    • Database Name: The name of the Snowflake database that you want to connect to.
  5. Click Test Connection to test your connection to the specified data source.

    Note: Before you finalize and save your new data source configuration, you need to perform a process called Ingest Schemas. This process loads fundamental information about the database schema and related metadata into the system.
  6. Click Ingest Schema, select the schemas, and then click Ingest Schemas.

    Note: Although you can select all schemas, it is a best practice to skip system-related schemas that you do not need.
  7. (Optional) Enter a Note for any information you need to share with others who might access this data source.

  8. Click Create Data Source to establish your data source connection.
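
The naming rule in step 3 (start with a letter, then only letters, digits, and underscores, with no spaces) can be checked before you type a name into the form. The following Python sketch expresses the rule as a regular expression. It assumes that "letters" means ASCII letters, which the rule does not state explicitly, and the function name is hypothetical.

```python
import re

# Assumption: "letters" means ASCII A-Z and a-z.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_data_source_name(name: str) -> bool:
    """Return True if the whole name satisfies the documented naming rule."""
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ["snowflake_sales_01", "1st_source", "sales db", "_staging"]:
    print(candidate, is_valid_data_source_name(candidate))
# prints:
# snowflake_sales_01 True
# 1st_source False
# sales db False
# _staging False
```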

Vertica data source

Perform the following steps to identify your data source within Data Catalog:

Procedure

  1. Click Management in the left toolbar.

    The Manage Your Environment page opens.
  2. In the Resources tile, click Add Data Source.

    The Create Data Source page opens.
  3. Specify the following basic information for the connection to your data source.

    Note: Data Catalog encrypts your data source connection details, such as user name and password, before storing them.
    Field: Description
    Data Source Name: Specify the name of your data source. This name is used in the Data Catalog interface. It should be something your Data Catalog users recognize.
    Note: Names must start with a letter, and must contain only letters, digits, and underscores. Spaces in names are not supported.
    Data Source ID (Optional): Specify a permanent identifier for your data source. If you leave this field blank, Data Catalog generates a permanent identifier for you.
    Note: You cannot modify Data Source ID for this data source after you specify or generate it.
    Description (Optional): Specify a description of your data source.
    Data Source Type: Select the type of your data source. Data Catalog then prompts you to specify additional connection information based on the file system or database type you are trying to access.
  4. After you have specified the basic information, specify the following additional connection information based on the file system or database type you are trying to access.

    Field: Description
    Affinity: This default setting specifies which agents should be associated with the data source in a multi-agent deployment.
    Configuration Method: Select Credentials or URI as the configuration method.
    Configuration Method: Credentials
    • Username/Password: Credentials that provide access to the specified database.
    • Host: The address of the physical or virtual machine (server) where an instance of the Vertica database software is installed and running. It can be an IP address or a domain name.
    • Port: The port number on which the Vertica server is listening for incoming connections.
    Configuration Method: URI
    • Username/Password: Credentials that provide access to the specified database.
    • URI: The JDBC URL of the database. For example, jdbc:vertica://<hostname>:<port>/<database>?user=<username>&password=<password>
    Driver: Select an existing driver or upload a new driver to ensure that the communication between the application and the database is efficient, secure, and follows the required standards.
    Database Name: The name of the database on the Vertica server that you want to connect to.
  5. Click Test Connection to test your connection to the specified data source.

    Note: Before you finalize and save your new data source configuration, you need to perform a process called Ingest Schemas. This process loads fundamental information about the database schema and related metadata into the system.
  6. Click Ingest Schema, select the schemas, and then click Ingest Schemas.

    Note: Although you can select all schemas, it is a best practice to skip system-related schemas that you do not need.
  7. (Optional) Enter a Note for any information you need to share with others who might access this data source.

  8. Click Create Data Source to establish your data source connection.
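
If a connection test fails with the URI method, it can help to confirm that the URI in step 4 really contains the host, port, and database you intended. The following Python sketch takes a URL of the documented shape apart. The function name and sample values are hypothetical, and the parser relies only on the generic URL layout rather than on anything specific to the Vertica driver.

```python
from urllib.parse import parse_qs, urlsplit

PREFIX = "jdbc:vertica://"


def parse_vertica_jdbc_url(url: str) -> dict:
    """Split jdbc:vertica://<hostname>:<port>/<database>?user=...&password=..."""
    if not url.startswith(PREFIX):
        raise ValueError(f"expected a URL starting with {PREFIX!r}")
    # Drop the leading "jdbc:" so the rest parses as an ordinary URL.
    parts = urlsplit(url[len("jdbc:"):])
    query = parse_qs(parts.query)
    return {
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
        "user": query.get("user", [None])[0],
        # Report only whether a password is present; never echo it back.
        "has_password": "password" in query,
    }


# All values below are placeholders, not real credentials.
info = parse_vertica_jdbc_url(
    "jdbc:vertica://vertica.example.com:5433/analytics?user=dbadmin&password=secret"
)
print(info)
# prints: {'host': 'vertica.example.com', 'port': 5433, 'database': 'analytics', 'user': 'dbadmin', 'has_password': True}
```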

Edit a data source

You can edit a data source as needed.

Perform the following steps to edit a data source:

Procedure

  1. Click Management in the left navigation menu and click Data Sources.

  2. Locate the data source that you want to edit and then click the View Details (>) icon at the right end of the row for the data source.

    The Edit Data source page opens.
  3. Edit the fields, then click Test Connection to verify your connection to the specified data source.

  4. Click Save Data Source.