Hitachi Vantara Lumada and Pentaho Documentation

Installing standalone Solr

These instructions apply to installing Solr on the following platforms:

  • Azure HDInsight
  • CDH/CDP
  • EMR
  • HDP
  • MapR

Lumada Data Catalog is certified for Solr 8.4.1, which must be installed separately. Versions later than 8.4.1 must be certified before use. Contact support at the Hitachi Vantara Lumada and Pentaho Support Portal for compatibility and certification requests. For information about downloading and installing Apache Solr 8.4.1, see the Solr Guide documentation. For other supported Solr versions, see the Solr Support Matrix. A best practice is to configure Solr in SolrCloud mode.

Prerequisites

Before you install Solr, you must meet the following prerequisites:
  • Identify the location or partition on your cluster where you want to install Solr. The default is /opt/solr.
  • Obtain superuser access so that you can create a Solr user.
  • If you are installing in a Kerberized environment, create keytabs for ZooKeeper and Solr using principals in the form primary/instance@realm. For example, zookeeper/hostname@hitachivantara.COM

Installing Solr

Install Solr in a partition or location other than the root partition. A best practice is to use /mnt or /mnt1 for EMR installations and /opt for MapR installations. The location you choose must be available on each Solr server node. Before installing, either configure a service user for Solr to perform the installation, or perform the installation as a user with permissions to install applications on the cluster.

Perform the following steps to download Solr 8.4.1:

Procedure

  1. Download Solr 8.4.1 from the Apache archives https://archive.apache.org/dist/lucene/solr/8.4.1/.

  2. Create a Solr user using the following command: $ sudo useradd solr

  3. Expand the Solr 8.4.1 installation package on the node to be the first location for Solr. Enter the following commands:

    $ cd /opt
    $ wget https://archive.apache.org/dist/lucene/solr/8.4.1/solr-8.4.1.tgz
    $ tar -xf solr-8.4.1.tgz
  4. Change ownership to the Solr user using the following command: $ sudo chown -R solr:solr solr-8.4.1

Configuring Solr for HDFS storage

To configure Solr for HDFS storage, you must configure a storage location, and then generate configuration files. If you have chosen to use local storage or you are using MapR, see Configuring standalone Solr for local storage.

Configure the HDFS storage location

The default HDFS storage location for Solr is either the Solr user directory /user/solr or the Solr root directory /solr. You must be the HDFS service user or have sudo permissions to configure the storage location.

Perform the following steps to create a storage location:

Procedure

  1. Create a storage location on HDFS using the following commands:

    $ sudo -u hdfs hadoop fs -mkdir /user/solr
    $ sudo -u hdfs hadoop fs -chown solr /user/solr
    
  2. Switch the user to the Solr user by entering the following command: $ sudo su - solr

  3. Copy the default configuration files on the first Solr node using the following commands:

    $ cd <Solr Install Dir>
    $ cp -r server/solr/configsets/_default server/solr/configsets/wdconfig
    Note: The default installation directory is /opt/solr-8.4.1.

Generate configuration files for HDFS storage

You must modify two default configuration files to configure the storage: the managed-schema file and the solrconfig.xml file.

Perform the following steps to generate configuration files for Data Catalog:

Procedure

  1. Open the copy of the managed-schema file in the /server/solr/configsets/wdconfig/conf/ directory with any text editor and edit the code as follows:

    1. Add a _root_ field as shown in the following code:

      <field name="_text_" type="text_general" indexed="true" stored="false" multiValued="true"/>
      <field name="_root_" type="string" indexed="true" stored="false"/>
    2. Comment out the copyField source entry as shown here: <!-- <copyField source="*" dest="_text_"/> -->
    3. Find the <fieldType name="text_general" ... element and add the following code below the text_general definition:

      <fieldType name="text_with_special_chars" class="solr.TextField" positionIncrementGap="100"> 
          <analyzer type="index"> 
              <tokenizer class="solr.WhitespaceTokenizerFactory"/> 
              <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/> 
              <filter class="solr.WordDelimiterFilterFactory" catenateNumbers="1" generateNumberParts="1" splitOnCaseChange="0" generateWordParts="1" splitOnNumerics="0" preserveOriginal="1" catenateAll="0" catenateWords="1"/> 
              <filter class="solr.LowerCaseFilterFactory"/> 
          </analyzer> 
          <analyzer type="query"> 
              <tokenizer class="solr.WhitespaceTokenizerFactory"/> 
              <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/> 
              <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/> 
              <filter class="solr.WordDelimiterFilterFactory" catenateNumbers="1" generateNumberParts="1" splitOnCaseChange="0" generateWordParts="1" splitOnNumerics="0" preserveOriginal="1" catenateAll="0" catenateWords="1"/> 
              <filter class="solr.LowerCaseFilterFactory"/> 
          </analyzer> 
      </fieldType>
    4. Save and close the file.

  2. Open the copy of the solrconfig.xml file in the <Solr Install Dir>/server/solr/configsets/wdconfig/conf directory with any text editor and change the code as follows:

    1. Replace the default NRTCachingDirectoryFactory with HdfsDirectoryFactory and update the URL to the HDFS location where the Solr collection is stored, as shown in the code below:

      <directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
         <str name="solr.hdfs.home">hdfs://namenode:8020/user/solr</str>
         <bool name="solr.hdfs.blockcache.enabled">true</bool>
         <int name="solr.hdfs.blockcache.slab.count">1</int>
         <bool name="solr.hdfs.blockcache.direct.memory.allocation">false</bool>
         <int name="solr.hdfs.blockcache.blocksperbank">16384</int>
         <bool name="solr.hdfs.blockcache.read.enabled">true</bool>
         <bool name="solr.hdfs.blockcache.write.enabled">false</bool>
         <bool name="solr.hdfs.nrtcachingdirectory.enable">true</bool>
         <int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">16</int>
         <int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">192</int>
      </directoryFactory>
    2. Set the lockType to hdfs as shown: <lockType>hdfs</lockType>

    3. Turn off the spellcheck utility as shown: <str name="spellcheck">off</str>

    4. Change the hard autoCommit timeout value to 15000 and the soft autoCommit timeout value to 10000.
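      These timeouts are set in the updateHandler section of solrconfig.xml. The following is a minimal sketch with the values above; the exact surrounding markup may differ in your copy of the file:

      ```xml
      <updateHandler class="solr.DirectUpdateHandler2">
        <!-- Hard commit: flush index changes to stable storage every 15 seconds -->
        <autoCommit>
          <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
          <openSearcher>false</openSearcher>
        </autoCommit>
        <!-- Soft commit: make new documents searchable every 10 seconds -->
        <autoSoftCommit>
          <maxTime>${solr.autoSoftCommit.maxTime:10000}</maxTime>
        </autoSoftCommit>
      </updateHandler>
      ```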

  3. Save and close the file.

  4. Restart Solr.

Configure a local Solr node storage location

This location must be available on all nodes where Solr is running and is used when starting the Solr server. For HDFS, only configuration files are stored in this location. Data files are stored in HDFS.

Perform the following steps to configure a local Solr node storage location.

Procedure

  1. Use the following commands to set up a local storage location for a Solr node, then copy the Solr configuration files to that location:

    $ mkdir ~/ldc-solr-node
    $ chmod 700 ~/ldc-solr-node
    $ cp <Solr Install Dir>/server/solr/solr.xml ~/ldc-solr-node
    $ cp <Solr Install Dir>/server/solr/zoo.cfg ~/ldc-solr-node
    $ cp -r <Solr Install Dir>/server/solr/configsets/wdconfig ~/ldc-solr-node/
  2. Repeat these commands for all Solr nodes.

    Note: Only the solr.xml and zoo.cfg files need to be added to the additional nodes. ZooKeeper copies its configuration files to all the nodes listed in the ensemble.
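The copy to the additional nodes can be scripted. The following is a minimal sketch that assumes hypothetical hostnames (solr2.example.com, solr3.example.com) and passwordless SSH between nodes; it prints each scp command as a dry run so you can review it first:

```shell
# Hypothetical Solr hosts -- replace with your own node names.
NODES="solr2.example.com solr3.example.com"
SRC="$HOME/ldc-solr-node"
for host in $NODES; do
  # Dry run: print each copy command; delete 'echo' to execute it.
  echo scp "$SRC/solr.xml" "$SRC/zoo.cfg" "$host:ldc-solr-node/"
done
```

The target directory must already exist on each remote node (for example, create it there first with ssh host mkdir ldc-solr-node).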

Configuring standalone Solr for local storage

You must modify two default configuration files to configure the storage: the managed-schema file and the solrconfig.xml file.

Perform the following steps to copy the default configuration files for Data Catalog.

Procedure

  1. Use the following command to switch the user to the Solr user: $ sudo su - solr

  2. Copy the default configuration files in the Solr installation directory using the following command: $ cp -r server/solr/configsets/_default server/solr/configsets/wdconfig

Generate configuration files for local storage

Perform the following steps to generate configuration files for Data Catalog:

Procedure

  1. Open the copy of the managed-schema file in the server/solr/configsets/wdconfig/conf/ directory with any text editor and change the code as follows:

    1. Add a _root_ field as shown in the following code:

      <field name="_text_" type="text_general" indexed="true" stored="false" multiValued="true"/>
      <field name="_root_" type="string" indexed="true" stored="false"/>
    2. Comment out the copyField source entry as shown here: <!-- <copyField source="*" dest="_text_"/> -->

    3. Find the <fieldType name="text_general" ... element and add the following code below the text_general definition:

      <fieldType name="text_with_special_chars" class="solr.TextField" positionIncrementGap="100"> 
          <analyzer type="index"> 
              <tokenizer class="solr.WhitespaceTokenizerFactory"/> 
              <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/> 
              <filter class="solr.WordDelimiterFilterFactory" catenateNumbers="1" generateNumberParts="1" splitOnCaseChange="0" generateWordParts="1" splitOnNumerics="0" preserveOriginal="1" catenateAll="0" catenateWords="1"/> 
              <filter class="solr.LowerCaseFilterFactory"/> 
          </analyzer> 
          <analyzer type="query"> 
              <tokenizer class="solr.WhitespaceTokenizerFactory"/> 
              <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/> 
              <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/> 
              <filter class="solr.WordDelimiterFilterFactory" catenateNumbers="1" generateNumberParts="1" splitOnCaseChange="0" generateWordParts="1" splitOnNumerics="0" preserveOriginal="1" catenateAll="0" catenateWords="1"/> 
              <filter class="solr.LowerCaseFilterFactory"/> 
          </analyzer> 
      </fieldType>
  2. Open the copy of the solrconfig.xml file in the server/solr/configsets/wdconfig/conf/ directory with any text editor and change the code as follows:

    1. Use the default directory factory as shown:

      <directoryFactory name="DirectoryFactory"
        class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}">
      </directoryFactory>
    2. Turn off the spellcheck utility: <str name="spellcheck">off</str>

    3. Save and close the file.

  3. Restart Solr.

Configure the local storage location

This location must be available on all nodes where Solr is running. It is used when starting the Solr server.

Perform the following steps to configure a local Solr storage location:

Procedure

  1. Use the following commands to set up a local storage location for a Solr node, then copy the Solr configuration files to that location:

    $ mkdir ~/ldc-solr-node
    $ chmod 700 ~/ldc-solr-node
    $ cp <Solr Install Dir>/server/solr/solr.xml ~/ldc-solr-node
    $ cp <Solr Install Dir>/server/solr/zoo.cfg ~/ldc-solr-node
    $ cp -r <Solr Install Dir>/server/solr/configsets/wdconfig ~/ldc-solr-node/
  2. Repeat the previous step for any additional Solr nodes.

    Note: Only the solr.xml and zoo.cfg files should be added to the additional nodes. ZooKeeper copies its configuration files to all the nodes listed in the ensemble.

Creating the Solr collection

The Data Catalog collection must be created using the configuration files that you updated in either the Generate configuration files for HDFS storage or the Generate configuration files for local storage topic.

Perform the following steps to create a collection.

Procedure

  1. Use the following command to start the first Solr server listening on port 8983 and connected to your ZooKeeper ensemble: $ <Solr Install Dir>/bin/solr start -cloud -p 8983 -z <zookeeper_ensemble> -m 8g -s <local_solr_storage_location>

    Note: You can get the ZooKeeper ensemble string from Ambari. It is a comma-separated list of host:port pairs, for example zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181.
  2. To start additional Solr instances on the other Solr nodes, use the same command as in Step 1.

  3. To upload the customized configuration files to ZooKeeper, use the following command: $ <Solr Install Dir>/server/scripts/cloud-scripts/zkcli.sh -zkhost <zookeeper_ensemble> -cmd upconfig -confname wdconfig -confdir <Solr Install Dir>/server/solr/configsets/wdconfig/conf

  4. Create the collection using the following command: $ <Solr Install Dir>/bin/solr create -c wdcollection -shards 1 -replicationFactor 2 -n wdconfig -p 8983

  5. Validate that the collection is accessible to the Data Catalog service user by logging into the Solr admin page to verify that the collection was created.
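Besides the Solr admin page, you can also check the collection from the command line with the Collections API. The following is a minimal sketch that assumes Solr is listening on localhost:8983; adjust the host and port for your cluster:

```shell
# Base URL for a Solr node -- adjust host/port as needed.
SOLR_URL="http://localhost:8983/solr"

# List all collections; wdcollection should appear in the response.
curl -s "${SOLR_URL}/admin/collections?action=LIST" || echo "Solr not reachable at ${SOLR_URL}"

# Show shard and replica state for the new collection.
curl -s "${SOLR_URL}/admin/collections?action=CLUSTERSTATUS&collection=wdcollection" || echo "Solr not reachable at ${SOLR_URL}"
```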

Validate Data Catalog Solr collection compatibility

To verify that Data Catalog can access the Solr collection, check that the fieldType is installed by using the following command: curl 'http://localhost:8983/solr/wdcollection/schema/fieldtypes/text_with_special_chars'

You can also use the following URL to verify that the Data Catalog can access the Solr collection: http://localhost:8983/solr/wdcollection/schema/fieldtypes/text_with_special_chars.

If you receive a 404 status error indicating that no such path exists, as in the following sample message, consult your system administrator or contact support at the Hitachi Vantara Lumada and Pentaho Support Portal:

"No such path /schema/fieldtypes/text_with_special_chars"

If the fieldType is successfully installed, the status field in the response returns 0 (zero).
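When the fieldType exists, the Schema API returns a JSON document whose responseHeader status is 0. The following is an abridged, illustrative sketch of such a response; the exact fields and values will vary with your schema:

```json
{
  "responseHeader": {
    "status": 0,
    "QTime": 2
  },
  "fieldType": {
    "name": "text_with_special_chars",
    "class": "solr.TextField",
    "positionIncrementGap": "100"
  }
}
```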