Installing standalone Solr
These Solr installation instructions apply to installing Solr on the following platforms:
- Azure HDInsight
- CDH/CDP
- EMR
- HDP
- MapR
Lumada Data Catalog is certified for Solr 8.4.1 and requires it to be installed separately. Versions higher than 8.4.1 must be certified before use. Contact support through the Hitachi Vantara Lumada and Pentaho Support Portal for compatibility and certification requests. For information about downloading and installing Apache Solr 8.4.1, see the Solr Guide documentation. For other supported Solr versions, see the Solr Support Matrix. As a best practice, configure Solr in SolrCloud mode.
Prerequisites
- Identify the location or partition on your cluster where you want to install Solr. The default is /opt/solr.
- You must have superuser access to create a Solr user.
- If you are installing in a Kerberized environment, you must have keytabs for ZooKeeper and Solr that are created using principals in the form primary/instance@REALM. For example, zookeeper/hostname@HITACHIVANTARA.COM.
Installing Solr
Perform the following steps to download and install Solr 8.4.1:
Procedure
Download Solr 8.4.1 from the Apache archives https://archive.apache.org/dist/lucene/solr/8.4.1/.
Create a Solr user using the following command:
$ sudo useradd solr
Expand the Solr 8.4.1 installation package on the node that will host the first Solr instance. Enter the following commands:
$ cd /opt
$ wget https://archive.apache.org/dist/lucene/solr/8.4.1/solr-8.4.1.tgz
$ tar -xf solr-8.4.1.tgz
Change ownership to the Solr user using the following command:
$ sudo chown -R solr:solr solr-8.4.1
Configuring Solr for HDFS storage
To configure Solr for HDFS storage, you must configure a storage location, and then generate configuration files. If you have chosen to use local storage or you are using MapR, see Configuring standalone Solr for local storage.
Configure the HDFS storage location
Perform the following steps to create a storage location:
Procedure
Create a storage location on HDFS using the following commands:
$ sudo -u hdfs hadoop fs -mkdir /user/solr
$ sudo -u hdfs hadoop fs -chown solr /user/solr
Switch the user to the Solr user by entering the following command:
$ sudo su - solr
Copy the default configuration files on the first Solr node using the following commands:
$ cd <Solr Install Dir>
$ cp -r server/solr/configsets/_default server/solr/configsets/wdconfig
Note: The default installation directory is /opt/solr-8.4.1.
Generate configuration files for HDFS storage
Perform the following steps to generate configuration files for Data Catalog:
Procedure
Open the copy of the managed-schema file in the /server/solr/configsets/wdconfig/conf/ directory with any text editor and edit the code as follows:
Add a _root_ field as shown in the following code:
<field name="_text_" type="text_general" indexed="true" stored="false" multiValued="true"/>
<field name="_root_" type="string" indexed="true" stored="false"/>
Comment out the copyField source entry:
<!-- <copyField source="*" dest="_text_"/> -->
Find the <fieldType name="text_general" ...> element and add the following code below the text_general definition:
<fieldType name="text_with_special_chars" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.WordDelimiterFilterFactory" catenateNumbers="1" generateNumberParts="1" splitOnCaseChange="0" generateWordParts="1" splitOnNumerics="0" preserveOriginal="1" catenateAll="0" catenateWords="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.WordDelimiterFilterFactory" catenateNumbers="1" generateNumberParts="1" splitOnCaseChange="0" generateWordParts="1" splitOnNumerics="0" preserveOriginal="1" catenateAll="0" catenateWords="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
Save and close the file.
Open the copy of the solrconfig.xml file in the <Solr Install Dir>/server/solr/configsets/wdconfig/conf directory with any text editor and change the code as follows:
Replace the default NRTCachingDirectoryFactory with HdfsDirectoryFactory and update the URL to the HDFS location where the Solr collection is stored, as shown in the code below:
<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <str name="solr.hdfs.home">hdfs://namenode:8020/user/solr</str>
  <bool name="solr.hdfs.blockcache.enabled">true</bool>
  <int name="solr.hdfs.blockcache.slab.count">1</int>
  <bool name="solr.hdfs.blockcache.direct.memory.allocation">false</bool>
  <int name="solr.hdfs.blockcache.blocksperbank">16384</int>
  <bool name="solr.hdfs.blockcache.read.enabled">true</bool>
  <bool name="solr.hdfs.blockcache.write.enabled">false</bool>
  <bool name="solr.hdfs.nrtcachingdirectory.enable">true</bool>
  <int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">16</int>
  <int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">192</int>
</directoryFactory>
Set the lockType to HDFS as shown:
<lockType>hdfs</lockType>
Turn off the spellcheck utility as shown:
<str name="spellcheck">off</str>
Change the hard autoCommit timeout value to 15000 and the soft autoCommit timeout value to 10000.
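After the edit, the commit settings in solrconfig.xml would look similar to the following sketch. The element names are standard Solr commit settings; the openSearcher value shown is the Solr default and is an assumption about your existing file:

```xml
<!-- Hard commit: flush index changes to stable storage every 15 seconds -->
<autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<!-- Soft commit: make new documents visible to searches every 10 seconds -->
<autoSoftCommit>
  <maxTime>10000</maxTime>
</autoSoftCommit>
```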
Save and close the file.
Restart Solr
Configure a local Solr node storage location
Perform the following steps to configure a local Solr node storage location.
Procedure
Use the following commands to set up a local storage location for a Solr node, then copy the Solr configuration files to that location:
$ mkdir ~/ldc-solr-node
$ chmod 700 ~/ldc-solr-node
$ cp <Solr Install Dir>/server/solr/solr.xml ~/ldc-solr-node
$ cp <Solr Install Dir>/server/solr/zoo.cfg ~/ldc-solr-node
$ cp -r <Solr Install Dir>/server/solr/configsets/wdconfig ~/ldc-solr-node/
Repeat these commands for all Solr nodes.
Note: Only the solr.xml and zoo.cfg files need to be added to the additional nodes. ZooKeeper copies its configuration files to all the nodes listed in the ensemble.
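The per-node copy steps above can be scripted. The following is a minimal sketch that prints the scp commands for each additional node rather than running them; the hostnames are hypothetical, and it assumes the solr user can scp to each node. Remove the leading echo to execute the copies:

```shell
# Print the copy commands for each additional Solr node (dry run).
# Replace the hypothetical hostnames with your own cluster's nodes.
SOLR_HOME=/opt/solr-8.4.1
for node in solr2.example.com solr3.example.com; do
  echo "scp $SOLR_HOME/server/solr/solr.xml solr@$node:~/ldc-solr-node/"
  echo "scp $SOLR_HOME/server/solr/zoo.cfg solr@$node:~/ldc-solr-node/"
done
```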
Configuring standalone Solr for local storage
Perform the following steps to generate configuration files for Data Catalog.
Procedure
Use the following command to switch the user to the Solr user:
$ sudo su - solr
Copy the default configuration files in the Solr installation directory using the following command:
$ cp -r server/solr/configsets/_default server/solr/configsets/wdconfig
Generate configuration files for local storage
Procedure
Open the copy of the managed-schema file in the server/solr/configsets/wdconfig/conf/ directory with any text editor and change the code as follows:
Add a _root_ field as shown in the following code:
<field name="_text_" type="text_general" indexed="true" stored="false" multiValued="true"/>
<field name="_root_" type="string" indexed="true" stored="false"/>
Comment out the copyField source entry as shown here:
<!-- <copyField source="*" dest="_text_"/> -->
Find the <fieldType name="text_general" ...> element and add the following code below the text_general definition:
<fieldType name="text_with_special_chars" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.WordDelimiterFilterFactory" catenateNumbers="1" generateNumberParts="1" splitOnCaseChange="0" generateWordParts="1" splitOnNumerics="0" preserveOriginal="1" catenateAll="0" catenateWords="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.WordDelimiterFilterFactory" catenateNumbers="1" generateNumberParts="1" splitOnCaseChange="0" generateWordParts="1" splitOnNumerics="0" preserveOriginal="1" catenateAll="0" catenateWords="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
Open the copy of the solrconfig.xml file in the server/solr/configsets/wdconfig/conf/ directory with any text editor and change the code as follows:
Use the default directory factory:
<directoryFactory name="DirectoryFactory"
                  class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}">
</directoryFactory>
Turn off the spellcheck utility:
<str name="spellcheck">off</str>
Save and close the file.
Restart Solr.
Configure the local storage location
Perform the following steps to configure a local Solr storage location:
Procedure
Use the following commands to set up a local storage location for a Solr node, then copy the Solr configuration files to that location:
$ mkdir ~/ldc-solr-node
$ chmod 700 ~/ldc-solr-node
$ cp <Solr Install Dir>/server/solr/solr.xml ~/ldc-solr-node
$ cp <Solr Install Dir>/server/solr/zoo.cfg ~/ldc-solr-node
$ cp -r <Solr Install Dir>/server/solr/configsets/wdconfig ~/ldc-solr-node/
Repeat the previous step for any additional Solr nodes.
Note: Only the solr.xml and zoo.cfg files should be added to the additional nodes. ZooKeeper copies its configuration files to all the nodes listed in the ensemble.
Creating the Solr collection
Perform the following steps to create a collection.
Procedure
Use the following command to start the first Solr server, listening at port 8983, with a local instance of ZooKeeper at port 9983:
$ <Solr Install Dir>/bin/solr start -cloud -p 8983 -z <zookeeper_ensemble> -m 8g -s <local_solr_storage_location>
Note: You can get the ZooKeeper ensemble string from Ambari. To start additional Solr instances on the other Solr nodes, use the same command as in Step 1.
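The ensemble string itself is a comma-separated list of host:port pairs, optionally followed by a chroot path. The following sketch uses hypothetical hostnames and an assumed /solr chroot; substitute the values reported by Ambari for your cluster:

```shell
# Hypothetical three-node ZooKeeper ensemble string with a /solr chroot.
# Replace the hostnames (and chroot, if any) with your own cluster's values.
ZK_ENSEMBLE="zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181/solr"
echo "$ZK_ENSEMBLE"
```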
To upload the customized configuration files to ZooKeeper, use the following command:
$ <Solr Install Dir>/server/scripts/cloud-scripts/zkcli.sh -zkhost <zookeeper_ensemble> -cmd upconfig -confname wdconfig -confdir <solr_install_location>/server/solr/configsets/wdconfig/conf
Create the collection using the following command:
$ <Solr Install Dir>/bin/solr create -c wdcollection -shards 1 -replicationFactor 2 -n wdconfig -p 8983
Validate that the collection is accessible to the Data Catalog service user by logging in to the Solr Admin UI and verifying that the collection was created.
Validate Data Catalog Solr collection compatibility
To verify that Data Catalog can access the Solr collection, check that the fieldType is installed by using the following command:
$ curl 'http://localhost:8983/solr/wdcollection/schema/fieldtypes/text_with_special_chars'
You can also verify access by opening the following URL in a browser: http://localhost:8983/solr/wdcollection/schema/fieldtypes/text_with_special_chars
If you receive a 404 status error indicating that no such path exists, consult your system administrator or contact support through the Hitachi Vantara Lumada and Pentaho Support Portal.