ApacheSolr for Drupal 6 improves on the out-of-the-box search experience for Drupal users. The easiest way to get Solr running on your Drupal web site is to use the hosted service provided by Acquia; it is way easier than running your own Solr. You simply point your queries to their Solr server and you’re done.

For various reasons, you might want to run your own Solr web service on your own machine. In this article, I will walk you through setting up a working Solr installation using Tomcat 6 on CentOS 6. The end result of this walkthrough will be two separate Solr indexes (via two separate Solr web apps) for two different web sites running on a single Tomcat. I will assume that you are using Acquia’s Drupal (which ships with SolrPHPClient).

Warning: This article assumes all services are on a single machine (suitable for a small organization). Running Solr on a separate machine is possible but raises security implications that are outside the scope of this article.

These are the tasks that we will work on:

  1. Set-up Solr
  2. Set-up Tomcat
  3. Tweak CentOS security thinger (SELinux)
  4. Configure Acquia Drupal

Prerequisites

The prerequisites are:

  • CentOS 6 Web Server w/ PHP 5.3, MySQL 5, Tomcat 6, Java 6 (all services running w/ no problemos)
  • Acquia Drupal 6 installed
  • Familiarity with Drupal (basic skills – enabling modules, setting permissions on nodes, etc)
  • Familiarity with Java & Tomcat (basic skills)
  • Familiarity working with Linux in a terminal and vi (intermediate skills)

This is my system (a web server set-up with Anaconda):

# uname -a
Linux templeton.localdomain 2.6.32-71.29.1.el6.i686 #1 SMP Mon Jun 27 18:07:00 BST 2011 i686 i686 i386 GNU/Linux
# cat /etc/redhat-release
CentOS Linux release 6.0 (Final)
# yum list installed | grep mysql-server
mysql-server.i686       5.1.52-1.el6_0.1  @updates
# yum list installed | grep php
php.i686                5.3.2-6.el6_0.1   @updates                              
php-cli.i686            5.3.2-6.el6_0.1   @updates                              
php-common.i686         5.3.2-6.el6_0.1   @updates                              
php-gd.i686             5.3.2-6.el6_0.1   @updates                              
php-mysql.i686          5.3.2-6.el6_0.1   @updates                              
php-pdo.i686            5.3.2-6.el6_0.1   @updates                              
php-pear.noarch         1:1.9.0-2.el6     @anaconda-centos-201106051823.i386/6.0
php-xml.i686            5.3.2-6.el6_0.1   @updates
# java -version
java version "1.6.0_17"
OpenJDK Runtime Environment (IcedTea6 1.7.5) (rhel-1.31.b17.el6_0-i386)
OpenJDK Client VM (build 14.0-b16, mixed mode)
# yum list installed | grep tomcat6
tomcat6.noarch          6.0.24-24.el6_0   @updates                              
tomcat6-el-2.1-api.noarch
tomcat6-jsp-2.1-api.noarch
tomcat6-lib.noarch      6.0.24-24.el6_0   @updates                              
tomcat6-servlet-2.5-api.noarch
# /sbin/service tomcat6 status
tomcat6 (pid 1790) is running...                           [  OK  ]
# sestatus
SELinux status:                 enabled
SELinuxfs mount:                /selinux
Current mode:                   enforcing
Mode from config file:          enforcing
Policy version:                 24
Policy from config file:        targeted

Notice the hashmark (#) as my terminal prompt. It denotes that I am executing all these commands as root (use ‘su -’). You can also prefix the following commands with ‘sudo’.

Download Solr

Obtain a copy of the Solr tarball from a nearby mirror:

http://www.apache.org/dyn/closer.cgi/lucene/solr/

Select Solr 1.4.1 or the latest recommended Solr:

ie. http://apache.sunsite.ualberta.ca//lucene/solr/1.4.1/

I’m using the 54M GZipped Tarball and downloading it using wget:

# wget http://apache.sunsite.ualberta.ca//lucene/solr/1.4.1/apache-solr-1.4.1.tgz
--2011-09-02 02:06:05--  http://apache.sunsite.ualberta.ca//lucene/solr/1.4.1/apache-solr-1.4.1.tgz
Resolving apache.sunsite.ualberta.ca... 129.128.5.190
Connecting to apache.sunsite.ualberta.ca|129.128.5.190|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 56374837 (54M) [application/x-tar]
Saving to: “apache-solr-1.4.1.tgz”

100%[=============================================>] 56,374,837   261K/s   in 6m 20s  

2011-09-02 02:12:42 (145 KB/s) - “apache-solr-1.4.1.tgz” saved [56374837/56374837]

# tar zxvf apache-solr-1.4.1.tgz
apache-solr-1.4.1/client/
...
# pwd
/root

Copy the Solr package somewhere reasonable like in the /opt folder:

# mkdir -p /opt/solr
# cp -r -p /root/apache-solr-1.4.1 /opt/solr

Link it (the Solr WAR file) to the Tomcat library directory:

# ln -s /opt/solr/apache-solr-1.4.1/dist/apache-solr-1.4.1.war /usr/share/tomcat6/lib/solr.war

In the future, when you upgrade your software, install the Solr upgrade and update the symlink.

Create Solr directories

You need to choose where your Solr indexes will be kept. I put them into the /var directory and that’s where I’m assuming that you will put yours:

# mkdir -p /var/solr
# cp -r -p /opt/solr/apache-solr-1.4.1/example/solr/ /var/solr/
# mv /var/solr/solr /var/solr/example.com
# ls -l /var/solr/example.com/
total 12
drwxr-xr-x. 2 root root 4096 Sep  2 02:44 bin
drwxr-xr-x. 3 root root 4096 Sep  2 02:44 conf
-rw-r--r--. 1 root root 2259 Sep  2 02:44 README.txt

Each domain has its own Solr indexes located in ‘data‘ and its own configuration files in ‘conf‘. There are two optional directories: ‘bin‘ (for replication scripts) and ‘lib‘ (for plugins). Unless your other apps use them, chances are they will be missing.

Install Drupal ApacheSolr plugin protwords, schema and solrconfig

You should already have Acquia Drupal 6 running or Drupal 6 with the ApacheSolr plugin installed. You can copy the ‘protwords.txt’, ‘schema.xml’, and ‘solrconfig.xml’ files from the plugin directory in your respective distribution rather than downloading it, but adjust the paths accordingly.

If you don’t already have the ApacheSolr plugin, get it from the Drupal web site.

http://drupal.org/project/apachesolr

Choose the latest Tarball and use wget to download it to your server, then copy the ApacheSolr configuration files (and backup originals using ‘b’ flag):

# wget http://ftp.drupal.org/files/projects/apachesolr-6.x-1.5.tar.gz
# tar zxvf apachesolr-6.x-1.5.tar.gz
...
# echo 'If ur root cp may give u a scary msg next cmd! Ignore it! Y to overwrite!'
If ur root cp may give u a scary msg next cmd! Ignore it! Y to overwrite!
#
# cp -b -p -f apachesolr/protwords.txt /var/solr/example.com/conf
# cp -b -p -f apachesolr/schema.xml /var/solr/example.com/conf
# cp -b -p -f apachesolr/solrconfig.xml /var/solr/example.com/conf
#
# echo 'Fix group so tomcat can use this!'
Fix group so tomcat can use this!
#
# chown -R root:tomcat /var/solr/example.com
# chmod -R 775 /var/solr/

Warning! If you are not using the Acquia distribution and instead installed the ApacheSolr plugin from the main Drupal web site then you should check that you have a copy of the SolrPhpClient (version r22 – see module README for the gory details). The Acquia distribution includes the correct SolrPhpClient (so you might want to use that instead?).

Make the two Solr instances for the two domains

This walkthrough will create two domains, but you can create more. Using the example.com folder as a prototype, just recursively copy it twice to make two domains (use ‘p’ switch to ‘preserve’ the file permissions and settings):

# cp -r -p /var/solr/example.com /var/solr/www1.kelvinwong.ca
# cp -r -p /var/solr/example.com /var/solr/www2.kelvinwong.ca

If the future, to add a new domain, copy the example.com folder you just made and customize it. This will also work for additional domains that you want to support.

Configure Tomcat 6

It’s All About Context: The Context element represents a web application run within a particular Tomcat virtual host. Each web application is based on a Web Application Archive (WAR) file or a corresponding unpacked directory. The web application used to process each web request is determined by matching the request to the path of each Context. You may define as many Context elements as you wish, but each Context MUST have a unique path. More on Context

Contexts are no longer put into Tomcat’s server.xml file since that file is read only at server start-up. Instead Contexts are placed into a folder hierarchy under CATALINA_BASE (on CentOS 6 it is /etc/tomcat6). Create and configure the following files:

# touch /etc/tomcat6/Catalina/localhost/www1.kelvinwong.ca.xml
# touch /etc/tomcat6/Catalina/localhost/www2.kelvinwong.ca.xml
# chown tomcat:root /etc/tomcat6/Catalina/localhost/{www1.kelvinwong.ca.xml,www2.kelvinwong.ca.xml}
# chmod 664 /etc/tomcat6/Catalina/localhost/{www1.kelvinwong.ca.xml,www2.kelvinwong.ca.xml}

Tomcat will use these files to find the WAR and deploy the application using the settings in the Context. Note: Contexts can be overridden (they often are) and there are more than a few in Tomcat. Review Tomcat’s documentation if they give you any trouble.

Make sure your Context fragments have .xml suffixes!

Place the following into /etc/tomcat6/Catalina/localhost/www1.kelvinwong.ca.xml

# vi /etc/tomcat6/Catalina/localhost/www1.kelvinwong.ca.xml
<?xml version="1.0" encoding="utf-8"?>
<Context docBase="/usr/share/tomcat6/lib/solr.war" debug="0" crossContext="true" >
   <Environment name="solr/home" type="java.lang.String" value="/var/solr/www1.kelvinwong.ca" override="true" />
</Context>

The Context fragment is simply telling Tomcat where to find the Context root (document base). It is an absolute path to its web app archive (WAR) file. CrossContext allows Solr to get a request dispatcher from ServletContext.getContext() for access to other web apps on the virtual host. The Environment tag defines the ‘solr/home‘ setting and allows it to be overridden. That’s all you need.

Change the other fragment:

# vi /etc/tomcat6/Catalina/localhost/www2.kelvinwong.ca.xml

Change the paths:

<?xml version="1.0" encoding="utf-8"?>
<Context docBase="/usr/share/tomcat6/lib/solr.war" debug="0" crossContext="true" >
   <Environment name="solr/home" type="java.lang.String" value="/var/solr/www2.kelvinwong.ca" override="true" />
</Context>

Bind Tomcat to Local Port

By default, Tomcat listens on port 8080. The default iptables ruleset in CentOS 6 does not allow remote connections to port 8080. For our purposes this is fine since we want our Drupal sites to connect locally on port 8080. Local good, remote bad.

You can also tell Tomcat to bind to localhost and not any of the other network adapters. Open Tomcat’s server.xml file:

# vi /etc/tomcat6/server.xml

Change Tomcat’s binding address to the localhost address (127.0.0.1) in the Connector tag:

69
70
71
72
73
74
    <Connector port="8080" protocol="HTTP/1.1" 
               connectionTimeout="20000" 
               redirectPort="8443" 
               URIEncoding="UTF-8"
               maxHttpHeaderSize="65535"
               address="127.0.0.1" />

Solr is a web service that takes many requests from Drupal using the HTTP GET method, similar to you typing into your browser’s web address bar. These requests routinely get very long; you can increase the GET request character limit by increasing the maxHttpHeaderSize attribute (from 8k to 64k as shown). To handle non-English characters, you should also set the request encoding to UTF-8. The Connector as-shown does both.

Restart Tomcat to reload the server.xml file:

# /sbin/service tomcat6 restart
Stopping tomcat6:                                          [  OK  ]
Starting tomcat6:                                          [  OK  ]

View Solr Admin (optional)

You should now be able to view the Solr administration page if you open a local web browser on the server. If you don’t have a desktop on the server (as should be the case), you can use a text-browser like elinks.

View http://localhost:8080/www1.kelvinwong.ca/admin:

# elinks http://localhost:8080/www1.kelvinwong.ca/admin

You should see the Solr administration page in your browser.

SELinux

“Apache Solr: Your site was unable to contact the Apache Solr server,” reports Drupal; SELinux chuckles.

SELinux is enabled by default on CentOS 6, so you will likely have it running and it will not appreciate Apache trying to talk to Tomcat/Solr on port 8080 (check /var/log/audit/audit.log):

type=AVC msg=audit(1315100262.891:17629): avc:  denied  { name_connect } for  pid=2064 comm="httpd" dest=8080 scontext=unconfined_u:system_r:httpd_t:s0 tcontext=system_u:object_r:http_cache_port_t:s0 tclass=tcp_socket

type=SYSCALL msg=audit(1315100262.891:17629): arch=40000003 syscall=102 success=no exit=-13 a0=3 a1=bfbe6590 a2=b70426f4 a3=11 items=0 ppid=2060 pid=2064 auid=500 uid=48 gid=48 euid=48 suid=48 fsuid=48 egid=48 sgid=48 fsgid=48 tty=(none) ses=4 comm="httpd" exe="/usr/sbin/httpd" subj=unconfined_u:system_r:httpd_t:s0 key=(null)

You can either turn off SELinux (not recommended) or fix the attributes so that SELinux allows Apache to talk to Tomcat. The handy tool sealert gives helpful advice:

# sealert -a /var/log/audit/audit.log | less

Summary:

SELinux is preventing the http daemon from connecting to itself or the relay ports

Detailed Description:

SELinux has denied the http daemon from connecting to itself or the relay ports. An httpd script is trying to make a network connection to an http/ftp port. If you did not setup httpd to make network connections, this could signal an intrusion attempt.

Allowing Access:

If you want httpd to connect to httpd/ftp ports you need to turn on the
httpd_can_network_relay boolean: "setsebool -P httpd_can_network_relay=1"

Fix Command:

setsebool -P httpd_can_network_relay=1

Additional Information:

Source Context                unconfined_u:system_r:httpd_t:s0
Target Context                system_u:object_r:http_cache_port_t:s0
Target Objects                None [ tcp_socket ]
Source                        httpd
Source Path                   /usr/sbin/httpd
Port                          8080
Host                          <Unknown>
Source RPM Packages           httpd-2.2.15-5.el6.centos
Target RPM Packages           
Policy RPM                    selinux-policy-3.7.19-54.el6_0.5
Selinux Enabled               True
Policy Type                   targeted
Enforcing Mode                Enforcing
Plugin Name                   httpd_can_network_relay
Host Name                     templeton.localdomainPlatform                      Linux templeton.localdomain                              2.6.32-71.29.1.el6.i686 #1 SMP Mon Jun 27 18:07:00
                              BST 2011 i686 i686
Alert Count                   14First Seen                    Sat Sep  3 18:25:40 2011Last Seen                     Sat Sep  3 18:37:42 2011Local ID                      4b66d238-ddf7-4b74-bbe5-3fb54be5b3e4Line Numbers                  178, 179, 180, 181, 182, 183, 184, 185, 192, 193,
                              194, 195, 196, 197, 198, 199, 200, 201, 202, 203,
                              204, 205, 206, 207, 208, 209, 210, 211

Once upon a time and a very good time it was there was a moocow coming down along the road and this moocow that was coming down along the road met a nicens little boy named baby tuckoo[1]

The quick fix is to set the network relay flag (‘P’ flag makes the change persistent across reboots):

# setsebool -P httpd_can_network_relay=1
# getsebool httpd_can_network_relay
httpd_can_network_relay --> on

You don’t need sealert to use setsebool but it is a useful utility to debug errors with SELinux. If you don’t have sealert installed, it is a simple thing to install it since it is part of the setroubleshoot package:

# yum install setroubleshoot

Configure Drupal to use Solr

Turning now to your Drupal installation…

Enable the Solr Search service module…

Configure the Apache Solr Search module by visiting http://www1.kelvinwong.ca/?q=admin/settings/apachesolr

Solr host name
localhost
Solr port
8080
Solr path
/www1.kelvinwong.ca

The Solr path is the name of your Context fragment minus the xml suffix (ie. /etc/tomcat6/Catalina/localhost/www1.kelvinwong.ca.xml)



The cron job indexes 50 nodes at a time by default. When indexed, you can then search for nodes by keyword.

Save the settings. You should see:

  • The configuration options have been saved.
  • Apache Solr: Your site has contacted the Apache Solr server.
  • Apache Solr PHP Client Library: Correct version “Revision: 22″.

Try a search

You can re-index the site by force or let cron do it gradually. Either way it take a while for Solr to process the data.


http://www1.kelvinwong.ca/?q=admin/settings/apachesolr/index


Once you have indexed your site and adjusted the permissions on the search form (so anonymous users can use the search form), visit it:


http://www1.kelvinwong.ca/?q=search/apachesolr_search


Intentionally misspell something and let Solr give you hints!

What about the other one??? www2?

Ah, yes…the other one is set-up in a similar manner, just use the following configuration in Drupal:

Solr host name
localhost
Solr port
8080
Solr path
/www2.kelvinwong.ca