Upload files using FileField and Generic Class-based Views in Django 1.5

Filebaby list view

This tutorial demonstrates the use of the FileField model field to save user submitted files in a Django 1.5 web application using Django’s generic Class-based Views (CBVs).

Goals

This code will demonstrate how to handle user submitted files using Django’s generic Class-based views (CBVs).

Your users will be able to:

  • Download file(s) from a list of recently uploaded files
  • Anonymously upload files for immediate publishing

Caveat

This tutorial does not cover access control; anonymous users can upload files and publish them immediately. Control over who uploads and downloads files using generic CBVs is covered in the Django documentation.

This app was not tested on any Windows operating systems.

Prerequisites

  • Basic knowledge of Django
  • Basic knowledge of Python
  • Basic knowledge of Unix
  • Python 2.7 recommended (installed and working)
  • Unix-like operating system (Linux, Mac, FreeBSD, etc)
  • Virtualenv and Virtualenvwrapper are strongly recommended

You can follow along by downloading the completed code and running it, or by visiting the repository.

Get the source code

This entire tutorial is available on Bitbucket:


https://bitbucket.org/kelvinwong_ca/kelvinwong_ca_blogcode/src

Examine the directory named:

./django_filefield_tutorial/

Or you can download a tarball here:


https://bitbucket.org/kelvinwong_ca/kelvinwong_ca_blogcode/downloads

Get the download named:

django_filefield_tutorial.tar.gz

Uncompress it:

$ tar zxvf django_filefield_tutorial.tar.gz

Make a new virtualenv and install the requirements with ‘pip’:

$ pip install -r uploadering/requirements.txt

When you have installed the project requirements, open the uploadering directory:

$ cd uploadering

Ensure the code works by running its tests.

Run the tests (optional)

I have included a few simple tests that you can use to test your application:

uploadering/filebaby/tests.py

The tests can be run using Django’s test runner:

$ python manage.py test filebaby
Creating test database for alias 'default'...
..........
----------------------------------------------------------------------
Ran 10 tests in 0.158s

OK
Destroying test database for alias 'default'...

Running these tests will verify that things will work as expected for the rest of the tutorial.

Big Picture Overview

Try running the server. Open the ‘uploadering’ directory, initialize the database and run the development server using the ‘manage.py’ script:

$ python manage.py syncdb

...setup databases stuff...

$ python manage.py runserver

Open the ‘/add’ URL (hint: try http://localhost:8000/add). You should see the following.

filebaby add file inset

When your user clicks the ‘add file’ button on the app it loads the ‘/add’ URL. An empty form is presented.

The user attaches a file and uploads the file back to the ‘/add’ URL. That file is received as a POST with file data and Django processes it. When processing is completed, the class-based view emits a redirect to the user’s browser asking it to load the success URL which in this case is the home page. A success message is displayed.

filebaby success inset

Where does Django store your uploaded files?

You need to tell Django where to store the uploaded files. This location is assigned in the ‘MEDIA_ROOT’ variable. If you are in the project’s root folder (named uploadering) open the settings.py file located at:

uploadering/settings.py

Django expects a full absolute pathname. You could put in a text string, but I usually set up a PROJECT_ROOT variable which is calculated relative to the directory containing all the project files. Put it at the top of the settings file somewhere.

 
# uploadering/settings.py
 
import os  # Put this up top, maybe line #2
#
# ...somewhere lower in the file...
#
PROJECT_ROOT = os.path.abspath(os.path.dirname(os.path.dirname(__file__)))
#
# This contains the full absolute path to the project directory
# i.e. /home/myuser/webapps/uploadering (abspath drops any trailing slash)

Once configured, you can use it to build your absolute pathnames in your project.

 
# Absolute filesystem path to the directory that will hold user-uploaded files.
# Example: "/var/www/example.com/media/"
# If PROJECT_ROOT is '/home/myuser/webapps/uploadering' then
# MEDIA_ROOT is '/home/myuser/webapps/uploadering/userfiles'
#
MEDIA_ROOT = os.path.join(PROJECT_ROOT, 'userfiles')
 
# URL that handles the media served from MEDIA_ROOT. Make sure to use a
# trailing slash.
# Examples: "http://example.com/media/", "http://media.example.com/"
#
MEDIA_URL = '/files/'  # Note they don't have to be identical names

Now that your settings.py is configured, any user uploaded files will reside in this directory:

userfiles/

This is the directory tree structure relative to your project root (the README file marks the user uploads folder):

uploadering/
├── filebaby
├── uploadering
└── userfiles
    └── README
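
As a rough illustration of how these two settings resolve, here is a small sketch that recomputes them outside Django. The settings file path is a hypothetical example (it assumes settings.py sits in uploadering/uploadering/), and the helper names project_root and media_root are made up for this sketch.

```python
import os


def project_root(settings_file):
    """Return the directory two levels above the given settings file."""
    return os.path.abspath(os.path.dirname(os.path.dirname(settings_file)))


def media_root(settings_file, folder='userfiles'):
    """Return the upload directory inside the project root."""
    return os.path.join(project_root(settings_file), folder)


if __name__ == '__main__':
    # Hypothetical location of settings.py, used only for illustration
    example = '/home/myuser/webapps/uploadering/uploadering/settings.py'
    print(project_root(example))
    print(media_root(example))
```

Neither value ends in a slash, which is fine for MEDIA_ROOT; only MEDIA_URL needs the trailing slash.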

Now let’s look at the model for the user files.

FileField on the model sets the upload directory format

Our FilebabyFile data model has a field named ‘f’ for ‘file’. For your own application the name should be more descriptive, but for this tutorial a simple ‘f’ is good enough. Open this file:

filebaby/models.py

Find the FileField model field:

# filebaby/models.py
 
class FilebabyFile(models.Model):
    """This holds a single user uploaded file"""
    f = models.FileField(upload_to='.')

The FileField model field has one required argument named ‘upload_to’. You must set this. You have three choices for its value: a plain string path (a single period, as shown), a string containing ‘strftime’ format codes, or a custom callable (usually a function but it can be any callable). I’m going to leave this as a dot string. This will ungraciously dump all your user submitted files into the MEDIA_ROOT directory.

For your own application, you might make this a named user directory, a hashed string or something more appropriate than a dot string. To do this, you can set the ‘upload_to’ parameter to use a callback.
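
Here is a minimal sketch of such a callback, assuming you want uploads spread across subdirectories named after a hash of the filename. Django calls the function with the model instance and the original filename and expects a path relative to MEDIA_ROOT. The function name hashed_upload_to and the two-character bucket scheme are illustrative choices, not part of the tutorial code.

```python
import hashlib
import posixpath


def hashed_upload_to(instance, filename):
    """Build a storage path like 'ab/report.txt' relative to MEDIA_ROOT.

    Django passes the model instance and the uploaded file's original name.
    The instance is unused here; a real app might use it to pick a
    per-user directory instead.
    """
    digest = hashlib.sha1(filename.encode('utf-8')).hexdigest()
    # The first two hex characters give up to 256 bucket directories
    return posixpath.join(digest[:2], filename)


# Hypothetical usage on the model:
#     f = models.FileField(upload_to=hashed_upload_to)
```

The same filename always lands in the same bucket, so this spreads files out but does nothing to prevent name collisions.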

FilebabyForm model form class is a standard model form

The FilebabyForm is a regular ModelForm that obtains its properties from the model designated in the inner class Meta. It is located at:

filebaby/forms.py

If you open this file you will be bored to tears.

# filebaby/forms.py
 
class FilebabyForm(forms.ModelForm):
    """Upload files with this form"""
    class Meta:
        model = FilebabyFile

There is nothing remarkable about this class so I won’t dwell on it. More information on ModelForms can be found on the Django web site.

FileAddView class-based view handles successful uploads

The FileAddView uses the generic FormView class-based view provided by Django. Examine the application views file:

# filebaby/views.py
 
class FileAddView(FormView):
 
    form_class = FilebabyForm
    success_url = reverse_lazy('home')
    template_name = "filebaby/add.html"
 
    def form_valid(self, form):
        form.save(commit=True)
        messages.success(self.request, 'File uploaded!')
        return super(FileAddView, self).form_valid(form)

You can see that I have overridden the following class properties:

  1. form_class – The form class used by the view
  2. success_url – Destination on successful upload
  3. template_name – Template used with the upload form

I have also overridden this method:

  1. form_valid – For processing successful uploads

The form_class is the FilebabyForm we discussed earlier. The FormView expects a form and one must be provided; here it is.

The success_url is where the user ends up when the file is uploaded successfully. I have used named URLs in my URLConf scheme and therefore I used the lazy version of the URL reverse function. This is necessary because the class attribute is evaluated when the views module is imported, before the URLConf has been loaded. If you see a NameError exception like the one below, then ‘reverse’ was called without being imported; import ‘reverse_lazy’ from django.core.urlresolvers and use that instead:

NameError at / name 'reverse' is not defined

The template_name is the template that contains a groovy form. It might be so groovy that it doesn’t work on early versions of Internet Explorer – I didn’t test it a whole lot on Windows and you’re not using IE7 right? Right? Check it out:

filebaby/templates/filebaby/add.html

If you find the template too groovy and difficult to follow, I have included a much simpler boring one as well (it can be swapped out – try it):

filebaby/templates/filebaby/add-boring.html

The boring version contains only the important parts of the form: enctype setting, file input and errors placeholder, submit button and cross site request forgery (CSRF) token. That’s it.

<form action="/add" method="post" enctype="multipart/form-data">
<input name="f" type="file" id="file" />
<input type="submit" id="submit" value="Submit File" />
{% csrf_token %}
</form>

The form_valid method runs when the form passes validation. Since we are uploading a file, this method is activated when a valid file is received. The form instance is passed to the method and we need to save it to place the file in the MEDIA_ROOT directory. A success message is then queued for display on the next page.

Mapping URLs to Views in URLConf

The add view is mapped to a URL in the URLConf file at:

# uploadering/urls.py
url(r'^add$', FileAddView.as_view(), name='filebaby-add'),

The path ‘/add’ gets mapped to the FileAddView. A helpful name is provided so that the URL can be looked up by name, either with the ‘url’ tag in templates or with the reverse functions in Python code. Naming your URLs is not required but I do it.

Serving the user submitted files in development

In order to serve the user submitted files while you are developing your site, you need to add some code to the URLConf.

# /uploadering/uploadering/urls.py
 
from django.conf import settings
from django.conf.urls import patterns, url
from django.conf.urls.static import static
 
urlpatterns = patterns('',
    # ...
    # URL mappings
    # ...
) + static(settings.MEDIA_URL, document_root=settings.MEDIA_ROOT)

Adding the static function to the end of the urlpatterns tells the development server to map the MEDIA_URL to the files in the MEDIA_ROOT. Using our settings, a browser request for:

/files/myfile.txt

Yields a file located at:

/home/myuser/webapps/uploadering/userfiles/myfile.txt
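
The mapping can be pictured with a small stand-alone sketch. This is not Django's implementation; the function name url_to_file_path is hypothetical, and the default values simply mirror the settings used in this tutorial.

```python
import posixpath


def url_to_file_path(url_path, media_url='/files/',
                     media_root='/home/myuser/webapps/uploadering/userfiles'):
    """Translate a request path under media_url to a path under media_root.

    Returns None when the request path is not under media_url or tries to
    climb out of media_root.
    """
    if not url_path.startswith(media_url):
        return None
    relative = posixpath.normpath(url_path[len(media_url):])
    # Reject empty paths and anything that would escape the upload directory
    if relative in ('', '.', '..'):
        return None
    if relative.startswith('../') or relative.startswith('/'):
        return None
    return posixpath.join(media_root, relative)
```

A request for ‘/files/myfile.txt’ resolves to the userfiles directory, while paths outside ‘/files/’ or containing ‘..’ tricks resolve to nothing.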

Listing your uploaded files with a ListView

The home page uses a generic CBV called ListView to display the list of files. The implementation is standard except for the default query which orders by largest primary key (in lieu of a date value).

class FileListView(ListView):
 
    model = FilebabyFile
    queryset = FilebabyFile.objects.order_by('-id')
    context_object_name = "files"
    template_name = "filebaby/index.html"

I won’t dwell on this since it is outside the scope of this tutorial. It does have working pagination and a Foundation 4 compatible Django messages implementation which might be of interest to some.

That’s it for the tutorial. Go build your app now.

Thought questions

  • Have you ever implemented an upload app via a FileField using view functions?
  • Do you find the generic Class-based View (CBV) method an improvement over functional views?
  • What URL scheme do you prefer for your file uploads and why?


LinkedIn now with less salt!

Funny tweets about LinkedIn

Even the finest sword plunged into salt water will eventually rust.     Sun Tzu *

Yesterday, the word of the day at LinkedIn was “Salt” (in the cryptographic sense and not the NaCl sense). Some dodgy fellow made off with at least some of the LinkedIn password database. Those passwords were not stored in cleartext (thank Jupiter) but the hashes weren’t salted. This means tools like John The Ripper can be used to find the original password and that is exactly what happened.

If you are a software developer and you work on public facing web sites, here is the LinkedIn lesson:

  1. Always use salt with your password hashing scheme
  2. Use slow hashing functions like bcrypt or scrypt rather than faster hashing functions like MD5, SHA, etc.
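
As a sketch of both points together, the snippet below salts each password with random bytes and hashes it with scrypt from Python's standard library. It assumes Python 3.6 or newer built against OpenSSL 1.1 or newer. The cost parameters (n, r, p) are commonly quoted starting values and should be treated as assumptions to tune, and the function names are hypothetical.

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None):
    """Return (salt, digest) for the password using scrypt with a random salt."""
    if salt is None:
        salt = os.urandom(16)  # a fresh random salt for every password
    digest = hashlib.scrypt(password.encode('utf-8'), salt=salt,
                            n=2 ** 14, r=8, p=1, dklen=32)
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Re-hash the candidate password with the stored salt and compare."""
    _, digest = hash_password(password, salt)
    # Constant-time comparison avoids leaking how many bytes matched
    return hmac.compare_digest(digest, expected_digest)
```

You store the salt next to the digest. Because every user gets a different salt, two users with the same password end up with different digests, which is what defeats precomputed tables.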

* Note on epigram: Security nerds love to quote Sun Tzu and this was the only Sun Tzu quote I could find that had some salt in it.

Install secure Webmin 1.580 on Ubuntu 12.04 LTS Precise Pangolin

Webmin welcome screen welcomes

Installing Webmin on Ubuntu 12.04 LTS Precise Pangolin is quite simple. This article will walk you through the complete installation of Webmin 1.580 including the upgrading of the self-signed certificate to a 2048-bit key (a 512-bit key is the default).

This is my system:

$ uname -a
Linux brasenose 3.2.0-24-generic-pae #37-Ubuntu SMP Wed Apr 25 10:47:59 UTC 2012 i686 i686 i386 GNU/Linux
$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 12.04 LTS
Release:	12.04
Codename:	precise
$ openssl version
OpenSSL 1.0.1 14 Mar 2012

That last check is pretty important. If you don’t have OpenSSL installed you are not going to be able to run Webmin over TLS so make sure it is installed.

My demonstration system is a minimal system with only a SSH Server installed and a static IP set-up.

Install Webmin

Things have come a long way in the Webmin world and some cranky old Perl dependencies have now been flushed from the code. Unfortunately, there is no specialized Ubuntu version, so aficionados need to install the Debian version and make manual changes. Fortunately, installing the Debian package is simple. First we need to add the official Webmin repository to our list of software packages:

$ sudo vi /etc/apt/sources.list

Add the following line to the bottom of the file:

deb http://download.webmin.com/download/repository sarge contrib

This adds the Webmin Debian repository to your package list. Wondering why the repo release code name is ‘Sarge’? My guess is that it simply never got changed once Debian moved on to Etch in 2007 because it works fine. Sarge was an ancient Debian release from the late pleistocene and it hasn’t been ’round these parts for many moons.

Now we need to add Webmin author Jamie Cameron’s public key to our keyring. Do this from your home directory:

$ cd ~
$ wget http://www.webmin.com/jcameron-key.asc
--2012-04-29 01:34:19--  http://www.webmin.com/jcameron-key.asc
Resolving www.webmin.com (www.webmin.com)... 216.34.181.97
Connecting to www.webmin.com (www.webmin.com)|216.34.181.97|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1320 (1.3K) [text/plain]
Saving to: `jcameron-key.asc'

100%[======================================>] 1,320       --.-K/s   in 0s      

2012-04-29 01:34:19 (41.4 MB/s) - `jcameron-key.asc' saved [1320/1320]
$ sudo apt-key add ~/jcameron-key.asc
[sudo] password for kelvin: 
OK

Now we can install Webmin from the repo we added:

$ sudo apt-get update
...
Fetched 12.6 MB in 37s (333 kB/s)                                              
Reading package lists... Done
$ sudo apt-get install webmin
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following extra packages will be installed:
  apt-show-versions libapt-pkg-perl libauthen-pam-perl libio-pty-perl
  libnet-ssleay-perl
The following NEW packages will be installed:
  apt-show-versions libapt-pkg-perl libauthen-pam-perl libio-pty-perl
  libnet-ssleay-perl webmin
0 upgraded, 6 newly installed, 0 to remove and 0 not upgraded.
Need to get 16.1 MB of archives.
After this operation, 100 MB of additional disk space will be used.
Do you want to continue [Y/n]? Y
Get:1 http://download.webmin.com/download/repository/ sarge/contrib webmin all 1.580 [15.8 MB]
Get:2 http://ca.archive.ubuntu.com/ubuntu/ precise/main libnet-ssleay-perl i386 1.42-1build1 [184 kB]
...
Setting up libnet-ssleay-perl (1.42-1build1) ...
Setting up libauthen-pam-perl (0.16-2build2) ...
Setting up libio-pty-perl (1:1.08-1build2) ...
Setting up libapt-pkg-perl (0.1.25build2) ...
Setting up apt-show-versions (0.17) ...
** initializing cache. This may take a while **
Setting up webmin (1.580) ...
Webmin install complete. You can now login to https://brasenose:10000/
as root with your root password, or as any user who can use sudo
to run commands as root.

Webmin TLS certificate warning

Webmin now is running on port 10000 but you can inspect the TLS properties and see that it is using a 512-bit key. Your browser may warn you of the weak default cryptographic key. That sort of thing is fine if you’re living in North Korea, but we need to upgrade it to use a 2048-bit key like all the cool kids.

The username and password for Webmin are the same as those of any user that has sudo rights on the system. My username is therefore ‘kelvin’ and my password is ‘PASSWORD’. LOL. No, I’m not going to tell you my password…

Upgrade the self-signed SSL Certificate

Webmin upgraded 2048-bit key warning

Upgrading the Webmin certificate reduces TLS warnings

OpenSSL will be used to generate the needed keys and certificates. We are going to make a self-signed certificate which means that it will raise warnings, scary red flags, a Cthulhu and whoknowswhatelse in most browsers. So if this system will be used by easily frightened system admins (most are) then you might want to get a properly signed certificate from a Certificate Authority instead. Having said that (and alienated most of my readership) let’s get on with it.

The self-signed certificate will be valid for 1825 days or 5 years which is also how long your OS will be maintained by Canonical. Simply change the value after the ‘days’ attribute in the command to meet your needs.

Use OpenSSL to make a private key and a self-signed certificate in one badass command:

$ cd /etc/webmin
$ sudo openssl req -newkey rsa:2048 -days 1825 -nodes -x509 -keyout server.key -out server.crt
[sudo] password for kelvin: 
Generating a 2048 bit RSA private key
.............................................................................................+++
.........+++
writing new private key to 'server.key'
-----
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [AU]:CA
State or Province Name (full name) [Some-State]:British Columbia
Locality Name (eg, city) []:Victoria
Organization Name (eg, company) [Internet Widgits Pty Ltd]:Kelvin Wong Heavy Industries S.p.A.
Organizational Unit Name (eg, section) []:Network Operations
Common Name (e.g. server FQDN or YOUR name) []:brasenose.kelvinwong.ca
Email Address []:postmaster@kelvinwong.ca

Okay, so how cool was that? Now you have to make your artifacts usable and safe. First, concatenate the private key and the certificate into a single PEM file that Webmin can understand (tee used for piping because I’m cool and I can read Wikipedia). Second, set the correct permissions and file ownership.


$ pwd
/etc/webmin
$ cat server.crt server.key | sudo tee server.pem > /dev/null
$ sudo chmod 600 server.pem server.key server.crt
$ sudo chown root:bin server.pem server.key server.crt
$ ls -l server.*
-rw------- 1 root bin 1610 Apr 29 13:33 server.crt
-rw------- 1 root bin 1704 Apr 29 13:33 server.key
-rw------- 1 root bin 3314 Apr 29 13:45 server.pem
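
If you would rather script this step, here is a rough Python equivalent of the concatenation and the chmod. It is only a sketch: the function name build_pem is hypothetical, it does not change file ownership (the chown to root:bin above still applies), and it would need to run as root to write into /etc/webmin.

```python
import os


def build_pem(cert_path, key_path, pem_path):
    """Concatenate a certificate and private key into one owner-only PEM file."""
    parts = []
    for path in (cert_path, key_path):
        with open(path, 'rb') as handle:
            parts.append(handle.read())
    # Create the file with mode 600 from the start so the key is never
    # briefly readable by other users
    fd = os.open(pem_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, 'wb') as handle:
        handle.write(b''.join(parts))
    # Enforce the mode even if the file already existed with looser permissions
    os.chmod(pem_path, 0o600)
```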

Now you need to tell Webmin to use your new upgraded certificate.

$ sudo vi /etc/webmin/miniserv.conf

Change the certificate name:

keyfile=/etc/webmin/server.pem

Then restart Webmin:

$ sudo invoke-rc.d webmin restart
Stopping Webmin server in /usr/share/webmin
Starting Webmin server in /usr/share/webmin
Pre-loaded WebminCore

Your Webmin installation is now totally badass like a Honey Badger.

Webmin 2048-bit key details

Success upgrading Webmin TLS to 2048-bit key

Question: What changes do you make to your Webmin configuration so that it runs well on Ubuntu?


Github says: Please audit your SSH keys

I got the following from Github after their benign hacker incident:

Please audit your SSH keys
On Sunday March 4, 2012 a security vulnerability related to SSH keys (public keys) was discovered. For your protection and to prevent unauthorized access we have disabled your public keys until you approve them.

They want me to audit my SSH keys (a simple process). First, find your public key that you use on GitHub (probably in your .ssh directory if you are using a Mac). Then get its fingerprint. Here’s how you do that on a Mac:

Trinity:~ kelvin$ ls -l .ssh/id_rsa*
-rw-------  1 kelvin  staff  1743 Sep 11  2009 .ssh/id_rsa
-rw-r--r--  1 kelvin  staff   400 Sep 11  2009 .ssh/id_rsa.pub
Trinity:~ kelvin$ ssh-keygen -lf .ssh/id_rsa
2048 XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX .ssh/id_rsa.pub (RSA)
Trinity:~ kelvin$

Using ssh-keygen you can get the fingerprint from your private key filename (it will look for your public key for you). That long list of “XX:XX” things will be a hexadecimal number that matches the key fingerprint at the bottom of the GitHub SSH audit page. If it doesn’t match then either Egor hacked you or you might have used a different key (keep looking!).
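
For the curious, the colon-separated fingerprint shown above is the MD5 digest of the decoded key blob in the public key file. The sketch below reproduces it in Python; the function name md5_fingerprint is hypothetical. Note that newer OpenSSH releases print SHA256 fingerprints by default, so this matches only the older MD5 style shown here.

```python
import base64
import hashlib


def md5_fingerprint(public_key_line):
    """Return the colon-separated MD5 fingerprint of an OpenSSH public key line.

    The line has the form '<type> <base64 blob> [comment]'; the fingerprint
    is the MD5 digest of the decoded blob.
    """
    blob = base64.b64decode(public_key_line.split()[1])
    digest = hashlib.md5(blob).hexdigest()
    return ':'.join(digest[i:i + 2] for i in range(0, len(digest), 2))


# Hypothetical usage with your own public key file:
#     with open('/Users/kelvin/.ssh/id_rsa.pub') as handle:
#         print(md5_fingerprint(handle.read()))
```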


HMCS Corner Brook rechristened HMCS Smiley

HMCS Corner Brook as smiley

Offered without comment. Original here.


Install webby Postgres 8.4 on CentOS 6.2

At the end of this walkthrough you will have the PostgreSQL 8.4 database installed on CentOS 6.2 ready for use with your web projects. Postgres 8.4 is not the latest version, but it is stable and good enough for web development purposes. This set-up is “webby” in the sense that it should be familiar to web developers.

Prerequisites

You need to be familiar with basic Linux system administration including editing configuration files with text-editors like vi or emacs.

This is our system. It is a basic CentOS 6.2 installation with a static IP:

$ uname -a
Linux schettino.kelvinwong.ca 2.6.32-220.4.1.el6.i686 #1 SMP Mon Jan 23 22:37:12 GMT 2012 i686 i686 i386 GNU/Linux
$ cat /etc/redhat-release
CentOS release 6.2 (Final)

Install Postgres

Installation of Postgres with yum is simple:

[kelvin@schettino ~]$ sudo yum install postgresql-server
[sudo] password for kelvin: 
Loaded plugins: fastestmirror, presto
Loading mirror speeds from cached hostfile
 * base: mirror.its.sfu.ca
 * extras: mirror.its.sfu.ca
 * updates: mirror.its.sfu.ca
base                                                     | 3.7 kB     00:00     
extras                                                   | 3.5 kB     00:00     
updates                                                  | 3.5 kB     00:00     
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package postgresql-server.i686 0:8.4.9-1.el6_1.1 will be installed
--> Processing Dependency: postgresql-libs(x86-32) = 8.4.9-1.el6_1.1 for package: postgresql-server-8.4.9-1.el6_1.1.i686
--> Processing Dependency: postgresql(x86-32) = 8.4.9-1.el6_1.1 for package: postgresql-server-8.4.9-1.el6_1.1.i686
--> Processing Dependency: libpq.so.5 for package: postgresql-server-8.4.9-1.el6_1.1.i686
--> Running transaction check
---> Package postgresql.i686 0:8.4.9-1.el6_1.1 will be installed
---> Package postgresql-libs.i686 0:8.4.9-1.el6_1.1 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================
 Package                  Arch        Version                 Repository   Size
================================================================================
Installing:
 postgresql-server        i686        8.4.9-1.el6_1.1         base        3.3 M
Installing for dependencies:
 postgresql               i686        8.4.9-1.el6_1.1         base        2.7 M
 postgresql-libs          i686        8.4.9-1.el6_1.1         base        201 k

Transaction Summary
================================================================================
Install       3 Package(s)

Total download size: 6.2 M
Installed size: 28 M
Is this ok [y/N]: y
Downloading Packages:
Setting up and reading Presto delta metadata
Processing delta metadata
Package(s) data still to download: 6.2 M
(1/3): postgresql-8.4.9-1.el6_1.1.i686.rpm               | 2.7 MB     00:01     
(2/3): postgresql-libs-8.4.9-1.el6_1.1.i686.rpm          | 201 kB     00:00     
(3/3): postgresql-server-8.4.9-1.el6_1.1.i686.rpm        | 3.3 MB     00:01     
--------------------------------------------------------------------------------
Total                                           1.5 MB/s | 6.2 MB     00:04     
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
  Installing : postgresql-libs-8.4.9-1.el6_1.1.i686                         1/3 
  Installing : postgresql-8.4.9-1.el6_1.1.i686                              2/3 
  Installing : postgresql-server-8.4.9-1.el6_1.1.i686                       3/3 

Installed:
  postgresql-server.i686 0:8.4.9-1.el6_1.1                                      

Dependency Installed:
  postgresql.i686 0:8.4.9-1.el6_1.1    postgresql-libs.i686 0:8.4.9-1.el6_1.1   

Complete!
[kelvin@schettino ~]$

The server is installed along with the required client programs.

Configure Postgres – Initialize and start service

After installing Postgres you will need to initialize the database (once only):

[kelvin@schettino ~]$ sudo service postgresql initdb
Initializing database:                                     [  OK  ]

Set the server to restart on reboots and start the postmaster service:

[kelvin@schettino ~]$ sudo chkconfig postgresql on
[sudo] password for kelvin: 
[kelvin@schettino ~]$ sudo service postgresql start
Starting postgresql service:                               [  OK  ]

Configure Postgres – Set superuser password

Now let’s set a password for the superuser (the postgres user) using the PostgreSQL interactive terminal. Jump into the postgres user by using su (with the dash to get a login shell):

[kelvin@schettino ~]$ su -
Password: 
[root@schettino ~]# su - postgres
-bash-4.1$ psql
psql (8.4.9)
Type "help" for help.

postgres=# \password postgres
Enter new password: 
Enter it again:
postgres=# \q
-bash-4.1$

Configure Postgres – Activate password authentication

By default, the server uses ident as defined in the “PostgreSQL Client Authentication Configuration File”. If you open up pg_hba.conf you can see this default configuration:

# TYPE  DATABASE    USER        CIDR-ADDRESS          METHOD
 
# "local" is for Unix domain socket connections only
local   all         all                               ident
# IPv4 local connections:
host    all         all         127.0.0.1/32          ident
# IPv6 local connections:
host    all         all         ::1/128               ident

Ident is a mapping of local system users (see /etc/passwd for list of system users) to Postgres users. I have never found this authentication method useful for any of the web development work that I have done. I always change it to “md5” which allows you to create arbitrary users and passwords. Let’s change the server’s client configuration file (I assume you are still using the postgres user shell):

-bash-4.1$ whoami
postgres
-bash-4.1$ vim /var/lib/pgsql/data/pg_hba.conf

Change the “ident” methods to “md5” methods at the bottom of the pg_hba.conf file:

# TYPE  DATABASE    USER        CIDR-ADDRESS          METHOD
 
# "local" is for Unix domain socket connections only
local   all         all                               md5
# IPv4 local connections:
host    all         all         127.0.0.1/32          md5
# IPv6 local connections:
host    all         all         ::1/128               md5
# If you don't want to open Postgres to the Internet
# don't enable this line
host    all         all         0.0.0.0/0             md5

By default, Postgres binds only to localhost and you will need to explicitly tell it to bind to your machine’s IP address. The setting is in postgresql.conf. If you don’t need remote access you can skip this.

-bash-4.1$ vim /var/lib/pgsql/data/postgresql.conf

Change the listen_addresses setting to an asterisk to listen to all available IP addresses:

# - Connection Settings -
 
listen_addresses = '*'
#listen_addresses = 'localhost'         # what IP address(es) to listen on;
                                        # comma-separated list of addresses;
                                        # defaults to 'localhost', '*' = all
                                        # (change requires restart)

Restart your postgres server (exit postgres user into the root shell):

-bash-4.1$ exit
logout
[root@schettino ~]# service postgresql restart
Stopping postgresql service:                               [  OK  ]
Starting postgresql service:                               [  OK  ]
[root@schettino ~]#

Open Firewall (optional)

If you want remote access to the server on Postgres port 5432 you will have to open a port on the firewall. If you still are the root user, type the following:

[root@schettino ~]# whoami
root
[root@schettino ~]# vim /etc/sysconfig/iptables

You can just copy the SSH port rule in iptables and modify the port number from 22 to 5432. Add the following rule just below the SSH port rule and above the rejection rule for the INPUT chain:

-A INPUT -m state --state NEW -m tcp -p tcp --dport 5432 -j ACCEPT

When changed, it should look like this:

*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 5432 -j ACCEPT
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -j REJECT --reject-with icmp-host-prohibited
COMMIT

Reload the rules:

[root@schettino ~]# service iptables restart
iptables: Flushing firewall rules:                         [  OK  ]
iptables: Setting chains to policy ACCEPT: filter          [  OK  ]
iptables: Unloading modules:                               [  OK  ]
iptables: Applying firewall rules:                         [  OK  ]
[root@schettino ~]# exit
logout
[kelvin@schettino ~]$
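
To check from another machine that the port is actually reachable, a quick TCP probe is enough. The sketch below uses only the standard library; the function name port_is_open and the address in the usage comment are hypothetical. It only tells you that something accepts connections on the port, not that Postgres authentication works.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except (socket.error, socket.timeout):
        return False
    conn.close()
    return True


# Hypothetical usage; replace the address with your server's static IP:
#     print(port_is_open('192.0.2.10', 5432))
```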

Try it out with pgbench (optional)

To demonstrate the basic use of your new Postgres server, you can try out pgbench which is in the postgresql-contrib RPM. Let’s install it, create a new user, create a new database and run pgbench against it:

[kelvin@schettino ~]$ sudo yum install postgresql-contrib
Loaded plugins: fastestmirror, presto
Loading mirror speeds from cached hostfile
 * base: mirror.its.sfu.ca
 * extras: mirror.its.sfu.ca
 * updates: mirror.its.sfu.ca
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package postgresql-contrib.i686 0:8.4.9-1.el6_1.1 will be installed
--> Processing Dependency: libxslt.so.1(LIBXML2_1.0.18) for package: postgresql-contrib-8.4.9-1.el6_1.1.i686
--> Processing Dependency: libxslt.so.1(LIBXML2_1.0.11) for package: postgresql-contrib-8.4.9-1.el6_1.1.i686
--> Processing Dependency: libxslt.so.1 for package: postgresql-contrib-8.4.9-1.el6_1.1.i686
--> Processing Dependency: libossp-uuid.so.16 for package: postgresql-contrib-8.4.9-1.el6_1.1.i686
--> Running transaction check
---> Package libxslt.i686 0:1.1.26-2.el6 will be installed
---> Package uuid.i686 0:1.6.1-10.el6 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================
 Package                   Arch        Version                Repository   Size
================================================================================
Installing:
 postgresql-contrib        i686        8.4.9-1.el6_1.1        base        346 k
Installing for dependencies:
 libxslt                   i686        1.1.26-2.el6           base        448 k
 uuid                      i686        1.6.1-10.el6           base         54 k

Transaction Summary
================================================================================
Install       3 Package(s)

Total download size: 848 k
Installed size: 3.3 M
Is this ok [y/N]: y
Downloading Packages:
Setting up and reading Presto delta metadata
Processing delta metadata
Package(s) data still to download: 848 k
(1/3): libxslt-1.1.26-2.el6.i686.rpm                     | 448 kB     00:00     
(2/3): postgresql-contrib-8.4.9-1.el6_1.1.i686.rpm       | 346 kB     00:00     
(3/3): uuid-1.6.1-10.el6.i686.rpm                        |  54 kB     00:00     
--------------------------------------------------------------------------------
Total                                           520 kB/s | 848 kB     00:01     
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
  Installing : uuid-1.6.1-10.el6.i686                                       1/3 
  Installing : libxslt-1.1.26-2.el6.i686                                    2/3 
  Installing : postgresql-contrib-8.4.9-1.el6_1.1.i686                      3/3 

Installed:
  postgresql-contrib.i686 0:8.4.9-1.el6_1.1                                     

Dependency Installed:
  libxslt.i686 0:1.1.26-2.el6              uuid.i686 0:1.6.1-10.el6             

Complete!
[kelvin@schettino ~]$ which pgbench
/usr/bin/pgbench

Create a new Postgres user by using the createuser wrapper (the P switch allows you to set a password for your new user):

[kelvin@schettino ~]$ su -
Password: 
[root@schettino ~]# su - postgres
-bash-4.1$ createuser -P francesco
Enter password for new role: [password for user francesco]
Enter it again: 
Shall the new role be a superuser? (y/n) n
Shall the new role be allowed to create databases? (y/n) n
Shall the new role be allowed to create more new roles? (y/n) n
Password: [password for postgres]
-bash-4.1$

Make a new database named “winnings” and change the owner to “francesco”:

-bash-4.1$ createdb -O francesco winnings
Password: [password for postgres]
-bash-4.1$ 

Now we can fill it up with pgbench:

-bash-4.1$ pgbench -i -U francesco winnings
Password: [password for user francesco]
NOTICE:  table "pgbench_branches" does not exist, skipping
NOTICE:  table "pgbench_tellers" does not exist, skipping
NOTICE:  table "pgbench_accounts" does not exist, skipping
NOTICE:  table "pgbench_history" does not exist, skipping
creating tables...
10000 tuples done.
20000 tuples done.
30000 tuples done.
40000 tuples done.
50000 tuples done.
60000 tuples done.
70000 tuples done.
80000 tuples done.
90000 tuples done.
100000 tuples done.
set primary key...
NOTICE:  ALTER TABLE / ADD PRIMARY KEY will create implicit index "pgbench_branches_pkey" for table "pgbench_branches"
NOTICE:  ALTER TABLE / ADD PRIMARY KEY will create implicit index "pgbench_tellers_pkey" for table "pgbench_tellers"
NOTICE:  ALTER TABLE / ADD PRIMARY KEY will create implicit index "pgbench_accounts_pkey" for table "pgbench_accounts"
vacuum...done.
-bash-4.1$ pgbench -c 4 -S -t 2000 -U francesco winnings
Password: [password for user francesco]
starting vacuum...end.
transaction type: SELECT only
scaling factor: 1
query mode: simple
number of clients: 4
number of transactions per client: 2000
number of transactions actually processed: 8000/8000
tps = 4836.016718 (including connections establishing)
tps = 5052.773057 (excluding connections establishing)
-bash-4.1$ pgbench -c 4 -t 2000 -U francesco winnings
Password: [password for user francesco]
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1
query mode: simple
number of clients: 4
number of transactions per client: 2000
number of transactions actually processed: 8000/8000
tps = 237.345234 (including connections establishing)
tps = 237.889294 (excluding connections establishing)
-bash-4.1$
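
If you want to compare several runs, the tps figure can be pulled out of the pgbench report with a short filter. This is only a sketch: it assumes the report format shown above, and the function name pgbench_tps is made up for illustration (it is not part of pgbench).

```shell
# pgbench_tps: hypothetical helper (not part of pgbench).
# Reads a pgbench report on stdin and prints the tps value that
# excludes connection set-up time. Assumes the report format shown above.
pgbench_tps() {
    awk '/^tps = .*excluding connections establishing/ { print $3 }'
}
```

One way to use it would be to save a report to a file and run: pgbench_tps < report.txt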

You can clean up the database by dropping the “winnings” database and dropping “francesco”:

-bash-4.1$ dropdb winnings
Password: [password for postgres]
-bash-4.1$ dropuser francesco
Password: [password for postgres]
-bash-4.1$ 

Enjoy your webby Postgres!

Caveat! If you have an Apache/PHP5 server that wants to talk to your Postgres, you will have to set the appropriate SELinux boolean to allow the communication: setsebool -P httpd_can_network_connect_db 1


Hacked on a Friday

It’s not the first time that I’ve been hacked and I’m quite sure that it won’t be the last time either. Today, I found out that my web host (Dreamhost) was hacked.

Part of being hacked is trying to figure out what was taken. It takes time to review logs, so I’m not too concerned that Dreamhost can’t answer questions about what data was compromised.

The thing that gets me is that they never emailed me to let me know what was going on. I found out about the breach by reading Tech Crunch. That’s really what I’m grumpy about tonight…that and the passwords I have to reset.

Update > 12h on

Dreamhost sent an email overnight with password advice. I’m still not impressed by the 12-hour delay.


The Burzynski Clinic

Dr Stanislaw Burzynski runs an alternative cancer treatment clinic in Texas. Someone claiming to represent the Burzynski Clinic tried to silence teenage blogger/skeptic Rhys Morgan in a fascinating email exchange:

You probably haven’t heard of a man named Stanislaw Burzynski. He offers a treatment called antineoplaston therapy, which he claims can treat cancer, in a centre called the Burzynski Clinic in Houston, Texas. That’s quite a claim, but the Nobel Prize Committee does not need to convene quite yet, because this treatment has been in non-randomised clinical trials since its discovery by Burzynski some 34 years ago.


Multi-site Solr for Drupal Notes

Last month I published an article on setting up Solr web search for multiple Drupal web sites. Since then I have received several emails about the set-up. Here are some of my replies:

Can I set up Solr on a separate machine?

Yes. To do that you will have to open the port on your iptables (if it is running). Then configure Tomcat to listen on that address and port in your configuration file. Then you have to set up authentication (probably just basic authentication using a Realm) for your Context.

Why not just use multi-core Solr?

You can and should use multi-core Solr if you are setting up commercial hosting and you have the technical knowledge to set up separate Solr cores. Multi-core Solr would likely use fewer resources too. The set-up I describe works fine for a few Drupal web sites and can be maintained using files provided by the project maintainers. If you are only supporting a few sites (like maybe 3) I wouldn’t bother trying anything more complex.

Sun JVM works better

I have heard that Sun’s JVM works better than OpenJDK when running Solr but I haven’t had any problems running OpenJDK. I have been running a couple of small Solr instances for over a year on OpenJDK and Tomcat 5.5 without any issues related to the JVM.

Can I use Solr version X (instead of 1.4.X)?

I have no idea. ApacheSolr recommends Solr 1.4.x in the release notes and unless you need additional Solr features only available in later versions, I recommend that you stick with the supported version.

Will this work on my busy web site?

Define “busy”.

Drupal is very flexible and if you mostly publish content and occasionally your users search for stuff, then the demands on your search service will be minimal and my set-up would probably work fine.

If you run a search service and index other websites by scraping and republishing scraped content in Drupal, then your search is going to get a lot more use since it is the primary feature. In that case, you would probably be better off running Solr on a dedicated machine (or machines).

Why is my Tomcat on port 8983?

Jetty is likely listening on port 8983 (were you following the Solr tutorial and left it running?). Make sure that you are running Tomcat and that it is listening on port 8080.

Drupal: “Your site was unable to contact the Apache Solr server”

Like it says, your Drupal install couldn’t find the Solr server. Check your Drupal settings. If they are not exactly right you will see this error. Check your system using “netstat -anp” and “lsof -i” run as root. Make sure that Tomcat is listening on port 8080. Check your logs.
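
The netstat check can be narrowed down to the one port you care about. The filter below is a rough sketch that assumes the usual Linux netstat column layout (local address in column 4, state in column 6); the name listening_on_port is hypothetical.

```shell
# listening_on_port: hypothetical helper for illustration.
# Filters `netstat -an`-style output on stdin and prints the TCP
# sockets in LISTEN state on the given port (default 8080).
listening_on_port() {
    port="${1:-8080}"
    awk -v port="$port" '$1 ~ /^tcp/ && $6 == "LISTEN" && $4 ~ (":" port "$")'
}
```

Run as root, something like "netstat -anp | listening_on_port 8080" should print one line naming the java process if Tomcat is listening, and nothing if it is not.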

Why doesn’t my set-up work?

There are a lot of reasons why your Solr set-up isn’t working. UNIX is very good at telling you when things go wrong by writing everything in its logs. Check your SELinux settings using sestatus. Check your audit logs. Check your web server logs. Check your catalina.out log. Try Googling the error codes or the main body of the error message.

Feedback welcome

I’m always glad to hear from readers via comments here on the blog, or by trackbacks or via email.


Multi-Site Solr for Drupal 6 Search on Tomcat 6 / CentOS 6

ApacheSolr for Drupal 6 improves on the out-of-the-box search experience for Drupal users. The easiest way to get Solr running on your Drupal web site is to use the hosted service provided by Acquia; it is way easier than running your own Solr. You simply point your queries to their Solr server and you’re done.

For various reasons, you might want to run your own Solr web service on your own machine. In this article, I will walk you through setting up a working Solr installation using Tomcat 6 on CentOS 6. The end result of this walkthrough will be two separate Solr indexes (via two separate Solr web apps) for two different web sites running on a single Tomcat. I will assume that you are using Acquia’s Drupal (which ships with SolrPHPClient).

Warning: This article assumes all services are on a single machine (suitable for a small organization). Running Solr on a separate machine is possible but raises security implications that are outside the scope of this article.

These are the tasks that we will work on:

  1. Set-up Solr
  2. Set-up Tomcat
  3. Tweak CentOS security thinger (SELinux)
  4. Configure Acquia Drupal

Prerequisites

The prerequisites are:

  • CentOS 6 Web Server w/ PHP 5.3, MySQL 5, Tomcat 6, Java 6 (all services running w/ no problemos)
  • Acquia Drupal 6 installed
  • Familiarity with Drupal (basic skills – enabling modules, setting permissions on nodes, etc)
  • Familiarity with Java & Tomcat (basic skills)
  • Familiarity working with Linux in a terminal and vi (intermediate skills)

This is my system (a web server set-up with Anaconda):

# uname -a
Linux templeton.localdomain 2.6.32-71.29.1.el6.i686 #1 SMP Mon Jun 27 18:07:00 BST 2011 i686 i686 i386 GNU/Linux
# cat /etc/redhat-release
CentOS Linux release 6.0 (Final)
# yum list installed | grep mysql-server
mysql-server.i686       5.1.52-1.el6_0.1  @updates
# yum list installed | grep php
php.i686                5.3.2-6.el6_0.1   @updates                              
php-cli.i686            5.3.2-6.el6_0.1   @updates                              
php-common.i686         5.3.2-6.el6_0.1   @updates                              
php-gd.i686             5.3.2-6.el6_0.1   @updates                              
php-mysql.i686          5.3.2-6.el6_0.1   @updates                              
php-pdo.i686            5.3.2-6.el6_0.1   @updates                              
php-pear.noarch         1:1.9.0-2.el6     @anaconda-centos-201106051823.i386/6.0
php-xml.i686            5.3.2-6.el6_0.1   @updates
# java -version
java version "1.6.0_17"
OpenJDK Runtime Environment (IcedTea6 1.7.5) (rhel-1.31.b17.el6_0-i386)
OpenJDK Client VM (build 14.0-b16, mixed mode)
# yum list installed | grep tomcat6
tomcat6.noarch          6.0.24-24.el6_0   @updates                              
tomcat6-el-2.1-api.noarch
tomcat6-jsp-2.1-api.noarch
tomcat6-lib.noarch      6.0.24-24.el6_0   @updates                              
tomcat6-servlet-2.5-api.noarch
# /sbin/service tomcat6 status
tomcat6 (pid 1790) is running...                           [  OK  ]
# sestatus
SELinux status:                 enabled
SELinuxfs mount:                /selinux
Current mode:                   enforcing
Mode from config file:          enforcing
Policy version:                 24
Policy from config file:        targeted

Notice the hashmark (#) as my terminal prompt. It denotes that I am executing all these commands as root (use ‘su -’). You can also prefix the following commands with ‘sudo’.

Download Solr

Obtain a copy of the Solr tarball from a nearby mirror:

http://www.apache.org/dyn/closer.cgi/lucene/solr/

Select Solr 1.4.1 or the latest recommended Solr:

ie. http://apache.sunsite.ualberta.ca//lucene/solr/1.4.1/

I’m using the 54M GZipped Tarball and downloading it using wget:

# wget http://apache.sunsite.ualberta.ca//lucene/solr/1.4.1/apache-solr-1.4.1.tgz
--2011-09-02 02:06:05--  http://apache.sunsite.ualberta.ca//lucene/solr/1.4.1/apache-solr-1.4.1.tgz
Resolving apache.sunsite.ualberta.ca... 129.128.5.190
Connecting to apache.sunsite.ualberta.ca|129.128.5.190|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 56374837 (54M) [application/x-tar]
Saving to: “apache-solr-1.4.1.tgz”

100%[=============================================>] 56,374,837   261K/s   in 6m 20s  

2011-09-02 02:12:42 (145 KB/s) - “apache-solr-1.4.1.tgz” saved [56374837/56374837]

# tar zxvf apache-solr-1.4.1.tgz
apache-solr-1.4.1/client/
...
# pwd
/root

Copy the Solr package somewhere reasonable like in the /opt folder:

# mkdir -p /opt/solr
# cp -r -p /root/apache-solr-1.4.1 /opt/solr

Link it (the Solr WAR file) to the Tomcat library directory:

# ln -s /opt/solr/apache-solr-1.4.1/dist/apache-solr-1.4.1.war /usr/share/tomcat6/lib/solr.war

In the future, when you upgrade your software, install the Solr upgrade and update the symlink.

Create Solr directories

You need to choose where your Solr indexes will be kept. I put them into the /var directory and that’s where I’m assuming that you will put yours:

# mkdir -p /var/solr
# cp -r -p /opt/solr/apache-solr-1.4.1/example/solr/ /var/solr/
# mv /var/solr/solr /var/solr/example.com
# ls -l /var/solr/example.com/
total 12
drwxr-xr-x. 2 root root 4096 Sep  2 02:44 bin
drwxr-xr-x. 3 root root 4096 Sep  2 02:44 conf
-rw-r--r--. 1 root root 2259 Sep  2 02:44 README.txt

Each domain has its own Solr indexes located in ‘data’ and its own configuration files in ‘conf’. There are two optional directories: ‘bin’ (for replication scripts) and ‘lib’ (for plugins). Unless your other apps use them, chances are they will be missing.

Install Drupal ApacheSolr plugin protwords, schema and solrconfig

You should already have Acquia Drupal 6 running or Drupal 6 with the ApacheSolr plugin installed. You can copy the ‘protwords.txt’, ‘schema.xml’, and ‘solrconfig.xml’ files from the plugin directory in your respective distribution rather than downloading them, but adjust the paths accordingly.

If you don’t already have the ApacheSolr plugin, get it from the Drupal web site.

http://drupal.org/project/apachesolr

Choose the latest Tarball and use wget to download it to your server, then copy the ApacheSolr configuration files (and backup originals using ‘b’ flag):

# wget http://ftp.drupal.org/files/projects/apachesolr-6.x-1.5.tar.gz
# tar zxvf apachesolr-6.x-1.5.tar.gz
...
# echo 'If ur root cp may give u a scary msg next cmd! Ignore it! Y to overwrite!'
If ur root cp may give u a scary msg next cmd! Ignore it! Y to overwrite!
#
# cp -b -p -f apachesolr/protwords.txt /var/solr/example.com/conf
# cp -b -p -f apachesolr/schema.xml /var/solr/example.com/conf
# cp -b -p -f apachesolr/solrconfig.xml /var/solr/example.com/conf
#
# echo 'Fix group so tomcat can use this!'
Fix group so tomcat can use this!
#
# chown -R root:tomcat /var/solr/example.com
# chmod -R 775 /var/solr/

Warning! If you are not using the Acquia distribution and instead installed the ApacheSolr plugin from the main Drupal web site then you should check that you have a copy of the SolrPhpClient (version r22 – see module README for the gory details). The Acquia distribution includes the correct SolrPhpClient (so you might want to use that instead?).

Make the two Solr instances for the two domains

This walkthrough will create two domains, but you can create more. Using the example.com folder as a prototype, just recursively copy it twice to make two domains (use ‘p’ switch to ‘preserve’ the file permissions and settings):

# cp -r -p /var/solr/example.com /var/solr/www1.kelvinwong.ca
# cp -r -p /var/solr/example.com /var/solr/www2.kelvinwong.ca

In the future, to add a new domain, copy the example.com folder you just made and customize it. This will also work for additional domains that you want to support.
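
Adding a domain later is the same copy, so it can be wrapped in a small function. This is a minimal sketch, assuming the /var/solr layout used in this walkthrough; the function name add_solr_domain and the SOLR_ROOT variable are made up for illustration.

```shell
# add_solr_domain: hypothetical helper for illustration.
# Copies the prototype Solr home to a new per-domain Solr home,
# preserving permissions, and refuses to overwrite an existing one.
# SOLR_ROOT is an assumed override; it defaults to /var/solr.
add_solr_domain() {
    domain="$1"
    root="${SOLR_ROOT:-/var/solr}"
    if [ -z "$domain" ]; then
        echo "usage: add_solr_domain DOMAIN" >&2
        return 2
    fi
    if [ ! -d "$root/example.com" ]; then
        echo "prototype $root/example.com not found" >&2
        return 1
    fi
    if [ -e "$root/$domain" ]; then
        echo "$root/$domain already exists" >&2
        return 1
    fi
    cp -r -p "$root/example.com" "$root/$domain"
}
```

For a (made-up) third site you would run: add_solr_domain www3.kelvinwong.ca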

Configure Tomcat 6

It’s All About Context: The Context element represents a web application run within a particular Tomcat virtual host. Each web application is based on a Web Application Archive (WAR) file or a corresponding unpacked directory. The web application used to process each web request is determined by matching the request to the path of each Context. You may define as many Context elements as you wish, but each Context MUST have a unique path. More on Context

Contexts are no longer put into Tomcat’s server.xml file since that file is read only at server start-up. Instead Contexts are placed into a folder hierarchy under CATALINA_BASE (on CentOS 6 it is /etc/tomcat6). Create and configure the following files:

# touch /etc/tomcat6/Catalina/localhost/www1.kelvinwong.ca.xml
# touch /etc/tomcat6/Catalina/localhost/www2.kelvinwong.ca.xml
# chown tomcat:root /etc/tomcat6/Catalina/localhost/{www1.kelvinwong.ca.xml,www2.kelvinwong.ca.xml}
# chmod 664 /etc/tomcat6/Catalina/localhost/{www1.kelvinwong.ca.xml,www2.kelvinwong.ca.xml}

Tomcat will use these files to find the WAR and deploy the application using the settings in the Context. Note: Contexts can be overridden (they often are) and there are more than a few in Tomcat. Review Tomcat’s documentation if they give you any trouble.

Make sure your Context fragments have .xml suffixes!

Place the following into /etc/tomcat6/Catalina/localhost/www1.kelvinwong.ca.xml

# vi /etc/tomcat6/Catalina/localhost/www1.kelvinwong.ca.xml
<?xml version="1.0" encoding="utf-8"?>
<Context docBase="/usr/share/tomcat6/lib/solr.war" debug="0" crossContext="true" >
   <Environment name="solr/home" type="java.lang.String" value="/var/solr/www1.kelvinwong.ca" override="true" />
</Context>

The Context fragment simply tells Tomcat where to find the Context root (document base): an absolute path to the web app archive (WAR) file. The crossContext attribute allows Solr to get a request dispatcher from ServletContext.getContext() for access to other web apps on the virtual host. The Environment tag defines the ‘solr/home’ setting and allows it to be overridden. That’s all you need.

Change the other fragment:

# vi /etc/tomcat6/Catalina/localhost/www2.kelvinwong.ca.xml

Change the paths:

<?xml version="1.0" encoding="utf-8"?>
<Context docBase="/usr/share/tomcat6/lib/solr.war" debug="0" crossContext="true" >
   <Environment name="solr/home" type="java.lang.String" value="/var/solr/www2.kelvinwong.ca" override="true" />
</Context>
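
Since the two fragments differ only in the domain name, they could also be generated rather than typed. The function below is a sketch under the assumptions of this article (the solr.war symlink and one Solr home per domain under /var/solr); make_context_fragment is a hypothetical name.

```shell
# make_context_fragment: hypothetical helper for illustration.
# Prints a Context fragment for one domain on stdout. It assumes the
# solr.war symlink and the /var/solr/<domain> layout from this article.
make_context_fragment() {
    domain="$1"
    cat <<EOF
<?xml version="1.0" encoding="utf-8"?>
<Context docBase="/usr/share/tomcat6/lib/solr.war" debug="0" crossContext="true" >
   <Environment name="solr/home" type="java.lang.String" value="/var/solr/${domain}" override="true" />
</Context>
EOF
}
```

Redirect the output into the matching file, for example: make_context_fragment www2.kelvinwong.ca > /etc/tomcat6/Catalina/localhost/www2.kelvinwong.ca.xml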

Bind Tomcat to Local Port

By default, Tomcat listens on port 8080. The default iptables ruleset in CentOS 6 does not allow remote connections to port 8080. For our purposes this is fine since we want our Drupal sites to connect locally on port 8080. Local good, remote bad.

You can also tell Tomcat to bind to localhost and not any of the other network adapters. Open Tomcat’s server.xml file:

# vi /etc/tomcat6/server.xml

Change Tomcat’s binding address to the localhost address (127.0.0.1) in the Connector tag:

    <Connector port="8080" protocol="HTTP/1.1" 
               connectionTimeout="20000" 
               redirectPort="8443" 
               URIEncoding="UTF-8"
               maxHttpHeaderSize="65535"
               address="127.0.0.1" />

Solr is a web service that takes many requests from Drupal using the HTTP GET method, similar to you typing into your browser’s web address bar. These requests routinely get very long; you can increase the GET request character limit by increasing the maxHttpHeaderSize attribute (from 8k to 64k as shown). To handle non-English characters, you should also set the request encoding to UTF-8. The Connector as-shown does both.
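
Before restarting, a quick check that the address attribute actually made it into the file can save a round trip. This is a rough sketch and the name connector_is_local is hypothetical; it is a plain text search, not an XML parse.

```shell
# connector_is_local: hypothetical helper for illustration.
# Succeeds if the given server.xml contains an address="127.0.0.1"
# attribute. This is a fixed-string search, not an XML parse, so it
# cannot tell which Connector carries the attribute or whether the
# match sits inside a comment.
connector_is_local() {
    grep -F -q 'address="127.0.0.1"' "$1"
}
```

For example: connector_is_local /etc/tomcat6/server.xml || echo 'Connector is not bound to localhost'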

Restart Tomcat to reload the server.xml file:

# /sbin/service tomcat6 restart
Stopping tomcat6:                                          [  OK  ]
Starting tomcat6:                                          [  OK  ]

View Solr Admin (optional)

You should now be able to view the Solr administration page if you open a local web browser on the server. If you don’t have a desktop on the server (as should be the case), you can use a text-browser like elinks.

View http://localhost:8080/www1.kelvinwong.ca/admin:

# elinks http://localhost:8080/www1.kelvinwong.ca/admin

You should see the Solr administration page in your browser.

SELinux

“Apache Solr: Your site was unable to contact the Apache Solr server,” reports Drupal; SELinux chuckles.

SELinux is enabled by default on CentOS 6, so you will likely have it running and it will not appreciate Apache trying to talk to Tomcat/Solr on port 8080 (check /var/log/audit/audit.log):

type=AVC msg=audit(1315100262.891:17629): avc:  denied  { name_connect } for  pid=2064 comm="httpd" dest=8080 scontext=unconfined_u:system_r:httpd_t:s0 tcontext=system_u:object_r:http_cache_port_t:s0 tclass=tcp_socket

type=SYSCALL msg=audit(1315100262.891:17629): arch=40000003 syscall=102 success=no exit=-13 a0=3 a1=bfbe6590 a2=b70426f4 a3=11 items=0 ppid=2060 pid=2064 auid=500 uid=48 gid=48 euid=48 suid=48 fsuid=48 egid=48 sgid=48 fsgid=48 tty=(none) ses=4 comm="httpd" exe="/usr/sbin/httpd" subj=unconfined_u:system_r:httpd_t:s0 key=(null)
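
If the audit log is long, the relevant denials can be boiled down to a list of ports. The filter below is a sketch that assumes AVC lines shaped like the one above; httpd_denied_ports is a made-up name.

```shell
# httpd_denied_ports: hypothetical helper for illustration.
# Reads audit log lines on stdin and prints each distinct destination
# port for which httpd was denied a name_connect.
httpd_denied_ports() {
    grep 'avc:' | grep 'denied' | grep 'name_connect' | grep 'comm="httpd"' |
        sed -n 's/.*[[:space:]]dest=\([0-9][0-9]*\).*/\1/p' | sort -u
}
```

Run it as root against the log: httpd_denied_ports < /var/log/audit/audit.log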

You can either turn off SELinux (not recommended) or fix the attributes so that SELinux allows Apache to talk to Tomcat. The handy tool sealert gives helpful advice:

# sealert -a /var/log/audit/audit.log | less

Summary:

SELinux is preventing the http daemon from connecting to itself or the relay ports

Detailed Description:

SELinux has denied the http daemon from connecting to itself or the relay ports. An httpd script is trying to make a network connection to an http/ftp port. If you did not setup httpd to make network connections, this could signal an intrusion attempt.

Allowing Access:

If you want httpd to connect to httpd/ftp ports you need to turn on the
httpd_can_network_relay boolean: "setsebool -P httpd_can_network_relay=1"

Fix Command:

setsebool -P httpd_can_network_relay=1

Additional Information:

Source Context                unconfined_u:system_r:httpd_t:s0
Target Context                system_u:object_r:http_cache_port_t:s0
Target Objects                None [ tcp_socket ]
Source                        httpd
Source Path                   /usr/sbin/httpd
Port                          8080
Host                          <Unknown>
Source RPM Packages           httpd-2.2.15-5.el6.centos
Target RPM Packages           
Policy RPM                    selinux-policy-3.7.19-54.el6_0.5
Selinux Enabled               True
Policy Type                   targeted
Enforcing Mode                Enforcing
Plugin Name                   httpd_can_network_relay
Host Name                     templeton.localdomain
Platform                      Linux templeton.localdomain
                              2.6.32-71.29.1.el6.i686 #1 SMP Mon Jun 27 18:07:00
                              BST 2011 i686 i686
Alert Count                   14
First Seen                    Sat Sep  3 18:25:40 2011
Last Seen                     Sat Sep  3 18:37:42 2011
Local ID                      4b66d238-ddf7-4b74-bbe5-3fb54be5b3e4
Line Numbers                  178, 179, 180, 181, 182, 183, 184, 185, 192, 193,
                              194, 195, 196, 197, 198, 199, 200, 201, 202, 203,
                              204, 205, 206, 207, 208, 209, 210, 211


The quick fix is to set the network relay flag (‘P’ flag makes the change persistent across reboots):

# setsebool -P httpd_can_network_relay=1
# getsebool httpd_can_network_relay
httpd_can_network_relay --> on
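
If you script your server set-up, the getsebool output can be tested rather than eyeballed. A minimal sketch, assuming the "name --> on" output format shown above; sebool_is_on is a hypothetical name.

```shell
# sebool_is_on: hypothetical helper for illustration.
# Takes one line of getsebool output, e.g.
# "httpd_can_network_relay --> on", and succeeds only if it ends in "on".
sebool_is_on() {
    case "$1" in
        *" --> on") return 0 ;;
        *) return 1 ;;
    esac
}
```

For example: sebool_is_on "$(getsebool httpd_can_network_relay)" || echo 'relay boolean is off'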

You don’t need sealert to use setsebool but it is a useful utility to debug errors with SELinux. If you don’t have sealert installed, it is a simple thing to install it since it is part of the setroubleshoot package:

# yum install setroubleshoot

Configure Drupal to use Solr

Turning now to your Drupal installation…

Enable the Solr Search service module…

Configure the Apache Solr Search module by visiting http://www1.kelvinwong.ca/?q=admin/settings/apachesolr

Solr host name
localhost
Solr port
8080
Solr path
/www1.kelvinwong.ca

The Solr path is the name of your Context fragment minus the xml suffix (ie. /etc/tomcat6/Catalina/localhost/www1.kelvinwong.ca.xml)
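
Put another way, the Solr path can be derived from the fragment's file name. The one-liner below is only an illustration; solr_path_for is a made-up name.

```shell
# solr_path_for: hypothetical helper for illustration.
# Prints the Drupal "Solr path" for a Context fragment file: the file
# name without its directory and .xml suffix, with a leading slash.
solr_path_for() {
    printf '/%s\n' "$(basename "$1" .xml)"
}
```

Given /etc/tomcat6/Catalina/localhost/www1.kelvinwong.ca.xml it prints /www1.kelvinwong.ca, which is the value to enter in Drupal.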



The cron job indexes 50 nodes at a time by default. When indexed, you can then search for nodes by keyword.
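
To get a feel for how long gradual indexing will take, divide your node count by the batch size and round up. A small sketch of that arithmetic follows; the function name cron_runs_needed is hypothetical, and the default of 50 comes from the module default mentioned above.

```shell
# cron_runs_needed: hypothetical helper for illustration.
# Prints how many cron runs it takes to index NODES nodes when each
# run indexes BATCH nodes (default 50), rounding up.
cron_runs_needed() {
    nodes="$1"
    batch="${2:-50}"
    echo $(( (nodes + batch - 1) / batch ))
}
```

A site with 1200 nodes would need 24 cron runs at the default batch size (cron_runs_needed 1200), so with an hourly cron the first full index takes about a day.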

Save the settings. You should see:

  • The configuration options have been saved.
  • Apache Solr: Your site has contacted the Apache Solr server.
  • Apache Solr PHP Client Library: Correct version “Revision: 22”.

Try a search

You can re-index the site by force or let cron do it gradually. Either way it takes a while for Solr to process the data.


http://www1.kelvinwong.ca/?q=admin/settings/apachesolr/index


Once you have indexed your site and adjusted the permissions on the search form (so anonymous users can use the search form), visit it:


http://www1.kelvinwong.ca/?q=search/apachesolr_search


Intentionally misspell something and let Solr give you hints!

What about the other one??? www2?

Ah, yes…the other one is set up in a similar manner; just use the following configuration in Drupal:

Solr host name
localhost
Solr port
8080
Solr path
/www2.kelvinwong.ca
