DrupalDork.com was shut down on September 18, 2013. This is just a snapshot of the site as it appeared then.

DOPE #1: Simplenews Threaded Send

tl;dr: I released Simplenews Threaded Send this weekend.

Let me start by saying that I love working for Jackson River. About a month ago, management announced that we would be trying out a monthly Day of Personal Enrichment (DOPE), not unlike Google's 20% time but more realistic: who can afford to spend a day a week goofing around? We've staggered them so that the whole company doesn't shut down on the same day, so my first DOPE was this past Friday, and I wrapped up my project over the weekend.

First, a little back story. Last year, we rebuilt the site of a member of the US House of Representatives. One of their larger requirements was a mass-mailer that could send daily updates to other party members and constituents: several thousand emails per day in total. They had several SMTP servers that had been sending these mailings for some time, so we were able to keep using them without worrying about being blacklisted as spammers by mail services. We decided to use the Simplenews and SMTP Authentication Support modules to meet this requirement, but had to work around a few limitations:

  • Simplenews sends messages during cron, one message at a time.
  • Drupal sets a semaphore variable during a cron run to ensure that only one cron process can be running at once.
  • Simplenews relied on this assumption, and did nothing to mark the batch of messages that were in progress.
  • A bug in Poormanscron exposed this problem in Simplenews: when two cron processes were allowed to run at once, multiple copies of the messages were sent.
  • The SMTP Authentication Support module only handled a single SMTP server.

Customizations

To get around these limitations and assumptions, I did the following:

  • Wrote a Multi SMTP module that allows multiple servers to be configured. It sits on top of the existing SMTP module, and just sets variables used by that module to determine which server to send mail to.
  • Wrote a Simplenews Threaded Send module that would start a new thread for each SMTP server. Each of these child processes would send a single batch of messages to one SMTP server.
  • Made some modifications to the Simplenews module to set a flag on messages that were in the process of being sent, to prevent duplicates.
  • Implemented other modules for some custom functionality: global unsubscribe, creating a user record for newsletter subscribers (Simplenews doesn't require a record in the users table), etc.
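The Multi SMTP idea of reusing the single-server SMTP module by overriding its settings before each send can be sketched in Python. This is an illustration, not the module's PHP: the `variables` dict stands in for Drupal's variables, the `smtp_host`/`smtp_port`/`smtp_username` key names are assumptions about the SMTP module's settings, and `send_via()` and the server list are hypothetical. The sketch restores the previous values afterward; whether the real module does so isn't stated.

```python
from contextlib import contextmanager

# Stand-in for Drupal's variables; key names are assumed, not verified.
variables = {"smtp_host": "smtp-default.example.org", "smtp_port": "25", "smtp_username": ""}

# Hypothetical servers configured through Multi SMTP.
SERVERS = [
    {"smtp_host": "smtp1.example.org", "smtp_port": "25", "smtp_username": "mailer1"},
    {"smtp_host": "smtp2.example.org", "smtp_port": "587", "smtp_username": "mailer2"},
]


@contextmanager
def use_server(server):
    """Temporarily point the single-server SMTP settings at `server`, then restore them."""
    saved = {key: variables.get(key) for key in server}
    variables.update(server)
    try:
        yield
    finally:
        variables.update(saved)


def send_via(server, message, send):
    """Send one message through `server` with a send function that reads `variables`."""
    with use_server(server):
        return send(message)
```

The SMTP module itself never needs to know about more than one server; it just reads whatever settings are in place when it sends.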

Once we ironed out the kinks, this solution worked great for the client for several months. They've been really happy, and after they showed it off, they told us that other offices in the House were interested in using it.

DOPE project, and how it works

Which brings us, finally, to my DOPE project: refactoring Simplenews Threaded Send for public release. Don't Hack Contrib is only one step below Don't Hack Core, so I wanted to find a way to wrap the changes I had made to Simplenews into my module so that it could be installed and used without any hacks, like requiring patches to Simplenews.

I had started this process months ago on my own time, but was never able to make time to wrap it up, so it was the perfect project for a DOPE. I spent Friday and part of this weekend moving Simplenews customizations into Simplenews Threaded Send, ripping out the functionality specific to the client, and testing the hell out of it.

At the core, this module is based on socket connections. A Thread class (written by a former co-worker) adds pseudo-threading by opening new connections to the server. For each SMTP server that's available, the module creates a Thread object. When start() is called on the object, it opens a socket connection back to the same web server, to a URL defined by this module whose callback sends a batch of email. The Thread object writes the request headers to the socket and then returns, so that the module can continue processing without waiting for the response (which takes several seconds, since the "child" process is sending a batch of email). This way, multiple threads can be processing in parallel…sort of.
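A rough Python equivalent of that fire-and-forget request might look like this. The `PseudoThread` name, the HTTP/1.0 request shape, and any callback path you pass in are assumptions; the original Thread class is PHP and its API isn't shown here. The point is that start() returns as soon as the headers are written, and poll() reads whatever has arrived without ever blocking.

```python
import socket


class PseudoThread:
    """Fire off an HTTP request, then poll for the response later without blocking."""

    def __init__(self, host, port, path):
        self.host, self.port, self.path = host, port, path
        self.sock = None
        self.response = b""
        self.done = False

    def start(self):
        """Open a socket, write the request headers, and return immediately."""
        self.sock = socket.create_connection((self.host, self.port), timeout=5)
        request = (
            f"GET {self.path} HTTP/1.0\r\n"
            f"Host: {self.host}\r\n"
            "Connection: close\r\n\r\n"
        )
        self.sock.sendall(request.encode("ascii"))
        self.sock.setblocking(False)  # later reads must never wait on the child
        self.response = b""
        self.done = False

    def poll(self):
        """Read any available bytes; return True once the server has closed the connection."""
        if self.done:
            return True
        while True:
            try:
                chunk = self.sock.recv(4096)
            except BlockingIOError:
                return False  # child request is still sending its batch
            if not chunk:  # EOF: with Connection: close, the full response is in
                self.sock.close()
                self.done = True
                return True
            self.response += chunk
```

Using HTTP/1.0 with `Connection: close` keeps completion detection simple: the response is finished exactly when the server closes the socket.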

At half-second intervals, each thread is polled to see if the response has fully loaded. If it has, the Thread object is re-used to open another connection, until the whole message queue has been sent. This will work around slow SMTP servers: while the thread that's sending on the slow server waits for the batch to finish, the other threads may be able to send two or three batches. A slow server won't hold up the whole process.
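The polling loop can be sketched in Python as follows. `dispatch()` and `SimulatedServer` are hypothetical names: `SimulatedServer` stands in for a thread bound to one SMTP server and reports its batch as finished after a fixed number of polls, so the example runs without a network. `interval` defaults to the half-second polling described above.

```python
import time
from collections import deque


def dispatch(messages, workers, batch_size, interval=0.5):
    """Feed batches to idle workers and poll busy ones until all messages are sent.

    Each worker needs a .name, a start(batch) method, and a poll() method that
    returns True once its current batch is done. Returns batches handled per worker.
    """
    if not workers:
        raise ValueError("need at least one worker")
    pending = deque(messages)
    counts = {w.name: 0 for w in workers}
    idle, busy = list(workers), []
    while pending or busy:
        # Reuse every idle worker for a new batch while messages remain.
        while idle and pending:
            worker = idle.pop()
            batch = [pending.popleft() for _ in range(min(batch_size, len(pending)))]
            worker.start(batch)
            counts[worker.name] += 1
            busy.append(worker)
        time.sleep(interval)
        # A slow worker stays busy; finished ones go straight back to the idle pool.
        still_busy = []
        for worker in busy:
            (idle if worker.poll() else still_busy).append(worker)
        busy = still_busy
    return counts


class SimulatedServer:
    """Hypothetical stand-in for a thread bound to one SMTP server."""

    def __init__(self, name, polls_per_batch):
        self.name = name
        self.polls_per_batch = polls_per_batch
        self.remaining = 0
        self.sent = []

    def start(self, batch):
        self.sent.extend(batch)
        self.remaining = self.polls_per_batch

    def poll(self):
        self.remaining -= 1
        return self.remaining <= 0
```

With 10 messages, a batch size of 2, a fast server that finishes each batch by the next poll, and a slow server that needs three polls, the fast server ends up handling four of the five batches while the slow one handles one.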

If an SMTP server fails to send, it will be marked as "inactive" for four hours. This was added to prevent the module from trying over and over to use a server that's not available: occasionally, an SMTP server would get overloaded, restarted by someone else, or otherwise become unavailable for a time, so we just gave it a cooling period before attempting to use it again. Any messages that had been "claimed" by the thread for that server will be freed up after an hour, so that another server can process them during a later cron run.
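The two timeouts can be sketched like this, using Unix timestamps in seconds. The function names and the `status`/`claimed_by`/`claimed_at` fields are hypothetical; the real module stores this state in Drupal rather than in Python dicts, and Simplenews uses its own status values.

```python
INACTIVE_SECONDS = 4 * 60 * 60   # a failed server sits out for four hours
CLAIM_TIMEOUT_SECONDS = 60 * 60  # claimed-but-unsent messages are freed after an hour


def mark_inactive(inactive_until, server, now):
    """Record that `server` failed at `now`; skip it until the cooling period ends."""
    inactive_until[server] = now + INACTIVE_SECONDS


def available_servers(servers, inactive_until, now):
    """Servers that never failed, or whose cooling period has expired."""
    return [s for s in servers if inactive_until.get(s, 0) <= now]


def release_stale_claims(spool, now):
    """Clear claims older than the timeout so a later cron run can retry them.

    Returns the number of messages released.
    """
    released = 0
    for msg in spool:
        claimed_at = msg.get("claimed_at")
        if (msg["status"] == "pending" and claimed_at is not None
                and now - claimed_at >= CLAIM_TIMEOUT_SECONDS):
            msg["claimed_by"] = None
            msg["claimed_at"] = None
            released += 1
    return released
```

Keeping the two timeouts separate means a dead server stays out of rotation for a long time, while its stranded messages come back much sooner for whichever servers are still healthy.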

Updates and Cleanup

Cleaning this up for release required several changes:

  • Removed the cURL fallback method. Early on, we had some issues with SELinux on the server blocking socket connections, and had to use cURL to make the requests as a fallback. But I couldn't find a way to open a cURL connection and let processing continue without waiting for the full response. I could open connections for multiple servers at one time, but had to wait for all of them to finish before sending another round of batches through. This meant that the faster servers would sit idle while a slow server finished its batch. Thankfully, the SELinux configuration was fixed, so this method was only in use for a little while. I didn't even want to include it in the released module, so I pulled it all out.
  • Moved Simplenews customizations into Simplenews Threaded Send. This meant copying some functions from Simplenews into the module, then renaming and modifying them slightly. For example, simplenews_threaded_send_get_spool() is a copy of simplenews_get_spool(), but it marks the messages that are in process to prevent two threads from processing the same batch.
  • Replaced a hacky $send_full_spool boolean flag with a proper $batch_size integer argument. Before starting the full send process, the module sends one message through each SMTP server. This causes unavailable servers to be marked as inactive for the next four hours, so that the module won't waste time trying to process an entire batch against a server that's just timing out. The old flag never really made sense, especially because it caused the simplenews_throttle variable to be set to 1 for one run and then reset to its previous value: very hacky. The new $batch_size argument gets passed down the processing stack, which makes it easier to override later on for other reasons.
  • Implemented hook_schema_alter() to handle the new columns needed in the simplenews_mail_spool table. Previously, these changes were made directly in the Simplenews module.
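The claim-then-fetch idea behind simplenews_threaded_send_get_spool(), together with a batch size argument, can be sketched with Python's sqlite3 module. The `claimed_by` and `claimed_at` columns are hypothetical stand-ins for whatever columns hook_schema_alter() actually adds, the status values are simplified, and the real function runs against Drupal's database layer rather than SQLite. Claiming rows with a single UPDATE before reading them back keeps two callers from picking up the same unclaimed messages.

```python
import sqlite3


def create_spool(conn):
    """Create a minimal spool table; claimed_by/claimed_at are hypothetical columns."""
    conn.execute(
        """CREATE TABLE simplenews_mail_spool (
               msid INTEGER PRIMARY KEY,
               mail TEXT NOT NULL,
               status TEXT NOT NULL DEFAULT 'pending',
               claimed_by TEXT,
               claimed_at INTEGER
           )"""
    )


def get_spool(conn, claim_id, batch_size, now):
    """Claim up to batch_size unclaimed pending messages under claim_id and return them.

    claim_id should be unique per batch (for example, server name plus a counter)
    so that the follow-up SELECT returns only the rows this call claimed.
    """
    with conn:  # commit the claim in its own transaction
        conn.execute(
            """UPDATE simplenews_mail_spool
                  SET claimed_by = ?, claimed_at = ?
                WHERE msid IN (SELECT msid FROM simplenews_mail_spool
                                WHERE status = 'pending' AND claimed_by IS NULL
                                ORDER BY msid LIMIT ?)""",
            (claim_id, now, batch_size),
        )
    return conn.execute(
        "SELECT msid, mail FROM simplenews_mail_spool WHERE claimed_by = ? ORDER BY msid",
        (claim_id,),
    ).fetchall()
```

A pre-flight pass could call this with a batch size of 1 per server. Note that the `IN (SELECT ... LIMIT ?)` form works in SQLite, but MySQL rejects LIMIT inside an IN subquery, so a MySQL version would need a different claim query.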

While I was at it, I also updated the Multi SMTP module to give more helpful messages to admins, and to add a check for the necessary PHPMailer library used by the SMTP Authentication Support module. These modules can be found here:

Release Plan

The version of Simplenews Threaded Send that's available on drupal.org right now is marked as 6.x-1.0-beta2. While the client has been using the module successfully for a number of months, I did so much refactoring that I'd like it to get more testing in the wild before I make a point release.