Recently, during my 1 on 1 with my manager, we talked about a current project that one of our other teams is working on. This sparked a memory of a project I worked on long ago.
About 7 years ago I worked on a project where we built a solution that allowed a client to consolidate all their document templates into a single system. Think of it as a version control system for the documents they were sending to their customers.
At the time the client used Microsoft Word templates with placeholders in the document. Every time one of the client's employees needed to email a document, or print one to be sent to the customer by post, the employee had to replace all the placeholders within the document (first name, last name etc.).
I remember that at one point multiple versions of the templates were floating around, some of them out of date. Some used outdated terms and conditions, while others used an outdated company logo, the incorrect font and so on. It had become unmanageable and they asked us to come up with a solution.
We ended up with a solution that allowed the client to have a central place to keep track of all the templates that would eventually be used to generate PDF documents, text messages and the body of emails.
An example of a template they could set up was a welcome letter to a new customer. For each type of communication they could configure a template for each delivery method: one for email, one for text message and one for the printed version that could get mailed out by the postal service. They could end up with a welcome letter that could be sent out by email, text message or by mail. The contents of each template (for the welcome letter) could be different based on the delivery mechanism (email, text message or mail). On the email variant they might use HTML tables and other primitive styling patterns, whereas on the mailed version they could add infographics. For the text message template they might only have added a short welcome message.
After a few months of the system being out in the wild (or it could have been more than a year) I got a call from one of the managers that used the software we built.
They explained that one of the emails they sent to a customer was missing a period in the email body. What was more confusing was that it only happened to this specific customer. When the same email was sent to a different customer the period was not missing.
Here is an example of the welcome email they might have received where the period was visible:
Here is the same email sent to a different recipient, but with the period missing:
If you missed it, here is the missing period:
While I was on the call with the manager I verified that the source code of the template in fact did contain the period the manager said was missing in the email body. I wrote down the name of the template (as each template has a name and version) so that I could test it myself after the call ended. I told the manager I would have to investigate further and it might take a while.
Once the manager hung up I copied the source code of the template being used in production onto my local version of the software and I generated the email body as a preview. In the preview of the email body I could still see the period. This specific template could be used as a printed version (to be mailed out) or an email body; the printed version generated from this template also correctly displayed the period.
I was already scratching my head at this point as I could clearly see the period in the source code and on the preview in the software we built but the manager insisted the period was missing from the email body this specific customer received.
I decided to trigger the code that would send the email. Our local environment was set up to send emails to localhost on a certain port. With a fake SMTP server like SMTP4dev you could then receive the email and view it in a locally installed email client, Outlook in my case.
When I viewed the local email in Outlook I could still see the period correctly showing up in the email body.
Each time a template was used to generate an email, PDF or text message the code would replace the placeholders with actual content. An example of the placeholders that would be replaced was the customer’s first name, last name etc. This meant each email sent out was unique in terms of the content the email body contained.
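To make the placeholder step concrete, here is a minimal Python sketch. It is not the original code (the post does not show it), and the template text, the `$first_name` / `$last_name` placeholder syntax and the `render` helper are all assumptions made for illustration. The point is only that the same template produces bodies of different lengths for different customers.

```python
from string import Template

# Hypothetical template; the real placeholder syntax is not shown in the post.
WELCOME = Template("Dear $first_name $last_name, welcome aboard.")


def render(template: Template, values: dict) -> str:
    """Replace every placeholder in the template with the customer's value."""
    return template.substitute(values)


short_body = render(WELCOME, {"first_name": "Al", "last_name": "Ng"})
long_body = render(WELCOME, {"first_name": "Alexandria", "last_name": "Montgomery"})

print(len(short_body), short_body)
print(len(long_body), long_body)
```

Because the rendered text shifts by the length of each value, any later step that depends on character positions can behave differently per customer.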
I was able to locate the email that was sent to the customer. I could see the values used for the various placeholders in the email. I ended up sending a second email on my local environment but this time with the exact same values for the placeholders that were used for this specific customer.
When I viewed the local email in Outlook I confirmed that the period was in fact missing.
This meant the missing period was very specific to the content of the email that this specific customer received.
I tried various things. I tried to determine whether the period character had perhaps been encoded when we copied it into the template, or whether it was not a period at all but a different character being displayed as a period. I was basically grasping at straws.
While trying the above I sent the email to my localhost after each change I made to the template (this was like working on a website before all the current developer tools were available: hitting refresh and trying again). I noticed that when the period character was moved within the template from its current position, let’s say from position 5 on line 4 to position 6 on line 4, the period would suddenly be visible when viewing the email in my local Outlook. At last, I had a lead!
Now I knew how to reproduce the issue, but I still did not know why it was happening. I could have nudged the period along with blank spaces as a workaround and called it a day, but I wanted to get to the bottom of why this was happening in the first place.
I started to debug and step through the code that generated the email and saved it to the local database. There was an additional step: once the email was inserted into the database (scheduled to be sent), a CRON job periodically picked up the emails that needed to be sent and sent them out.
I verified that the code that persisted the email to the database did not alter the template in any way other than replacing the placeholders with the customer’s information. I focused my investigation on the code being invoked by the CRON job (the scheduler that sends out the emails).
I stepped through the code the CRON job was invoking. Some of this code we borrowed from a previous project that was done by one of our other teams a while back. A portion of this code implemented an SMTP client. I avoided this code for a while, but in the end I had no choice but to step into it.
After stepping through the code multiple times and reading the comments in the code I noticed that one of the functions would ensure that each line in the email body was no longer than a certain number of characters. If a line exceeded this limit it would create a new line, move the remainder of the email content to the new line and continue, rinse and repeat.
This was implementing the following part of the SMTP spec:
The maximum total length of a text line including the <CRLF> is 1000
octets (not counting the leading dot duplicated for transparency).
This number may be increased by the use of SMTP Service Extensions.
While stepping through the code I evaluated the variable containing the email body and I noticed that the line that contained the missing period started with the period character. This meant the previous line had hit the limit of the line length rule, a new line was created and the period character was moved to the next line. Here is an example:
Original email body:
Email body after the custom SMTP client formatted the body (excluding other formatting):
Note that line 5 starts with a period.
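The wrapping behaviour can be sketched in a few lines of Python. This is an illustration under assumptions, not the borrowed SMTP client: the `hard_wrap` name, the sample sentence and the tiny limit of 21 characters (instead of something close to the 1000 octets in the spec) are all made up so the effect is easy to see.

```python
def hard_wrap(line: str, limit: int) -> list[str]:
    """Split one long line into chunks of at most `limit` characters."""
    return [line[i:i + limit] for i in range(0, len(line), limit)] or [""]


# Hypothetical body text; the period after "us" sits exactly at index 21.
body = "Thanks for joining us. Your account is ready."
wrapped = hard_wrap(body, limit=21)

for number, chunk in enumerate(wrapped, start=1):
    print(number, repr(chunk))
```

With this input the second chunk begins with the period, which is the same situation I saw in the debugger: nothing is lost yet, but a line of the body now starts with a period.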
After some digging to learn more about the SMTP client implementation, I ended up on a page that contained the spec for the Simple Mail Transfer Protocol (SMTP).
As I read through the spec I noticed the following:
Since the mail data is sent on the transmission channel, the end of
mail data must be indicated so that the command and reply dialog can
be resumed. SMTP indicates the end of the mail data by sending a
line containing only a "." (period or full stop)
This was linked to a different section in the spec called: 4.5.2. Transparency.
I navigated to this section and when I read the following I almost jumped out of my chair:
SMTP client
Before sending a line of mail text, the SMTP client checks the
first character of the line. If it is a period, one additional
period is inserted at the beginning of the line.
SMTP Server
When a line of mail text is received by the SMTP server, it checks
the line. If the line is composed of a single period, it is
treated as the end of mail indicator. If the first character is a
period and there are other characters on the line, the first
character is deleted.
If you missed it, the following was of importance:
The SMTP client spec:
Before sending a line of mail text, the SMTP client checks the
first character of the line. If it is a period, one additional
period is inserted at the beginning of the line.
The SMTP server spec:
If the first character is a
period and there are other characters on the line, the first
character is deleted.
The SMTP server spec clearly explained what was happening in our use case (the disappearance of a period).
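As a rough model of the server rule quoted above, here is a small Python sketch. The function name `server_read_data` and the sample lines are assumptions for illustration, and real servers work on CRLF-terminated octets rather than a list of strings.

```python
def server_read_data(lines: list[str]) -> list[str]:
    """Apply the receiving side of the transparency rule to the mail data."""
    body = []
    for line in lines:
        if line == ".":
            break  # a line with only a period marks the end of the mail data
        if line.startswith("."):
            line = line[1:]  # leading period plus other characters: drop it
        body.append(line)
    return body


# What a client that skips the transparency step would put on the wire.
sent_lines = ["Thanks for joining us", ". Your account is ready.", "."]
received = server_read_data(sent_lines)
print(received)
```

The server did exactly what the spec asks of it. The period disappeared because our client never added the extra one the server expects to remove.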
I updated the code to insert a second period whenever a line started with a period and had other characters. That way, when the SMTP server removes the first period upon receiving the email, one period still remains (the removal is implemented in whatever SMTP server receives the email and is outside of our control).
I resent the original email to my localhost using the same recipient details that had triggered the issue. This time the period was no longer gone; it had re-appeared, almost like magic.
We shipped the fix and let the manager know the issue was resolved.
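A minimal sketch of the fix, again in Python and again not the actual patch: `dot_stuff` and `dot_unstuff` are hypothetical names, and the stuffing here follows the spec wording by doubling the leading period of any line that starts with one.

```python
def dot_stuff(lines: list[str]) -> list[str]:
    """Client side: insert one extra period before any line starting with one."""
    return ["." + line if line.startswith(".") else line for line in lines]


def dot_unstuff(lines: list[str]) -> list[str]:
    """Server side: delete the first character of lines starting with a period."""
    return [line[1:] if line.startswith(".") else line for line in lines]


wrapped = ["Thanks for joining us", ". Your account is ready."]

on_the_wire = dot_stuff(wrapped)
delivered = dot_unstuff(on_the_wire)

print(on_the_wire)
print(delivered)
```

The round trip returns the original lines, whereas calling `dot_unstuff(wrapped)` directly (the buggy path) loses the period.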
Seeing that the SMTP client code was borrowed from a previous project we thought it good to let our other teams know about this bug in case they needed to patch it as well. They thanked us and we called it a day.
This is usually where a story like this ends, but not this one. If this story ended here I would have probably not remembered this bug.
A few months later.
My manager walks out of his office and starts with something along the lines of “Hey team, you remember that bug we had with the missing period a while ago?”.
If you have ever seen a dog or a wild animal move their ears while they are listening to a sound in the distance, I’m pretty sure this was how my ears were moving when I heard the words “that bug we had with the missing period”.
It seemed one of our other teams hadn't gotten around to patching this bug in their code. Unfortunately the system they maintained had sent out a bunch of very important emails to customers informing them about the new monthly premium they needed to pay. As luck would have it, a handful of these email bodies had the period in just the right spot (or wrong spot, depending on your view).
In some of the emails the customers received, the new monthly premium was missing its decimal separator, which in this case was a period.
This meant some customers received emails informing them their new premium was now $2700 instead of $27.00.
Here is an example of the email that correctly contained the period:
Here is an example of the email some unlucky customers received that was missing the period:
Luckily this bug was very dependent on the length of each line in the email body. The template contained some placeholders like a customer’s first name and surname.
This meant only a certain number of customers had exactly the right number of characters in their first name and surname to make the period show up as the first character of a new line, which caused it to disappear.
The code got patched immediately as the team knew exactly what the problem was. They thanked us again and we went back to work.
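The dependence on name length can be shown with a small Python simulation. Everything in it is an assumption made for illustration: the wording of the premium email, the customer names, the limit of 40 characters, and the choice to re-join the wrapped chunks without separators (how the real lines were rejoined or displayed is not described here).

```python
LIMIT = 40  # illustrative; the real limit was close to the 1000 octets in the spec


def premium_email(first_name: str, surname: str) -> str:
    """Render a hypothetical premium notice for one customer."""
    return f"Dear {first_name} {surname}, your new premium is $27.00 per month."


def deliver_without_stuffing(body: str, limit: int = LIMIT) -> str:
    """Wrap the body, skip dot-stuffing, then apply the server's unstuffing."""
    chunks = [body[i:i + limit] for i in range(0, len(body), limit)]
    received = [c[1:] if c.startswith(".") else c for c in chunks]
    return "".join(received)  # re-joined only to make the effect visible


unlucky = deliver_without_stuffing(premium_email("Anna", "Smith"))
lucky = deliver_without_stuffing(premium_email("Bob", "Smith"))

print(unlucky)
print(lucky)
```

One character of difference in the name decides whether the decimal point lands at index 40, the first position of the second chunk, and therefore whether the customer reads $27.00 or $2700.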