Zend Framework 2 - Zend\Mail\Message::fromString() wrongly interprets headers with country-specific characters - SOLUTION
Wojciech Zielinski
IT Executive | Project/Program Manager | Transformation Manager | Agile Coach | SAFe RTE | Prince2 Practitioner | ITIL | SciFi Author
This article was originally published on 28.04.2016 on the LinkedPHPers blog. Due to problems with the Blogspot platform, it has been moved to LinkedIn Articles.
While working on a project of mine, I encountered a problem with the Zend\Mail\Message::fromString() method. Since I had been looking for a solution on the internet for quite some time and hadn't found one, I decided to describe here what I managed to create to fix it. I do not know whether it is the most elegant solution, but the good thing is that it works. If some of you would like to reuse the solution or extend its functionality, please feel free to share your thoughts :)
One of the features of the system I am working on is filtering incoming and outgoing emails with a PHP script. The script is designed as a ZF2 Console application. Basically, it takes an email from the Sendmail queue using output_filter and processes it with the php .../index.php process incoming ... console command. You can find more information on how to achieve that here.
One of the first steps taken "inside" the script is converting the raw message passed from the console into a Zend\Mail\Message object, or rather an object of a class that inherits from Zend\Mail\Message. ZF2 provides a dedicated static method for this: Zend\Mail\Message::fromString($rawMessage). The method itself works fine: it takes the raw message and creates the object. The problems start, however, when you try to access any of the message headers using the class methods. Header getters are implemented with a lazy-loading pattern, so the fact that the object contains "wrong" headers is not detected when Zend\Mail\Message::fromString() creates the object.
After quite extensive debugging and sending plenty of emails back and forth, I found that the problem actually concerns headers that contain local characters (for Polish, examples are ą, ę, ń, ó, ś etc.; other languages have more or fewer such characters as well). Since the messages were sent and received using different mail servers (Exchange, Sendmail, Lotus) and different email clients (Outlook, Thunderbird, Windows Mail), the headers in which these characters appeared varied a lot. For example, Exchange (or Outlook, I haven't checked that 100% :) ) was adding a Thread-Topic header (as far as I know, for message threading) containing the full subject, so this header, among others (such as Subject), was also throwing an exception when retrieved from the object.
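To see which headers in a given raw message are likely to trip the parser, a quick diagnostic can help. The following is a minimal plain-PHP sketch, assuming PHP 7+ and a message with CRLF or LF line endings; the function findNonAsciiHeaders() is hypothetical and not part of ZF2. It unfolds the header block and lists the header fields whose values contain raw non-ASCII bytes, such as unencoded UTF-8 Polish characters.

```php
<?php
/**
 * Hypothetical diagnostic helper (not part of ZF2): returns the names of
 * header fields whose unfolded values contain non-ASCII bytes.
 */
function findNonAsciiHeaders(string $rawMessage): array
{
    // The header block ends at the first empty line
    $parts = preg_split("/\r?\n\r?\n/", $rawMessage, 2);
    $headerBlock = $parts[0];

    // Unfold continuation lines (lines starting with a space or a tab)
    $headerBlock = preg_replace("/\r?\n[ \t]+/", ' ', $headerBlock);

    $found = [];
    foreach (preg_split("/\r?\n/", $headerBlock) as $line) {
        $colon = strpos($line, ':');
        if ($colon === false) {
            continue; // not a "Name: value" header line
        }
        $name  = substr($line, 0, $colon);
        $value = substr($line, $colon + 1);
        // Any byte >= 0x80 means the value is not plain US-ASCII
        if (preg_match('/[\x80-\xFF]/', $value) && !in_array($name, $found, true)) {
            $found[] = $name;
        }
    }
    return $found;
}

$raw = "From: jan@example.com\r\n"
     . "Subject: Zażółć gęślą jaźń\r\n"
     . "Thread-Topic: Zażółć gęślą jaźń\r\n"
     . "Thread-Index: AQHRoUabc\r\n"
     . "\r\n"
     . "The body is not inspected.\r\n";

print_r(findNonAsciiHeaders($raw)); // lists Subject and Thread-Topic
```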
The problem with the Zend\Mail\Message::fromString() method is that it doesn't allow you to specify the encoding of the data retrieved from the raw message. It simply takes the data and puts it into internal storage inside the object, and the data itself is parsed on a lazy-loading basis.
So once you try to retrieve the header, the getter throws an exception saying that the data is in the wrong format. What I did (and yes, here comes the solution, finally :) ) was to create an inheriting class and change the way the Zend\Mail\Message::fromString() method works. Please find the code below:
use Zend\Mail\Header\Exception\InvalidArgumentException;
use Zend\Mail\Header\GenericHeader;
use Zend\Mail\Message as ZendMessage;

class Message extends ZendMessage // class name is illustrative
{
    // No scalar type hints, so the signature stays compatible with the parent's fromString($rawMessage)
    public static function fromString($rawMessage, $encoding = 'UTF-8')
    {
        $message = parent::fromString($rawMessage); // does not fail yet: headers are parsed lazily
        $headers = $message->getHeaders();

        foreach ($headers->toArray() as $headerName => $headerValue) {
            try {
                // Accessing the header forces the lazy parsing
                $headers->get($headerName);
            } catch (InvalidArgumentException $e) { // caught only if the header is wrongly structured
                $headers->removeHeader($headerName);
                // Repeatable headers may come back from toArray() as an array of values
                foreach ((array) $headerValue as $value) {
                    $header = new GenericHeader();
                    $header->setEncoding($encoding);
                    $header->setFieldName($headerName);
                    $header->setFieldValue($value);
                    $headers->addHeader($header);
                }
            }
        }

        $headers->setEncoding($encoding); // all headers are now encoded in $encoding
        return $message;
    }
}
So what you can see here is that I am introducing an additional $encoding parameter for this method, which allows you to specify the encoding for the message. The parameter has a default value of UTF-8, mainly so that the method stays backward compatible (even though PHP may emit a notice claiming it is not :) ).
Then it calls the standard parent method to create the object; as I described before, it will not fail yet :)
Then comes the code that tries to "cleanse" the headers by accessing each of them and catching the InvalidArgumentException if it is thrown. If this exception is raised (and caught by the try...catch... block), the header is removed and recreated, but with the proper $encoding.
After all headers have been checked and, where necessary, replaced by correctly encoded ones, I additionally call the setEncoding() method on the whole header collection. This covers the headers that were not "replaced" in the foreach() { try...catch... } block, so all headers end up in the same encoding.
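For context on what an encoding such as UTF-8 means for a header on the wire: a header value with non-ASCII characters is typically transmitted as an RFC 2047 "encoded-word". The sketch below uses PHP's iconv extension (not ZF2) purely to illustrate the idea; ZF2's own header encoder may pick a different scheme or fold lines differently, so treat the exact output as illustrative.

```php
<?php
// Illustration only: MIME-encode a Polish header value as an RFC 2047
// encoded-word using PHP's iconv extension (ZF2 uses its own encoder).
$value = 'Zażółć';

$encoded = iconv_mime_encode('Subject', $value, [
    'scheme'         => 'Q',     // quoted-printable style encoded-word
    'input-charset'  => 'UTF-8',
    'output-charset' => 'UTF-8',
]);

// Pure ASCII, of the form "Subject: =?UTF-8?Q?...?="
echo $encoded, "\n";

// Decoding restores the original UTF-8 text
$decoded = iconv_mime_decode($encoded, 0, 'UTF-8');
echo $decoded, "\n"; // Subject: Zażółć
```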
You might ask: why not simply call setEncoding() on all existing headers right away? Well, I tried, and it throws the exception as well :) My assumption is that the method accesses every header, and accessing a header triggers the lazy loading, which throws the exception.
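The reasoning above can be reproduced with a small toy model. The classes LazyHeaders and MalformedHeaderException below are hypothetical and only mimic the lazy-loading behaviour described in this article; they are not ZF2 code. The sketch shows that a bulk setEncoding()-style call fails as soon as it touches a malformed header, while the probe-and-replace loop used in fromString() lets the bulk call succeed afterwards.

```php
<?php
// Toy model (hypothetical classes, NOT ZF2 code): raw header values are
// stored untouched and parsed only on first access.

class MalformedHeaderException extends InvalidArgumentException
{
}

class LazyHeaders
{
    /** @var array raw header values keyed by name */
    private $raw;
    /** @var array parsed headers keyed by name, filled lazily */
    private $parsed = [];

    public function __construct(array $rawHeaders)
    {
        $this->raw = $rawHeaders; // no validation here, so construction never fails
    }

    public function toArray(): array
    {
        return $this->raw; // raw values, no parsing involved
    }

    // Parses on first access and, like a strict header class, rejects non-ASCII bytes
    public function get(string $name): array
    {
        if (!isset($this->parsed[$name])) {
            if (preg_match('/[\x80-\xFF]/', $this->raw[$name])) {
                throw new MalformedHeaderException("Malformed header: $name");
            }
            $this->parsed[$name] = ['value' => $this->raw[$name], 'encoding' => 'ASCII'];
        }
        return $this->parsed[$name];
    }

    // Stores a pre-built header with an explicit encoding, bypassing the strict parser
    public function replace(string $name, string $value, string $encoding)
    {
        $this->raw[$name] = $value;
        $this->parsed[$name] = ['value' => $value, 'encoding' => $encoding];
    }

    // Bulk operation: touches every header, so it triggers the lazy parsing
    public function setEncoding(string $encoding)
    {
        foreach (array_keys($this->raw) as $name) {
            $this->get($name); // may throw for a malformed header
            $this->parsed[$name]['encoding'] = $encoding;
        }
    }
}

$headers = new LazyHeaders([
    'From'    => 'jan@example.com',
    'Subject' => 'Zażółć gęślą jaźń',
]);

// 1. Calling the bulk operation directly fails on the Subject header
try {
    $headers->setEncoding('UTF-8');
    $bulkFailed = false;
} catch (MalformedHeaderException $e) {
    $bulkFailed = true;
}

// 2. Probe each header and replace only the ones that fail...
foreach ($headers->toArray() as $name => $value) {
    try {
        $headers->get($name);
    } catch (MalformedHeaderException $e) {
        $headers->replace($name, $value, 'UTF-8');
    }
}

// 3. ...after which the bulk operation succeeds for all headers
$headers->setEncoding('UTF-8');
echo $headers->get('Subject')['encoding'], "\n"; // UTF-8
```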
And that's it. Please feel free to share your thoughts on this article, whether you find it useful or not, or if something can be improved :) If the feedback is positive, I will try to publish more such detailed articles about problems I have encountered and their solutions :)