CAPTCHA the wrong way - Trenitalia late trains reimbursement form

Have you ever encountered a CAPTCHA? It is a widespread method for keeping bots away from online services that are meant, at least in theory, for human beings only.


CAPTCHAs are usually very effective at blocking bots that try to exploit online services. However, the standard tests are increasingly weak against OCR techniques. I recently wrote a post about it.


In this post, I want to present a real case. I work for a consultancy company, and this involves a great deal of travelling. I love to travel by train whenever possible. This means of travel has great upsides: no driving stress, additional working time (or, seldom, leisure time - my Kindle is always in my travel bag), great service on board and at the main stations (I love the frequent traveller points) and some freedom to change your departure time. Sometimes, though, there is also a downside - trains may be late.

I hate forms, but the reimbursement has to go through one. Be advised: the form cannot be filled in on the same day, you need to do it one day later. What's in the form? Well, your contact information, the late train ID, your ticket reservation ID... and yes, submitting it means solving a CAPTCHA.

As I said, I hate forms, especially those that can be filled in automatically. As part of my job I have developed scripts for several RPA tasks, using Java, Python and a great many libraries, both in browsers and on the Office platform. When I need to evaluate the feasibility of an RPA project that also relies on third-party services, the first question (after the check with the lawyer - of course) is: are there any CAPTCHAs?

So, this case is no different. There is a CAPTCHA, but at first sight it seemed very easy to defeat with OCR. I downloaded the CAPTCHA image just to check whether my simplest script, which uses the Tesseract library, could defeat it. Yes, it could!

Well, then - potentially, from the technical side - I could have written a simple script that, given the right information as input, automatically fills in the online reimbursement request form! Better still, I could automatically fetch the information (i.e. train ID, departure station, arrival station) from my Gmail account, automatically fetch the train status from the Trenitalia website and, if the train was late, trigger the RPA and fill in the form...

Sure, I was going a bit too fast... so fast that it took me a while to realise that the PNG file with the CAPTCHA had the same name as the CAPTCHA solution. This is really bizarre: how could you possibly embed the solution of the CAPTCHA in its URI, in the webpage source code... exactly where every automation script looks in order to fill in textboxes and submit forms?

The answer is just one...CAPTCHA the wrong way!
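
To show how little effort such a leak demands, here is a minimal Python sketch that uses only the standard library. It is not the script I ran and it does not reproduce the real page: the HTML fragment, the image path, the `captcha` marker and the code `X7K2P` are all made up for illustration. It simply takes the first image whose URL looks like a CAPTCHA and reads the answer from the file name, with no OCR involved.

```python
from html.parser import HTMLParser
from pathlib import PurePosixPath
from urllib.parse import urlparse


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag found in a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def captcha_answer_from_html(html, marker="captcha"):
    """Return the file-name stem of the first image whose URL path contains
    `marker`, or None if no such image exists.

    `marker` is a hypothetical hint for spotting the CAPTCHA image;
    a real page may need a different rule.
    """
    collector = ImgSrcCollector()
    collector.feed(html)
    for src in collector.sources:
        path = urlparse(src).path
        if marker in path.lower():
            # The leak: the file name (minus extension) is the solution.
            return PurePosixPath(path).stem
    return None


# Hypothetical page fragment: path and code are invented for this example.
sample_html = (
    '<form><img src="/img/captcha/X7K2P.png" alt="code">'
    '<input name="answer"></form>'
)
print(captcha_answer_from_html(sample_html))
```

When the image name is the answer, the CAPTCHA protects nothing: any script that can read the page source already holds the solution. Serving the image under a random, unrelated name and keeping the answer on the server side only would close this particular hole.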

Please do not get me wrong. I am a frequent traveller, and in most cases the overall service (app, ticketing, stations, trains, personnel) is great. But as a digital consultant I find this use case worth mentioning -> take it as free consultancy for an improvement!
