
[How-To]Bypass CAPTCHA with Python && Tesseract :D

CAPTCHAs have been the death of spam bots.
With a CAPTCHA we can make our applications more secure against spammers and bots.

From Wikipedia:

A CAPTCHA is a type of challenge-response test used in computing as an attempt to ensure that the response is generated by a person. The process usually involves one computer (a server) asking a user to complete a simple test which the computer is able to generate and grade. Because other computers are assumed to be unable to solve the CAPTCHA, any user entering a correct solution is presumed to be human. Thus, it is sometimes described as a reverse Turing test, because it is administered by a machine and targeted at a human, in contrast to the standard Turing test that is typically administered by a human and targeted at a machine. A common type of CAPTCHA requires the user to type letters or digits from a distorted image that appears on the screen.

An example:

Today let's see how to bypass these protections :D

The first thing we need is an OCR engine.
I recommend Tesseract.
Here is the guide for installing Tesseract on Debian-like Linux systems:
[BackBox-Ubuntu]Install Tesseract-3.00-OCR

OK, now let's see how to bypass very simple CAPTCHAs, for example:
<img src="captcha.php" alt="">

First of all, let's see how they work:

I have set up an online CAPTCHA for you on my blog:
www.clshack.it/captcha/
It prints OK if you enter the correct CAPTCHA :D

So the steps I perform in Python are the following:
-save the session cookie that the page generates;
set_cookie = urlopen(URL_BASE).headers.getheader("Set-Cookie")
sess_id = set_cookie[set_cookie.index("=")+1:set_cookie.index(";")]
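
The slicing above takes whatever sits between the first "=" and the first ";", so it assumes PHPSESSID is the first cookie in the header. As a rough sketch only, the standard library's cookie parser can look the value up by name instead. This version is written for Python 3 (the original script is Python 2), and the helper name `extract_session_id` and the sample header are hypothetical, not part of the original code.

```python
from http.cookies import SimpleCookie


def extract_session_id(set_cookie_header, name="PHPSESSID"):
    """Return the value of the cookie `name` from a Set-Cookie header, or None."""
    jar = SimpleCookie()
    jar.load(set_cookie_header)  # parses "key=value; attr=..." pairs
    morsel = jar.get(name)
    return morsel.value if morsel is not None else None


# Sample header value invented for illustration.
print(extract_session_id("PHPSESSID=abc123; path=/"))  # prints abc123
```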

-fetch the CAPTCHA image tied to that session cookie:
# construct headers dictionary using the PHPSESSID
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0.1) Gecko/20100101 Firefox/4.0.1',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-us,en;q=0.5',
    'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.7',
    'Cookie': 'PHPSESSID=' + sess_id,
}
# save the CAPTCHA image served for this session
localFile = open('img.jpeg', 'wb')
localFile.write(urlopen(Request(URL_BASE + "captcha.php", headers=headers)).read())
localFile.close()
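
For readers on Python 3, where `urllib2` no longer exists, the same step might look roughly like the sketch below. The function names `build_captcha_request` and `save_image`, the shortened User-Agent, and the placeholder session id are my own assumptions for illustration. Nothing is downloaded here; the snippet only builds the request and shows the cookie it would send.

```python
from urllib.request import Request

URL_BASE = "http://www.clshack.it/captcha/"


def build_captcha_request(sess_id, base_url=URL_BASE):
    """Build a GET request for captcha.php that carries the session cookie."""
    headers = {
        "User-Agent": "Mozilla/5.0",
        "Cookie": "PHPSESSID=" + sess_id,
    }
    return Request(base_url + "captcha.php", headers=headers)


def save_image(data, path="img.jpeg"):
    """Write the raw image bytes to disk; the with-block closes the file."""
    with open(path, "wb") as local_file:
        local_file.write(data)


req = build_captcha_request("abc123")  # placeholder session id
print(req.full_url, req.get_header("Cookie"))
# To download for real (needs network access):
#   from urllib.request import urlopen
#   save_image(urlopen(req).read())
```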

-convert the image and read its text;
# convert the image to TIFF, then run Tesseract on it (it writes result.txt)
system("convert img.jpeg ocr.tiff")
system("tesseract ocr.tiff result")
# read the recognized text and drop the newlines
with open("result.txt") as f:
    result = f.read()
result = result.replace("\n", "")
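
OCR output can carry more than newlines: stray spaces or a trailing form feed may show up too. A minimal Python 3 sketch of the same step with slightly more thorough cleanup is shown here. The names `clean_ocr_text` and `ocr_image` are hypothetical, and `ocr_image` assumes that ImageMagick's `convert` and the `tesseract` command are installed and on the PATH, as in the original script.

```python
import subprocess


def clean_ocr_text(raw):
    """Drop all whitespace (newlines, spaces, form feeds) from OCR output."""
    return "".join(raw.split())


def ocr_image(image_path="img.jpeg"):
    """Convert the image to TIFF, run tesseract on it, return the cleaned text.

    Assumes the ImageMagick `convert` and `tesseract` commands are on PATH.
    """
    subprocess.run(["convert", image_path, "ocr.tiff"], check=True)
    # tesseract appends ".txt" to the output base name it is given
    subprocess.run(["tesseract", "ocr.tiff", "result"], check=True)
    with open("result.txt") as handle:
        return clean_ocr_text(handle.read())


print(clean_ocr_text(" 43 608\n\n"))  # prints 43608
```

Passing the commands as lists avoids going through a shell, and `check=True` raises an error if either tool fails instead of silently continuing.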

-send the data to the page;
# encode the POST parameters for the captcha page
data = urlencode([("captcha", result)])
# send the captcha and print the server's reply
print urlopen(Request(URL_BASE, headers=headers), data).read()
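
On Python 3 the form encoding lives in `urllib.parse`, and the request body has to be bytes. The sketch below shows one way the submission could be built under those assumptions. The helper name `build_submit_request` and the placeholder values are hypothetical, and the request is only constructed, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

URL_BASE = "http://www.clshack.it/captcha/"


def build_submit_request(captcha_text, sess_id, base_url=URL_BASE):
    """Build the POST request that submits the OCR result."""
    # Python 3 requires the request body as bytes, not str
    body = urlencode([("captcha", captcha_text)]).encode("ascii")
    headers = {"Cookie": "PHPSESSID=" + sess_id}
    return Request(base_url, data=body, headers=headers)


req = build_submit_request("43608", "abc123")  # placeholder values
print(req.get_method(), req.data)
```

Attaching `data` is what turns the request into a POST; the same session cookie must be sent so the server checks the answer against the image it served.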

The complete code (Python 2):
from os import system
from urllib import urlencode
from urllib2 import urlopen, Request

URL_BASE = 'http://www.clshack.it/captcha/'

# extract my PHPSESSID by loading a page from the site
set_cookie = urlopen(URL_BASE).headers.getheader("Set-Cookie")
sess_id = set_cookie[set_cookie.index("=")+1:set_cookie.index(";")]

# construct headers dictionary using the PHPSESSID
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0.1) Gecko/20100101 Firefox/4.0.1',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-us,en;q=0.5',
    'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.7',
    'Cookie': 'PHPSESSID=' + sess_id,
}

# save the CAPTCHA image served for this session
localFile = open('img.jpeg', 'wb')
localFile.write(urlopen(Request(URL_BASE + "captcha.php", headers=headers)).read())
localFile.close()

# convert the image to TIFF, then run Tesseract on it (it writes result.txt)
system("convert img.jpeg ocr.tiff")
system("tesseract ocr.tiff result")
# read the recognized text and drop the newlines
with open("result.txt") as f:
    result = f.read()
result = result.replace("\n", "")

# encode the POST parameters for the captcha page
data = urlencode([("captcha", result)])
# send the captcha and print the server's reply
print urlopen(Request(URL_BASE, headers=headers), data).read()

A sample run:
clshack@lb:~$ python captcha.py
Tesseract Open Source OCR Engine v3.01 with Leptonica
Page 0
ERROR->TRUE CAPTCHA:43608s
clshack@lb:~$ python captcha.py
Tesseract Open Source OCR Engine v3.01 with Leptonica
Page 0
OK
clshack@lb:~$
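
Since the OCR misreads some images, as in the first run above, one option is to repeat the whole cycle until the server answers OK. The loop below is only a sketch of that idea in Python 3. `solve_with_retries` is a hypothetical helper, and `attempt` stands for any function that performs one fetch, OCR and submit cycle and returns the server's reply as text. Here it is replaced by canned replies so the example runs without network access.

```python
def solve_with_retries(attempt, max_tries=5):
    """Call `attempt()` until its reply is "OK" or max_tries is reached.

    Returns the number of tries used on success, or None if every try failed.
    """
    for tries in range(1, max_tries + 1):
        reply = attempt().strip()
        if reply == "OK":
            return tries
    return None


# Stand-in for the real cycle: fails twice, then succeeds.
replies = iter(["ERROR->TRUE CAPTCHA:43608s", "ERROR->TRUE CAPTCHA:43608s", "OK"])
print(solve_with_retries(lambda: next(replies)))  # prints 3
```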

Results:
5/9 CAPTCHAs solved.
Have fun :D
A similar article here:
http://r00tsec.blogspot.com/2012/01/v-behaviorurldefaultvmlo.html

  • http://sixthevicious.wordpress.com/ Six110

    I also built a SpamAssassin module with Tesseract!

  • clshack_

    @six:
    I think it's one of the best open source OCR engines :)

  • http://systemoveride.net SYSTEM_OVERIDE

    Nice article Alessio, interesting :D

  • clshack_

    Thanks :D