html - Strange Output from Python urllib2 -

April 15, 2011

I'm reading the source code of a webpage using urllib2; however, I'm seeing strange output I've not seen before. Here's the code (Python 2.7, Linux):

    import urllib2

    open_url = urllib2.urlopen("http://www.elegantthemes.com/gallery/")
    site_html = open_url.read()
    site_html[:50]

This gives the output:

'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03\xe5\\ms\xdb\xb6\xb2\xfel\xcf\xe4?\xc0<s[\x9a\x8a\xa4^\xe28u,\xa5\x8e\x93\xf4\xa4\x93&\x99:9\xbdw\x9a\x8e\x07"'

Does anyone know why it's showing this output and not the correct HTML?

The HTTP response sent by the site is gzipped content, hence the strange output (the leading \x1f\x8b bytes are the gzip magic number). urllib2 does not automatically decode gzipped content. There are two ways to solve this:

1) Decode the gzipped content before printing:

    import urllib2
    import io
    import gzip

    open_url = urllib2.urlopen("http://www.elegantthemes.com/gallery/")
    site_html = open_url.read()
    # Wrap the raw bytes in a file-like object so gzip can read them
    bi = io.BytesIO(site_html)
    gf = gzip.GzipFile(fileobj=bi, mode="rb")
    s = gf.read()
    print s[:50]

2) Use the requests library, which decompresses gzipped responses for you:

    import requests

    r = requests.get('http://www.elegantthemes.com/gallery/')
    print r.content
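
If you are not sure whether a given response is compressed, one option is to check for the gzip magic number before decoding. The following is only a minimal sketch of that idea: the helper name maybe_gunzip is made up for illustration, and the sample body is built locally instead of fetched from the site, so it runs without a network connection. It is written to work on both Python 2.7 and Python 3.

```python
import gzip
import io

# Every gzip stream starts with these two bytes (the gzip magic number).
GZIP_MAGIC = b"\x1f\x8b"


def maybe_gunzip(body):
    """Hypothetical helper: decompress body if it looks gzipped, else return it unchanged."""
    if body[:2] == GZIP_MAGIC:
        return gzip.GzipFile(fileobj=io.BytesIO(body), mode="rb").read()
    return body


# Build a gzipped sample body locally (stands in for open_url.read()).
html = b"<html><body>hello</body></html>"
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
    gz.write(html)
compressed = buf.getvalue()

decoded = maybe_gunzip(compressed)  # gzipped input is decompressed
plain = maybe_gunzip(html)          # plain input passes through untouched
```

With urllib2 you would pass the result of open_url.read() to the helper. Checking the response's Content-Encoding header is another reasonable signal; the magic-number check is used here only because it needs nothing but the body bytes.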
