So I have a script that pulls information from an event webpage. The URL is this: http://everguide.com.au/melbourne/event/2012-jul-14/colour/
This PHP script calls a Python script (it's part of a for loop):
${"tmp" . $i} = utf8_encode (exec("python myscrape.py ${"eu" . $i}"));
It passes a URL. The Python script is this:
# -*- coding: utf-8 -*-
import sys
URL = sys.argv[1]
#$URL = 'http://everguide.com.au/melbourne/event/2012-jul-14/colour/'
import urllib2
req = urllib2.Request(URL)
response = urllib2.urlopen(req)
html = response.read()
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup(html.decode('utf-8'))
soup.prettify()
import re
for node in soup.findAll(itemprop="name"):
    n = ''.join(node.findAll(text=True))
for node in soup.findAll(itemprop="url"):
    v = ''.join(node.findAll(text=True))
for node in soup.findAll("div", { "class" : "time" }):
    d = ''.join(node.findAll(text=True))
for node in soup.findAll("a", { "id" : "ctl00_holderBody_ctl00_lnkCat" }):
    c = ''.join(node.findAll(text=True))
vu = v
vu.encode('utf-8', 'xmlcharrefreplace')
re.escape(vu)
print n,"|", d,"|", vu,"|", c
This works really well, but the output only gets as far as the pipe before vu; it can't go past that!
UTF-8 encoding is set on all files, HTML and PHP.
When there is a special character in the v variable, it breaks and stops. If there are no special characters, it works perfectly.
Expected output is:
Colour | 14 July @ 7:30PM | 1000 £ Bend | Clubs & Parties
This output can be seen when running the script directly on the server (with the same python command), but when it goes through PHP I can't get the venue string back!
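A minimal sketch of one thing that might be worth trying, assuming the cause is that Python 2 has no output encoding to fall back on when stdout is a pipe (as it is under PHP's exec) rather than a terminal, so printing a unicode string with a non-ASCII character fails. The helper name format_row and the sample values are hypothetical; the idea is just to encode to UTF-8 explicitly and write bytes instead of relying on print:

```python
# -*- coding: utf-8 -*-
import sys


def format_row(fields):
    """Join unicode fields with ' | ' and return UTF-8 encoded bytes."""
    return u" | ".join(fields).encode("utf-8")


def main():
    # Hypothetical values standing in for n, d, vu and c from the scraper.
    row = format_row([u"Colour", u"14 July @ 7:30PM",
                      u"1000 \u00a3 Bend", u"Clubs & Parties"])
    # Write raw bytes: sys.stdout.buffer on Python 3, sys.stdout on Python 2.
    out = getattr(sys.stdout, "buffer", sys.stdout)
    out.write(row + b"\n")


if __name__ == "__main__":
    main()
```

If the script emits UTF-8 bytes like this, the utf8_encode call on the PHP side would presumably need to be dropped, since utf8_encode treats its input as ISO-8859-1 and would encode the bytes a second time.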
Please help
Rick