Personal Feed Framework (PFF)
From UrbanWiki
Tutorial on PERFF: I created this tutorial for JSON clients and servers a couple of days ago for new members of our group. I embedded everything into a doc file: http://wiki.urban.cens.ucla.edu/images/e/ef/JSON_Brief_Tutorial.doc
- below are some notes that should help people understand the perff code
Contents |
[edit] WSGI
- WSGI (Web Server Gateway Interface) is a Python standard for communication between web servers and applications
- spec: http://www.python.org/dev/peps/pep-0333/
- a simple echo server code sample:
# wsgiref is built-in
from wsgiref import simple_server
# a callable (function in this case) to handle http requests
# environ is a dictionary that contains details about the request
# start_response is called to set http response status and headers
# the function returns the actual response as an iterable (list)
# the echo_app assumes some text has been posted to the server and simply
# echo them back
def echo_app(environ, start_response):
content_length = environ["CONTENT_LENGTH"]
# environ['wsgi.input'] is input stream to the raw post data
# MUST supply the amount to read indicated by content_length
content = environ['wsgi.input'].read(int(content_length))
headers = [('content-type', 'text/plain'),
('content-length', str(len(content))]
start_response('200 OK', headers)
return [content]
# server will listen on port 8000, using the echo_app callback to handle incoming requests
# each request spawns a new thread, done by the wsgi implementation
server = simple_server.make_server('', 8000, echo_app)
server.serve_forever()
- Instead of functions, the callback can be any callable object in python
- sample echo_app using an echo class:
from wsgiref import simple_server
class Echo(object):
def __init__(self, prepend):
self.prepend = prepend
# implement __call__ to make this class callable
def __call__(self, environ, start_response):
content_length = environ["CONTENT_LENGTH"]
content = environ['wsgi.input'].read(int(content_length))
body = self.prepend + ' ' + content
headers = [('content-type', 'text/plain'),
('content-length', str(len(body))]
start_response('200 OK', headers)
return [body]
# instantiate an Echo object and pass as the callable
# now requests to the server will receive back their posted
# message with "Echoing back" prepended
server = simple_server.make_server('', 8000, Echo("Echoing back"))
server.serve_forever()
- You can create chains of wsgi middleware to process each request
- example http basic auth middleware:
import base64
from wsgiref import simple_server
class HTTPBasicAuth(object):
# this class is a wsgi middleware that does http basic authentication
# using a supplied auth function
# calls the actual app to handle requests if user has been authenticated
def __init__(self, app, auth_func):
self.app = app
self.check_password = auth_func
# if not authorized, send back a 401 response
# else pass the request along
def __call__(self, environ, start_response):
if not self.authorized(environ):
return self.auth_required(environ, start_response)
return self.app(environ, start_response)
# checks if user is authorized
# environ['PATH_INFO'] contains the url after domain name
# e.g. www.yahoo.com/check --> environ['PATH_INFO'] = '/check'
def authorized(self, environ):
if not self.path_needs_auth(environ['PATH_INFO']):
return True
auth_header = environ.get('HTTP_AUTHORIZATION')
if not auth_header:
return False
auth_type, encoded_info = auth_header.split(None, 1)
assert auth_type.lower() == 'basic'
decoded_info = base64.b64decode(encoded_info)
username, password = decoded_info.split(':', 1)
if not self.check_password(username, password):
return False
# middleware can modify environ before passing requests on
environ['REMOTE_USER'] = username
del environ['HTTP_AUTHORIZATION']
return True
def path_needs_auth(self, path):
#do url checks here
if path == "/restricted":
return True
return False
def auth_required(self, environ, start_response):
headers = [('Content-type', 'text/html'),
('WWW-Authenticate', 'Basic realm="cens"')]
start_response('401 Authentication Required', headers)
return ["""
<html>
<head><title>Authentication Required</title></head>
<body>
<h1>Authentication Required</h1>
Bad username or password
</body>
</html>"""]
# the actual request handler sends back a greeting and the url path
def app(environ, start_response):
user = environ.get('REMOTE_USER')
if not user:
user = "random person"
content = "Welcome %s! Path is %s." % (user, environ['PATH_INFO'])
start_response('200 OK', [('content-type', 'text/plain'),
('content-length', str(len(content)))])
return [content]
# this naive password checker only accepts an user with username and password of 'cens'
# a more realistic checker will perhaps open a database connection and check user info
def check_password(username, password):
if username != 'cens' or password != 'cens':
return False
return True
server = simple_server.make_server('', 8080, HTTPBasicAuth(app, check_password))
server.serve_forever()
- there can be any number of middleware chains like this, each processing a different part of a request
- notable/useful environ variables:
- REQUEST_METHOD - GET or POST
- PATH_INFO - The remainder of the request URL's "path", designating the virtual "location" of the request's target within the application. This may be an empty string, if the request URL targets the application root and does not have a trailing slash
- QUERY_STRING - The portion of the request URL that follows the "?", if any. May be empty or absent.
- CONTENT_LENGTH - The contents of any Content-Length fields in the HTTP request. May be empty or absent
- CONTENT_TYPE - The contents of any Content-Type fields in the HTTP request. May be empty or absent.
- HTTP_ variables - Variables corresponding to the client-supplied HTTP request headers (i.e., variables whose names begin with "HTTP_"). The presence or absence of these variables should correspond with the presence or absence of the appropriate HTTP header in the request
- url reconstruction method - reconstruct full original url requested (contributed by Ian Bicking)
from urllib import quote
url = environ['wsgi.url_scheme']+'://'
if environ.get('HTTP_HOST'):
url += environ['HTTP_HOST']
else:
url += environ['SERVER_NAME']
if environ['wsgi.url_scheme'] == 'https':
if environ['SERVER_PORT'] != '443':
url += ':' + environ['SERVER_PORT']
else:
if environ['SERVER_PORT'] != '80':
url += ':' + environ['SERVER_PORT']
url += quote(environ.get('SCRIPT_NAME',''))
url += quote(environ.get('PATH_INFO',''))
if environ.get('QUERY_STRING'):
url += '?' + environ['QUERY_STRING']
- python paste is a wsgi implementation/server/library that provides useful middleware and other fancy stuff
- http://pythonpaste.org/do-it-yourself-framework.html go here for more examples
[edit] JSON
- Javascript Object Notation - a lightweight data interchange format
- http://www.json.org/ this site covers it pretty well and provides the full grammar
- PERFF uses this format extensibly
- JSON strings can be mapped directly to python dictionaries and lists
- http://pypi.python.org/pypi/simplejson use simplejson library to convert between python objects and JSON strings
import simplejson
jsonString = """ {"a":1, "b":2, "c":"Cat", "list":[1, 23, "blah", [34, "z"], {"dict":"nested"}, 70] } """
pyDict = simplejson.loads(jsonString)
# pyDict is now the dictionary that corresponds to jsonString, nested lists and all
# note that keys and string elements will be converted to python unicode strings
pyDict['c']
# this gives u'Cat'
print pyDict['c']
# this prints Cat
print simplejson.dumps(pyDict)
# this prints '{"a":1, "b":2, "c":"Cat", "list":[1, 23, "blah", [34, "z"], {"dict":"nested"}, 70] }'
- sample client to post json data to a server:
import httplib, simplejson
json = {"startProb":None,"trace":[{"updated": "2007-11-21 19:29:51", "created":"2007-11-21 19:29:51", "sbid": "1142",
"phone_time": "2007-10-09 16:30:11", "lon":"-118.246063333", "collection":"6b84ce7c-9867-11dc-b11c-001c42ec2666",
"gpsSpeed": "0", "freeway": 0, "havSpeed": "0", "lat": "34.1483683333", "id":"20e37962-9868-11dc-b11c-001c42ec2666",
"mapActivityClass": "still"},{"updated":"2007-11-21 19:29:51", "created":"2007-11-21 19:29:51", "sbid":"1143","phone_time":
"2007-10-09 16:30:16", "lon": "-118.246063333", "collection":"6b84ce7c-9867-11dc-b11c-001c42ec2666", "gpsSpeed": "0",
"freeway": 0,"havSpeed": "0", "lat": "34.1483683333", "id": "2125907c-9868-11dc-b11c-001c42ec2666", "mapActivityClass":
"still"}]}
jsonString = simplejson.dumps(json)
conn = httplib.HTTPConnection("localhost:8000")
headers = {"Content-type": "text/plain", "Content-length":str(len(jsonString)) }
conn.request("POST", "/activity", jsonString, headers)
response = conn.getresponse()
print response.status, response.reason
print response.read()
conn.close()
[edit] Connecting to PostgreSQL database from Python
- psycopg2 - http://www.initd.org/tracker/psycopg/wiki/PsycopgTwo
- python dbapi2 spec - http://www.python.org/dev/peps/pep-0249/
import psycopg2
try:
conn = psycopg2.connect("dbname='template1' user='dbuser' host='localhost' password='dbpass'");
except:
print "I am unable to connect to the database"
# use cursor objects to execute queries and get back results
# psycopg2 is thread safe level 2, which means threads can share the module and connection, but not cursors
cur = conn.cursor()
# execute a sql statement
cur.execute("""SELECT a, b from table""")
# cur.rowcount specifies the number of rows that the last executeXXX() produced
if cur.rowcount > 0:
# fetch one row, or next row as a sequence (e.g. tuple or list)
row = cur.fetchone()
a1 = row[0]
b1 = row[1]
# fetch all remaining rows as list of lists
rows = cur.fetchall()
a2 = rows[0][0]
b2 = rows[0][1]
cur.close()
conn.close()
[edit] PostGIS sample queries
- PostGIS documentation: http://postgis.refractions.net/documentation/
- PostGIS quick guide: http://www.bostongis.com/postgis_quickguide.bqg
- creating a table with a geometry column by normally creating a table, and then use AddGeometryColumn:
CREATE TABLE sample(
sbid INTEGER PRIMARY KEY,
imei_phone VARCHAR(20),
lat NUMERIC,
lon NUMERIC,
phone_timestamp TIMESTAMP,
exposure FLOAT,
);
-- SELECT AddGeometryColumn([schema_name], [table_name], [column_name_to_add], [SRID], [geometry_type], [dimension])
SELECT AddGeometryColumn('test', 'sample', 'the_geom', 4326, 'POINT', 2)
CREATE TABLE exposures(
exposure FLOAT
);
SELECT AddGeometryColumn('test', 'exposures', 'the_geom', 4326, 'POLYGON', 2)
-- AddGeometryColumn ensures that metadata info about the geometry column added (e.g. srid, dimension, type, table) will be
inserted into the postgis 'geometry_columns' table (if using current schema, schema_name can be omitted)
-- likewise, use DropGeometryColumn or DropGeometryTable when deleting
-- srid of 4326 is usually right for gps data
- Creating index on geometry columns:
-- CREATE INDEX [indexname] ON [tablename] USING GIST ( [geometryfield] ); CREATE INDEX polygon_index ON exposures USING GIST (the_geom ); -- Collect statistics after making index: VACUUM ANALYZE [table_name] [column_name]
- Sample SQL queries:
- distance between two points:
SELECT distance_sphere(GeomFromText('POINT(lon1 lat1)', 4326), GeomFromText('POINT(lon2 lat2)', 4326));
-- lon1, lat1, lon2, lat2 are longitude latitude values, return value is in meters
-- GeomFromText (or PointFromText) takes a Well Known Text representation of a geometry and returns the geometry specified
- polygon that contains a point:
SELECT AsText(the_geom), attributes
FROM polygon_table
WHERE the_geom && GeomFromText('POINT(lon lat)', 4326) AND Contains(the_geom, GeomFromText('POINT(lon lat)', 4326));
-- AsText function returns Well Known Text representation of geometry (here the geometry is from the geomtry column named
'the_geom' in table 'polygon_table'
-- the && operator makes use of geometry index if one was created on the_geom, uses bounding box tests to filter out
- updating points with exposure info:
UPDATE TABLE sample
SET exposure = (SELECT exposure
FROM exposures
WHERE exposures.the_geom && sample.the_geom AND
Contains(exposures.the_geom, sample.the_geom))
WHERE sample.exposure IS NULL;
-- here we're setting the exposure info in the sample table according to which polygon in the exposures table contains the
lat lon point
- closest roads/points from a point (e.g. within 100 meters):
SELECT s.road, s.distance FROM (SELECT road, distance(transform(GeomFromText('POINT(lon lat)', 4326), 26911),
transform(the_geom, 26911)) AS distance
FROM a_streets_table
WHERE expand(transform(GeomFromText('POINT(lon lat)', 4326), 26911), 100) &&
transform(the_geom, 26911)) AS s
WHERE s.distance < 100 ORDER BY s.distance;
-- the transform function transforms a geometry from one projection to another so we can compare distance in meters using
srid 26911
-- use function ST_DWithin if using PostGIS 1.3.1+
SELECT road, distance(transform(GeomFromText('POINT(lon lat)', 4326), 26911),
transform(the_geom, 26911)) as distance
FROM streets_table
WHERE ST_DWithin(transform(GeomFromText('POINT(lon lat)', 4326), 26911), transform(the_geom, 26911), 100)
- other potentially useful postgis functions:
- line_locate_point and line_interpolate_point - can be used to map a point to the closest point to a nearby road
- centroid - returns the center of a polygon or multipolygon
- intersects - tests if two geometries spatially intersect
- see postgis documentation chapter 6 for full list of functions

