your daily cup of tea™

Categorizing data feeds at CityBikes

Heads up for a boring, but needed, post on categorization. To date, CityBikes (API) aggregates 176 data sources from different bike sharing systems. One of the main efforts of this project is keeping everything in order, so that we avoid duplicating the implementation of feeds that share the same (or at least similar) syntax. To understand how to add a new city to the project, it’s necessary to know how feeds are categorized inside CityBikes. Let’s do some analysis so we can name the feeds for what they are.

The server side of the project has three components: pybikes, gyro and api. pybikes is usually the place to hack around when adding a new city. It is a Python module that abstracts all the scraping and modeling of bike sharing feeds or, as I call it, bike sharing at your fingertips!

Pybikes currently has 12 modules, each named after the system’s generic codename or company name: PBSC’s bixi (New York Capital BikeShare, Montreal Bixi, London Barclays Cycle Hire, …), ClearChannel’s smartbike (Barcelona Bicing, Mexico EcoBici, Zaragoza bizi), JCDecaux’s cyclocity (Paris Velib, Brisbane CityCycle, Valencia Valenbisi, …), etc. To feed these modules without mixing data and implementation, we use what we call data files, which hold a bunch of metadata about each system (city, coordinates, info).
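To make the split between data and implementation concrete, here is a minimal sketch of the idea. Everything in it is an assumption made for illustration: the JSON layout, the `ExampleSystem` class, the `load_instances` helper and the example.com URL are hypothetical and are not the actual pybikes schema or API.

```python
import json
from dataclasses import dataclass
from typing import List

# Hypothetical data file: one entry per city running the same system.
# The field names are illustrative, not the real pybikes layout.
DATA_FILE = """
{
  "system": "examplebike",
  "instances": [
    {
      "tag": "example-city",
      "feed_url": "https://example.com/stations.json",
      "meta": {
        "name": "Example Bikes",
        "city": "Example City",
        "latitude": 41.39,
        "longitude": 2.17
      }
    }
  ]
}
"""


@dataclass
class ExampleSystem:
    """Implementation shared by every city that runs this (hypothetical) system."""
    tag: str
    feed_url: str
    meta: dict


def load_instances(raw: str) -> List[ExampleSystem]:
    """Build one system object per entry found in the data file."""
    data = json.loads(raw)
    return [ExampleSystem(**instance) for instance in data["instances"]]


systems = load_instances(DATA_FILE)
print(systems[0].tag, systems[0].meta["city"])
```

With this kind of layout, adding another city that uses the same system means adding an entry to the data file, with no new scraping code.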

Here’s a basic set of questions that will help define the nature of a feed:

  1. Is there an interactive map available somewhere?
    1. Yes: Congrats, feed can be included in CityBikes!
    2. No: Is there any page showing the counts of the stations that we can use to aggregate the data into a community map? (ex: bicicard)
      • Yes: The feed can be included, but the coordinates of the stations must be supplied separately, for instance in a KML or GeoJSON file (ex: bicicard module)
      • No: Nothing can be done except contacting the city council or the company involved. Good luck.
  2. Is there any XHR pointing to a data feed?
    1. Yes: Is it in a parsable format (JSON, XML, …)?
      1. Yes: Awesome
      2. No: Easy-level trickery will be needed (HTML / CSS selectors)
    2. No: The data is then embedded in the rendered HTML page itself, so some regex trickery will be needed.
  3. Is there any extra XHR when accessing the information of a specific station?
    1. Yes: We call these asynchronous: to keep an updated list of stations we will have to make as many requests as there are stations in the system (yuck!)
    2. No: We call these synchronous: only one request per timeframe will be needed to keep this system updated.
  4. Are there any extra steps required to get to the feed or map page, such as setting a cookie or a specific header entry?
    1. Yes: There will be an extra request to get there. Annoying but not a problem.
    2. No: Good.
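The questions above can be sketched as a small decision function. This is only an illustration of the checklist: the `FeedAnswers` and `categorize` names and the returned keys are made up for this post and are not part of pybikes. It also assumes that the extra cookie or header request from question 4 is repeated on every update, which may not hold for every system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FeedAnswers:
    """Answers to the checklist (hypothetical structure, not a pybikes class)."""
    has_map: bool           # question 1
    has_counts_page: bool   # question 1, "No" branch
    has_xhr_feed: bool      # question 2
    parsable_format: bool   # question 2, "Yes" branch (JSON, XML, ...)
    per_station_xhr: bool   # question 3
    needs_extra_step: bool  # question 4 (cookie, header, ...)


def categorize(answers: FeedAnswers, stations: int) -> Optional[dict]:
    """Return a rough category for a feed, or None if it cannot be included."""
    # Question 1: without a map or a counts page there is nothing to scrape.
    if not answers.has_map and not answers.has_counts_page:
        return None

    # Question 2: how the data will have to be extracted.
    if answers.has_xhr_feed:
        parsing = "structured" if answers.parsable_format else "html-selectors"
    else:
        parsing = "regex"

    # Question 3: one request per station, or a single request per update.
    requests = stations if answers.per_station_xhr else 1

    # Question 4: assume the extra step costs one more request per update.
    if answers.needs_extra_step:
        requests += 1

    return {
        # With no map, station coordinates must come from a separate file.
        "needs_station_coordinates": not answers.has_map,
        "parsing": parsing,
        "mode": "asynchronous" if answers.per_station_xhr else "synchronous",
        "requests_per_update": requests,
    }


# A feed with a map, a JSON XHR, no per-station calls and no extra steps.
print(categorize(FeedAnswers(True, False, True, True, False, False), stations=50))
```

The request count is the number that matters most in practice: a synchronous feed costs one request per update regardless of size, while an asynchronous one grows with the number of stations.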

Some examples:

That’s all for today. Now that we have this official categorization of feeds, we can work on a series of posts on how to add each type of system to CityBikes.
