This month in Citybikes #202509
Years ago I used to post on twitter about every new thing going on in Citybikes. But twitter is a wasteland. Mastodon is cool, but social networks nowadays are meant to be consumed in the moment: whatever gets posted there matters for a week and then gets buried in the noise.
I usually write long-form posts on this blog, and whenever I wanted to write something about Citybikes I got writer's block thinking about the sheer size of what I wanted to write and how far back in time I would need to go to recap. It's been 15 years of running this project. For example, I would love to write a post about the historical bike share datasets we are now publishing as parquet files, or the technical details behind the new progressive web app.
For a change, let's try something simple: a monthly changelog. I might still write a post about something in particular from time to time if I feel like it. But the idea is to keep these on point and straightforward so I can come back to them later and reflect. Not that it really matters on today's internet; these are more for me than for you.
Without further ado
Increasing pybikes health
All the action this month has gone into pybikes. The general plan was to bring the health of the Python library up. Health is the ratio of working systems to supported systems.
Ideally it would be at 100% health, but anything around 90% is acceptable. We can't control bike share systems shutting down or changing their implementation, but keeping outdated systems around does not help anybody. Right now it is at 95%, with 32 of the 798 supported systems failing. These reports are generated when the tests run.
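Health is nothing fancy; as a minimal sketch, using the numbers from the reports above:

```python
def health(supported: int, failing: int) -> float:
    """Percentage of supported systems whose feeds still work."""
    return (supported - failing) / supported * 100

# 32 of 798 supported systems failing; anything around 90% is acceptable
assert health(798, 32) >= 90
```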
Automated syncs with nextbike
Nextbike is a German provider of bike share systems, with operations in many countries. We have had support for nextbike since 2014, making it one of the oldest implementations in pybikes to survive the passage of time.
Being a large provider, nextbike creates and shuts down many systems. Sometimes these are systems in a testing phase with a city council that get scrapped after a year. I used to get very anxious about pull requests adding nextbike systems to pybikes. First, because nobody is really using this data for anything useful. Second, because I know they will become legacy that I will need to remove at some point. And finally, because these pull requests are usually large and difficult to review. I like to think of the networks we support at Citybikes as a curated list of interesting systems; adding more just for the sake of higher numbers does not make much sense to me. But over time I have come to realize this is just my perspective from the other side, as a maintainer. For everyone else, the more networks supported, the better.
Nextbike publishes a feed with all their data at http://maps.nextbike.net/maps/nextbike-live.json, and this feed can be used to add instances to pybikes. Pybikes expects instances to live in a json file, so we can't do any server-side processing of this feed to generate the instances at runtime. The contract with pybikes is that it is a library that works locally with instances declared in the meta json files, curated by the community. If you want to change the name of a bike share system, you just edit one of these meta files and submit a pull request. Text + git is our interface, and this is how I like it.
How can we keep this thing both maintainable and healthy?
It's relatively easy to generate the instances from the nextbike feed, and I have had different versions of such a script for years. The problem comes when you aim for correctness: no script will ever cover all the corner cases. This time I sorted it out by writing a script that gets you half the way there. And, surprisingly, that's more than enough. Most of the work was in massaging the feed output to conform to certain conditions, so the script could work with it.
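For flavor, a stripped-down sketch of what such a generator can look like. The feed shape (countries containing cities with a uid, a name and coordinates) is from memory, and the instance fields are simplified; the real meta files carry more detail than this:

```python
def instances_from_feed(feed: dict) -> list[dict]:
    """Turn a nextbike-live style feed into pybikes-style instance entries.

    Field names here are illustrative, not the exact pybikes schema.
    """
    instances = []
    for country in feed.get("countries", []):
        for city in country.get("cities", []):
            instances.append({
                # derive a stable tag from the city name
                "tag": "nextbike-" + city["name"].lower().replace(" ", "-"),
                "city_uid": city["uid"],
                "meta": {
                    "city": city["name"],
                    "country": country["country"],
                    "latitude": city["lat"],
                    "longitude": city["lng"],
                },
            })
    return instances
```

The point is exactly that this only gets you half the way there: the generated entries still go through a human review pass before they land in the meta files.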
Every morning a workflow runs the script and opens a pull request with the changes (if any). I can then review that PR and make changes if necessary. No more broken nextbike systems!
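The workflow itself is roughly this shape (the schedule, script path and secret name are illustrative, not the actual repo layout):

```yaml
name: sync nextbike
on:
  schedule:
    - cron: "0 6 * * *"   # every morning
  workflow_dispatch:

jobs:
  sync:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: python utils/sync_nextbike.py   # regenerate the instances json
      - uses: peter-evans/create-pull-request@v6
        with:
          token: ${{ secrets.SYNC_PAT }}     # fine-grained PAT
          title: "Sync nextbike instances"
          branch: sync/nextbike
```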
- https://github.com/eskerda/pybikes/pull/837
- https://github.com/eskerda/pybikes/pull/838
- https://github.com/eskerda/pybikes/pull/845
- https://github.com/eskerda/pybikes/pull/847
- https://github.com/eskerda/pybikes/pull/849
My main takeaway from this is to not always aim for full correctness. If you had told me years ago this could be automated, I would have called you a fool. Today, I would call you a fool if you told me I needed to write these by hand. I might still review them by hand, or edit a system's name or city, but it is significantly less work to maintain now.
GitHub Actions hate
I entered a rabbit hole while writing the GitHub workflow though. I am using the https://github.com/peter-evans/create-pull-request action to generate the pull request, which really is the most straightforward way of doing it (if you want to use GitHub Actions, that is). By default it uses GITHUB_TOKEN, and when using it, no additional workflows are triggered on the pull request. Straight from the horse's mouth:
> When you use the repository's GITHUB_TOKEN to perform tasks, events triggered by the GITHUB_TOKEN, with the exception of workflow_dispatch and repository_dispatch, will not create a new workflow run. This prevents you from accidentally creating recursive workflow runs. For example, if a workflow run pushes code using the repository's GITHUB_TOKEN, a new workflow will not run even when the repository contains a workflow configured to run when push events occur. For more information, see "Use GITHUB_TOKEN for authentication in workflows."
The alternatives are to create a GitHub App to generate these tokens, or to create a personal access token with fine-grained permissions and impersonate the bot. At that point my head was already hurting, so I went for a fine-grained PAT for now, which I guess is still a huge security liability. This workflow only runs on master, so presumably I do not need to think about the risks of it running on forks, and only members with repo access could leak the token. Tokens be tokens, I guess, and for anything serious I should be running this on my own servers instead. But my general impression of GitHub Actions always ends up the same: things are easy until they are not.
Russian systems
None of the systems we support in Russia worked, so I removed them in 8d5a9b and 81cda9. For a while, the problem was that the servers rejected non-Russian IPs (easily solved by using a proxy in Russia, but I really do not want to pay for exit nodes). This time they changed their implementation, and while looking into it I was surprised to see they use Supabase as the backend. Speaking of tokens, I really hope the one in their app has been deactivated and that it uses some kind of auth after signing in, because if not, whoops.

Bringing back support for systems in Finland
There are open data feeds available from https://digitransit.fi/. They were one of the first to create an open data portal for transport feeds, using OpenTripPlanner (OTP). Back in April they deprecated the feeds using OTP v1, which meant updating the otp parser to use the GraphQL feeds from their OTP v2 instance.
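The shape of the change: stations now come back as a GraphQL response rather than a plain REST payload. A minimal sketch of flattening one into station dicts; the query and field names are from memory of the OTP schema and may not match digitransit's current one exactly:

```python
# Hypothetical OTP-style query; the live schema may name these differently.
QUERY = """
{
  bikeRentalStations {
    stationId
    name
    bikesAvailable
    spacesAvailable
    lat
    lon
  }
}
"""

def parse_stations(response: dict) -> list[dict]:
    """Flatten a GraphQL response body into simple station dicts."""
    stations = response["data"]["bikeRentalStations"]
    return [
        {
            "name": s["name"],
            "latitude": s["lat"],
            "longitude": s["lon"],
            "bikes": s["bikesAvailable"],
            "free": s["spacesAvailable"],
        }
        for s in stations
    ]
```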
We only supported Helsinki, but since their feed covers more cities we added those too. It includes feeds from freebike, donkeyrepublic and smoove. Of interest: some of the added systems use dockless bikes, so this was a good opportunity to test our own implementation of free-floating bikes in pybikes. These are the full changes: https://github.com/eskerda/pybikes/pull/848
Sometimes, adding feeds from open data portals instead of from the implementers directly feels weird. Both freebike and donkeyrepublic offer their feeds as GBFS, so we could take those instead. What is actually happening is that we are consuming a feed that has been converted between different formats and specifications (implementer (GBFS) -> open data portal (OTP) -> Citybikes). Given that we also offer a GBFS endpoint now, we have gone full circle! But when available, it is always a good idea to take feeds from open data portals, since at least we know their license.
The cities we are covering are: Helsinki, Lappeenranta, Kotka, Kouvola, Turku, Kuopio, Lahti and Tampere.
In case you are curious what a test report ends up looking like, here’s a sample for the digitransit run.
Closing remarks
And that is all the major changes for this month, I think. Keeping health up on pybikes is a never-ending task, but this year we have managed to bring it up from 80% (156 failing systems) to 95% (32 failing systems, which is a cute number). We have removed lots of systems and added some, and hopefully we pass the 800 mark before the end of the year. Huge thanks to the many contributors that made this possible, and without doubt to NLnet for their ongoing support!
This post has turned out rather long; I hope I can keep it up next month :)