your daily cup of tea™

An Architecture of One’s Own

One day I attended a talk about multi-agent systems communicating through ZMQ. That same day I decided I would use it to build the next version of the systems running Citybikes. I felt it was an elegant solution to a problem I had when trying to add bike-share historics. With agents, I could plug pieces in and out of the project without affecting the overall stability. The API would be one agent, historics would be another agent, an old version of the API would be another agent, and so on.

I dreamt of bringing up an agent that would connect to a publisher spitting out JSON through a socket and graphing bike share availability in real time, for no practical reason whatsoever other than it sounded fucking cool. The thing is, what I had running worked fine and there was no reason to do a full rewrite besides getting rid of MongoDB.

Over time the codebase turned into legacy code: Python 2, running a very old version of MongoDB, a very old version of the pymongo driver, a very old version of RQ and little motivation to change things that worked just fine. Every month the MongoDB process went down, but restarting it fixed the problem. My solution was a script that restarted the process every day, and so I stopped getting Grafana alerts. The API also turned legacy, using a very old version of Flask. And still, everything worked fine – I just focused on maintaining pybikes and living my life.

I really do not remember what prompted me to start the rewrite. Existential dread, Python 2 EOL, someone complaining that I wasn’t running updates often enough, a new server being unable to run my own software without alternative repos, or this nagging feeling that I had spent close to 10 years talking about this multi-agent paradigm shit without really trying it. Around June I was doing job interviews and, bored of it, I started working on the rewrite that became hyper.

Citybikes Hyper is a high-concurrency task scheduling system built for scraping bike-sharing networks using pybikes. Network updates are published over a ZeroMQ PUB socket, allowing external components to subscribe to live data updates.
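To make the "subscribe to live data updates" part concrete, here is a minimal sketch of what an external subscriber could look like with pyzmq. The endpoint, the two-frame `[topic, JSON payload]` message layout and the function names are assumptions made for illustration, not the actual wire format used by hyper.

```python
import json


def decode_update(frames):
    """Split an assumed [topic, payload] multipart message into (topic, dict)."""
    topic, payload = frames
    return topic.decode("utf-8"), json.loads(payload)


def listen(endpoint="tcp://127.0.0.1:5555", prefix=b""):
    """Print every update published on the (hypothetical) endpoint. Blocks forever."""
    import zmq  # pyzmq; imported here so decode_update works without it

    sock = zmq.Context.instance().socket(zmq.SUB)
    sock.connect(endpoint)
    # An empty prefix subscribes to everything. A SUB socket receives
    # nothing until at least one subscription is set.
    sock.setsockopt(zmq.SUBSCRIBE, prefix)
    while True:
        topic, data = decode_update(sock.recv_multipart())
        print(topic, data)
```

The appeal of PUB/SUB here is that the publisher does not need to know who is listening: a subscriber like this can be started or killed at any time without touching the scraper.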

I hope the guild of architecture astronauts welcomes me

Ok, be wary of buzzwords. It wraps pybikes in a thread pool so it can be driven from async code, and starts a shit-ton of async workers that consume update tasks from a local queue. Since getting info from the internet is mostly IO-bound, async works very well. In the past I was using hard queue systems (celery and then RQ), and always struggled with resources. The system is highly (hyper?) configurable and I probably went overboard with it.
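The general pattern, blocking calls pushed into a thread pool and awaited by async workers pulling from a local queue, can be sketched with the standard library alone. This is not hyper's code: `blocking_update` is a hypothetical stand-in for a blocking pybikes update, and the worker count is arbitrary.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_update(tag):
    """Hypothetical stand-in for a blocking pybikes update call."""
    time.sleep(0.01)  # pretend to wait on the network
    return f"{tag}: updated"


async def worker(queue, pool, results):
    loop = asyncio.get_running_loop()
    while True:
        tag = await queue.get()
        try:
            # Run the blocking call in the pool so the event loop stays free.
            results.append(await loop.run_in_executor(pool, blocking_update, tag))
        finally:
            queue.task_done()


async def run(tags, n_workers=4):
    queue, results = asyncio.Queue(), []
    for tag in tags:
        queue.put_nowait(tag)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        workers = [
            asyncio.create_task(worker(queue, pool, results))
            for _ in range(n_workers)
        ]
        await queue.join()  # wait until every queued task has been processed
        for w in workers:
            w.cancel()  # workers loop forever, so stop them explicitly
        await asyncio.gather(*workers, return_exceptions=True)
    return results
```

Calling `asyncio.run(run(["a", "b", "c"]))` returns the three results in whatever order the workers finish.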

In less than a month I had a full rewrite of my systems already running in production, including the API, for which I just threw away all the old code. No fancy models, no nothing. A subscriber writing to a sqlite db and an api.py file querying the sqlite.
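As a rough idea of how little that can be, here is a sketch of the two halves: a function a subscriber could call for each incoming update, and a function an API handler could call to read it back. The table layout and function names are assumptions for illustration, not the actual schema.

```python
import json
import sqlite3


def open_db(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS networks "
        "(tag TEXT PRIMARY KEY, data TEXT NOT NULL)"
    )
    return db


def store_update(db, tag, data):
    """Subscriber side: keep only the latest state of each network."""
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT OR REPLACE INTO networks (tag, data) VALUES (?, ?)",
            (tag, json.dumps(data)),
        )


def get_network(db, tag):
    """API side: read the latest state back, or None if the tag is unknown."""
    row = db.execute(
        "SELECT data FROM networks WHERE tag = ?", (tag,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Storing a second update for the same tag overwrites the first, so `get_network` always returns the most recent state.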

I set it up on a new server, ran it for two days, pointed the DNS record to the new server, killed the old servers and that was it. Nobody noticed, except now my Linode bill had two fewer entries. But now, oh boy, I had checked off another of my conditions for adding historics to citybikes:

A system I could plug in and out of the project without affecting the reliability of the whole system.

This avoids locking myself into the MongoDB scenario again: I am free to try out as many overhyped datastores as I feel like without committing to a particular one. Heck, if I had wanted to, I could have written a compatibility layer to keep the old systems running as they were.

I know shitting on micro-services and event-driven systems is the new thing, but believe me that for me, in this particular scenario, it is working out fine and using fewer resources than before. Maybe that is just because I wrote the previous version ages ago.

And with this out of the way, I was finally free to work on adding historical data to Citybikes.
