20190704

Spike Notebooks - Better Agile Discovery for Developers

tl;dr: apply data science tooling to software development research spikes.

Problems & Rabbit Holes

I don't like it when my team comes to me with a problem and doesn't offer up a solution. Sometimes they can see what is going wrong; occasionally they think they have an idea of how to solve it. Probably worst of all, they have no idea how long solving the issue is going to take. Another common pain point is the phrase "yes, I have looked at it and ...", which usually means they have just read a blog post like this one ;) and are walking blindly into someone else's solution.
In an agile process (during backlog refinement), this blind uncertainty is problematic, and the typical agile team will attempt to resolve it through a spike: "a small task done to reduce uncertainty about a larger task". A spike is usually aimed at finding out why, what or how the next step should be completed.
The issue that consistently crops up is that the work done on a spike is valuable to the organisation as a whole, because it is a form of research, yet at best some of its results filter into a README.md, code comments, a gist or a pastebin, while the majority of the technical effort is lost in a developer workspace or a defunct code branch. There is no reliable way to merge the spike work back into the project as a whole.

I have seen developers who will happily march into the thick of the problem and just start "fixing" things in the hope that it all comes out right and proper in the end. But I also see those same developers getting stuck in the rabbit hole: for example, in order to solve problem (a) they introduce library (b), which causes a version incompatibility with (c), whose update requires a monkey patch (d), which means we have to introduce CI/CD builds for component ... and so on down the rabbit hole. In other words, directly solving the problem means solving a whole chain of problems.
At the other end of the scale, a developer might read a blog post or the documentation and establish that the solution is theoretically possible. This might be enough if the docs and the software are reliable, but the plan still carries risk because it has not been proven at the critical points - and it is this risk that will blow any estimate of the resulting implementation work.
Solutions that require architectural changes are particularly prone to the kinds of problems described above: teams either spend too long spiking the architecture or do not de-risk the architectural change enough.

What is needed is a way of demonstrating that a solution exists, proving the risky parts of the solution and showing how they fit together. This might involve some coding, and perhaps stepping into a rabbit hole, but then being able to backtrack and take a different path without having to revert a branch or comment out great swathes of code - all the while preserving the discoveries made and the thinking behind them. The goal is to gather enough information about the solution to make a firm estimate of the implementation work with minimal uncertainty.

Welcome Data Scientists

Over the last 2 years I have been running a variety of transitional development teams, teams of data scientists, and mixed data+engineer teams. The data scientists have brought new skills, new approaches and previously unseen tools into the world of software development. Jupyter Notebooks are one such tool. A developer might think of a notebook as a collection of markdown documentation, executable code snippets and recorded, viewable results (including graphs and images); here is an example.

So can you use notebooks for spikes? To do the job, they need to:

  • Be easy for the team to use: file CRUD, the correct language
  • Replicate enough production code to work for a spike
  • Be reviewable by other team members
  • Be kept with the code, but not overlapping it
  • Be linkable, documented and demonstrable
  • Be isolated enough to run alternate versions
  • Lead into clear decision making (e.g. LADR)
  • Seed TDD with fixtures and case outcomes
  • Support teamwork

Are notebooks easy enough to use?
Notebooks used to be hard to run and came with quite a learning curve, but that is changing. The Jupyter Docker Stacks project is making ease of use a reality, and efforts like JupyterLab are heading towards better drop-in work environments. The following is an example of how to create a basic Python 3 JupyterLab environment on a developer laptop using Docker (because you are all using containers now, right?):
docker run -d --name jupyter -v /home/${USER}/workspace:/home/jovyan/work -e GEN_CERT=yes -e JUPYTER_ENABLE_LAB=yes -p 8888:8888 jupyter/minimal-notebook
Then you can add (or compose) your own language kernel. There are many to choose from, covering JavaScript, Java, Kotlin, TypeScript, Ruby, Go, C# and, obviously, Python. If you are using R or Python ML/data science libraries (e.g. TensorFlow) then you can get them by using a more specialised stack image without having to go through an extra install step. Scripting languages are better aligned to this form of development, so there are more hoops to jump through to get a compiled language working.

Can you work on the spike?
This really depends on how well your software is written and structured. The principle is that you pull in the code you don't want to change, write the code you do want to change in the notebook, and then execute the code and/or tests to see the differences. The notebook magic %load can be used to pull in your existing code or snippets of it. The %%script --bg magic allows you to start supporting systems (e.g. databases), and the %run magic will start your program from its entry point. If you need to temporarily overwrite code, you can use %save to replace or patch a file.
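As a rough sketch (every path, module name and Redis command below is hypothetical, and each commented block would be its own notebook cell), a question notebook might contain:

# --- cell 1: pull in the existing code you do not want to change ---
%load ../src/mycomponent/dao.py

# --- cell 2: start a supporting system in the background ---
%%script bash --bg
redis-server --port 6380

# --- cell 3: the candidate change, written and sanity-checked in the notebook ---
import json

def encode_record(record: dict) -> bytes:
    """Trial encoding we want to prove before committing to it."""
    return json.dumps(record, sort_keys=True).encode("utf-8")

assert encode_record({"id": 1}) == b'{"id": 1}'

# --- cell 4: run the program from its entry point to see the overall effect ---
%run ../src/mycomponent/main.py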

How does the team review the spike?
I expected this to be a straightforward step, but it is more complicated than you might think. Notebooks are stored in JSON format with a lot of timestamps and binary blocks, so a typical pull-request review system will show them as a raw JSON diff, which means you always have to click through to view the rendered original. You can export the notebook to PDF or markdown and review that, but that takes you outside the main code process. GitHub renders the notebook in a readable form, but it is a bit flaky on larger notebooks. Bitbucket requires a plugin to work. Online renderers like https://nbviewer.jupyter.org/ will only show public notebooks. This leaves you with two choices: 1) run a private nbviewer, or 2) use a browser plugin. We opted for the plugin because of an IT policy, and I am planning on releasing the plugin to the Chrome Web Store soon. By sticking to a process that closely follows code review, the existing agile steps still apply. Incidentally, there is an option to export the notebook as reveal.js slides.
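If you do take the export route, nbconvert (part of a standard Jupyter install, including the stack images) can produce the markdown programmatically; the notebook file name here is just a placeholder:

import nbformat
from nbconvert import MarkdownExporter

# Read the spike notebook and write out a rendered markdown copy for review.
nb = nbformat.read("Spike - example.ipynb", as_version=4)
body, _resources = MarkdownExporter().from_notebook_node(nb)
with open("Spike - example.md", "w") as f:
    f.write(body)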

Where does the notebook live?
We have ended up putting one notebook file (.ipynb) next to the code for each question we are asking, and then having a notebook at a higher level to link them together and represent the spike as a whole. This takes advantage of the markdown documentation feature, which can create hyperlinks between relatively addressed files. So, for example, a high-level notebook might be .../Spike - How do we make database X globally available.ipynb, which might refer to .../terraform/myproject/Can we tunnel Redis across regions.ipynb and .../src/mycomponent/dao/Can we encode X data in Redis.ipynb; then you might also have .../test/What is the failure rate of packets over tunnels.ipynb; and finally you might add later in the process .../test/What is the cost of using GCP Spanner.ipynb. This accumulates knowledge and experience alongside the code it is affecting. Notice that spikes are not about answering all the questions, just the critical and most unknown ones; engineers should be able to accurately estimate the work to complete at the end of the spike sprint. When we restructure, we move notebooks to an archive directory so that they don't get buried in the git history. We aim to create new notebooks rather than change existing ones, because of the problems of merging large blocks of frequently changing JSON.

How does this help with the development process?
In the example above we have an overall unknown (the summary), an infrastructure unknown (temporary infrastructure), a code unknown (temporary code) and two operational unknowns (graphs and charts). At any point someone can jump back in, re-run the notebook and see if the result changes - for example, if GCP Spanner pricing changes. In theory you can use notebooks for tracking A/B testing, but that is a whole different conversation.
Notebooks not only provide a neat way of hanging on to the knowledge created during the spike, but also provide a means to quantify the benefits and risks of a change proposal (e.g. it will cost $900/month to expand to Australia while maintaining three-nines error rates). This kind of information would normally get estimated and end up in someone's drive as a spreadsheet instead of being openly referable in the code base. It is this transparency that provides the key benefit of this approach.
It is straightforward to reference a spike notebook if you are using decision tracking (e.g. LADR). You can use the %save magic or just write additional files to create data for TDD, especially if your spike is calling a real API, which makes the fixture data as realistic as possible.
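For example, a spike cell that calls a real API can write the live response straight into the test tree as a fixture; the endpoint and file path below are purely illustrative:

import json
import requests

# Call the real API from the spike and keep the payload as a TDD fixture.
response = requests.get("https://api.example.com/v1/regions/latency")
response.raise_for_status()

with open("../test/fixtures/region_latency.json", "w") as f:
    json.dump(response.json(), f, indent=2)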

Who works on the notebook?
This really depends on the spike, but a common format is to pair a product manager/owner with a developer to flesh out the notebooks to start with, and then let the developer loose from there. The nature of notebooks makes it hard to share them for co-editing, so physical (or virtual, via screen-share) pairing works best. The pair will usually present their summary and selected highlights at a weekly show-and-tell.

Spike Notebooks

Spike Notebooks provide a clean and concise way to capture the research behind new software solutions. Handled well, they can dramatically improve the transparency of the work done on the spike and lead to better sprint estimates, decision making and BDD/TDD approaches.