Tuesday, June 28, 2016

Critical Mass of a Community

The holy grail of the Agile Ventures community, and perhaps any community, is to achieve "Critical Mass".  That's the point at which the community becomes self-sustaining, and activity proceeds without relying on any one particular individual to keep it going.  "Critical Mass" is a term from physics: the minimum amount of fissile material needed to sustain a nuclear chain reaction.

In nuclear material it's the movement of particles called "neutrons" that causes individual atoms (in particular the atomic nuclei) to split apart, or undergo what's called Nuclear Fission.  What makes a nuclear explosion possible is that this process of fission releases additional neutrons, which can go on to cause other atoms to split apart.  If you have a large enough amount of the right material, it's almost inevitable that each neutron generated will collide with another atom as it travels through the material, generating more neutrons which collide with other atoms, and so on.  This is called a chain reaction.  Have too little material and the neutrons will leave the material without having hit other atoms, and the chain reaction dies out.

Let's explore the analogy with a community, in particular a pair programming community.  Each pairing session could be considered an atom.  Assuming one pairing session takes place (and it goes reasonably well), you'll end up with two people who are interested in pairing again.  They'll then be searching for other pairing sessions, but if there are none available, or none that they happen to be interested in (wrong programming language or platform), then it's likely these two will drift off and perhaps not try to pair in the same community again.  However if these two do find other pairing sessions, you can see how a single successful pairing event can lead to two more.  Assuming those sessions go well, you have four people now looking to pair, and so on.

Under the right conditions you can get a chain reaction.  It requires a critical mass of people taking part in pairing sessions.  Ideally, whenever anyone wants to find a pair, there is always someone there ready to go.  Of course all this depends on people being able to find and join pairing sessions, and also on those sessions going well.
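To make the threshold idea concrete, here's a toy branching-process simulation in Ruby.  It's purely illustrative (the model, names and numbers are made up, and real pairing dynamics are messier): each session leaves two people keen to pair again, and each of them manages to start a fresh session with probability p.

```ruby
# Toy branching-process model of the pairing "chain reaction".
# Each successful session produces two people looking to pair again;
# each of them starts a fresh session with probability p.
def simulate_chain(p, generations, rng = Random.new(42))
  sessions = 1
  generations.times do
    seekers  = sessions * 2                        # two satisfied pairers per session
    sessions = seekers.times.count { rng.rand < p }
    break if sessions.zero?                        # the chain reaction has died out
  end
  sessions
end
```

With p = 1.0 the number of sessions doubles each generation (1, 2, 4, 8, ...); with p well below 0.5 the chain fizzles out.  Interface friction, like hangouts wrongly shown as "not live", is precisely the kind of thing that pushes p down.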

Too few people and there just aren't that many opportunities for pairing; but having lots of people is not enough by itself.  Imagine that lots of people are trying to pair, but that problems with the interface mean that people trying to join a pairing session end up in the wrong location.  No pair partner, no pairing.  Michael and I uncovered one such problem with the AgileVentures interface last week.  Hangouts that had people in them were being reported as "not live" after 4 minutes.  This meant that on a fair number of occasions people attempting to join a hangout for pairing or for a meeting would find themselves on their own in a separate hangout.

We've just rolled out a fix for this, and hopefully it is another step towards achieving critical mass in the community.  It's unlikely to be the only step required, as having a good pairing experience is more complex than nuclear fission.  We also want to adjust our user experience to maximise the chances of a good pairing experience for everyone.  It's not clear what the best way to do that is, but clearly getting two people into the same hangout at the same time is an important prerequisite.  Things that we're exploring include adding automated "pair rotation" timers to the hangout itself; having users rate their pairing experience; reflecting pairing activity through user profiles; and so on.

We need to carefully monitor the changes and fixes we just made to see how the proportion of pairing event participation changes, and continue our Agile iterative process of making small changes and reflecting on their effect.  Making it more obvious which events are live might lead to more disruption in pairing events, or it might make observing ongoing pairing events easier, and that might make people more or less inclined to initiate their own pairing events.  It's not simple, but with careful measurement hopefully we can find that sequence of changes to the user experience that will lead to critical mass!

Friday, June 24, 2016

Analyzing Live Pair Programming Data

The recent focus for the AgileVentures (WebSiteOne) development team has been trying to make the process of joining an online pair programming session as smooth as possible.  The motivation is two-fold: one, we want our users to have a good experience of pairing, and all the learning benefits it brings; two, we want large numbers of users pairing so that we can get data from their activity to analyse.  The latter motivation sort of feeds the first one really, since the point of analysing the data is to discover how we can serve the users better, but anyhow ... :-)

Several months back we created an epic on our waffle board that charted the flow from first encountering our site to taking part in a pairing session.  We identified the following key components:
  1. Signing Up
  2. Browsing existing pairing events
  3. Creating pairing events
  4. Taking the pairing event live
  5. Event notifications
  6. Event details (or show) page
  7. Editing existing events
The sequence is only approximate, as signing up/in is only required if you want to create an event, and not required for you to browse and join events.  The important thing was that there were various minor bugs blocking each of the components.  We set about trying to smooth the user experience for each of them, including sorting out GitHub and G+ signup/signin issues, providing filtering of events by project, setting appropriate defaults for event creation and ironing out bugs from event edit and update, not to mention delivering support for displaying times in the user's timezone, and automatically setting the correct timezone based on the user's browser settings.

There are still other points that could be smoothed out, but we've done a good portion of the epic.  The question that partly troubles me now is how to "put it to bed".  A new epic containing only the remaining issues is probably the way to go.  In any case, we've finally got to the point where we can start analysing some data: the notifications for the edX MOOC pairing activity are flowing to the MOOC Gitter chat fairly reliably, we've just broken through on removing key confusions about joining an event, and we've worked out some problems with events displaying whether they are live.

This last element is worth looking at in a little more detail, as it strongly affects the type of data we are gathering.  Creating (and tracking) Google Hangouts for pairing from the AgileVentures site involves creating a Google Hangout with a particular plugin, called HangoutConnection, that knows the server side event it is associated with.  This was originally designed by Yaro Apletov and is written in CoffeeScript.  It gets loaded when the hangout starts and attempts a connection back to the main AgileVentures site.  On successful contact an EventInstance object is created in association with the event.  This EventInstance includes information about the hangout such as its URL, so that other people browsing the site can also join the hangout without being specifically invited.  The HangoutConnection then continues to ping the site every two minutes, assuming the hangout is still live, the plugin hasn't crashed and so on.
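The server side of that ping cycle can be sketched in plain Ruby.  To be clear, this is a simplified illustration with assumed names, not the real WebSiteOne implementation: every ping refreshes a timestamp, and an event instance only counts as "live" if it was pinged within the last few minutes.

```ruby
# Illustrative sketch only -- class and method names are assumptions.
# The key idea: EVERY ping from HangoutConnection refreshes a timestamp,
# and an event instance only shows as live while recently pinged.
class EventInstance
  LIVE_WINDOW = 4 * 60  # seconds without a ping before "live now" disappears

  attr_reader :hangout_url, :last_ping_at

  def initialize(hangout_url)
    @hangout_url  = hangout_url
    @last_ping_at = Time.now
  end

  # Called on every two-minute ping -- the bug was effectively that only
  # the first ping refreshed this timestamp.
  def ping!(now = Time.now)
    @last_ping_at = now
  end

  def live?(now = Time.now)
    (now - @last_ping_at) < LIVE_WINDOW
  end
end
```

With this shape, a hangout pinging every two minutes stays comfortably inside the four-minute window, and the "live now" display only disappears once the pings genuinely stop.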

What Michael and I identified on Wednesday was that only the first of these pings actually maintained the live status, making it look like all our pairing hangouts were going offline after about 4 minutes.  This had been evidenced by the "live now" display disappearing from events somewhat sooner than appropriate.  This might seem obvious, but the focus had been on fixing many other pressing issues and usability concerns from the rest of the epic.  Now that those are largely completed this particular problem has become much clearer (it was also obscured for the more regular scrums, which use a different mechanism for indicating live status).  One might ask why our acceptance tests weren't catching this issue.  The problem was that the acceptance tests were not simulating the hit of the HangoutConnection against our site.  They were manipulating the database directly; thus, as is often the case, the bug occurred in just the bit that wasn't covered by a test.  Adjusting the tests to expose the problem made the fix relatively straightforward.

This is an important usability fix that will hopefully create better awareness that hangouts are live (with people present in them), and increase the chances of people finding each other for pairing.  There's a lot more work to do however, because at the moment the data about hangout participants that is sent back from HangoutConnection gets overwritten at each ping.  The Hangout data being sent back from HangoutConnection looks like this:

    "0" => {
                   "id" => "hangout2750757B_ephemeral.id.google.com^a85dcb4670",
        "hasMicrophone" => "true",
            "hasCamera" => "true",
        "hasAppEnabled" => "true",
        "isBroadcaster" => "true",
        "isInBroadcast" => "true",
         "displayIndex" => "0",
               "person" => {
                     "id" => "123456",
            "displayName" => "Alejandro Babio",
                  "image" => {
                "url" => "https://lh4.googleusercontent.com/-p4ahDFi9my0/AAAAAAAAAAI/AAAAAAAAAAA/n-WK7pTcJa0/s96-c/photo.jpg"
                     "na" => "false"
               "locale" => "en",
                   "na" => "false"

Basically the current EventInstance only stores a snapshot of who was present in the hangout the last time the HangoutConnection pinged back; and data from pings after the first two-minute update has been discarded.  We're about to fix that, but here's the kind of data we can now see about participation in hangouts:

#participants #hangouts
1             *
1             *
1             ****
2             *
3             **
1             ****
1             *
2             *
3             *
1             *
1             ***************
2             *
3             *
1             ******************************
2             ****
4             **

The above is just a snapshot that corresponds to the MOOC getting started; we're working on a better visualisation for the larger data set.  We can see a clear spike in the number of hangouts being started, and a gradual increase in the number of hangouts with more than one participant, remembering that the participant data is purely based on who was present two minutes into the hangout.

If the above data were reliable we might be saying: wow, we have a lot of people starting hangouts and not getting a pair partner.  That might be the case, but it would be foolish to intervene on that basis using inaccurate data.  Following the MOOC chat room I noticed some students at the beginning of the course mentioning finding hangouts empty, but the mood music seems to have moved towards people saying they are finding partners; and this is against the backdrop of all the usability fixes we've pushed out.

To grab more accurate participation data we would need to do one or more of the following:
  1. adjust the EventInstance data model so that it had many participants, and store every participant that gets sent back from the HangoutConnection
  2. store the full data sent back from every HangoutConnection ping
  3. switch the HangoutConnection to ping on participant joining and leaving hangouts rather than just every two minutes
  4. ruthlessly investigate crashes of the HangoutConnection
With reliable data about participation in pairing hangouts we should be able to assess some objective impact of our usability fixes as they roll out.  We might find that there are still lots of hangouts with only one participant, in which case we'll need to investigate why, and possibly improve awareness of live status and further smooth the joining process.  We might find that actually the majority of hangouts have multiple participants, and then we could switch focus to a more detailed analysis of how long participants spend in hangouts, getting feedback from pair session participants about their experience, and moving to display pairing activities on user profiles to reward them for diligent pairing activities and encourage repeat pairing activities.
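Option 1 might look something like the following sketch (assumed names and shapes, not the real WebSiteOne models): merge each ping's participant snapshot into a running set keyed by person id, instead of overwriting the previous snapshot.

```ruby
# Sketch of option 1: an EventInstance that accumulates every participant
# reported across all pings, rather than keeping only the latest snapshot.
class EventInstance
  attr_reader :participants  # person id => display name

  def initialize
    @participants = {}
  end

  # `snapshot` has the shape HangoutConnection sends back:
  # { "0" => { ..., "person" => { "id" => ..., "displayName" => ... } }, ... }
  def record_ping(snapshot)
    snapshot.each_value do |slot|
      person = slot["person"]
      @participants[person["id"]] = person["displayName"]
    end
  end
end
```

With this approach, someone who pairs for the first ten minutes and then leaves still appears in the participant list, so counts of single-participant hangouts would reflect the whole session rather than whoever happened to be present at the last ping.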

Personally I find this all intensely fascinating to the exclusion of almost everything else.  There's a real chance here to use the data to help adjust the usability of the system to deliver more value and more positive learning experiences.

Monday, June 20, 2016

Moving Beyond Toy Problems

What is life for?  To follow your passion. What is my passion? I find myself frustrated with the closed source, closed development nature of many professional projects; and on the flip side equally frustrated with the trivial nature of academic "toy" problems designed for "learning".

I love the "in principle" openness of the academic sphere and the "reality bites" nature of real professional projects.  I say "in principle" about academic openness because, while the results of many experiments are made freely available, sharing the source code and data (let alone allowing openness as to the process) is often an afterthought, if it happens at all.  The MOOC revolution has exposed the contents of many university courses, which is a fantastic step forward, but the contents themselves are often removed from the reality of professional projects, being "toys" created for the purpose of learning.

Toy problems for learning make sense if we assume that learners will be intimidated or overwhelmed by the complexity of a real project.  Some learners might be ready to dive in, but others may prefer to take it slow, step by step.  That's great - I just don't personally want to be spending my time devising toy problems, or at least not the majority of my time.  It also seems to me that the real learning lies in the repeated compromises one has to make in order to get a professional project out the door: balancing the desire for clarity, maintainability, readability and craftsmanship against getting features delivered and actually having an impact that someone cares about.

Professional projects are typically closed source and closed development, although there are more and more open source projects in the professional sphere.  The basic idea seems to be: we are doing something special and valuable, and we don't want you to see our secret sauce, or the mistakes we are making along the way.  Thus it might be considered anti-competitive for a company to reveal too much about the process it uses to develop its software products.  That said, companies like ThoughtBot publish their playbook, giving us an insight into their process and perhaps increasing our trust that their process is a good one.  Even so we don't get to see the "actual" process, so it's not ideal for others trying to learn; but then most companies are not trying to support the learning process for those outside.

Personally I want to have a global discussion that everyone can take part in, if they want to.  I want an informed debate about the process of developing software where we have real examples from projects - real processes - where we can all look at what actually happened rather than relying on people's subjective summaries.

Maybe this is just impossible, and an attempt at the pure "open development" process of AgileVentures is destined to fail, because by exposing exactly how we do everything we can't build up the value to sustain our project?  That's what professional companies do, right?  They have a hidden process, focus attention on the positive results of that process, and then increase the perception that they have something worth paying for.  To the extent that they are successful they build up a reputation that will sustain them with paying customers, because those customers are inclined to believe the chance is good they'll get value for money.

If the customer had totally transparent access to every part of what goes on, they could just replicate it themselves, right? :-) Or a competitor could provide the same service for less?  However there's a strength in openness - it shows that you believe in yourself; you demonstrate that you've followed the hard path through the school of hard knocks, and that maybe you are the right people to adapt fast to the next change, even if others could copy aspects of what you do.

Everyone should have the opportunity to learn and boost their skills by taking part in such an open process.  The shy might not want to participate directly, but they can learn by observing a real process of the kind they otherwise wouldn't see until they actually start a job.  It's the old catch-22: "no job, no experience; no experience, no job".

This is what I stand for, and what AgileVentures stands for: an open process beyond open source, which we call "Open Development".  Meetings with clients, planning meetings, coding sessions - everything is open and available to all.  Where customers have confidential data, that is hidden; but otherwise we are as open as possible.  Of course that generates too much material for anyone to absorb, and we need help curating it, but the most amazing learning happens when new folk engage and take part in the process - thinking about the code in their pull requests in relation to the needs of a non-technical client being articulated by a project manager who cares about their learning, but also about the success of the project.  Come get involved, be brave, be open, and get that experience that you can't get without a job, or that you can't get in your current job.