Work

Operational paradigm shift

Increasing the operational scalability of a crowd-sourced app testing platform and building deeper engagement with trusted members of the crowd through opportunities and incentives.

Company

Global App Testing

Role

Head of Product & Design

Date

March 2024

Tools

Miro

Context & objective

Project Platform was a top priority project for the company in 2024. The goal was to reduce operational overhead in order to increase the company's crowd-testing platform scalability and profitability. I was the executive owner of the project and I also worked as an individual contributor, serving as the Product Manager and Product Designer for a team of 5 software engineers. This project continued a theme of digital transformation for the company, shifting its operational paradigm. I started with data analysis, where I saw that we spent a lot of time moderating the thousands of results reported by our community of testers each month. Comparing this with other areas of internal effort and after discussions with the team, I prioritised solving the problem of operational overhead as a dependency in the moderation process with the objective of reducing that dependency by more than 80%.

Interviews

I began by speaking with each member of the project team about moderation. In order to achieve our goal, two ideas quickly surfaced: removing the moderation process entirely, and using members of the crowd as moderators to replace test managers. However, there was a lot of fear, uncertainty, and doubt regarding these two ideas.

The main concern from stakeholders was ensuring that we did not degrade the quality of results delivered to customers.

I also caught up with an internal team member, a test manager, who wasn't in the project team. They were hired after someone had noticed the high quality of work they produced as a member of the testing community. This conversation inspired me to keep pursuing the idea of how we might overcome internal fears and achieve our objective.

Task modelling

After reviewing my notes from the interviews, it seemed clear to me that I needed to understand what the moderation process was in more detail. So I set up some time to work with two test managers and learn from them.

I set up video calls with each of them to observe them moderating a set of results and then facilitated a discussion with them about the process.

I was surprised to learn that there wasn't a predefined process that they both followed, but encouraged that we could achieve significant impact. After reviewing my notes from the observation sessions and the discussion with them, I mapped out a model of the moderation task with them (synthesising what each of them did and applying a linear structure to it) in Miro, shown below.

"Acceptance rate"

Doing away with moderation would immediately reduce the operational overhead involved, but I wanted to invalidate the assumption that we could remove it, before proceeding. I used a metric that we already tracked called the 'acceptance rate' to explore the effect of removing moderation.

The acceptance rate metric helped me quickly invalidate the assumption of removing moderation entirely and allowed to focus on the next assumption: we can use the crowd to moderate results.

Notebook explorations

After modelling the task of moderation, I spent some time doing some divergent thinking and exploration in my notebook. This helps me challenge and reframe problem statements and approach solutions with a first principles mindset.

I explored the 'weekend problem' (see below) and its components to identify possible solutions. Some, but not all of these ideas progressed.

I explored the role of test managers (often called simply 'TMs') and an idea for creating community test managers (or 'CTMs') who could perform moderation and other TM activities for us.

I explored how we might structure payouts for crowd moderators.

Below are a few photos of some of the notebook explorations I did exploring moderation.

The "weekend problem"

I designed Project Platform as a cross-functional problem-solving team that included myself, another product squad (PM + Designer), two dedicated test managers from the Operations department, and a stakeholder from the Operations department.

We broke with the convention of 'shifts' with 24/7 coverage which meant that the test managers in the project only worked regular office hours Monday–Friday, which left zero support for managing tests over the weekend. The stakeholder from the Operations department and myself filled-in over the weekend, but this was not a long-term solution. We needed to find solutions that would allow us to support running tests over the weekend, without operational overhead.

Experiment

To address the concerns of stakeholders and help reduce the fear and uncertainty of the test managers, I designed an experiment that would help us test the assumption that crowd moderation would not degrade the quality of test results we delivered to customers.

I reached out to the team who manages and supports the tester community and asked for recommendations of 5 people that they thought were reliable and professional. I reached out to each of those individuals and recruited them to participate in the experiment to moderate results.

I sent each of them a set of 25 test case results in a spreadsheet and asked them to track how long it took them to finish moderating. I then analysed the results and put them into Miro for everyone to see (shown left).

The results were clear:

All 5 agreed on the moderation outcome
It took less than an hour of their time
3 of the moderators went above and beyond, improving the quality of the results (editing comments left by the reporting testers)

Strategic options

After reviewing the results of the experiment with the team, we were excited to move forward with the idea of using trusted members of the crowd to take on moderation work from our internal test managers, but there was a high-level decision that I felt needed to be made before I began creating wireframes:

Where should the 'community test managers' do moderation?

I set up a discussion with the engineers on my team to explore the pros and cons of different strategic options. After the discussion we did a blind vote of which option each of us preferred and there was a clear winner that I agreed with which gave me confidence moving forward.

Re-using existing UI that we already had in place, and segmenting user roles allowed us to simultaneously move faster with lower scope and pursue engineering initiatives to simplify our system.

Insight

After the discussion with the engineers, I opened my notebook to get started on the solution for crowd moderation using the same interface that our test managers were already using to manage tests and moderate results.

By focusing on the strategic options first, I realised that we didn't need a lot of new UI in order to achieve our goal. It seemed to me that all we would need would be implementation guidance for the engineers to achieve the following:

Segmenting users to differentiate between test managers and community test managers
Assigning community test managers to a test, and
Restricting community test manager access in the UI to only the data required for moderation

To comply with our security policies and GDPR, I mapped out exactly what views and information community test managers could access, shown below. I sent this over to our security and compliance officer for an asynchronous review and received a 👍 from them the next day.

I put a Miro board together and tagged the lead engineer on the team asynchronously and by the end of the day I had a 👍 on what needed to be implemented, shown below. Ultimately, the team delivered the solution in less than a week.

Feedback

After we finished the first test where a community test manager was used, I reached out to them for their first impressions and feedback. Over the next week, I conducted similar feedback gathering with the other community test managers after they had completed moderation.

I also coordinated with customer success to monitor the customer's perception of quality in these tests over several weeks. There was no noticeable drop in quality from any customer.

After a small iterative improvement to the process based on feedback from the community test managers, our stakeholders and the project team celebrated the ~99% reduction in operational overhead. We were also confident in automating the entire process from assignment through to payment, which moved us closer to solving the 'weekend problem'.

Challenges & solutions

There were several complexities, uncertainties, and constraints that I dealt with in this project:

Complexity: Internal team member's had high levels of FUD about giving members of the crowd access to such a sensitive activity – I resolved this through dialogue and running experiments to test our assumptions.
Complexity: I needed to maintain GDPR and ISO 27001 security compliance – I resolved this by reviewing sensitive information and PII in the interface and quickly double-checked with the security & compliance officer.
Uncertainty: There was a lot of pessimism that the quality of the results delivered to customers would be noticeably degraded – I resolved this by monitoring result quality and gathering customer feedback.
Constraint: As the Head of Product & Design, I had limited time to devote to solving this problem and dealt with a lot of context-switching throughout – I resolved this by adopting a maker's schedule for a few mornings as well as relying on asynchronous communication with various team members.
Constraint: The "weekend problem" needed to be resolved quickly to address stakeholder concerns – while this project did not resolve the "weekend problem" entirely, it effectively cut it in half. The first half, evaluating requests and launching tests, would be solved a few weeks later.

Outcome

~99%

Reduction

Internal effort as a dependency in the moderation process

Internal effort was only needed to review any moderation review requests raised by testers

This success story laid the foundation for moving more work done by TMs over to the community, whenever automation and AI were not feasible

14 hours

Internal effort saved per test

Internal effort saved / test

This solution freed up internal effort to be spent on other activities

$854

Reduction in cost per test

About

Contact