Tuesday, 23 March 2021

If Software Development Has Peaked, why do Projects still fail? Part 1

by Aiden Gallagher & Peter Reeves (our podcast)

Non-Software Considerations

There are lots of reasons a project might fail, but not all of them are software related, nor are they solely the responsibility of the technical members of a team. The following concerns are based on observations from projects that have failed before.

1. Partial Decision Makers

The first stage in any project is to get an understanding of the business goals. This isn’t necessarily requirement gathering right away but simply ensuring that the value of the project and the expected return on investment are understood. It also means understanding what, if not done, will cause those goals or the project itself to become uneconomical.


In one example, a car sales company has an implementation team with two tasks to complete:

  • A new application to allow users to ‘test drive’ the car in an interactive application. The new app will increase sales by 30% to about £300,000 per year.  
  • Updating how user order changes are sent to manufacturers from a manual email to automatic real-time event sending. This will save 5 workdays a month, worth around £10,400 per year.


Suppose the team has limited resources but lacks key information such as the business goals or the financial implications. When deciding which task to prioritise, the technical SMEs on the ground may not know that the business is losing money each day the new app is not functional.


However, the team regularly ‘feel’ the impact of having to send order updates via email, which is time consuming and error prone. They think this is a relatively simple implementation that will free up someone’s time, which could then be allocated to other projects such as the new app.


Without the business context, the team will pick the emotionally preferable task rather than the one that matches business and financial goals. What seems like a sound decision based on technical facts has a knock-on impact on the business.
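
The gap between the two tasks is easier to see once both are expressed in the same unit. A minimal sketch, assuming the yearly figures quoted above represent the full value of each task and that the value accrues evenly over 365 days (both simplifications; the function and variable names are illustrative):

```python
# Hypothetical helper: turns a yearly benefit into the value lost per day of delay.
DAYS_PER_YEAR = 365  # assumption: value accrues evenly across the year


def daily_cost_of_delay(yearly_value_gbp: float) -> float:
    """Return the value forgone for each day the task is not delivered."""
    return yearly_value_gbp / DAYS_PER_YEAR


# Yearly figures taken from the car sales example above.
tasks = {
    "test-drive app": 300_000,
    "automated order updates": 10_400,
}

for name, yearly_value in tasks.items():
    cost = daily_cost_of_delay(yearly_value)
    print(f"{name}: about £{cost:,.0f} per day of delay")
```

Under these assumptions the app costs roughly £822 for every day it is delayed, against roughly £28 for the order update work, which is the kind of context the team would need in order to prioritise.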

Remediation:

  • Put deliverables into context for the implementation team including why it’s important for the business and the impact it will have e.g. This new feature will bring $X value
  • Ensure improvement tasks are quantified e.g. adding a pipeline to add a new security key will save us 3 days per quarter and will take 10 days to complete
  • Ensure task planning takes goal importance into consideration and makes goals real and relatable e.g. ‘Get NewProject into Production’ is a technical goal, but the business context might be that NewProject needs to have 100 users by the next Sprint to be profitable. The quicker it’s in production, the more likely it is to meet the goal, which should drive planning.
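
Quantifying an improvement task can be as simple as working out when it pays for itself. A rough sketch using the security key pipeline figures from the list above (10 days of effort, 3 days saved per quarter); the function name is illustrative and the calculation ignores anything other than time:

```python
def payback_quarters(effort_days: float, saved_days_per_quarter: float) -> float:
    """Number of quarters before the time saved equals the time invested."""
    if saved_days_per_quarter <= 0:
        raise ValueError("the task must save some time each quarter to pay back")
    return effort_days / saved_days_per_quarter


# Figures from the security key pipeline example above.
quarters = payback_quarters(effort_days=10, saved_days_per_quarter=3)
print(f"Pays for itself after about {quarters:.1f} quarters")
```

That works out at a little over three quarters, which gives the team a concrete number to weigh against other work.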

Warning Signs:

  • During planning meetings or informal discussions the team seem confused about the approach or question the sensibility of the plan. If there is a disconnect, it suggests the business goals related to each task aren’t well understood

2. Requirement Gathering is slow

Once a new task has been assigned to a team, it takes time to get all the requirements. However, in more agile projects, getting code into Production quickly and securely is a key benefit as is being able to pivot the strategy based on updated requirements. 


For example, a new feature lets a user set their address when they sign up for services, but feedback shows the user needs to be able to update the address; the requirements are that the update should be instantaneously pushed to the database.


The implementation team need to know what requirements they have on the database and vice versa, but getting access to the specialists can be difficult as they are already committed to other project deliverables. 

Testing requires both teams to provide resources, access control is little understood by the new development team, and once development begins, changes might be needed that do not work with the system being integrated with. 


In the database example, the database team confirm that an update can be made to the database, and the application team allow unlimited daily updates so that a mistaken address can be corrected by resubmitting. However, the database processes data in batches with a first in, last out process, which means the first (wrong) address is the last one written and is what ends up in the records.
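
The ordering problem can be reproduced in a few lines. This is an illustrative sketch only: it models the batch as a simple stack and the record as a single value that each processed update overwrites, which is an assumption about how such a database might behave rather than a description of any real product. The addresses are made up.

```python
from typing import List, Optional


def apply_batch_first_in_last_out(updates: List[str]) -> Optional[str]:
    """Process a batch first in, last out; each update overwrites the record."""
    record = None
    pending = list(updates)      # submissions in arrival order
    while pending:
        record = pending.pop()   # the newest submission is processed first
    return record                # the oldest submission is written last, so it wins


# The user submits a wrong address, then the correction.
submitted = ["12 Wrong Street", "21 Right Street"]
final = apply_batch_first_in_last_out(submitted)
print(final)  # 12 Wrong Street
```

Both teams' statements were true in isolation: updates are accepted and resubmission is allowed. The failure only shows up when the processing order is known to the people designing the feature.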


Small gaps in product functionality, product knowledge and an inability to get the time of the relevant teams or specific members of the team exacerbate this problem, especially when applications are integrating with lots of other systems.   

Remediation:

  • Define the limits of integrated systems so the design can be iteratively updated but within the confines of what is possible e.g. the database can accept any inbound data format but has a hard limit of 1000 characters and 1MB.
  • For features, products and projects requiring external integration with another team, line of business or organisation, a dedicated resource should be requested to ensure the iterative design can be reviewed at the same cadence as changes
  • For new or unfamiliar features and products, the team (designers, developers and architects) should be given sufficient time to prepare and gain education/experience. Where relevant and possible, the team should look to ‘loan’ relevant resources from within the organization
  • Lock in requirements for development cycles and allow new and updated requirements at the next design and development iteration. If this isn’t possible, the current development cycle should be pushed back to a design cycle. This reduces changes further along the route-to-live workflow.
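
Once limits like those in the first bullet are written down, they can be checked before data ever reaches the integrated system. A minimal sketch, assuming the 1000 character and 1MB figures from the example, that 1MB means 1,000,000 bytes, and that size is measured on the UTF-8 encoded payload (the function name is illustrative and not part of any real database API):

```python
from typing import List

MAX_CHARACTERS = 1000      # limit quoted in the example above
MAX_BYTES = 1_000_000      # assumption: "1MB" read as 1,000,000 bytes


def limit_violations(payload: str) -> List[str]:
    """Return a description of each documented limit the payload breaks."""
    problems = []
    if len(payload) > MAX_CHARACTERS:
        problems.append(f"{len(payload)} characters exceeds {MAX_CHARACTERS}")
    size = len(payload.encode("utf-8"))  # assumption: size measured as UTF-8
    if size > MAX_BYTES:
        problems.append(f"{size} bytes exceeds {MAX_BYTES}")
    return problems


print(limit_violations("a" * 1000))  # within both limits
print(limit_violations("a" * 1001))  # breaks the character limit only
```

A check like this in the design or unit test stage surfaces the ‘trivial’ issues long before integration testing.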

Warning Signs:

  • Trivial issues such as size limits or unsuitable data formatting are being found in the development or testing phases. This suggests that design and architect team members are not familiar or up to date with requirements
  • Team raises that there is little knowledge of a certain area
  • The team are handling lots of new items with lots of unknowns e.g. lots of new versions of products are coming into play at the same time. 

3. T Shaped Experts

It is becoming easier and easier to begin working with a product with only a little deep knowledge of it, its features, or the platform it is deployed to. A subject matter expert who once specialised in a single product might now be expected to handle a wider spectrum of associated tasks including security, testing and integration with other tools. There is also a requirement to understand and be able to deploy in Kubernetes, apply relevant checks and measures through into production and provide support in production whilst developing new features.


Not all SMEs are able to quickly build up knowledge of lots of different products and design and implement them to the same standard that a specialised SME could. This might mean that good practices (learnt over time) are not incorporated, causing problems later in the project. In theory, with quickly moving teams, the same thing could be replicated multiple times over a product, project or application’s life as the same ‘lesson’ is learnt by new people.


To accommodate this, small teams with a common ‘wide’ knowledge base and individual deep technical skills work together to form a robust skill base. E.g. an integration SME, an API SME, a security SME, a product support SME collectively complete the tasks on a team. But finding the right people can be hard.

Remediation:

  • Provide sufficient time and resources for skilling up in new technologies
  • Where possible, allow SMEs to be placed on programs and projects that need them e.g. pooling a certain skill type (cryptography) rather than expecting everyone to have a deep knowledge of it
  • Communicate with the team to find interesting roles for each team member, especially when learning new skills. For example, by allowing team members to self-select tasks during planning. 
  • Allow for longer development and delivery timelines when team members are learning or unfamiliar with a concept or feature
  • Pair team members so that more skilled members assist less skilled members in their area of deep skill

Warning Signs:

  • The team is not performing as well as previously e.g. not delivering at the same cadence
  • Lots of small issues are arising that were already found when other team members performed similar tasks. This suggests a new team member is learning something the team already has the skills to perform; the new team member should be assisted by experienced colleagues.
  • Members of the team are showing fatigue or high levels of stress. They may even have told a team mate they are struggling.

4. Simple does not Equal Cheap

There is an expectation that when something sounds simple it should be easy and therefore cheap. Take, for example, buying coffee from an online coffee delivery service. There are three functions: selecting your order, paying for your order, and planning a delivery route to ensure the coffee remains hot and arrives in accordance with Service Level Agreements (SLAs).


Selecting and processing an order seems the simplest. However, a simple-sounding function does not mean the design is simple: the stock list needs to be up to date, the online application needs to synchronise with the in-store systems, the online system needs a queueing system to handle demand, and an order that has been paid for needs to remain on the system until delivered, meaning the system and its messages need to be highly available.


The feature itself, whilst seemingly ‘simple’, has lots of other considerations at the technical level which are not always immediately obvious. This means either that things are initially missed, leading to lots of retrofitting after launch, or that lots of prework is needed to ensure the system functions efficiently and as expected, all of which has a cost. 

Remediation:

  • Ensure that all project members including management and delivery leads have a good understanding of the effort requirements relating to different pieces of work
  • Trust technical leads on the considerations and requirements of a task, and on the need to take time to understand it
  • Perform rigorous planning sessions to ensure all the work for a task is understood
  • Avoid planning with preconceptions of ‘simple’ – lots of simple tasks can take a long time

Warning Signs:

  • Lots of unplanned work is being scheduled
  • Tasks are taking longer than originally planned for
  • Technical timings for tasks are being driven by business requirements e.g. Feature needs to be in by next week, but with current resources the estimate is two weeks’ worth of work 

5. Cultural and Task Disparity Across Organizations

At an organisational level, goals are generally aligned at the highest level (make a profit, open new stores, increase sales, reduce costs etc.), but on a team by team and project by project level the prioritisation of these goals varies.


As with “Partial Decision Makers”, an infrastructure provisioning team charged with decreasing costs by 10% may not see a new feature project as being as high a priority as server utilisation work. This might be because there isn’t a wider understanding of the organisation’s primary goals and each project’s advancement of those goals, or at times because of a differing opinion on what is going to provide the best results.


When separate organizations begin collaborating to integrate systems, a problem arises where neither organization has the business context of the team it is working with. If there is more value for one team than the other, the collaboration and the drive to succeed don’t marry well, which can lead to delays or to work not being completed efficiently or effectively. 


A separate concern is when the cultures within teams, whether cross-organization or internal, are very different. This could be work hours, lead times to complete work, processes that are seen as too quick (fragile or insecure) or too process ridden (wasteful or costly) and even the way communication takes place. Think of an Agile or Lean team versus a more traditional waterfall implementation: on one side changes should be possible quickly with new evidence, while on the other careful planning ensures budget is well understood and teams do not get overloaded. 

Remediation:

  • Agree methods of working ahead of project initiations such as: how often people come into the office, general availability hours (10am – 3pm), communication techniques etc.
  • Complete an all-hands to allow each team to describe what they are working on and, more importantly, ‘why’. What is the business benefit and how will it impact others and customers? Try not to make this “We completed this task ahead of time” but instead “We completed this task ahead of time which means we saved on costs and customers should now get X benefit.”
  • Do a review of past projects to show what tangible benefits came from them in the long term e.g. over a year, two years etc.

Warning Signs:

  • Required resources are not available when needed
  • Two teams/projects have been given goals that do not have an aligned solution or implementation, but one is reliant on the other
  • A non-dedicated resource is provided. Usually this is because a resource is pooled or responds only to tickets. This can mean issues aren’t resolved quickly because there isn’t the capacity to do so.

6. Governance Perception

Governance is fast becoming embedded into more modern deployment strategies. Introducing ‘production like’ environments as early as possible has always been a goal, but often the red tape has now been cut to allow for a faster route-to-live. In some cases the impact of not following these processes can be felt for years, and in more extreme cases for the lifespan of the project.


Site reliability engineers are more often being embedded into project teams, becoming more flexible as they understand first-hand the development requirements and changes. This means processes are embedded and refined to ensure maximum compliance and forward planning for production support, whilst removing pointless processes that genuinely block progress or agility.


However, some of the processes are still insisted upon, rightly, because they come from the experience of previous deployments, and those lessons hold no matter what development style has been used to generate the features, how flexible your systems are or how many approvals are needed. 

When these continue to be seen as ‘blockers’, ‘delays’ or ‘pointless’ and common-sense governance practices are ignored, a program becomes more likely, and perhaps even certain, to fail.

Remediation:

  • Promote the retrospective sharing amongst teams of governance pitfalls that have been met and solutions that have overcome them
  • Review governance processes regularly to validate the precaution cost against the risk cost. If the precaution is more expensive, can it be scaled back to a cheaper alternative?
  • Invite site reliability engineers, security engineers and production support to review the processes in place. Are they more or less restrictive? Have better systems been put in place elsewhere?
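
One way to compare precaution cost against risk cost is to treat the risk cost as an expected yearly loss. This is a sketch under that assumption only: the model (likelihood multiplied by impact) is a common simplification rather than something prescribed here, the function names are illustrative, and every number below is a made-up placeholder.

```python
def expected_risk_cost(probability_per_year: float, impact_cost: float) -> float:
    """Expected yearly loss if the precaution is NOT in place (assumed model)."""
    return probability_per_year * impact_cost


def precaution_is_worth_it(precaution_cost_per_year: float,
                           probability_per_year: float,
                           impact_cost: float) -> bool:
    """True when the precaution costs no more than the risk it guards against."""
    return precaution_cost_per_year <= expected_risk_cost(
        probability_per_year, impact_cost
    )


# Placeholder figures: a 10% yearly chance of an incident costing 200,000.
print(precaution_is_worth_it(5_000, 0.1, 200_000))   # cheap precaution
print(precaution_is_worth_it(30_000, 0.1, 200_000))  # candidate for scaling back
```

A precaution that fails this check is not automatically pointless, but it is a good candidate for the ‘cheaper alternative’ conversation.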

Warning Signs:

  • Release of code is taking more time than the design and development
  • Work in progress is getting ‘stuck’ at governance steps with high ‘time in activity’ e.g. An approval process taking several days.
  • Valid governance processes are causing discontent amongst the team. It might be that the risk cost is not understood or that the precautions are overly restrictive.

7. Who ‘owns’ this work?

Scope creep has been and probably always will be a major issue in the planning and implementation of a project (see Requirement gathering is slow). This doesn’t change with Agile which, although flexible, is better when focused on specific aims. New tasks inserted two days into a new sprint can cause havoc and mean a rapid realignment of other expectations.


When new tasks are combined with collaboration across many teams, the who and when of the task can make for a strained relationship. Other work might suffer, and any hostility from the stress of change can damage future cooperation and the general progress of a project.


This can usually be resolved by planning for the ‘unknown’ and fostering positive relationships. It also requires a level of reasonableness: an acceptance that new work will likely mean either pushing out the timeline of current work or an increased cost to complete the new work to the same timelines.


In a lean enterprise this can be alleviated if the same team manages the work from development through to Production and if changes are contained through looser coupling. 

Remediation:

  • Be open to new requirements but ensure there is a process for getting them added to the backlog of work to be properly measured and dependencies managed with the relevant teams
  • Reduce dependencies on other teams as much as possible. Teams with greater internal control - either because they own or have the ability to develop in their own isolated space - can accommodate new requirements more easily. However, the team still need to know how to do the work
  • Allow headroom for unforeseen tasks in the planning. This might mean having some resources every cycle available to down tools and help with defects, new requirements etc.

Warning Signs:

  • New project announced
  • Change of organizational direction
  • News of competitors, regulatory changes or new promotions. 

8. Communication is Key

In films, books, tv etc. as you watch the plot unravel the common theme is usually – all of the drama was avoidable if the characters would communicate better or at all. The same happens to be true for projects.


A typical example might be where two teams have individual timelines and deadlines for completion; they may report to different line management and their goals might be different. If something changes in one of the plans e.g. deadlines, then there needs to be communication to ensure that the change can be accommodated by the relevant people and that the prioritisation of any changes to both schedules makes the most sense for the whole organization.


Communication is also integral within a team. Where team members are saying they sincerely cannot achieve deadlines AND remain secure and compliant, either the ramifications need to be understood and accepted or the timelines need to be extended.


This is the hardest of the non-software issues to ‘fix’ because there is a fine line between communicating and talking. Communication shouldn’t be a distraction from completing work; however, a lack of communication can quickly break down a project.


Trust is also important to any communicating team(s). A project manager needs to trust the team when they say something cannot be done either in time or securely, and a developer needs to trust that the prioritisation of a task has business success in mind. If a team risks anger, embarrassment or even the breakdown of a vendor/partner relationship, they will not feel comfortable communicating a true representation of the world.

Remediation:

  • Create an environment where feedback and concerns can be raised and are positively acted on
  • Ensure where plans are changed and altered there is a good communication channel with other dependent teams. Where possible ensure there is a good understanding of how other teams work through playbacks, demos and in some cases secondments

Warning Signs:

  • Communication between teams might be forthright e.g. a ‘tone’ in an email
  • Team members raise issues to others in the team and not to management
  • There is a misunderstanding of how other teams operate e.g. why things take ‘so long’, why there is a delay or why work can’t be put into the other team’s workflow straight away
