Overview:
- The idea
- Maintaining (refactoring)
- Rewriting
- Age of AI, and why we should switch from Maintain-first (Refactor-first) to Rewrite-first?
- Conclusion
The idea
If you have worked on projects with technical debt, at some point you have had to ask yourself: should we Maintain (Refactor) or Rewrite? By Maintain I mean fixing bugs, adding small features, and refactoring problematic parts. Several factors play a role in this decision, e.g. the architecture in place, the expertise of the software engineers in the team/company, the amount of technical debt, the difficulty of the refactor, existing test coverage, coupling and cohesion of the component(s) (or microservices or services), time, money, available resources, and goals.
Recently, after working on the analysis of a fairly big feature (one that would take about 6 months to implement), I have been thinking more about software design and costs. My conclusion is that we have transitioned to a new era of software design, which I call the “Era of Replaceable Software Components”, not only in Backend but also in Frontend. In my opinion, we are still 5 to 10 years away from a single AI agent creating a software product from 0 to 100 percent, or from Dynamic Chat UIs like the ones OpenAI and Google have created becoming the main interface for users.
I’m writing this with a software product in mind that has lots of tech debt. Obviously, if the code base is fresh, it won’t need major refactoring.
In the age of AI, I believe we need to shift our mentality from Maintain-first (or Refactor-first) to Rewrite-first. Today, considering the end result, it is cheaper and faster to Rewrite than to Maintain. Let me talk you into it.
Maintaining (Refactoring)
Examples of maintenance are adding a button, adding a small feature, fixing bugs, or sometimes actually refactoring a part of the code to make it better. When we refactor (maintain), we do not want to take the risk of a complete rewrite. We want to save time and money, clean up and patch, pace ourselves, and avoid breaking the product.
Advantages of Maintaining (Refactoring)
- Lower risk of breaking the product, because we only change the parts that need to change
- Less code needs to be written, because only specific parts of the code are touched
- It’s faster to achieve goals because we only fix the parts that are necessary (it’s more accurate to say it was faster, I’ll explain later)
- It’s cheaper (it’s more accurate to say it was cheaper, I’ll explain later)
- Changes are done in place, so there won’t be 2 systems that need to be maintained for a period of time
- Developers don’t need to learn new architecture patterns, or new tools, so less investment is needed
Disadvantages of Maintaining (Refactoring)
- Most of the time during refactoring, you won’t be able to upgrade the dependencies.
- These old packages will slow the developers down because their documentation is old and the way they do things differs from modern development practices. Also, search results mostly point to recent versions, so finding solutions is harder
- Security fixes will become scarcer
- Performance will drop
- Maintainers might drop support or archive the packages completely
- You won’t be able to solve the architectural problems. When the foundation is not laid correctly, everything you build on top of it will be at risk of collapse
- Touching spaghetti code always carries the risk of breaking the whole product, which may require a rollback not only of the codebase but also of the data
- High Cognitive Complexity: our brains can keep track of only a limited number of things at the same time. The higher the number, the more exhausting the refactoring (maintaining) process becomes.
- It’ll be slower on a smaller scale. Each part that we refactor takes more time because we need to make sure we do not break the other working parts. For example (speaking from experience), adding a new action (button) that would normally take an hour (frontend and backend) might take about 4-5 workdays in a codebase with technical debt
- If you refactor without a thorough analysis of side effects, you might notice in the middle of the refactoring that you have to dig deeper and refactor other parts of the app that you have no knowledge of, or you might break some processes in ways that won’t show up right away.
- If your existing code does not have good test coverage, you won’t be able to make sure the new code works properly unless you add tests to the existing code, which means you still have to partially pay for your past mistakes!
As a side note, if you have not read the book Refactoring by Martin Fowler, I highly recommend it.
Rewriting Code
When we rewrite a module, a microservice, a service, or even a complete application, we want to use what we have learned from user feedback and from our past mistakes to build something that fits the actual needs of the users. We also want to, and can, move fast.
Advantages of Rewriting
- We can move faster, because there is no technical debt to slow us down
- We can use new technologies, which means that we get many new features out of the box
- We can use new packages (dependencies), which means:
- We get the latest updates
- We have access to the latest security fixes
- Better documentation means developers can move faster
- Better support
- community
- We can change the design (architecture) which means we won’t be slowed down or limited by existing architecture
Disadvantages of Rewriting
- It’s slower (it was slower, but not anymore, I’ll explain later).
- Depending on what you are rewriting, you need stakeholders who understand software engineering and the fact that it’ll take time! If your stakeholders do not understand that maintenance (change management) is the most expensive and exhausting part of software development (changes being multiple times more time-consuming than creating new components), you have already failed. This is where the negotiation skills of the person in charge of the rewrite (the engineering lead or CTO, for example) play a big role in success.
- How are we going to keep our existing users happy?
- What happens to newly reported bugs?
- What about urgent new features?
- The answer to the last 3 questions is that it’s all about managing deliverables. Obviously, the longer it takes to deliver sellable results, the smaller your deliverables need to be so that your stakeholders do not grow impatient. Stakeholders usually want to know as soon as possible what bang they are getting for their buck!
- Split of resources: Once you start the rewrite, you need to think about the resources that will maintain the parts that are still working
- You need senior software engineers (IC4 – Owns complex projects and drives their direction. Displays in-depth and broad skills and expertise – or IC5 – A very experienced team member who leads large, high-priority projects that impact company-level goals) for a good rewrite.
- Data migrations might be required if you are dealing with bad DB design
- If you have not learned from your past mistakes, your team will make the same (or similar) mistakes during the rewrite, which will degrade your codebase fast. I’ve seen teams rewrite spaghetti code into a new codebase with lots of technical debt. Using architectural designs that your team is used to, opinionated tools, and opinionated frameworks will be a lifesaver in this case.
- You need to make sure the new code still works correctly on edge cases of the main success flows of your application.
Age of AI, and why we should switch from Maintain-first (refactor-first) to Rewrite-first mindset?
Let’s first review what is different now (this is based on my experience with GitHub Copilot):
- With copilot(s), we see an increase of about 70% to 85% in development speed, depending on the task.
- Copilot(s) work much better with new, well-documented technologies
- Copilot(s) work much better in new codebases with standard architectures
- Copilot(s) are able to generate any kind of test (unit tests, integration tests, functional tests, etc.). The only thing you have to do is create one test and keep the necessary files open in your IDE, and it can imitate that test in most cases.
- Copilot(s) are much better at debugging well-structured, typed and interfaced code
These huge changes have a drastic effect on our development process and require a shift from maintain-first to rewrite-first. Here are the reasons:
- Speed of development:
- Time requirements of coding lambdas have decreased from day(s) to hour(s).
- Time requirements of setting up new architectures have gone from days to hours.
- Bootstrapping new projects is multiple times faster
- Creating components / modules is multiple times faster
- Creating microservices is multiple times faster
- According to my calculations, where a person might take 3-4 weeks to add a feature and fix 2-3 bugs in a broken architecture or a problematic existing code base, the same person can instead use that time to rewrite the whole microservice, add the new feature, fix the same bugs, and sometimes increase the test coverage. (Obviously, this is not a rule. It depends on many factors, such as the size of the unit of work you are going to rewrite, the expertise of your developers, etc.)
- Difference in end result:
- Let me put it this way: if you could spend 4 weeks building an off-brand $100 mobile phone, or spend 5 weeks building an iPhone 15, which one would you choose? It’s kind of the same thing in many cases. If you do not have product-market fit yet, even 1 extra week does not make sense, but what if you have an established product? The time you’ll save as the months go by adds up to such a big number that rewriting modules is worth it.
- If you have an established product, to make sure what I’m saying makes sense, do this:
- Choose 2-3 modules / components / microservices / services that you have been actively maintaining for the past 6 months
- Calculate the amount of time it has cost you to keep each one working over the past 6 months
- Based on your experience, make a reasonable guess of the time you will have to spend on it in the upcoming 6 months
- The sum is the time cost of this module over a year
- Now calculate how much it would cost you to rewrite the whole thing
- In the cases I have seen, refactoring takes at least twice as much time as a rewrite. Even if the time were the same, which one would you choose?
- Now also consider the state of your code base, performance, security, existing technical debt, and your development (developer) experience as part of the difference in end result
- Although the quality of the code generated by AI might be lower in some cases, as long as you have the correct architecture in place and can treat the generated code as “replaceable software parts”, you won’t be sacrificing your software’s quality or accumulating future costs
- Cost (Let’s talk money):
- Time is money, so any difference in development time is an expense, e.g. the cost of developing a new feature in the old system vs. the new system
- Hidden costs that are not easily measurable but can be proven by collecting data:
- Infra costs: Can you save infra costs by rewriting?
- 3rd party service costs: Maybe you can move to a cheaper 3rd party?
- Tools: Maybe there’s a new player in the market that is cheaper?
- Hidden coding costs: if a developer has to spend 2 days to get a 2-hour task done because of tech debt, you have lost 14 hours in developer costs. These costs accumulate per developer per year. Calculate the “value added by development over the past year” and compare it to “what the value added could have been with the same number of developers and no tech debt”. You can go backwards as well: calculate the value added by each developer over the past year and compare it to what they cost the company. With a large amount of technical debt, you’ll end up with figures like several thousand euros for a simple REST API endpoint or a new button in the UI, and figures that are N times cheaper without it.
- Developer experience: Most software developers are ambitious and eager to learn. Working with a stale codebase means that they will lose motivation, waste a lot of energy on self-made problems, and in the end leave. In my experience, tenure and productivity are causally related: the longer developers stay in your company, the cheaper they are. (To be fair, I also watched a YouTube video in which the speaker said that, according to their data at Google, tenure does not play a big role in productivity.) Besides that, off-boarding, hiring, onboarding, knowledge transfer, and training costs should also be considered. So what is the average tenure of your software engineers? If it’s less than 3 years, you might have some big issues to solve in your team or organization.
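To make the two calculations above concrete (the one-year cost of a module vs. a rewrite, and the hours lost to tech debt on a single task), here is a minimal Python sketch. Everything in it is an assumption for illustration: the names `ModuleEstimate` and `hidden_cost_hours`, the 8-hour workday, and the numbers for the made-up `orders-service`. Plug in your own measurements and estimates.

```python
from dataclasses import dataclass

HOURS_PER_WORKDAY = 8  # assumption: one workday is 8 hours


@dataclass
class ModuleEstimate:
    """Hypothetical record for one module, component, or microservice."""
    name: str
    past_6_months_hours: float  # time actually spent keeping it working
    next_6_months_hours: float  # your reasonable guess for the next 6 months
    rewrite_hours: float        # your estimate for rewriting the whole thing

    def yearly_maintenance_hours(self) -> float:
        # The sum of both halves is the time cost of the module over a year.
        return self.past_6_months_hours + self.next_6_months_hours

    def hours_saved_by_rewrite(self) -> float:
        # Positive means the rewrite costs less than a year of maintenance.
        return self.yearly_maintenance_hours() - self.rewrite_hours


def hidden_cost_hours(actual_workdays: float, expected_hours: float) -> float:
    """Hours lost when tech debt stretches a task beyond its expected duration."""
    return actual_workdays * HOURS_PER_WORKDAY - expected_hours


# Illustrative numbers only, not measurements.
example = ModuleEstimate("orders-service", 300, 260, 280)
print(example.name, example.yearly_maintenance_hours(), example.hours_saved_by_rewrite())

# The 2-day vs. 2-hour task from the text: 2 * 8 - 2 hours lost.
print(hidden_cost_hours(actual_workdays=2, expected_hours=2))
```

This only compares time. The other parts of the end result (performance, security, remaining technical debt, developer experience) still have to be weighed separately.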
Conclusion
Today’s market is more competitive than ever, so it’s necessary to think about software and its problems in a divide-and-conquer way, as big overhauls usually end up being a disaster. That being said, although each software product or software development team has its own constraints and needs, IMO the huge changes brought about by AI (Copilot(s)) have made software development so much faster, and as a result so much cheaper, in the past year that maintaining old code bases with lots of technical debt does not make sense anymore. As a result, we need a shift from a Maintain-first (Refactor-first) to a Rewrite-first mindset to move faster and grow our businesses.