Showing posts with label resilience. Show all posts

Saturday, January 31, 2026

How to Fix NPEs forever

Some developers say that null is Java's biggest flaw: null causes NPEs, and NPEs crash software. IMHO this is the same vibe as C vs. Rust, and I totally disagree. Crashing software is caused by developers, not by the programming language.

The easiest and best way to prevent NPEs is … proper exception handling. It is as simple as it sounds.

By the way, NPEs are often a side effect of OOP. Every object is unsafe until you have checked it. But checks are unsafe as well, so go with exception handling.
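To make that a bit more concrete, here is a minimal sketch of how I read "go with exception handling". The class `NpeBoundaryDemo`, the map `CITY_BY_USER` and both method names are made up for this example. The idea is that the inner method does no null checks at all and the caller decides at one place what happens when an NPE shows up.

```java
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical example class, not taken from a real project.
class NpeBoundaryDemo {
    private static final Logger LOG = Logger.getLogger(NpeBoundaryDemo.class.getName());

    // Assumed lookup table; get() returns null for unknown users.
    static final Map<String, String> CITY_BY_USER = Map.of("alice", "Berlin");

    // No null check here: this throws a NullPointerException for unknown users.
    static int cityNameLength(String userId) {
        return CITY_BY_USER.get(userId).length();
    }

    // The NPE is handled at one place: log it with context and return a default.
    static int cityNameLengthOrZero(String userId) {
        try {
            return cityNameLength(userId);
        } catch (NullPointerException e) {
            LOG.log(Level.SEVERE, "No city found for user " + userId, e);
            return 0;
        }
    }
}
```

Whether returning a default is right depends on the application. For critical code, re-throwing after logging may be the better choice.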

 


 

Saturday, January 24, 2026

Increase resilience: Decouple external system calls

For most software companies resilience means that the replication factor is 2 or higher.

 

Well, that's far from good. Your resilience is still bad with two instances of a weak microservice.

One way to increase the internal resilience of your service is to decouple external service calls. This includes database operations, Kafka, and HTTP. For HTTP calls this is very common, and most developers understand that HTTP calls can go wrong. But the same is true for database calls: all external services run over a network with firewalls, load balancers, switches, ... So it's a good idea to make all external calls more resilient:

  1. Put the external calls into their own threads. This keeps your application running in case of an error or timeout.
  2. Put a timeout or watchdog on them. By the way, Java futures are an easy way to do this.
  3. Check the results also for write operations. 
  4. Extend logs and monitoring to recognize external call errors.
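
The four points above could look roughly like this with plain Java futures. This is only a sketch under assumptions: the class `ExternalCallDemo`, the method `callWithTimeout`, the pool size and the call names are invented for illustration, and the lambdas stand in for a real database or HTTP client.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper, not a library API.
class ExternalCallDemo {
    private static final Logger LOG = Logger.getLogger(ExternalCallDemo.class.getName());

    // Point 1: external calls get their own threads (daemon, so a hung call cannot block shutdown).
    private static final ExecutorService EXTERNAL_POOL = Executors.newFixedThreadPool(4, r -> {
        Thread t = new Thread(r, "external-call");
        t.setDaemon(true);
        return t;
    });

    // Runs the call on the pool and gives up after timeoutMillis.
    // Returns an empty Optional on timeout, failure, interruption or a null result.
    static <T> Optional<T> callWithTimeout(String name, Callable<T> call, long timeoutMillis) {
        Future<T> future = EXTERNAL_POOL.submit(call);
        try {
            // Point 2: the timeout on the future acts as the watchdog.
            return Optional.ofNullable(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker thread
            // Point 4: make external call errors visible in the logs.
            LOG.log(Level.SEVERE, "External call timed out: " + name);
        } catch (ExecutionException e) {
            LOG.log(Level.SEVERE, "External call failed: " + name, e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            LOG.log(Level.SEVERE, "Interrupted while waiting for: " + name);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Point 3: check the result of a write too (here: the assumed number of inserted rows).
        Optional<Integer> rows = callWithTimeout("insertOrder", () -> 1, 500);
        System.out.println(rows.filter(n -> n == 1).isPresent() ? "write ok" : "write failed");

        // A call that is slower than its timeout.
        Optional<String> slow = callWithTimeout("slowHttp", () -> {
            Thread.sleep(5_000);
            return "late";
        }, 100);
        System.out.println(slow.isPresent() ? "got response" : "no response");
    }
}
```

The caller keeps running in every case and only has to decide what an empty result means for it.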

Wednesday, February 24, 2021

Reduce the code to the minimum

Reduce the code to the minimum, but not less. This fundamental principle of mechanical engineering also fits software engineering and software resilience extremely well. Remove all unneeded wrappers, boilerplate code, and indirections.

  1. Less code means fewer bugs
  2. Less code keeps your software free of side effects
  3. Less code means less hidden knowledge
  4. Less code means fewer dependencies
  5. Less code means less code to read

But be careful. Be sure that your code is well covered by tests. Every mature project comes into a phase where the software developers delete more lines than they write.

Tuesday, February 23, 2021

Exception handling - resilience in small

One of the most discussed software development topics is the right exception handling. Exception handling is one of the basic building blocks of resilience and, unsurprisingly, the correct exception handling depends on your application.

But here are some tips that work well and keep exception handling easy:

  1. Find out how important this place in the code is. If you compare it with a car, is it the brake or is it the seat heating?
  2. If your code is in the "seat heating" category then:
    1. Catch the exception.
    2. Log the exception as error.
    3. Maybe count the errors as a metric and put the metric on a dashboard.
  3. If your code is in the "brake" category then:
    1. Maybe catch and log the exception with its context, e.g. the related method parameters.
    2. Re-throw the exception and handle it on the presentation layer (front end).
  4. If you can't clearly decide between the categories "seat heating" and "brake":
    1. Handle the exception like category "seat heating".
    2. Create an alert on the metric.
  5. If you have a fallback, use the fallback and count the usage via log message or as a metric.
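
In Java, the categories from the list could be sketched like this. All names (`ExceptionCategoriesDemo`, `seatHeating`, `brake`, `withFallback`) are invented for this example, and the `AtomicLong` counters are only stand-ins for whatever metrics library you really use.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical example class; the counters stand in for real metrics.
class ExceptionCategoriesDemo {
    private static final Logger LOG = Logger.getLogger(ExceptionCategoriesDemo.class.getName());

    static final AtomicLong seatHeatingErrors = new AtomicLong();
    static final AtomicLong fallbackUsages = new AtomicLong();

    // "Seat heating": catch, log as error, count, and carry on.
    static void seatHeating(Runnable task) {
        try {
            task.run();
        } catch (RuntimeException e) {
            LOG.log(Level.SEVERE, "Non-critical task failed", e);
            seatHeatingErrors.incrementAndGet();
        }
    }

    // "Brake": log with context, then re-throw so the presentation layer can handle it.
    static <T> T brake(String context, Supplier<T> task) {
        try {
            return task.get();
        } catch (RuntimeException e) {
            LOG.log(Level.SEVERE, "Critical task failed, context: " + context, e);
            throw e;
        }
    }

    // Fallback: use it, but count every usage so it shows up on a dashboard.
    static <T> T withFallback(Supplier<T> primary, T fallback) {
        try {
            return primary.get();
        } catch (RuntimeException e) {
            LOG.log(Level.WARNING, "Primary failed, using fallback", e);
            fallbackUsages.incrementAndGet();
            return fallback;
        }
    }
}
```

For the undecided case from point 4 you would use `seatHeating` and put an alert on the error counter.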

Saturday, August 22, 2020

Rule: Never Deploy on Friday


 

Many developers know the unwritten rule: Never deploy on Friday. The reason for this rule is that nobody wants to repair a live system at the weekend.


If you live by this rule, it's a very good indicator of three things that are wrong in your software development:

  1. Your application is not well tested for resilience e.g. with Chaos Engineering tools.
  2. Your deployment pipeline is not stable enough, or you don't trust it and your tests.
  3. You're not familiar with rollbacks, or you have no feature toggles.

One of the major goals is to trust your software and your tests so that you feel comfortable deploying on any Friday.

One of my major projects went online on a Friday afternoon, when most developers were not in the office, and it was a success.