Comparison of strategies

Different strategies

Different coloring strategies lead to different results, but also have different performance. It all depends on preferences, what is the goal.

If one want visually balanced result, 'balanced' strategy could be the right choice. It comes with four different modes of balancing - 'count', 'area', 'distance', and 'centroid'. The first one attempts to balance the number of features per each color, second the area covered by each color, and two last based on the distance between features. Either represented by the geometry itself or its centroid (a bit faster).

Other strategies might be helpful if one wants to minimize number of colors as not all strategies use the same amount in the end. Or they just might look better on your map. Strategies used in greedy have two origins - 'balanced' is ported from QGIS while the rest comes from networkX.

Below is a comparison of performance and the result of each of the strategies supported by greedy.

import geopandas as gpd
import pandas as pd
from time import time
import numpy as np
import libpysal
import seaborn as sns

from greedy import greedy

When using 'balanced' strategy with 'area', 'distance', or 'centroid' modes, keep in mind that your data needs to be in projected CRS to obtain correct results. For the simplicity of this comparison, let’s pretend that dataset below is (even though it is not).

world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))


Code below generates each option 20 times and returns the mean time elapsed together with the number of colors used.

strategies = ['balanced', 'largest_first', 'random_sequential',
              'smallest_last', 'independent_set',
              'connected_sequential_bfs', 'connected_sequential_dfs',
balanced_modes = ['count', 'area', 'centroid', 'distance']

times = {}
sw = libpysal.weights.Queen.from_dataframe(
    world, ids=world.index.to_list(), silence_warnings=True
for strategy in strategies:
    if strategy == 'balanced':
        for mode in balanced_modes:
            print(strategy, mode)
            timer = []
            for run in range(20):
                s = time()
                colors = greedy(world, strategy=strategy,
                                balance=mode, sw=sw)
                e = time() - s
            world[strategy + '_' + mode] = colors
            times[strategy + '_' + mode] = np.mean(timer)
            print('time: ', np.mean(timer), 's; ',
                  np.max(colors) + 1, 'colors')
        timer = []
        for run in range(20):
            s = time()
            colors = greedy(world, strategy=strategy, sw=sw)
            e = time() - s
        world[strategy] = colors
        times[strategy] = np.mean(timer)
        print('time: ', np.mean(timer), 's; ',
              np.max(colors) + 1, 'colors')

As you can see below, smallest_last and saturation_largest_first were able, for this particular dataset, to generate greedy coloring using only 4 colors. If one wants to use higher number than the minimal, 'balanced' strategy allows setting of min_colors to be used.

balanced count
time:  0.001084136962890625 s;  5 colors
balanced area
time:  0.040719664096832274 s;  5 colors
balanced centroid
time:  0.6460193037986756 s;  5 colors
balanced distance
time:  1.7454206824302674 s;  5 colors
time:  0.00638657808303833 s;  5 colors
time:  0.007817411422729492 s;  6 colors
time:  0.012545084953308106 s;  4 colors
time:  0.15774503946304322 s;  5 colors
time:  0.010410833358764648 s;  5 colors
time:  0.010940515995025634 s;  5 colors
time:  0.03293987512588501 s;  4 colors
times = pd.Series(times)
ax = times.plot(kind='bar')

Plot above shows the performance of each strategy. Note that the vertical axis is in seconds using log scale.

Resulting maps

Below are all results plotted on the map.

for strategy in times.index:
    ax = world.plot(strategy, categorical=True, figsize=(16, 12),
                    cmap='Set3', legend=True)

Balance by 'count' is the fastest of all algorithms, but not always leads to the optimal results. Colors can be close to each other and if the sizes of polygons are disproportionally distributed, it might not look nice:


Balance by 'area' tries to cover the same areas with each color. Consider the largest country - Russia uses color which is not used by many other:


Balance by distance between 'centroids' generate colors to be equally distributed across the map. However, using centroids might cause some inaccuracy (consider USA with Alaska and Hawaii):


Balance by 'distance' between polygons attempts to do the same as 'centroids', but using the whole geometries. For that reason, it can be really slow:


Strategies 'smallest_last' and 'saturation_largest_first' are the most effective for this particular dataset as they result in 4 colors only:

_images/output_9_6.png _images/output_9_10.png

Remaining strategies:

_images/output_9_4.png _images/output_9_5.png _images/output_9_7.png _images/output_9_8.png _images/output_9_9.png