Comparison of strategies ======================== Different strategies -------------------- Different coloring strategies lead to different results, but also have different performance. It all depends on preferences, what is the goal. If one want visually balanced result, ``'balanced'`` strategy could be the right choice. It comes with four different modes of balancing - ``'count'``, ``'area'``, ``'distance'``, and ``'centroid'``. The first one attempts to balance the number of features per each color, second the area covered by each color, and two last based on the distance between features. Either represented by the geometry itself or its centroid (a bit faster). Other strategies might be helpful if one wants to minimize number of colors as not all strategies use the same amount in the end. Or they just might look better on your map. Strategies used in ``greedy`` have two origins - ``'balanced'`` is ported from QGIS while the rest comes from ``networkX``. Below is a comparison of performance and the result of each of the strategies supported by ``greedy``. .. code:: python import geopandas as gpd import pandas as pd from time import time import numpy as np import libpysal import seaborn as sns sns.set() from greedy import greedy When using ``'balanced'`` strategy with ``'area'``, ``'distance'``, or ``'centroid'`` modes, keep in mind that your data needs to be in projected CRS to obtain correct results. For the simplicity of this comparison, let’s pretend that dataset below is (even though it is not). .. code:: python world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres')) Performance ----------- Code below generates each option 20 times and returns the mean time elapsed together with the number of colors used. .. code:: python strategies = ['balanced', 'largest_first', 'random_sequential', 'smallest_last', 'independent_set', 'connected_sequential_bfs', 'connected_sequential_dfs', 'saturation_largest_first'] balanced_modes = ['count', 'area', 'centroid', 'distance'] times = {} sw = libpysal.weights.Queen.from_dataframe( world, ids=world.index.to_list(), silence_warnings=True ) for strategy in strategies: if strategy == 'balanced': for mode in balanced_modes: print(strategy, mode) timer = [] for run in range(20): s = time() colors = greedy(world, strategy=strategy, balance=mode, sw=sw) e = time() - s timer.append(e) world[strategy + '_' + mode] = colors times[strategy + '_' + mode] = np.mean(timer) print('time: ', np.mean(timer), 's; ', np.max(colors) + 1, 'colors') else: print(strategy) timer = [] for run in range(20): s = time() colors = greedy(world, strategy=strategy, sw=sw) e = time() - s timer.append(e) world[strategy] = colors times[strategy] = np.mean(timer) print('time: ', np.mean(timer), 's; ', np.max(colors) + 1, 'colors') As you can see below, ``smallest_last`` and ``saturation_largest_first`` were able, for this particular dataset, to generate greedy coloring using only 4 colors. If one wants to use higher number than the minimal, ``'balanced'`` strategy allows setting of ``min_colors`` to be used. .. parsed-literal:: balanced count time: 0.001084136962890625 s; 5 colors balanced area time: 0.040719664096832274 s; 5 colors balanced centroid time: 0.6460193037986756 s; 5 colors balanced distance time: 1.7454206824302674 s; 5 colors largest_first time: 0.00638657808303833 s; 5 colors random_sequential time: 0.007817411422729492 s; 6 colors smallest_last time: 0.012545084953308106 s; 4 colors independent_set time: 0.15774503946304322 s; 5 colors connected_sequential_bfs time: 0.010410833358764648 s; 5 colors connected_sequential_dfs time: 0.010940515995025634 s; 5 colors saturation_largest_first time: 0.03293987512588501 s; 4 colors .. code:: python times = pd.Series(times) ax = times.plot(kind='bar') ax.set_yscale("log") .. image:: images/strategies/output_7_0.png Plot above shows the performance of each strategy. Note that the vertical axis is in seconds using log scale. Resulting maps -------------- Below are all results plotted on the map. .. code:: python for strategy in times.index: ax = world.plot(strategy, categorical=True, figsize=(16, 12), cmap='Set3', legend=True) ax.set_axis_off() ax.set_title(strategy) Balance by ``'count'`` is the fastest of all algorithms, but not always leads to the optimal results. Colors can be close to each other and if the sizes of polygons are disproportionally distributed, it might not look nice: .. image:: images/strategies/output_9_0.png Balance by ``'area'`` tries to cover the same areas with each color. Consider the largest country - Russia uses color which is not used by many other: .. image:: images/strategies/output_9_1.png Balance by distance between ``'centroids'`` generate colors to be equally distributed across the map. However, using centroids might cause some inaccuracy (consider USA with Alaska and Hawaii): .. image:: images/strategies/output_9_2.png Balance by ``'distance'`` between polygons attempts to do the same as ``'centroids'``, but using the whole geometries. For that reason, it can be really slow: .. image:: images/strategies/output_9_3.png Strategies ``'smallest_last'`` and ``'saturation_largest_first'`` are the most effective for this particular dataset as they result in 4 colors only: .. image:: images/strategies/output_9_6.png .. image:: images/strategies/output_9_10.png Remaining strategies: .. image:: images/strategies/output_9_4.png .. image:: images/strategies/output_9_5.png .. image:: images/strategies/output_9_7.png .. image:: images/strategies/output_9_8.png .. image:: images/strategies/output_9_9.png