Examples of applying filters and categorising tracks

Import essential libraries

[1]:
from pathlib import Path

from octant.core import TrackRun

Define the common data directory

[2]:
sample_dir = Path(".") / "sample_data"

Data are usually organised in a hierarchical directory structure. Here, the relevant parameters are defined.

[3]:
dataset = "era5"
period = "test"
run_id = 0

Construct the full path

[4]:
track_res_dir = sample_dir / dataset / f"run{run_id:03d}" / period

Now load the cyclone tracks themselves

[5]:
tr = TrackRun(track_res_dir)
tr
[5]:
Cyclone tracking results
Number of tracks 671
Data columns lon, lat, vo, time, area, vortex_type
Sources
sample_data/era5/run000/test

Classify the tracks

Now, to label each of the tracks within tr according to a set of filters or criteria, the classify() method should be used.

It has an alias: categorise().

Below are two examples: a simple one and a more advanced one using a function with multiple arguments.

Simple functions as filters

As its argument, classify() takes a list of tuples in the form of

[
(<labelA>, [<func1>, <func2>, ..., <funcN>]),
(<labelB>, [<func1>, <func2>, ..., <funcN>]),
...
(<labelZ>, [<func1>, <func2>, ..., <funcN>]),
],

where labelA is assigned to a track if the track satisfies all the conditions given by [<func1>, <func2>, ..., <funcN>], which is a list of one or more functions. Each of these functions must accept exactly one argument: an OctantTrack.

For example, it is possible to classify tracks by their lifetime, maximum vorticity, and distance travelled:

[6]:
conditions = [
    ("long_lived", [lambda ot: ot.lifetime_h >= 6]),
    (
        "far_travelled_and_very_long_lived",
        [lambda ot: ot.lifetime_h >= 36, lambda ot: ot.gen_lys_dist_km > 300.0],
    ),
    ("strong", [lambda x: x.max_vort > 1e-3]),
]
[7]:
tr.classify(conditions)
[8]:
tr.is_categorised, tr.is_cat_inclusive
[8]:
(True, False)
[9]:
tr
[9]:
Cyclone tracking results
Categories
671 in total
of which 247 long_lived
of which 18 far_travelled_and_very_long_lived
of which 5 strong
Data columns lon, lat, vo, time, area, vortex_type
Sources
sample_data/era5/run000/test

NB By default, the categories are NOT “inclusive”, so all categories are independent.

In this case, the 247 “long_lived” tracks, the 18 “far_travelled_and_very_long_lived” tracks, and the 5 “strong” tracks are counted independently: none of them is required to be a subset of another.

This is how the numbers change if the categorisation is inclusive: the “long_lived” subset then includes the tracks that are “far_travelled_and_very_long_lived”, and both of them include the “strong” subset.

[10]:
tr.classify(conditions, inclusive=True)
[11]:
tr.is_categorised, tr.is_cat_inclusive
[11]:
(True, True)
[12]:
tr
[12]:
Cyclone tracking results
Categories
671 in total
of which 247 long_lived
of which 18 far_travelled_and_very_long_lived|long_lived
of which 3 strong|far_travelled_and_very_long_lived|long_lived
Data columns lon, lat, vo, time, area, vortex_type
Sources
sample_data/era5/run000/test

Note that with inclusive=True, |<source_category> is automatically appended to the category labels.
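
If needed, the combined labels can be inspected with the cat_labels shortcut (described further below); presumably it lists the pipe-joined names shown in the repr above:

tr.cat_labels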

Select one or multiple categories

Selection of tracks within a category can be done as follows.

[13]:
tr.classify(conditions)
[14]:
tr
[14]:
Cyclone tracking results
Categories
671 in total
of which 247 long_lived
of which 18 far_travelled_and_very_long_lived
of which 5 strong
Data columns lon, lat, vo, time, area, vortex_type
Sources
sample_data/era5/run000/test
  • one category

[15]:
tr["strong"]
[15]:
lon lat vo time area vortex_type
track_idx row_idx
93 0 1.5 75.0 0.000418 2011-01-04 02:00:00 55228.76953 1
1 3.6 75.3 0.000403 2011-01-04 03:00:00 59789.96875 1
2 6.6 75.9 0.000403 2011-01-04 04:00:00 58341.80859 1
3 7.5 75.9 0.000421 2011-01-04 05:00:00 64806.83984 3
4 4.8 74.7 0.000431 2011-01-04 06:00:00 60390.07812 1
5 5.7 74.7 0.000449 2011-01-04 07:00:00 56248.26172 1
6 4.2 74.1 0.000444 2011-01-04 08:00:00 54530.76562 1
7 3.9 73.8 0.000436 2011-01-04 09:00:00 51079.36719 1
8 13.5 75.9 0.000470 2011-01-04 10:00:00 63191.57031 1
9 14.7 76.5 0.000484 2011-01-04 11:00:00 64762.19922 1
10 15.6 76.5 0.000516 2011-01-04 12:00:00 17545.86914 1
11 16.2 76.5 0.000528 2011-01-04 13:00:00 19758.55664 1
12 18.3 77.1 0.000524 2011-01-04 14:00:00 3609.31030 1
13 18.3 77.1 0.000749 2011-01-04 15:00:00 23973.10352 1
14 18.6 77.1 0.000935 2011-01-04 16:00:00 23797.58789 1
15 18.9 77.1 0.001016 2011-01-04 17:00:00 29484.75586 3
16 19.2 77.1 0.000953 2011-01-04 18:00:00 31794.38477 3
17 19.5 77.1 0.000889 2011-01-04 19:00:00 36683.43750 3
18 20.1 77.1 0.000815 2011-01-04 20:00:00 37243.85547 3
19 21.0 77.1 0.000724 2011-01-04 21:00:00 20711.82031 3
20 23.4 77.1 0.000725 2011-01-04 22:00:00 38624.17188 3
21 24.3 77.1 0.000815 2011-01-04 23:00:00 34974.03125 0
22 25.2 77.1 0.000875 2011-01-05 00:00:00 28838.16016 0
23 25.8 77.1 0.000853 2011-01-05 01:00:00 29850.99805 0
24 27.0 77.4 0.000763 2011-01-05 02:00:00 32273.25391 0
25 27.9 77.4 0.000795 2011-01-05 03:00:00 32486.57227 0
26 28.5 77.4 0.000776 2011-01-05 04:00:00 33306.07812 0
27 29.7 77.7 0.000727 2011-01-05 05:00:00 31798.09180 0
28 30.0 77.7 0.000766 2011-01-05 06:00:00 32786.72266 0
29 31.2 78.0 0.000711 2011-01-05 07:00:00 32779.28516 0
... ... ... ... ... ... ... ...
569 57 36.3 73.8 0.000865 2011-01-30 10:00:00 53348.92188 0
58 36.9 73.5 0.000867 2011-01-30 11:00:00 52527.41797 0
59 37.2 73.5 0.000825 2011-01-30 12:00:00 54444.71094 0
60 37.8 73.2 0.000768 2011-01-30 13:00:00 83394.43750 0
61 38.7 72.9 0.000728 2011-01-30 14:00:00 83738.90625 0
62 39.3 72.6 0.000686 2011-01-30 15:00:00 88395.61719 0
63 40.2 72.3 0.000621 2011-01-30 16:00:00 61690.12891 0
64 42.0 72.3 0.000602 2011-01-30 17:00:00 91611.58594 0
65 42.6 72.0 0.000626 2011-01-30 18:00:00 40642.46875 0
66 43.5 72.0 0.000652 2011-01-30 19:00:00 42448.89062 0
67 44.1 71.7 0.000658 2011-01-30 20:00:00 62467.35938 0
68 44.7 71.7 0.000662 2011-01-30 21:00:00 39607.70312 0
69 45.3 71.7 0.000622 2011-01-30 22:00:00 51207.28125 0
70 45.3 71.7 0.000600 2011-01-30 23:00:00 53267.82812 0
71 45.9 71.4 0.000554 2011-01-31 00:00:00 52988.43750 0
72 46.2 71.4 0.000532 2011-01-31 01:00:00 53947.32031 0
73 45.6 71.1 0.000495 2011-01-31 02:00:00 51971.46875 0
74 46.5 71.1 0.000479 2011-01-31 03:00:00 56551.71875 0
75 46.5 70.8 0.000435 2011-01-31 04:00:00 56018.83594 0
76 48.0 70.2 0.000429 2011-01-31 05:00:00 56886.23438 0
77 48.0 69.9 0.000453 2011-01-31 06:00:00 51720.19141 0
78 48.0 69.6 0.000497 2011-01-31 07:00:00 43837.63281 0
79 49.2 69.9 0.000508 2011-01-31 08:00:00 41850.42578 0
80 49.5 69.6 0.000475 2011-01-31 09:00:00 37235.42188 0
81 50.1 69.6 0.000485 2011-01-31 10:00:00 34360.56641 0
82 50.1 69.6 0.000354 2011-01-31 11:00:00 15396.19336 0
83 50.1 68.7 0.000459 2011-01-31 12:00:00 22106.04688 0
631 0 36.6 73.5 0.000371 2011-01-30 06:00:00 4751.18750 0
1 37.8 73.5 0.000310 2011-01-30 07:00:00 3506.60791 0
2 35.4 74.1 0.001001 2011-01-30 08:00:00 40545.02734 0

214 rows × 6 columns

  • several categories (AND operator)

[16]:
tr[["strong", "long_lived"]]
[16]:
lon lat vo time area vortex_type
track_idx row_idx
93 0 1.5 75.0 0.000418 2011-01-04 02:00:00 55228.76953 1
1 3.6 75.3 0.000403 2011-01-04 03:00:00 59789.96875 1
2 6.6 75.9 0.000403 2011-01-04 04:00:00 58341.80859 1
3 7.5 75.9 0.000421 2011-01-04 05:00:00 64806.83984 3
4 4.8 74.7 0.000431 2011-01-04 06:00:00 60390.07812 1
5 5.7 74.7 0.000449 2011-01-04 07:00:00 56248.26172 1
6 4.2 74.1 0.000444 2011-01-04 08:00:00 54530.76562 1
7 3.9 73.8 0.000436 2011-01-04 09:00:00 51079.36719 1
8 13.5 75.9 0.000470 2011-01-04 10:00:00 63191.57031 1
9 14.7 76.5 0.000484 2011-01-04 11:00:00 64762.19922 1
10 15.6 76.5 0.000516 2011-01-04 12:00:00 17545.86914 1
11 16.2 76.5 0.000528 2011-01-04 13:00:00 19758.55664 1
12 18.3 77.1 0.000524 2011-01-04 14:00:00 3609.31030 1
13 18.3 77.1 0.000749 2011-01-04 15:00:00 23973.10352 1
14 18.6 77.1 0.000935 2011-01-04 16:00:00 23797.58789 1
15 18.9 77.1 0.001016 2011-01-04 17:00:00 29484.75586 3
16 19.2 77.1 0.000953 2011-01-04 18:00:00 31794.38477 3
17 19.5 77.1 0.000889 2011-01-04 19:00:00 36683.43750 3
18 20.1 77.1 0.000815 2011-01-04 20:00:00 37243.85547 3
19 21.0 77.1 0.000724 2011-01-04 21:00:00 20711.82031 3
20 23.4 77.1 0.000725 2011-01-04 22:00:00 38624.17188 3
21 24.3 77.1 0.000815 2011-01-04 23:00:00 34974.03125 0
22 25.2 77.1 0.000875 2011-01-05 00:00:00 28838.16016 0
23 25.8 77.1 0.000853 2011-01-05 01:00:00 29850.99805 0
24 27.0 77.4 0.000763 2011-01-05 02:00:00 32273.25391 0
25 27.9 77.4 0.000795 2011-01-05 03:00:00 32486.57227 0
26 28.5 77.4 0.000776 2011-01-05 04:00:00 33306.07812 0
27 29.7 77.7 0.000727 2011-01-05 05:00:00 31798.09180 0
28 30.0 77.7 0.000766 2011-01-05 06:00:00 32786.72266 0
29 31.2 78.0 0.000711 2011-01-05 07:00:00 32779.28516 0
... ... ... ... ... ... ... ...
569 54 35.7 74.4 0.001075 2011-01-30 07:00:00 37938.28906 0
55 35.4 74.1 0.001001 2011-01-30 08:00:00 40545.02734 0
56 36.3 74.1 0.000975 2011-01-30 09:00:00 43897.41797 0
57 36.3 73.8 0.000865 2011-01-30 10:00:00 53348.92188 0
58 36.9 73.5 0.000867 2011-01-30 11:00:00 52527.41797 0
59 37.2 73.5 0.000825 2011-01-30 12:00:00 54444.71094 0
60 37.8 73.2 0.000768 2011-01-30 13:00:00 83394.43750 0
61 38.7 72.9 0.000728 2011-01-30 14:00:00 83738.90625 0
62 39.3 72.6 0.000686 2011-01-30 15:00:00 88395.61719 0
63 40.2 72.3 0.000621 2011-01-30 16:00:00 61690.12891 0
64 42.0 72.3 0.000602 2011-01-30 17:00:00 91611.58594 0
65 42.6 72.0 0.000626 2011-01-30 18:00:00 40642.46875 0
66 43.5 72.0 0.000652 2011-01-30 19:00:00 42448.89062 0
67 44.1 71.7 0.000658 2011-01-30 20:00:00 62467.35938 0
68 44.7 71.7 0.000662 2011-01-30 21:00:00 39607.70312 0
69 45.3 71.7 0.000622 2011-01-30 22:00:00 51207.28125 0
70 45.3 71.7 0.000600 2011-01-30 23:00:00 53267.82812 0
71 45.9 71.4 0.000554 2011-01-31 00:00:00 52988.43750 0
72 46.2 71.4 0.000532 2011-01-31 01:00:00 53947.32031 0
73 45.6 71.1 0.000495 2011-01-31 02:00:00 51971.46875 0
74 46.5 71.1 0.000479 2011-01-31 03:00:00 56551.71875 0
75 46.5 70.8 0.000435 2011-01-31 04:00:00 56018.83594 0
76 48.0 70.2 0.000429 2011-01-31 05:00:00 56886.23438 0
77 48.0 69.9 0.000453 2011-01-31 06:00:00 51720.19141 0
78 48.0 69.6 0.000497 2011-01-31 07:00:00 43837.63281 0
79 49.2 69.9 0.000508 2011-01-31 08:00:00 41850.42578 0
80 49.5 69.6 0.000475 2011-01-31 09:00:00 37235.42188 0
81 50.1 69.6 0.000485 2011-01-31 10:00:00 34360.56641 0
82 50.1 69.6 0.000354 2011-01-31 11:00:00 15396.19336 0
83 50.1 68.7 0.000459 2011-01-31 12:00:00 22106.04688 0

211 rows × 6 columns

In the same fashion, the size of each subset can be checked.

[17]:
tr.size("strong")
[17]:
5
[18]:
tr.size(["long_lived", "strong"])
[18]:
4

A group-by operation can also be used to iterate over the tracks within a subset.

[19]:
tr["strong"].gb
[19]:
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7f17ed69b898>
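
Because .gb is a standard pandas GroupBy object, it can be iterated over directly, yielding (track_idx, group) pairs. A minimal sketch, assuming only generic pandas semantics (whether each group keeps the OctantTrack type may depend on the octant version):

for track_idx, track in tr["strong"].gb:
    # each group contains the rows belonging to one track
    print(track_idx, track.shape[0])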

Where is the categorisation data stored?

After categorisation is applied to a TrackRun, the TrackRun.cats attribute holds a pandas.DataFrame of boolean flags indicating, for each track and each category, whether the track belongs to that category.

[20]:
tr.cats.head(10)
[20]:
long_lived far_travelled_and_very_long_lived strong
track_idx
0 False False False
1 True False False
2 False False False
3 False False False
4 True False False
5 False False False
6 True False False
7 True False False
8 False False False
9 False False False
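
Since TrackRun.cats is a plain boolean DataFrame, ordinary pandas operations apply to it. For example, summing each column gives the per-category counts shown in the TrackRun repr (a sketch using only standard pandas, not an octant-specific API):

tr.cats.sum()  # number of tracks flagged True in each category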

Note that by default classify() clears previous categories. To preserve them, use the clear=False keyword.
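
A hedged sketch of preserving existing categories (the "quasi_stationary" label and its condition are hypothetical, chosen purely for illustration):

extra_conditions = [("quasi_stationary", [lambda ot: ot.gen_lys_dist_km < 100.0])]
tr.classify(extra_conditions, clear=False)  # previously assigned categories are kept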

A shortcut to view the available categories is

[21]:
tr.cat_labels
[21]:
['long_lived', 'far_travelled_and_very_long_lived', 'strong']

More complex functions as filters

It is possible to categorise tracks by their proximity to the coast (land) or to other masked points in an array with geographical coordinates. For convenience, the octant.misc module contains the check_by_mask() function, which checks whether a cyclone track stays close to land points or domain boundaries for a long enough time. This function is essentially a wrapper around the octant.utils.mask_tracks() function.

[22]:
import xarray as xr

from octant.misc import check_by_mask

First, reload the TrackRun to start from a clean, uncategorised state.

[23]:
tr = TrackRun(track_res_dir)

Load the land-sea mask array from the ERA5 dataset:

[24]:
lsm = xr.open_dataarray(sample_dir / dataset / "lsm.nc")
lsm = lsm.squeeze()  # remove the singleton time dimension

Importantly, the classify() method expects functions that take one and only one argument of type OctantTrack, so to use check_by_mask() here we need to construct a partial function using functools from the standard library.

[25]:
from functools import partial
[26]:
land_mask_fun = partial(check_by_mask, trackrun=tr, lsm=lsm, dist=75.)  # and leave other parameters default

This new function has been supplied with all the additional arguments and now takes only an OctantTrack, which is exactly what classify() needs. It is then passed as a second filtering function in the list of conditions:

[27]:
new_conditions = [
    ("good_candidates", [lambda ot: ot.lifetime_h >= 6, land_mask_fun]),
    (
        "pmc",
        [
            lambda ot: ((ot.vortex_type != 0).sum() / ot.shape[0] < 0.2)
            and (ot.gen_lys_dist_km > 300.0)
        ],
    ),
]
[28]:
%%time
tr.classify(new_conditions, inclusive=True)
CPU times: user 22 s, sys: 8 ms, total: 22 s
Wall time: 22 s
[29]:
tr
[29]:
Cyclone tracking results
Categories
671 in total
of which 101 good_candidates
of which 36 pmc|good_candidates
Data columns lon, lat, vo, time, area, vortex_type
Sources
sample_data/era5/run000/test

Categorise by percentile

TrackRun also has a method, categorise_by_percentile(), to select a subset of tracks based on a per-track statistic.

For example, to select tracks whose maximum vorticity is in the top 20% (i.e. greater than the 80th percentile, oper="gt"), you can do:

[30]:
tr.categorise_by_percentile(by="max_vort", subset="pmc|good_candidates", perc=80, oper="gt")
[31]:
tr
[31]:
Cyclone tracking results
Categories
671 in total
of which 101 good_candidates
of which 36 pmc|good_candidates
of which 7 max_vort__gt__80pc|pmc|good_candidates
Data columns lon, lat, vo, time, area, vortex_type
Sources
sample_data/era5/run000/test

… or to find the weakest 5% of “good candidates”:

[32]:
tr.categorise_by_percentile("max_vort", subset="good_candidates", perc=5, oper="le")
[33]:
tr
[33]:
Cyclone tracking results
Categories
671 in total
of which 101 good_candidates
of which 36 pmc|good_candidates
of which 7 max_vort__gt__80pc|pmc|good_candidates
of which 6 max_vort__le__5pc|good_candidates
Data columns lon, lat, vo, time, area, vortex_type
Sources
sample_data/era5/run000/test

Percentile selection with a custom function

Apart from the built-in reducers of OctantTrack, e.g. max_vort, lifetime_h, total_dist_km, it is possible to construct a custom one and use it to select tracks above or below a certain percentile.

This is achieved by passing a tuple of (label, function) as the by= argument. In this case, the function is applied to each of the tracks within the TrackRun and must return a single value.

For example:

[34]:
import numpy as np
def fun(track):
    """Find easternmost point of the track."""
    return np.nanmin(track.lon.values)

tr.categorise_by_percentile(by=("easternmost", fun), subset="good_candidates", perc=10, oper="le")

tr
[34]:
Cyclone tracking results
Categories
671 in total
of which 101 good_candidates
of which 36 pmc|good_candidates
of which 7 max_vort__gt__80pc|pmc|good_candidates
of which 6 max_vort__le__5pc|good_candidates
of which 11 easternmost__le__10pc|good_candidates
Data columns lon, lat, vo, time, area, vortex_type
Sources
sample_data/era5/run000/test

Clear categories

Categories can be removed one by one or all at once. It is also possible to “overwrite” the inclusivity within this method.

[35]:
tr.clear_categories(subset="good_candidates", inclusive=False)
[36]:
tr
[36]:
Cyclone tracking results
Categories
671 in total
of which 36 pmc|good_candidates
of which 7 max_vort__gt__80pc|pmc|good_candidates
of which 6 max_vort__le__5pc|good_candidates
of which 11 easternmost__le__10pc|good_candidates
Data columns lon, lat, vo, time, area, vortex_type
Sources
sample_data/era5/run000/test
[37]:
tr.clear_categories()
[38]:
tr
[38]:
Cyclone tracking results
Number of tracks 671
Data columns lon, lat, vo, time, area, vortex_type
Sources
sample_data/era5/run000/test