```python
coords_1, labels_1 = bx(100, 10, 0.5)
```
# Generate anchor boxes

Methods to generate anchor boxes of different aspect ratios.
To generate anchor boxes, we need three basic pieces of information:

- Input image size, `image_sz`: to position our anchor boxes within the maximum coordinates (width, height) of the image.
- Feature map size, `feature_sz`: the size (width, height) of the output of a convolutional operation. A \(10\times10\) feature map means \(10\times10\) local receptive field locations can be traced back into the input image. These 100 receptive field locations (\(10\times10=100\)) in the input image act as our initial anchor box candidates.
- Aspect ratio of anchor boxes, `asp_ratio`: to generate anchor boxes with different width-to-height ratios (default `asp_ratio=1`).
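To make the three inputs concrete, here is a minimal sketch of how a feature-map grid can be mapped back to boxes in the image. This is an illustration under stated assumptions, not pybx's implementation: the function name `simple_anchors`, the centre-based layout, and the area-preserving aspect-ratio scaling are all choices made for this example.

```python
import numpy as np

def simple_anchors(image_sz, feature_sz, asp_ratio=1.0):
    """One pascal_voc (x_min, y_min, x_max, y_max) box per feature-map cell."""
    iw, ih = image_sz
    fw, fh = feature_sz
    sx, sy = iw / fw, ih / fh          # stride of the receptive-field grid
    w = sx * asp_ratio ** 0.5          # scale width and height so that
    h = sy / asp_ratio ** 0.5          # w / h == asp_ratio for a square grid
    boxes = []
    for j in range(fh):
        for i in range(fw):
            cx, cy = (i + 0.5) * sx, (j + 0.5) * sy   # cell centre in the image
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return np.array(boxes)
```

A \(10\times10\) feature map over a \(100\times100\) image then yields 100 boxes, each \(10\times10\) pixels when `asp_ratio=1`.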
## `bx`

```python
bx(image_sz:(int, tuple), feature_sz:(int, tuple), asp_ratio:float=None,
   clip:bool=True, named:bool=True, anchor_sfx:str='a', min_visibility:float=0.25)
```
Calculate anchor box coords given an image size and feature size for a single aspect ratio.
| | Type | Default | Details |
|---|---|---|---|
| `image_sz` | `(int, tuple)` | | image size (width, height) |
| `feature_sz` | `(int, tuple)` | | feature map size (width, height) |
| `asp_ratio` | `float` | `None` | aspect ratio (width:height), by default `None` |
| `clip` | `bool` | `True` | whether to apply `np.clip`, by default `True` |
| `named` | `bool` | `True` | whether to return `(coords, labels)`, by default `True` |
| `anchor_sfx` | `str` | `'a'` | suffix anchor label with `anchor_sfx`, by default `'a'` |
| `min_visibility` | `float` | `0.25` | condition for a box to be considered valid: the minimum ratio of a box's area after clipping to image dimensions to its expected (unclipped) area, by default `0.25` |
| **Returns** | `ArrayLike` | | anchor box coordinates in `pascal_voc` format; if `named=True`, a list of anchor box labels is also returned |
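The `min_visibility` condition can be pictured with a small sketch. This is an assumption about the behaviour described above, not pybx's code: a box survives only if its area after clipping to the image keeps at least `min_visibility` of its expected area.

```python
def keep_box(box, image_sz, min_visibility=0.25):
    """Illustrative min_visibility check for a pascal_voc box (an assumption,
    not pybx's implementation)."""
    x1, y1, x2, y2 = box
    expected = (x2 - x1) * (y2 - y1)                  # area before clipping
    cx1, cy1 = max(x1, 0), max(y1, 0)                 # clip to image bounds
    cx2, cy2 = min(x2, image_sz[0]), min(y2, image_sz[1])
    clipped = max(cx2 - cx1, 0) * max(cy2 - cy1, 0)   # area after clipping
    return expected > 0 and clipped / expected >= min_visibility
```

A box hanging a quarter inside the image is kept at the default threshold; one that is almost entirely outside is dropped.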
Usually, multiple anchor boxes with different `feature_sz` and `asp_ratio` values are needed. This requirement arises in multiscale object detection, where feature maps from different convolution operations of the network are traced back into the input image to generate anchor boxes. The `bxs` method of `pybx` provides this capability.
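The multiscale idea can be sketched as a loop over every (feature size, aspect ratio) pair, concatenating one grid of boxes per pair. The helper names and the centre-grid layout below are illustrative assumptions, not pybx internals:

```python
from itertools import product

import numpy as np

def grid_boxes(image_sz, feature_sz, asp_ratio=1.0):
    """One pascal_voc box per feature-map cell, shaped by asp_ratio."""
    iw, ih = image_sz
    fw, fh = (feature_sz, feature_sz) if isinstance(feature_sz, int) else feature_sz
    sx, sy = iw / fw, ih / fh                              # receptive-field stride
    w, h = sx * asp_ratio ** 0.5, sy / asp_ratio ** 0.5    # w / h == asp_ratio
    cx, cy = np.meshgrid((np.arange(fw) + 0.5) * sx, (np.arange(fh) + 0.5) * sy)
    cx, cy = cx.ravel(), cy.ravel()                        # cell centres
    return np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)

def multiscale_boxes(image_sz, feature_szs, asp_ratios):
    # one anchor grid per (feature size, aspect ratio) combination
    return np.concatenate([grid_boxes(image_sz, fs, ar)
                           for fs, ar in product(feature_szs, asp_ratios)])
```

With `feature_szs=[10, 8, 5, 2]` and three aspect ratios, this yields (100 + 64 + 25 + 4) × 3 = 579 raw candidates, matching the example below.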
## `bxs`

```python
bxs(image_sz:(int, tuple), feature_szs:list=None, asp_ratios:list=None,
    named:bool=True, **kwargs)
```
Calculate anchor box coords given an image size and multiple feature sizes for multiple aspect ratios.
| | Type | Default | Details |
|---|---|---|---|
| `image_sz` | `(int, tuple)` | | image size (width, height) |
| `feature_szs` | `list` | `None` | list of feature map sizes, each an `int` or `tuple`, by default `[(8, 8), (2, 2)]` |
| `asp_ratios` | `list` | `None` | list of aspect ratios (width:height) for anchor boxes, each a `float`, by default `[1 / 2.0, 1.0, 2.0]` |
| `named` | `bool` | `True` | whether to return `(coords, labels)`, by default `True` |
| `kwargs` | | | |
| **Returns** | `ArrayLike` | | anchor box coordinates in `pascal_voc` format; if `named=True`, a list of anchor box labels is also returned |
```python
coords, labels = bxs(100, [10, 8, 5, 2], [1, 0.5, 0.3])
coords.shape, len(labels)
```

```
((579, 4), 579)
```
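The 579 here is plain grid arithmetic: each square feature map of size \(n\) contributes \(n\times n\) candidate boxes, and every candidate is repeated once per aspect ratio (no boxes are discarded in this example):

```python
feature_szs = [10, 8, 5, 2]
asp_ratios = [1, 0.5, 0.3]
# each feature map contributes n*n cells, repeated once per aspect ratio
n_boxes = sum(n * n for n in feature_szs) * len(asp_ratios)
print(n_boxes)  # (100 + 64 + 25 + 4) * 3 = 579
```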
All methods also work with an asymmetric `image_sz` (and/or `feature_szs`):
```python
coords, labels = bxs((100, 200), [10, 8, 5, 2], [1, 0.5, 0.3])
coords.shape, len(labels)
```

```
((654, 4), 654)
```