FLOAT16_DECODE

The FLOAT16_DECODE function converts 16-bit "half-precision" floating-point values into 32-bit single-precision floating-point numbers.

The function assumes that the 16-bit input values are stored using the IEEE 754 binary16 (or float16) format, with the following bit layout:

seeeeeffffffffff

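where "s" is the sign bit, "eeeee" is the 5-bit exponent (stored with a bias of 15), and the f's are the 10 fraction bits of the significand. The significand actually has 11 bits of precision because it has an implicit leading bit of 1 (unless the exponent bits are all zero, in which case the value is zero or subnormal and the leading bit is 0). An exponent field of all ones encodes Infinity (fraction bits all zero) or NaN (any fraction bit set).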
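To make the bit layout concrete, the fields can be pulled apart by hand. The following is a minimal sketch, not the implementation used by FLOAT16_DECODE: the function name float16_manual is hypothetical, it handles a single scalar bit pattern only, and it needs to be saved in a file such as float16_manual.pro (or compiled) before use.

```idl
; Hypothetical helper, not part of IDL: decode one binary16 bit pattern by hand.
function float16_manual, bits
  compile_opt idl2
  v = long(bits) and 65535L          ; keep the low 16 bits (also handles negative Int input)
  s = ishft(v, -15)                  ; sign bit: 0 or 1
  e = ishft(v, -10) and 31L          ; 5 exponent bits
  f = v and 1023L                    ; 10 fraction bits
  sgn = (s eq 1) ? -1.0 : 1.0
  ; Exponent all zeros: zero or subnormal, no implicit leading 1
  if (e eq 0) then return, sgn * f * 2.0^(-24)
  ; Exponent all ones: Infinity when the fraction is zero, otherwise NaN
  if (e eq 31) then return, (f eq 0) ? sgn * !values.f_infinity : !values.f_nan
  ; Normal number: implicit leading 1, exponent bias of 15
  return, sgn * (1.0 + f / 1024.0) * 2.0^(e - 15)
end
```

For example, float16_manual(0x3c00) gives 1.0, float16_manual(0xc000) gives -2.0, and float16_manual(0x3ff) gives the largest subnormal, about 6.09756e-05. These should agree with the values returned by FLOAT16_DECODE for the same bit patterns.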

Since IDL does not have a native 16-bit float data type, the input data must be of type Byte, Int, or Uint. For Byte input, it is assumed that each pair of bytes represents a single 16-bit binary16 number. For Int or Uint, it is assumed that each value represents the bits of the 16-bit binary16 number. In all cases, the bits will be decoded using the binary16 format, and then converted to 32-bit single-precision floats.
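The description above does not state which byte order is assumed when pairs of bytes are combined. When the byte order of the data is known, one way to avoid any ambiguity is to build the Uint values explicitly and pass those in. The sketch below assumes the bytes are stored least-significant byte first; the variable names are illustrative only.

```idl
; Four bytes holding two binary16 values, low byte first (an assumption about the data)
b = byte([0, 60, 0, 192])          ; 0x3c00 and 0xc000 as little-endian byte pairs
lo = uint(b[0:*:2])                ; even-indexed bytes: low 8 bits
hi = uint(b[1:*:2])                ; odd-indexed bytes: high 8 bits
u = lo or ishft(hi, 8)             ; rebuild the 16-bit bit patterns
print, float16_decode(u)           ; expect 1.00000 and -2.00000
```

For data stored most-significant byte first, swap the roles of the even-indexed and odd-indexed bytes.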

Note: Binary16 numbers have a limited range and precision, and are only suitable for certain applications. The smallest nonzero magnitude is approximately 5.96e-08 (2^-24, a subnormal value), while the largest finite magnitude is 65504. Numbers with a larger magnitude cannot be represented as finite values and are set to ±Infinity.

Tip: The READ_BINARY function has a FLOAT16 keyword that can be used to automatically read in float16 values from a data file.
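The exact usage of the FLOAT16 keyword is described in the READ_BINARY documentation. As an alternative sketch, the raw bit patterns can be read as Uint values with the DATA_TYPE and DATA_DIMS keywords and then decoded explicitly. The file name below is hypothetical, and the example writes and reads in the machine's native byte order.

```idl
; Write two binary16 bit patterns to a temporary file (hypothetical file name)
file = filepath('float16_demo.dat', /tmp)
openw, lun, file, /get_lun
writeu, lun, uint([0x3c00, 0xc000])   ; bit patterns for 1.0 and -2.0
free_lun, lun
; Read them back as Uint (type code 12) and decode explicitly
raw = read_binary(file, data_type=12, data_dims=2)
print, float16_decode(raw)            ; expect 1.00000 and -2.00000
file_delete, file
```

For files written on a machine with a different byte order, the ENDIAN keyword of READ_BINARY can be used when reading the Uint values.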

Example

a = uint([0, 1, 0x8001, 0x3ff, 0x3c00, 0xbc00, 0x7bff, 0x7c00, 0xfc00, 0x7fff])

b = float16_decode(a)

help, b

print, b

IDL prints:

B    FLOAT  =  Array[10]

0.00000   5.96046e-08   -5.96046e-08   6.09756e-05   1.00000   -1.00000   65504.0   Inf   -Inf   NaN

Syntax

Result = FLOAT16_DECODE( Value )

Return Value

Returns a 32-bit floating-point scalar or array with the same dimensions as Value.

Arguments

Value

A scalar or array of type Byte, Int, or Uint in binary16/float16 format to be converted to 32-bit single-precision floating-point. If this argument is type Byte, then each pair of bytes will be combined into a single Uint number before converting to a 32-bit float.

Keywords

None

Examples

Here, we decode all positive, finite float16 numbers and plot both the absolute spacing and the relative spacing between neighboring values.

Note: We are only plotting the positive float16 values (from 5.96e-08 up to 65504). The negative float16 values give identical results.

x = float16_decode(uindgen(0x7bff) + 1us)

p = plot(x[0:-2], (x[1:*] - x[0:-2]), thick=2, /xlog, /ylog, $
  dim=[900,400], layout=[2,1,1], $
  margin=[0.2, 0.15, 0.1, 0.1], font_size=12, $
  title='Difference between float16''s (x[n] $-$ x[n-1])', $
  xtitle='Float16 value', ytitle='Difference')

p = plot(x[0:-2], (x[1:*] - x[0:-2]) / x[1:*], thick=2, /xlog, /ylog, $
  layout=[2,1,2], /current, $
  margin=[0.2, 0.15, 0.1, 0.1], font_size=12, $
  title='Relative diff (x[n] $-$ x[n-1]) / x[n]', $
  xtitle='Float16 value', ytitle='Relative Difference')
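As a numerical cross-check on the plots, the spacing can also be inspected directly. This is a sketch that assumes FLOAT16_DECODE behaves as described above; the variable names are illustrative.

```idl
; Positive finite binary16 bit patterns: 0x0001 through 0x7bff
x = float16_decode(uindgen(0x7bff) + 1us)
dx = x[1:*] - x[0:-2]              ; absolute spacing between neighbors
rel = dx / x[1:*]                  ; spacing relative to the larger neighbor
print, min(dx), max(dx)            ; expect 2^-24 (about 5.96e-08) and 32.0
; Restrict to normal numbers (bit patterns 0x0400 and above, index 1023 onward)
print, min(rel[1023:*]), max(rel[1023:*])   ; roughly 4.9e-04 to 9.8e-04
```

The absolute spacing doubles with each power of two, from 2^-24 for the subnormals up to 32 just below 65504, while the relative spacing for normal numbers stays between about 2^-11 and 2^-10. This reflects the 11 bits of significand precision.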

Version History

9.1	Introduced