Английская Википедия:Find first set

Материал из Онлайн справочника
Перейти к навигацииПерейти к поиску

Шаблон:Use dmy dates In computer software and hardware, find first set (ffs) or find first one is a bit operation that, given an unsigned machine word,[nb 1] designates the index or position of the least significant bit set to one in the word counting from the least significant bit position. A nearly equivalent operation is count trailing zeros (ctz) or number of trailing zeros (ntz), which counts the number of zero bits following the least significant one bit. The complementary operation that finds the index or position of the most significant set bit is log base 2, so called because it computes the binary logarithm Шаблон:Math.[1] This is closely related to count leading zeros (clz) or number of leading zeros (nlz), which counts the number of zero bits preceding the most significant one bit.[nb 2] There are two common variants of find first set, the POSIX definition which starts indexing of bits at 1,[2] herein labelled ffs, and the variant which starts indexing of bits at zero, which is equivalent to ctz and so will be called by that name.

Most modern CPU instruction set architectures provide one or more of these as hardware operators; software emulation is usually provided for any that aren't available, either as compiler intrinsics or in system libraries.

Examples

Given the following 32-bit word:

0000 0000 0000 0000 1000 0000 0000 1000

The count trailing zeros operation would return 3, while the count leading zeros operation returns 16. The count leading zeros operation depends on the word size: if this 32-bit word were truncated to a 16-bit word, count leading zeros would return zero. The find first set operation would return 4, indicating the 4th position from the right. The truncated log base 2 is 15.

Similarly, given the following 32-bit word, the bitwise negation of the above word:

1111 1111 1111 1111 0111 1111 1111 0111

The count trailing ones operation would return 3, the count leading ones operation would return 16, and the find first zero operation ffz would return 4.

If the word is zero (no bits set), count leading zeros and count trailing zeros both return the number of bits in the word, while ffs returns zero. Both log base 2 and zero-based implementations of find first set generally return an undefined result for the zero word.

Hardware support

Many architectures include instructions to rapidly perform find first set and/or related operations, listed below. The most common operation is count leading zeros (clz), likely because all other operations can be implemented efficiently in terms of it (see Properties and relations).

Platform Mnemonic Name Operand widths Description On application to 0
ARM (ARMv5T architecture and later)
except Cortex-M0/M0+/M1/M23
clz[3] Count Leading Zeros 32 clz 32
ARM (ARMv8-A architecture) clz Count Leading Zeros 32, 64 clz Operand width
AVR32 clz[4] Count Leading Zeros 32 clz 32
DEC Alpha ctlz[5] Count Leading Zeros 64 clz 64
cttz[5] Count Trailing Zeros 64 ctz 64
Intel 80386 and later bsf[6] Bit Scan Forward 16, 32, 64 ctz Undefined; sets zero flag
bsr[6] Bit Scan Reverse 16, 32, 64 Log base 2 Undefined; sets zero flag
x86 supporting BMI1 or ABM lzcnt[7] Count Leading Zeros 16, 32, 64 clz Operand width; sets carry flag
x86 supporting BMI1 tzcnt[8] Count Trailing Zeros 16, 32, 64 ctz Operand width; sets carry flag
Itanium clz[9] Count Leading Zeros 64 clz 64
MIPS32/MIPS64 clz[10][11] Count Leading Zeros in Word 32, 64 clz Operand width
clo[10][11] Count Leading Ones in Word 32, 64 clo Operand width
Motorola 68020 and later bfffo[12] Find First One in Bit Field Arbitrary Log base 2 Field offset + field width
PDP-10 jffo Jump if Find First One 36 clz 0; no operation
POWER/PowerPC/Power ISA cntlz/cntlzw/cntlzd[13] Count Leading Zeros 32, 64 clz Operand width
Power ISA 3.0 and later cnttzw/cnttzd[14] Count Trailing Zeros 32, 64 ctz Operand width
RISC-V ("B" Extension) clz[15] Count Leading Zeros 32, 64 clz Operand width
ctz[15] Count Trailing Zeros 32, 64 ctz Operand width
SPARC Oracle Architecture 2011 and later lzcnt (synonym: lzd)[16] Leading Zero Count 64 clz 64
VAX ffs[17] Find First Set 0–32 ctz Operand width; sets zero flag
IBM z/Architecture flogr[18] Find Leftmost One 64 clz 64
vclz[18] Vector Count Leading Zeroes 8, 16, 32, 64 clz Operand width
vctz[18] Vector Count Trailing Zeroes 8, 16, 32, 64 ctz Operand width

On some Alpha platforms CTLZ and CTTZ are emulated in software.

Tool and library support

A number of compiler and library vendors supply compiler intrinsics or library functions to perform find first set and/or related operations, which are frequently implemented in terms of the hardware instructions above:

Tool/library Name Type Input type(s) Notes On application to 0
POSIX.1 compliant libc
4.3BSD libc
OS X 10.3 libc[2][19]
ffs Library function int Includes glibc. POSIX does not supply the complementary log base 2 / clz. 0
FreeBSD 5.3 libc
OS X 10.4 libc[19]
ffsl
fls
flsl
Library function int,
long
fls("find last set") computes (log base 2) + 1. 0
FreeBSD 7.1 libc[20] ffsll
flsll
Library function long long 0
GCC 3.4.0[21][22]
Clang 5.x[23][24]
__builtin_ffs[l,ll,imax]
__builtin_clz[l,ll,imax]
__builtin_ctz[l,ll,imax]
Built-in functions unsigned int,
unsigned long,
unsigned long long,
uintmax_t
GCC documentation considers result undefined clz and ctz on 0. 0 (ffs)
Visual Studio 2005 _BitScanForward[25]
_BitScanReverse[26]
Compiler intrinsics unsigned long,
unsigned __int64
Separate return value to indicate zero input Undefined
Visual Studio 2008 __lzcnt[27] Compiler intrinsic unsigned short,
unsigned int,
unsigned __int64
Relies on hardware support for the lzcnt instruction introduced in BMI1 or ABM. Operand width
Visual Studio 2012 _arm_clz[28] Compiler intrinsic unsigned int Relies on hardware support for the clz instruction introduced in the ARMv5T architecture and later. ?
Intel C++ Compiler _bit_scan_forward
_bit_scan_reverse[29][30]
Compiler intrinsics int Undefined
Nvidia CUDA[31] __clz Functions 32-bit, 64-bit Compiles to fewer instructions on the GeForce 400 series 32
__ffs 0
LLVM llvm.ctlz.*
llvm.cttz.*[32]
Intrinsic 8, 16, 32, 64, 256 LLVM assembly language Operand width, if 2nd argument is 0; undefined otherwise
GHC 7.10 (base 4.8), in Data.BitsШаблон:Citation needed countLeadingZeros
countTrailingZeros
Library function FiniteBits b => b Haskell programming language Operand width
C++20 standard library, in header <bit>[33][34] bit_ceil bit_floor
bit_width
countl_zero countl_one
countr_zero countr_one
Library function unsigned char,
unsigned short,
unsigned int,
unsigned long,
unsigned long long

Properties and relations

If bits are labeled starting at 1 (which is the convention used in this article), then count trailing zeros and find first set operations are related by Шаблон:Math (except when the input is zero). If bits are labeled starting at Шаблон:Math, then count trailing zeros and find first set are exactly equivalent operations. Given Шаблон:Math bits per word, the Шаблон:Math is easily computed from the Шаблон:Math and vice versa by Шаблон:Math.

As demonstrated in the example above, the find first zero, count leading ones, and count trailing ones operations can be implemented by negating the input and using find first set, count leading zeros, and count trailing zeros. The reverse is also true.

On platforms with an efficient log2 operation such as M68000, Шаблон:Math can be computed by:

Шаблон:Math

where Шаблон:Math denotes bitwise AND and Шаблон:Math denotes the two's complement of Шаблон:Math. The expression Шаблон:Math clears all but the least-significant Шаблон:Math bit, so that the most- and least-significant Шаблон:Math bit are the same.

On platforms with an efficient count leading zeros operation such as ARM and PowerPC, Шаблон:Math can be computed by:

Шаблон:Math.

Conversely, on machines without Шаблон:Math or Шаблон:Math operators, Шаблон:Math can be computed using Шаблон:Math, albeit inefficiently:

Шаблон:Math (which depends on Шаблон:Math returning Шаблон:Math for the zero input)

On platforms with an efficient Hamming weight (population count) operation such as SPARC's POPC[35][36] or Blackfin's ONES,[37] there is:

Шаблон:Math,[38][39] or Шаблон:Math,
Шаблон:Math[35]
Шаблон:Math

where Шаблон:Math denotes bitwise exclusive-OR, Шаблон:Math denotes bitwise OR and Шаблон:Math denotes bitwise negation.

The inverse problem (given Шаблон:Math, produce an Шаблон:Math such that Шаблон:Math) can be computed with a left-shift (Шаблон:Math).

Find first set and related operations can be extended to arbitrarily large bit arrays in a straightforward manner by starting at one end and proceeding until a word that is not all-zero (for Шаблон:Math, Шаблон:Math, Шаблон:Math) or not all-one (for Шаблон:Math, Шаблон:Math, Шаблон:Math) is encountered. A tree data structure that recursively uses bitmaps to track which words are nonzero can accelerate this.

Software emulation

Most CPUs dating from the late 1980s onward have bit operators for ffs or equivalent, but a few modern ones like some of the ARM-Mx series do not. In lieu of hardware operators for ffs, clz and ctz, software can emulate them with shifts, integer arithmetic and bitwise operators. There are several approaches depending on architecture of the CPU and to a lesser extent, the programming language semantics and compiler code generation quality. The approaches may be loosely described as linear search, binary search, search+table lookup, de Bruijn multiplication, floating point conversion/exponent extract, and bit operator (branchless) methods. There are tradeoffs between execution time and storage space as well as portability and efficiency.

Software emulations are usually deterministic. They return a defined result for all input values; in particular, the result for an input of all zero bits is usually 0 for ffs, and the bit length of the operand for the other operations.

If one has a hardware clz or equivalent, ctz can be efficiently computed with bit operations, but the converse is not true: clz is not efficient to compute in the absence of a hardware operator.

2n

The function Шаблон:Math (round up to the nearest power of two) using shifts and bitwise ORs[40] is not efficient to compute as in this 32-bit example and even more inefficient if we have a 64-bit or 128-bit operand:

function pow2(x):
    if x = 0 return invalid  // invalid is implementation defined (not in [0,63])
    x ← x - 1
    for each y in {1, 2, 4, 8, 16}: x ← x | (x >> y)
    return x + 1

FFS

Since ffs = ctz + 1 (POSIX) or ffs = ctz (other implementations), the applicable algorithms for ctz may be used, with a possible final step of adding 1 to the result, and returning 0 instead of the operand length for input of all zero bits.

CTZ

Шаблон:AnchorThe canonical algorithm is a loop counting zeros starting at the LSB until a 1-bit is encountered:

function ctz1 (x)
    if x = 0 return w
    t ← 1
    r ← 0
    while (x & t) = 0
        t ← t << 1
        r ← r + 1
    return r

This algorithm executes O(n) time and operations, and is impractical in practice due to a large number of conditional branches.

Шаблон:AnchorA lookup table can eliminate most branches:

table[0..2n-1] = ctz(i) for i in 0..2n-1
function ctz2 (x)
    if x = 0 return w
    r ← 0
    loop
        if (x & (2n-1)) ≠ 0
            return r + table[x & (2n-1)]
        x ← x >> n
        r ← r + n

The parameter n is fixed (typically 8) and represents a time–space tradeoff. The loop may also be fully unrolled. But as a linear lookup, this approach is still O(n) in the number of bits in the operand.

Шаблон:AnchorA binary search implementation takes a logarithmic number of operations and branches, as in this 32-bit version:[41][42] This algorithm can be assisted by a table as well, replacing the bottom three "if" statements with a 256 entry lookup table using the first non-zero byte encountered as an index.

function ctz3 (x)
    if x = 0 return 32
    n ← 0
    if (x & 0x0000FFFF) = 0: n ← n + 16, x ← x >> 16
    if (x & 0x000000FF) = 0: n ← n +  8, x ← x >>  8
    if (x & 0x0000000F) = 0: n ← n +  4, x ← x >>  4
    if (x & 0x00000003) = 0: n ← n +  2, x ← x >>  2
    if (x & 0x00000001) = 0: n ← n +  1
    return n

Шаблон:AnchorIf the hardware has a clz operator, the most efficient approach to computing ctz is thus:

function ctz4 (x)
    x &= -x
    return w - (clz(x) + 1)

Шаблон:AnchorAn algorithm for 32-bit ctz uses de Bruijn sequences to construct a minimal perfect hash function that eliminates all branches.[43][44] This algorithm assumes that the result of the multiplication is truncated to 32 bit.

for i from 0 to 31: table[ ( 0x077CB531 * ( 1 << i ) ) >> 27 ] ← i  // table [0..31] initialized
function ctz5 (x)
    return table[((x & -x) * 0x077CB531) >> 27]

The expression (x & -x) again isolates the least-significant 1 bit. There are then only 32 possible words, which the unsigned multiplication and shift hash to the correct position in the table. (This algorithm does not handle the zero input.)

CLZ

Шаблон:AnchorThe canonical algorithm examines one bit at a time starting from the MSB until a non-zero bit is found, as shown in this example. It executes in O(n) time where n is the bit-length of the operand, and is not a practical algorithm for general use.

function clz1 (x)
    if x = 0 return w
    t ← 1 << (w - 1)
    r ← 0
    while (x & t) = 0
        t ← t >> 1
        r ← r + 1
    return r

Шаблон:AnchorAn improvement on the previous looping approach examines eight bits at a time then uses a 256 (28) entry lookup table for the first non-zero byte. This approach, however, is still O(n) in execution time.

function clz2 (x)
    if x = 0 return w
    t ← 0xff << (w - 8)
    r ← 0
    while (x & t) = 0
        t ← t >> 8
        r ← r + 8
    return r + table[x >> (w - 8 - r)]

Шаблон:AnchorBinary search can reduce execution time to O(log2n):

function clz3 (x)
    if x = 0 return 32
    n ← 0
    if (x & 0xFFFF0000) = 0: n ← n + 16, x ← x << 16
    if (x & 0xFF000000) = 0: n ← n +  8, x ← x <<  8
    if (x & 0xF0000000) = 0: n ← n +  4, x ← x <<  4
    if (x & 0xC0000000) = 0: n ← n +  2, x ← x <<  2
    if (x & 0x80000000) = 0: n ← n +  1
    return n

The fastest portable approaches to simulate clz are a combination of binary search and table lookup: an 8-bit table lookup (28=256 1-byte entries) can replace the bottom 3 branches in binary search. 64-bit operands require an additional branch. A larger width lookup can be used but the maximum practical table size is limited by the size of L1 data cache on modern processors, which is 32 KB for many. Saving a branch is more than offset by the latency of an L1 cache miss.

Шаблон:AnchorAn algorithm similar to de Bruijn multiplication for CTZ works for CLZ, but rather than isolating the most-significant bit, it rounds up to the nearest integer of the form 2n−1 using shifts and bitwise ORs:[45]

table[0..31] = {0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30,
                8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31}
function clz4 (x)
    for each y in {1, 2, 4, 8, 16}: x ← x | (x >> y)
    return table[((x * 0x07C4ACDD) >> 27) % 32]

Шаблон:AnchorFor processors with deep pipelines, like Prescott and later Intel processors, it may be faster to replace branches by bitwise AND and OR operators (even though many more instructions are required) to avoid pipeline flushes for mispredicted branches (and these types of branches are inherently unpredictable):

function clz5 (x)
    r = (x > 0xFFFF) << 4; x >>= r;
    q = (x > 0xFF  ) << 3; x >>= q; r |= q;
    q = (x > 0xF   ) << 2; x >>= q; r |= q;
    q = (x > 0x3   ) << 1; x >>= q; r |= q;
                                    r |= (x >> 1);
    return r;

Шаблон:AnchorOn platforms that provide hardware conversion of integers to floating point, the exponent field can be extracted and subtracted from a constant to compute the count of leading zeros. Corrections are needed to account for rounding errors.[41][46] Floating point conversion can have substantial latency. This method is highly non-portable and not usually recommended.

int x; 
int r;
union { unsigned int u[2]; double d; } t;

t.u[LE] = 0x43300000;  // LE is 1 for little-endian
t.u[!LE] = x;
t.d -= 4503599627370496.0;
r = (t.u[LE] >> 20) - 0x3FF;  // log2
r++;  // CLZ

Applications

The count leading zeros (clz) operation can be used to efficiently implement normalization, which encodes an integer as m × 2e, where m has its most significant bit in a known position (such as the highest position). This can in turn be used to implement Newton–Raphson division, perform integer to floating point conversion in software, and other applications.[41][47]

Count leading zeros (clz) can be used to compute the 32-bit predicate "x = y" (zero if true, one if false) via the identity Шаблон:Tt, where ">>" is unsigned right shift.[48] It can be used to perform more sophisticated bit operations like finding the first string of n 1 bits.[49] The expression Шаблон:Tt is an effective initial guess for computing the square root of a 32-bit integer using Newton's method.[50] CLZ can efficiently implement null suppression, a fast data compression technique that encodes an integer as the number of leading zero bytes together with the nonzero bytes.[51] It can also efficiently generate exponentially distributed integers by taking the clz of uniformly random integers.[41]

The log base 2 can be used to anticipate whether a multiplication will overflow, since Шаблон:Math.[52]

Count leading zeros and count trailing zeros can be used together to implement Gosper's loop-detection algorithm,[53] which can find the period of a function of finite range using limited resources.[42]

The binary GCD algorithm spends many cycles removing trailing zeros; this can be replaced by a count trailing zeros (ctz) followed by a shift. A similar loop appears in computations of the hailstone sequence.

A bit array can be used to implement a priority queue. In this context, find first set (ffs) is useful in implementing the "pop" or "pull highest priority element" operation efficiently. The Linux kernel real-time scheduler internally uses sched_find_first_bit() for this purpose.[54]

The count trailing zeros operation gives a simple optimal solution to the Tower of Hanoi problem: the disks are numbered from zero, and at move k, disk number ctz(k) is moved the minimum possible distance to the right (circling back around to the left as needed). It can also generate a Gray code by taking an arbitrary word and flipping bit ctz(k) at step k.[42]

See also

Notes

Шаблон:Reflist

References

Шаблон:Reflist

Further reading

External links


Ошибка цитирования Для существующих тегов <ref> группы «nb» не найдено соответствующего тега <references group="nb"/>

  1. Ошибка цитирования Неверный тег <ref>; для сносок Anderson_1 не указан текст
  2. 2,0 2,1 Ошибка цитирования Неверный тег <ref>; для сносок Linux_2012_FFS3 не указан текст
  3. Ошибка цитирования Неверный тег <ref>; для сносок ARM_2012_CLZ не указан текст
  4. Ошибка цитирования Неверный тег <ref>; для сносок Atmel_AVR32 не указан текст
  5. 5,0 5,1 Ошибка цитирования Неверный тег <ref>; для сносок Compaq_2002_Alpha не указан текст
  6. 6,0 6,1 Ошибка цитирования Неверный тег <ref>; для сносок Intel_64-32_DM-2A не указан текст
  7. Ошибка цитирования Неверный тег <ref>; для сносок AMD_2011_AMD64 не указан текст
  8. Ошибка цитирования Неверный тег <ref>; для сносок AMD_2013_AMD64 не указан текст
  9. Ошибка цитирования Неверный тег <ref>; для сносок Intel_Itanium_DM-3 не указан текст
  10. 10,0 10,1 Ошибка цитирования Неверный тег <ref>; для сносок MIPS_2011_32 не указан текст
  11. 11,0 11,1 Ошибка цитирования Неверный тег <ref>; для сносок MIPS_2011_64 не указан текст
  12. Ошибка цитирования Неверный тег <ref>; для сносок Motorola_1992 не указан текст
  13. Ошибка цитирования Неверный тег <ref>; для сносок Frey_PowerPC не указан текст
  14. Ошибка цитирования Неверный тег <ref>; для сносок IBM_PowerISA не указан текст
  15. 15,0 15,1 Ошибка цитирования Неверный тег <ref>; для сносок Wolf_2019_RISC-V-B не указан текст
  16. Ошибка цитирования Неверный тег <ref>; для сносок Oracle_2011_SPARC не указан текст
  17. Ошибка цитирования Неверный тег <ref>; для сносок DEC_1987_VAX не указан текст
  18. 18,0 18,1 18,2 Ошибка цитирования Неверный тег <ref>; для сносок IBM_Z_C22 не указан текст
  19. 19,0 19,1 Ошибка цитирования Неверный тег <ref>; для сносок Apple_1994_FFS3 не указан текст
  20. Ошибка цитирования Неверный тег <ref>; для сносок FreeBSD_2012_FFS3 не указан текст
  21. Ошибка цитирования Неверный тег <ref>; для сносок GCC_2015_Functions не указан текст
  22. Ошибка цитирования Неверный тег <ref>; для сносок GCC_2015_Changes не указан текст
  23. Ошибка цитирования Неверный тег <ref>; для сносок LLVM_Clang_Extensions не указан текст
  24. Ошибка цитирования Неверный тег <ref>; для сносок LLVM_Clang_Sources не указан текст
  25. Ошибка цитирования Неверный тег <ref>; для сносок Microsoft_2008_Intrinsics_1 не указан текст
  26. Ошибка цитирования Неверный тег <ref>; для сносок Microsoft_2008_Intrinsics_2 не указан текст
  27. Ошибка цитирования Неверный тег <ref>; для сносок Microsoft_2008_Intrinsics_3 не указан текст
  28. Ошибка цитирования Неверный тег <ref>; для сносок Microsoft_2012_Intrinsics_1 не указан текст
  29. Ошибка цитирования Неверный тег <ref>; для сносок Intel_Intrinsics_Guide не указан текст
  30. Ошибка цитирования Неверный тег <ref>; для сносок Intel_2006_Intrinsics не указан текст
  31. Ошибка цитирования Неверный тег <ref>; для сносок NVIDIA_2010_CUDA не указан текст
  32. Ошибка цитирования Неверный тег <ref>; для сносок LLVM_Intrinsic не указан текст
  33. Шаблон:Cite book
  34. Шаблон:Cite web
  35. 35,0 35,1 Ошибка цитирования Неверный тег <ref>; для сносок SPARC_1992_A41 не указан текст
  36. Ошибка цитирования Неверный тег <ref>; для сносок Warren_2013 не указан текст
  37. Ошибка цитирования Неверный тег <ref>; для сносок AD_2001 не указан текст
  38. Ошибка цитирования Неверный тег <ref>; для сносок Dietz не указан текст
  39. Ошибка цитирования Неверный тег <ref>; для сносок Isenberg не указан текст
  40. Ошибка цитирования Неверный тег <ref>; для сносок Anderson_2 не указан текст
  41. 41,0 41,1 41,2 41,3 Ошибка цитирования Неверный тег <ref>; для сносок Warren_2013_5-3 не указан текст
  42. 42,0 42,1 42,2 Ошибка цитирования Неверный тег <ref>; для сносок Warren_2013_5-4 не указан текст
  43. Ошибка цитирования Неверный тег <ref>; для сносок Leiserson_1998 не указан текст
  44. Ошибка цитирования Неверный тег <ref>; для сносок Busch_2009 не указан текст
  45. Ошибка цитирования Неверный тег <ref>; для сносок Anderson_3 не указан текст
  46. Ошибка цитирования Неверный тег <ref>; для сносок Anderson_4 не указан текст
  47. Ошибка цитирования Неверный тег <ref>; для сносок Sloss_2004 не указан текст
  48. Ошибка цитирования Неверный тег <ref>; для сносок Warren_2013_2-11 не указан текст
  49. Ошибка цитирования Неверный тег <ref>; для сносок Warren_2013_6-2 не указан текст
  50. Ошибка цитирования Неверный тег <ref>; для сносок Warren_2013_11-1 не указан текст
  51. Ошибка цитирования Неверный тег <ref>; для сносок Schlegel_2010 не указан текст
  52. Ошибка цитирования Неверный тег <ref>; для сносок Warren_2013_2-12 не указан текст
  53. Ошибка цитирования Неверный тег <ref>; для сносок Gosper_1972 не указан текст
  54. Ошибка цитирования Неверный тег <ref>; для сносок Aas_2005 не указан текст