5 responses to “Efficient bit-scanning in Lisp”

  1. Paul Khuong

    Using declarations to lie to SBCL is a really bad idea; if the compiler is just clever enough, you might end up seeing your code “optimised” into nothingness.

    In addition to good support for machine integer arithmetic, SBCL is also able to identify overflowing arithmetic operations (like C implicitly does for unsigned integers), in portable CL code. There is currently no such cleverness for arithmetic negation in the master branch, but I intend to merge the 16 LOC to do so in the next release. The result is:

    CL-USER> (defun bitscan2 (bb)
    (declare (optimize speed)
    (type (unsigned-byte 64) bb))
    (let ((ashbb (ash (ldb (byte 64 0)
    (* (logand bb (- bb))
    (aref index64 ashbb)))
    CL-USER> (disassemble *)
    ; disassembly for BITSCAN2
    ; Size: 62 bytes
    ; 0874B0AD: 488BC2 MOV RAX, RDX ; no-arg-parsing entry point
    ; B0: 48F7D8 NEG RAX
    ; B3: 4821C2 AND RDX, RAX
    ; B6: 48BBC2284E9AE5D5ED07 MOV RBX, 571347909858961602
    ; C0: 488BC2 MOV RAX, RDX
    ; C3: 48F7E3 MUL RAX, RBX
    ; C6: 488BC8 MOV RCX, RAX
    ; C9: 48C1E93A SHR RCX, 58
    ; CD: 48D1E1 SHL RCX, 1
    ; D0: 488B0529FFFFFF MOV RAX, [RIP-215] ; #(63 0 58
    ; 1 …)
    ; D7: 0FB7540801 MOVZX EDX, WORD PTR [RAX+RCX+1]
    ; DC: 48D1E2 SHL RDX, 1
    ; DF: 488BE5 MOV RSP, RBP
    ; E2: F8 CLC
    ; E3: 5D POP RBP
    ; E4: C3 RET
    ; E5: 0F0B0A BREAK 10 ; error trap
    ; E8: 02 BYTE #X02
    ; EA: 9A BYTE #X9A ; RCX

    The CL source is fully portable: it will output the same result on any conforming implementation. It does so by explicit stating what is implicit in the C code: we’re only interested in the lower 64 bits of the multiplication. FWIW, without the (LDB …) wrapped around it, the negation would still be performed in modular arithmetic (its result is immediately ANDed with a 64 bit value).

    In portable CL again, integer-length is a slightly souped up BSR.

    We try to make sure SBCL enables the development of safe, mostly portable and performant code. If you find yourself doing questionable hacks involving declarations and (safety 0), I would love to hear about your problems on the mailing list. It may be possible to rewrite the code into something more palatable; failing that, the request might provide the drive to improve things, particularly low-hanging fruits.

  2. Foo


  3. メンズ 腕時計 ランキング

    シチズン 腕時計

Leave a Reply