A bit mask uses the bits in the binary representation of an integer as “toggles” to indicate whether certain conditions are met. For a more general introduction to this concept, consult Wikipedia or other online resources on the subject.
Within the SDSS, the most common use of a bitmask is to indicate the status of an object (or spectrum, or target, or whatever) with respect to some set of conditions. For example, a given photometric object might be saturated, and be deblended, and have some interpolated pixels. All of these conditions are tracked as “bits” in a bitmask-encoded value called “flags”. For instance, if bit 18 is set, it indicates there is a saturated pixel in the object; if bit 18 is not set, there is no saturated pixel in the object. This sort of bitmask is useful when there are many possible Boolean (true/false) conditions to track, since it doesn’t require an individual variable for each one.
Converting between hex, binary, and decimal values
Hex | Binary | Decimal |
---|---|---|
0x1 | 1 | 1 |
0x2 | 10 | 2 |
0x4 | 100 | 4 |
0x8 | 1000 | 8 |
0x10 | 10000 | 16 |
0x20 | 100000 | 32 |
0x40 | 1000000 | 64 |
0x80 | 10000000 | 128 |
… | … | … |
0x80000000 | 1000000….0 | 2147483648 |
In more detail, when we refer above to “bit 18,” we are referring to the bit at position 18, counting right to left with the least significant bit indexed as bit number zero. Thus, if only bit 18 is set, the integer is equal to 2^18 = 262144. Of course, in general, many bits can be set, so the value of the variable is not necessarily a power of two. If the integer is signed, note that bit 31 indicates the sign of a 32-bit integer, so the integer value of a bitmask might be interpreted by the computer as negative.
Note also that many people express bitmasks in hexadecimal instead of decimal. You can tell when people are doing this because they will start with “0x” as in “0x00000100” instead of “256”. The choice to write the numbers in hexadecimal is just a convention; the values in the files and in the Catalog Archive Server (CAS) are regular (decimal) integers. However, this choice does often make it easier to figure out which bit is being referred to. For example, it is easier to figure out that “0x00040000” is bit 18 than to figure out that 262144 is bit 18. The table above shows examples of converting among hex, binary and decimal numbers. Be aware that many programming languages provide tools for translating between binary, hex, and decimal (we provide more detailed examples of using bitmasks in Python, CAS and IDL below).
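As a quick illustration of these conversions, the short Python sketch below moves the bit 18 value between decimal, hex, and binary, and recovers the bit index from the value. The variable names are illustrative only.

```python
# Convert the "bit 18" value between decimal, hexadecimal, and binary.
value = 1 << 18                      # only bit 18 set

as_hex = f"0x{value:08x}"            # zero-padded, eight-digit hex string
as_bin = bin(value)                  # binary string with a '0b' prefix
from_hex = int("0x00040000", 16)     # parse a hex string back into an integer

# For a value with a single bit set, bit_length() - 1 gives the bit index.
bit_index = value.bit_length() - 1

print(value, as_hex, as_bin, bit_index)
```

The trick with bit_length() only identifies the bit when exactly one bit is set; for a general value it returns the index of the highest bit that is set.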
Examples of using bitmasks
To get a sense of this behavior, consider some one-byte unsigned integers, and what their bits are:
bit 7 | bit 6 | bit 5 | bit 4 | bit 3 | bit 2 | bit 1 | bit 0 | | integer value |
---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | = | 75 |
1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | = | 130 |
0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | = | 7 |
To check the value of one or more bits, one needs to execute a “bitwise and” on the bitmask. The simplest case is checking the value of a single bit. In C, for example, the bitwise and operator is “&”, so you can write an if statement like:
if((myflag & 4) != 0) {printf("Bit 2 is set\n"); } else {printf("Bit 2 is not set\n"); }
which will output whether or not bit 2 is set.
What is the code doing here? It is asking, for each bit, “is this bit set in both myflag and in 4?” If so, the result (an integer) will have that bit set. We can look, for example, at what this operation would look like if myflag were equal to 13:
| | | bit 7 | bit 6 | bit 5 | bit 4 | bit 3 | bit 2 | bit 1 | bit 0 |
---|---|---|---|---|---|---|---|---|---|
myflag | = | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 |
4 | = | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
myflag & 4 | = | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
Clearly the result equals “4”, so the condition is satisfied: bit 2 is set! You can easily generalize this sort of operation to bitwise “or”, “exclusive or”, or “and not”. Consult a good computer science reference for those details.
For SDSS, the important thing to know is what each bit means for each type of bitmask. If you expand this page’s Table of Contents, within-page links there will lead to tables for each bitmask-encoded variable, telling you what it means when each bit is set.
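The same operations can be sketched in Python, which uses the same operator symbols as C. The example reuses myflag = 13 from the table above; the other variable names are illustrative.

```python
myflag = 13   # binary 00001101, as in the table above
mask = 4      # binary 00000100, i.e. bit 2

# AND keeps only the bits that are set in both operands.
bit2_is_set = (myflag & mask) != 0

# OR sets every bit that is set in either operand (here it adds bit 1).
with_bit1 = myflag | 2

# XOR keeps the bits that are set in exactly one operand (here it flips bit 2).
toggled = myflag ^ mask

# AND NOT clears the bits of the mask and leaves the others untouched.
cleared = myflag & ~mask

print(bit2_is_set, with_bit1, toggled, cleared)
```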
SDSS Bitmasks in Python
Activating bits
Often the first step when working with bitmasks is to generate an integer with the bits for which you want to query. The easiest way to do this is with a bitwise left shift. In Python we use the operator << to shift a bit a number of positions. For example, to activate bit 7 we shift it 7 positions (this is equivalent to 2^7):
>>> 1 << 7
128
>>> bin(1 << 7)
'0b10000000'
>>> hex(1 << 7)
'0x80'
Note that Python always prints the decimal representation of a number, but we can use the built-in functions bin and hex to get the strings with the binary and hexadecimal representations of the value.
We can use the bitwise OR operation (|) to activate multiple bits. To activate bits 3 and 5 we do
>>> bits_3_5 = (1 << 3) | (1 << 5) # Parentheses are not needed but improve readability
>>> bits_3_5
40
>>> bin(bits_3_5)
'0b101000'
Checking bits
Another useful operation is checking whether a value has certain bits active. For that we use the AND bitwise operator (&). If the bitwise AND of the value and the bit is non-zero, the bit is present in that value.
>>> bit_4 = 1 << 4
>>> bin(bit_4)
'0b10000'
>>> 33 & bit_4 # Bit 4 not present
0
>>> 27 & bit_4 # Bit 4 present
16
>>> bin(27)
'0b11011'
When we use the AND operator with a combination of bits, the result will be non-zero if any of the bits are present
>>> 9 & bits_3_5
8
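Checking whether all the bits are active requires comparing the result with the combination itself: all the bits are present only if the bitwise AND of the value and the combination equals the combination, i.e. (value & bits_3_5) == bits_3_5.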
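The difference between “any of the bits” and “all of the bits” can be sketched with two small helpers. The function names has_any and has_all are hypothetical and are not part of any SDSS tool.

```python
def has_any(value, mask):
    """True if at least one bit of mask is set in value."""
    return (value & mask) != 0


def has_all(value, mask):
    """True only if every bit of mask is set in value."""
    return (value & mask) == mask


bits_3_5 = (1 << 3) | (1 << 5)   # 40, as defined earlier

print(has_any(9, bits_3_5), has_all(9, bits_3_5))     # 9 has bit 3 but not bit 5
print(has_any(41, bits_3_5), has_all(41, bits_3_5))   # 41 has bits 0, 3 and 5
```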
To check SDSS maskbits you first need to find the bit or bits that you are interested in. You can do that by checking the sections below. For example, to identify targets with a critical issue in MaNGA DRP 3D cubes, we would check the CRITICAL bit in the MANGA_DRP3QUAL bitmask, which corresponds to the binary position 30 (1<<30, decimal value 1073741824).
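As a minimal sketch of that check, the snippet below tests the CRITICAL bit on a handful of values. The bit position comes from the text above; the list of quality values is made up for illustration and does not come from any SDSS file.

```python
CRITICAL = 1 << 30   # CRITICAL bit of MANGA_DRP3QUAL, decimal 1073741824

# Hypothetical MANGA_DRP3QUAL values, for illustration only.
drp3qual_values = [0, 1, CRITICAL, CRITICAL | 4]

# True where the CRITICAL bit is set, regardless of any other bits.
is_critical = [(quality & CRITICAL) != 0 for quality in drp3qual_values]

print(is_critical)
```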
Getting all the bits and labels from a value
Another useful operation is to retrieve all the active bits from a given value. For example, say that we know that an APOGEE-2 target has APOGEE_TARGET1 equal to 36872 and we want to know the bits and labels associated with that value. For that we need to loop over all the possible bits and check which ones are active.
>>> value = 36872
>>> bits = [bit for bit in range(0, 32) if (value & 1 << bit) > 0]
>>> bits
[3, 12, 15]
Here we use the fact that we know the APOGEE_TARGET1 maskbit has a maximum of 32 bits; if the maskbit has a different number of bits you’ll need to modify the loop accordingly.
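If you would rather not hard-code the number of bits, one possible variation is to let Python report how many bits the value occupies. This is only a sketch for plain, non-negative Python integers; the helper name active_bits is hypothetical.

```python
def active_bits(value):
    """Return the indices of the bits set in a non-negative integer."""
    if value < 0:
        raise ValueError("expected a non-negative bitmask value")
    # int.bit_length() is the number of bits needed to represent the value,
    # so the loop adapts to masks of any width.
    return [bit for bit in range(value.bit_length()) if value & (1 << bit)]


print(active_bits(36872))
```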
Alternatively you can parse the file $IDLUTILS_DIR/data/sdss/sdssMaskbits.par in the idlutils product. Several Python libraries provide help doing that.
Using pydl
You can use the pydl SDSS utilities that include the Python equivalents of the IDL bitmask tools (see the section below)
>>> from pydl.pydlutils import sdss
>>> from os import getenv
>>> sdss.set_maskbits(maskbits_file=getenv("IDLUTILS_DIR")+"/data/sdss/sdssMaskbits.par") # Always call set_maskbits to set the latest version
>>> sdss.sdss_flagval('MANGA_DRP3QUAL', 'CRITICAL')
1073741824
pydl can also retrieve the SDSS-IV sdssMaskbits.par file directly from the idlutils SVN repo but at present is unable to retrieve it from the SDSS-V GitHub repo:
>>> sdss.set_maskbits(idlutils_version='v5_5_33')
To get the associated labels using pydl we do
>>> from pydl.pydlutils import sdss
>>> sdss.set_maskbits(maskbits_file=getenv("IDLUTILS_DIR")+"/data/sdss/sdssMaskbits.par") # Always call set_maskbits to set the latest version
>>> [sdss.sdss_flagname('APOGEE_TARGET1', 1 << bit)[0] for bit in bits]
['APOGEE_IRAC_DERED', 'APOGEE_INTERMEDIATE', 'APOGEE_SERENDIPITOUS']
or even easier
>>> sdss.sdss_flagname('APOGEE_TARGET1', 36872)
['APOGEE_IRAC_DERED', 'APOGEE_INTERMEDIATE', 'APOGEE_SERENDIPITOUS']
Using astropy maskbits
Astropy provides several tools to work with maskbits. You can read the documentation here, which is also a good complement to this tutorial.
The main limitation of astropy is that it doesn’t natively know about the maskbits defined in sdssMaskbits.par. However, one can programmatically generate BitFlagNameMap instances. We’ll use pydl to read the par file (you can download it from here).
>>> from pydl.pydlutils.yanny import yanny
>>> sdssMaskbits = yanny('sdssMaskbits.par')['MASKBITS']
>>> from astropy.nddata.bitmask import extend_bit_flag_map
>>> apogee_target1 = sdssMaskbits[sdssMaskbits['flag']==b'APOGEE_TARGET1']
>>> label_to_bit = zip(apogee_target1['label'].astype('U'), 1 << apogee_target1['bit'].astype('uint32'))
>>> ext = extend_bit_flag_map('APOGEE_TARGET1', **dict(label_to_bit))
>>> ext
<BitFlagNameMap 'APOGEE_TARGET1'>
>>> ext.APOGEE_DISK_RED_GIANT
4194304
And now we can use it normally with other astropy bitmask utilities.
>>> import numpy
>>> from astropy.nddata.bitmask import bitfield_to_boolean_mask
>>> bitfield_to_boolean_mask([512, 14, 217], ignore_flags='APOGEE_SCI_CLUSTER', dtype=numpy.uint8, flag_name_map=ext)
array([0, 1, 1], dtype=uint8)
Using Marvin
Marvin also provides tools for handling maskbits using the sdssMaskbits.par file. The documentation of this can be found on the DR17 bitmask page.
SDSS Bitmasks in CAS
The query tools of the SDSS CAS allow you to check values of a bitmask variable; see the tables below for detailed information on the meanings of all bitmask values. Checking bitmasks can be included in the where clause of an SQL query, using a form such as:
(flagname & value) != 0
where & is a bitwise AND operator.
Similarly, | is a bitwise OR operator. A simple example using the specObjAll table (up to DR17) is
SELECT plate, mjd, fiberid FROM specObjAll WHERE (zWarning & 128) != 0
or, using the spAll table (new for DR18):
SELECT plate, mjd, catalogid FROM spAll WHERE (SDSSV_BOSS_TARGET0 & 256) != 0
The bitmask value can also be specified in hex, preceded by “0x”. Important: even though bitmask bits take the values {0, 1}, take care to always compare the result of the bitwise AND using the != inequality operator, rather than the greater than operator. Because computers can sometimes interpret bitmask values as signed integers, a greater than check may fail, while an inequality check will always return correct results.
More usage of bitmasks in CAS is documented on the SkyServer webpage. The Schema Browser has a menu on the right with an entry on Constants. Click on that, then select DataConstants to obtain a list of bitmasks.
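The pitfall can be reproduced outside the database. The Python sketch below is not how CAS evaluates queries; it only mimics a signed 32-bit interpretation with the standard ctypes module, and the flag value is made up for illustration.

```python
import ctypes

flags = 0x80000001   # hypothetical value with bits 31 and 0 set
mask = 1 << 31       # we want to test bit 31

# Reinterpret the AND result as a signed 32-bit integer, as a database
# or a compiled program might do.
signed_result = ctypes.c_int32(flags & mask).value

greater_than_check = signed_result > 0   # misses the bit: the result is negative
not_equal_check = signed_result != 0     # detects the bit

print(signed_result, greater_than_check, not_equal_check)
```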
When you click on a given bitmask (e.g., PhotoFlags), you will see all the bitmask values as well as the documentation on the functions that return the values associated with each bitmask (e.g. fPhotoFlags). Those lookup functions are useful for readability; however, note that they come with a performance cost, and that using the bitmask value explicitly will result in faster queries that select a large number of rows. You can do this by using the function once to obtain the binary bitmask value, then plug that into your query, e.g.
SELECT dbo.fPhotoFlags('BLENDED') + dbo.fPhotoFlags('SATURATED')
This will return the value 262152, which you can then plug into a query, e.g.,
SELECT TOP 10 objid,ra,dec,u,g,r,i,z
FROM PhotoObj
WHERE (flags & 262152) != 0
You can use the reverse functions to get the names of the flags, e.g.,
SELECT TOP 10 dbo.fPhotoFlagsN(flags) FROM PhotoObj
or
SELECT dbo.fPhotoFlagsN(262152) -- will give you back "SATURATED BLENDED"
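The arithmetic behind the value 262152 can be checked with a few lines of Python. The sketch assumes BLENDED is bit 3 and SATURATED is bit 18 of the photometric flags, which is consistent with the sum quoted above; check the PhotoFlags table before relying on these positions.

```python
BLENDED = 1 << 3       # assumed bit position, decimal 8
SATURATED = 1 << 18    # assumed bit position, decimal 262144

# Adding works as long as each flag appears only once in the sum.
summed = BLENDED + SATURATED

# A bitwise OR gives the same result and stays correct if a flag is repeated.
combined = BLENDED | SATURATED
repeated = BLENDED | SATURATED | BLENDED

print(summed, combined, repeated)
```

Note that a test of the form (flags & 262152) != 0 selects objects with either flag set; to require both flags, compare the AND result with 262152 itself.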
SDSS Bitmasks in IDL
SDSS flat files contain all of the bitmasks. Using them is particularly convenient for IDL users who use the idlutils product.
Inside of idlutils, there is a file, $IDLUTILS_DIR/data/sdss/sdssMaskbits.par, which contains a listing of all mask bits defined for SDSS.
To access these from within IDL, one uses either sdss_flagval() or sdss_flagname(). The first tells you the integer value corresponding to each bit mask name. The second returns the names of the bits set, given an integer.
For example, you can ask what the integer corresponding to NEGATIVE_EMISSION is as follows:
IDL> PRINT, sdss_flagval('ZWARNING', 'NEGATIVE_EMISSION')
64
Or, if ZWARNING for a spectrum is set to “246”, then you can check what that means as follows:
IDL> PRINT, sdss_flagname('ZWARNING', 246)
LITTLE_COVERAGE SMALL_DELTA_CHI2 MANY_OUTLIERS Z_FITLIMIT NEGATIVE_EMISSION UNPLUGGED
Uh oh! That spectrum must be Very Bad!
The most common usage is within a program. For example, let’s say you want to find all spectra for which MANY_OUTLIERS is set. In IDL you can do this as follows (assuming that “spobj” is a structure which has “zwarning” as a tag):
imany= WHERE((spobj.zwarning AND sdss_flagval('ZWARNING', 'MANY_OUTLIERS')) NE 0, nmany)
which returns “imany” as an array of the indices of those elements with MANY_OUTLIERS set in ZWARNING, and “nmany” as their number. You should be careful in calls such as those above to always ask for the AND to return a non-zero output (do not check for GT 0, which can get you in trouble if zwarning happens to be cast as a signed integer).
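For readers more comfortable with Python, the same selection can be sketched in plain Python. This is not the idlutils implementation; it assumes MANY_OUTLIERS is bit 4 of ZWARNING (check sdssMaskbits.par), and the zwarning values are made up for illustration.

```python
MANY_OUTLIERS = 1 << 4   # assumed bit position, decimal 16

# Hypothetical ZWARNING values for five spectra.
zwarning = [0, 16, 246, 4, 128]

# Indices of the spectra with MANY_OUTLIERS set, tested with a non-zero check.
imany = [i for i, zw in enumerate(zwarning) if (zw & MANY_OUTLIERS) != 0]
nmany = len(imany)

print(imany, nmany)
```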
A useful tool in IDL is the ability to write out an integer in hex format. For example, the command below outputs “80000000”:
IDL> PRINT, STRING(2L^31, FORMAT='(Z)')
80000000