How to monitor NVMe drives in OS X

NVMe support in OS X

After upgrading to the latest MacBook Pro I found that smartctl was not able to find any SMART-capable drive. This is because Apple replaced the SATA SSD with an NVMe one, and the old SMART API does not work with it (it is very ATA-specific). Smartmontools itself includes NVMe support for Linux, Windows and FreeBSD, so I decided to try adding it for Darwin as well. However, it was not as easy as expected: Apple has not published any source code or documentation about NVMe device support or monitoring. Moreover, there is no tool in OS X that shows such statistics, and the old tools from the SDK are useless because of the API change.

Searching for the API provider

After looking through the file tree I found a good candidate: /System/Library/Extensions/NVMeSMARTLib.plugin. It is built more or less like /System/Library/Extensions/SMARTLib.plugin/, which provides SATA/ATA SMART support. As I mentioned, there is no documentation, so I had to use otool, nm and lldb to deal with it. As expected, the API turned out to be similar to the ATA one. You can get the list of symbols and functions using this command: nm NVMeSMARTLib | c++filt -p -i. So I tried to connect to it using a modified SMART example from the SDK. The tricky part was finding the kIONVMeSMARTUserClientTypeID and kIONVMeSMARTInterfaceID values, which the CFPlugIn infrastructure in IOKit uses to initialize the API interface. Fortunately, the library compares this data at runtime, so with a disassembler I was able to find them. After successfully connecting to the interface I was able to reconstruct the missing headers and use some of the functions (see below).

What works and what does not

The most important functions, SMARTReadData and GetIdentifyData, work, and the results are returned in structures that match the NVMe standard. I was not able to get the GetLogPage function running; it probably expects a pointer to some structure with predefined data. If Apple releases any consumer of it, it should be easy to find this out.

There are also some other, unknown functions in this API: GetFieldCounters (always returns an error), ScheduleBGRefresh (no parameters, returns OK), GetSystemCounters and GetAlgorithmCounters (some driver info? or vendor-specific log pages?). Another interesting finding was the string “Sandisk 401Z128G-4p-MLC” in the SMART log page, so this NVMe drive possibly comes from that vendor originally.

Smartmontools support

I am working on adding limited NVMe support to smartctl and smartd for OS X. I already have a working prototype, but I need to clean up and refactor some code. I am planning to add this before the next release. Below is the output from my disk:

smartctl 6.6 2017-09-14 r4434M [Darwin 16.7.0 x86_64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===

Model Number:                       APPLE SSD AP0512J
Serial Number:                      XXXXXX
Firmware Version:                   16.14.01
PCI Vendor/Subsystem ID:            0x106b
IEEE OUI Identifier:                0x000502
Controller ID:                      0
Number of Namespaces:               2
Local Time is:                      Wed Sep 20 08:56:36 2017 CEST
Firmware Updates (0x02):            1 Slot
Optional Admin Commands (0x0004):   Frmw_DL
Optional NVM Commands (0x0004):     DS_Mngmt
Maximum Data Transfer Size:         256 Pages

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     0.00W       -        -    0  0  0  0        0       0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02, NSID 0x0)
Critical Warning:                   0x00
Temperature:                        33 Celsius
Available Spare:                    90%
Available Spare Threshold:          2%
Percentage Used:                    0%
Data Units Read:                    19,311,330 [9.88 TB]
Data Units Written:                 11,653,167 [5.96 TB]
Host Read Commands:                 50,388,833
Host Write Commands:                37,404,327
Controller Busy Time:               0
Power Cycles:                       2,320
Power On Hours:                     23
Unsafe Shutdowns:                   7
Media and Data Integrity Errors:    0
Error Information Log Entries:      0

Reconstructed API

If you want to play with the API yourself, you can use this header. Please let me know if you find out how to use GetLogPage or have any other useful information:

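// NVMe definitions, undocumented, experimental

// IUNKNOWN_C_GUTS, IOReturn and the CFUUID helpers should come in through
// this header. struct nvme_smart_log and struct nvme_id_ctrl are not defined
// here; they must be supplied separately and match the NVMe standard layouts.
#include <IOKit/IOCFPlugIn.h>

// Constant to init driver
#define kIONVMeSMARTUserClientTypeID CFUUIDGetConstantUUIDWithBytes(NULL, \
        0xAA, 0x0F, 0xA6, 0xF9, 0xC2, 0xD6, 0x45, 0x7F, \
        0xB1, 0x0B, 0x59, 0xA1, 0x32, 0x53, 0x29, 0x2F)

// Constant to use plugin interface
#define kIONVMeSMARTInterfaceID CFUUIDGetConstantUUIDWithBytes(NULL, \
        0xCC, 0xD1, 0xDB, 0x19, 0xFD, 0x9A, 0x4D, 0xAF, \
        0xBF, 0x95, 0x12, 0x45, 0x4B, 0x23, 0x0A, 0xB6)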

// interface structure, obtained using lldb, could be incomplete or wrong
typedef struct IONVMeSMARTInterface
{
        IUNKNOWN_C_GUTS;

        UInt16 version;
        UInt16 revision;

        // NVMe SMART data, returns nvme_smart_log structure
        IOReturn ( *SMARTReadData )( void * interface,
                                     struct nvme_smart_log * NVMeSMARTData );

        // NVMe IdentifyData, returns nvme_id_ctrl per namespace
        IOReturn ( *GetIdentifyData )( void * interface,
                                       struct nvme_id_ctrl * NVMeIdentifyControllerStruct,
                                       unsigned int ns );

        // Always returns kIOReturnDeviceError
        IOReturn ( *GetFieldCounters )( void * interface,
                                        char * FieldCounters );

        // Returns 0
        IOReturn ( *ScheduleBGRefresh )( void * interface );

        // Always returns kIOReturnDeviceError, probably expects pointer to some
        // structure as an argument
        IOReturn ( *GetLogPage )( void * interface, void * data, unsigned int, unsigned int );


                /* GetSystemCounters looks like a table of attributes. Sample result:

                0x101022200: 0x01 0x00 0x08 0x00 0x00 0x00 0x00 0x00
                0x101022208: 0x00 0x00 0x00 0x00 0x02 0x00 0x08 0x00
                0x101022210: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
                0x101022218: 0x03 0x00 0x08 0x00 0xf1 0x74 0x26 0x01
                0x101022220: 0x00 0x00 0x00 0x00 0x04 0x00 0x08 0x00
                0x101022228: 0x0a 0x91 0xb1 0x00 0x00 0x00 0x00 0x00
                0x101022230: 0x05 0x00 0x08 0x00 0x24 0x9f 0xfe 0x02
                0x101022238: 0x00 0x00 0x00 0x00 0x06 0x00 0x08 0x00
                0x101022240: 0x9b 0x42 0x38 0x02 0x00 0x00 0x00 0x00
                0x101022248: 0x07 0x00 0x08 0x00 0xdd 0x08 0x00 0x00
                0x101022250: 0x00 0x00 0x00 0x00 0x08 0x00 0x08 0x00
                0x101022258: 0x07 0x00 0x00 0x00 0x00 0x00 0x00 0x00
                0x101022260: 0x09 0x00 0x08 0x00 0x00 0x00 0x00 0x00
                0x101022268: 0x00 0x00 0x00 0x00 0x0a 0x00 0x04 0x00
                .........
                0x101022488: 0x74 0x00 0x08 0x00 0x00 0x00 0x00 0x00
                0x101022490: 0x00 0x00 0x00 0x00 0x75 0x00 0x40 0x02
                0x101022498: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
                */
        IOReturn ( *GetSystemCounters )( void *  interface, char *, unsigned int *);


                /* GetAlgorithmCounters returns mostly 0
                0x102004000: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
                0x102004008: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
                0x102004010: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
                0x102004018: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
                0x102004020: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
                0x102004028: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
                0x102004038: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
                0x102004040: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
                0x102004048: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
                0x102004050: 0x00 0x00 0x00 0x00 0x80 0x00 0x00 0x00
                0x102004058: 0x80 0x00 0x00 0x00 0x00 0x00 0x00 0x00
                0x102004060: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
                0x102004068: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
                0x102004070: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
                0x102004078: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
                0x102004080: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
                0x102004088: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
                0x102004090: 0x00 0x01 0x00 0x00 0x00 0x00 0x00 0x00
                0x102004098: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00

                */
        IOReturn ( *GetAlgorithmCounters )( void *  interface, char *, unsigned int *);
} IONVMeSMARTInterface;



3 thoughts on “How to monitor NVMe drives in OS X”

  1. Harry says:

    Hello. Here are some results running the trunk version of smartctl with various NVMe disks on macOS 10.13 Release.

    ====================================================
    $ ./smartctl -x /dev/rdisk0
    smartctl 6.6 2017-10-01 r4503 [Darwin 17.0.0 x86_64] (local build)
    Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

    === START OF INFORMATION SECTION ===
    Model Number: APPLE SSD SM0512L
    Serial Number: C02734500XVHRDY1D
    Firmware Version: CXS5EA0Q
    PCI Vendor/Subsystem ID: 0x144d
    IEEE OUI Identifier: 0x002538
    Controller ID: 2
    Number of Namespaces: 1
    Local Time is: Mon Oct 2 15:20:59 2017 BST
    Firmware Updates (0x06): 3 Slots
    Optional Admin Commands (0x0006): Format Frmw_DL
    Optional NVM Commands (0x001f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
    Maximum Data Transfer Size: 256 Pages

    Supported Power States
    St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
    0 + 6.00W - - 0 0 0 0 5 5
    1 - 0.0400W - - 1 1 1 1 210 1200
    2 - 0.0050W - - 2 2 2 2 1900 5300

    === START OF SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED

    SMART/Health Information (NVMe Log 0x02, NSID 0x0)
    Critical Warning: 0x00
    Temperature: 30 Celsius
    Available Spare: 100%
    Available Spare Threshold: 10%
    Percentage Used: 0%
    Data Units Read: 5,113,644 [2.61 TB]
    Data Units Written: 9,796,981 [5.01 TB]
    Host Read Commands: 9,304,553
    Host Write Commands: 15,808,044
    Controller Busy Time: 50
    Power Cycles: 61
    Power On Hours: 3
    Unsafe Shutdowns: 16
    Media and Data Integrity Errors: 0
    Error Information Log Entries: 0

    Read Error Information Log failed: NVMe admin command:0x02/page:0x01 is not supported

    ====================================================
    $ ./smartctl -x /dev/rdisk2
    smartctl 6.6 2017-10-01 r4503 [Darwin 17.0.0 x86_64] (local build)
    Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

    === START OF INFORMATION SECTION ===
    Model Number: UBPAMAE512HCH1-HTG-UGN
    Serial Number: TW61240002 0500003
    Firmware Version: REMCD1P7
    PCI Vendor ID: 0x1cc2
    PCI Vendor Subsystem ID: 0x0340
    IEEE OUI Identifier: 0x280c28
    Controller ID: 832
    Number of Namespaces: 1
    Local Time is: Mon Oct 2 16:46:38 2017 BST
    Firmware Updates (0x12): 1 Slot, no Reset required
    Optional Admin Commands (0x0006): Format Frmw_DL
    Optional NVM Commands (0x0004): DS_Mngmt
    Maximum Data Transfer Size: 32 Pages

    Supported Power States
    St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
    0 + 0.00W - - 0 0 0 0 0 0

    === START OF SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED

    SMART/Health Information (NVMe Log 0x02, NSID 0x0)
    Critical Warning: 0x00
    Temperature: 41 Celsius
    Available Spare: 100%
    Available Spare Threshold: 10%
    Percentage Used: 0%
    Data Units Read: 70,701 [36.1 GB]
    Data Units Written: 4,248,800 [2.17 TB]
    Host Read Commands: 283,247
    Host Write Commands: 19,909,598
    Controller Busy Time: 0
    Power Cycles: 12
    Power On Hours: 2
    Unsafe Shutdowns: 1
    Media and Data Integrity Errors: 0
    Error Information Log Entries: 0

    Read Error Information Log failed: NVMe admin command:0x02/page:0x01 is not supported

    ====================================================
    $ ./smartctl -x /dev/rdisk2
    smartctl 6.6 2017-10-01 r4503 [Darwin 17.0.0 x86_64] (local build)
    Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

    === START OF INFORMATION SECTION ===
    Model Number: THNSN5256GPU7 TOSHIBA
    Serial Number: 563S106ATUFV
    Firmware Version: 57GA4103
    PCI Vendor/Subsystem ID: 0x1179
    IEEE OUI Identifier: 0x00080d
    Controller ID: 0
    Number of Namespaces: 1
    Local Time is: Mon Oct 2 15:29:42 2017 BST
    Firmware Updates (0x02): 1 Slot
    Optional Admin Commands (0x0007): Security Format Frmw_DL
    Optional NVM Commands (0x000e): Wr_Unc DS_Mngmt Wr_Zero
    Warning Comp. Temp. Threshold: 78 Celsius
    Critical Comp. Temp. Threshold: 82 Celsius

    Supported Power States
    St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
    0 + 6.00W - - 0 0 0 0 0 0
    1 + 2.40W - - 1 1 1 1 0 0
    2 + 1.90W - - 2 2 2 2 0 0
    3 - 0.1600W - - 3 3 3 3 1000 1000
    4 - 0.0120W - - 4 4 4 4 5000 35000
    5 - 0.0060W - - 5 5 5 5 100000 110000

    === START OF SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED

    SMART/Health Information (NVMe Log 0x02, NSID 0x0)
    Critical Warning: 0x00
    Temperature: 36 Celsius
    Available Spare: 100%
    Available Spare Threshold: 10%
    Percentage Used: 0%
    Data Units Read: 1,018,286 [521 GB]
    Data Units Written: 2,757,158 [1.41 TB]
    Host Read Commands: 5,044,409
    Host Write Commands: 11,461,386
    Controller Busy Time: 47
    Power Cycles: 29
    Power On Hours: 128
    Unsafe Shutdowns: 19
    Media and Data Integrity Errors: 0
    Error Information Log Entries: 0
    Warning Comp. Temperature Time: 0
    Critical Comp. Temperature Time: 0
    Temperature Sensor 1: 36 Celsius

    Read Error Information Log failed: NVMe admin command:0x02/page:0x01 is not supported

    ====================================================
    $ ./smartctl -x /dev/rdisk2
    smartctl 6.6 2017-10-01 r4503 [Darwin 17.0.0 x86_64] (local build)
    Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

    === START OF INFORMATION SECTION ===
    Model Number: Samsung SSD 950 PRO 512GB
    Serial Number: S2GMNCAGB21134T
    Firmware Version: 1B0QBXX7
    PCI Vendor/Subsystem ID: 0x144d
    IEEE OUI Identifier: 0x002538
    Controller ID: 1
    Number of Namespaces: 1
    Local Time is: Mon Oct 2 15:25:25 2017 BST
    Firmware Updates (0x06): 3 Slots
    Optional Admin Commands (0x0007): Security Format Frmw_DL
    Optional NVM Commands (0x001f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
    Maximum Data Transfer Size: 32 Pages

    Supported Power States
    St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
    0 + 6.50W - - 0 0 0 0 5 5
    1 + 5.80W - - 1 1 1 1 30 30
    2 + 3.60W - - 2 2 2 2 100 100
    3 - 0.0700W - - 3 3 3 3 500 5000
    4 - 0.0050W - - 4 4 4 4 2000 22000

    === START OF SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED

    SMART/Health Information (NVMe Log 0x02, NSID 0x0)
    Critical Warning: 0x00
    Temperature: 29 Celsius
    Available Spare: 100%
    Available Spare Threshold: 10%
    Percentage Used: 0%
    Data Units Read: 7,268,674 [3.72 TB]
    Data Units Written: 50,548,708 [25.8 TB]
    Host Read Commands: 42,754,478
    Host Write Commands: 206,906,671
    Controller Busy Time: 319
    Power Cycles: 171
    Power On Hours: 1,477
    Unsafe Shutdowns: 100
    Media and Data Integrity Errors: 0
    Error Information Log Entries: 3

    Read Error Information Log failed: NVMe admin command:0x02/page:0x01 is not supported

    • sammczk says:

      Thank you for sharing! It's good to know that it works not only for me. This will be part of the upcoming smartmontools release.

    • sammczk says:

      BTW, I just found that all Apple NVMe drives report an unrealistic “Power On Hours” value.

      This is probably because of how the NVMe specification defines the field:

      “Power On Hours:
      Contains the number of power-on hours. This does not include time that the controller was powered and in a low power state condition.”
