PowerScale OneFS
Event Reference Guide
April 2022

Summary of Contents for Dell PowerScale OneFS

  • Page 1 PowerScale OneFS Event Reference Guide April 2022...
  • Page 2 A WARNING indicates a potential for property damage, personal injury, or death. © 2017 - 2022 Dell Inc. or its subsidiaries. All rights reserved. Dell Technologies, Dell, and other trademarks are trademarks of Dell Inc. or its subsidiaries. Other trademarks may be trademarks of their respective owners.
  • Page 3: Table Of Contents

    Contents Chapter 1: Introduction to this guide................... 15 About this guide................................. 15 Scale-out NAS overview..............................15 Where to get help................................15 Additional options for getting help...........................15 Chapter 2: Introduction to system events..................17 Events overview................................. 17 Event groups overview..............................17 Alerts overview...................................
  • Page 4 Event notification methods............................27 Managing event notification rules........................... 27 Managing event group notification settings......................29 Chapter 4: Software events......................30 Software events overview.............................. 35 100010001................................... 35 100010002...................................35 100010003...................................36 100010004...................................36 100010005...................................36 100010006................................... 37 100010007................................... 37 100010008................................... 37 100010009................................... 37 100010010....................................37 100010011.................................... 38 100010012....................................38 100010013....................................38 100010014....................................38...
  • Page 5 100010046...................................48 100010050...................................48 100010051....................................48 100010052...................................49 100010053...................................49 100010054...................................49 100010055..................................50 100010056...................................50 100010057...................................50 100010058................................... 51 100010059................................... 51 100010060................................... 51 100010061.................................... 51 100010062...................................52 100020060..................................52 100020061...................................52 100020062..................................53 100020063..................................53 100030001...................................53 200010001...................................53 200010002..................................54 200010003..................................54 200010006..................................55 200010007..................................55 200010008..................................55 200010009..................................56 200020001..................................56 200020002..................................57 200020003..................................
  • Page 6 300010001...................................65 300010002..................................65 300010003..................................66 300020001..................................66 300020002..................................66 300020003..................................67 400020001..................................67 400030001..................................67 400030002..................................67 400040001..................................68 400040002..................................68 400040003..................................68 400040004..................................68 400040007..................................69 400040009..................................69 400040010..................................70 400040011................................... 70 400040012..................................70 400040014..................................70 400040015................................... 71 400040017................................... 71 400040018..................................
  • Page 7 400070005..................................80 400070006..................................80 400070007..................................80 400080001...................................81 400090001...................................81 400090002..................................81 400090003..................................81 400090004..................................82 400100001...................................82 400100002..................................82 400100003..................................82 400100004..................................83 400100005..................................83 400100006..................................83 400100007..................................83 400100008..................................83 400100009..................................84 400100010...................................84 400100011....................................84 400110001....................................84 400120001...................................85 400130001...................................85 400130002..................................85 400140001...................................85 400140002..................................86 400140003..................................
  • Page 8 400200001..................................94 400200002..................................94 400210001................................... 94 400210002..................................94 400210003..................................94 400210004..................................94 400210005..................................95 400210006..................................95 400210007..................................95 400210008..................................95 400220000..................................95 400230001..................................96 400240000..................................96 400240001..................................96 400240002..................................96 400240003..................................97 400240004..................................97 400240005..................................97 400250000..................................98 400260000..................................98 500010001...................................98 500010002..................................98 500010003..................................99 500010004..................................99 500010005..................................99...
  • Page 9 800010006..................................108 800010007..................................108 800010008..................................109 800010009..................................109 800010010..................................109 1100000001..................................110 1100000002..................................110 1100000003..................................110 1100000004..................................111 1100000005..................................111 1100000006..................................111 1100000007..................................112 1100000008..................................112 1100000009..................................112 Chapter 5: Hardware events....................... 113 Hardware events overview............................117 900010001..................................117 900010002..................................118 900010003..................................118 900010004..................................
  • Page 10 900020021..................................126 900020022..................................127 900020023..................................127 900020024..................................127 900020025..................................127 900020026..................................127 900020027..................................128 900020028..................................129 900020029..................................129 900020030..................................130 900020031..................................130 900020032..................................131 900020033..................................132 900020034..................................133 900020035..................................133 900060001..................................134 900060002..................................134 900060003..................................135 900060004..................................135 900060005..................................135 900060006..................................136 900060007..................................136 900060008..................................136 900060009..................................136 900060010..................................137 900060011..................................137 900060012..................................137 900060013..................................137 900060014..................................137...
  • Page 11 900060036..................................145 900060037..................................146 900060038..................................147 900060039..................................148 900060040..................................148 900080001..................................149 900080002..................................149 900080003..................................150 900080004..................................150 900080005..................................150 900080006..................................151 900080007..................................151 900080008..................................151 900080009..................................152 900080010..................................152 900080011..................................152 900080012..................................152 900080013..................................153 900080014..................................153 900080015..................................153 900080016..................................153 900080017..................................153 900080018..................................154 900080019..................................154 900080020..................................154 900080021..................................
  • Page 12 900100024..................................167 900100025..................................167 900100026..................................167 900100027..................................167 900100028..................................168 900100029..................................168 900100030..................................168 900100031..................................169 900100032..................................169 900110001..................................169 900110002..................................170 900110003..................................170 900110004..................................171 900110005..................................172 900120001..................................173 900120002..................................173 900120003..................................174 900120004..................................174 900120005..................................175 900130001..................................176 900130002..................................176 900130003..................................177 900130004..................................177 900130005..................................177 900130006..................................178 900130007..................................178 900130008..................................178 900130009..................................178 900130010..................................
  • Page 13 900160012..................................187 900160013..................................187 900160014..................................188 900160015..................................188 900160016..................................188 900160017..................................189 900160018..................................189 900160019..................................190 900160020..................................190 900160021..................................190 900160022..................................191 900160023..................................191 900160024..................................191 900160100..................................192 900160102..................................192 900160101..................................192 900170001..................................192 900170002..................................193 900180001..................................193 900180002..................................193 900180003..................................193 900180004..................................
  • Page 14 920100004..................................203 920100005..................................203 920100006..................................204 920100007..................................204 920100008..................................205 920100009..................................205 930100000..................................205 930100001..................................205 930100002..................................206 930100003..................................206 930100004..................................206 930100005..................................207 930100006..................................207 940100001..................................207 940100002..................................208 Contents...
  • Page 15: Chapter 1: Introduction To This Guide

    The Dell Technologies Support site (https://www.dell.com/support) contains important information about products and services including drivers, installation packages, product documentation, knowledge base articles, and advisories. A valid support contract and account might be required to access all the available information about a specific Dell Technologies product or service.
  • Page 16 ● Local phone numbers for a specific country or region are available at https://www.dell.com/support/incidents-online/en-us/contactus/product/isilon-onefs. ● PowerScale OneFS Documentation Info Hubs: https://www.dell.com/support/kbdoc/en-us/000152189/powerscale-onefs-info-hubs ● Dell Community Board for self-help: https://www.dell.com/community Introduction to this guide...
  • Page 17: Chapter 2: Introduction To System Events

    Introduction to system events This section contains the following topics: Topics: • Events overview • Event groups overview • Alerts overview • Alert channel overview • Viewing and modifying event groups • Managing alerts • Managing channels • Managing event thresholds •...
  • Page 18: Alert Channel Overview

    You can adjust the thresholds at which certain events raise alerts. For example, by default, OneFS generates an alert when a disk pool is 95% full. You can adjust that threshold to a lower percentage. You can configure your cluster to generate alerts only for specific event groups, conditions, severity, or during limited time periods.
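
    The threshold behavior described above amounts to a single comparison. The following is a minimal Python sketch of it. It assumes that an alert fires once usage reaches or passes the threshold. The function names are hypothetical and are not part of any OneFS interface; only the 95% default comes from this guide.

```python
# Hypothetical model of the disk pool capacity alert; not a OneFS API.
DEFAULT_POOL_FULL_PERCENT = 95.0  # default threshold stated in this guide


def pool_used_percent(used_bytes: int, total_bytes: int) -> float:
    """Return the percentage of a disk pool that is in use."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


def should_alert(used_bytes: int, total_bytes: int,
                 threshold_percent: float = DEFAULT_POOL_FULL_PERCENT) -> bool:
    """True when usage has reached or passed the threshold (assumed >= semantics)."""
    return pool_used_percent(used_bytes, total_bytes) >= threshold_percent


if __name__ == "__main__":
    # A pool that is 90% full stays quiet at the default threshold...
    print(should_alert(90, 100))        # False
    # ...but raises an alert once the threshold is lowered to 85%.
    print(should_alert(90, 100, 85.0))  # True
```

    Lowering the threshold only moves the comparison point, so the alert arrives earlier and leaves more time to free space or add capacity.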
  • Page 19: Managing Alerts

    ● You can search for relevant event groups by entering the search string in the search box. 2. In the Actions column of the event group that contains the event you want to view, click View event details. 3. In the new window, click +See event instance details to expand the list of events. 4.
  • Page 20: Modify Alerts By Event Type Id

    Modify alerts by event type id You can suppress or un-suppress one or more event type IDs, depending on their current state. 1. Click Cluster Management > Events and Alerts > Alert Management. 2. In the Actions column of the event type ID you want to modify, click Suppress or Un-suppress. You can perform an action on multiple event type IDs by selecting the check box next to the event type ID of each alert you want to change, then selecting an action from the Select a bulk action list.
  • Page 21: Delete An Alerting Rule

    Delete an alerting rule You can delete an existing alerting rule. 1. Click Cluster Management > Events and Alerts > Alert Management. 2. In the CELOG alerting area, click the Alerting rule tab. 3. In the Actions column of the alerting rule, click Edit rule. The Edit alert rule window appears.
  • Page 22: Modify A Channel

    k. In the Allowed nodes field, type the node number of a node in the cluster that is allowed to send alerts through this channel. To add another allowed node to the channel, click Add another node. If you do not specify any nodes, all nodes in the cluster are treated as allowed nodes.
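
    The Allowed nodes rule can be expressed as a one-line predicate: an empty list places no restriction, and a non-empty list restricts sending to the listed nodes. The following is a minimal Python sketch of that rule. The function name is hypothetical and is not a OneFS interface.

```python
from typing import Iterable, Optional


def node_may_send(node: int, allowed_nodes: Optional[Iterable[int]]) -> bool:
    """Return True if `node` may send alerts through the channel.

    An empty or missing Allowed nodes list means every node is allowed,
    as described for the Allowed nodes field.
    """
    allowed = set(allowed_nodes or ())
    return not allowed or node in allowed


if __name__ == "__main__":
    print(node_may_send(3, []))      # True: no restriction configured
    print(node_may_send(3, [1, 2]))  # False: node 3 is not listed
    print(node_may_send(2, [1, 2]))  # True
```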
  • Page 23: Delete A Channel

    In the Allowed nodes field, type the node number of a node in the cluster that is allowed to send alerts through this channel. To add another allowed node to the channel, click Add another node. If you do not specify any nodes, all nodes in the cluster are treated as allowed nodes.
  • Page 24: Maintenance And Testing

    Maintenance and testing You can modify event settings to specify retention and storage limits for event data, schedule maintenance history windows, and send test events. Event data retention and storage limits You can modify settings to determine how event data is handled on your cluster. By default, data related to resolved event groups is retained indefinitely.
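
    The following is a minimal Python sketch of how a retention limit on resolved event groups could behave. It assumes that the limit is expressed in days and that it never removes active (unresolved) groups. The `EventGroup` type and `groups_to_keep` function are hypothetical. Only the default of keeping resolved data indefinitely comes from this guide.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class EventGroup:
    group_id: int
    resolved_at: Optional[datetime]  # None while the event group is still active


def groups_to_keep(groups: List[EventGroup], now: datetime,
                   retention_days: Optional[int] = None) -> List[EventGroup]:
    """Apply a hypothetical retention limit to resolved event groups.

    retention_days=None models the default: resolved data is kept indefinitely.
    Unresolved groups are always kept.
    """
    if retention_days is None:
        return list(groups)
    cutoff = now - timedelta(days=retention_days)
    return [g for g in groups
            if g.resolved_at is None or g.resolved_at >= cutoff]


if __name__ == "__main__":
    now = datetime(2022, 4, 30)
    groups = [EventGroup(1, None),
              EventGroup(2, datetime(2022, 4, 29)),
              EventGroup(3, datetime(2022, 1, 1))]
    print([g.group_id for g in groups_to_keep(groups, now)])      # [1, 2, 3]
    print([g.group_id for g in groups_to_keep(groups, now, 30)])  # [1, 2]
```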
  • Page 25: Gathering Cluster Logs

    Gathering cluster logs You can gather cluster logs and send the logs to PowerScale Technical Support for analysis. Cluster logs can be sent automatically or manually through the cluster command-line and web administration interfaces. NOTE: Your cluster must be connected to the internet to be able to send log files directly. In newer versions of OneFS, you must also have remote support and SRS enabled.
  • Page 26 3. Open an FTP client. 4. Enter the following settings to connect to the FTP server: Host: ftp.emc.com; User name: anonymous; Password: your email address. 5. Change the destination directory to incoming. 6. Upload the log file. Introduction to system events...
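
    The manual FTP steps above can be scripted with Python's standard `ftplib` module. This is a minimal sketch, not a supported upload tool. The host, the anonymous user name, and the email-as-password convention come from this guide, and that server may no longer accept uploads. The function name and the `ftp_factory` parameter are hypothetical; `ftp_factory` exists only so that the function can be exercised without a network connection.

```python
import ftplib
import os
from typing import Callable


def upload_log_file(path: str, email: str, host: str = "ftp.emc.com",
                    ftp_factory: Callable[[str], ftplib.FTP] = ftplib.FTP) -> str:
    """Upload a gathered log file by following the manual FTP steps.

    Logs in as 'anonymous' with an email address as the password, changes to
    the 'incoming' directory, uploads the file, and returns its remote name.
    """
    remote_name = os.path.basename(path)
    # ftplib.FTP(host) connects immediately; leaving the with-block sends QUIT.
    with ftp_factory(host) as ftp:
        ftp.login(user="anonymous", passwd=email)
        ftp.cwd("incoming")
        with open(path, "rb") as fh:
            ftp.storbinary(f"STOR {remote_name}", fh)
    return remote_name
```

    The file is sent in binary mode (`storbinary`) so that compressed log archives are not altered in transit.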
  • Page 27: Chapter 3: Notifications

    Notifications This section contains the following topics: Topics: • Event notifications Event notifications Event notifications enable you to determine which system events are sent to you. By default, OneFS is configured to log and send all critical and emergency events to PowerScale Technical Support. Additionally, you can configure event notification rules and receive notification when an event is logged on your cluster.
  • Page 28 Create an event notification rule You can configure event notification rules based on specified events and event types. You can configure email notification and SNMP trap generation for a specific event. 1. Click Cluster management > Events and alerts. 2. In the Alerting rule tab on the Alert management page, click Create alert rule. 3.
  • Page 29: Managing Event Group Notification Settings

    Managing event group notification settings You can view and modify event group notification settings and configure batch notifications. Event group notification settings You can specify whether you want to receive event notifications as aggregated batches or as individual notifications for each event.
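
    The difference between batched and individual notifications can be illustrated with a small grouping function. The following is a minimal Python sketch. It assumes that batches are formed per event group; OneFS may use a different batching key. The event tuple shape and the function name are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical event shape: (event_group_id, message).
Event = Tuple[int, str]


def build_notifications(events: List[Event], batched: bool) -> List[List[Event]]:
    """Return the notifications to send, each one a list of events.

    batched=False: one notification per event.
    batched=True: one aggregated notification per event group (assumed key).
    """
    if not batched:
        return [[event] for event in events]
    by_group: Dict[int, List[Event]] = {}
    for event in events:
        by_group.setdefault(event[0], []).append(event)
    return list(by_group.values())  # dicts keep insertion order (Python 3.7+)


if __name__ == "__main__":
    events = [(7, "disk stalled"), (7, "disk smartfailed"), (9, "pool nearly full")]
    print(len(build_notifications(events, batched=False)))  # 3
    print(len(build_notifications(events, batched=True)))   # 2
```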
  • Page 30: Chapter 4: Software Events

    Software events This section contains the following topics: Topics: • Software events overview • 100010001 • 100010002 • 100010003 • 100010004 • 100010005 • 100010006 • 100010007 • 100010008 • 100010009 • 100010010 • 100010011 • 100010012 • 100010013 • 100010014 •...
  • Page 31 • 100010050 • 100010051 • 100010052 • 100010053 • 100010054 • 100010055 • 100010056 • 100010057 • 100010058 • 100010059 • 100010060 • 100010061 • 100010062 • 100020060 • 100020061 • 100020062 • 100020063 • 100030001 • 200010001 • 200010002 •...
  • Page 32 • 400030001 • 400030002 • 400040001 • 400040002 • 400040003 • 400040004 • 400040007 • 400040009 • 400040010 • 400040011 • 400040012 • 400040014 • 400040015 • 400040017 • 400040018 • 400040019 • 400040020 • 400040021 • 400040022 • 400040023 •...
  • Page 33 • 400100006 • 400100007 • 400100008 • 400100009 • 400100010 • 400100011 • 400110001 • 400120001 • 400130001 • 400130002 • 400140001 • 400140002 • 400140003 • 400150001 • 400150002 • 400150003 • 400150004 • 400150005 • 400150006 • 400150007 •...
  • Page 34 • 400260000 • 500010001 • 500010002 • 500010003 • 500010004 • 500010005 • 600010001 • 600010002 • 600010003 • 600010004 • 600010005 • 700010001 • 700010003 • 700010004 • 700010005 • 700020001 • 700020002 • 700020003 • 700030001 • 700030002 •...
  • Page 35: Software Events Overview

    00018985 on the Dell EMC Online Support site. ● If you are storing temporary files in the /var/crash partition, move them to a larger capacity directory. If you need to keep the files in the /var/crash partition for some time, quiet the event.
  • Page 36 If a drive was not recently replaced, make sure that the drive firmware is updated to the latest version. Drive firmware is available from the Dell EMC Online Support site. If the event persists, gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to...
  • Page 37 100010006 A drive logged an error or a change in the disk subsystem. Administrator action If the event persists, gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs.
  • Page 38 Dell EMC Online Support site. ● If the FlexProtect job failed, contact Dell EMC PowerScale Technical Support for additional troubleshooting. 100010012 The disk has stalled and the disk health is being evaluated.
  • Page 39 Administrator action If the event persists, gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs. 100010015 One of the disk pools on your cluster is nearing, or has reached, maximum capacity. Description If the cluster is too close to maximum capacity, there might be insufficient space to restripe data in the event of a hardware failure, which could put your data at risk.
  • Page 40 BER error messages and event notifications. Administrator action Verify that the node firmware and drive firmware are up to date. The latest firmware packages can be downloaded from the Dell EMC Online Support site. If the event persists, gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to...
  • Page 41 BER error messages and event notifications. Administrator action Verify that the node firmware and drive firmware are up to date. The latest firmware packages can be downloaded from the Dell EMC Online Support site. If the event persists, gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to...
  • Page 42 Verify that the drive support package and drive firmware are up to date. You can download the latest drive support packages and drive firmware from the Dell EMC Online Support site. If the event persists, gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to...
  • Page 43 This event indicates that an unsupported drive was installed in the node. Administrator action Remove the unsupported drive from the affected node and contact Dell EMC PowerScale Technical Support. You do not need to smartfail the drive because the drive could not be formatted.
  • Page 44 Administrator action We recommend that you destroy the drive data according to your company's security regulations. 100010032 A used drive from another cluster was inserted as a replacement. Description A drive that was installed as a replacement was used previously in a different cluster. The event message provides you with the current node, bay, location, drive type, and Logical Number (LNUM) of the drive.
  • Page 45 100010035 A drive that previously failed was inserted as a replacement. Description A drive that previously failed was installed as a replacement. The event message provides you with the current node, bay, location, drive type, and Logical Number (LNUM) of the drive. Administrator action Replace the failed drive with a new drive according to the instructions in the PowerScale Drive Replacement Guide for your platform.
  • Page 46 Description The non-supported boot flash drive must be replaced. Boot flash drives are not customer-replaceable parts. If you are unsure whether a potential replacement boot flash drive is supported, contact Dell EMC PowerScale Technical Support. Administrator action Contact Dell EMC PowerScale Technical Support to diagnose and resolve the issue.
  • Page 47 100010045 A node boot flash drive is receiving excessive writes. Description This event is only generated during troubleshooting by Dell EMC PowerScale Technical Support. Administrator action Contact Dell EMC PowerScale Technical Support to diagnose and resolve the issue. Software events...
  • Page 48 100010046 A node pool has a node whose SSD count does not match the SSD counts of other nodes in the pool. Description A mismatch in the number of SSDs across the nodes in a pool can be caused by an SSD failure in one of the nodes or, in the case of Generation 6 hardware, by nodes that do not contain the same number of cache SSDs when L3 cache is not enabled for the node pool.
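
    Spotting the node that triggers this event means comparing each node's SSD count against the rest of the pool. The following is a minimal Python sketch. It assumes that the most common count in the pool is the expected value; OneFS may determine the reference differently. The input mapping and the function name are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def nodes_with_mismatched_ssd_count(ssd_counts: Dict[int, int]) -> List[int]:
    """Return node numbers whose SSD count differs from the pool's most common count.

    ssd_counts maps node number -> number of SSDs in that node.
    Using the most common count as the reference is an assumption of this sketch.
    """
    if not ssd_counts:
        return []
    expected, _ = Counter(ssd_counts.values()).most_common(1)[0]
    return sorted(node for node, count in ssd_counts.items() if count != expected)


if __name__ == "__main__":
    # Node 3 has lost one of its two SSDs.
    print(nodes_with_mismatched_ssd_count({1: 2, 2: 2, 3: 1, 4: 2}))  # [3]
```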
  • Page 49 100010052 A drive is no longer appearing as part of the cluster and is being smartfailed. Description A drive appears to be missing from the cluster and the smartfail process has been initiated to officially remove the drive from the cluster. The event message provides you with the chassis serial number, node, sled, the drive slot number within the sled, drive type, and Logical Number (LNUM) of the drive.
  • Page 50 100010055 A drive that is write-cache enabled was installed in a Generation 6 platform. Write-cache enabled drives are not compatible with Generation 6 nodes and the drive has been smartfailed. Description A write-cache enabled drive was installed in a Generation 6 node and is not compatible with the node. The drive was smartfailed.
  • Page 51 100010058 A node pool does not meet the minimum storage space requirement for large files: {node_pool_name} (id={node_pool_id}) Description The event indicates that the Large File feature was previously enabled, but the node pool does not meet the minimum storage space capacity requirement. Administrator action If the event persists, gather logs, and then contact Technical Support for additional troubleshooting.
  • Page 52 Administrator action Replace the failed drive. 100010062 A PCI link error is detected with NVMe drive connectivity. Description A PCI switch or bay link for NVMe drive connectivity is unhealthy and may require maintenance or replacement. Administrator action Determine which PCI link is unhealthy. The PCI link location will be specified in the event. Some errors may be recoverable by reseating cards or cables, but other errors may require replacement of hardware.
  • Page 53 100020062 A fault was detected in a drive sled. Description A drive sled has failed and must be replaced. The event message provides you with the failed drive sled and the node that the sled is a part of. Administrator action Gather logs, and then contact Technical Support for additional troubleshooting.
  • Page 54 Administrator action If the Cluster Status page in the OneFS web administration interface indicates that a node is down, complete the following steps. 1. Determine whether the node is turned on. Visually inspect the node to verify that the power light is on. 2.
  • Page 55 200010006 The identified node group is underprotected from data loss. Description The OneFS protection system relies on a particular number of drives and nodes being available for the pool, depending on your settings. If this requirement is not met, your data could be at risk. Administrator action Gather logs, and then contact Technical Support for additional troubleshooting.
  • Page 56 Administrator action 1. Locate the nodes that are no longer associated with the underprovisioned node pool. a. If a node became unprovisioned, you will see that node identified in a separate event, 200010007. You can attempt to provision the node back into the node pool by running the following command: isi_evaluate_provision_drive b.
  • Page 57 200020002 The 10 GigE interfaces on one or more nodes have experienced network connectivity issues. Administrator action Determine whether the issue is related to the cable or the node. Perform the following steps in the order listed. If the issue resolves after a step, there is no need to complete the subsequent steps.
  • Page 58 200020004 One or more nodes have experienced network connectivity issues on their aggregated network interfaces. Administrator action Determine whether the issue is related to the cable or the node. Perform the following steps in the order listed. If the issue resolves after a step, there is no need to complete the subsequent steps.
  • Page 59 200020006 The link status of the identified InfiniBand interface is changing rapidly and repeatedly. Administrator action Perform the following steps in the order listed. If the issue resolves after a step, there is no need to complete the subsequent steps. 1.
  • Page 60 Administrator action Gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs. 200020009 A power supply has failed in an internal network switch. Description The power supply is not a replaceable part. You will need to schedule a maintenance window to replace the switch. The event message provides you with the switch serial number and the network fabric supported by the switch.
  • Page 61 2. If the cable is connected securely, plug the cable into a different node that has a network port functioning at full speed and that has an identical network configuration. When you plug the cable into the other node, leave the other end of the cable plugged into the same switch port.
  • Page 62 ● If the issue persists after moving the cable to another port on the switch, review the switch logs and consult your switch user manual. If the event persists, gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs.
  • Page 63 The Dell switch has a cabling issue. Description The alert dynamically reports that the Dell switch is mis-cabled, or has another cabling issue. Administrator action 1. Run the isi event view <ID> command to determine the problem and the specific cabling issues based on the output.
  • Page 64 downlink bandwidth = total bandwidth between leaf and all Isilon nodes. This indicates that the uplink bandwidth does not equal the downlink bandwidth, causing an imbalance in the fabric and a resulting bottleneck. Administrator action Correct the wiring so that the uplink bandwidth equals the downlink bandwidth for a single fabric (int-a and/or int-b, as specified). 200020024 Fabric bandwidth incongruence.
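
    The balance check for a leaf switch is a difference of two sums. The following is a minimal Python sketch. It assumes that uplink bandwidth means the total bandwidth of the leaf-to-spine links; the downlink definition (leaf to all nodes) comes from this guide. The link speeds in the example are hypothetical.

```python
from typing import Iterable


def fabric_gap_gbps(uplink_speeds_gbps: Iterable[int],
                    downlink_speeds_gbps: Iterable[int]) -> int:
    """Uplink minus downlink bandwidth for one leaf switch; 0 means balanced.

    uplink_speeds_gbps: speed of each leaf-to-spine link (assumed definition).
    downlink_speeds_gbps: speed of each leaf-to-node link.
    """
    return sum(uplink_speeds_gbps) - sum(downlink_speeds_gbps)


if __name__ == "__main__":
    nodes = [40] * 10  # ten nodes at 40 Gb/s each (hypothetical)
    print(fabric_gap_gbps([100] * 4, nodes))  # 0: balanced
    print(fabric_gap_gbps([100] * 3, nodes))  # -100: uplinks are the bottleneck
```

    A negative result means the uplinks carry less bandwidth than the nodes can send, which is the imbalance this event reports.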
  • Page 65 200030001 The cluster does not have up-to-date firmware. Description The installed node firmware package has not been applied to all nodes in the cluster. The cluster requires a firmware upgrade. Administrator action To determine the recommended update schedule, run the isi upgrade firmware assess command. 200030002 The cluster does not have the available firmware packages.
  • Page 66 300010003 The node has failed to reboot within the specified time period. One or more nodes are offline due to one of the following conditions: ● A node was intentionally shut down for maintenance. ● A node lacks internal network connectivity. Internal connectivity is how a node communicates with other nodes on the cluster.
  • Page 67 300020003 The node encountered an error performing final shutdown. Administrator action Attempt to shut down the node again. If the event persists, gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs.
  • Page 68 400040001 A SyncIQ policy issue was detected. Administrator action If the event persists, gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs. 400040002 A SyncIQ policy failed. Administrator action This event provides information about the specific policy that has failed and information about possible causes.
  • Page 69 2. Confirm that the number of snapshots does not exceed the system-wide and directory limits. (The system-wide limit is 20,000 and the directory limit is 1,000.) If the number of snapshots exceeds these limits, delete extraneous snapshots according to the instructions in the OneFS CLI Administration Guide or the OneFS Web Administration Guide.
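The two snapshot limits quoted above can be checked with a short script. This is a sketch only: it assumes you already have a per-directory snapshot count, with each snapshot counted under exactly one directory, and the function name is hypothetical.

```python
from __future__ import annotations

SYSTEM_WIDE_LIMIT = 20_000  # cluster-wide snapshot limit stated in the guide
DIRECTORY_LIMIT = 1_000     # per-directory snapshot limit stated in the guide


def snapshot_limit_violations(counts_by_dir: dict[str, int]) -> list[str]:
    """Describe every limit that the given snapshot counts exceed.

    counts_by_dir maps a directory path to the number of snapshots of it.
    Gathering these counts (for example, from `isi snapshot snapshots list`
    output) is left out of this sketch.
    """
    problems = []
    total = sum(counts_by_dir.values())
    if total > SYSTEM_WIDE_LIMIT:
        problems.append(f"system-wide: {total} > {SYSTEM_WIDE_LIMIT}")
    for path, count in sorted(counts_by_dir.items()):
        if count > DIRECTORY_LIMIT:
            problems.append(f"{path}: {count} > {DIRECTORY_LIMIT}")
    return problems
```

An empty list means neither limit is exceeded, so no snapshots need to be deleted on these grounds.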
  • Page 70 400040010 A SyncIQ policy configuration error occurred. Description This event varies based on the following possible errors that are included in the event message: ● The SyncIQ module detected a problem with the configuration of the policy. ● The SyncIQ policy target path overlaps the target path of another policy. ●...
  • Page 71 ● SyncIQ is unable to connect to a resource on the target cluster. ● SyncIQ is unable to connect to a local resource. ● SyncIQ is unable to connect to a daemon (bandwidth, throttle, pworker, or scheduler). Administrator action 1. If SyncIQ is unable to connect to a resource on the target cluster: ●...
  • Page 72 ● SyncIQ encountered a file system error on the source cluster. ● SyncIQ encountered a file system error on the target cluster. NOTE: If you are also experiencing events about hardware failure, investigate the cause of those events before attempting the following actions.
  • Page 73 400040020 The Recovery Point Objective (RPO) was exceeded for a SyncIQ policy. Description A replication job failed to complete within the time period specified by the SyncIQ policy. Administrator action Perform the following steps in the order listed. If the issue resolves after a step, there is no need to complete the subsequent steps.
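The RPO condition described above reduces to comparing the job's completion time (or the current time, if the job is still running) with the end of the RPO window. The sketch below assumes a hypothetical representation of a job as start and end timestamps; it does not read SyncIQ reports, and the function name is hypothetical.

```python
from __future__ import annotations

from datetime import datetime, timedelta
from typing import Optional


def rpo_exceeded(job_start: datetime, rpo: timedelta,
                 job_end: Optional[datetime], now: datetime) -> bool:
    """True if the job did not finish within the policy's RPO window.

    job_end is None while the job is still running. All datetimes are
    assumed to be consistent (all naive or all in the same timezone).
    """
    deadline = job_start + rpo
    if job_end is None:
        # Still running: the RPO is exceeded once the deadline has passed.
        return now > deadline
    return job_end > deadline
```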
  • Page 74 400040023 SyncIQ encountered an error during service export. Administrator action Review the report for additional information about the service export errors. 400040024 A SyncIQ policy detected unsupported WORM settings on the target. Description If there are unsupported WORM settings on the target, OneFS generates a report that provides details about the policy settings.
  • Page 75 400050001 This event was generated as a test. Description A user requested that a test event be generated, either through the OneFS web administration interface or by running the following command: isi event test create Administrator action This message is informational. No action is required. 400050002 This event was generated as a test.
  • Page 76 400060001 The AVScan service is enabled, but a URL to an antivirus ICAP server has not been entered. Administrator action 1. Perform one of the following tasks: ● If you do not intend to configure a supported ICAP antivirus server, disable the antivirus service through the OneFS web administration interface.
  • Page 77 Administrator action Configure one or more external anti-virus servers, and then check the anti-virus service. 400060102 All CEE/CAVA anti-virus servers disabled. Description The anti-virus service on the cluster has stopped because no external anti-virus servers have been enabled. Administrator action Enable one or more external anti-virus servers, and then check the anti-virus service.
  • Page 78 400060106 The CEE/CAVA server is offline. Description One node reports that the anti-virus server is offline. Administrator action Verify that the anti-virus server is connected to the node and that the server is operational. 400060107 All access zones have CEE or CAVA anti-virus disabled. Description The anti-virus service is disabled on all access zones in the cluster.
  • Page 79 400060110 The anti-virus IP Pool is missing or is misconfigured. Description The anti-virus IP Pool is misconfigured or missing. Anti-virus scanning cannot occur until the IP Pool is properly configured. Administrator action Ensure that the CAVA pool is properly configured, and that it displays in the isi antivirus cava settings view command output.
  • Page 80 400070004 An evaluation license for a OneFS software module is scheduled to expire soon. Administrator action To purchase the software module before the evaluation license expires, contact your sales representative. 400070005 An evaluation license for a OneFS software module has expired. Administrator action To purchase the software module, contact your sales representative.
  • Page 81 400080001 A firmware upgrade has failed: {msg} Administrator action Attempt to reapply the firmware by running the following command from the node that reported the error: isi upgrade firmware start --nodes-to-upgrade <local-node-lnn> If the event persists, gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs.
  • Page 82 400090004 The cluster lost connection to the Secure Remote Support (SRS) gateway server. Description In order for the cluster to activate SRS services, OneFS must communicate with the SRS gateway server. This event might be the result of the following issues: ●...
  • Page 83 400100004 The job failed. Administrator action This message is informational. No action is required. 400100005 A job policy event occurred. Administrator action This message is informational. No action is required. 400100006 Job {job_type} failed to start as scheduled. Administrator action Another instance of the job is still running.
  • Page 84 400100009 One or more nodes have been excluded from participating in this job. Description The job engine was configured to exclude one or more nodes from participating in this job. Administrator action This message is informational. No action is required. 400100010 One or more nodes that do not exist have been excluded from participating in this job.
  • Page 85 Administrator action Boot disks are not customer-replaceable components. Contact Dell EMC PowerScale Technical Support. 400130001 The NFS export rules are configured in such a way that the client cannot mount the path.
  • Page 86 Administrator action 1. Verify that the idmap.conf file is not missing or corrupted on the client. 2. Compare the value for domainstring in the idmap.conf file and the value for the cluster NFSv4 domain name that is returned from the following cluster sysctl command: isi_for_array -s sysctl vfs.nfsrv.nfsv4.idmap_replacedomain 3.
  • Page 87 400150004 A step in the OneFS upgrade process is taking longer than expected. Administrator action Confirm that the upgrade is still making progress. If you believe that the upgrade process has stopped, contact Dell EMC PowerScale Technical Support. 400150005 The rollback of a OneFS upgrade started.
  • Page 88 400150006 Agent not ready. Description The Supervisor commands the Agent to execute a hook or command, but if the node does not have both superblock quorum and quorum, the Agent does not start the hook. If the hook does not start, the upgrade Supervisor re-sends the command at periodic intervals indefinitely.
  • Page 89 Upgrade Alert - disk pool db and LKF failure domain mismatch - unable to reboot nodes without potential client disruption. Administrator action If a parallel OneFS upgrade is stalled, and the nodes are stuck at the PendingReboot hook, contact Dell EMC PowerScale Technical Support. If the event persists, gather logs, and then contact Technical Support for additional troubleshooting. For...
  • Page 90 400150012 Error installing HEALTHCHECK patch Description There was an error while installing the healthcheck patch. Administrator action Determine if you are at the most recent patch level and upgrade if necessary. 400151001 No secure image was found for use in expanding the cluster. Description An install image that matches the current committed version is required in order for nodes to be joined to the cluster.
  • Page 91 400160002 Audit System cannot provide service. Description The audit system cannot provide service. Administrator action Verify that the audit service is enabled by running: isi services -a isi_audit_d enable. 400160005 Audit daemon failed to persist one or more events. Description The audit daemon failed to save one or more events.
  • Page 92 400180001 Inline dedupe allocation failed on node {lnn}, occurrence {occurrence} Description Memory allocation failed for the inline dedupe index. Administrator action 1. Disable inline dedupe. 2. Free up memory by running the isi_flush command. 3. Enable inline dedupe. 4. If the issue is not resolved, restart the node. 400180002 Inline dedupe allocation in progress on node {lnn}, occurrence {occurrence} Description...
  • Page 93 Check the paths in the SmartDedupe configuration and verify that they exist: ● If the directory exists but is not accessible, contact Dell EMC Technical Support. If the event persists, gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs.
  • Page 94 400200001 Security verification check failed Administrator action This message is informational. No action is required. 400200002 Security verification check ran successfully Administrator action This message is informational. No action is required. 400210001 The encryption key manager for self-encrypting drives (SED) is unable to start on the indicated node. Administrator action Do not reboot the node.
  • Page 95 Administrator action Check KMIP configurations and logs on the KMIP server. 400210005 Network error occurred when reaching KMIP server. Administrator action Check the network configuration and verify that all nodes can connect to the key management interoperability protocol (KMIP) server. 400210006 KMIP Key Migration failed.
  • Page 96 Correct the configuration according to the reported error. 400240000 S3 Service failed to start. Administrator action Check cluster status. If the cluster is healthy but S3 is failing, contact Dell EMC technical support. 400240001 Identity query failed user=1000 to name status=STATUS_ACCESS_DENIED. Description S3 failed to resolve name from ID.
  • Page 97 S3 could not parse mpu info for bucket id : 123456. Upload Id 987654. SBT may be broken. Description Could not parse mpu info for bucket id. Administrator action Contact Dell EMC technical support to open a Service Request (SR). 400240004 S3 key in SBT is invalid. SBT may be broken. Current Basekey = a/b/c. Description S3 key in SBT is invalid.
  • Page 98 400250000 Noncompatible, user-specified patches were found, and ignored. Description The specified patches have conflicts with the embedded patches in the prepatched image. The conflicts were ignored, and the upgrade continued. Administrator action This message is informational. No action is required. 400260000 PW account was updated.
  • Page 99 ● The mail server or the authentication server is down. ● A quota address mapping rule is configured incorrectly. Administrator action Review the settings for quota notification rules and correct any apparent errors. For information about configuring SmartQuotas, see the OneFS Web Administration Guide.
  • Page 100 Administrator action 1. Determine whether a node is in the minority or majority group, by running the following command from the node that is reporting the error: sysctl efs.gmp.has_quorum ● If the command returns 0, the error occurred on the minority group. The message might continue until the cluster is healthy.
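Step 1 above can be wrapped in a small helper that separates running the command from interpreting its result. This sketch assumes that the FreeBSD `sysctl -n` flag (print the value only) behaves the same on OneFS, and it assumes that `1` indicates the majority group; the excerpt only documents the meaning of `0`. Both function names are hypothetical.

```python
import subprocess


def group_from_has_quorum(value: str) -> str:
    """Interpret the output of `sysctl -n efs.gmp.has_quorum`.

    Per the guide, 0 means the node is in the minority group. Treating 1 as
    the majority group is an assumption made for this sketch.
    """
    value = value.strip()
    if value == "0":
        return "minority"
    if value == "1":
        return "majority"
    raise ValueError(f"unexpected has_quorum value: {value!r}")


def local_group() -> str:
    """Run the sysctl on the local node (works only on a OneFS node)."""
    out = subprocess.run(
        ["sysctl", "-n", "efs.gmp.has_quorum"],
        capture_output=True, text=True, check=True,
    ).stdout
    return group_from_has_quorum(out)
```

Keeping the parsing in its own function makes the interpretation testable without access to a cluster.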
  • Page 101 600010003 The snapshot daemon failed to remove a snapshot lock. Description The system cannot remove an expired snapshot lock. This error can occur when a disk is unwritable. If the cluster is split, this error might occur on the minority group, or the group that contains fewer than half of the nodes. In this case, the message persists until the cluster is healthy, and you can safely ignore the error.
  • Page 102 700010001 The cluster time differs from the Windows Active Directory server. Description The timestamp between one or more nodes and the Active Directory host differs by at least five minutes. The time discrepancy can result in authentication failures due to mismatches with the Kerberos ticket timestamp. Mismatched clock settings might be a result of one of the following: ●...
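The five-minute threshold in this description can be illustrated with timestamp arithmetic. This is a minimal sketch, assuming you have already obtained the node time and the Active Directory server time as timezone-aware values; how you query either clock is out of scope, and the function name is hypothetical.

```python
from datetime import datetime, timedelta

MAX_SKEW = timedelta(minutes=5)  # threshold stated in the event description


def clock_skew_too_large(node_time: datetime, ad_time: datetime) -> bool:
    """True if the node and AD clocks differ by at least five minutes.

    Both datetimes should be timezone-aware so that the subtraction compares
    the same instant; mixing naive and aware values raises TypeError.
    """
    return abs(node_time - ad_time) >= MAX_SKEW
```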
  • Page 103 700010005 An authentication upgrade failure has occurred. Administrator action Gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs. 700020001 The Windows UID map range is full. Authentication might fail until the range is increased. Description The user ID (UID) range for mapping Microsoft Active Directory groups has run out of IDs and must be expanded.
  • Page 104 700020003 The Windows networking service failed to parse idmap rules. Administrator action Gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs. 700030001 The Active Directory account data that is stored on the cluster was deleted or damaged. Administrator action Unjoin and rejoin the Active Directory domain.
  • Page 105 700030003 The node cannot perform read or write operations on the authentication database files. Description This event typically appears when a node does not have quorum or otherwise cannot access the authentication database files. This event also sometimes appears when a node starts up. In that case, the event frequently resolves itself within five minutes. If the event resolves itself, no action is required.
  • Page 106 2. Repair any missing SPNs by running the following command: isi auth ads spn fix <provider-name> If the event persists, gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs.
  • Page 107 ● The first event states that an IDI was detected and that the system is attempting to resolve the issue through the DSR process. This event is sent with a Critical severity level. ● The second event provides information that will assist Dell EMC PowerScale Technical Support with the debugging process. This event is sent with an Info severity level.
  • Page 108 1. Identify a list of the process types with the largest number of file descriptors by running the following command from the OneFS command-line interface: fstat|awk '{ print $2 }'|sort|uniq -c|sort -rn|head 2. Contact Dell EMC PowerScale Technical Support and provide the output from the command. 800010007 An Isilon Data Integrity (IDI) network checksum error was detected.
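For readers who want to post-process saved `fstat` output off-cluster, the pipeline in step 1 can be approximated in Python. This sketch assumes the FreeBSD `fstat` column layout, in which the second column is the command name; the function name is hypothetical, and the order of tied counts may differ from `sort -rn`.

```python
from __future__ import annotations

from collections import Counter


def top_fd_consumers(fstat_output: str, limit: int = 10) -> list[tuple[str, int]]:
    """Count open file descriptors per command name in `fstat` output.

    Approximates `fstat | awk '{ print $2 }' | sort | uniq -c | sort -rn | head`.
    Unlike the shell pipeline, this skips the header line.
    """
    counts: Counter[str] = Counter()
    lines = fstat_output.splitlines()
    for line in lines[1:]:  # first line is the "USER CMD PID FD ..." header
        fields = line.split()
        if len(fields) >= 2:
            counts[fields[1]] += 1
    return counts.most_common(limit)
```

Running `fstat` on the node and saving its output to a file, then passing the file contents to this function, gives a ranked list similar to the one Technical Support asks for.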
  • Page 109 Administrator action Contact Dell EMC PowerScale Technical Support to determine whether a component or the motherboard must be replaced on the node or nodes. 800010008 The NVRAM journal is larger than the journal backup partition. Description The system has resized the journal partition.
  • Page 110 1100000001 A CloudPools network connection failed. Description The network connection for the specified account failed and CloudPools is unable to access files in the cloud provider. Administrator action This event might be the result of internal or external network issues. Check to make sure that internal network connections are healthy, then make sure you are able to reach the cloud provider.
  • Page 111 1100000004 An Amazon S3 telemetry reporting bucket was not found. Description CloudPools attempted to access usage reports from an Amazon S3 cloud provider. The S3 telemetry reporting bucket, where usage reports are stored, could not be found. Administrator action Perform the following steps in the order listed. If the issue resolves after a step, there is no need to complete the subsequent steps.
  • Page 112 Administrator action Gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs. 1100000007 CloudPools no usable account found. Description CloudPools is unable to write data to the cloud storage account because it is disabled. Administrator action Confirm that the account is disabled by running the following command: isi cloud accounts view <id>...
  • Page 113 Hardware events This section contains the following topics: Topics: • Hardware events overview • 900010001 • 900010002 • 900010003 • 900010004 • 900010005 • 900010006 • 900010007 • 900010008 • 900010009 • 900010010 • 900010011 • 900010012 • 900010013 • 900020001 •...
  • Page 114 • 900020033 • 900020034 • 900020035 • 900060001 • 900060002 • 900060003 • 900060004 • 900060005 • 900060006 • 900060007 • 900060008 • 900060009 • 900060010 • 900060011 • 900060012 • 900060013 • 900060014 • 900060015 • 900060016 • 900060017 •...
  • Page 115 • 900080014 • 900080015 • 900080016 • 900080017 • 900080018 • 900080019 • 900080020 • 900080021 • 900080022 • 900080023 • 900080024 • 900080025 • 900080026 • 900080027 • 900080028 • 900080029 • 900080030 • 900080031 • 900080032 • 900080033 •...
  • Page 116 • 900130006 • 900130007 • 900130008 • 900130009 • 900130010 • 900130011 • 900130013 • 900130014 • 900130015 • 900140001 • 900140002 • 900140003 • 900140004 • 900140005 • 900150001 • 900160001 • 900160002 • 900160003 • 900160004 • 900160005 •...
  • Page 117 • 900180013 • 900180014 • 900180015 • 900180016 • 900180028 • 900180029 • 900180030 • 900180031 • 900180032 • 910100001 • 910100002 • 910100003 • 910100004 • 910100005 • 910100006 • 910100007 • 920100000 • 920100001 • 920100002 • 920100003 •...
  • Page 118 ● For legacy hardware, initiate a battery test by following the instructions in Understanding Isilon node battery testing, article 000016079. ● For NL410, X210, S210, X410, and HD400 nodes, contact Dell EMC PowerScale Technical Support for troubleshooting. 900010003 There is an issue with the NVRAM card. Administrator action Reboot the node.
  • Page 119 900010006 A memory, PCI, or PCIe bus error has occurred in the node. Administrator action Troubleshooting is required to determine if a hardware component must be replaced. Gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs.
  • Page 120 This event is generated if the model number of the chassis is not properly updated after replacing a node chassis. Administrator action Contact Dell EMC PowerScale Technical Support to update the model number of the node. 900010011 The Baseboard Management Controller (BMC) or Chassis Management Controller (CMC) is unresponsive.
  • Page 121 Administrator action Contact Dell EMC PowerScale Technical Support for a potential suitcase replacement. 900010013 The firmware update failed for the specified device. Administrator action Attempt to update the firmware again by running the following command from the node that reported the event: isi firmware update --local If the event persists, gather logs, and then contact Technical Support for additional troubleshooting.
  • Page 122 900020002 A power supply fan in the node might have failed. Description If the fan speed temporarily falls out of optimal range for less than a minute or so and the event does not repeat, this event might be a false alarm. Administrator action Follow the instructions in Event notification: Fan speed out of spec, article 000083406...
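The false-alarm criterion in this description can be read as a simple heuristic over out-of-range periods. The sketch below is an interpretation, not a OneFS check: it assumes you have a list of (start, end) times when the fan speed was out of range, and it treats "less than a minute or so" as strictly under 60 seconds. The function name is hypothetical; follow the referenced article before dismissing any event.

```python
from __future__ import annotations

from datetime import datetime, timedelta

TRANSIENT_LIMIT = timedelta(minutes=1)  # "less than a minute or so"


def looks_like_false_alarm(excursions: list[tuple[datetime, datetime]]) -> bool:
    """Heuristic reading of the event description, not an official check.

    excursions lists (start, end) periods when the fan speed was outside its
    optimal range. Only a single excursion shorter than a minute is treated
    as a likely false alarm; an empty list or repeated or longer excursions
    return False.
    """
    if len(excursions) != 1:
        return False
    start, end = excursions[0]
    return end - start < TRANSIENT_LIMIT
```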
  • Page 123 900020005 A power supply fan in the node might have failed. Description If the fan speed temporarily falls out of optimal range for less than a minute or so and the event does not repeat, this event might be a false alarm. Administrator action Follow the instructions in Event notification: Fan speed out of spec, article 000083406...
  • Page 124 900020008 A chassis fan in the node might have failed. Description If the fan speed temporarily falls out of optimal range for less than a minute or so and the event does not repeat, this event might be a false alarm. Administrator action Follow the instructions in Event notification: Fan speed out of spec, article 000083406...
  • Page 125 900020012 A voltage component is out of specification. Administrator action Gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs. 900020013 A voltage component is out of specification. Administrator action Gather logs, and then contact Technical Support for additional troubleshooting.
  • Page 126 900020017 A voltage component is out of specification. Administrator action Gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs. 900020018 A voltage component is out of specification. Administrator action Gather logs, and then contact Technical Support for additional troubleshooting.
  • Page 127 900020022 A voltage component is out of specification. Administrator action Gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs. 900020023 A voltage component is out of specification. Administrator action Gather logs, and then contact Technical Support for additional troubleshooting.
  • Page 128 Administrator action Perform the following steps in the order listed. If the issue resolves after a step, there is no need to complete the subsequent steps. ● (HD400 only) Make sure that the drive drawer is properly shut by sliding it out and re-closing it firmly but carefully. ●...
  • Page 129 900020028 The internal or ambient temperature around a node has exceeded the allowable thresholds for a power supply. Description Ambient temperature is only measured by front panel sensors. If you receive an event that indicates that the front panel is out of specification, the temperature in your data center might need to be adjusted.
  • Page 130 ● Verify that air flow within the rack, and through the front and rear panel vents of the node, is not obstructed in any way. ● Make sure that the faceplate on the affected node is installed, properly seated, and undamaged. In some cases, removing and re-seating the faceplate will resolve this issue.
  • Page 131 If a node is subjected to high temperatures for an extended period of time, the CPU is throttled and the node goes into read-only mode to help prevent potential data loss due to component failure. If the node temperature reaches critical levels, it is possible that the node will shut down entirely.
  • Page 132 If the event persists, gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs. 900020033 One of the power supplies in a node has failed or lost power. Description It is possible that a power cable was unplugged during recent maintenance or the circuit supplying power to the affected power supply has failed.
  • Page 133 900020034 The node is reporting less than the expected amount of physical memory. Description This event typically appears because a DIMM has failed, is poorly seated, or an incorrect type of DIMM is installed. Administrator action Contact Technical Support to determine if a DIMM replacement is required. 900020035 The internal or ambient temperature around a node has exceeded the allowable threshold for the CPU.
  • Page 134 900060001 A sensor in the front panel of a node has exceeded the specified threshold. Description This event can occur intermittently without harm to the system. Administrator action 1. Cancel or quiet the event. 2. If the event recurs, shut down and restart the node by completing the following steps: ●...
  • Page 135 900060003 A power supply fan in the node might have failed. Description If the fan speed temporarily falls out of optimal range for less than a minute or so and the event does not repeat, this event might be a false alarm. Administrator action Follow the instructions in Event notification: Fan speed out of spec, article 000083406...
  • Page 136 900060006 A chassis fan in the node might have failed. Description If the fan speed temporarily falls out of optimal range for less than a minute or so and the event does not repeat, this event might be a false alarm. Administrator action Follow the instructions in Event notification: Fan speed out of spec, article 000083406...
  • Page 137 900060010 A voltage component is out of specification. Administrator action Gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs. 900060011 A voltage component is out of specification. Administrator action Gather logs, and then contact Technical Support for additional troubleshooting.
  • Page 138 900060015 A voltage component is out of specification. Administrator action Gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs. 900060016 A voltage component is out of specification. Administrator action Gather logs, and then contact Technical Support for additional troubleshooting.
  • Page 139 900060020 A voltage component is out of specification. Administrator action Gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs. 900060021 The internal or ambient temperature around a node has exceeded the allowable threshold for the CPU. Description Ambient temperature is only measured by front panel sensors.
  • Page 140 If a node is subjected to high temperatures for an extended period of time, the CPU is throttled and the node goes into read-only mode to help prevent potential data loss due to component failure. If the node temperature reaches critical levels, it is possible that the node will shut down entirely.
  • Page 141 If the event persists, gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs. 900060024 The internal or ambient temperature around the front panel of a node has exceeded the allowable threshold. Description Ambient temperature is only measured by front panel sensors.
  • Page 142 ● (HD400 only) Make sure that the drive drawer is properly shut by sliding it out and re-closing it firmly but carefully. ● Review the temperature statistics for the affected sensor, which are included in the event. If the temperature is consistently elevated, the problem is likely a high ambient temperature in the data center.
  • Page 143 900060027 One of the power supplies in a node has failed or lost power. Description It is possible that a power cable was unplugged during recent maintenance or the circuit supplying power to the affected power supply has failed. Administrator action Perform the following steps in the order listed.
  • Page 144 900060028 The node is reporting less than the expected amount of physical memory. Description This event typically appears because a DIMM has failed, is poorly seated, or an incorrect type of DIMM is installed. Administrator action Contact Technical Support to determine if a DIMM replacement is required. 900060029 A voltage component is out of specification.
  • Page 145 900060033 A voltage component is out of specification. Administrator action Gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs. 900060034 A voltage component is out of specification. Administrator action Gather logs, and then contact Technical Support for additional troubleshooting.
  • Page 146 900060037 One of the power supplies in a node has failed or lost power. Description It is possible that a power cable was unplugged during recent maintenance or the circuit supplying power to the affected power supply has failed. Administrator action Perform the following steps in the order listed.
  • Page 147 900060038 One of the power supplies in a node has failed or lost power. Description It is possible that a power cable was unplugged during recent maintenance or the circuit supplying power to the affected power supply has failed. Administrator action Perform the following steps in the order listed.
  • Page 148 900060039 The internal or ambient temperature around a node has exceeded the allowable thresholds for a power supply. Description Ambient temperature is only measured by front panel sensors. If you receive an event that indicates that the front panel is out of specification, the temperature in your data center might need to be adjusted.
  • Page 149 ● Verify that air flow within the rack, and through the front and rear panel vents of the node, is not obstructed in any way. ● Make sure that the faceplate on the affected node is installed, properly seated, and undamaged. In some cases, removing and re-seating the faceplate will resolve this issue.
  • Page 150 Administrator action Follow the instructions in Event notification: Fan speed out of spec, article 000083406 to determine if this event is a false alarm. If this event is not a false alarm, contact Technical Support. 900080003 A power supply fan in the node might have failed. Description If the fan speed temporarily falls out of optimal range for less than a minute or so and the event does not repeat, this event might be a false alarm.
  • Page 151 Administrator action Follow the instructions in Event notification: Fan speed out of spec, article 000083406 to determine if this event is a false alarm. If this event is not a false alarm, contact Technical Support. 900080006 A chassis fan in the node might have failed. Description If the fan speed temporarily falls out of optimal range for less than a minute or so and the event does not repeat, this event might be a false alarm.
  • Page 152 Administrator action Follow the instructions in Event notification: Fan speed out of spec, article 000083406 to determine if this event is a false alarm. If this event is not a false alarm, contact Technical Support. 900080009 A voltage component is out of specification. Administrator action Gather logs, and then contact Technical Support for additional troubleshooting.
  • Page 153 900080013 A voltage component is out of specification. Administrator action Gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs. 900080014 A voltage component is out of specification. Administrator action Gather logs, and then contact Technical Support for additional troubleshooting.
  • Page 154 900080018 A voltage component is out of specification. Administrator action Gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs. 900080019 A voltage component is out of specification. Administrator action Gather logs, and then contact Technical Support for additional troubleshooting.
  • Page 155 Administrator action Perform the following steps in the order listed. If the issue resolves after a step, there is no need to complete the subsequent steps. 1. Confirm that both power cables are properly connected to the node. 2. View the LED lights on the power supplies and confirm the status of the power supply (power status: meaning, applicable node types): ● Steady green...
  • Page 156 2. View the LED lights on the power supplies and confirm the status of the power supply (power status: meaning, applicable node types): ● Steady green: Good (all nodes) ● Blinking green: Good, but the node is currently powered down (36000X, 3600NL, 72000X, 72000N) ● Steady amber: Good, but the node is currently powered down (X-Series, S-Series)...
  • Page 157 Power status Node type Blinking green Good, but the node is currently 36000X, 3600NL, 72000X, 72000N powered down Steady amber Good, but the node is currently X-Series, S-Series powered down Blinking amber A power supply failure has occurred X-Series, S-Series No light Insufficient or no A/C power All nodes...
  • Page 158 Power status Node type Blinking amber A power supply failure has occurred X-Series, S-Series No light Insufficient or no A/C power All nodes 3. If only one node reports the issue, determine the cause of the problem by performing the following steps. CAUTION: Do not move the power cable to another power supply in the same node as this will cause the node to lose power.
  • Page 159 If the steps above do not clear this event, the subsystem that monitors hardware health (such as temperature and fan speeds) might have encountered a problem. This event can occur intermittently without harm to the system, and you can safely quiet the event unless the issue persists.
  • Page 160 Administrator action: Perform the following steps in the order listed. If the issue resolves after a step, there is no need to complete the subsequent steps. ● (HD400 only) Make sure that the drive drawer is properly shut by sliding it out and re-closing it firmly but carefully. ●...
  • Page 161 900080030: The internal or ambient temperature around a node has exceeded the allowable threshold. Description: Ambient temperature is measured only by front panel sensors. If you receive an event indicating that the front panel is out of specification, the temperature in your data center might need to be adjusted. If a node is subjected to high temperatures for an extended period, the CPU is throttled and the node goes into read-only mode to help prevent potential data loss due to component failure.
  • Page 162 ● Verify that air flow within the rack, and through the front and rear panel vents of the node, is not obstructed in any way. ● Make sure that the faceplate on the affected node is installed, properly seated, and undamaged. In some cases, removing and re-seating the faceplate will resolve this issue.
  • Page 163 Administrator action: Perform the following steps in the order listed. If the issue resolves after a step, there is no need to complete the subsequent steps. 1. Confirm that both power cables are properly connected to the node. 2. Check the LEDs on the power supplies and confirm the status of each power supply (LED | Power status | Node type): Steady green...
  • Page 164 900080035: The internal or ambient temperature around a node has exceeded the allowable threshold for the CPU. Description: Ambient temperature is measured only by front panel sensors. If you receive an event indicating that the front panel is out of specification, the temperature in your data center might need to be adjusted.
  • Page 165 900100001: The NVRAM in the indicated node experienced a single-bit error. The error was automatically corrected by ECC. Administrator action: If the event persists, gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs.
  • Page 166 900100020: The NVRAM card is not responding and the node has been set to read-only. Description: The connection between the NVRAM card and the node was lost. In some cases, restarting the node reloads the NVRAM driver and restores the connection. Administrator action: Reboot the node.
  • Page 167 900100024: The NVRAM card firmware reported an uncorrectable error. Administrator action: Reboot the node. If the event clears and does not recur, no other action is required. If the event persists, gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs.
  • Page 168 900100028: NVDIMM has lost persistence in the chassis ({chassis}). To protect the journal, setting the node to read-only. Description: This event occurs when the NVDIMM detects a condition in its subsystem that prevents it from persisting DRAM contents through a power-loss event. The condition might be related to the DRAM itself or to the battery backup unit (BBU). Administrator action: The node is placed into a read-only state to protect the journal.
  • Page 169 NVDIMM subsystem health is not being monitored. The node will transition to read-only mode until the issue has been resolved. Causes: Dell PowerTools is not responding to service requests. Note: The monitoring subsystem retries several times before setting the node to read-only and triggering this event. The retry period is currently set to 100 seconds.
  • Page 170 ● (HD400 only) Make sure that the drive drawer is properly shut by sliding it out and re-closing it firmly but carefully. ● Review the temperature statistics for the affected sensor, which are included in the event. If the temperature is consistently elevated, the problem is likely a high ambient temperature in the data center.
  • Page 171 Administrator action: Contact Technical Support to determine if a DIMM replacement is required. 900110004: One of the power supplies in a node has failed or lost power. Description: It is possible that a power cable was unplugged during recent maintenance or that the circuit supplying power to the affected power supply has failed.
  • Page 172 900110005: One of the power supplies in a node has failed or lost power. Description: It is possible that a power cable was unplugged during recent maintenance or that the circuit supplying power to the affected power supply has failed. Administrator action: Perform the following steps in the order listed.
  • Page 173 900120001: The internal or ambient temperature around a node has exceeded the allowable threshold for the CPU. Description: Ambient temperature is measured only by front panel sensors. If you receive an event indicating that the front panel is out of specification, the temperature in your data center might need to be adjusted.
  • Page 174 4. (All other nodes.) Re-seat the front panel. 5. Move the front panel from a functioning node to the affected node and see if the event clears. 6. Install the front panel from the affected node on another node to determine if the problem is with the front panel or with the node.
  • Page 175 ● Locate the electrical outlet to which the problematic power supply is connected, and then determine if the outlet is functioning properly by plugging the power cable into a different electrical outlet. ● If the issue is not resolved by using a different electrical outlet, move the power cable from the power supply that reports the failure to the power supply of a node that does not report a failure.
  • Page 176 CAUTION: Do not switch power supplies in the same node, because doing so will cause the node to lose power. ● If the issue follows the power supply, the power supply must be replaced. 5. If multiple nodes report power supply issues, the issue is likely environmental. Check each of the following items to confirm the health of the power subsystem: ●...
  • Page 177 Administrator action: 1. Cancel or quiet the event. 2. If the event recurs, shut down and restart the node by completing the following steps: ● Connect to the affected node through SSH or a serial cable. ● Shut down the node by running the following command: shutdown -p now ●...
  • Page 178 900130006: A voltage component is out of specification. Administrator action: Gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs. 900130007: A voltage component is out of specification. Administrator action: Gather logs, and then contact Technical Support for additional troubleshooting.
  • Page 179 Administrator action: Follow the instructions in article 000083406, "Event notification: Fan speed out of spec," to determine whether this event is a false alarm. If it is not a false alarm, contact Technical Support. 900130011: A power supply fan in the node might have failed. Description: If the fan speed temporarily falls outside the optimal range for less than about a minute and the event does not repeat, this event might be a false alarm.
  • Page 180 If the event persists, gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs. 900130014: One of the power supplies in a node has failed or lost power. Description: It is possible that a power cable was unplugged during recent maintenance or that the circuit supplying power to the affected power supply has failed.
  • Page 181 900130015: One of the power supplies in a node has failed or lost power. Description: It is possible that a power cable was unplugged during recent maintenance or that the circuit supplying power to the affected power supply has failed. Administrator action: Perform the following steps in the order listed.
  • Page 182 900140001: A node reported a voltage measurement event group. Administrator action: Reboot the node. If the event clears and does not recur, no other action is required. If the event persists, gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs.
  • Page 183 ● Review the temperature statistics for the affected sensor, which are included in the event. If the temperature is consistently elevated, the problem is likely a high ambient temperature in the data center. Address any changes in the cluster environment such as air conditioning outages. ●...
  • Page 184 Administrator action: Contact Dell EMC PowerScale Technical Support to determine if a replacement node is required. Gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs. 900160002: A node cannot connect to its peer node.
  • Page 185 Administrator action: Contact Dell EMC PowerScale Technical Support to determine if a replacement DIMM is required. Gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs. 900160005: A node has powered down in response to a thermal issue.
  • Page 186 Administrator action: Gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs. 900160008: Battery Backup Unit fault detected. Description: A battery module has failed and must be replaced. The battery module is not a customer-replaceable part. The event message provides the chassis and node slot of the affected node.
  • Page 187 900160011: Internal fault detected. Description: An internal fault was detected. Troubleshooting is required to determine whether a hardware component must be replaced. The event message provides the chassis and node slot of the affected node. Administrator action: Gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs.
  • Page 188 900160014: A hardware issue was detected with the I2C bus. Description: The I2C bus carries information from various sensors in a node chassis (fan speed, power supply voltage, and temperature). It is not critical to address this issue immediately, but this event will continue to appear until the issue has been addressed. While this event is active, the node will not report correct values for temperature, fans, or power supply health.
  • Page 189 Do not switch power supplies in the same node pair, because doing so will cause the node to lose power. ○ If the issue follows the power supply, the power supply needs to be replaced. ○ If the issue stays with the node, contact Dell EMC PowerScale Technical Support for help troubleshooting the node.
  • Page 190 900160019: Journal protection in the event of a power failure not enabled. Description: OneFS failed to enable the subsystem that copies the contents of the local and peer node journals to the M.2 vault card in the event of power loss. If the node loses power unexpectedly, OneFS will not be able to copy the journals.
  • Page 191 1. Confirm that both nodes are cabled correctly and powered up. 2. If the issue persists, contact Dell EMC PowerScale Technical Support to determine if a replacement node is required. Gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster...
  • Page 192 2. If the node resets, monitor the node to make sure it successfully rejoins the cluster. 3. If issues with the node persist, contact Dell EMC PowerScale Technical Support to determine if maintenance is required. Gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster...
  • Page 193 PowerTools, the iDRAC HTTPS port, and the USB network. To ensure that the hardware components are monitored, contact Dell EMC Technical Support as soon as possible. Administrator action: The event might clear automatically. However, if the connectivity problem persists, check the services and contact Dell EMC Technical Support. 900180002: Failed to communicate with the Internal Dual SD Module (IDSDM).
  • Page 194 900180004: A fan has failed. Description: A fan is bad, degraded, or missing. Administrator action: Replace the fan. 900180005: NVDIMM battery has failed. Description: An NVDIMM battery is bad, degraded, or missing. Administrator action: Replace the NVDIMM battery. 900180006: NVDIMM battery charge is below an acceptable threshold, and a vault might fail.
  • Page 195 Administrator action: Replace the NVDIMM battery by opening the chassis and locating the NVDIMM change LEDs on the black box. The Dell PowerEdge R640 documentation provides complete details on changing the battery. 900180008: DIMM has failed on a node. Description: A DIMM in a node is bad, damaged, degraded, or missing, or the correctable memory error rate has been exceeded.
  • Page 196 900180011: The system board sensor {sensor_name} has detected that a component is operating {adj} the recommended temperature range. Description: The data center is too cold or too hot. Administrator action: ● Check the data center thermostat. ● Ensure that enclosed fans are working correctly. ●...
  • Page 197 900180014: Power supply has lost redundancy. Description: One or more power supplies in a redundancy set are experiencing the following issues: ● The power supply unit is incorrectly seated. ● The power cable might be improperly connected or disconnected. ● The power supply unit has failed. ●...
  • Page 198 900180028: NVDIMM has lost persistence. Setting the node to read-only to protect the journal. Description: The NVDIMM has detected a condition in its subsystem that prevents it from persisting DRAM contents through a power-loss event. The condition might be related to the DRAM itself or to the BBU (battery backup unit). Administrator action: The node is placed into a read-only state to protect the journal.
  • Page 199 NVDIMM subsystem health is not being monitored. The node transitions to read-only mode until the issue has been resolved. Description: Dell PowerTools is not responding to service requests. Administrator action: To determine the problem, run the isi_hwmon -b IDRACServices command.
  • Page 200 910100003: The internal or ambient temperature around a node has exceeded the allowable threshold. Description: Ambient temperature is measured only by front panel sensors. If you receive an event indicating that the front panel is out of specification, the temperature in your data center might need to be adjusted. If a node is subjected to high temperatures for an extended period, the CPU is throttled and the node goes into read-only mode to help prevent potential data loss due to component failure.
  • Page 201 910100006: A voltage component is out of specification. Administrator action: Gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs. 910100007: A sensor in the front panel of a node has exceeded the specified threshold. Description: This event can occur intermittently without harm to the system.
  • Page 202 Administrator action: Perform the following steps in the order listed. If the issue resolves after a step, there is no need to complete the subsequent steps. ● (HD400 only) Make sure that the drive drawer is properly shut by sliding it out and re-closing it firmly but carefully. ●...
  • Page 203 If the event does not clear itself when maintenance is complete, or if maintenance is not being performed on the node and the above steps do not resolve the issue, follow the instructions to gather logs, and contact Dell EMC PowerScale Technical Support.
  • Page 204 920100006: A sensor on a node indicates an elevated temperature. Drives are overheating. The node will reboot immediately. Drive power will discontinue in five minutes. Description: The node will reboot but will not rejoin the cluster until temperatures are within acceptable thresholds. If the node is not cooled within five minutes, the drives stay powered down until the inlet temperature returns to a level that is acceptable for restarting them.
  • Page 205 920100008: One of the drives is overheating. Administrator action: Reboot the node. If the event clears and does not recur, no other action is required. If the event persists, gather logs, and then contact Technical Support for additional troubleshooting. For instructions on how to gather cluster logs, see Gathering cluster logs.
  • Page 206 930100002: A sensor is reporting temperature values that are outside expected specifications. Description: The event message provides the chassis and node slot of the affected node. Administrator action: Monitor your cluster for other events that might be related to this event. If the event persists, gather logs, and then contact Technical Support for additional troubleshooting.
  • Page 207 930100005: A sensor is reporting values that are outside expected specifications. Description: The event message identifies which sensor is reporting the unexpected values and provides the chassis and node slot of the affected node. Administrator action: Monitor your cluster for other events that might be related to this event.
  • Page 208 940100002: OneFS {version} is currently running on unsupported nodes (devid(s) {devids}). {msg}. Description: OneFS {version} is currently running on the specified nodes in this cluster. Administrator action: Contact Technical Support to obtain the supported software version for this hardware. Hardware events...
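The fan speed rule of thumb quoted for events 900080006 and 900130011 (a brief excursion outside the optimal range that does not repeat might be a false alarm) can be written down as a small triage check. This is a minimal sketch, not a OneFS tool: the `FanSpeedExcursion` type, the `might_be_false_alarm` function, and the 60-second default (an approximation of "a minute or so") are all hypothetical. Article 000083406 remains the authoritative procedure.

```python
from dataclasses import dataclass


# Hypothetical record of one fan speed excursion; not a OneFS data type.
@dataclass
class FanSpeedExcursion:
    seconds_out_of_range: float  # how long the speed stayed outside the optimal range
    occurrences: int             # how many times the event has fired so far


def might_be_false_alarm(excursion: FanSpeedExcursion,
                         max_seconds: float = 60.0) -> bool:
    """Return True if the excursion was brief and the event has not repeated.

    max_seconds approximates "less than a minute or so" and is an assumption.
    """
    brief = excursion.seconds_out_of_range < max_seconds
    not_repeated = excursion.occurrences <= 1
    return brief and not_repeated
```

For example, a 20-second excursion seen once would be flagged as a possible false alarm, while the same excursion seen three times would not, so the administrator would follow the article and contact Technical Support.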
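The power supply LED table in step 2 of the power supply procedures (pages 155 to 158) is effectively a lookup from LED state to power status and applicable node types. The following minimal Python sketch encodes only the rows shown in that table; the `PSU_LED_TABLE` name, the lowercase keys, and the `interpret_psu_led` function are hypothetical illustrations, not part of OneFS or any Dell utility.

```python
from typing import Dict, Tuple

# Rows copied from the power supply LED table: LED state -> (power status, node types).
# Hypothetical structure for illustration; not a OneFS or Dell API.
PSU_LED_TABLE: Dict[str, Tuple[str, str]] = {
    "steady green": ("Good", "All nodes"),
    "blinking green": ("Good, but the node is currently powered down",
                       "36000X, 3600NL, 72000X, 72000N"),
    "steady amber": ("Good, but the node is currently powered down",
                     "X-Series, S-Series"),
    "blinking amber": ("A power supply failure has occurred",
                       "X-Series, S-Series"),
    "no light": ("Insufficient or no AC power", "All nodes"),
}


def interpret_psu_led(led_state: str) -> Tuple[str, str]:
    """Return (power status, node types) for an observed LED state.

    Matching ignores case and surrounding whitespace. Raises KeyError for
    states that are not rows in the table.
    """
    return PSU_LED_TABLE[led_state.strip().lower()]


if __name__ == "__main__":
    status, node_types = interpret_psu_led("Blinking amber")
    print(f"{status} (applies to: {node_types})")
```

Raising `KeyError` for an unknown state is a deliberate choice in this sketch: an LED state that is not in the table should send the administrator back to the documented procedure rather than to a guessed interpretation.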
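The note on page 169 says the NVDIMM health monitor retries several times before setting the node to read-only, with the period currently set to 100 seconds. A retry loop bounded by a deadline is one way to picture that behavior. This is a hypothetical sketch, not the OneFS implementation: the `monitor_with_deadline` function, the `probe` callback, the 10-second retry interval, and the returned strings are all assumptions. The clock and sleep functions are injectable so that the logic can be checked without waiting.

```python
import time
from typing import Callable


def monitor_with_deadline(probe: Callable[[], bool],
                          window_s: float = 100.0,
                          interval_s: float = 10.0,
                          clock: Callable[[], float] = time.monotonic,
                          sleep: Callable[[float], None] = time.sleep) -> str:
    """Retry a health probe until it succeeds or the window expires.

    Returns "monitored" if probe() succeeds within window_s seconds,
    otherwise "read-only" (the protective fallback described for the event).
    The interval and return values are illustrative assumptions.
    """
    deadline = clock() + window_s
    while True:
        if probe():
            return "monitored"
        if clock() + interval_s > deadline:
            # No successful response inside the window: protect the journal.
            return "read-only"
        sleep(interval_s)
```

With the defaults, a probe that never succeeds is attempted at 0, 10, ..., 100 seconds before the function falls back to read-only; a probe that succeeds on any attempt inside the window keeps the node in normal operation.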
