Product Guide (.pdf)
Page 3
... ...1 2. EFI-Based SELViewer Task 3 4. BSP Reset Failures 22 8.5.2 FRB2 - SR870BH2 Machine Check Error Handling 7 5.1 Classification of Contents 1. Debug Methodology and Failure Isolation 18 8.1 Memory...18 8.1.1 Memory Debug Methodology 18 8.1.2 Memory Component Isolation 18 8.2 Processor...19 8.2.1 Processor Debug Methodology 19 8.2.2 Processor Component Isolation 19 8.3 Processor - Late Self-test 20 8.3.1 Late Self-test Display 20...
Product Guide (.pdf)
Page 10
... 26h 81h 82h 10h 11h 12h 13h 14h 15h 16h 17h 18h 19h 1Ah 1Bh 1Ch 1Dh 50h 51h 52h 53h 54h Sensor Name Temperature Memory Board Temp Memory Board SNC Temp PCI Riser SIOH Temp Peripheral Board AMB Temp PCI Riser board Temp CPU Area Temp... Memory Area Temp Processor 1 Temp Processor 2 Temp Voltage MB Bd +1.25V MB Bd +1.5V MB Bd +1.8V MB Bd +3.3V MB Bd +3.3V SB MB Bd +...
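The sensor numbers in this excerpt use the trailing-"h" hexadecimal notation (for example, 1Ah). A minimal sketch of converting such tokens to integers is shown below; the function name `parse_sensor_number` is hypothetical and not part of the guide, and the excerpt does not make clear which number belongs to which sensor name, so no mapping is attempted.

```python
def parse_sensor_number(token: str) -> int:
    """Convert a sensor number written as 'NNh' (hex with trailing h) to an int."""
    text = token.strip()
    if len(text) < 2 or text[-1].lower() != "h":
        raise ValueError(f"expected hex digits followed by 'h', got {token!r}")
    # int(..., 16) raises ValueError itself if the digits are not valid hex.
    return int(text[:-1], 16)


# A few of the sensor numbers that appear in the excerpt above.
sample_tokens = "26h 81h 82h 10h 1Ah 1Dh 50h".split()
sensor_numbers = [parse_sensor_number(t) for t in sample_tokens]
```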
Product Guide (.pdf)
Page 14
... consumes data with an uncorrectable error. • Global MCA: A machine check is taken by the platform (such as singlebit ECC error in memory) and errors that are signaled as a CMCI to system software. These include errors that are corrected by the processor when it reads data with...occurs when the processor detects a fatal or recoverable error during execution of instructions or when the processor is a 2XECC error detected on a write to memory. 8 Revision 1.1 In the event of an MCA, the processor will enter MCA handling mode. If the event is detected on platform-fatal errors...
Product Guide (.pdf)
Page 16
.... For details on System Management BIOS (SMBIOS) Type 4, Type 16 and 17, refer to excessive amounts of time spent in a memory DIMM, a corrected error may experience performance degradation due to the System Management BIOS Reference Specification available on the next reboot. In this ...in the chipset. If this scenario, the system may occur with duplicate errors. Corrected errors are classified into four categories: Processor, Memory, PCI PERR, and Generic Bus. Thresholding does not apply to the BMC SEL logging routines. This thresholding in no way affects the...
Product Guide (.pdf)
Page 19
.../RED_BLACK NVRAM cleared By jumper DFLT/RED_BLACK Password clear WARN/YELLOW_BLACK NVRAM cleared By Front panel DFLT/RED_BLACK PCI Error DFLT/RED_BLACK PCI Memory Allocation Error DFLT/RED_BLACK Pause on PCI riser, remove Add-in the column-heading Pause On Boot. Check battery, check / modify...brief pause and does not require user interaction. Check CLR PASS Jumper J5H2: Normal 1-2, Clear 2-3. If failure persists, replace main board. Memory allocation area for stuck key or replace Keyboard. If failure persists, replace main board. Intel® Server Platform SR870BH2 BIOS POST Error Codes and...
Product Guide (.pdf)
Page 20
... devices has been exceeded - remove add-in adapter(s), retest. If this occurs and the ambient temp is present and secured. PCI ROM memory allocation area for PCI devices has been exceeded - Suspect or replace any add-in adapter(s) 1st, the PCI Riser 2nd and the... Yes PCI IRQ Allocation Error Yes DFLT/RED_BLACK Shadow of PCI ROM Failed Yes DFLT/RED_BLACK PCI ROM not found Yes DFLT/RED_BLACK Insufficient Memory to "Processor" in Debug Methodology and Failure Isolation section. However, Under normal (i.e. remove add-in adapter(s), retest. remove add-in adapter(s), retest....
Product Guide (.pdf)
Page 22
... 2 will be Functionally restricted. If both will be Performance restricted. Failure identified in Debug Methodology and failure Isolation section. Refer to "Memory" in Debug Methodology and failure Isolation section. Refer to "Processor Late Self test" in Debug Methodology and failure Isolation section. Row mapped... 2 Late Self Test Failed: Catastrophic failure DFLT/RED_BLACK Baseboard Management Controller failed to "Watch dog timer" for detail. Refer to "Memory" in Row 2; The multi-bit error is detected on Boot No Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes...
Product Guide (.pdf)
Page 23
... Isolation section. WARN/YELLOW_BLACK DIMM1 defective. WARN/YELLOW_BLACK DIMM4 defective. WARN/YELLOW_BLACK DIMM7 defective. Row disabled, Refer to "Memory" in Row 2; WARN/YELLOW_BLACK DIMM2 defective. WARN/YELLOW_BLACK DIMM3 defective. WARN/YELLOW_BLACK DIMM8 defective. Issue with DIMM SPD value... in Debug Methodology and failure Isolation section. Revision 1.1 17 Issue with DIMM 1=J9J3, Refer to "Memory" in Debug Methodology and failure Isolation section. Intel® Server Platform SR870BH2 BIOS POST Error Codes and Messages Error Code 8508 ...
Product Guide (.pdf)
Page 24
... contamination on the main board near each DIMM site - DIMM Sites 1=J9J3, 2=J9J1, 3=J9D3, 4=J9D1 FIRST. Debug Methodology and Failure Isolation 8.1 Memory If the memory test finds any bad DIMM(s) (defined as mismatched DIMMs within a row, multi-bit errors [MBE] detected within a DIMM, single-bit [SBE] ...there is at a minimum). swap ROW one ROW, DIMM sites 1,2,3,4 must be replaced in one with ROW two (2nd set of known good memory. The memory test can be logged to SEL. Sites 5=J9J2, 6=J8J1, 7=J9J2, 8=J8D1 are not physically consecutive. first 4 DIMMS in the post and...
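The DIMM-site-to-board-reference pairs quoted above lend themselves to a small lookup when isolating a failing DIMM. The sketch below covers only sites 1 through 4, exactly as listed in the excerpt; sites 5 through 8 are left out on purpose because the excerpt gives the same designator for sites 5 and 7, which cannot be resolved from this text alone. The names `ROW_ONE_DIMM_SITES` and `dimm_site_reference` are hypothetical.

```python
# DIMM site -> board reference designator, as listed in the excerpt (sites 1-4 only).
ROW_ONE_DIMM_SITES = {
    1: "J9J3",
    2: "J9J1",
    3: "J9D3",
    4: "J9D1",
}


def dimm_site_reference(dimm: int) -> str:
    """Return the board reference designator for a DIMM site in the first row."""
    try:
        return ROW_ONE_DIMM_SITES[dimm]
    except KeyError:
        raise ValueError(
            f"DIMM site {dimm} is not covered by this sketch (sites 1-4 only)"
        ) from None
```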
Product Guide (.pdf)
Page 26
... system early in POST, well before the POST error manager is called, in the order Healthy, Functionally, Performance Restricted). - Once the system memory is initialized, BIOS SAL calls PAL to the "Processor" section of displaying the late self-test errors that any processor late selftest error is ...of Debug Methodology and COMPONENT Isolation. At this time the late self-test will continue through POST until the sign-on banner and memory test results are : Functionally restricted - The BIOS assumes this time the late self-test will continue through POST until the sign-on ...
Product Guide (.pdf)
Page 31
...section. See "Processor" in Debug Methodology and failure Isolation section. Beep count 3 Error message Memory failure Memory test failure. 4 System timer 5 Processor failure 7 Processor exception interrupt error 8 Display memory read/write error 9 ROM checksum error 11 Invalid BIOS Table 7. Check Add-in a row,... timer is faulty. If symptom persists replace main board. Beep Codes During the course of action / Possible Failure 1) No valid memory was found in the system. 2) Mismatched DIMMs in video adapter if used, if onboard is enabled. See "Processor" in Debug...
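The beep counts and error messages listed in the excerpt above can be held in a simple lookup table. This is a minimal sketch that uses only the pairs visible in this excerpt; the names `BEEP_CODES` and `describe_beep_code` are hypothetical, and counts absent from the excerpt (such as 6 and 10) are reported as not listed rather than guessed.

```python
# Beep count -> error message, taken from the pairs visible in the excerpt.
BEEP_CODES = {
    3: "Memory failure",
    4: "System timer",
    5: "Processor failure",
    7: "Processor exception interrupt error",
    8: "Display memory read/write error",
    9: "ROM checksum error",
    11: "Invalid BIOS",
}


def describe_beep_code(count: int) -> str:
    """Return the error message for a beep count, or a fallback if it is not listed."""
    return BEEP_CODES.get(count, "Not listed in this excerpt")
```

For example, `describe_beep_code(3)` returns "Memory failure", while `describe_beep_code(6)` returns the fallback string.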
Product Guide (.pdf)
Page 34
... devices are not supported and will result in a continuous beep code (approximately 1 beep every 2 seconds until the system is emitted as the server begins to memory. 7. CD Recovery activity begins. Reconnect AC power and power on the front control panel. 11.2 BIOS Recovery Mode The BIOS Recovery Mode permits re-flashing...
Product Guide (.pdf)
Page 35
... platform MCA condition occurred. Megabyte=1024 Kilobytes. see the Itanium™ Processor Family Error Handling Guide. Parity Error. Read-Only Memory. Built-In Self Test. Distributed Translation Look-aside Buffer. Fault Resilient Booting. 1024 MB. Local Area Network. Machine Check Architecture...Configuration and Power Interface. A data communications system which allows a number of independent devices to support limited detection/correction of memory, which normally resides on the PCI bus that has extra bit(s) to communicate with each other within a moderate size ...
Product Guide (.pdf)
Page 36
Scalable Node Controller. The north bridge and memory controller (combined) in the 870 chipset. Universal Serial Bus, a standard serial expansion bus meant for connecting peripherals. Glossary Intel® Server Platform SR870BH2 Term SEL SERR SMBIOS SNC USB System Event Log. II Revision 1.1 System Management BIOS. A signal on the PCI bus that indicates a 'fatal' error on the bus. Definition System Error.