The stability and reliability of our services are among our key priorities. Nevertheless, your VPS can become involved in an incident and be temporarily inaccessible. This article gives an overview of terms you may encounter in a message following an incident that have not yet been covered in other articles.
Ansible playbook
An Ansible playbook is a file with automated instructions for managing configurations, automating tasks, and deploying software. A playbook is written in YAML (originally "Yet Another Markup Language", now officially "YAML Ain't Markup Language") and defines the tasks and settings to be applied to one or more servers. Playbooks are designed to be reusable and easy to understand, allowing system administrators and developers to automate and simplify complex processes.
Ceph
Ceph is an open-source data storage software platform. Ceph provides object, block, and file storage in a distributed computer network. It is designed to be highly scalable and fault-tolerant. Ceph allows the use of thousands of storage servers to manage petabytes to exabytes of data.
DMA
DMA (Direct Memory Access) is a feature that allows hardware components in a computer to read and write data directly to and from RAM without the intervention of the CPU. This increases data transfer speed and reduces the load on the processor.
Driver/Device Driver
A driver/device driver refers to a specific type of software that enables interaction between the operating system and a hardware component. Drivers allow the operating system to communicate with the hardware and are essential for the operation of computers and other devices.
Flapping
When a network connection 'flaps,' it repeatedly drops out for a moment: the network connection of a device goes down for only a fraction of a second and immediately comes back online. Flapping can have various causes, such as an excessive load on the CPU cores that process IRQs.
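To make this concrete, the sketch below counts how often a link goes from "up" to "down" in a series of status samples. It is only an illustration: the function names, the sample data, and the threshold of three drops are assumptions made for this example and do not describe how our monitoring works.

```python
def count_flaps(link_states):
    """Count up -> down transitions in a chronological list of samples.

    Each sample is True when the link was up and False when it was down.
    """
    flaps = 0
    for previous, current in zip(link_states, link_states[1:]):
        if previous and not current:
            flaps += 1
    return flaps


def is_flapping(link_states, threshold=3):
    """Treat the link as flapping once it dropped at least `threshold` times.

    The threshold is a hypothetical value chosen for this example.
    """
    return count_flaps(link_states) >= threshold


# Hypothetical samples: the link drops three times and recovers each time.
samples = [True, False, True, True, False, True, False, True]
print(count_flaps(samples), is_flapping(samples))
```

A stable link produces no up-to-down transitions, so `is_flapping` returns `False` for it.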
Hypervisor
A hypervisor is computer software, firmware, or hardware that allows you to create and host virtual machines (VMs). With a hypervisor, a computer (the host machine) can support one or more virtual machines (guest machines) by virtually sharing its resources (CPU, RAM, network, etc.).
In most cases, this is used to virtualize servers. A VPS at TransIP is an example of a virtual machine hosted on a hypervisor.
IO
IO stands for "Input/Output." It is a term used to describe how a computer or device receives information (input) and how it sends or displays information (output). In the context of our platforms, IO usually refers to network traffic or to data sent to/from the disk of a VPS/hypervisor.
- Input: This can be anything, such as the letters you type on a keyboard, the movements of your mouse, or data downloaded from the internet. These are all actions or data sent to the computer to be processed.
- Output: These are the results after processing by the computer, such as the text that appears on your screen, the sound that comes from your speakers, or documents that come out of your printer.
IOMMU
An IOMMU (Input-Output Memory Management Unit) is a hardware component that enables advanced memory management, primarily for peripheral devices such as graphics cards and network cards. It allows these devices to access the computer's memory directly while protecting the integrity and security of the system. This protection is very important: without an IOMMU, direct memory access could give a device uncontrolled access to sensitive data such as passwords. When you type a password, for example to log in to a website, that password ends up in memory.
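The idea of controlled direct access can be sketched as a lookup table: a device can only reach the memory pages that were explicitly mapped for it. This is a strongly simplified model and not how an IOMMU is implemented in hardware; the class name, device names, and page numbers are all assumptions made for this example.

```python
class IommuSketch:
    """Toy model: per-device table of device-visible page -> physical page."""

    def __init__(self):
        self._mappings = {}

    def map_page(self, device, device_page, physical_page):
        # Grant `device` access to one physical page.
        self._mappings.setdefault(device, {})[device_page] = physical_page

    def translate(self, device, device_page):
        # Only pages that were mapped for this device can be reached.
        try:
            return self._mappings[device][device_page]
        except KeyError:
            raise PermissionError(
                f"{device} has no mapping for page {device_page}"
            ) from None


iommu = IommuSketch()
iommu.map_page("network-card", 0, 42)   # hypothetical device and pages
print(iommu.translate("network-card", 0))
```

Any access outside the mapped pages, or by a device without mappings, raises a `PermissionError` in this model, which mirrors the protective role described above.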
IOMMU Path
The IOMMU path refers to the communication path between the IOMMU and the devices it supports. The path ensures data transfer and memory allocation between the IOMMU and the device, making direct access to the memory possible without the CPU needing to intervene.
IRQ
An IRQ (Interrupt Request) is a signal sent to the CPU to indicate that a device such as a keyboard, mouse, or network card needs immediate attention. When the CPU receives an IRQ, it interrupts its current operations (hence the term “interrupt”) and responds to the hardware request by executing the corresponding 'interrupt handler.'
Interrupt Handler
An interrupt handler, also known as an interrupt service routine (ISR), is a special function in the operating system or in a driver that is called when an interrupt (IRQ) occurs.
When the CPU interrupts its current operations after receiving an IRQ, it executes the interrupt handler. This handler is responsible for identifying the cause of the interrupt and performing the tasks needed to process the event. The handler then informs the CPU that the interrupt has been handled, after which the CPU returns to its previous activities.
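The sequence of pausing, running the handler, and resuming can be illustrated with a small simulation. Real interrupt handling happens in the kernel and in hardware; the IRQ number, the handler, and the function names below are hypothetical and serve only as an illustration.

```python
# Table that links an IRQ number to the function that handles it.
interrupt_handlers = {}


def register_handler(irq_number, handler):
    interrupt_handlers[irq_number] = handler


def handle_irq(irq_number, current_task):
    """Pause the current task, run the handler, then resume the task."""
    events = [f"pause: {current_task}"]
    handler = interrupt_handlers.get(irq_number)
    if handler is None:
        events.append(f"irq {irq_number}: no handler registered")
    else:
        events.append(f"irq {irq_number}: {handler()}")
    events.append(f"resume: {current_task}")
    return events


def network_card_handler():
    # Hypothetical handler: a real one would talk to the device driver.
    return "packet copied from network card"


register_handler(11, network_card_handler)  # IRQ 11 is an arbitrary choice
for event in handle_irq(11, "running application"):
    print(event)
```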
Kernel
The kernel is the central part of an operating system. It has full control over a system and functions as a kind of bridge between applications and the computer's hardware. Drivers/device drivers are used for communication between the operating system and the physical hardware. The kernel is responsible, for example, for memory management, process management, device management, and processing IRQs.
The critical code of the kernel is usually loaded into a reserved part of the memory. This reserved memory is protected from access by applications and less critical parts of an operating system.
Kernel panic
A kernel panic is a critical system error that occurs when the operating system detects an error from which it cannot recover. This often results in the system abruptly stopping to prevent further damage. A kernel panic can be caused by various issues such as hardware failure, corrupt drivers, or bugs in the kernel code itself. The screen typically displays an error message or diagnostic information that can help in tracing the cause of the failure.
Kernel workers
Kernel workers are background processes or 'threads' created and managed by the operating system's kernel. They perform asynchronous tasks, such as handling system events (for example hardware interrupts, or IRQs) or running driver code, without affecting the performance of processes running in the foreground of your operating system.
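A comparable pattern exists in ordinary programs: background threads that pick up jobs from a queue. The sketch below is a user-space analogy only, not kernel code; the function name `run_workers` and the example jobs are assumptions made for this illustration.

```python
import queue
import threading


def run_workers(jobs, worker_count=2):
    """Run callables on background threads and collect their results."""
    work_queue = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            job = work_queue.get()
            if job is None:          # sentinel: no more work for this worker
                break
            outcome = job()
            with results_lock:
                results.append(outcome)

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for job in jobs:
        work_queue.put(job)
    for _ in threads:
        work_queue.put(None)         # one sentinel per worker
    for thread in threads:
        thread.join()
    return results


# Hypothetical jobs; the order of the results is not guaranteed.
print(sorted(run_workers([lambda: "irq handled", lambda: "disk flushed"])))
```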
KRBD Kernel Module
KRBD stands for Kernel-based RADOS Block Device. It is a module within the Linux kernel that provides direct access to Ceph storage clusters via the operating system's block device interface. KRBD uses the operating system's kernel to map RBD images (RADOS Block Devices) as block devices, allowing them to be used by your VPS as if they were local hard drives.
Network Interface
A network interface or network adapter is a software or hardware component that enables communication between a computer and a network. It can be a physical network card (for example, one that sits on your computer's motherboard) or a software device such as a virtual network adapter (like your VPS's network adapter). The network interface contains the necessary electronic circuits and software drivers to communicate over a network.
Network Packet
In a computer network, network communication is divided into very small pieces, called network packets. A packet ranges in size from 21 to 65,535 bytes and contains not only the data to be sent but also the destination and source addresses and error control information.
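The parts of a packet mentioned above (addresses, data, and error control information) can be illustrated with a simplified packet format. The layout below is invented for this example and is not the real IP header; the function names and the use of a CRC32 checksum are assumptions, and the addresses come from ranges reserved for documentation.

```python
import socket
import struct
import zlib

# Simplified header: source address, destination address,
# payload length, and a CRC32 checksum of the payload.
HEADER = struct.Struct("!4s4sHI")


def build_packet(source, destination, payload):
    header = HEADER.pack(
        socket.inet_aton(source),
        socket.inet_aton(destination),
        len(payload),
        zlib.crc32(payload),
    )
    return header + payload


def parse_packet(packet):
    src, dst, length, checksum = HEADER.unpack(packet[:HEADER.size])
    payload = packet[HEADER.size:HEADER.size + length]
    if zlib.crc32(payload) != checksum:
        raise ValueError("checksum mismatch: packet is corrupt")
    return socket.inet_ntoa(src), socket.inet_ntoa(dst), payload


packet = build_packet("192.0.2.1", "198.51.100.7", b"hello")
print(parse_packet(packet))
```

If a single byte of the data changes in transit, the checksum no longer matches and `parse_packet` rejects the packet.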
Packet Loss
Packet loss occurs when one or more sent packets do not reach their destination on a network. This can be caused by network congestion, data corruption, hardware failures, or software problems. Packet loss can affect the speed and quality of network communication.
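Packet loss is usually expressed as a percentage of the packets sent. A minimal sketch of that calculation, with a hypothetical function name and example numbers:

```python
def packet_loss_percent(sent, received):
    """Return the share of sent packets that did not arrive, in percent."""
    if sent <= 0:
        raise ValueError("at least one packet must have been sent")
    if not 0 <= received <= sent:
        raise ValueError("received must be between 0 and sent")
    return 100 * (sent - received) / sent


# Example: 200 packets sent, 197 arrived.
print(packet_loss_percent(200, 197))  # 1.5
```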
PCI Device
A PCI device is a piece of hardware that is connected to a computer via a PCI slot on the motherboard. PCI stands for Peripheral Component Interconnect, an industry standard bus for connecting peripherals to the computer.
PCI devices include network cards, sound cards, graphics cards, and expansion cards that provide additional ports such as USB or Ethernet. These devices communicate with the motherboard via the PCI bus, allowing data to be transferred between the hardware and the system. PCI has evolved over time into variants such as PCI-X and PCI Express, which offer higher data transfer speeds and improved performance.
RADOS Block Device (RBD-image)
An RBD image is a virtual disk that Ceph uses to provide block storage. It behaves like a physical disk, but its data is stored in a distributed environment on OSDs (Object Storage Daemons, the Ceph processes that store the data on the individual disks).
Ceph distributes these RBD images across a cluster of servers to achieve high availability. The data is split into pieces and spread across different servers, which helps protect against data loss due to hardware failures.
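The principle of splitting data into pieces and spreading them across servers can be sketched as follows. Note that this is not Ceph's actual method: Ceph uses its own placement algorithm (CRUSH), whereas this example uses a plain hash as a stand-in. The function names, object size, server names, and number of copies are assumptions made for the illustration.

```python
import hashlib


def split_into_objects(data, object_size):
    """Cut the image data into fixed-size pieces (the last may be shorter)."""
    return [data[i:i + object_size] for i in range(0, len(data), object_size)]


def place_objects(image_name, objects, servers, replicas=2):
    """Assign each piece to `replicas` different servers using a hash."""
    if replicas > len(servers):
        raise ValueError("not enough servers for the requested replicas")
    placement = {}
    for index in range(len(objects)):
        object_name = f"{image_name}.{index:08d}"
        digest = hashlib.sha256(object_name.encode()).digest()
        start = int.from_bytes(digest[:8], "big") % len(servers)
        placement[object_name] = [
            servers[(start + offset) % len(servers)]
            for offset in range(replicas)
        ]
    return placement


pieces = split_into_objects(b"0123456789", object_size=4)
servers = ["server-a", "server-b", "server-c"]  # hypothetical names
for name, copies in place_objects("vps-disk", pieces, servers).items():
    print(name, copies)
```

Because every piece is stored on two different servers in this sketch, the data remains available if one of those servers fails.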
RBD Header
The headers of an RBD image contain metadata, snapshot information, location tracking, and version information, and they regulate access to the RBD image:
- Metadata: The headers contain crucial metadata such as the size of the image, settings of the redundancy configuration (e.g., whether replication or erasure coding is applied), snapshot information, and information necessary for cloning images.
- Snapshot: Snapshots of RBD images are created automatically. The headers contain the information about these snapshots that is needed to restore a snapshot and to manage changes to the data stored in an RBD image.
- Location Tracking: The headers help track where the individual pieces of data from the RBD image are stored within the Ceph cluster. This is important for quickly locating and retrieving data from various OSDs.
- Version Control: The headers can contain information about different versions of the RBD image, which is especially useful when updating or modifying the image without disturbing the integrity of the original data.
- Access Control: Rules can be stored in the headers that determine who or what has access to the RBD image.
To ensure the integrity of the data, Ceph places a lock on the RBD header. A hypervisor can, in turn, obtain a lock on an RBD image through this lock, allowing it to be used by a VPS.
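The effect of such a lock is that only one hypervisor at a time can use an image. The sketch below models that idea in a few lines; it does not reflect Ceph's internal locking mechanism, and the class and hypervisor names are hypothetical.

```python
class ImageLockSketch:
    """Toy model of an exclusive lock: at most one owner at a time."""

    def __init__(self):
        self.owner = None

    def acquire(self, hypervisor):
        # Succeeds when the lock is free or already held by this hypervisor.
        if self.owner in (None, hypervisor):
            self.owner = hypervisor
            return True
        return False

    def release(self, hypervisor):
        if self.owner != hypervisor:
            raise PermissionError(f"{hypervisor} does not hold the lock")
        self.owner = None


lock = ImageLockSketch()
print(lock.acquire("hypervisor-1"))  # True: the lock was free
print(lock.acquire("hypervisor-2"))  # False: hypervisor-1 holds the lock
lock.release("hypervisor-1")
print(lock.acquire("hypervisor-2"))  # True: the lock is free again
```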
ZFS
ZFS is an advanced file system and volume manager originally designed by Sun Microsystems. It offers features such as high storage capacity, integrated RAID functionality, snapshot and clone capabilities, and continuous data integrity checking.
If data is stored with ZFS on a (RAID) set of multiple hard drives, that set of drives can be connected to another system running ZFS and continue to be used there. This allows us, for example, to migrate disks with VPS data between different storage servers without you noticing.