The knowledge involved in building large-scale systems, covering everything from architecture to algorithms, from macro to micro.

Comments and suggestions are welcome.

(The content is still being sorted out, so it is a little confusing right now.)

Distributed Systems

Data Processing

Stream Processing

SQL Workloads

Distributed Storage System

Database

Motivation

Architecture

SQL Parser

Query Optimization

Join Algorithm

  • Nested Loop Join
    • Simple Nested Loops Join
      • tuple-at-a-time
      • page-at-a-time
    • Block Nested Loops Join
    • Index Nested Loops Join
  • Sort-Merge Join
  • Hash Join
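As a quick orientation for the list above, here is a minimal sketch of an in-memory hash join, assuming both inputs fit in memory and are lists of dicts. The function name `hash_join` and the sample `users` and `orders` rows are made up for illustration and do not come from any particular database.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Equi-join two lists of dicts using a build phase and a probe phase."""
    # Build phase: hash the (ideally smaller) input on its join key.
    table = defaultdict(list)
    for row in build_rows:
        table[row[build_key]].append(row)

    # Probe phase: look up each probe row and emit one merged row per match.
    result = []
    for row in probe_rows:
        for match in table.get(row[probe_key], []):
            result.append({**match, **row})
    return result


# Hypothetical sample data.
users = [{"uid": 1, "name": "ann"}, {"uid": 2, "name": "bob"}]
orders = [
    {"oid": 10, "user_id": 1},
    {"oid": 11, "user_id": 1},
    {"oid": 12, "user_id": 3},
]
joined = hash_join(users, orders, "uid", "user_id")
```

A nested loop join would compare every pair of rows instead. The hash table replaces the inner loop with a lookup, at the cost of holding one input in memory.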

Lock & Transaction

Log

  • Physical logging
  • Logical logging
  • Physiological logging
  • Write Ahead Logging (WAL)

Deadlock Handling

  • Deadlock avoidance
  • Deadlock detection
    • Timeout
    • Wait-for graph
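To make the wait-for graph item concrete, here is a minimal sketch of deadlock detection as cycle detection with a depth-first search. It assumes the graph is a dict mapping each transaction to the transactions it is waiting on. The name `find_deadlock` and the transaction labels are hypothetical.

```python
def find_deadlock(wait_for):
    """Return a list of transactions that form a cycle, or None if there is none."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = {}
    path = []

    def visit(txn):
        color[txn] = GRAY
        path.append(txn)
        for nxt in wait_for.get(txn, ()):
            state = color.get(nxt, WHITE)
            if state == GRAY:
                # nxt is already on the current path, so the path loops back to it.
                return path[path.index(nxt):]
            if state == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[txn] = BLACK
        return None

    for txn in list(wait_for):
        if color.get(txn, WHITE) == WHITE:
            cycle = visit(txn)
            if cycle:
                return cycle
    return None


# Hypothetical example: T1 waits on T2, T2 on T3, and T3 on T1.
deadlocked = find_deadlock({"T1": ["T2"], "T2": ["T3"], "T3": ["T1"]})
no_deadlock = find_deadlock({"T1": ["T2"], "T2": []})
```

A real lock manager would then choose a victim from the returned cycle and abort it. The timeout approach avoids building the graph at all, but it can abort transactions that are merely slow.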

Two-Phase Locking (2PL)

Classification of 2PL

  • Basic 2PL
  • Strict 2PL
  • Conservative 2PL
  • Rigorous 2PL

Isolation Level

  • Read uncommitted
  • Read committed
  • Repeatable read
  • Serializable

Concurrency Control

  • Lock
  • Optimistic concurrency control
  • Multiversion concurrency control (MVCC)

Optimistic Concurrency Control

Recovery

Write Ahead Logging (WAL)

Write Behind Logging

Others

Storage

BLOB Storage

Distributed File System

Disk Error Correction

Reed-Solomon

Data Structures

Range Filter

Distributed Algorithm

Course

Eventual Consistency

Consensus Algorithm

Raft

Paxos

Zab

Distributed Hash Table (DHT)

File Format

Tracing

Scheduling

Allocator

GPU Programming

Concurrency Programming

PLT

Course

Distributed Systems

System Programming

UIUC CS 241: System Programming

Waiting For Classification

System Programming

Memory Allocation

Compiler

LLVM