<h1 id="ddia">DDIA<a aria-hidden="true" class="anchor-heading icon-link" href="#ddia"></a></h1>
<p>Legendary book about building scalable data systems.</p>
<ul>
<li>Chapters 1-4 are a must-read.</li>
<li>Chapter 1 defines what system characteristics we care about: scalability, reliability, maintainability.</li>
<li>Chapter 2 discusses data models and query languages: how you work with data from the user's perspective.</li>
<li>Chapter 3 discusses storage and retrieval from the DB engineer's perspective: how data is stored on disk, B-trees, LSM trees. A DB has two jobs: to store data and to return it when queried.</li>
<li>Chapter 4 discusses encoding and evolution: how data is encoded on disk, serialization formats (JSON, XML, Protobuf, Avro, Thrift), and schema evolution. This is about how data is represented on a computer.</li>
<li>Part II is about distributed data (a bit harder to read). Chapter 7 is worth reading to understand what transactions are (ACID, where the 'C', consistency, is really a property of the application rather than of the transaction mechanism).</li>
<li>Chapters 5 &#x26; 6 are about replication and partitioning and are OK. Chapters 8-9 felt less worthwhile (or at least I'm not yet at the stage where I can understand and relate to them).</li>
<li>Chapter 10, about batch processing, is great. It starts from Unix batch tools and explains MapReduce, all of which helps you understand the internals of Spark.</li>
</ul>
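<p>To make the Chapter 10 note more concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases applied to word counting. It is only an illustration under my own assumptions: the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and are not taken from the book or from any framework, and a real MapReduce job would spread these phases across many machines.</p>

```python
from itertools import groupby
from operator import itemgetter


def map_phase(lines):
    """Mapper: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)


def shuffle_phase(pairs):
    """Shuffle: sort pairs by key so equal keys are adjacent, then group them."""
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, [value for _, value in group]


def reduce_phase(grouped):
    """Reducer: collapse each key's list of values into a single count."""
    for key, values in grouped:
        yield key, sum(values)


def word_count(lines):
    """Chain the three phases, loosely like `sort | uniq -c` in a Unix pipeline."""
    return dict(reduce_phase(shuffle_phase(map_phase(lines))))


if __name__ == "__main__":
    print(word_count(["a b a", "b c"]))
```

<p>The sort in the shuffle step plays the same role as sort in a Unix pipeline: it brings identical keys next to each other so the reducer can process each key's values in a single pass.</p>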


This is my knowledge base; additionally, I keep a [daily journal](https://docs.google.com/document/d/1m8Npu0-t8RweyKiHCjLL1PPDYzbebqm3OZDHH8IVsb8/edit?tab=t.rj0kvkrm7zpr).