How to Use This Book

This handbook isn’t a passive read; it’s a springboard for your journey as a data engineer. Here’s how to unlock its full potential and build a strong data foundation:

Active Reading and Problem-Solving

  • As you read, don’t just absorb information. Engage with the problems presented.
  • Pause and reflect: What is the business challenge? What data is involved?
  • Imagine yourself as a data engineer: How would you approach this problem?
  • Annotate the book: Take notes, ask questions in the margins, sketch out potential solutions.

Leverage Drawing Tools

  • Don’t just read about data flows, draw them yourself!
  • Use a physical notebook, whiteboard, or digital tools like Miro or draw.io.
  • Visualize data pipelines: Draw boxes for data sources, transformations, and destinations.
  • Map relationships: Use arrows to show how data flows between components.
  • Capture complex concepts: Diagrams can clarify multi-step processes and system interactions.
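The same boxes and arrows can also be checked against a few lines of code. Below is a minimal sketch, assuming an in-memory list stands in for both the source and the destination. The functions `extract`, `transform`, and `load` and the order records are hypothetical illustrations, not taken from any library or from the book.

```python
from typing import Iterable


def extract() -> list[dict]:
    """Source box: hypothetical raw order records (stand-in for a database or API)."""
    return [
        {"order_id": 1, "amount": "19.99", "country": "us"},
        {"order_id": 2, "amount": "5.00", "country": "de"},
    ]


def transform(rows: Iterable[dict]) -> list[dict]:
    """Transformation box: cast amounts to float and upper-case country codes."""
    return [
        {**row, "amount": float(row["amount"]), "country": row["country"].upper()}
        for row in rows
    ]


def load(rows: list[dict], destination: list[dict]) -> int:
    """Destination box: append rows to an in-memory 'table' and return how many were loaded."""
    destination.extend(rows)
    return len(rows)


# Each arrow in the diagram is one function call feeding the next.
warehouse: list[dict] = []
loaded = load(transform(extract()), warehouse)
print(loaded, warehouse[0])
```

When you draw this pipeline, each function becomes a box and each call becomes an arrow. Swapping the in-memory lists for real systems changes the boxes, not the shape of the diagram.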

Hands-on Practice (Optional)

  • If the book provides code examples, don’t just read them. Run the code!
  • Experiment with different scenarios and modify the code.
  • This practical exploration will solidify your understanding and build confidence.

Remember: Don’t be afraid to experiment with your drawings, code (if available), and problem-solving approaches. This active learning approach will help you:

  • Develop strong logical thinking skills.
  • Gain a deeper understanding of how data engineering impacts businesses.
  • Become comfortable tackling real-world data challenges.

Bonus Tip

  • Consider creating a dedicated “Data Engineering Learning Journal” to document your progress. Include notes, diagrams, code snippets, and reflections on your learning journey.
  • By combining active reading, visual thinking through drawing, and hands-on practice (if applicable), you’ll transform this book from a static resource to a dynamic tool that empowers you to build a solid foundation for your data engineering career.
  • Commonly used drawing tools include draw.io, Excalidraw, and Lucidchart.

The Power of Learning the Fundamentals

This book aims to dive into the Data Platform Blueprint and describe each component of the platform. Along the way, you will find tools that fit into each key area of a data platform (Connect, Buffer, Processing Framework, Store, Visualize).

It then returns to the basics of a data platform, organized into five tiers:

  1. Source: Connect, Integration, Ingestion
  2. Backend: Buffer, Processing
  3. Storage: Data Lake, Data Warehouse, Lakehouse, OLAP, Cube
  4. Semantic: API, Cache, Memory, Access Control
  5. Frontend: Visualization, Exporting, Reverse ETL, Application
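One way to keep the five tiers in mind while researching tools is to write them down as a small data structure. The sketch below encodes only the components listed above; `PLATFORM_TIERS`, `tier_of`, and `tier_order` are hypothetical names chosen for illustration, and the lookup is a simple case-insensitive match, not a standard taxonomy.

```python
from typing import Optional

# Five tiers of a data platform, in data-flow order, with the components listed for each.
PLATFORM_TIERS: dict[str, list[str]] = {
    "Source": ["Connect", "Integration", "Ingestion"],
    "Backend": ["Buffer", "Processing"],
    "Storage": ["Data Lake", "Data Warehouse", "Lakehouse", "OLAP", "Cube"],
    "Semantic": ["API", "Cache", "Memory", "Access Control"],
    "Frontend": ["Visualization", "Exporting", "Reverse ETL", "Application"],
}


def tier_of(component: str) -> Optional[str]:
    """Return the tier a component belongs to (case-insensitive), or None if it is not listed."""
    needle = component.strip().lower()
    for tier, components in PLATFORM_TIERS.items():
        if needle in (c.lower() for c in components):
            return tier
    return None


def tier_order() -> list[str]:
    """Tiers in the order data flows: it enters at Source and leaves through Frontend."""
    return list(PLATFORM_TIERS)


if __name__ == "__main__":
    for name in ["Ingestion", "lakehouse", "Reverse ETL"]:
        print(f"{name} -> {tier_of(name)}")
```

As you study a new tool, you can note which component it covers and look up its tier, which quickly shows where your learning is concentrated and which tiers you have not explored yet.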

Select a few tools that interest you, then research them and work with them hands-on.

I have also created a repository, Setup Data Dev Environment. Fork it and customize it to your needs, and please open a pull request if you find any interesting tools or services.

I keep maintaining this book because technology is always changing, but the fundamental concepts and knowledge will stay here. I know people are learning new things very quickly, and it is not easy to understand everything.

Along the way, they may lose sight of the basics of software engineering and computer science, so I will keep those basics here.

Quote

Step back and move forward!

What if you are working with a legacy system?

As a freelance and outsourcing consultant, I worked with SQL Server, Oracle, IBM, and Informatica PowerCenter. At first, this felt uncomfortable and painful, and I mostly saw their weaknesses. Then I decided to look at them from a different angle, and I realized that I could learn a lot from them: the basics of how these systems were designed and built.

Key learnings:

  • Legacy technologies like Informatica, SSIS, and SQL Server can provide valuable learning opportunities.
  • By understanding and learning from these technologies, you can gain a deeper understanding of data management and ETL processes, which are still crucial in today’s data-driven world.
  • Build solid foundational knowledge of how the basic things work before jumping into other, “easier” tools.
  • I say “easier” because tools and platforms have evolved to require much less management from end users.

Use the Mapping of Contents

I use a second brain to navigate my thoughts, map topics, and organize the notes I take. This helps me structure and categorize knowledge in a better format. Check out the Mapping of Contents.

If you prefer listening to the podcast, check out my YouTube Channel and subscribe to get new episodes as they are posted.

As you can see, this book is not finished. I’m constantly adding new material and making videos on these topics. But because I do this as a hobby, my time is limited. You can help make this book even better.

Tell me your thoughts, what you value, what you think should be included, or correct me where I am wrong.

But don’t worry that the book is unfinished: it focuses on fundamental knowledge, and it is a good fit for people who want to learn the foundations.

AI and ML have changed everything, but many of their core concepts were published as far back as the 19xx or early 20xx, and only now have they matured enough to make a huge difference.

So I think we should not be afraid of this. Stay focused, keep moving forward, and, last but not least, keep these rules:

```mermaid
flowchart LR
    r1["Start Small"] --> r2["Stay Disciplined"] & r3["Be Consistent"] --> r4["Never Stop"] --> r1
```

Using Code Examples

Although this book focuses on how things were originally designed and built, it also offers enough material to practice coding and programming.

I prefer to use dotfiles to set up everything a data engineer or data architect needs to develop and build data applications.

Warning

I highly recommend using the Open Source - Data Stack for development. Each of the five layers has its own open-source tools in this data stack.

You can use the code in this book as a reference and reproduce it in your own environment with the dotfile setup, which is customized and designed for data people.

Glossary

| Term | Description |
| --- | --- |
| SFDE | Solid Foundations for Data Engineer |
| DEC | Data Engineering Camping |
| DEH | Data Engineering Book |
| DF | Data Foundation |
| KB | Knowledge Base |
| BP | Best Practices |
| UC | Use case |
| CS | Case Study |
| DD | Design Pattern |