Smart Structuring: A Guide to Efficiently Managing Data Science Projects. Preface
This is an excerpt from my upcoming e-book
“Smart Structuring: A Guide to Efficiently Managing Data Science Projects”.
It will cover topics such as:
1) standardized file & folder structure for your projects,
2) package version management for code stability,
3) introducing parameters in your projects,
4) much more, including examples in Python! 🐍
Since the dawn of time, people have had problems with ordering files on their desktops. You could easily imagine an honest day of work in ancient Egypt: you open your “Pyramid_resources” folder and see something like this:
- “limestone.jpg”
- “limestone_v1.jpg”
- “limestone_v4.jpg”
- “limestone_v4_but_better.jpg”
- “limestone_v3_sent_from_Anubis_Final.jpg”
What the hell are you supposed to do with this?! Which limestone should be used for today’s construction batch? There is an enhanced version 4 at hand, but I am not sure if Anubis isn’t someone who has the final say around here…
Imagine how much easier it would be if every file were simply named “yyyymmdd_limestone.jpg”. For example:
- 20240401_limestone.jpg
- 20240324_limestone.jpg
Now, if you sort your files by name in descending order, you immediately see the latest version you worked on at the top.
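Here is a minimal Python sketch of that sorting trick, assuming every file name starts with a yyyymmdd date. The helper name `newest_first` is made up for this illustration:

```python
def newest_first(filenames):
    """Sort date-prefixed file names so the most recent one comes first.

    This works because zero-padded yyyymmdd strings compare in the same
    order as the dates they encode (year, then month, then day).
    """
    return sorted(filenames, reverse=True)


files = ["20240324_limestone.jpg", "20240401_limestone.jpg"]
print(newest_first(files)[0])  # 20240401_limestone.jpg
```

No date parsing is needed: plain string comparison does the job, which is exactly why the same trick works in any file explorer.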
You have multiple versions per day? Not a problem: add a “vxx” component for the version, so the full pattern becomes “yyyymmdd_vxx_limestone.jpg”. Something along these lines:
- 20240410_v05_limestone.jpg
- 20240410_v04_limestone.jpg
- (…)
- 20240324_limestone.jpg
Benefits?
- Again, you can find the current version just by sorting files by name.
- There’s no need to rename anything retroactively: older items with only a date in their name will still sort correctly.
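The naming scheme above could be generated with a few lines of Python. This is only a sketch: the function `build_name` and its parameters are hypothetical, and the two-digit version padding is an assumption taken from the “vxx” pattern:

```python
from datetime import date


def build_name(day, stem, ext, version=None):
    """Return 'yyyymmdd_stem.ext', or 'yyyymmdd_vxx_stem.ext' with a version."""
    prefix = day.strftime("%Y%m%d")
    if version is None:
        return f"{prefix}_{stem}.{ext}"
    # Zero-pad the version so that v05 sorts after v04 (and v10 after v09).
    return f"{prefix}_v{version:02d}_{stem}.{ext}"


names = [
    build_name(date(2024, 3, 24), "limestone", "jpg"),
    build_name(date(2024, 4, 10), "limestone", "jpg", version=4),
    build_name(date(2024, 4, 10), "limestone", "jpg", version=5),
]

# The "largest" name alphabetically is the most recent version.
print(max(names))  # 20240410_v05_limestone.jpg
```

Two caveats with this sketch: two digits cover versions up to 99, and a date-only file created on the same day as versioned ones lands below them only if its name starts with a character that sorts before “v” (as “limestone” does).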
And perhaps you wouldn’t be scared when your boss approaches your desk; instead, you might be smiling.
Phew! That was a close one! Thankfully, you avoided a scolding.
Are you the only one who struggles with keeping order in their projects? Of course not! There are legions of us!
Even kind souls at prestigious universities like Harvard and the Massachusetts Institute of Technology (MIT) have faced similar challenges and have developed guidelines on how to organize your project files. I’ll do my best to combine those with my data science experience and share a recipe for a project structure that features:
- Usability - easy to understand
- Maintainability - easy to change and update
- Reusability - easy to use in other projects or applications
- Portability - easy to shift to other systems
- Modularity - can be easily decomposed into smaller, independent pieces
Although it’s past Dry January, I will also cover the DRY (Don’t Repeat Yourself) principle in programming.
All of the above should make your projects more standardized, easier to kick off, and, over time, bring more joy to your work. I don’t recommend reinventing the wheel each time; reusing a proven structure saves precious brain energy for thinking about things like flying saucers, a cure for cancer, or where you left your car keys.
There is a reason why Barack Obama wore only gray or blue suits during his presidency. It was one of his strategies to limit decision fatigue and save energy for more important decisions. Be like Barack.
Hop on! There’s a lot to explain!
If you have any comments, remarks, or questions, feel free to contact me at tomasz@demystifAI.blog