# Day 4: Create a Standard ML Project Structure

> Source: <https://dev.to/thukhakyawe_cloud/day-4-create-a-standard-ml-project-structure-3hi7>
> Published: 2026-06-03 12:35:33+00:00

A colleague has started a new ML project at /root/code/fraud-detection/, but the layout does not match the xFusionCorp Industries standard. Bring the project in line with the team's conventions.

```
Inspect the existing project at /root/code/fraud-detection/.

The final layout must match the tree below exactly:

fraud-detection/
├── data/
│   ├── raw/
│   └── processed/
├── models/
├── notebooks/
├── src/
│   ├── data/
│   ├── features/
│   ├── models/
│   └── utils/
├── tests/
├── configs/
├── requirements.txt
└── README.md

Every subdirectory under src/ must contain an __init__.py file so that Python recognises it as a package.

requirements.txt must list the following dependencies, one per line: scikit-learn, pandas, numpy, and mlflow. The canonical PyPI name for the scikit-learn package is scikit-learn.

README.md must begin with the heading # fraud-detection.

Review the existing project and correct everything that does not match the requirements above.
```

## 🧭 Part 1: Lab Step-by-Step Guidelines

Run the following commands on the controlplane host.

### Step 1: Move into the project directory

```
cd /root/code/fraud-detection
```

### Step 2: Inspect the current structure

```
tree
```

If tree is unavailable:

```
sudo apt update && sudo apt install tree
```
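If installing `tree` is not an option, `find` gives a flatter but equivalent listing. The helper below is a sketch: the name `show_layout` is hypothetical, it relies on GNU find's `-printf` (standard on Ubuntu-style hosts such as this controlplane), and it skips a `.git` directory on the assumption that the project may be a Git repository.

```shell
# Hypothetical fallback for `tree`: print every path under a project root,
# relative to that root, one per line, with a trailing "/" on directories.
show_layout() {
  local root="${1:-.}"
  # -mindepth 1 omits the root itself; .git is pruned (not listed or entered);
  # directories print with %P/ and everything else with %P.
  find "$root" -mindepth 1 -name .git -prune \
    -o -type d -printf '%P/\n' \
    -o -printf '%P\n' | sort
}
```

Usage: `show_layout /root/code/fraud-detection`.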

### Step 3: Fix the directory structure

Rename the misnamed `src/` packages and create the missing directories:

```
tree
mv src/feature src/features
mv src/util src/utils
mkdir -p data/raw
mkdir -p data/processed
mkdir -p tests
mkdir -p configs
```

Check:

```
ls src/features
ls src/utils
```

Each command should print `__init__.py`.
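The commands in this step assume the inspection found `src/feature` and `src/util`. A re-runnable sketch that reaches the target layout from other starting states might look like the function below. The name `fix_layout` is hypothetical. It renames the singular directories only when the plural ones do not already exist, and it creates any missing `__init__.py` as an empty file.

```shell
# Hypothetical helper: bring a project root in line with the required layout.
# Safe to re-run: mkdir -p and touch never delete or overwrite file contents.
fix_layout() {
  local root="${1:-.}"
  local old new pkg

  # Rename singular package names only if the plural target is absent,
  # so an existing src/features or src/utils is never clobbered.
  for old in feature util; do
    new="${old}s"
    if [ -d "$root/src/$old" ] && [ ! -e "$root/src/$new" ]; then
      mv "$root/src/$old" "$root/src/$new"
    fi
  done

  # Create every directory in the required tree (no-op if it exists).
  mkdir -p "$root/data/raw" "$root/data/processed" \
           "$root/models" "$root/notebooks" \
           "$root/src/data" "$root/src/features" "$root/src/models" "$root/src/utils" \
           "$root/tests" "$root/configs"

  # Every subdirectory of src/ must contain an __init__.py.
  for pkg in data features models utils; do
    touch "$root/src/$pkg/__init__.py"
  done
}
```

Usage: `fix_layout /root/code/fraud-detection`, then re-run `tree` to compare against the required layout.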

### Step 4: Verify and fix requirements.txt

Inspect the current file:

```
cat requirements.txt
```

Output:

```
sklearn
pandas
numpy
```

The existing file uses `sklearn` instead of `scikit-learn` and is missing `mlflow`. Overwrite it with the correct list:

```
cat > requirements.txt <<EOF
scikit-learn
pandas
numpy
mlflow
EOF
```
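To confirm the file now holds exactly the four required names, you can compare its sorted contents against the expected list. This is a sketch: `check_requirements` is a hypothetical name, and it assumes the file contains bare package names with no version pins or comments, as the lab asks.

```shell
# Hypothetical check: succeed only if the file lists exactly scikit-learn,
# pandas, numpy and mlflow, one per line, in any order.
check_requirements() {
  local file="${1:-requirements.txt}"
  local expected actual
  expected="$(printf '%s\n' scikit-learn pandas numpy mlflow | sort)"
  # Strip carriage returns and blank lines before comparing.
  actual="$(tr -d '\r' < "$file" | grep -v '^[[:space:]]*$' | sort)"
  if [ "$actual" = "$expected" ]; then
    echo "requirements.txt OK"
  else
    echo "requirements.txt does not match; found:" >&2
    printf '%s\n' "$actual" >&2
    return 1
  fi
}
```

Run it from the project directory as `check_requirements`. Against the original file (`sklearn`, `pandas`, `numpy`) it fails and prints what it found.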

### Step 5: Verify and fix README.md

Inspect the current README:

```
cat README.md
```

Output:

```
# Fraud

ML project for fraud detection at xFusionCorp Industries.
```

The heading is `# Fraud` rather than `# fraud-detection`. Replace the README, keeping the description line:

```
cat > README.md <<EOF
# fraud-detection

ML project for fraud detection at xFusionCorp Industries.
EOF
```
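The requirement applies to the first line, so a `head`-based check is enough. This is a minimal sketch that assumes "begin with" means the very first line must be exactly `# fraud-detection`. The function name `check_readme` is hypothetical.

```shell
# Hypothetical check: the first line of README.md must be exactly the
# required heading (a trailing Windows carriage return is ignored).
check_readme() {
  local file="${1:-README.md}"
  local first
  first="$(head -n 1 "$file" | tr -d '\r')"
  if [ "$first" = "# fraud-detection" ]; then
    echo "README.md heading OK"
  else
    echo "README.md starts with: $first" >&2
    return 1
  fi
}
```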

### Step 6: Verify the final structure and README.md content

Run:

```
tree
cat README.md
```

Expected structure:

```
fraud-detection/
├── data/
│   ├── raw/
│   └── processed/
├── models/
├── notebooks/
├── src/
│   ├── data/
│   │   └── __init__.py
│   ├── features/
│   │   └── __init__.py
│   ├── models/
│   │   └── __init__.py
│   └── utils/
│       └── __init__.py
├── tests/
├── configs/
├── requirements.txt
└── README.md
```

Expected README.md content:

```
root@controlplane ~/code/fraud-detection via 🐍 v3.12.3 ➜  cat README.md 
# fraud-detection

ML project for fraud detection at xFusionCorp Industries.
```
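`tree` output is easy to misread by eye. As an optional extra, a script like the one below checks the same requirements mechanically and also catches any directory under `src/` that lacks an `__init__.py`. It is a sketch: `verify_layout` is a hypothetical name, it checks only the items the lab statement lists, and it does not flag extra files elsewhere in the tree.

```shell
# Hypothetical verifier for the required layout: prints each problem found
# and returns non-zero if there was at least one.
verify_layout() {
  local root="${1:-.}"
  local status=0 path dir

  # Required directories and files from the lab statement.
  for path in data/raw data/processed models notebooks \
              src/data src/features src/models src/utils tests configs; do
    [ -d "$root/$path" ] || { echo "missing directory: $path" >&2; status=1; }
  done
  for path in requirements.txt README.md; do
    [ -f "$root/$path" ] || { echo "missing file: $path" >&2; status=1; }
  done

  # Every directory directly under src/ needs an __init__.py, including
  # leftovers such as src/feature; __pycache__ is skipped.
  for dir in "$root"/src/*/; do
    [ -d "$dir" ] || continue
    case "$dir" in */__pycache__/) continue ;; esac
    [ -f "${dir}__init__.py" ] || { echo "no __init__.py in ${dir#"$root"/}" >&2; status=1; }
  done

  # The old singular names should be gone after the renames.
  for path in src/feature src/util; do
    [ ! -e "$root/$path" ] || { echo "unexpected leftover: $path" >&2; status=1; }
  done

  [ "$status" -eq 0 ] && echo "layout OK"
  return "$status"
}
```

Usage: `verify_layout /root/code/fraud-detection`.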

## 🧠 Part 2: Simple Beginner-Friendly Explanation

This lab focuses on organising a machine learning project according to the xFusionCorp Industries standard structure.

The goal is to:

- standardise project layout
- improve maintainability
- make collaboration easier for developers and data scientists

You must inspect the existing project and correct anything that does not match the required structure.

### Understanding the Required Project Structure

The final project must look exactly like this:

- `fraud-detection/`
  - `data/`
    - `raw/`
    - `processed/`
  - `models/`
  - `notebooks/`
  - `src/`
    - `data/`
    - `features/`
    - `models/`
    - `utils/`
  - `tests/`
  - `configs/`
  - `requirements.txt`
  - `README.md`

Each folder has a specific purpose in an ML workflow.

### Purpose of Each Directory

- `data/`: stores the datasets used in the project.
  - `data/raw/`: original, unmodified data (for example, `transactions.csv`).
  - `data/processed/`: cleaned or transformed datasets used for training (for example, `clean_transactions.csv`).
- `models/`: trained machine learning models (for example, `fraud_model.pkl`).
- `notebooks/`: Jupyter notebooks for experimentation and analysis (for example, `eda.ipynb`).
- `src/`: the main Python source code for the application, which keeps project logic organised and modular.
- `tests/`: automated tests for the code in `src/`.
- `configs/`: configuration files, such as training settings.

### Why `__init__.py` Files Are Required

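Every subdirectory under `src/` must contain an `__init__.py` file. The file can be empty. Its presence marks the directory as a regular Python package.

Since Python 3.3, a directory without `__init__.py` can still be imported as an implicit namespace package (PEP 420), so imports do not always fail outright. Without the file, however:

- tools that look only for regular packages, such as setuptools' `find_packages()`, skip the directory
- the layout does not meet the lab's requirements

For example, with `src/models/__init__.py` in place and the project root as the working directory, `from src.models.train import train_model` imports `train_model` from `src/models/train.py`.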
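You can see this in a throwaway directory. The sketch below builds a tiny `src/models` package with a hypothetical `train.py` that defines `train_model`, then runs the import from the project root. It assumes `python3` is on the PATH, and the module body is a placeholder rather than real training code.

```shell
# Build a throwaway project: src/models is a regular package (it has an
# __init__.py), while src/ itself has none, matching the lab's layout.
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/src/models"
touch "$demo_dir/src/models/__init__.py"

# Hypothetical placeholder module; a real train_model would fit a model.
cat > "$demo_dir/src/models/train.py" <<'EOF'
def train_model():
    return "trained"
EOF

# Run from the project root so the current directory is on sys.path.
# The second print shows that src/ (no __init__.py) loads as a namespace
# package without a __file__, while src/models is a regular package.
import_result="$(cd "$demo_dir" && python3 -c '
import src, src.models
from src.models.train import train_model
print(train_model())
print(getattr(src, "__file__", None), src.models.__file__.endswith("__init__.py"))
')"
printf '%s\n' "$import_result"
# trained
# None True

rm -rf "$demo_dir"
```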

### Purpose of `src/` Subdirectories

| Directory | Purpose |
| --- | --- |
| `src/data` | Data loading and preprocessing |
| `src/features` | Feature engineering logic |
| `src/models` | Training and prediction code |
| `src/utils` | Helper functions and utilities |

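### Why `requirements.txt` Matters

The lab requires the following dependencies:

- `scikit-learn`
- `pandas`
- `numpy`
- `mlflow`

This file lets every developer install the same set of Python packages, for example with `pip install -r requirements.txt`.

Important note: the correct PyPI package name is `scikit-learn`, not `sklearn`. `sklearn` is the name you import in code (`import sklearn`), but the `sklearn` package on PyPI is a deprecated placeholder, not the library itself.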

### Why `README.md` Matters

The README file provides project documentation.

The lab specifically requires it to begin with the heading `# fraud-detection`. This acts as the project title and keeps it consistent with the project directory name.

### Why Exact Naming Is Important

Lab validators check:

- exact folder names
- exact file names
- exact dependency names

Even small differences, such as `feature` instead of `features` or `util` instead of `utils`, can cause the lab to fail.