[ARTICLE · art-20375] src=dev.to pub= topic=machine-learning verified=true sentiment=neutral

Day 4: Create a Standard ML Project Structure

A developer at xFusionCorp Industries restructured a fraud-detection machine learning project at `/root/code/fraud-detection/` to match the company's standard layout. The project now includes directories for `data/raw/`, `data/processed/`, `models/`, `notebooks/`, `src/` with subpackages, `tests/`, and `configs/`, along with a corrected `requirements.txt` listing scikit-learn, pandas, numpy, and mlflow, and a `README.md` beginning with `# fraud-detection`.

4 min read · published Jun 3, 2026

A colleague has started a new ML project at /root/code/fraud-detection/, but the layout does not match the xFusionCorp Industries standard. Bring the project in line with the team's conventions.

Inspect the existing project at /root/code/fraud-detection/.

The final layout must match the tree below exactly:

fraud-detection/
├── data/
│   ├── raw/
│   └── processed/
├── models/
├── notebooks/
├── src/
│   ├── data/
│   ├── features/
│   ├── models/
│   └── utils/
├── tests/
├── configs/
├── requirements.txt
└── README.md

Every subdirectory under src/ must contain an __init__.py file so that Python recognises it as a package.

requirements.txt must list the following dependencies, one per line: scikit-learn, pandas, numpy, and mlflow. The canonical PyPI name for the scikit-learn package is scikit-learn.

README.md must begin with the heading # fraud-detection.

Review the existing project and correct everything that does not match the requirements above.
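The requirements above can be turned into a quick self-check. The following is a minimal sketch, not part of the lab itself: the function name find_layout_problems and the two constant tuples are hypothetical, and the paths are copied from the tree above.

```python
from pathlib import Path

# Directories and files taken from the required tree (names are from the lab text).
REQUIRED_DIRS = (
    "data/raw", "data/processed", "models", "notebooks",
    "src/data", "src/features", "src/models", "src/utils",
    "tests", "configs",
)
REQUIRED_FILES = (
    "requirements.txt", "README.md",
    "src/data/__init__.py", "src/features/__init__.py",
    "src/models/__init__.py", "src/utils/__init__.py",
)


def find_layout_problems(project_root):
    """Return a list of human-readable problems; an empty list means nothing is missing."""
    root = Path(project_root)
    problems = []
    for rel in REQUIRED_DIRS:
        if not (root / rel).is_dir():
            problems.append(f"missing directory: {rel}/")
    for rel in REQUIRED_FILES:
        if not (root / rel).is_file():
            problems.append(f"missing file: {rel}")
    return problems
```

Calling find_layout_problems("/root/code/fraud-detection") before and after the fixes shows what is still missing. The sketch only reports missing entries; it does not flag extra or misnamed directories.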

🧭 Part 1: Lab Step-by-Step Guidelines

Run the following commands on the controlplane host.

Step 1: Move into the project directory

cd /root/code/fraud-detection

Step 2: Inspect the current structure

tree

If tree is unavailable:

sudo apt update && sudo apt install tree

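Step 3: Compare against the required structure and fix the mismatches

Run:

tree
# Rename the misnamed subpackages (their contents move with them)
mv src/feature src/features
mv src/util src/utils
# Create the directories that are missing
mkdir -p data/raw
mkdir -p data/processed
mkdir -p tests
mkdir -p configs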

Check:

ls src/features
ls src/utils

You should see:

__init__.py
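If one of the src/ subdirectories turns out to have no __init__.py, the file has to be created by hand. A minimal sketch of that step, assuming the four package names from the required tree; the helper name ensure_init_files is hypothetical, and it deliberately skips directories that do not exist so a misnamed folder is not papered over:

```python
from pathlib import Path

# Subpackage names taken from the required tree.
SRC_PACKAGES = ("data", "features", "models", "utils")


def ensure_init_files(project_root):
    """Create an empty __init__.py in each existing src/ subpackage that lacks one.

    Returns the list of files that were created.
    """
    created = []
    for name in SRC_PACKAGES:
        package_dir = Path(project_root) / "src" / name
        if not package_dir.is_dir():
            # Missing or misnamed directory: leave it for the rename step.
            continue
        init_file = package_dir / "__init__.py"
        if not init_file.exists():
            init_file.touch()
            created.append(init_file)
    return created
```

Running it a second time returns an empty list, so it is safe to repeat after the renames.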

Step 4: Verify and fix requirements.txt

Check the current contents:

cat requirements.txt

Output:

sklearn
pandas
numpy

The file uses sklearn instead of the canonical name scikit-learn and is missing mlflow. Write the correct requirements.txt:

cat > requirements.txt <<EOF
scikit-learn
pandas
numpy
mlflow
EOF
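The comparison between the old and new file can also be done programmatically. This is a sketch under simple assumptions: names are matched exactly, version specifiers are not handled, and the function name check_requirements is hypothetical.

```python
# Dependency names required by the lab text.
REQUIRED_PACKAGES = ("scikit-learn", "pandas", "numpy", "mlflow")


def check_requirements(text):
    """Return (missing, unexpected) package names for the body of a requirements.txt."""
    listed = []
    for line in text.splitlines():
        # Drop trailing comments and surrounding whitespace; skip blank lines.
        name = line.split("#", 1)[0].strip()
        if name:
            listed.append(name)
    missing = [p for p in REQUIRED_PACKAGES if p not in listed]
    unexpected = [p for p in listed if p not in REQUIRED_PACKAGES]
    return missing, unexpected
```

For the original file shown above, the function reports scikit-learn and mlflow as missing and sklearn as unexpected; for the corrected file both lists are empty.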

Step 5: Check the current README.md

cat README.md

Output:


ML project for fraud detection at xFusionCorp Industries.

Step 6: Replace the README so that it begins with the required heading

Run:

cat > README.md <<EOF
# fraud-detection

ML project for fraud detection at xFusionCorp Industries.
EOF

Step 7: Verify the final structure and README.md content

Run:

tree
cat README.md

Expected structure:

fraud-detection/
├── data/
│   ├── raw/
│   └── processed/
├── models/
├── notebooks/
├── src/
│   ├── data/
│   │   └── __init__.py
│   ├── features/
│   │   └── __init__.py
│   ├── models/
│   │   └── __init__.py
│   └── utils/
│       └── __init__.py
├── tests/
├── configs/
├── requirements.txt
└── README.md

root@controlplane ~/code/fraud-detection via 🐍 v3.12.3 ➜  cat README.md
# fraud-detection

ML project for fraud detection at xFusionCorp Industries.
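Because the README must begin with the heading # fraud-detection, the first line is the part worth checking. A small sketch of that check; the function name readme_starts_with_heading is hypothetical:

```python
def readme_starts_with_heading(text, project_name="fraud-detection"):
    """Return True only if the very first line is the level-1 heading for the project."""
    lines = text.splitlines()
    return bool(lines) and lines[0].rstrip() == f"# {project_name}"
```

A README that starts with a blank line, or with any other heading, returns False, which matches the requirement that the file must begin with the heading.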

🧠 Part 2: Simple Beginner-Friendly Explanation

This lab focuses on organising a machine learning project according to the xFusionCorp Industries standard structure.

The goal is to:

standardise project layout

improve maintainability

make collaboration easier for developers and data scientists

You must inspect the existing project and correct anything that does not match the required structure.

Understanding the Required Project Structure

The final project must look exactly like this:

fraud-detection/
├── data/
│   ├── raw/
│   └── processed/
├── models/
├── notebooks/
├── src/
│   ├── data/
│   ├── features/
│   ├── models/
│   └── utils/
├── tests/
├── configs/
├── requirements.txt
└── README.md

Each folder has a specific purpose in an ML workflow.

Purpose of Each Directory

data/

Stores datasets used in the project.

data/raw/

Contains original unmodified data.

Example:

transactions.csv

data/processed/

Contains cleaned or transformed datasets used for training.

Example:

clean_transactions.csv

models/

Stores trained machine learning models.

Example:

fraud_model.pkl

notebooks/

Contains Jupyter notebooks for experimentation and analysis.

Example:

eda.ipynb

src/

Contains the main Python source code for the application.

This keeps project logic organised and modular.

Why __init__.py Files Are Required

Every subdirectory under src/ must contain:

__init__.py

This tells Python:

“Treat this directory as a Python package.”

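Without these files, Python 3.3 and later treats the directory as a namespace package instead of a regular package. Plain imports often still work, but tools that only look for regular packages, such as setuptools' find_packages(), skip the directory, and the lab requirement is not met.

Example of an import that relies on this package structure: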

from src.models.train import train_model
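To see the import above resolve, the sketch below builds a throwaway copy of the src/models/ package in a temporary directory and imports from it. The module train.py and the function train_model come from the import line; their bodies here are placeholders, and the helper names build_demo_package and import_train_model are hypothetical. As in the lab tree, src/ itself has no __init__.py, so Python resolves it as a namespace package while src/models/ is a regular package.

```python
import importlib
import sys
from pathlib import Path


def build_demo_package(root):
    """Create src/models/__init__.py and a placeholder src/models/train.py under root."""
    models_dir = Path(root) / "src" / "models"
    models_dir.mkdir(parents=True)
    (models_dir / "__init__.py").touch()
    # Placeholder body: the real training code is not shown in the article.
    (models_dir / "train.py").write_text(
        "def train_model():\n    return 'trained'\n"
    )


def import_train_model(root):
    """Import src.models.train with root temporarily placed first on sys.path."""
    sys.path.insert(0, str(root))
    try:
        importlib.invalidate_caches()
        module = importlib.import_module("src.models.train")
    finally:
        sys.path.remove(str(root))
    return module.train_model
```

In the real project the same import works when Python is started from /root/code/fraud-detection, because the current directory is then on the import path.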

Purpose of src/ Subdirectories

Directory       Purpose
src/data        Data loading and preprocessing
src/features    Feature engineering logic
src/models      Training and prediction code
src/utils       Helper functions and utilities

Why requirements.txt Matters

The lab requires the following dependencies:

scikit-learn

pandas

numpy

mlflow

This file helps developers install all required Python packages consistently.

Important note: the correct PyPI package name is scikit-learn, not sklearn. The name sklearn is only what you write in import statements (import sklearn).

Why README.md Matters

The README file provides project documentation.

The lab specifically requires it to begin with:

# fraud-detection

This acts as the project title and ensures naming consistency.

Why Exact Naming Is Important

Lab validators check:

exact folder names

exact file names

exact dependency names

Even small differences such as:

feature instead of features

util instead of utils

can cause the lab to fail.
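Near-miss names like these are easy to overlook by eye. The following sketch uses the standard library's difflib to map an unexpected directory name to the closest expected one; the function name suggest_renames is hypothetical, and the expected names are the src/ subpackages from the required tree.

```python
import difflib

# Expected subpackage names under src/, taken from the required tree.
EXPECTED_SRC_PACKAGES = ("data", "features", "models", "utils")


def suggest_renames(found_names):
    """Map each unexpected name to the closest expected name, or None if nothing is close."""
    suggestions = {}
    for name in found_names:
        if name in EXPECTED_SRC_PACKAGES:
            continue
        matches = difflib.get_close_matches(name, EXPECTED_SRC_PACKAGES, n=1)
        suggestions[name] = matches[0] if matches else None
    return suggestions
```

Passing the directory names found under src/ (for example from os.listdir("src")) returns only the entries that need attention, so feature maps to features and util maps to utils.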
