Here is a realistic end-to-end project that covers:

  1. Fetch a messy dataset from a free public API (JSON) or a CSV/Excel download.
  2. Clean data → focus on string cleaning & manipulation (interview-useful).
  3. Store / serve cleaned data.
  4. Build a dashboard (Flask / FastAPI recommended) → interactive tables/graphs.
  5. Production-grade → modular, structured, easy to run locally.

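🔹 Suggested Project: “World Population & Country Data Dashboard”

Data source: the free REST Countries API (https://restcountries.com), which returns one JSON record per country, including its name, region, capital(s), population, and area.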

🔹 Project Workflow

Step 1: Fetch Data

  • Use requests to call the API.
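  • Flatten the JSON response into a pandas DataFrame with pd.json_normalize (nested keys become columns such as name.common).
  • Alternatively, load a downloaded CSV/Excel file with pd.read_csv / pd.read_excel.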
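A minimal sketch of this step, assuming the REST Countries endpoint used in the sample app further down. The `FIELDS` list and the `fetch_countries` / `to_frame` helper names are illustrative choices, not part of any library, and the `fields` filter is an assumption (it keeps the payload small, and the service may reject `/all` calls without it). The demo record is hand-written in the API's shape so the normalization can be tried without a network call.

```python
import pandas as pd
import requests

DATA_URL = "https://restcountries.com/v3.1/all"
# Assumed field list: only the columns the dashboard uses
FIELDS = "name,region,capital,population,area"


def fetch_countries(url: str = DATA_URL, timeout: float = 10.0) -> list:
    """Download the raw JSON list of country records."""
    res = requests.get(url, params={"fields": FIELDS}, timeout=timeout)
    res.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error body
    return res.json()


def to_frame(records: list) -> pd.DataFrame:
    """Flatten nested JSON (e.g. {"name": {"common": ...}}) into dotted columns."""
    return pd.json_normalize(records)


# Offline demo: one hand-written record shaped like the API's JSON
sample = [{
    "name": {"common": "Chad", "official": "Republic of Chad"},
    "region": "Africa",
    "capital": ["N'Djamena"],
}]
df = to_frame(sample)
print(df.columns.tolist())
```

Calling `to_frame(fetch_countries())` gives the live table. Nested keys such as `name.common` become dotted column names, while list values like `capital` stay as Python lists until you clean them.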

Step 2: Data Cleaning (String Manipulation)

  • Standardize country names (strip spaces, title case).
  • Handle missing values (fillna, dropna).
  • Extract numeric parts from messy fields (e.g., population, area).
  • Split / join list-like fields (e.g., countries with several capital cities).
  • Create derived columns (continent short codes, name lengths).
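Here is one way to apply these bullets with pandas string methods. It is a sketch under assumptions: the raw frame has a plain `name` column, a hypothetical messy text column `area_raw` (the kind you get from a scraped CSV; the API itself returns `area` as a number), and `capital` as a list per country, as in the API's JSON. The three-letter region code (first three letters, upper-cased) is an illustrative choice, and the sample rows are toy input.

```python
import pandas as pd


def clean_countries(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the string-cleaning steps above to a raw country frame."""
    out = df.copy()
    # Standardize names: trim, collapse repeated inner spaces, title case
    out["name"] = (
        out["name"].str.strip()
        .str.replace(r"\s+", " ", regex=True)
        .str.title()
    )
    # Missing regions become an explicit placeholder instead of NaN
    out["region"] = out["region"].fillna("Unknown").str.strip()
    # Pull the first number out of messy text such as "1,221,037 km²"
    out["area_km2"] = pd.to_numeric(
        out["area_raw"].str.replace(",", "", regex=False)
        .str.extract(r"(\d+(?:\.\d+)?)", expand=False),
        errors="coerce",
    )
    # 'capital' holds lists (several capitals, or none): join into one string
    out["capital"] = out["capital"].str.join(", ").fillna("N/A")
    # Derived columns
    out["region_code"] = out["region"].str[:3].str.upper()
    out["name_length"] = out["name"].str.len()
    return out.drop(columns=["area_raw"])


raw = pd.DataFrame({
    "name": ["  south africa ", "bolivia", "antarctica"],
    "region": ["Africa", "Americas", None],
    "area_raw": ["1,221,037 km²", "1098581", None],
    "capital": [["Pretoria", "Bloemfontein", "Cape Town"], ["Sucre"], None],
})
cleaned = clean_countries(raw)
print(cleaned)
```

One interview-worthy caveat: `str.title()` also capitalizes letters after apostrophes, so `"côte d'ivoire".title()` becomes `"Côte D'Ivoire"`. If the source already has proper casing, strip whitespace but skip title-casing.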

Step 3: Store Clean Data

  • Save the cleaned data to SQLite (a single file, no database server needed, so it is portable to any laptop).
  • Or keep as cleaned CSV/Parquet.
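A sketch of the SQLite option using only the standard-library `sqlite3` driver plus pandas (`DataFrame.to_sql` and `pd.read_sql_query`), so SQLAlchemy is not strictly needed for this part. The table name `countries`, the helper names, and the toy rows are assumptions for illustration.

```python
import os
import sqlite3
import tempfile

import pandas as pd


def save_to_sqlite(df: pd.DataFrame, db_path: str) -> None:
    """Write the cleaned frame to the 'countries' table, replacing any old snapshot."""
    conn = sqlite3.connect(db_path)
    try:
        df.to_sql("countries", conn, if_exists="replace", index=False)
        conn.commit()
    finally:
        conn.close()


def load_countries(db_path: str, name_filter: str = "") -> pd.DataFrame:
    """Read countries back, optionally filtered by a substring of the name."""
    conn = sqlite3.connect(db_path)
    try:
        # Parameterized query: search text is never spliced into the SQL string
        return pd.read_sql_query(
            "SELECT * FROM countries WHERE name LIKE ? ORDER BY name",
            conn,
            params=(f"%{name_filter}%",),
        )
    finally:
        conn.close()


# Demo with toy values in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "countries.db")
    toy = pd.DataFrame({"name": ["Bolivia", "Brazil", "Chad"], "population": [1, 2, 3]})
    save_to_sqlite(toy, path)
    matches = load_countries(path, "br")
    everything = load_countries(path)

print(matches)
```

SQLite's `LIKE` is case-insensitive for ASCII letters by default, which is why `"br"` matches `Brazil`; an empty filter becomes `%%` and returns every row.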

Step 4: Build Dashboard

  • Use Flask (simplest; used in the sample script below) or FastAPI, with Jinja2 templates and Bootstrap for styling.
  • Pages:
    • Home → Summary stats (population, area).
    • Search → Query countries.
    • Charts → Plot population by continent (using Plotly/Matplotlib).
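For the Charts page, a sketch that separates the aggregation (plain pandas, easy to test) from the Plotly rendering. It groups by the API's `region` field as a stand-in for continent. The helper names and toy numbers are illustrative, and `plotly.express` is imported inside the chart function only so the aggregation can run without Plotly installed.

```python
import pandas as pd


def population_by_region(df: pd.DataFrame) -> pd.DataFrame:
    """Total population per region, largest first: the data behind the chart."""
    return (
        df.groupby("region", as_index=False)["population"]
        .sum()
        .sort_values("population", ascending=False, ignore_index=True)
    )


def population_chart_html(df: pd.DataFrame) -> str:
    """Render a Plotly bar chart as an HTML fragment for a Jinja2 template."""
    import plotly.express as px  # lazy import: only needed when drawing

    fig = px.bar(population_by_region(df), x="region", y="population",
                 title="Population by region")
    # full_html=False returns a <div>; include_plotlyjs="cdn" loads plotly.js from a CDN
    return fig.to_html(full_html=False, include_plotlyjs="cdn")


toy = pd.DataFrame({"region": ["Asia", "Europe", "Asia"], "population": [5, 3, 4]})
agg = population_by_region(toy)
print(agg)
```

In a Flask route you could pass `population_chart_html(df)` to a template (for example a hypothetical `charts.html`) and output it with `{{ chart | safe }}`, the same way the table HTML is rendered in `countries.html`.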

Step 5: Productionize

Modular structure:

string_project/
├── app.py            # Flask app
├── data_fetch.py     # API/CSV fetcher
├── data_clean.py     # String cleaning functions
├── models.py         # SQLite DB helper
├── static/           # CSS/JS
├── templates/        # Jinja2 HTML templates
└── requirements.txt

requirements.txt:

flask
pandas
requests
plotly
sqlalchemy


🔹 Sample End-to-End Script (minimal but extendable)

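# app.py
from functools import lru_cache

import pandas as pd
import requests
from flask import Flask, render_template, request

app = Flask(__name__)

DATA_URL = "https://restcountries.com/v3.1/all"
# Request only the fields we use (smaller payload; the /all endpoint may require a fields filter)
FIELDS = "name,region,capital,population,area"


@lru_cache(maxsize=1)
def fetch_and_clean():
    """Fetch and clean once; later requests reuse the cached DataFrame (restart to refresh).
    Callers must not modify the returned frame in place."""
    # Step 1: Fetch (timeout + status check so network/API errors surface clearly)
    res = requests.get(DATA_URL, params={"fields": FIELDS}, timeout=10)
    res.raise_for_status()
    countries = res.json()

    # Step 2: Normalize nested JSON into flat columns (e.g. name.common)
    df = pd.json_normalize(countries)

    # Step 3: String cleaning
    df['name.common'] = df['name.common'].str.strip().str.title()
    df['region'] = df['region'].fillna("Unknown").str.upper()
    # 'capital' is a list per country (possibly empty or missing): join instead of str()-ing it,
    # which would turn missing values into the text "nan"
    df['capital'] = df['capital'].str.join(", ").replace("", "N/A").fillna("N/A")

    # Derived column
    df['name_length'] = df['name.common'].str.len()

    return df[['name.common', 'region', 'capital', 'population', 'area', 'name_length']]


@app.route("/")
def home():
    df = fetch_and_clean()
    summary = {
        "total_countries": len(df),
        # Thousands separators for display
        "total_population": f"{int(df['population'].sum()):,}",
        "largest_country": df.loc[df["area"].idxmax(), "name.common"],
    }
    return render_template("home.html", summary=summary)


@app.route("/countries")
def countries():
    query = request.args.get("q", "").strip()
    df = fetch_and_clean()
    if query:
        # regex=False: match user input literally (an input like "(" would otherwise raise)
        df = df[df['name.common'].str.contains(query, case=False, regex=False, na=False)]
    return render_template(
        "countries.html",
        # to_html escapes cell values by default, so rendering it with |safe is OK
        tables=df.to_html(classes="table table-striped", index=False),
        query=query,
    )


if __name__ == "__main__":
    app.run(debug=True)  # debug mode is for local development only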

🔹 Example Templates (Jinja2)

templates/home.html

<!DOCTYPE html>
<html>
<head>
  <title>Country Dashboard</title>
  <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/css/bootstrap.min.css">
</head>
<body class="container mt-4">
  <h1>🌍 Country Dashboard</h1>
  <p>Total Countries: {{ summary.total_countries }}</p>
  <p>Total Population: {{ summary.total_population }}</p>
  <p>Largest Country (Area): {{ summary.largest_country }}</p>
  <a href="/countries">Browse Countries</a>
</body>
</html>

templates/countries.html

<!DOCTYPE html>
<html>
<head>
  <title>Countries</title>
  <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.0/dist/css/bootstrap.min.css">
</head>
<body class="container mt-4">
  <h2>Countries</h2>
  <form method="get">
    <input type="text" name="q" value="{{ query }}" placeholder="Search by name">
    <button type="submit" class="btn btn-primary btn-sm">Search</button>
  </form>
  <div class="mt-3">
    {{ tables | safe }}
  </div>
</body>
</html>



Discover more from HintsToday
