Lab 1.2 — UV CLI Tool + LaTeX Docs PDF on GitHub Pages

What you'll build

A command-line tool published via UV that anyone can run with uvx your-tool, plus a professional PDF documentation file generated with LaTeX + pandoc, deployed to GitHub Pages along with a Docusaurus-style HTML site.

Time: 60–90 minutes. Difficulty: ⭐⭐⭐☆☆. Ship: a live GitHub Pages URL with your docs + downloadable PDF.

What the Finished Thing Looks Like

By the end:

bash

uvx tds-csv-YOURNAME sample.csv --top 5
# ┌──────┬────────┐
# │ City │ Count  │
# ├──────┼────────┤
# │ ...  │ ...    │
# └──────┴────────┘

And https://<username>.github.io/tds-csv-YOURNAME/ shows your documentation site with a Download PDF button.

Prerequisites

Completed Lab 1.1 (at least through Step 6) — you understand UV + pyproject.toml.
LaTeX + pandoc available locally (see latex.mdx). GitHub Actions has both preinstalled — so local is optional.
GitHub Pages enabled on your account.

The Steps

Step 1 — Plan the CLI

Our CLI will be tds-csv-YOURNAME. Features:

Takes a CSV file path.
Optionally filters to the top N rows by a given column.
Pretty-prints as a table using rich.

code

Usage: tds-csv [OPTIONS] FILE

  Quickly explore a CSV file.

Options:
  --top INTEGER           Show top N rows [default: 10]
  --by TEXT               Sort by column (default: first column)
  --help                  Show this message and exit.

Step 2 — Scaffold the project

bash

uv init --app --python 3.13 tds-csv-YOURNAME
cd tds-csv-YOURNAME

We use --app (not --lib) because this is a CLI app. UV creates a single-module layout:

code

tds-csv-YOURNAME/
├── .gitignore
├── .python-version
├── README.md
├── main.py
└── pyproject.toml

Add dependencies:

bash

uv add typer "rich>=13" pandas
uv add --dev pytest

Step 3 — Write the CLI

Rename main.py to cli.py and replace its contents:

python

cli.py
"""tds-csv — quickly explore a CSV file."""

from pathlib import Path
from importlib.metadata import version as _v

import pandas as pd
import typer
from rich.console import Console
from rich.table import Table

__version__ = _v("tds-csv-YOURNAME")

app = typer.Typer(
    name="tds-csv",
    help="Quickly explore a CSV file.",
    add_completion=False,
)
console = Console()


def _render(df: pd.DataFrame, title: str) -> None:
    table = Table(title=title, show_lines=True)
    for col in df.columns:
        table.add_column(str(col), style="cyan")
    for _, row in df.iterrows():
        table.add_row(*[str(v) for v in row])
    console.print(table)


@app.command()
def main(
    file: Path = typer.Argument(..., exists=True, readable=True, help="CSV file to read."),
    top: int = typer.Option(10, help="Show top N rows."),
    by: str | None = typer.Option(None, help="Sort by column (default: first column)."),
    version: bool = typer.Option(False, "--version", help="Show version and exit."),
) -> None:
    """Render a CSV file as a pretty table."""
    if version:
        console.print(f"tds-csv v{__version__}")
        raise typer.Exit()

    df = pd.read_csv(file)
    sort_col = by or df.columns[0]
    if sort_col not in df.columns:
        console.print(f"[red]Column '{sort_col}' not in CSV[/red]")
        raise typer.Exit(code=1)
    df = df.sort_values(by=sort_col, ascending=False).head(top)
    _render(df, f"{file.name} — top {top} by {sort_col}")


if __name__ == "__main__":
    app()

Step 4 — Wire it up as a CLI entry point

Edit pyproject.toml to expose tds-csv as an entry point:

toml

pyproject.toml (add this section)
[project.scripts]
tds-csv = "cli:app"

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

Full file should look like:

toml

pyproject.toml
[project]
name = "tds-csv-YOURNAME"
version = "0.1.0"
description = "Quickly explore a CSV file from the command line."
readme = "README.md"
license = "MIT"
requires-python = ">=3.11"
authors = [{ name = "Your Name", email = "you@example.com" }]
dependencies = [
    "typer",
    "rich>=13",
    "pandas",
]

[project.scripts]
tds-csv = "cli:app"

[project.urls]
Homepage = "https://github.com/YOUR-USERNAME/tds-csv-YOURNAME"

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[dependency-groups]
dev = ["pytest>=8"]

[tool.hatch.build.targets.wheel]
packages = ["."]
include = ["cli.py"]

Test it locally:

bash

uv sync
uv run tds-csv --help

You should see the Typer help output.

Step 5 — Create a sample CSV and try it

bash

cat > sample.csv <<'EOF'
city,population,state
Chennai,7088000,Tamil Nadu
Mumbai,20411000,Maharashtra
Bangalore,8443000,Karnataka
Hyderabad,6809000,Telangana
Pune,3124000,Maharashtra
Kolkata,14850000,West Bengal
Delhi,28514000,Delhi
EOF

uv run tds-csv sample.csv --top 5 --by population

You should see a nicely formatted table sorted by population.

Step 6 — Try it as a one-shot tool (uvx)

Build the wheel and run it from an ephemeral env:

bash

uv build

# Run without installing globally
uv tool run --from "./dist/tds_csv_YOURNAME-0.1.0-py3-none-any.whl" tds-csv sample.csv --top 3
# or the short form:
uvx --from ./dist/*.whl tds-csv sample.csv --top 3

Later (after publishing to PyPI), anyone can uvx tds-csv-YOURNAME sample.csv.

Step 7 — Write Markdown documentation

Create a docs/ folder and put your documentation in Markdown:

bash

mkdir docs

markdown

docs/index.md
---
title: tds-csv — User Guide
author: Your Name
date: 2026-05-10
---

# tds-csv

**tds-csv** is a tiny CLI for quickly exploring CSV files. Built for the
*Tools in Data Science* course at IIT Madras, May 2026.

## Installation

```bash
uvx tds-csv-YOURNAME --help

Or install globally:

bash

uv tool install tds-csv-YOURNAME
tds-csv --help

Usage

Show the top 10 rows

bash

tds-csv sample.csv

Sort by a specific column

bash

tds-csv sample.csv --by population --top 5

How It Works

The tool:

Reads the CSV with pandas.read_csv.
Sorts by the chosen column (defaulting to the first column).
Takes the top N rows.
Renders them with rich as a Unicode table.

Architecture

The formula for text-to-digital transformation in our case is:

output = Render(SortBy_col(Read(csv))[:N])

License

MIT — see the LICENSE file.

code

</details>

<details>
<summary><b>Step 8 — Build the PDF with pandoc + LaTeX</b></summary>

Create a pandoc template for nicer PDF output:

```latex title="docs/template.tex"
\documentclass[11pt,a4paper]{article}
\usepackage[margin=1in]{geometry}
\usepackage{hyperref}
\usepackage{graphicx}
\usepackage{fancyhdr}
\usepackage{amsmath}
\usepackage{xcolor}

\definecolor{tdsblue}{RGB}{79,70,229}
\hypersetup{
  colorlinks=true,
  linkcolor=tdsblue,
  urlcolor=tdsblue
}

\pagestyle{fancy}
\fancyhf{}
\lhead{$title$}
\rhead{$date$}
\cfoot{\thepage}

\title{\textcolor{tdsblue}{$title$}}
\author{$author$}
\date{$date$}

\begin{document}
\maketitle
\tableofcontents
\newpage

$body$

\end{document}

Locally test:

bash

pandoc docs/index.md -o docs/tds-csv.pdf \
  --template=docs/template.tex \
  --pdf-engine=xelatex \
  --toc \
  --number-sections \
  --highlight-style=tango

Open docs/tds-csv.pdf — you should have a beautifully typeset document with a cover page, TOC, and code highlighting.

If pandoc isn't installed locally

Skip this local build and let GitHub Actions do it (Step 11). The Action's Ubuntu runner has pandoc + texlive pre-installable.

Step 9 — Build a Docusaurus site for the HTML docs

You have two choices:

Option A (quick) — use plain HTML or mdBook. Small output, minutes to set up.
Option B (professional) — use Docusaurus like the TDS course itself. Takes 10 minutes but matches the course pattern.

We'll go with Option B — Docusaurus. Initialize:

bash

# in the repo root
npx create-docusaurus@latest site classic --typescript

This creates a site/ folder. Move your documentation in and delete the default content:

bash

rm -rf site/docs/* site/blog
cp docs/index.md site/docs/intro.md

Edit site/docusaurus.config.ts — set url, baseUrl, organizationName, projectName:

site/docusaurus.config.ts
const SITE_URL = process.env.SITE_URL ?? 'https://YOUR-USERNAME.github.io';
const BASE_URL = process.env.BASE_URL ?? '/tds-csv-YOURNAME/';

const config = {
  title: 'tds-csv',
  tagline: 'Quickly explore any CSV',
  url: SITE_URL,
  baseUrl: BASE_URL,
  organizationName: 'YOUR-USERNAME',
  projectName: 'tds-csv-YOURNAME',
  // ... (rest of defaults)
};

Test the dev server:

bash

cd site
npm install
npm run start        # opens http://localhost:3000

Stop the dev server (Ctrl+C) and do a production build:

bash

npm run build        # outputs to site/build

Step 10 — Link the PDF from the site

Docusaurus serves anything under site/static/ as a top-level file. Copy the PDF there:

bash

mkdir -p site/static/downloads
cp docs/tds-csv.pdf site/static/downloads/

Reference it in site/docs/intro.md:

markdown

[📄 Download the full PDF manual](/downloads/tds-csv.pdf)

Step 11 — Write the GitHub Actions deploy workflow

This workflow rebuilds the PDF on every push and deploys the site to GitHub Pages.

yaml

.github/workflows/deploy.yml
name: Deploy Docs

on:
  push:
    branches: [main]
  workflow_dispatch:

permissions:
  contents: read
  pages: write
  id-token: write

concurrency:
  group: pages
  cancel-in-progress: true

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # --- PDF step ---
      - name: Install pandoc + texlive
        run: |
          sudo apt-get update
          sudo apt-get install -y pandoc texlive-xetex texlive-fonts-recommended texlive-latex-extra

      - name: Build PDF
        run: |
          mkdir -p site/static/downloads
          pandoc docs/index.md -o site/static/downloads/tds-csv.pdf \
            --template=docs/template.tex \
            --pdf-engine=xelatex \
            --toc \
            --number-sections \
            --highlight-style=tango

      # --- Site step ---
      - name: Setup Node
        uses: actions/setup-node@v4
        with:
          node-version: '24'
          cache: 'npm'
          cache-dependency-path: site/package-lock.json

      - name: Install site deps
        working-directory: site
        run: npm ci

      - name: Build site
        working-directory: site
        env:
          SITE_URL: https://${{ github.repository_owner }}.github.io
          BASE_URL: /${{ github.event.repository.name }}/
        run: npm run build

      - name: Upload Pages artifact
        uses: actions/upload-pages-artifact@v3
        with:
          path: site/build

  deploy:
    needs: build
    runs-on: ubuntu-latest
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    steps:
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v4

Step 12 — Commit and deploy

Enable Pages on the repo: Settings → Pages → Source: GitHub Actions.

bash

git add .
git commit -m "feat: initial site + PDF docs pipeline"
git push

Watch the Actions tab. The job takes ~3 minutes (pandoc + texlive install is the slow part). When it finishes green, open the URL shown in the deploy step.

You should see:

A Docusaurus HTML site at https://<username>.github.io/tds-csv-YOURNAME/
A Download PDF link inside it that serves your rendered PDF

Step 13 — Speed up the Action by caching texlive (optional but nice)

Installing TeX Live takes ~90 seconds. You can cache it:

yaml

- name: Cache pandoc + texlive
  uses: actions/cache@v4
  id: cache-tex
  with:
    path: /usr/share/texlive
    key: texlive-${{ runner.os }}-v1

- name: Install pandoc + texlive
  if: steps.cache-tex.outputs.cache-hit != 'true'
  run: |
    sudo apt-get update
    sudo apt-get install -y pandoc texlive-xetex ...

This only helps on re-runs — first build is unchanged.

Step 14 — Publish the CLI to PyPI too

Same process as Lab 1.1: add a .github/workflows/release.yml that triggers on v* tags, publishes via Trusted Publishing. Once done, anyone in the world can uvx tds-csv-YOURNAME my.csv.

Troubleshooting

pandoc "File not found" for template.tex

Your working directory in the Action matters. The Build PDF step runs from the repo root, so docs/template.tex is correct. If you moved things, update the path.

LaTeX error about missing packages

The texlive-fonts-recommended texlive-latex-extra packages cover most needs. If your template uses something exotic, add more packages:

bash

sudo apt-get install -y texlive-science texlive-pictures

Docusaurus build fails with "broken link"

Docusaurus is strict about broken links. Either fix the link or set onBrokenLinks: 'warn' in docusaurus.config.ts.

Site renders at wrong URL (404s on CSS)

Your baseUrl is wrong. For a project site at user.github.io/repo/, baseUrl must be '/repo/' — with the trailing slash.

What You've Learned

Turning UV-managed code into an installable CLI via [project.scripts].
Using uvx to run tools in ephemeral environments.
Authoring documentation in Markdown and rendering it to a professional PDF with pandoc + custom LaTeX template.
Hosting a Docusaurus site on GitHub Pages with Actions.
Combining two build outputs (site + PDF) in a single deploy pipeline.

Lab 1.2 — UV CLI Tool + LaTeX Docs PDF on GitHub Pages

What the Finished Thing Looks Like

Prerequisites

The Steps

Usage

Show the top 10 rows

Sort by a specific column

How It Works

Architecture

License

Troubleshooting

What You've Learned

Write a Blog Post

Next Lab

References

What the Finished Thing Looks Like​

Prerequisites​

The Steps​

Usage​

Show the top 10 rows​

Sort by a specific column​

How It Works​

Architecture​

License​

Troubleshooting​

What You've Learned​

Write a Blog Post​

Next Lab​

References​

What the Finished Thing Looks Like

Prerequisites

The Steps

Usage

Show the top 10 rows

Sort by a specific column

How It Works

Architecture

License

Troubleshooting

What You've Learned

Write a Blog Post

Next Lab

References