Lab 1.2 — UV CLI Tool + LaTeX Docs PDF on GitHub Pages
A command-line tool published via UV that anyone can run with uvx your-tool, plus a professional PDF documentation file generated with LaTeX + pandoc, deployed to GitHub Pages along with a Docusaurus-style HTML site.
Time: 60–90 minutes. Difficulty: ⭐⭐⭐☆☆. Ship: a live GitHub Pages URL with your docs + downloadable PDF.
What the Finished Thing Looks Like
By the end:
uvx tds-csv-YOURNAME sample.csv --top 5
# ┌──────┬────────┐
# │ City │ Count │
# ├──────┼────────┤
# │ ... │ ... │
# └──────┴────────┘
And https://<username>.github.io/tds-csv-YOURNAME/ shows your documentation site with a Download PDF button.
Prerequisites
- Completed Lab 1.1 (at least through Step 6) — you understand UV + pyproject.toml.
- LaTeX + pandoc available locally (see latex.mdx). The GitHub Actions workflow in Step 11 installs both on the runner, so a local install is optional.
- GitHub Pages enabled on your account.
The Steps
Step 1 — Plan the CLI
The package will be called tds-csv-YOURNAME; the command it installs is tds-csv. Features:
- Takes a CSV file path.
- Optionally filters to the top N rows by a given column.
- Pretty-prints as a table using rich.
Usage: tds-csv [OPTIONS] FILE
Quickly explore a CSV file.
Options:
--top INTEGER Show top N rows [default: 10]
--by TEXT Sort by column (default: first column)
--help Show this message and exit.
Step 2 — Scaffold the project
uv init --app --python 3.13 tds-csv-YOURNAME
cd tds-csv-YOURNAME
We use --app (not --lib) because this is a CLI app. UV creates a single-module layout:
tds-csv-YOURNAME/
├── .gitignore
├── .python-version
├── README.md
├── main.py
└── pyproject.toml
Add dependencies:
uv add typer "rich>=13" pandas
uv add --dev pytest
Step 3 — Write the CLI
Rename main.py to cli.py and replace its contents:
"""tds-csv — quickly explore a CSV file."""
from importlib.metadata import version as _v
from pathlib import Path

import pandas as pd
import typer
from rich.console import Console
from rich.table import Table

__version__ = _v("tds-csv-YOURNAME")

app = typer.Typer(
    name="tds-csv",
    help="Quickly explore a CSV file.",
    add_completion=False,
)
console = Console()


def _render(df: pd.DataFrame, title: str) -> None:
    """Print the DataFrame as a rich table, one table row per DataFrame row."""
    table = Table(title=title, show_lines=True)
    for col in df.columns:
        table.add_column(str(col), style="cyan")
    for _, row in df.iterrows():
        table.add_row(*[str(v) for v in row])
    console.print(table)


def _show_version(value: bool) -> None:
    """Eager callback: print the version and exit before FILE is validated."""
    if value:
        console.print(f"tds-csv v{__version__}")
        raise typer.Exit()


@app.command()
def main(
    file: Path = typer.Argument(..., exists=True, readable=True, help="CSV file to read."),
    top: int = typer.Option(10, help="Show top N rows."),
    by: str | None = typer.Option(None, help="Sort by column (default: first column)."),
    # is_eager=True lets `tds-csv --version` work without passing FILE.
    version: bool = typer.Option(
        False, "--version", callback=_show_version, is_eager=True,
        help="Show version and exit.",
    ),
) -> None:
    """Render a CSV file as a pretty table."""
    df = pd.read_csv(file)
    sort_col = by or df.columns[0]
    if sort_col not in df.columns:
        console.print(f"[red]Column '{sort_col}' not in CSV[/red]")
        raise typer.Exit(code=1)
    # Largest values first, then keep the first `top` rows.
    df = df.sort_values(by=sort_col, ascending=False).head(top)
    _render(df, f"{file.name} — top {top} by {sort_col}")


if __name__ == "__main__":
    app()
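One caveat with the `__version__` line: `importlib.metadata.version` reads the metadata of the installed package, so it raises `PackageNotFoundError` when cli.py runs from a checkout that was never installed (for example with plain `python cli.py`). Below is a minimal sketch of a more forgiving lookup. The helper name `safe_version` and the fallback string are illustrative assumptions, not part of the lab's code.

```python
from importlib.metadata import PackageNotFoundError, version


def safe_version(dist_name: str, fallback: str = "0.0.0+unknown") -> str:
    """Return the installed version of dist_name, or fallback if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # No installed metadata for this distribution name.
        return fallback


if __name__ == "__main__":
    # Prints the real version when the package is installed, the fallback otherwise.
    print(safe_version("tds-csv-YOURNAME"))
```

If you adopt something like this, `__version__ = safe_version("tds-csv-YOURNAME")` would replace the direct `_v(...)` call.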
Step 4 — Wire it up as a CLI entry point
Edit pyproject.toml to expose the CLI as an entry point. The tds-csv entry is the everyday command. The second entry repeats the package name, because uvx tds-csv-YOURNAME looks for an executable with the same name as the package; with it, that command works without --from:
[project.scripts]
tds-csv = "cli:app"
tds-csv-YOURNAME = "cli:app"
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
The full file should look like:
[project]
name = "tds-csv-YOURNAME"
version = "0.1.0"
description = "Quickly explore a CSV file from the command line."
readme = "README.md"
license = "MIT"
requires-python = ">=3.11"
authors = [{ name = "Your Name", email = "you@example.com" }]
dependencies = [
    "typer",
    "rich>=13",
    "pandas",
]
[project.scripts]
tds-csv = "cli:app"
tds-csv-YOURNAME = "cli:app"
[project.urls]
Homepage = "https://github.com/YOUR-USERNAME/tds-csv-YOURNAME"
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
[dependency-groups]
dev = ["pytest>=8"]
[tool.hatch.build.targets.wheel]
packages = ["."]
include = ["cli.py"]
Test it locally:
uv sync
uv run tds-csv --help
You should see the Typer help output.
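To see what a `[project.scripts]` value such as `"cli:app"` means, you can parse the same `module:attribute` string with the standard library. This only illustrates the naming convention, assuming cli.py sits at the top level of the wheel. It is not how uv or the installer is implemented.

```python
from importlib.metadata import EntryPoint

# The same name/value pair as in [project.scripts]. "console_scripts" is the
# entry-point group that installers use for command-line scripts.
ep = EntryPoint(name="tds-csv", value="cli:app", group="console_scripts")

print(ep.module)  # the module the generated launcher imports
print(ep.attr)    # the object inside that module that the launcher calls
```

So the installed `tds-csv` launcher imports the `cli` module and calls its `app` object, which is the Typer application from Step 3.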
Step 5 — Create a sample CSV and try it
cat > sample.csv <<'EOF'
city,population,state
Chennai,7088000,Tamil Nadu
Mumbai,20411000,Maharashtra
Bangalore,8443000,Karnataka
Hyderabad,6809000,Telangana
Pune,3124000,Maharashtra
Kolkata,14850000,West Bengal
Delhi,28514000,Delhi
EOF
uv run tds-csv sample.csv --top 5 --by population
You should see a nicely formatted table sorted by population.
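If you want to double-check which rows the table should contain, the same "sort descending, keep the first N" logic can be reproduced with the standard library alone. This is a sketch for verification, not the CLI's implementation: the helper `top_rows` is hypothetical, and it assumes the sort column is numeric (pandas infers that for you in the real tool).

```python
import csv
import io

SAMPLE = """\
city,population,state
Chennai,7088000,Tamil Nadu
Mumbai,20411000,Maharashtra
Bangalore,8443000,Karnataka
Hyderabad,6809000,Telangana
Pune,3124000,Maharashtra
Kolkata,14850000,West Bengal
Delhi,28514000,Delhi
"""


def top_rows(csv_text: str, by: str, n: int) -> list[dict[str, str]]:
    """Return the n rows with the largest numeric value in column `by`."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return sorted(rows, key=lambda r: float(r[by]), reverse=True)[:n]


if __name__ == "__main__":
    for row in top_rows(SAMPLE, by="population", n=5):
        print(row["city"], row["population"])
```

For the sample file the five cities are Delhi, Mumbai, Kolkata, Bangalore, and Chennai, in that order. Note that without `--by`, the CLI sorts by the first column (`city`) in descending order, so you get reverse alphabetical order rather than the largest populations.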
Step 6 — Try it as a one-shot tool (uvx)
Build the wheel and run it from an ephemeral env:
uv build
# Run without installing globally
uv tool run --from "./dist/tds_csv_YOURNAME-0.1.0-py3-none-any.whl" tds-csv sample.csv --top 3
# or the short form:
uvx --from ./dist/*.whl tds-csv sample.csv --top 3
Later (after publishing to PyPI), anyone can uvx tds-csv-YOURNAME sample.csv.
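One thing to watch with the `./dist/*.whl` glob: once you bump the version and run `uv build` again, dist/ holds more than one wheel and the glob expands to several paths. A small helper can pick the newest one instead. This is a sketch; the function name `latest_wheel` is an assumption, and "newest" here means most recently modified, not highest version number.

```python
from pathlib import Path


def latest_wheel(dist_dir: str = "dist") -> Path:
    """Return the most recently modified .whl file in dist_dir."""
    wheels = sorted(Path(dist_dir).glob("*.whl"), key=lambda p: p.stat().st_mtime)
    if not wheels:
        raise FileNotFoundError(f"no wheel found in {dist_dir}/; run `uv build` first")
    return wheels[-1]
```

You could print the result from a short script and pass that path to `uvx --from`.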
Step 7 — Write Markdown documentation
Create a docs/ folder and put your documentation in Markdown, in docs/index.md (the file the pandoc commands in Steps 8 and 11 read):
mkdir docs
---
title: tds-csv — User Guide
author: Your Name
date: 2026-05-10
---

# tds-csv

**tds-csv** is a tiny CLI for quickly exploring CSV files. Built for the
*Tools in Data Science* course at IIT Madras, May 2026.

## Installation

```bash
uvx tds-csv-YOURNAME --help
```

Or install globally:

```bash
uv tool install tds-csv-YOURNAME
tds-csv --help
```

## Usage

### Show the top 10 rows

```bash
tds-csv sample.csv
```

### Sort by a specific column

```bash
tds-csv sample.csv --by population --top 5
```

## How It Works

The tool:

- Reads the CSV with `pandas.read_csv`.
- Sorts by the chosen column (defaulting to the first column).
- Takes the top N rows.
- Renders them with `rich` as a Unicode table.

## Architecture

The whole pipeline can be written as a single composition:

    output = Render(SortBy_col(Read(csv))[:N])

## License

MIT. See the LICENSE file.
Step 8 — Build the PDF with pandoc + LaTeX
Create a pandoc template for nicer PDF output:
docs/template.tex:
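\documentclass[11pt,a4paper]{article}
\usepackage[margin=1in]{geometry}
\usepackage{hyperref}
\usepackage{graphicx}
\usepackage{fancyhdr}
\usepackage{amsmath}
\usepackage{xcolor}
\definecolor{tdsblue}{RGB}{79,70,229}
\hypersetup{
  colorlinks=true,
  linkcolor=tdsblue,
  urlcolor=tdsblue
}
% pandoc emits \tightlist in compact lists; a custom template must define it.
\providecommand{\tightlist}{%
  \setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}
% Macros for syntax-highlighted code blocks (needed with --highlight-style).
$if(highlighting-macros)$
$highlighting-macros$
$endif$
\pagestyle{fancy}
\fancyhf{}
\lhead{$title$}
\rhead{$date$}
\cfoot{\thepage}
\title{\textcolor{tdsblue}{$title$}}
\author{$author$}
\date{$date$}
\begin{document}
\maketitle
\tableofcontents
\newpage
$body$
\end{document}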
Locally test:
pandoc docs/index.md -o docs/tds-csv.pdf \
--template=docs/template.tex \
--pdf-engine=xelatex \
--toc \
--number-sections \
--highlight-style=tango
Open docs/tds-csv.pdf. You should see a typeset document with a title block, a table of contents, and highlighted code.
If you don't have LaTeX installed locally, skip this local build and let GitHub Actions do it (Step 11). The workflow installs pandoc and TeX Live on its Ubuntu runner with apt-get.
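The `$title$`, `$author$`, `$date$`, and `$body$` markers in the template are pandoc template variables: the first three are filled from the YAML front matter of docs/index.md, and `$body$` receives the converted document. The toy function below imitates only the simplest form of that substitution, to show where the values end up. pandoc's real template language also has conditionals and loops, which this sketch ignores, and `fill` is a hypothetical helper, not a pandoc API.

```python
import re


def fill(template: str, variables: dict[str, str]) -> str:
    """Replace each $name$ placeholder with its value; unknown names become empty."""
    return re.sub(
        r"\$([A-Za-z][\w-]*)\$",
        lambda m: variables.get(m.group(1), ""),
        template,
    )


if __name__ == "__main__":
    meta = {"title": "tds-csv User Guide", "author": "Your Name", "date": "2026-05-10"}
    print(fill(r"\title{$title$} \author{$author$} \date{$date$}", meta))
```

Changing `title:` in the front matter therefore changes both the title block and the page header, because the template uses `$title$` in `\title` and in `\lhead`.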
Step 9 — Build a Docusaurus site for the HTML docs
You have two choices:
- Option A (quick) — use plain HTML or mdBook. Small output, minutes to set up.
- Option B (professional) — use Docusaurus like the TDS course itself. Takes 10 minutes but matches the course pattern.
We'll go with Option B — Docusaurus. Initialize:
# in the repo root
npx create-docusaurus@latest site classic --typescript
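This creates a site/ folder. Move your documentation in and delete the default content:
rm -rf site/docs/* site/blog
cp docs/index.md site/docs/intro.md
The classic template still links to the blog. After deleting site/blog, set blog: false in the preset options of site/docusaurus.config.ts and remove the Blog entries from the navbar and footer; otherwise the production build can fail on broken links.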
Edit site/docusaurus.config.ts — set url, baseUrl, organizationName, projectName:
const SITE_URL = process.env.SITE_URL ?? 'https://YOUR-USERNAME.github.io';
const BASE_URL = process.env.BASE_URL ?? '/tds-csv-YOURNAME/';

const config = {
  title: 'tds-csv',
  tagline: 'Quickly explore any CSV',
  url: SITE_URL,
  baseUrl: BASE_URL,
  organizationName: 'YOUR-USERNAME',
  projectName: 'tds-csv-YOURNAME',
  // ... (rest of defaults)
};
Test the dev server:
cd site
npm install
npm run start # serves the site at http://localhost:3000/tds-csv-YOURNAME/ (the baseUrl is part of the path)
Stop the dev server (Ctrl+C) and do a production build:
npm run build # outputs to site/build
Step 10 — Link the PDF from the site
Docusaurus serves anything under site/static/ as a top-level file. Copy the PDF there:
mkdir -p site/static/downloads
cp docs/tds-csv.pdf site/static/downloads/
Reference it in site/docs/intro.md:
[📄 Download the full PDF manual](/downloads/tds-csv.pdf)
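To make the path mapping concrete, here is a small sketch of how a file under site/static/ relates to its public URL once the site is deployed. It assumes Docusaurus's documented behaviour of copying site/static/ to the root of the build output; the function `public_url` and the user name `octocat` are hypothetical.

```python
from pathlib import PurePosixPath


def public_url(site_url: str, base_url: str, static_file: str) -> str:
    """Map a file under site/static/ to the URL it is served from."""
    relative = PurePosixPath(static_file).relative_to("site/static")
    parts = [site_url.rstrip("/"), base_url.strip("/"), relative.as_posix()]
    # Skip empty parts so a baseUrl of "/" does not produce a double slash.
    return "/".join(p for p in parts if p)


if __name__ == "__main__":
    print(public_url(
        "https://octocat.github.io",
        "/tds-csv-octocat/",
        "site/static/downloads/tds-csv.pdf",
    ))
```

The `static/` segment disappears and the `baseUrl` is inserted in its place, which is why the link in intro.md starts with `/downloads/` rather than `/static/downloads/`.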
Step 11 — Write the GitHub Actions deploy workflow
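This workflow rebuilds the PDF on every push to main and deploys the site to GitHub Pages. Save it under .github/workflows/ (for example as .github/workflows/deploy.yml; the file name is your choice).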
name: Deploy Docs

on:
  push:
    branches: [main]
  workflow_dispatch:

permissions:
  contents: read
  pages: write
  id-token: write

concurrency:
  group: pages
  cancel-in-progress: true

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # --- PDF step ---
      - name: Install pandoc + texlive
        run: |
          sudo apt-get update
          sudo apt-get install -y pandoc texlive-xetex texlive-fonts-recommended texlive-latex-extra

      - name: Build PDF
        run: |
          mkdir -p site/static/downloads
          pandoc docs/index.md -o site/static/downloads/tds-csv.pdf \
            --template=docs/template.tex \
            --pdf-engine=xelatex \
            --toc \
            --number-sections \
            --highlight-style=tango

      # --- Site step ---
      - name: Setup Node
        uses: actions/setup-node@v4
        with:
          node-version: '24'
          cache: 'npm'
          cache-dependency-path: site/package-lock.json

      - name: Install site deps
        working-directory: site
        run: npm ci

      - name: Build site
        working-directory: site
        env:
          SITE_URL: https://${{ github.repository_owner }}.github.io
          BASE_URL: /${{ github.event.repository.name }}/
        run: npm run build

      - name: Upload Pages artifact
        uses: actions/upload-pages-artifact@v3
        with:
          path: site/build

  deploy:
    needs: build
    runs-on: ubuntu-latest
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    steps:
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v4
Step 12 — Commit and deploy
Enable Pages on the repo: Settings → Pages → Source: GitHub Actions.
git add .
git commit -m "feat: initial site + PDF docs pipeline"
git push
Watch the Actions tab. The job takes ~3 minutes (pandoc + texlive install is the slow part). When it finishes green, open the URL shown in the deploy step.
You should see:
- A Docusaurus HTML site at https://<username>.github.io/tds-csv-YOURNAME/
- A Download PDF link inside it that serves your rendered PDF
Step 13 — Speed up the Action by caching texlive (optional but nice)
Installing TeX Live takes ~90 seconds. You can cache it:
- name: Cache pandoc + texlive
  uses: actions/cache@v4
  id: cache-tex
  with:
    path: /usr/share/texlive
    key: texlive-${{ runner.os }}-v1
- name: Install pandoc + texlive
  if: steps.cache-tex.outputs.cache-hit != 'true'
  run: |
    sudo apt-get update
    sudo apt-get install -y pandoc texlive-xetex ...
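This only helps on re-runs; the first build is unchanged. Be aware that apt-get also installs files outside /usr/share/texlive (the pandoc and xelatex executables, for instance), so check that a run with a cache hit can still find both commands before you rely on skipping the install.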
Step 14 — Publish the CLI to PyPI too
Same process as Lab 1.1: add a .github/workflows/release.yml that triggers on v* tags and publishes via Trusted Publishing. Once that is done, anyone in the world can run uvx tds-csv-YOURNAME my.csv.
Troubleshooting
pandoc "File not found" for template.tex
Your working directory in the Action matters. The Build PDF step runs from the repo root, so docs/template.tex is correct. If you moved things, update the path.
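A quick way to catch path problems before pushing is to check, from the repo root, that every file the workflow reads actually exists. The sketch below lists the three inputs named in the Step 11 workflow; the script itself and the helper name `missing_inputs` are assumptions, not part of the lab.

```python
from pathlib import Path

# Paths the deploy workflow reads, relative to the repository root.
REQUIRED = ["docs/index.md", "docs/template.tex", "site/package-lock.json"]


def missing_inputs(repo_root: str = ".") -> list[str]:
    """Return the required paths that are not regular files under repo_root."""
    root = Path(repo_root)
    return [p for p in REQUIRED if not (root / p).is_file()]


if __name__ == "__main__":
    missing = missing_inputs()
    print("all workflow inputs present" if not missing else f"missing: {missing}")
```

Run it from the repository root, the same directory the Build PDF step uses.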
LaTeX error about missing packages
The texlive-fonts-recommended and texlive-latex-extra packages cover most needs. If your template uses something more unusual, install additional packages:
sudo apt-get install -y texlive-science texlive-pictures
Docusaurus build fails with "broken link"
Docusaurus is strict about broken links. Either fix the link or set onBrokenLinks: 'warn' in docusaurus.config.ts.
Site renders at wrong URL (404s on CSS)
Your baseUrl is wrong. For a project site at user.github.io/repo/, baseUrl must be '/repo/' — with the trailing slash.
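The rule is easy to encode, which helps if you generate the value in a script. Here is a minimal sketch of the '/repo/' form described above; both function names are hypothetical and nothing here calls Docusaurus itself.

```python
def pages_base_url(repo_name: str) -> str:
    """Build the baseUrl for a project site: '/<repo>/' with both slashes."""
    return "/" + repo_name.strip("/") + "/"


def is_valid_base_url(base_url: str) -> bool:
    """True if base_url has the leading and trailing slash a project site needs."""
    return base_url.startswith("/") and base_url.endswith("/")


if __name__ == "__main__":
    print(pages_base_url("tds-csv-YOURNAME"))
    print(is_valid_base_url("/tds-csv-YOURNAME"))  # trailing slash missing
```

The deploy workflow builds the same shape with `BASE_URL: /${{ github.event.repository.name }}/`, so the value only goes wrong if you hard-code it or rename the repository without updating the fallback in docusaurus.config.ts.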
What You've Learned
- Turning UV-managed code into an installable CLI via [project.scripts].
- Using uvx to run tools in ephemeral environments.
- Authoring documentation in Markdown and rendering it to a professional PDF with pandoc and a custom LaTeX template.
- Hosting a Docusaurus site on GitHub Pages with Actions.
- Combining two build outputs (site + PDF) in a single deploy pipeline.
Write a Blog Post
- Compare pandoc with just writing .tex by hand: pros and cons.
- Explain the Docusaurus baseUrl gotcha.
- Show off your deployed URL!
Next Lab
Lab 1.3 — Bash automation: daily project summary