Visual Intelligence
Lane Detection Studio // Instance Segmentation
Lane Detection Studio implements the LaneNet architecture from the IEEE IV 2018 paper "Towards End-to-End Lane Detection: an Instance Segmentation Approach." Unlike classical methods relying on hand-crafted edge filters or polynomial fitting, this system treats lane detection as a pixel-wise instance segmentation problem — allowing it to generalise across lighting conditions, occlusion, road markings, and complex multi-lane scenarios.
The system processes a single RGB image through a shared ENet encoder and produces two simultaneous outputs: a binary segmentation mask identifying all lane pixels, and an instance embedding map in which pixels of the same lane lie close together, rendered so that each lane appears in its own colour. Both are returned as base64-encoded images by a Flask REST API that serialises GPU inference with a threading lock.
Training is performed on the TuSimple benchmark dataset using a combined multi-task loss: Focal Loss for binary detection and Discriminative Loss for instance separation. Pre-trained weights are loaded from log/best_model.pth.
Why Instance Segmentation?
Binary segmentation alone cannot distinguish between lanes — it only tells you where a lane pixel is, not which lane it belongs to. Instance segmentation goes further: each lane is assigned a unique identity in embedding space, enabling the system to count lanes, track lane changes, and support downstream path planning in autonomous systems.
- Binary segmentation alone cannot differentiate lane 1 from lane 2
- Instance embeddings cluster same-lane pixels and push apart different lanes
- No fixed lane count or label ordering required: the discriminative loss only needs to know which pixels share a lane, so it works for any number of lanes and any assignment of lane ids
- Richer output than bounding boxes: pixel-level lane boundaries
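To make the clustering idea concrete, here is a minimal NumPy sketch that turns lane-pixel embeddings into lane ids and counts the lanes. It uses a greedy threshold clustering as a simplified stand-in for the mean-shift clustering described in the LaneNet paper [1]; the function name `cluster_lanes` and the `bandwidth` value are illustrative assumptions, not this project's code.

```python
import numpy as np

def cluster_lanes(embedding, lane_mask, bandwidth=1.5):
    """Assign a lane id (1..K) to every lane pixel; background stays 0.

    embedding: (D, H, W) instance embeddings; lane_mask: (H, W) bool from the binary branch.
    Greedy threshold clustering (simplified stand-in for mean shift): take the first
    unassigned pixel as a seed and claim every unassigned pixel within `bandwidth` of it.
    """
    labels = np.zeros(lane_mask.shape, dtype=np.int32)
    coords = np.argwhere(lane_mask)              # (N, 2) row/col of lane pixels, row-major order
    vecs = embedding[:, lane_mask].T             # (N, D) embeddings in the same order
    unassigned = np.ones(len(vecs), dtype=bool)
    lane_count = 0
    while unassigned.any():
        lane_count += 1
        seed = vecs[np.flatnonzero(unassigned)[0]]
        members = unassigned & (np.linalg.norm(vecs - seed, axis=1) < bandwidth)
        labels[coords[members, 0], coords[members, 1]] = lane_count
        unassigned &= ~members                   # the seed itself is always a member
    return labels, lane_count
```

Because the discriminative loss pulls same-lane embeddings within a small radius and pushes lane centres apart, a single distance threshold is usually enough to separate lanes in this toy form; mean shift makes the seeds more robust to outliers.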
Core Capabilities
- Photo mode: single-image lane detection with overlay comparison
- Video mode: full video processing with frame-by-frame inference
- Live mode: real-time webcam detection with FPS display
- Multiple backbone options: ENet (fast), U-Net (balanced), DeepLabv3+ (accurate)
- REST API endpoint at POST /predict for programmatic integration
- Docker-ready deployment for cloud and edge environments
The Blueprint
Shared Encoder // Dual Decoder Architecture
LaneNet's core idea is parameter sharing. Rather than running two separate models for the binary and instance tasks, a single encoder extracts features once. Two lightweight decoders then specialise independently: one for semantic lane classification, one for instance-level embedding. Because the encoder runs only once per image, the second task costs one extra decoder rather than a second full network.
The input is an RGB image resized to 512 × 256 pixels, normalised using ImageNet statistics (mean [0.485, 0.456, 0.406], std [0.229, 0.224, 0.225]). The binary decoder outputs a 2-channel logit map; the instance decoder outputs a 3-channel embedding space visualised as an RGB colour map.
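The 3-channel embedding has no fixed value range, so it must be rescaled before it can be displayed. A minimal sketch, assuming per-channel min-max scaling (one simple option, not confirmed as this project's exact method), maps each embedding dimension to one colour channel. Pixels with similar embeddings, and therefore the same lane, end up with similar colours. The function name is hypothetical.

```python
import numpy as np

def embedding_to_rgb(embedding):
    """Map a (3, H, W) float embedding to an (H, W, 3) uint8 image.

    Each embedding dimension is min-max scaled to [0, 255] independently
    (assumed visualisation scheme).
    """
    emb = embedding.astype(np.float64)
    lo = emb.min(axis=(1, 2), keepdims=True)
    hi = emb.max(axis=(1, 2), keepdims=True)
    scaled = (emb - lo) / np.maximum(hi - lo, 1e-8)   # guard against flat channels
    return (scaled * 255).round().astype(np.uint8).transpose(1, 2, 0)
```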
| Component | Details |
|---|---|
| Encoder | Shared ENet (or U-Net / DeepLabv3+ backbone) |
| Binary Decoder | 2-channel output — lane vs. background |
| Instance Decoder | 3-channel embedding — unique colour per lane |
| Loss (Binary) | Focal Loss — handles class imbalance (<5% lane pixels) |
| Loss (Instance) | Discriminative Loss — variance + distance + regularisation |
| Input Size | 512 × 256 px (resized from original resolution) |
| Output | Binary mask (2ch) + Instance RGB map (3ch) |
Supported Backbones
ENet — Default
Designed for real-time semantic segmentation on resource-constrained hardware. Uses InitialBlock with parallel max-pool and convolution, followed by BottleneckModules with dilated and asymmetric convolutions. PReLU activations and BatchNorm2d throughout. At ~3M parameters and 25ms inference, it is the recommended backbone for deployment.
3M Params · 25ms · Real-time
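The parallel structure of ENet's InitialBlock can be sketched in NumPy. Following the ENet paper [2], a 3×3 stride-2 convolution with 13 filters runs alongside a 2×2 max-pool of the 3 input channels, and the two results are concatenated into 16 channels at half resolution. BatchNorm and PReLU are omitted and the weights are placeholders; this illustrates the data flow, not this project's module, and `initial_block` is a hypothetical name.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def initial_block(x, weight):
    """x: (3, H, W) image with even H and W; weight: (13, 3, 3, 3) conv filters.

    Returns (16, H/2, W/2): 13 convolution channels followed by 3 max-pooled channels.
    """
    c, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))                         # zero padding of 1
    patches = sliding_window_view(padded, (3, 3), axis=(1, 2))[:, ::2, ::2]  # (3, H/2, W/2, 3, 3)
    conv = np.einsum("chwij,ocij->ohw", patches, weight)                 # 3x3 conv, stride 2
    pooled = x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))          # 2x2 max-pool, stride 2
    return np.concatenate([conv, pooled], axis=0)
```

For a 512×256 input (array shape `(3, 256, 512)`) the block returns `(16, 128, 256)`, so the very first layer already halves the spatial resolution, which is a large part of why ENet is cheap.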
U-Net — Balanced
Classic encoder-decoder with skip connections. 5-level hierarchy with DoubleConv blocks (Conv2d + BatchNorm + ReLU, repeated twice). Progressive channel expansion: 64→128→256→512→1024. Skip connections concatenate encoder feature maps to decoder, preserving fine-grained spatial detail lost during downsampling.
30M Params · 45ms · Balanced
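The channel arithmetic behind the skip connections can be traced with a short sketch. It assumes the standard U-Net decoder, in which each up-sampling step halves the channel count, the matching encoder map is concatenated, and a DoubleConv reduces the result back to the encoder width. The function name is hypothetical.

```python
def unet_channel_plan(base=64, levels=5):
    """Return encoder widths and, per decoder stage, (upsampled, skip, concatenated, output) channels."""
    encoder = [base * 2 ** i for i in range(levels)]   # 64, 128, 256, 512, 1024
    decoder = []
    channels = encoder[-1]
    for skip in reversed(encoder[:-1]):
        upsampled = channels // 2                        # transposed conv halves channels
        concatenated = upsampled + skip                  # skip connection: concat encoder map
        channels = skip                                  # DoubleConv maps back to skip width
        decoder.append((upsampled, skip, concatenated, channels))
    return encoder, decoder

encoder, decoder = unet_channel_plan()
print(encoder)     # [64, 128, 256, 512, 1024]
for stage in decoder:
    print(stage)   # (512, 512, 1024, 512), then (256, 256, 512, 256), ...
```

The final 64-channel map would then be projected by 1×1 convolutions to the 2 binary and 3 embedding channels that LaneNet's heads require.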
DeepLabv3+ — Premium
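ResNet-101 backbone with Atrous Spatial Pyramid Pooling (ASPP). ASPP runs five parallel branches over the backbone features: a 1×1 convolution (dilation rate 1), three 3×3 atrous convolutions with dilation rates 6, 12 and 18, and image-level global average pooling. Together they capture multi-scale context without further downsampling. In the decoder, low-level backbone features are reduced to 48 channels and concatenated with the upsampled ASPP output, which aids boundary precision. Best accuracy for complex scenes.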
50M Params · 80ms · Max Accuracy
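Dilation widens context without adding weights. A k×k kernel with dilation rate r spans k + (k - 1)(r - 1) positions per side while still having only k×k weights. The sketch below applies that formula to the ASPP convolution branches; the helper name is illustrative.

```python
def effective_kernel(k, rate):
    """Side length, in feature-map positions, covered by a k x k kernel with the given dilation."""
    return k + (k - 1) * (rate - 1)

# ASPP convolution branches: the rate-1 branch is a 1x1 conv, the others are 3x3 atrous convs
branches = [(1, 1), (3, 6), (3, 12), (3, 18)]
print([effective_kernel(k, r) for k, r in branches])   # [1, 13, 25, 37]
```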
Training the Model
Loss Functions · Dataset · Optimisation Strategy
The model is trained end-to-end with a combined multi-task loss. The binary term carries a weight of 10, against 0.3 for the variance term and 1.0 for the distance term of the discriminative loss, reflecting the priority of correctly identifying lane pixels before separating them into lanes. The full formula is:
Total Loss = 10 × FocalLoss + 0.3 × L_var + 1.0 × L_dist
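The combined objective can be sketched in NumPy. The focal loss follows Lin et al. [4] with focusing parameter γ = 2 and no α balancing term. The discriminative loss follows De Brabandere et al. [3] with margins δ_v = 0.5 and δ_d = 3.0, and its regularisation term is left out to match the formula above. These hyperparameters and the function names are assumptions; the project's PyTorch code may differ.

```python
import numpy as np

def focal_loss(logits, target, gamma=2.0):
    """Mean focal loss over pixels. logits: (2, H, W); target: (H, W) with values in {0, 1}."""
    shifted = logits - logits.max(axis=0, keepdims=True)            # numerically stable softmax
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
    log_pt = np.take_along_axis(log_probs, target[None].astype(np.int64), axis=0)[0]
    pt = np.exp(log_pt)
    return float(np.mean(-((1.0 - pt) ** gamma) * log_pt))         # down-weights easy pixels

def discriminative_loss(embedding, instance, delta_v=0.5, delta_d=3.0):
    """Return (L_var, L_dist). embedding: (D, H, W); instance: (H, W), 0 = background, k > 0 = lane k."""
    ids = [k for k in np.unique(instance) if k != 0]
    if not ids:
        return 0.0, 0.0
    means, l_var = [], 0.0
    for k in ids:
        pix = embedding[:, instance == k]                            # (D, N_k) embeddings of lane k
        mu = pix.mean(axis=1)
        means.append(mu)
        dist = np.linalg.norm(pix - mu[:, None], axis=0)
        l_var += np.mean(np.maximum(dist - delta_v, 0.0) ** 2)       # pull pixels towards lane mean
    l_var /= len(ids)
    l_dist = 0.0
    if len(ids) > 1:
        m = np.stack(means)                                          # (C, D) lane means
        pair = np.linalg.norm(m[:, None] - m[None], axis=2)          # (C, C) mean-to-mean distances
        off_diag = ~np.eye(len(ids), dtype=bool)
        hinge = np.maximum(2 * delta_d - pair[off_diag], 0.0) ** 2   # push lane means apart
        l_dist = hinge.sum() / (len(ids) * (len(ids) - 1))
    return float(l_var), float(l_dist)

def total_loss(logits, binary_target, embedding, instance):
    l_var, l_dist = discriminative_loss(embedding, instance)
    return 10.0 * focal_loss(logits, binary_target) + 0.3 * l_var + 1.0 * l_dist
```

Both discriminative terms are hinged, so once every pixel is within δ_v of its lane mean and every pair of lane means is more than 2δ_d apart, the instance loss is exactly zero and only the binary term keeps training.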
Dataset — TuSimple Benchmark
TuSimple provides highway driving sequences at 1280×720 resolution with JSON-format lane annotations. The tusimple_transform.py script converts these to pixel-level binary masks (lane vs. background) and instance masks (unique colour per lane). Images are resized to 512×256 for training. Augmentation applies ColorJitter (brightness, contrast, saturation, hue each ±0.1) to training samples only, improving robustness to lighting variation.
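A minimal sketch of the conversion, assuming the standard TuSimple label format: one JSON object per line with `lanes` (x-coordinates per lane, negative where the lane has no point), `h_samples` (shared y-coordinates) and `raw_file`. The function name, the 5-pixel line width and the grey-level step of 50 per lane are illustrative assumptions; `tusimple_transform.py` itself may draw with OpenCV and use different values.

```python
import json
import numpy as np

def tusimple_to_masks(label_line, height=720, width=1280, thickness=5, id_step=50):
    """Rasterise one TuSimple label line into a binary mask and an instance mask (<= 5 lanes assumed)."""
    record = json.loads(label_line)
    ys = record["h_samples"]
    binary = np.zeros((height, width), dtype=np.uint8)
    instance = np.zeros((height, width), dtype=np.uint8)
    lane_count = 0
    for xs in record["lanes"]:
        points = [(x, y) for x, y in zip(xs, ys) if x >= 0]    # drop "no point" markers
        if len(points) < 2:
            continue
        lane_count += 1
        value = lane_count * id_step                            # distinct grey level per lane
        for (x0, y0), (x1, y1) in zip(points[:-1], points[1:]):
            steps = int(max(abs(x1 - x0), abs(y1 - y0))) + 1    # dense sampling along the segment
            cols = np.linspace(x0, x1, steps).round().astype(int)
            rows = np.clip(np.linspace(y0, y1, steps).round().astype(int), 0, height - 1)
            for dx in range(-(thickness // 2), thickness // 2 + 1):   # widen the line horizontally
                c = np.clip(cols + dx, 0, width - 1)
                binary[rows, c] = 1
                instance[rows, c] = value
    return binary, instance
```

When the masks are then resized to 512×256, nearest-neighbour interpolation keeps the lane ids intact; bilinear resizing would blend neighbouring ids into values that belong to no lane.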
| Parameter | Value |
|---|---|
| Batch Size | 32 (default) |
| Optimiser | Adam or SGD with step/exponential LR decay |
| Learning Rate | 0.001 (default) |
| Weight Decay | 1e-4 |
| Epochs | Configurable (100 recommended) |
| Validation Split | 10% |
| Input Normalisation | ImageNet mean/std |
| Device | CUDA GPU (CPU fallback) |
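The two decay options in the table can be written out explicitly. This sketch mirrors the formulas of PyTorch's `StepLR` and `ExponentialLR` schedulers when stepped once per epoch; the `step_size` of 30 and the decay factors are hypothetical values, not the project's settings.

```python
def step_lr(base_lr, epoch, step_size=30, gamma=0.1):
    """Learning rate after `epoch` epochs with step decay (same rule as StepLR)."""
    return base_lr * gamma ** (epoch // step_size)

def exponential_lr(base_lr, epoch, gamma=0.95):
    """Learning rate after `epoch` epochs with exponential decay (same rule as ExponentialLR)."""
    return base_lr * gamma ** epoch

for epoch in (0, 29, 30, 60, 99):
    print(epoch, step_lr(0.001, epoch), exponential_lr(0.001, epoch))
```

With the default learning rate of 0.001 over 100 epochs, the step schedule holds the rate for 30 epochs and then drops it tenfold each time (about 1e-4, 1e-5, 1e-6). The exponential schedule instead shrinks it smoothly every epoch.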
Inference Pipeline
From Browser Upload to JSON Response
Every prediction request passes through a six-stage pipeline. The threading lock acquired at stage 2 ensures that, within one server process, only one forward pass runs at a time, which prevents out-of-memory errors on a single GPU when requests arrive concurrently.
1
Upload
User submits a multipart form to POST /predict. Flask reads the file from request.files["image"] and opens it via PIL.
2
Acquire Inference Lock
INFERENCE_LOCK (threading.Lock) is acquired. Any concurrent request blocks here until the current forward pass completes and the lock is released.
3
Preprocess
Image resized to 512×256. ToTensor() converts PIL to float32 in [0,1]. Normalize(mean, std) applies ImageNet statistics. Tensor moved to CUDA if available, else CPU.
4
Forward Pass
LaneNet's shared encoder processes the tensor. The binary decoder produces 2-channel logits and the instance decoder produces 3-channel embeddings, using the weights loaded from log/best_model.pth.
5
Post-process
Argmax over binary logits yields the lane mask. Instance embeddings are colour-mapped: pixels belonging to the same lane cluster share a colour, different lanes get different colours.
6
Encode and Respond
Both output images are JPEG-compressed and base64-encoded. Returned as {"binary_image": "...", "instance_image": "..."}. Frontend decodes and renders inline without a page reload.
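Stages 3 and 5 reduce to a few array operations. The sketch below reproduces them in NumPy for an image already resized to 512×256. It applies ImageNet normalisation into a CHW float32 batch, matching what `ToTensor()` followed by `Normalize` produces for a uint8 RGB image, then takes the argmax over the 2-channel binary logits to get a 0/255 mask. Resizing, the model call and JPEG encoding are left out, and the function names are illustrative.

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(rgb_uint8):
    """(256, 512, 3) uint8 RGB image -> (1, 3, 256, 512) float32 batch, ImageNet-normalised."""
    x = rgb_uint8.astype(np.float32) / 255.0      # like ToTensor(): scale to [0, 1]
    x = (x - IMAGENET_MEAN) / IMAGENET_STD        # like Normalize(mean, std), per channel
    return x.transpose(2, 0, 1)[None]             # HWC -> CHW, then add a batch dimension

def binary_mask(logits):
    """(2, H, W) binary logits -> (H, W) uint8 mask: 255 for lane pixels, 0 for background."""
    return (np.argmax(logits, axis=0) == 1).astype(np.uint8) * 255
```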
Deployment Options
Local · Docker · AWS · Kubernetes
The Flask application exposes a single REST endpoint at POST /predict. It accepts a multipart image upload and returns a JSON payload with two base64-encoded output images. Any of the four deployment strategies below will serve this endpoint.
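A client-side sketch using only the Python standard library. It posts an image as multipart/form-data under the field name `image` (the field the server reads via `request.files["image"]`) and decodes the two base64 fields of the JSON response. The URL, the helper names and the assumption that the fields hold plain base64 strings (no `data:` URI prefix) are illustrative, not confirmed by the project.

```python
import base64
import json
import os
import urllib.request
import uuid

def build_multipart(field, filename, data, content_type="image/jpeg"):
    """Encode a single file as a multipart/form-data body; returns (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode()
    body = head + data + f"\r\n--{boundary}--\r\n".encode()
    return body, f"multipart/form-data; boundary={boundary}"

def decode_response(payload):
    """Return (binary_jpeg_bytes, instance_jpeg_bytes) from the /predict JSON payload."""
    obj = json.loads(payload)
    return base64.b64decode(obj["binary_image"]), base64.b64decode(obj["instance_image"])

def post_image(path, url="http://localhost:5000/predict"):
    with open(path, "rb") as f:
        body, ctype = build_multipart("image", os.path.basename(path), f.read())
    req = urllib.request.Request(url, data=body, headers={"Content-Type": ctype}, method="POST")
    with urllib.request.urlopen(req) as resp:
        return decode_response(resp.read())
```

With the local server from Option 1 running, `binary_jpg, instance_jpg = post_image("road.jpg")` returns two JPEG byte strings that can be written straight to disk.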
Option 1 — Local Development
pip install -r requirements.txt
python app.py
# Access at http://localhost:5000
Option 2 — Docker
docker build -t lane-detection:latest .
docker run -p 5000:5000 lane-detection:latest
# GPU support (requires the NVIDIA Container Toolkit on the host):
docker run --gpus all -p 5000:5000 lane-detection:latest
Option 3 — AWS EC2
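Recommended instance: g4dn.xlarge (one NVIDIA T4 GPU with 16 GB of memory). Serve the app with Gunicorn rather than Flask's development server. Each Gunicorn worker is a separate process with its own copy of the model and its own INFERENCE_LOCK, which only serialises threads inside that process. With -w 4, up to four forward passes can therefore run on the GPU at once, so lower the worker count if GPU memory runs out: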
git clone https://github.com/Siddh-456/Lane-Detection.git
cd Lane-Detection && pip install -r requirements.txt
pip install gunicorn
gunicorn -w 4 -b 0.0.0.0:5000 app:app
Option 4 — Kubernetes
kubectl apply -f k8s/deployment.yaml
kubectl expose deployment lane-detection \
--type=LoadBalancer --port=80 --target-port=5000
References
Papers this project builds upon
[1]
Towards End-to-End Lane Detection: an Instance Segmentation Approach
Neven, De Brabandere, Georgoulis, Proesmans, Van Gool · IEEE IV 2018 · arXiv:1802.05591
[2]
ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation
Paszke, Chaurasia, Kim, Culurciello · arXiv preprint 2016 · arXiv:1606.02147
[3]
Semantic Instance Segmentation with a Discriminative Loss Function
De Brabandere, Neven, Van Gool · arXiv preprint 2017 · arXiv:1708.02551
[4]
Focal Loss for Dense Object Detection
Lin, Goyal, Girshick, He, Dollár · ICCV 2017 · arXiv:1708.02002
[5]
Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation (DeepLabv3+)
Chen, Zhu, Papandreou, Schroff, Adam · ECCV 2018 · arXiv:1802.02611
Tech Stack: PyTorch 1.2+ · Torchvision · Flask · OpenCV · Pillow · NumPy · Pandas · scikit-image · Base64 · threading.Lock · Docker · TuSimple Dataset