see lane

Deep Learning · Autonomous Driving · Instance Segmentation

Experience the future of computer vision. A high-performance lane detection pipeline delivering surgical precision in real-time environments.

PyTorch 1.2+ Flask API ENet Backbone TuSimple Dataset Docker Ready

Visual Intelligence

Lane Detection Studio // Instance Segmentation

Lane Detection Studio implements the LaneNet architecture from the IEEE IV 2018 paper "Towards End-to-End Lane Detection: an Instance Segmentation Approach." Unlike classical methods relying on hand-crafted edge filters or polynomial fitting, this system treats lane detection as a pixel-wise instance segmentation problem — allowing it to generalise across lighting conditions, occlusion, road markings, and complex multi-lane scenarios.

The system processes a single RGB image through a shared ENet encoder and produces two simultaneous outputs: a binary segmentation mask identifying all lane pixels, and an instance embedding map that colours each individual lane uniquely. Both are returned as base64-encoded images via a Flask REST API protected by a threading lock to prevent concurrent GPU conflicts.

Training is performed on the TuSimple benchmark dataset using a combined multi-task loss: Focal Loss for binary detection and Discriminative Loss for instance separation. Pre-trained weights are loaded from log/best_model.pth.

Why Instance Segmentation?

Binary segmentation alone cannot distinguish between lanes — it only tells you where a lane pixel is, not which lane it belongs to. Instance segmentation goes further: each lane is assigned a unique identity in embedding space, enabling the system to count lanes, track lane changes, and support downstream path planning in autonomous systems.

Binary segmentation alone cannot differentiate lane 1 from lane 2
Instance embeddings cluster same-lane pixels and push apart different lanes
No fixed lane classes or lane ordering required: the discriminative loss only needs to know which pixels belong to the same lane
Richer output than bounding boxes: pixel-perfect lane boundaries

Core Capabilities

Photo mode: single-image lane detection with overlay comparison
Video mode: full video processing with frame-by-frame inference
Live mode: real-time webcam detection with FPS display
Multiple backbone options: ENet (fast), U-Net (balanced), DeepLabv3+ (accurate)
REST API endpoint at /predict for programmatic integration
Docker-ready deployment for cloud and edge environments

The Blueprint

Shared Encoder // Dual Decoder Architecture

LaneNet's core insight is parameter sharing. Rather than running two separate models for the binary and instance tasks, a single encoder extracts features once. Two lightweight decoders then specialise independently: one for semantic lane classification, one for instance-level embedding. Because the encoder runs only once per image, inference costs much less than two full models would, while both outputs are still produced at full resolution.

The input is an RGB image resized to 512 × 256 pixels, normalised using ImageNet statistics (mean [0.485, 0.456, 0.406], std [0.229, 0.224, 0.225]). The binary decoder outputs a 2-channel logit map; the instance decoder outputs a 3-channel embedding space visualised as an RGB colour map.

Component	Details
Encoder	Shared ENet (or U-Net / DeepLabv3+ backbone)
Binary Decoder	2-channel output — lane vs. background
Instance Decoder	3-channel embedding — unique colour per lane
Loss (Binary)	Focal Loss — handles class imbalance (<5% lane pixels)
Loss (Instance)	Discriminative Loss — variance + distance + regularisation
Input Size	512 × 256 px (resized from original resolution)
Output	Binary mask (2ch) + Instance RGB map (3ch)
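The data flow in the table can be traced with a shape-only sketch. This is not the project's model code: random 1×1 projections stand in for the real ENet layers, and the names `encode`, `conv1x1`, `upsample`, the 16-channel feature width and the 8× downsampling factor are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, weight):
    """Apply a 1x1 convolution: x is (C_in, H, W), weight is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)

def encode(image, weight, stride=8):
    """Stand-in encoder: subsample spatially, project channels, apply ReLU."""
    return np.maximum(conv1x1(image[:, ::stride, ::stride], weight), 0.0)

def upsample(x, factor):
    """Nearest-neighbour upsampling back to the input resolution."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

# Hypothetical layer weights; a real backbone has many more layers.
w_enc = rng.standard_normal((16, 3))
w_bin = rng.standard_normal((2, 16))
w_inst = rng.standard_normal((3, 16))

image = rng.standard_normal((3, 256, 512))    # CHW: 512 wide, 256 high
features = encode(image, w_enc)               # computed once, shared by both heads
binary_logits = upsample(conv1x1(features, w_bin), 8)
instance_embedding = upsample(conv1x1(features, w_inst), 8)

print(features.shape, binary_logits.shape, instance_embedding.shape)
```

The point of the sketch is that `features` is computed a single time and consumed by both heads, which then return maps at the input resolution with 2 and 3 channels respectively.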

Supported Backbones

ENet — Default

Designed for real-time semantic segmentation on resource-constrained hardware. Uses InitialBlock with parallel max-pool and convolution, followed by BottleneckModules with dilated and asymmetric convolutions. PReLU activations and BatchNorm2d throughout. At ~3M parameters and 25ms inference, it is the recommended backbone for deployment.

3M Params · 25ms · Real-time

U-Net — Balanced

Classic encoder-decoder with skip connections. 5-level hierarchy with DoubleConv blocks (Conv2d + BatchNorm + ReLU, repeated twice). Progressive channel expansion: 64→128→256→512→1024. Skip connections concatenate encoder feature maps to decoder, preserving fine-grained spatial detail lost during downsampling.

30M Params · 45ms · Balanced

DeepLabv3+ — Premium

ResNet-101 backbone with Atrous Spatial Pyramid Pooling (ASPP). Five parallel branches (a 1×1 convolution, three 3×3 atrous convolutions with dilation rates 6, 12 and 18, and global average pooling) capture multi-scale context without further loss of spatial resolution. A shortcut connection from low-level features (48 channels) aids boundary precision. Best accuracy for complex scenes.

50M Params · 80ms · Max Accuracy

Training the Model

Loss Functions · Dataset · Optimisation Strategy

The model is trained end-to-end with a combined multi-task loss. The binary term carries a weight of 10, against 0.3 for the variance term and 1.0 for the distance term, reflecting the primacy of correctly identifying lane pixels before differentiating them. The full formula is:

Total Loss = 10 × FocalLoss + 0.3 × L_var + 1.0 × L_dist
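As a worked example of the weighting, the formula can be written as a small function. The three input values below are hypothetical per-batch losses, not measurements from the project, and the function name is an illustrative choice.

```python
def total_loss(focal, l_var, l_dist, w_binary=10.0, w_var=0.3, w_dist=1.0):
    """Weighted sum of the binary and instance loss terms."""
    return w_binary * focal + w_var * l_var + w_dist * l_dist

# Hypothetical per-batch loss values.
loss = total_loss(focal=0.02, l_var=0.10, l_dist=0.05)
print(round(loss, 4))
```

With these inputs the binary term contributes 0.2 of the total while the two instance terms contribute 0.08 together, which shows how strongly the weights favour the binary task even when its raw loss is the smallest of the three.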

Focal Loss — Binary Segmentation

FL(p_t) = −α · (1 − p_t)^γ · log(p_t), with γ = 2 and α = [0.25, 0.75]

Lanes occupy fewer than 5% of pixels in a typical road image, so under standard cross-entropy the loss is dominated by background and the network can score well by predicting background almost everywhere. Focal Loss counters this by scaling each pixel's loss by (1−p_t)^γ, which shrinks the contribution of easy, confidently correct pixels (mostly background). With γ=2, a pixel classified with p_t = 0.9 contributes 100× less than it would under cross-entropy. α=[0.25, 0.75] further rebalances the background and lane classes.
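The formula above can be sketched in NumPy as follows. This is a minimal illustration rather than the project's PyTorch implementation; it assumes that α is indexed as [background, lane] and that the loss is averaged over pixels, neither of which the text confirms.

```python
import numpy as np

def focal_loss(logits, target, gamma=2.0, alpha=(0.25, 0.75)):
    """Mean focal loss for 2-class logits.

    logits: (2, H, W) raw scores; target: (H, W) ints, 0 = background, 1 = lane.
    """
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=0, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
    # log p_t: log-probability assigned to the true class of each pixel.
    log_pt = np.take_along_axis(log_probs, target[None], axis=0)[0]
    pt = np.exp(log_pt)
    alpha_t = np.asarray(alpha)[target]   # assumed order: [background, lane]
    return float(np.mean(-alpha_t * (1.0 - pt) ** gamma * log_pt))

target = np.array([[0, 1]])                              # one background, one lane pixel
confident = np.array([[[4.0, -4.0]], [[-4.0, 4.0]]])     # both pixels clearly correct
uncertain = np.zeros((2, 1, 2))                          # p_t = 0.5 for both pixels
print(focal_loss(confident, target), focal_loss(uncertain, target))
```

The confident prediction yields a loss several orders of magnitude smaller than the uncertain one, which is the down-weighting of easy pixels that the (1−p_t)^γ factor is meant to provide.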

Discriminative Loss — Instance Segmentation

L = L_var + L_dist + L_reg, with δ_var = 0.5, δ_dist = 1.5, γ_reg = 0.001

Three terms work together. L_var (variance loss) penalises pixels of the same lane instance that lie more than δ_var from their mean embedding, pulling the cluster tight. L_dist (distance loss) penalises mean embeddings of different lanes that are too close, pushing clusters apart. L_reg (regularisation) keeps the mean embeddings from growing unboundedly. The loss needs to know which pixels belong to the same lane, which the instance masks provide, but it does not need a fixed lane class or ordering: any number of lanes can be handled with the same embedding output.
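The three terms can be sketched in NumPy as below. The sketch follows the hinge formulation of reference [3], including its 2·δ_dist margin between lane means; whether the project uses exactly this margin is an assumption. It takes a per-pixel lane-id mask (such as the instance masks the dataset script produces) to decide which pixels belong together, and returns the unweighted terms separately.

```python
import numpy as np

def discriminative_loss(embedding, instance_ids, delta_var=0.5, delta_dist=1.5):
    """Return (L_var, L_dist, L_reg) for one image.

    embedding: (D, H, W) floats; instance_ids: (H, W) ints, 0 = background.
    """
    pixels = embedding.reshape(embedding.shape[0], -1).T       # (H*W, D)
    ids = instance_ids.ravel()
    lanes = [c for c in np.unique(ids) if c != 0]
    if not lanes:
        return 0.0, 0.0, 0.0
    means = np.stack([pixels[ids == c].mean(axis=0) for c in lanes])

    # Pull term: hinge on the distance of each pixel to its own lane mean.
    l_var = 0.0
    for mean, c in zip(means, lanes):
        dist = np.linalg.norm(pixels[ids == c] - mean, axis=1)
        l_var += np.mean(np.clip(dist - delta_var, 0.0, None) ** 2)
    l_var /= len(lanes)

    # Push term: hinge on the distance between every pair of lane means.
    l_dist = 0.0
    if len(lanes) > 1:
        gaps = np.linalg.norm(means[:, None] - means[None], axis=2)
        off_diag = ~np.eye(len(lanes), dtype=bool)
        l_dist = np.mean(np.clip(2 * delta_dist - gaps[off_diag], 0.0, None) ** 2)

    # Regulariser: keep lane means close to the origin.
    l_reg = np.mean(np.linalg.norm(means, axis=1))
    return float(l_var), float(l_dist), float(l_reg)

ids = np.array([[1, 1, 2, 2]])              # two lanes, two pixels each
separated = np.zeros((3, 1, 4))
separated[0, 0, 2:] = 5.0                   # lane 2 sits 5 units away from lane 1
collapsed = np.zeros((3, 1, 4))             # both lanes share one embedding
print(discriminative_loss(separated, ids))
print(discriminative_loss(collapsed, ids))
```

In the separated case only the regulariser is non-zero, because the clusters are tight and further apart than the margin. In the collapsed case the distance term is the one that fires.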

Dataset — TuSimple Benchmark

TuSimple provides highway driving sequences at 1280×720 resolution with JSON-format lane annotations. The tusimple_transform.py script converts these to pixel-level binary masks (lane vs. background) and instance masks (unique colour per lane). Images are resized to 512×256 for training. Augmentation applies ColorJitter (brightness, contrast, saturation, hue each ±0.1) to training samples only, improving robustness to lighting variation.
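To make the conversion concrete, here is a minimal sketch of turning one TuSimple label line into a binary mask and an instance mask. It is not the contents of tusimple_transform.py: the function name is hypothetical, instance ids are stored as integers rather than colours, and only the annotated points are marked, whereas a real converter would typically draw connected lines with some thickness.

```python
import json
import numpy as np

def label_to_masks(label_line, height=720, width=1280):
    """Turn one TuSimple label line into (binary_mask, instance_mask)."""
    label = json.loads(label_line)
    binary = np.zeros((height, width), dtype=np.uint8)
    instance = np.zeros((height, width), dtype=np.uint8)
    lane_id = 0
    for xs in label["lanes"]:
        # A negative x (-2 in TuSimple) marks rows where the lane is not annotated.
        points = [(x, y) for x, y in zip(xs, label["h_samples"]) if x >= 0]
        if not points:
            continue
        lane_id += 1
        for x, y in points:
            binary[y, x] = 1
            instance[y, x] = lane_id
    return binary, instance

# A shortened, made-up label line in the TuSimple layout.
line = ('{"lanes": [[-2, 100, 110], [-2, -2, -2], [500, 490, 480]], '
        '"h_samples": [240, 250, 260], "raw_file": "example.jpg"}')
binary, instance = label_to_masks(line)
print(int(binary.sum()), sorted(set(instance.ravel().tolist())))
```

Each entry of `lanes` lists x-coordinates sampled at the shared `h_samples` rows, so the second lane here, which has no valid points, is skipped and does not consume an instance id.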

Parameter	Value
Batch Size	32 (default)
Optimiser	Adam or SGD with step/exponential LR decay
Learning Rate	0.001 (default)
Weight Decay	1e-4
Epochs	Configurable (100 recommended)
Validation Split	10%
Input Normalisation	ImageNet mean/std
Device	CUDA GPU (CPU fallback)
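The two decay schedules named in the table follow simple closed forms, sketched here in plain Python. The base rate of 0.001 comes from the table; the `step_size` and `gamma` values are hypothetical, since the text does not state them.

```python
def step_lr(epoch, base_lr=0.001, step_size=30, gamma=0.1):
    """Learning rate after `epoch` epochs of step decay."""
    return base_lr * gamma ** (epoch // step_size)

def exponential_lr(epoch, base_lr=0.001, gamma=0.95):
    """Learning rate after `epoch` epochs of exponential decay."""
    return base_lr * gamma ** epoch

for epoch in (0, 29, 30, 60, 99):
    print(epoch, step_lr(epoch), exponential_lr(epoch))
```

Step decay holds the rate constant and drops it by a fixed factor every `step_size` epochs, while exponential decay shrinks it a little after every epoch.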

Inference Pipeline

From Browser Upload to JSON Response

Every prediction request travels through a six-stage pipeline. A threading lock at stage 2 ensures that, within a single server process, only one forward pass runs at a time, preventing out-of-memory errors on single-GPU deployments under concurrent load.

Upload

User submits a multipart form to POST /predict. Flask reads the file from request.files["image"] and opens it via PIL.

Acquire Inference Lock

INFERENCE_LOCK (threading.Lock) is acquired. Any concurrent request blocks here until the current forward pass completes and the lock is released.
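The effect of the lock can be demonstrated without a model. In this sketch `fake_forward_pass` is a hypothetical placeholder for the real inference call, and the counters exist only to observe how many passes overlap.

```python
import threading
import time

INFERENCE_LOCK = threading.Lock()
active = 0        # forward passes currently running
max_active = 0    # highest concurrency observed

def fake_forward_pass():
    """Placeholder for the model call; only tracks concurrency."""
    global active, max_active
    active += 1
    max_active = max(max_active, active)
    time.sleep(0.01)
    active -= 1

def handle_request():
    # Blocks until no other thread in this process holds the lock.
    with INFERENCE_LOCK:
        fake_forward_pass()

threads = [threading.Thread(target=handle_request) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(max_active)
```

Eight simulated requests arrive together, yet the observed concurrency never exceeds one. A `threading.Lock` only coordinates threads inside one process, so separate server processes would each hold their own lock.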

Preprocess

Image resized to 512×256. ToTensor() converts PIL to float32 in [0,1]. Normalize(mean, std) applies ImageNet statistics. Tensor moved to CUDA if available, else CPU.
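The value-level effect of ToTensor() and Normalize() can be reproduced in NumPy. The sketch below assumes the image has already been resized and skips the device transfer; the function name is illustrative. When using torchvision itself, note that Resize takes (height, width), so a 512×256 (width × height) target is written as Resize((256, 512)).

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(image_hwc_uint8):
    """HWC uint8 RGB image -> normalised float32 batch of shape (1, 3, H, W)."""
    scaled = image_hwc_uint8.astype(np.float32) / 255.0    # same scaling as ToTensor
    normalised = (scaled - IMAGENET_MEAN) / IMAGENET_STD    # per-channel Normalize
    return normalised.transpose(2, 0, 1)[None]              # HWC -> NCHW

image = np.full((256, 512, 3), 255, dtype=np.uint8)   # stand-in for a resized upload
batch = preprocess(image)
print(batch.shape, batch.dtype)
```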

Forward Pass

LaneNet's shared encoder processes the tensor. Binary decoder produces 2-channel logits. Instance decoder produces 3-channel embeddings. Weights loaded from log/best_model.pth.

Post-process

Softmax followed by argmax over the binary logits yields the lane mask. The instance embeddings are passed through a sigmoid and rendered as a 3-channel colour map, so pixels belonging to the same lane cluster share a colour and different lanes get different colours.
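A minimal NumPy sketch of this stage follows. It assumes the embedding is squashed with a sigmoid before display, which is one plausible way to map it into displayable colours; the function name and the 0/255 mask encoding are also illustrative choices.

```python
import numpy as np

def postprocess(binary_logits, instance_embedding):
    """Turn raw network outputs into two uint8 images.

    binary_logits: (2, H, W); instance_embedding: (3, H, W).
    """
    # Argmax over the class axis; applying softmax first would not change the winner.
    lane_mask = binary_logits.argmax(axis=0).astype(np.uint8) * 255        # (H, W)
    # Sigmoid squashes each embedding channel into [0, 1] for display.
    squashed = 1.0 / (1.0 + np.exp(-instance_embedding))
    instance_rgb = (squashed * 255).astype(np.uint8).transpose(1, 2, 0)    # (H, W, 3)
    return lane_mask, instance_rgb

logits = np.array([[[0.0, 3.0]], [[1.0, -1.0]]])   # pixel 0 favours lane, pixel 1 background
embedding = np.zeros((3, 1, 2))
mask, rgb = postprocess(logits, embedding)
print(mask.tolist(), rgb.shape)
```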

Encode and Respond

Both output images are JPEG-compressed and base64-encoded. Returned as {"binary_image": "...", "instance_image": "..."}. Frontend decodes and renders inline without a page reload.
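The payload format can be exercised with the standard library alone. In this sketch the two helper names are hypothetical and the image bytes are placeholders; the JPEG compression step itself is left out.

```python
import base64
import json

def build_response(binary_jpeg: bytes, instance_jpeg: bytes) -> str:
    """Pack two already-compressed images into the JSON payload."""
    return json.dumps({
        "binary_image": base64.b64encode(binary_jpeg).decode("ascii"),
        "instance_image": base64.b64encode(instance_jpeg).decode("ascii"),
    })

def parse_response(payload: str) -> dict:
    """Client side: recover the raw image bytes from the payload."""
    return {key: base64.b64decode(value) for key, value in json.loads(payload).items()}

# Placeholder bytes; real JPEG data begins with the same FF D8 marker.
fake_binary = b"\xff\xd8binary-mask"
fake_instance = b"\xff\xd8instance-map"
decoded = parse_response(build_response(fake_binary, fake_instance))
print(decoded["binary_image"] == fake_binary)
```

Base64 encodes every 3 bytes as 4 characters, so each image grows by roughly a third in transit. That is the price of embedding binary data directly in JSON.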

Benchmarks and Metrics

Measured on NVIDIA V100 GPU · 640×480 Input

Backbone	Params	Inference	Binary Acc.	Instance Acc.
ENet (default)	3M	25ms	92%	89%
U-Net	30M	45ms	95%	92%
DeepLabv3+	50M	80ms	97%	94%

Evaluation Metrics

Two complementary metrics are used. Dice is preferred for lane tasks due to class imbalance; IoU is the standard segmentation benchmark.

Dice Coefficient (F1-Score)

Dice = 2·|A∩B| / (|A| + |B|)

Range [0,1]. Measures overlap between predicted and ground-truth lane masks. Well suited to class imbalance because true-negative background pixels do not enter the formula, so a large background cannot inflate the score. Continuous predictions are binarised at a threshold of 0.5 before scoring. A Dice of 0.92 means that twice the overlap equals 92% of the combined size of the two masks.

Intersection over Union (IoU / Jaccard Index)

IoU = |A∩B| / |A∪B|

Range [0,1]. Stricter than Dice: for any imperfect prediction the IoU value is lower, because the shared pixels are counted once rather than twice. The two are related by IoU = Dice / (2 − Dice), so they are equal only at 0 and 1; a Dice of 0.92 corresponds to an IoU of about 0.85. IoU is the standard metric for segmentation challenges and is used to compare across published benchmarks.
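Both metrics, and the identity that links them, can be checked on a toy mask. This is a sketch with NumPy boolean arrays; returning 1.0 when both masks are empty is a convention assumed here, not something the text specifies.

```python
import numpy as np

def dice_and_iou(pred, truth):
    """Dice and IoU for two boolean masks of equal shape."""
    intersection = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    total = pred.sum() + truth.sum()
    dice = 2.0 * intersection / total if total else 1.0
    iou = intersection / union if union else 1.0
    return float(dice), float(iou)

truth = np.zeros((4, 8), dtype=bool)
truth[:, 2:4] = True                  # 8 lane pixels
pred = np.zeros((4, 8), dtype=bool)
pred[:, 3:5] = True                   # same lane, shifted by one column
dice, iou = dice_and_iou(pred, truth)
print(dice, iou, dice / (2 - dice))
```

A one-column shift leaves half of each mask overlapping, giving a Dice of one half and an IoU of one third, and the last printed value confirms that IoU = Dice / (2 − Dice) holds for this pair.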

Trade-off Summary

ENet: 40fps real-time capable, suitable for mobile and edge hardware, lowest memory footprint
U-Net: Best balance of speed and accuracy for server-side deployment
DeepLabv3+: Maximum accuracy for challenging scenarios — sharp curves, adverse lighting, occlusion
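The frame rates implied by the benchmark latencies follow from a one-line conversion. The figures are upper bounds on throughput, since they count only the model's forward pass and not pre- or post-processing.

```python
latency_ms = {"ENet": 25, "U-Net": 45, "DeepLabv3+": 80}
fps = {name: 1000.0 / ms for name, ms in latency_ms.items()}
for name, value in fps.items():
    print(f"{name}: {value:.1f} fps")
```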

Deployment Options

Local · Docker · AWS · Kubernetes

The Flask application exposes a single REST endpoint at POST /predict. It accepts a multipart image upload and returns a JSON payload with two base64-encoded output images. Any of the four deployment strategies below will serve this endpoint.
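For programmatic access, a client needs to send the image as a multipart form field named `image`. The sketch below builds such a request with the standard library only and does not send it. The helper name, the default filename and the localhost URL are assumptions for illustration.

```python
import urllib.request
import uuid

def build_predict_request(image_bytes, filename="road.jpg",
                          url="http://localhost:5000/predict"):
    """Build (but do not send) a multipart POST carrying one image."""
    boundary = uuid.uuid4().hex
    body = b"\r\n".join([
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="image"; filename="{filename}"'.encode(),
        b"Content-Type: image/jpeg",
        b"",
        image_bytes,
        f"--{boundary}--".encode(),
        b"",
    ])
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")

request = build_predict_request(b"\xff\xd8placeholder")
print(request.get_method(), request.full_url)

# With the server running, the call would look like:
# import json
# with urllib.request.urlopen(request) as response:
#     result = json.loads(response.read())
#     print(result.keys())
```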

Option 1 — Local Development

pip install -r requirements.txt
python app.py
# Access at http://localhost:5000

Option 2 — Docker

docker build -t lane-detection:latest .
docker run -p 5000:5000 lane-detection:latest
# GPU support:
docker run --gpus all -p 5000:5000 lane-detection:latest

Option 3 — AWS EC2

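Recommended instance: g4dn.xlarge (NVIDIA T4 GPU). Use Gunicorn to serve production traffic. Each Gunicorn worker is a separate process, so a threading.Lock does not serialise requests across workers: with -w 4, up to four forward passes can reach the GPU at once. Use -w 1 if inference must be strictly serialised on a single GPU.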

git clone https://github.com/Siddh-456/Lane-Detection.git
cd Lane-Detection && pip install -r requirements.txt
pip install gunicorn
gunicorn -w 4 -b 0.0.0.0:5000 app:app

Option 4 — Kubernetes

kubectl apply -f k8s/deployment.yaml
kubectl expose deployment lane-detection \
  --type=LoadBalancer --port=80 --target-port=5000

References

Papers this project builds upon

[1] Towards End-to-End Lane Detection: an Instance Segmentation Approach. Neven, De Brabandere, Georgoulis, Proesmans, Van Gool · IEEE IV 2018 · arXiv:1802.05591

[2] ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation. Paszke, Chaurasia, Kim, Culurciello · arXiv:1606.02147 · 2016

[3] Semantic Instance Segmentation with a Discriminative Loss Function. De Brabandere, Neven, Van Gool · arXiv:1708.02551 · 2017

[4] Focal Loss for Dense Object Detection. Lin, Goyal, Girshick, He, Dollár · ICCV 2017 · arXiv:1708.02002

[5] Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation (DeepLabv3+). Chen, Zhu, Papandreou, Schroff, Adam · ECCV 2018 · arXiv:1802.02611

Tech Stack: PyTorch 1.2+ · Torchvision · Flask · OpenCV · Pillow · NumPy · Pandas · scikit-image · Base64 · threading.Lock · Docker · TuSimple Dataset
