What you need
The whole job in ten checkboxes. Most reconstruction problems are solved at capture time, not in software — get these right and the rest is mechanical.
Software & Hardware
RealityScan 2.x is the same engine as RealityCapture, rebranded by Epic in June 2025. The one non-negotiable is an NVIDIA GPU.
Software
- RealityScan 2.x — rebranded from RealityCapture on 17 Jun 2025; latest desktop build 2.1 (26 Nov 2025). Same settings and engine as RealityCapture.
- Licensing: fully free for individuals/companies under $1M USD trailing-12-month revenue (students, hobbyists, educators included). Above that, $1,250 / seat / year. Feature sets are identical — the split is purely revenue-based.
- Platform: Windows is primary. A Linux build exists but is CLI / server only.
Without an NVIDIA / CUDA GPU, RealityScan can align images but cannot build meshes or textures — i.e. no surface. AMD/Intel GPUs are not supported for reconstruction.
Hardware targets
| Tier | RAM | GPU / notes |
|---|---|---|
| Official minimum | 8 GB | 64-bit Win, SSE4.2 CPU, CUDA CC 3.0+, ≥1 GB VRAM |
| Recommended | 16 GB | 4+ cores, CC 6.1+, 1024+ CUDA cores, NVMe SSD |
| Drone-practical | 32 GB | RTX 5070/5080 · scales: ~16GB@2k imgs, 64GB@8k |
RAM scales with image count (~16 GB per 2,000 images). A second GPU adds only ~5–13%; max two cards. Keep an NVMe scratch/cache drive.
Drone & Capture Plan
Walls are the worst case for photogrammetry: large, flat, low-texture areas give few features, while repetitive brick/window grids generate near-identical features that fool the aligner into doubled or bent walls. Beat it with overlap, obliques, and slow, sharp frames.
Overlap
- ~80% forward / ~70% side. For tall/slender surfaces Pix4D wants 90% between images at the same height and ≥60% between different heights.
- RealityScan's aligner wants neighbour overlap >60% and treats <20% as "Low" — target ≥80% to stay safely above its threshold.
- DroneDeploy rule: perpendicular (head-on) gimbal → 70/60, oblique gimbal → 80/70.
Camera settings
Shutter ≥
1/500 s (1/1000 in wind, 1/2000+ in bright sun) ·
ISO 100–200 ·
Aperture stopped ~2.5 stops from wide ·
Mode Shutter-Priority or Manual (constant exposure) ·
Focus locked, fixed focal length (no zoom) ·
WB manual ~5500K ·
Format RAW.
Motion blur is the #1 cause of soft imagery: blur = shutter interval × ground speed — keep it ≤ 1× GSD. Fly slow: ≤ 1 m/s horizontal, 0.2–0.3 m/s vertical.
Flight pattern for a facade
- Primary: horizontal strips / vertical lawnmower grid with the camera perpendicular to the wall (gimbal 0°); build columns at ~80% side overlap, ascend/descend slowly for forward overlap.
- Add obliques at ~45° from all four sides (and a centre oblique column) — this "eliminates the bowing effect" and disambiguates repetitive patterns. Pix4D: facades become visible at 10–35° off-vertical.
- Use +5–10° upward pitch to see under eaves/overhangs. Change pitch and GSD as little as possible between nadir and oblique sets so features stay matchable.
- Capture corners well to lock geometry.
GSD = (Distance to wall × Sensor width) / (Focal length × Image width px). Standoff ≈ 3 m for fine close work; 12–18 m for building-scale facades. A Mavic 3 Pro ≈ 2–3 cm GSD at ~120 ft. Final XYZ accuracy ≈ 1–3× GSD.Shoot in overcast / diffuse light (or golden hour) to kill glare and hard shadows. Glass and reflective surfaces reconstruct poorly — reflections move with the camera and create false tie points → holes/noise. Prefer mechanical-shutter drones (Mavic 3E, Phantom 4 RTK) to avoid rolling-shutter skew. RTK gives cm-level geotags — most valuable for georeferencing and texture-poor walls.
Video vs Photos
RealityScan reconstructs from still images. It can ingest video directly, but for an accurate wall, dedicated overlapping stills win — and if you do use video, control the frame extraction yourself.
| Drone stills | Video frames | |
|---|---|---|
| Resolution | 12–48 MP | ~8 MP (4K) |
| Compression | per-image, RAW | heavy inter-frame, lossy |
| GPS EXIF | yes — seeds poses | usually none |
| Blur risk | low | higher (rolling shutter) |
RealityScan imports video via WORKFLOW ▸ Video (or drag-drop), extracting only keyframes to PNG, controlled by a "Jumps' length" time-span setting. Because it skips interpolated frames and the frames carry no GPS EXIF, many practitioners pre-extract with ffmpeg for full control.
Frame extraction (ffmpeg)
Take roughly 1 of every 15–30 frames (~1–2 fps for drone flight) targeting ~80% overlap — not every frame, which floods the aligner with redundant, mismatch-prone data.
# every 15th frame, near-best quality
ffmpeg -i DJI_0123.MP4 -vf "select=not(mod(n\,15))" \
-vsync vfr -qscale:v 2 frames/frame_%05d.png
# keyframes only — sharpest representative frames
ffmpeg -i input.MP4 -vf "select=eq(pict_type\,I)" \
-vsync vfr -qscale:v 2 key_%05d.png
Force a fast shutter (1/1000 s or faster) — the cinematic 180° rule (1/50 s) deliberately adds blur and is wrong for photogrammetry. Cull blurry frames before alignment (Laplacian-variance scripts, or deselect in RealityScan). Frames lack GPS, so plan to georeference with GCPs / known distances.
Processing in RealityScan
Tabs are grouped WORKFLOW / ALIGNMENT / MESH & COLOR / SCENE 3D / TOOLS. The goal of alignment is a single Component containing all cameras.
Import
Drag in the drone photos or extracted frames. GPS-tagged JPEGs auto-populate camera positions.
Align
For a blank/repetitive facade override settings: push Max features above the 40k default, raise preselector features, set Image overlap: Low, lower alignment downscale 3→2, detector sensitivity High/Ultra. Aim for one Component.
Fix alignment
For doubled/bent walls: use the Inspect tool (I); align the bad seam first, lock those poses, then re-align the rest. Align ~50% first, then densify. Control Points (≥3 shared) as a last resort to merge components.
Reconstruction region
Tighten the box to just the wall — excluding sky/ground slashes compute and noise.
Mesh
Use High Detail, depth-map & image downscale = 1, low smoothing to preserve cracks/relief, raise max vertex count. Drop to downscale 2–4 only if you hit RAM limits.
Simplify
⚠ Simplify drops textures (keeps vertex colour only) — simplify BEFORE texturing, then texture the reduced mesh.
Texture
Mosaicing unwrap, 8192–16384 resolution (UDIM if needed), Multi-band colouring for seamless results across varying light.
Export
OBJ / FBX / GLB (note: .gltf not supported) / PLY / USD; point clouds as LAS / XYZ / PTX. Pick the output coordinate system for georeferenced export.
Side orthophoto
The right deliverable for a flat wall: SCENE 3D ▸ Ortho Projection (F11), set Type = Side, set ortho pixel size (= GSD), render, export a true-scale measurable facade image.
Getting Real Accuracy
A reconstruction can look perfect and still be the wrong size or warped. Three things make it metrically trustworthy: scale, control, validation.
A · Set scale
- Easiest: a known real-world distance (F4 constraint) between two identifiable points — measure on site, enter it. This alone fixes scale.
- Drone EXIF GPS georeferences automatically but consumer GPS can be metres off. RTK/PPK drones tighten camera positions dramatically.
B · Ground Control Points
Physical checkerboard / bright-orange targets — minimum 5, ideally 8–12, evenly distributed across the wall, each marked in ≥3 images, surveyed for known XYZ. RealityScan then scales, georeferences, and reduces drift/warping.
C · Validate
- Reprojection error — check the alignment report; sub-pixel is good.
- Checkpoint RMSE — survey independent checkpoints not used in the solution and compare against survey truth. That RMSE is your real accuracy figure.
- Spot-check known distances on the final model / orthophoto.
Common Pitfalls
The failure modes that ruin facade reconstructions — and the fix for each.
Doubled / bent walls
Repetitive masonry or window grids. Fix: oblique passes, more overlap, align-the-seam-first & lock, control points last.
No NVIDIA GPU
Alignment works but no mesh or texture builds. A hard wall — get a CUDA card.
Glass & reflections
Unstable tie points → holes, bubbling, noise. Mask/delete or accept holes; shoot diffuse light.
Inconsistent standoff
Flying too close/far or varying distance → broken overlap, uneven GSD. Keep distance constant.
Harsh / changing light
Glare, hard shadows, colour seams. Use overcast/golden-hour light and fixed manual exposure.
Too much video
Redundant frames → more mismatches, not better results. Subsample ~every 15–30th frame.
Texture before simplify
Simplify keeps only vertex colour. Simplify first, then texture.
No scale / no validation
Good-looking but metrically wrong. Always set scale and check independent checkpoints.
Box left wide open
Sky/ground included → wasted compute, noise, crashes. Tighten the reconstruction region.
Sources
Primary documentation and field guides behind this workflow.