Why smartphone cameras are blowing our minds

28 Apr

A modified version of this article was originally published February 20, 2018.

There’s no getting around physics – smartphone cameras, and therefore sensors, are tiny. And since we all (now) know that, generally speaking, it’s the amount of light you capture that determines image quality, smartphones have a serious disadvantage to deal with: they don’t capture enough light.

But that’s where computational photography comes in. By combining machine learning, computer vision, and computer graphics with traditional optical processes, computational photography aims to enhance what is achievable with traditional methods. Here’s a rundown of some recent developments in smartphone imaging – and why we think they’re a big deal.

Intelligent exposure and processing? Press. Here.

One of the defining characteristics of smartphone photography is the idea that you can get a great image with one button press, and nothing more. No exposure decision, no tapping on the screen to set your exposure, no exposure compensation, and no post-processing. Just take a look at what the Google Pixel 2 XL did with this huge dynamic range sunrise at Banff National Park in Canada:

Sunrise at Banff, with Mt. Rundle in the background. Shot on Pixel 2 with one button press. I also shot this with my Sony a7R II full-frame camera, but that required a 4-stop reverse graduated neutral density (‘Daryl Benson’) filter, and a dynamic range compensation mode (DRO Lv5) to get a usable image. While the resulting image from the Sony was head-and-shoulders above this one at 100%, I got this image from a device in my pocket by just pointing and shooting.

The Pixel 2 was able to achieve the image above by first determining the correct focal plane exposure required to avoid blowing out large bright (non-specular) areas (an approach known as ETTR or ‘expose-to-the-right’). When you press the shutter button, the Pixel 2 goes back in time 9 frames, aligning and averaging them to give you a final image with quality similar to what you might expect from a sensor with 9x as much surface area. While it’s not quite that simple – sensor efficiency and the number of usable frames for averaging can vary – it’s not far off: consider that the Pixel 2 can hold its own against the 5x larger RX100 sensor when given the same amount of total light per exposure.


How does it do that? It’s constantly keeping the last 9 frames it shot in memory, so when you press the shutter it can grab them, break each into many square ‘tiles’, align them all, and then average them. Breaking each image into small tiles allows for alignment despite photographer or subject movement by ignoring moving elements, discarding blurred elements in some shots, or re-aligning subjects that have moved from frame to frame. Averaging simulates the effects of shooting with a larger sensor by ‘evening out’ noise.
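To make the ‘evening out’ of noise concrete, here is a minimal simulation, not Google’s HDR+ code. It assumes perfectly aligned frames of a flat grey patch with purely Gaussian noise; the function names, the noise level and the pixel count are all made up for illustration.

```python
import random
import statistics

def noisy_frame(true_value, noise_sigma, num_pixels, rng):
    """Simulate one frame of a flat grey patch with Gaussian noise per pixel."""
    return [rng.gauss(true_value, noise_sigma) for _ in range(num_pixels)]

def average_frames(frames):
    """Average already-aligned frames pixel by pixel."""
    return [sum(values) / len(values) for values in zip(*frames)]

rng = random.Random(0)  # fixed seed so the run is repeatable
frames = [noisy_frame(100.0, 10.0, 20_000, rng) for _ in range(9)]

single_noise = statistics.pstdev(frames[0])
merged_noise = statistics.pstdev(average_frames(frames))

print(f"single frame noise:    {single_noise:.2f}")
print(f"9-frame average noise: {merged_noise:.2f}")
print(f"improvement:           {single_noise / merged_noise:.2f}x")
```

Under these assumptions the noise drops by roughly the square root of the number of frames, so about 3x for 9 frames. That is the same gain you would expect from collecting 9x as much light in a single exposure, which is where the ‘9x as much surface area’ comparison comes from.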

That’s what allows the Pixel 2 to capture such a wide dynamic range scene: expose for the bright regions, while reducing noise in static elements of the scene by image averaging, while not blurring moving (water) elements of the scene by making intelligent decisions about what to do with elements that shift from frame to frame. Sure, moving elements have more noise to them (since they couldn’t have as many of the 9 frames dedicated to them for averaging), but overall, do you see anything but a pleasing image?
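The trade-off described above (fewer frames averaged where things move, hence more noise there) can be sketched with a toy tile merge. This is an illustration under simplifying assumptions, not the Pixel’s actual algorithm: tiles are short lists of pixel values, the first frame is the reference, and a tile from another frame is simply rejected when its mean absolute difference from the reference exceeds a hypothetical `max_difference` threshold.

```python
def tile_difference(tile_a, tile_b):
    """Mean absolute difference between two tiles of equal size."""
    return sum(abs(a - b) for a, b in zip(tile_a, tile_b)) / len(tile_a)

def merge_tiles(frames, max_difference):
    """Merge frames tile by tile, using frames[0] as the reference.

    Returns (merged_tiles, frames_used_per_tile).
    """
    reference = frames[0]
    merged, used = [], []
    for index, ref_tile in enumerate(reference):
        # Keep only the tiles that still look like the reference tile.
        accepted = [frame[index] for frame in frames
                    if tile_difference(ref_tile, frame[index]) <= max_difference]
        merged.append([sum(values) / len(values) for values in zip(*accepted)])
        used.append(len(accepted))
    return merged, used

# Tile 0 is static; tile 1 contains something that moves in later frames.
frames = [
    [[10, 10, 10, 10], [50, 50, 50, 50]],
    [[12, 8, 11, 9], [90, 90, 50, 50]],
    [[9, 11, 10, 12], [50, 50, 90, 90]],
]
merged, used = merge_tiles(frames, max_difference=5)
print("frames used per tile:", used)
```

The static tile is averaged over all three frames, while the tile with motion falls back to the reference frame alone. It stays sharp but gets no noise reduction.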

Autofocus

Improvements in autofocus, combined with the extended depth-of-field inherent to smaller sensors, are bringing focus performance of smartphones nearer and nearer to that of high performance dedicated cameras. Dual Pixel AF on the Google Pixel 2 uses nearly the entire sensor for autofocus (binning the high-resolution sensor into a low-resolution mode to decrease noise), while also using HDR+ and its 9-frame image averaging to further decrease noise and have a usable signal to make AF calculations from.

Google Pixel 2 can focus lightning fast even in indoor artificial light, thanks to Dual Pixel AF, allowing me to snap this candid before it was over in a split second. Technologies like ‘Dual PDAF’ autofocus – used by recent iPhones – don’t quite offer this level of performance (the iPhone X lagged and caught a less interesting moment seconds later when it eventually achieved focus), but offer potential image quality benefits.

And despite the left and right perspectives the split pixels in the Pixel 2 sensor ‘see’ having less than 1mm stereo disparity, an impressive depth map can be built, rendering an optically accurate lens blur. This isn’t just a matter of masking the foreground and blurring the background, it’s an actual progressive blur based on depth.
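The difference between masking and a depth-based blur can be sketched in a few lines. This is a simplified model with hypothetical names and numbers, not the Pixel’s rendering pipeline: it assumes each pixel already has a disparity value from the split-pixel views and that blur grows linearly with the disparity difference from the focus plane, which is roughly how defocus behaves for a real lens.

```python
def mask_blur(disparity, focus_disparity, tolerance, background_radius):
    """Cut-out approach: everything off the subject gets the same blur."""
    if abs(disparity - focus_disparity) <= tolerance:
        return 0.0
    return background_radius

def progressive_blur(disparity, focus_disparity, gain, max_radius):
    """Depth-based approach: blur grows with distance from the focus plane."""
    return min(max_radius, gain * abs(disparity - focus_disparity))

# Hypothetical per-pixel disparities, in pixels, relative to the focus plane.
for disparity in [0.0, 0.1, 0.3, 0.6]:
    print(disparity,
          mask_blur(disparity, 0.0, 0.05, 10.0),
          progressive_blur(disparity, 0.0, 20.0, 10.0))
```

With the mask, an object just behind the subject is blurred as heavily as the far background. With the progressive version it is blurred only slightly, which is what makes the result read as lens blur rather than a cut-out.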

Instant AF and zero shutter lag allowed me to nail this candid image the instant after my wife and child whirled around to face the camera.

A relatively new autofocus technology we’re seeing on recent iPhones is ‘Dual PDAF’, where a 1×2 microlens is placed over a green-blue pixel pair in which the blue color filter has been replaced by a green one. This can offer some benefits over masked pixels, which sacrifice light and can affect image quality, and over dual pixel AF, since it doesn’t require as much of the deep trench isolation that split photodiodes need to prevent color cross-talk.

However, current implementations only utilize this modified microlens structure in 2 pixels out of an 8×8 pixel region, which means only about 3% of the pixels are used for ‘Dual PDAF’ AF. That means less light and information are available compared to the full-sensor Dual Pixel AF approach. Combined with the lack of the multi-frame noise reduction the Pixel 2 phones benefit from even for AF, this meant more misfocus or shots captured after the decisive moment. Like every technology though, we expect generational improvements.
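The arithmetic behind that percentage is simple enough to check. The helper name below is hypothetical; the 2-in-8×8 figure is the one quoted above.

```python
from fractions import Fraction

def af_pixel_fraction(af_pixels_per_block, block_width, block_height):
    """Fraction of pixels in a repeating block that feed phase detection."""
    return Fraction(af_pixels_per_block, block_width * block_height)

fraction = af_pixel_fraction(2, 8, 8)
print(f"{fraction} of pixels = {float(fraction):.3%}")
```

That works out to 1 pixel in 32. A design in which every pixel is split would, by the same measure, have 32 times as many pixels contributing to the AF signal.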

Portrait Lighting

While we’ve been praising the Pixel phones, Apple is leading smartphone photography in a number of ways. First and foremost: color accuracy. Apple displays are all calibrated and profiled to display accurate colors, so no matter what Apple or color-managed device (or print) you’re viewing, colors look the same. Android devices are still the Wild West in this regard, but Google is trying to solve this via a proper color management system (CMS) under-the-hood. It’ll be some time before all devices catch up, and even Google itself is struggling with its current display and CMS implementation.

But let’s talk about Portrait Lighting. Look at the iPhone X ‘Contour Lighting’ shot below, left, vs. what the natural lighting looked like at the right (shot on a Google Pixel 2 with no special lighting features). While the Pixel 2 image is more natural, the iPhone X image is arguably more interesting, as if I’d lit my subject with a light on the spot.

Left: Apple iPhone X, ‘Contour Lighting’. Right: Google Pixel 2.

Apple builds a 3D map of a face using trained algorithms, then allows you to re-light your subject using modes such as ‘natural’, ‘studio’ and ‘contour’ lighting. The latter highlights points of the face like the nose, cheeks and chin that would’ve caught the light from an external light source aimed at the subject. This gives the image a dimensionality you could normally only achieve using external lighting solutions or a lot of post-processing.

Sure the photo on the left could be better, but this is the first iteration of the technology. It won’t be long before we see other phones and software packages taking advantage of—and improving on—these computational approaches.

HDR and wide-gamut photography

And then we have HDR. Not the HDR you’re used to thinking about, that creates flat images from large dynamic range scenes. No, we’re talking about the ability of HDR displays—like bright contrasty OLEDs—to display the wide range of tones and colors cameras can capture these days, rather than sacrificing global contrast just to increase and preserve local contrast, as traditional camera JPEGs do.

iPhone X is the first device ever to support the HDR display of HDR photos. That is: it can capture a wide dynamic range and color gamut but then also display them without clipping tones and colors on its class-leading OLED display, all in an effort to get closer to reproducing the range of tones and colors we see in the real world.


Have a look below at a Portrait Mode image I shot of my daughter that utilizes colors and luminances in the P3 color space. P3 is the color space Hollywood is now using for most of its movies (it’s similar, though shifted, to Adobe RGB). You’ll only see the extra colors if you have a P3-capable display and a color-managed OS/browser (macOS + Google Chrome, or the newest iPads and iPhones). On a P3 display, switch between ‘P3’ and ‘sRGB’ to see the colors you’re missing with sRGB-only capture.

Or, on any display, hover over ‘Colors in P3 out-of-gamut of sRGB’ to see (in grey) what you’re missing with an sRGB-only capture/display workflow.

iPhone X Portrait Mode, image in P3 color space; iPhone X Portrait Mode, image in sRGB color space; colors in P3 out-of-gamut of sRGB highlighted in grey.
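For readers curious how an ‘out-of-gamut’ overlay like this could be computed, here is a minimal sketch. It is not the code behind the images above. It assumes Display P3 pixel values in the 0 to 1 range, uses the sRGB transfer curve (which Display P3 shares), and applies approximate coefficients for the linear Display P3 to linear sRGB conversion; treat the exact digits of the matrix as an assumption.

```python
# Approximate linear Display P3 -> linear sRGB matrix (both D65 white point).
P3_TO_SRGB_LINEAR = (
    (1.2249401763, -0.2249401763, 0.0),
    (-0.0420569547, 1.0420569547, 0.0),
    (-0.0196375546, -0.0786360456, 1.0982736001),
)

def decode_transfer(value):
    """Undo the sRGB transfer curve to get linear light."""
    if value <= 0.04045:
        return value / 12.92
    return ((value + 0.055) / 1.055) ** 2.4

def p3_to_linear_srgb(p3_color):
    """Convert an encoded Display P3 color to linear sRGB components."""
    linear_p3 = [decode_transfer(channel) for channel in p3_color]
    return [sum(weight * channel for weight, channel in zip(row, linear_p3))
            for row in P3_TO_SRGB_LINEAR]

def outside_srgb(p3_color, tolerance=1e-6):
    """True if sRGB would need a component below 0 or above 1."""
    return any(channel < -tolerance or channel > 1 + tolerance
               for channel in p3_to_linear_srgb(p3_color))

print("saturated P3 red:", outside_srgb((1.0, 0.0, 0.0)))
print("mid grey:        ", outside_srgb((0.5, 0.5, 0.5)))
```

A fully saturated P3 red needs more than 100% red and negative green and blue in sRGB terms, so an sRGB-only workflow has to clip it. Neutral greys sit inside both gamuts.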

Apple is not only taking advantage of the extra colors of the P3 color space, it’s also encoding its images in the ‘High Efficiency Image Format’ (HEIF), an advanced format intended to replace JPEG. HEIF is more efficient and also allows for 10-bit color encoding (to avoid banding while allowing for more colors) and for HDR encoding, which enables the display of a larger range of tones on HDR displays.
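Why more bits help with banding comes down to how many distinct code values are available across a smooth gradient. The sketch below is plain arithmetic with hypothetical helper names; it ignores transfer curves and dithering, and the 10% figure is just an example of a gentle gradient such as a clear sky.

```python
def code_values(bits):
    """Number of distinct values a channel can take at a given bit depth."""
    return 2 ** bits

def steps_across(tonal_fraction, bits):
    """Distinct code values available across a fraction of the full range."""
    return int(tonal_fraction * (code_values(bits) - 1)) + 1

for bits in (8, 10):
    print(f"{bits}-bit: {code_values(bits)} values per channel, "
          f"{steps_across(0.1, bits)} steps across 10% of the range")
```

With roughly four times as many steps across the same gradient, each step is a quarter of the size, which makes visible bands far less likely.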

But will smartphones replace traditional cameras?

For many, yes, absolutely. Autofocus speeds on the Pixel 2 are phenomenal, assisted by not only dual pixel AF but also laser AF. HDR+-like image stacking algorithms will only get better with time, averaging more frames or frames of various time intervals. The Huawei P20 can do exactly this and results are impressive. The P20 can also combine information from both color and higher-sensitivity monochrome sensors to yield impressive noise – and resolution – performance. Dual (or even triple) lens units give you the focal lengths of a camera body and two or more primes, and we’ve seen the ability to selectively blur backgrounds and isolate subjects like the pros do. Folded optics can give you far-reaching zoom.

Below is a shot from the Pixel 2 vs. a shot from a $4,000 full-frame body and 55mm F1.8 lens combo. Which is which?

Full Frame or Pixel 2? Pixel 2 or Full Frame?

Yes, the trained—myself included—can pick out which is the smartphone image. But when is the smartphone image good enough?

Smartphone cameras are not only catching up with traditional cameras, they’re actually exceeding them in many ways. Take for example…

Creative control…

The image below exemplifies an interesting use of computational blur. The camera has chosen to keep much of the subject—like the front speaker cone, which has significant depth to it—in focus, while blurring the rest of the scene significantly. In fact, if you look at the upper right front of the speaker cabinet, you’ll see a good portion of it in focus. After a certain point, the cabinet suddenly-yet-gradually blurs significantly.

The camera and software have chosen to keep a significant depth-of-focus around the focus plane before significantly blurring objects far enough away from it. That’s the beauty of computational approaches: while F1.2 lenses can usually only keep one eye in focus (much less the nose or the ear), computational approaches allow you to choose how much you wish to keep in focus, even if you wish to blur the rest of the scene to a degree where traditional optics wouldn’t allow for much of your subject to remain in focus.

B&W speakers at sunrise. Take a look at the depth-of-focus vs. depth-of-field in this image. If you look closely, the entire speaker cone and a large front portion of the black cabinet are in focus. There is then a sudden, yet gradual blur to very shallow depth-of-field. That’s the beauty of computational approaches: one can choose extended (say, F5.6 equivalent) depth-of-focus near the focus plane, but then gradually transition to far shallower – say F2.0 – depth-of-field outside of the focus plane. This allows one to keep much of the subject in focus, yet achieve the subject isolation of a much faster lens.
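One way to picture that ‘sudden, yet gradual’ behavior is as an aperture that changes with distance from the focus plane. The sketch below is a guess at the general idea, not the phone’s actual rendering: it assumes blur is proportional to defocus distance divided by f-number (as it is for a real lens, to a first approximation), and the band, transition width and units are all hypothetical.

```python
def effective_f_number(distance, band, transition, near_f=5.6, far_f=2.0):
    """F-number to simulate at a given distance from the focus plane.

    Inside the band the slower near_f applies; beyond it the value
    ramps linearly to the faster far_f over the transition width.
    """
    if distance <= band:
        return near_f
    weight = min(1.0, (distance - band) / transition)
    return near_f + weight * (far_f - near_f)

def blur_radius(distance, band, transition, scale=1.0):
    """Blur grows with defocus distance and shrinks with f-number."""
    return scale * distance / effective_f_number(distance, band, transition)

for distance in [0.0, 0.5, 1.0, 2.0, 3.0, 5.0]:
    f_number = effective_f_number(distance, band=1.0, transition=2.0)
    radius = blur_radius(distance, band=1.0, transition=2.0)
    print(f"distance {distance:.1f}: F{f_number:.1f}, blur {radius:.2f}")
```

Near the focus plane the blur grows slowly, as it would at F5.6. Once past the band it accelerates toward what an F2.0 lens would give. A single physical aperture cannot do this because its f-number is fixed across the whole scene.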

Surprise and delight…

Digital assistants. Love them or hate them, they will be a part of your future, and they’re another way in which smartphone photography augments and exceeds traditional photography approaches. My smartphone is always on me, and when I have my full-frame Sony a7R III with me, I often transfer JPEGs from it to my smartphone. Those images (and 720p video proxies) automatically upload to my Google Photos account. From there any image or video that has my or my daughter’s face in it automatically gets shared with my wife without my so much as lifting a finger.

Better yet? Often I get a notification that Google Assistant has pulled a cute animated GIF from my movie it thinks is interesting. And more often than not, the animations are adorable:

Splash splash! in Xcaret, Quintana Roo, Mexico. Animated GIF auto-generated from a movie shot on the Pixel 2.

Machine learning allowed Google Assistant to automatically guess that this clip from a much longer video was an interesting moment I might wish to revisit and preserve. And it was right. Just as it was right in picking the moment below, where my daughter is clapping in response to her cousin clapping at successfully feeding her… after which my wife claps as well.

Claps all around!

Google Assistant is impressive in its ability to pick out meaningful moments from photos and videos. Apple takes a similar approach in compiling ‘Memories’.

But animated GIFs aren’t the only way Google Assistant helps me curate and find the important moments in my life. It also auto-curates videos that pull together photos and clips from my videos—be it from my smartphone or media I’ve imported from my camera—into emotionally moving ‘Auto Awesome’ compilations:

At any time I can hand-select the photos and videos, down to the portions of each video, I want in a compilation—using an editing interface far simpler than Final Cut Pro or Adobe Premiere. I can even edit the auto-compilations Google Assistant generates, choosing my favorite photos, clips and music. And did you notice that the video clips and photos are cut down to the beat in the music?

This is a perfect example of where smartphone photography exceeds traditional cameras, especially for us time-starved souls who hardly have the time to download our assets to a hard drive (not to mention back up said assets). And it’s a reminder that traditional cameras that don’t play well with automated services such as Google Photos and Apple Photos will only be left behind in favor of simpler services that surprise and delight a majority of us.

The future is bright

This is just the beginning. The computational approaches Apple, Google, Samsung and many others are taking are revolutionizing what we can expect from devices we have in our pockets, devices we always have on us.

Are they going to defy physics and replace traditional cameras tomorrow? Not necessarily, not yet, but for many purposes and people, they will offer pros that are well worth the cons. In some cases they offer more than we’ve come to expect of traditional cameras, which will have to continue to innovate (perhaps taking advantage of the very computational techniques smartphones and other innovative computational devices are leveraging) to stay ahead of the curve.

But as techniques like HDR+ and Portrait Mode and Portrait Lighting have shown us, we can’t just look at past technologies to predict what’s to come. Computational photography will make things you’ve never imagined a reality. And that’s incredibly exciting.

If you’d rather digest this article in video form, watch my segment on ‘The New Screen Savers’, a show on the TWiT Network (named after its flagship show, This Week in Tech), below. And don’t forget to catch our recent smartphone galleries after the video.


Appendix: Studio Scene

We’ve added the Google Pixel 2 and Apple iPhone X to our studio scene widget. You can compare the Daylight and Low Light scenes below to any camera of your choosing, keeping in mind that we shot the smartphones in their default camera apps without controlling exposure to see how they would perform in these light levels (10 and 3 EV, respectively, for Daylight and Low Light).


Note that we introduced some motion into the Low Light scene to simulate what the iPhone does when there’s movement in the scene. Hence, the ISO 640, 1/30s iPhone X image is more reflective of low light image quality for scenes that can’t be shot at the 1/4s shutter speed (ISO 125) the iPhone X will tend to drop to for completely static (tripod-based) low light scenes.

The Pixel 2 rarely drops to shutter speeds slower than 1/30s in low light, yet impressively almost matches the performance of a 1″-type sensor at these shutter speeds in low light (though the ‘i’ tab shows the RX100 shot at 1/6s F4, you’d get an equivalent exposure at 1/30s were you to shoot the Sony at F1.8 like the Pixel 2).
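The equivalence claimed in that parenthetical can be checked with the usual exposure relation, where light per unit sensor area is proportional to shutter time divided by the square of the f-number. The helper names are hypothetical; the shutter speeds and apertures are the ones quoted above.

```python
import math

def relative_exposure(shutter_seconds, f_number):
    """Light per unit sensor area, up to a constant: time / f_number^2."""
    return shutter_seconds / f_number ** 2

def stops_between(exposure_a, exposure_b):
    """How many stops brighter exposure_a is than exposure_b."""
    return math.log2(exposure_a / exposure_b)

rx100_as_shot = relative_exposure(1 / 6, 4.0)
rx100_wide_open = relative_exposure(1 / 30, 1.8)

difference = stops_between(rx100_wide_open, rx100_as_shot)
print(f"1/30s at F1.8 vs 1/6s at F4: {difference:+.2f} stops")
```

The two settings land within a few hundredths of a stop of each other, so 1/6s at F4 is a fair stand-in for 1/30s at F1.8 as far as exposure is concerned.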

Articles: Digital Photography Review (dpreview.com)

 