Apple

Pair of iPhones Strapped to the Face

Apple’s new Reality OS hardware may become a reality in 2023, perhaps this quarter or early next. With many credible rumors flying, a lot of smart people have offered what I think are rather superficial, first-order thoughts about Apple’s entrance into the headset market. Apple did things differently with the iPhone, and it changed the world and changed Apple. Will Apple just enter the headset space and do the same things as other players?

What is Possible?

Start with the current state of the art.

It is important to understand just how difficult headset technology is in terms of processing power, weight, specifications, and battery life. It is all very non-trivial.

What we know:

  • Apple has custom low-power silicon expertise that could power a portable headset device, probably outpacing the capabilities of any competitor. Apple can sell you a 14” laptop with 96 GB of unified memory, which means more VRAM than most desktops and laptops have!
  • Apple has camera expertise and entire teams that have built sophisticated pipelines. They are not new to this.
  • Apple has already shipped AR demos on their iOS and iPadOS devices, which have LiDAR sensors. They have solved some tracking problems (keeping an object in place) and even occlusion problems in existing software. However, when your phone is the viewport, your hands are holding it, so you can’t use them to interact. Apple needs hand tracking, and there is probably no reason Apple can’t do this well. (See the sketch after this list.)
  • Apple has global map data for their Apple Maps product, which could be useful in software that interacts with the real world. Apple has shipped GPS for years now, which tells you where you are in the real world at a global scale, down to the building you are in.
  • Apple has beacon technology that can tell devices where objects are in the real world, at the room or building or meter-proximity scale. And near-field expertise to tell when devices touch things in the real world.
  • Apple has shipping technology that can track the face and expressions and deform a 3D avatar, in the form of the otherwise pointless Memoji initiative. This tech would be far from pointless in Reality OS.
  • Spatial computing has been around in research labs and in sci-fi for a long time (see the gestural technology John Underkoffler designed for Spielberg’s 2002 film Minority Report; Underkoffler would later consult on Marvel’s Iron Man film and found Oblong Industries, where I worked from 2010 to 2012). A lot of these gesture, multi-screen, and real-world computing ideas have had decades to stew, so this is not actually a totally new software world, even if the specialized hardware hasn’t yet stuck. There is good prior art here.
  • Apple has deep content plays with existing content and pipelines producing more (Apple TV+, Apple Arcade, Apple Fitness+) that could be compelling in a Reality OS world. They don't have to wait for others to make compelling content for this space. They could make it themselves.
  • Apple FaceTime is a well-established video calling system that consumers trust and know how to use. Apple recently introduced SharePlay for remotely working with apps and data while sharing presence. It is half-baked but shows where they want to take things.
  • Apple is good at creating new UI paradigms by taking the best of existing (often external) tech demos and turning them into a coherent blend of hardware and software that proves useful and intuitive to people. You might even say that this is Apple’s sine qua non.
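
On the hand-tracking point above, here is a minimal sketch of what Apple already ships in the Vision framework, assuming a camera frame delivered as a CVPixelBuffer. The helper function is mine; the Vision calls are real, shipping API (iOS 14 and later).

```swift
import Vision

// Detect up to two hands in a camera frame. Each observation exposes named
// joints (wrist, fingertips, etc.) that a headset UI could map to gestures.
func detectHands(in frame: CVPixelBuffer) throws -> [VNHumanHandPoseObservation] {
    let request = VNDetectHumanHandPoseRequest()
    request.maximumHandCount = 2  // track both hands

    let handler = VNImageRequestHandler(cvPixelBuffer: frame, orientation: .up)
    try handler.perform([request])

    // e.g. let indexTip = try observation.recognizedPoint(.indexTip)
    return request.results ?? []
}
```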

Reality Gambit, or Reality Gamut?

I agree with the many pundits who have no strong desire to strap a computer to their skulls unless the advantages are worth it.

In my view, the holy grail might need to be a device that can:

  • Show the real world in high fidelity with no virtual pixels intruding. Bonus points if legacy systems are usable without taking the headset on and off. (More about this below, under Rectangular Proxies.)
  • Show overlays of useful information superimposed on top of the real world, such as floating marquees showing contextual information or directions (Augmented Reality). The user(s) need to be able to interact with this data, avatars, creatures, and whatnot, probably with hand gestures and voice (Siri).
  • Allow the option for a fully virtual experience that can completely block out the outside world for meetings, gaming, relaxation, movie watching, and future blending of these activities. (With outward-facing depth sensors that switch the software back to showing external cameras when danger is imminent, as current VR headsets do.)

I think that if Apple can’t create a product and a platform spanning this entire spectrum (a Reality Gamut of Mixed Reality = AR + VR), then any Apple headset product is dead in the water. We need more from Apple than just AR or just VR to justify a new software platform. But I think this full Reality Gamut could be enough to carry a new device category.

Spectacles

No way will Apple debut in 2023 with lightweight “glasses” that are transparent yet can do all of the above. Not with current technology. They know this is five to twenty years off.

Yet no way would Apple not understand the long-term goal: to subsume all computing into one platform or paradigm that can link the old portable, desktop, and large-screen paradigms (watchOS, iOS, iPadOS, macOS, tvOS) with a new pervasive-computing paradigm, which would eventually require “magical,” not-yet-existent spectacles to be long-term viable for most people.

Yet, if Apple can launch an expensive XR (truly mixed gamut of normal reality to AR to fully immersive VR) headset in 2023—with some drawbacks like (a) weight, (b) battery and (c) price, but no compromises on tracking, interaction, lag, etc.—then they can start building the future now. In other words, developers could start writing software this summer, in 2023, that might run with a similar experience on a “magical” spectacle headset in the future, decades hence. I believe this will be their strategy. Most importantly, the interaction paradigm will not need to change even if the hardware changes to allow transparent views of the actual real world instead of cameras projecting the real world onto screens. If the platform is conceptualized and built right, up front, the same software might run on future more-advanced hardware.

Imagine what you could build with just that: a pair of iPhones strapped to your head. You would see out of the high-quality iPhone camera(s), with each eye looking at an expensive, high-dynamic-range, high-resolution “Retina” display or displays. The system could then superimpose pixels with UI and content experiences. Existing LiDAR sensors and AR software can handle occlusion so that virtual things appear where they ought to in the real world, even disappearing behind real objects. And you could opt into a fully immersive environment for VR if your surroundings were safe, like the “I’m not driving” button in iOS. And mixed experiences, like turning your floor into lava, would make for pointless but powerful demos that would at least get the point across.
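
To make the occlusion point concrete, here is a minimal sketch of the scene-mesh and people-occlusion setup that already ships on LiDAR-equipped iPhones and iPads. The helper function is mine; the ARKit and RealityKit calls are real:

```swift
import ARKit
import RealityKit

// Run an AR session in which virtual pixels hide behind real-world
// geometry and real people, using the LiDAR scene mesh and depth data.
func runWithOcclusion(on arView: ARView) {
    let config = ARWorldTrackingConfiguration()

    // Build a live mesh of the surroundings from the LiDAR sensor.
    if ARWorldTrackingConfiguration.supportsSceneReconstruction(.mesh) {
        config.sceneReconstruction = .mesh
    }
    // Use depth so real people can pass in front of virtual content.
    if ARWorldTrackingConfiguration.supportsFrameSemantics(.personSegmentationWithDepth) {
        config.frameSemantics.insert(.personSegmentationWithDepth)
    }
    // Hide virtual content behind the reconstructed real-world mesh.
    arView.environment.sceneUnderstanding.options.insert(.occlusion)

    arView.session.run(config)
}
```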

For some reason I have not seen tech journalists and pundits frame the headset Gordian Knot the following way (perhaps their jobs are to critique existing things instead of figuring out how to shift the future using logic and creativity?):

  • Apple only needs to solve the software-platform paradigm problem up front, purposely embracing hardware tradeoffs (a), (b), and (c) above, and commit to making gradual headway on the hardware going forward, while building the app and content experiences that will become increasingly accessible as (a) weight decreases, (b) battery life improves, and (c) prices drop.

Apple did exactly this with the watch, the phone, and the tablet. The original iPhone is terrible by today’s standards, but it got the software off the ground; that software still runs in the same paradigm, sixteen years later. The original iPad is terrible by today’s standards, but it was leaps and bounds better than anything on the market at the time (and modern iPads are still way better than anything on the market). The original Apple Watch barely did what it needed to do, but it allowed the Apple Watch to start off as a platform. The watches for sale today are almost exactly the same as that original Apple Watch in terms of software paradigm, just much better in form factor, battery life, etc.

Rectangular Proxies

I will speculate that Apple could bring legacy software into the Reality OS world by allowing virtual devices of various sizes, from wearable (wrist) to pocketable (phone-sized) to holdable (tablet-sized) to ergonomic desk-like, to movie-theater or fully immersive. My working idea is that they could ship “foam block rounded-rect proxies” with sexy fiducial markers (Apple has done something similar with the experience for transferring data from an old iPhone when setting up a new one, which is way, way sexier than it needs to be). The headset could project the pixels of virtual screens running legacy apps onto these holdable “devices,” which would then function very similarly to current devices, hopefully including touch. I am not sure about the details, but Apple brought iPhone and iPad software to the Mac once it ran Apple Silicon, so they understand the value of bringing software from one platform to another. They could bring 2D computing into Reality OS out of the gate in 2023.
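
As a thought experiment, here is a speculative sketch of the fiducial-tracking half of that idea using today’s shipping ARKit image tracking. The “ProxyMarkers” asset group is hypothetical; the API calls are real:

```swift
import ARKit

// Track printed fiducial markers on physical foam "proxy" slabs so that
// virtual screens can be rendered wherever the slabs are held.
func makeProxyTrackingConfiguration() -> ARImageTrackingConfiguration {
    let config = ARImageTrackingConfiguration()
    if let markers = ARReferenceImage.referenceImages(inGroupNamed: "ProxyMarkers",
                                                      bundle: nil) {
        config.trackingImages = markers
        config.maximumNumberOfTrackedImages = 4  // several proxies at once
    }
    // Each detected marker yields an ARImageAnchor; a virtual screen running
    // a legacy app could be drawn at that anchor's transform.
    return config
}
```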

Conclusion

I recognize that there are significant challenges to creating a new immersive computing paradigm. I think Apple has a good shot at this, if they take it seriously and have thought deeply about it from a user perspective, a design perspective, and a “how things should work” perspective. They have done it before, with the Apple II, Mac, iPod, iPhone, iPad, and Apple Watch. I am pretty sure this new category will not initially be as popular as those six past categories, but I believe it could eventually be more popular. Immersive computing could be the future. I think it is just a matter of when and how.

(And I would rather Apple define that future than Meta, which is just Facebook, a company designed entirely to sell ads through rage-inducing “engagement.”)


Design

Design Blindness

John Gruber quoted an Android user who is perplexed about why iOS users have affection for wonderfully designed and polished apps.

Gruber then continues:

That’s like asking for “measurable criteria” for evaluating a movie or novel or song or painting. I will offer another quote from Kubrick: “The test of a work of art is, in the end, our affection for it, not our ability to explain why it is good.”

I agree with Gruber’s entire article about apps, but I would amend Kubrick and add that works of art, or apps, or fonts can actually have many explanations for why they are not good. Critics do this all the time. A film with nothing, or very little, to criticize might be a great film. Less-than-great films can be picked apart, if they are even worth the time and effort. (Maybe Gruber is saying that there are not enough third-party Android apps in the Mastodon and RSS-feed-reader categories to even bother?)

I think apps are much the same as films, for example. When you launch some unfamiliar software and immediately get taken out of the experience, this is akin to unintentionally breaking the fourth wall with poor filmmaking. This can happen in many ways:

  • crashing bugs
  • glitches
  • UI weirdness
  • frustrating inability to intuit how or where or why
  • need for workarounds
  • failure to anticipate the user’s needs
  • unintuitiveness (hidden gestures, requirements to read the manual even for simple apps)
  • failure to do work for the user that could easily be automated by the software
  • fiddliness, jankiness, jumpiness, wonkiness, wobbliness
  • inconsistency with system controls
  • lack of detent in slider or knob controls
  • making the user do things manually or painstakingly
  • lack of dark mode
  • failure to integrate with the system or hardware features
  • lack of orthogonal features that multiply to allow sophisticated workflows and flexible integrations (e.g. multi-select, keyboard shortcuts)
  • poor information design or layout (too dense, not dense enough)
  • clunky transitions (or no transitions)
  • lack of springiness (like my bank’s app on my phone)
  • large download sizes without commensurate content, usually to ship a massive cross-platform framework for what should be a simple app, or to include ad-network code
  • ugly or thin or wispy or unrecognizable icons

If you quickly run into one or more of these issues, and you keep finding more, then even if the app does what it purports to do, you still know that the developers are just ticking boxes and don’t give a damn, or don’t even know how to give a damn, or, if we are being charitable, are not allowed to give a damn. (Or the users don’t care either and will not pay for better software, so no one can afford to put in the effort to make better apps, which appears to be the case for native-only third-party Android apps. Large-player third-party apps such as Disney+ or whatever on Android might be mostly fine, but these are not native-only. iOS has tons of iOS-only, native-only apps.)

Explaining Poor Graphic Design, Poor Typography, and Poor Fonts

To switch gears: the quote highlighted in the DF article could be slightly modified to explain the similar misunderstandings and perplexity (often from technical people) about typography and font choices:

What on earth is he asking for out of these [fonts]? How do you objectively compare one [font]’s “panache” with another? If I [were] a developer [writer or designer], what are the steps I can follow to program some “comfort” into my app[‘s typography]? These complaints seem so wishy-washy and underspecified.

Then he leaves with the Kubrick quote: “Sometimes the truth of a thing isn’t in the think of it, but in the feel of it.” We’re fully in the realm of mysticism now, this is not an attempt to fairly compare or measure anything. [...]

I think if he’s going to praise some [fonts] and dunk on the other ones, he should compare using measurable criteria. Otherwise, it’s only one person’s opinion. Just saying “[Font] X feels right” is like saying “[Font] X has a better chakra energy.” What is any developer [writer, designer] supposed to do with that feedback? The whole article could have boiled down to “I personally like these [fonts] and I don’t like those.”

As mentioned above about explaining what is wrong with an app, a work of art, or a film: when a typeface (such as Papyrus) has a lot to criticize, especially regarding technical aspects, then it is objectively a bad typeface, or at least one that fails to find any valid usage scenarios.

To make this clearer: there can be boring fonts that are technically perfect, but no beautiful fonts that are technically flawed. A truly beautiful font must be technically well-executed and have some character, some panache, some swagger.

Helvetica’s capital R on the left, and Arial’s capital R on the right

Compare Helvetica and Arial. Arial shares all of its metrics with Helvetica, and even granting that it is technically acceptable: which capital R makes your heart sing, and which looks like a flaccid eggplant?

It’s not arbitrary. Helvetica’s R is well balanced horizontally, while Arial’s leg juts out to the right awkwardly and makes kerning worse. The word “Right” set in Arial at a glance looks like “R ight,” or worse, the “R” touches the bottom of the “i.” Arial copied the entirety of Helvetica and then changed a few glyphs for no reason (or worse, to claim that it’s not actually technically identical), and certainly made few if any improvements.

Typography and information design can be criticized candidly too. Bad kerning, illegibility, weird contrast issues, bad size hierarchy, poor alignment, overuse or misuse of formatting (such as using both vertical space and indentation for paragraphs), use of underline instead of italics, use of built-in fonts, lack of polish in layout and spacing metrics, typos and missing hyphens, lack of title-casing, and on and on: all of these give away the game that someone is a visual amateur and hasn’t seen or noticed that some design works and some just doesn’t. Some design punishes the reader or user. Trips them up, like a piece of uneven sidewalk. Makes them think, instead of having the designer, writer, or programmer do the thinking up front. It’s akin to not speaking the native tongue versus being fluent.

It’s not just arbitrary snobbism or rule-making as a shibboleth. It’s called learning to see and learning human empathy.

Maybe the opposite should be called design blindness?


Humor

The Elephants’ New Clothes

If you are familiar with the children’s illustrated books by Jean de Brunhoff featuring Babar the King of the elephants, you will know that civilized elephants wear clothes. What is less obvious is this weird double standard that developed.

Civilized animal cultures tolerate nudity in art to a greater extent than nudity in real life, with different parameters for what is acceptable: for example, even in a museum where nude elephants are displayed, nudity of the visiting elephants is generally not acceptable.

Strange, eh?


Design

Training Generative Models

I would consider myself an AI-art enthusiast and optimist. Despite being starkly anti-Luddite, I do recognize that the arguments about how the models are trained seem to have the most substance. (Let’s cut out all the whining about how these new tools will bring about a sea change and “won’t someone think of the poor starving artists.” This Ludditism is indistinguishable from past laments about innovations that are now boring commodities, such as the novel, the teddy bear, the bicycle, and on and on. Perhaps I am wrong and AI will be different? Won’t someone think of the poor grandmaster chess players, who have nothing better to do than sue for peace now that the machines consistently beat them handily. Or perhaps scrappy humans will adapt, and commercial art, fine art, and photography will flourish.)

Links About Training ML Models and Licensing (or Not) Input Images or Text Corpus

Here are some links about generative models and the ethics and legality of training data. I will add more as I come across them.

I’m not sure whether the result of Copilot will be the erosion of open-source contributions or not, but the copyright complaints obviously have some merit. A lot is unknown at this point.

  • Shutterstock announced an alliance with OpenAI (also essentially Microsoft) to sell and license their contributors’ tagged and organized photos to train DALL-E 2. What they don’t say in the press release, but do say in the email they sent to their contributors, is that Shutterstock will not accept any AI-generated contributions:

Working together to lead the way with AI

We’re excited to announce that we are partnering with OpenAI to bring the tools and experiences to the Shutterstock marketplace that will enable our customers to instantly generate and download images based on the keywords they enter.

As we step into this emerging space, we are going to do it in the best way we know how—with an approach that both compensates our contributor community and protects our customers.

In this spirit, we will not accept content generated by AI to be directly uploaded and sold by contributors in our marketplace because its authorship cannot be attributed to an individual person consistent with the original copyright ownership required to license rights. Please see our latest guidelines here. When the work of many contributed to the creation of a single piece of AI-generated content, we want to ensure that the many are protected and compensated, not just the individual that generated the content.

In the spirit of compensating our contributor community, we are excited to announce an additional form of earnings for our contributors. Given the collective nature of generative content, we developed a revenue share compensation model where contributors whose content was involved in training the model will receive a share of the earnings from datasets and downloads of ALL AI-generated content produced on our platform.

We see generative as an exciting new opportunity—an opportunity that we’re committed to sharing with our contributor community. For more information, please see our FAQ on the subject, which will be updated regularly.

More about Shutterstock:

So even Shutterstock is playing both sides of this issue—feeding the beast (models) but saying no to the quagmire of feeding legally questionable images back into the system. A complex and nuanced stance?! (Or just putting out FUD to bolster their own position, where they benefit if other gratis and open projects (like Stable Diffusion) get sidelined by actual or perceived legal issues? A little from column A, a little from column B?)

An interesting dilemma. Working hypothesis: all illustrators will be expected to work in all styles, since a single style is always too easy to copy. Graphic designers have done this for a century, adapting their style to the needs of each project, client, or product. Perhaps illustrators will be required to up their game?

Strangely, Butterick belittles the tool as merely a “collage tool,” and though he is actually a technical person, he unsurprisingly frames things as unfavorably as possible for the defendant, Stable Diffusion. A collage tool, but a very threatening one! He seems to miss some obvious things in this skewed framing. For example, he says the model compresses and stores lossy copies of 4 billion images. He compares this to lossy MP3s, which at 10% of the uncompressed file size are just passable auditorily (at 128 kbit/s quality, one minute of stereo MP3 from an original WAV file is about 1 MB, or about 70 MB for a full 700 MB uncompressed CD). A 1%-compressed MP3 sounds like this: (insert 12 kbit/s of ear-grating audio sandpaper). But in a model measured in gigabytes (a few billion bytes), each of 4 billion input images (512 × 512 × 3 ≈ 786 KB of raw RGB) has effectively been compressed down to about one byte, a factor of nearly a million, or more, depending on whether the input images are tiled into many smaller images or lossily downsampled first. Training billions of images into a Stable Diffusion model that can run on my laptop can only work because the system is learning structure that is shared across all of the images.
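
To check that arithmetic, here is a quick back-of-the-envelope in Swift, using the round numbers from the paragraph above (illustrative figures, not measurements):

```swift
// How much is each training image "compressed" into the model weights?
let imageCount = 4.0e9                     // ~4 billion training images
let bytesPerImage = 512.0 * 512.0 * 3.0    // ≈ 786 KB of raw RGB per image
let modelBytes = 4.0e9                     // a model "measured in gigabytes"

let totalInput = imageCount * bytesPerImage        // ≈ 3.1 petabytes
let compressionFactor = totalInput / modelBytes    // ≈ 786,432×
let bytesPerImageInModel = modelBytes / imageCount // ≈ 1 byte per image

print(compressionFactor, bytesPerImageInModel)     // 786432.0 1.0
```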

He is a lawyer tasked with using rhetoric to tear into SD as a creative tool, so he has to try to make the model look as unremarkable as possible, like mere copying that would be plainly illegal if a human did it. The better he does this, the more money the lawyers make. He may genuinely believe he is helping artists by trying to destroy SD as a creative tool. But we can never know, because he is now financially invested in the outcome (obligatory Upton Sinclair quote), so there is probably no way he could ever be convinced otherwise.

So, for example, as part of this die-hard rhetoric, he never mentions image scraping by Google for Google Image Search. Why not sue Google over that, for the big bucks? Has that ship sailed? Can people honestly make a consistent case that scraping public images is totally wrong and immoral under all circumstances? What about the longstanding, totally unenforceable situation where working artists regularly use image search to find reference images, collage or amalgamate them the old-fashioned way without paying for any of them, produce their own drawing or illustration, mostly but not truly from scratch, and hide their image-inspiration tracks? Now that computers can do this at scale, it is suddenly immoral? Weird.

I wonder why smart people take different sides when these new, innovative tools appear. I know other technologists (who have never seriously used the tools) who have reacted strongly and were surprised that not everyone thinks it is a clear case of an “absolute evil leviathan-versus-little-guy art heist.” The MPAA and RIAA spent tons of money over decades labeling “music piracy” (sharing) as “theft” because it hurt the fat-cat label execs’ bottom line and expensive middleman lifestyles (they were systematically fleecing and hurting artists far worse than any music sharing ever could), and the record execs purposely never mentioned the following: taking a chair from someone’s house leaves the house short one chair (same for an art museum after the heist of a specific piece), but walking into a museum, carefully measuring the dimensions of a chair, and going home and making one’s own duplicate does not. That creates two chairs, not one chair stolen and moved to a different place. Thus IP copying is not theft nor burglary nor larceny in the usual sense. It may be illegal or wrong in some sense, but it is not a zero-sum action.

Ironically, Butterick is mad at Microsoft for training GitHub Copilot on GitHub repos, but if the SD lawsuit succeeds, only financially larger players such as OpenAI (effectively a Microsoft subsidiary now, after a recent $10B infusion) will have the money to license input images to train models, and all tools that rely on those models will have to be paid for (unless you know someone who downloaded the old contraband models and can illegally share them with you). Eventually only big businesses would control AI image-art creation, which is exactly the nightmare Butterick and the other lawyers are supposedly trying to prevent with their rhetoric about stealing from artists. He may be doing OpenAI’s, and hence Microsoft’s, dirty work for them by killing off the small or free players such as Stable Diffusion.
