Winner-takes-all effects in autonomous cars

By Benedict Evans for a16z

There are now several dozen companies trying to make the technology for autonomous cars, across OEMs, their traditional suppliers, existing major tech companies and startups. Clearly, not all of these will succeed, but enough of them have a chance that one wonders what and where the winner-takes-all effects could be, and what kinds of leverage there might be. Are there network effects that would allow the top one or two companies to squeeze the rest out, as happened in smartphone or PC operating systems? Or might there be room for five or ten companies to compete indefinitely? And for which layers in the stack does victory give power in other layers?

These kinds of questions matter because they point to the balance of power in the car industry of the future. A world in which car manufacturers can buy commodity ‘autonomy in a box’ from any of half a dozen companies (or make it themselves), much as they buy ABS today, is very different from one in which Waymo and perhaps Uber are the only real options, and can set the business model of their choice, as Google did with Android. Microsoft and Intel found choke points in the PC world, and Google did in smartphones – what might those points be in autonomy?

To begin with, it seems pretty clear that the hardware and sensors for autonomy – and, probably, for electric – will be commodities. There is plenty of science and engineering in these (and a lot more work to do), just as there is in, say, LCD screens, but there is no reason why you have to use one rather than another just because everyone else is. There are strong manufacturing scale effects, but no network effect. So, LIDAR, for example, will go from a ‘spinning KFC bucket’ that costs $50k to a small solid-state widget at a few hundred dollars or less, and there will be winners within that segment, but there’s no network effect, while winning LIDAR doesn’t give leverage at other layers of the stack (unless you get a monopoly), any more than making the best image sensors (and selling them to Apple) helps Sony’s smartphone business. In the same way, it’s likely that batteries (and motors and battery/motor control) will be as much of a commodity as RAM is today – again, scale, lots of science and perhaps some winners within each category, but no broader leverage.

On the other hand, there probably won’t be direct parallels to the third party software developer ecosystems that we see in PCs or smartphones. Windows squashed the Mac and then iOS and Android squashed Windows Phone because of the virtuous circle of developer adoption above anything else, but you won’t buy a car (if you own a car at all, of course) based on how many apps you can run on it. They’ll all run Uber and Lyft and Didi, and have Netflix embedded in the screens, but any other apps will happen on your phone (or watch, or glasses).

Rather, the place to look is not within the cars directly but still further up the stack – in the autonomous software that enables a car to move down a road without hitting anything, in the city-wide optimisation and routing that mean we might automate all cars as a system, not just each individual car, and in the on-demand fleets of ‘robo-taxis’ that will ride on all of this. The network effects in on-demand are self-evident, but will get much more complex with autonomy (which will cut the cost of an on-demand ride by three quarters or more). On-demand robo-taxi fleets will dynamically pre-position their cars, and both these and quite possibly all other cars will co-ordinate their routes in real time for maximum efficiency, perhaps across fleets, to avoid, for example, all cars picking the same route at the same time. This in turn could be combined not just with surge pricing but with all sorts of differential road pricing – you might pay more to get to your destination faster in busy times, or pick an arrival time by price.
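
To make the co-ordination idea concrete, here is a toy sketch of a fleet planner that assigns routes while accounting for the congestion its own assignments create, rather than letting every car independently pick the ‘fastest’ route. All names and numbers here are invented for illustration – a real fleet-routing system would be vastly more sophisticated.

```python
# Toy congestion-aware route assignment: a central planner hands out
# routes one car at a time, and the estimated travel time of a route
# rises once the planner has already loaded it past capacity.

def assign_routes(cars, candidate_routes, base_minutes, capacity):
    load = {r: 0 for r in candidate_routes}
    assignment = {}
    for car in cars:
        # Effective time grows once a route is over its capacity.
        def eta(route):
            congestion = max(0, load[route] - capacity[route])
            return base_minutes[route] * (1 + 0.2 * congestion)
        best = min(candidate_routes, key=eta)
        assignment[car] = best
        load[best] += 1
    return assignment

routes = ["bridge", "tunnel"]
print(assign_routes(
    cars=list(range(10)),
    candidate_routes=routes,
    base_minutes={"bridge": 10, "tunnel": 12},
    capacity={"bridge": 3, "tunnel": 5},
))
```

Run it and the planner starts diverting cars to the tunnel once the bridge fills up – exactly the ‘don’t all pick the same route’ behaviour described above, in miniature.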

From a technological point of view, these three layers (driving, routing & optimisation, and on-demand) are largely independent – you could install the Lyft app in a GM autonomous car and let the pre-installed Waymo autonomy module drive people around, hypothetically. Clearly, some people hope there will be leverage across layers, or perhaps bundling – Tesla says that it plans to forbid people from using its autonomous cars with any on-demand service other than its own. This doesn’t work the other way – Uber won’t insist you use only its own autonomous systems. But though Microsoft cross-leveraged Office and Windows, both of these won in their own markets with their own network effects: a small OEM insisting you use its small robo-taxi service would be like Apple insisting you buy AppleWorks instead of Microsoft Office in 1995. I suspect that a more neutral approach might prevail. This would especially be the case if we have cross-city co-ordination of all vehicles, or even vehicle-to-vehicle communication at junctions – you would need some sort of common layer (though my bias is always towards decentralised systems).

All this is pretty speculative, though, like trying to predict what traffic jams would look like from 1900. The one area where we can talk about what the key network effects might look like is in autonomy itself. This is about hardware, and sensors, and software, but mostly it’s about data, and there are two sorts of data that matter for autonomy – maps and driving data. First, ‘maps.’

Our brains are continuously processing sensor data and building a 3D model of the world around us, in real time and quite unconsciously, such that when we run through a forest we don’t trip over a root or bang our head on a branch (mostly). In autonomy this is referred to as SLAM (Simultaneous Localisation And Mapping) – we map our surroundings and localise ourselves within them. This is obviously a basic requirement for autonomy – AVs need to work out where they are on the road and what features might be around (lanes, turnings, curbs, traffic lights etc), and they also need to work out what other vehicles are on the road and how fast they’re moving.

Doing this in real time on a real road remains very hard. Humans drive using vision (and sound), but extracting a sufficiently accurate 3D model of your surroundings from imaging alone (especially 2D imaging) remains an unsolved problem: machine learning makes it conceivable but no-one can do it yet with the accuracy necessary for driving. So, we take shortcuts. This is why almost all autonomy projects are combining imaging with 360 degree LIDAR: each of these sensors has its limitations, but by combining them (‘sensor fusion’) you can get a complete picture. Building a model of the world around you with imaging alone will certainly be possible at some point in the future, but using more sensors gets you there a lot quicker, even given that you have to wait for the cost and form factor of those sensors to become practical. That is, LIDAR is a shortcut to get to a model of the world around you. Once you’ve got that, you often use machine learning to understand what’s in it – that shape is a car, or a cyclist – but for this, there doesn’t seem to be a network effect (or at least not a strong one): you can get enough images of cyclists yourself without needing a fleet of cars.
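
As a rough illustration of what ‘sensor fusion’ means at its simplest: the camera is good at saying what something is, and LIDAR is good at saying where it is, so you can match detections from one against returns from the other. This is a deliberately naive sketch with invented inputs – real fusion stacks work on raw point clouds with probabilistic tracking.

```python
# Naive object-level sensor fusion: pair each camera detection (which
# has a label but poor depth) with the LIDAR return closest to it in
# bearing (which has precise range but no label).

def fuse(camera_detections, lidar_points, max_angle_diff=0.05):
    """camera_detections: list of (bearing_radians, label)
    lidar_points: list of (bearing_radians, range_metres)"""
    fused = []
    for cam_bearing, label in camera_detections:
        nearest = min(lidar_points, key=lambda p: abs(p[0] - cam_bearing))
        if abs(nearest[0] - cam_bearing) <= max_angle_diff:
            fused.append({"label": label,
                          "bearing": nearest[0],
                          "range_m": nearest[1]})
    return fused

# One cyclist seen by the camera, two LIDAR returns: the fused output
# is a labelled object with a real distance.
print(fuse([(0.10, "cyclist")], [(0.09, 12.5), (1.2, 40.0)]))
```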

If LIDAR is one shortcut to SLAM, the other and more interesting one is to use prebuilt maps, which actually means ‘high-definition 3D models’. You survey the road in advance, process all the data at leisure, build a model of the street and then put it onto any car that’s going to drive down the road. The autonomous car doesn’t now have to process all that data and spot the turning or traffic light against all the other clutter in real-time at 65 miles an hour – instead it knows where to look for the traffic light, and it can take sightings of key landmarks against the model to localise itself on the road at any given time. So, your car uses cameras and LIDAR to work out where it is on the road and where the traffic signals etc are by comparing what it can see with a pre-built map instead of having to do it from scratch, and also uses those inputs to spot other vehicles around it in real time.
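
A toy version of that localisation step, with invented coordinates: each sighting of a surveyed landmark implies a position for the car, and you combine the sightings. A real system would solve this probabilistically with the full SLAM machinery, but the principle is the same.

```python
# Map-based localisation in miniature: the pre-built map knows where
# each landmark is in the world; the car measures how far away each
# landmark appears, and works backwards to its own position.

def localise(sightings, map_landmarks):
    """sightings: {landmark_id: (dx, dy)} offsets the car measures
    map_landmarks: {landmark_id: (x, y)} surveyed world positions"""
    estimates = []
    for lid, (dx, dy) in sightings.items():
        lx, ly = map_landmarks[lid]
        # If the landmark is at (lx, ly) and appears (dx, dy) away,
        # the car must be at (lx - dx, ly - dy).
        estimates.append((lx - dx, ly - dy))
    n = len(estimates)
    return (sum(e[0] for e in estimates) / n,
            sum(e[1] for e in estimates) / n)

hd_map = {"light_17": (120.0, 45.0), "sign_03": (130.0, 40.0)}
seen = {"light_17": (20.1, 5.2), "sign_03": (29.8, 0.1)}
print(localise(seen, hd_map))  # roughly (100, 40)
```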

Maps have network effects. When any autonomous car drives down a pre-mapped road, it is both comparing the road to the map and updating the map: every AV can also be a survey car. If you have sold 500,000 AVs and someone else has only sold 10,000, your maps will be updated more often and be more accurate, and so your cars will have less chance of encountering something totally new and unexpected and getting confused. The more cars you sell the better all of your cars are – the definition of a network effect.
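
Some hypothetical back-of-envelope arithmetic makes the point: if map updates come from your own cars driving past, the average staleness of any given road segment falls in direct proportion to fleet size. Every number below is invented purely for illustration.

```python
# How stale is the map, on average, for a given fleet size? If a
# segment is re-surveyed every time one of your cars drives over it,
# the mean age of the data is half the average gap between passes.

def avg_map_staleness_hours(fleet_size, passes_per_car_per_day, segments):
    passes_per_segment_per_day = (
        fleet_size * passes_per_car_per_day / segments)
    return 24 / passes_per_segment_per_day / 2

for fleet in (10_000, 500_000):
    hours = avg_map_staleness_hours(fleet, 20, 1_000_000)
    print(f"{fleet:>7} cars -> map ~{hours:.1f} hours old on average")
```

On these made-up numbers, the 500,000-car fleet’s map is roughly fifty times fresher than the 10,000-car fleet’s – which is the network effect in one line of arithmetic.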

The risk here is that in the long term it is possible that just as cars could do SLAM without LIDAR, they could also do it without pre-built maps – after all, again, humans do. When and whether that would happen is unclear, but at the moment it appears that it would be long enough after autonomous cars go on sale that all the rest of the landscape might look quite different as well (that is, 🤷‍♂️).

So, maps are the first network effect in data – the second comes in what the car does once it understands its surroundings. Driving on an empty road, or indeed on a road full of other AVs, is one problem, once you can see it, but working out what the other humans on the road are going to do, and what to do about it, is another problem entirely.

One of the breakthroughs supporting autonomy is that machine learning should work very well for this: instead of trying to write complex rules explaining how you think that people will behave, machine learning uses data – the more the better. The more data that you can collect of how real drivers behave and react in the real world (both other drivers and the drivers of your survey vehicles themselves), the better your software will be at understanding what is going on around it and the better it will be at planning what to do next. Just as for maps, before launch your test cars collect this data, but after launch, every car that you sell is collecting this data and sending it home. So, just as for maps, the more cars you sell the better all of your cars are – the definition of a network effect.
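
A minimal sketch of what ‘learning behaviour from data’ could mean in its very simplest form – counting how often logged human drivers actually yielded in a given kind of situation, with estimates that sharpen as the fleet grows. Production systems use far richer models; the situation labels here are invented.

```python
# The crudest possible behaviour model: tally what human drivers
# actually did in each (coarsely described) situation, and predict
# from the frequencies. More fleet data -> sharper estimates.

from collections import defaultdict

class YieldPredictor:
    def __init__(self):
        # situation -> [times yielded, total observations]
        self.counts = defaultdict(lambda: [0, 0])

    def observe(self, situation, yielded):
        c = self.counts[situation]
        c[0] += int(yielded)
        c[1] += 1

    def p_yield(self, situation):
        yielded, total = self.counts[situation]
        # Laplace smoothing: unseen situations default to 50/50.
        return (yielded + 1) / (total + 2)

model = YieldPredictor()
for _ in range(900):
    model.observe(("4-way-stop", "arrived-first"), yielded=False)
for _ in range(100):
    model.observe(("4-way-stop", "arrived-first"), yielded=True)
print(model.p_yield(("4-way-stop", "arrived-first")))  # ~0.10
```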

Driving data also has another, secondary use: simulation. This seeks to solve the question “if X happens, how will our autonomous software react?” One way to do this is by making an AV and letting it drive itself around the city all day to see how it reacts to whatever random things any other drivers happen to do. The problem is that this is not a controlled experiment – you can’t rerun a scenario with new software to see what changes and whether any problems have been fixed. Hence, a great deal of effort is now going into simulation – you put your AV software into Grand Theft Auto (almost literally) and test it on whatever you want. This doesn’t necessarily capture some things (“will the LIDAR detect that truck?”), and some simulation scenarios would be circular, but it does tell you how your system will react to defined situations, and you can collect those situations from your real-world driving data. So, there is an indirect network effect: the more real world driving data that you have, the more accurate you can make your simulation and therefore the better you can make your software. There are also clear scale advantages to simulation, in how much computing resource you can afford to devote to this, how many people you have working on it, and how much institutional expertise you have in large computing projects. Being part of Google clearly gives Waymo an advantage: it reports driving 25,000 ‘real’ autonomous miles each week, but also one billion simulated miles in 2016 (an average of 19 million miles a week).
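
In outline, a scenario-replay harness might look something like this: situations harvested from real driving logs become a regression suite that each new release of the driving software is run against. The scenario format and ‘policy’ function below are stand-ins invented for illustration.

```python
# Scenario replay as a regression test: rerun the same harvested
# situations against each software release and flag any that now
# produce the wrong reaction.

SCENARIOS = [
    {"name": "cut-in at 65mph", "gap_m": 8, "expected": "brake"},
    {"name": "pedestrian at crossing", "gap_m": 30, "expected": "slow"},
]

def policy_v2(scenario):
    # Stand-in for the real driving stack under test.
    if scenario["gap_m"] < 10:
        return "brake"
    return "slow"

def regression_test(policy, scenarios):
    return [s["name"] for s in scenarios if policy(s) != s["expected"]]

print(regression_test(policy_v2, SCENARIOS))  # [] means no regressions
```

Unlike a day of random city driving, this is a controlled experiment: change the software, rerun the exact same scenarios, and see precisely what changed.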

It could be argued that Tesla has a lead in both maps and driving data: since late 2016, all of its new cars have eight cameras giving a near-360 degree field of view, supplemented by a forward-facing radar (there is also a set of ultrasonic sensors, which have pretty short range and are mostly used for parking). All of those can collect both mapping and driver behaviour data and send it back to Tesla, and it appears that Tesla has very recently begun actually collecting some of this. The catch is that since the radar only points forwards, Tesla will have to use imaging alone to build most of the model of the world around itself, but, as I noted above, we don’t yet know how to do that accurately. This means that Tesla is effectively collecting data that no-one today can read (or at least, read well enough to produce a complete solution). Of course, you would have to solve this problem both to collect the data and actually to drive the car, so Tesla is making a big contrarian bet on the speed of computer vision development. Tesla saves time by not waiting for cheap/practical LIDAR (it would be impossible for Tesla to put LIDAR on all of its cars today), but doing without LIDAR means the computer vision software will have to solve harder problems and so could well take longer. And if all the other parts of the software for autonomy – the parts that decide what the car should actually do – take long enough, then LIDAR might get cheap and practical long before autonomy is working anyway, making Tesla’s shortcut irrelevant. We’ll see.

So, the network effects – the winner-takes-all effects – are in data: in driving data and in maps. This prompts two questions: who gets that data, and how much do you need?

Ownership of the data is an interesting power and value chain question. Obviously Tesla plans to make all of the significant parts of the technology itself and put it in its own cars, so it owns the data as well. But some OEMs have argued that it’s their vehicle and their customer relationship, so it’s their data to own and allocate, and not for any technology partners. This looks like a reasonable position to take with regard to a sensor vendor: I’m not sure that it’s sustainable to sell commodity GPUs, cameras or LIDAR on their own and want to keep the data. But the company that makes the actual autonomous unit itself needs to have the data, because that’s how it works. If you don’t cycle the data back into the technology it can’t improve. This means that the OEM is generating network value for a supplier without getting any of that value itself, except in the form of better autonomy, but that better autonomy becomes a commodity across all products from any OEM using it. This is the position of PC or Android OEMs: they create the network effect by agreeing to use the software in their products, and this makes it possible to sell their products, but their product has become a near-commodity with the network value going to the tech company. It’s a virtuous circle where most of the value goes to the vendor, not the OEM. This of course is why most car OEMs want to make it themselves: they don’t want to end up like Compaq.

This leads me to the final question: how much data do you really need? Does the system get better more or less indefinitely as you add more data, or is there an S-Curve – is there a point at which adding more data has diminishing returns?

That is – how strong is the network effect? 

This is a pretty obvious question for maps. What density of cars with what frequency do you need for the maps to be good enough, and what minimum market share does that translate to? How many participants does the market have room for? Could ten companies have this, or two? Could a bunch of second-tier OEMs get together and pool all of their mapping data? Can delivery trucks sell their data just as they sell other kinds of mapping data today? Again, this isn’t like consumer software ecosystems – RIM and Nokia couldn’t pool BlackBerry and S60 user bases, but you could pool maps. Is this a barrier to entry or a condition of entry?

This question also applies to driving data, and indeed to all machine learning projects: at what point are there diminishing returns as you add more data, at what point does the curve flatten, and how many people can get that amount of data? For, say, general purpose search, the improvement does seem indefinite – the answers can (almost) always get more relevant. But for autonomy, it does seem as though there should be a ceiling – if a car can drive in Naples for a year without ever getting confused, how much more is there to improve? At some point you’re effectively finished. So, a network effect means that your product gets better if you have more users, but how many users do you need before the product stops getting significantly better? How many cars do you need to sell before your autonomy is as good as the best on the market? How many companies might be able to reach that? And meanwhile, machine learning itself is changing quickly – one cannot rule out the possibility that the amount of data you need to get autonomy working might shrink dramatically.
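
One way to make that question concrete: machine-learning error rates often fall roughly as a power law in data volume, something like error ≈ a·N^(-b). If anything like that holds for driving (a big if – the exponent below is invented), the marginal value of each extra tranche of miles shrinks quickly:

```python
# A made-up power-law learning curve, to show the shape of diminishing
# returns: each tenfold increase in data buys a smaller absolute
# improvement than the one before.

def error_rate(miles, a=1.0, b=0.3):
    return a * miles ** -b

for miles in (1e6, 1e7, 1e8, 1e9):
    gain = error_rate(miles) - error_rate(miles * 10)
    print(f"{miles:>13,.0f} miles: error {error_rate(miles):.4f}, "
          f"gain from 10x more data {gain:.4f}")
```

Where that curve effectively flattens relative to ‘good enough to drive in Naples’ is exactly the question of how many companies can clear the bar.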

Implicit in all of this, finally, is an assumption that there is even such a thing as better and worse autonomy. But what would ‘worse’ autonomy mean? Would it mean you are slightly more likely to die, or just that the car is more likely to get confused, pull over to the side of the road and connect to a remote support centre for a human operator to take over? Would manual controls burst out of a console in a shower of polystyrene packaging, and would the car make encouraging comments?

The answer, I suspect, is that Level 5 will come as an evolution out of Level 4 – that every car will have manual controls, but they will be used less and less, and explicit Level 5 will emerge in stages, as the manual controls shrink, and then are hidden, and then removed – they atrophy. This will probably come scenario by scenario – we might have Level 5 for Germany before Naples, or Moscow. This would mean that the data was being collected at network scale and used well before full autonomy.

We can’t really know the answers to these questions now. Very few people in the field expect full, ‘Level 5’ autonomy within the next five years, and most tend closer to ten. However, these questions point to a range of outcomes that would have dramatically different implications for the car industry. At one extreme, it might be that network effects are relatively weak and there are five or ten companies with a viable autonomy platform. In this case, the car industry would buy autonomy as a component at a price much like ABS, air bags or satnav today. It would still face radical change – autonomy means the cost of an on-demand ride falls by at least three quarters, which would make many people reconsider car ownership, while the shift to electric reduces the number of moving parts in a car by a factor of five to ten, totally changing the engineering dynamics, supplier base and barriers to entry. But it wouldn’t get Androided. At the other extreme, only Waymo gets it working, and the industry would look very different.