Howard Butler AWS Public Dataset 3DEP Data Lead
Lidar News recently had the opportunity to interview Howard Butler of Hobu, Inc., who, along with researchers from USACE CRREL, worked with the USGS and the Amazon Web Services (AWS) Public Dataset Team to create free, public access to the 3D Elevation Program (3DEP) map database in the cloud. If you want to see some amazing datasets, be sure to have a look.
How did this idea come about?
Howard Butler – A couple of things came together to trigger this idea. First, nearly two years ago, the Amazon Web Services (AWS) Public Dataset team and the District of Columbia’s CTO released the DC LiDAR dataset as an AWS Public Dataset. At the time, we used our open source Greyhound and Entwine software to process that dataset into a web-viewable, interactive visualization using Potree.
Through our collaboration with CRREL, we learned that USGS had begun pushing full density 3DEP data into a Requester Pays bucket. Its purpose was to allow anyone to process the data at scale in AWS. We had been continuing to prototype and deploy large scale point cloud data management solutions with CRREL, and we were looking for a way to test the deployment of our tools on a very large collection of data. The 3DEP data was well suited to that task, and making the resulting output available as an AWS Public Dataset allows our efforts to serve the LiDAR and elevation community as well.
I approached the AWS Public Datasets (PDS) team about a project that would take the Requester Pays USGS 3DEP bucket and process it into Entwine Point Tiles (EPT), an open, static, implicit, and lossless octree-like tile service for point cloud data. The PDS team asked for a proposal and gave us a grant to process the data and make it available as a free-to-download-and-use AWS Public Dataset.
What was your involvement with this effort?
Howard Butler – I came up with the idea to process the 3DEP data into an AWS Public Dataset, and I led the administrative churn to put the effort in motion. Connor Manning of Hobu, Inc. developed the systems and tooling used to monitor and process the data transformation task with Entwine, an open source software project of which he is also the lead developer. Jason Stoker of USGS helped route 3DEP data questions, and David Finnegan of USACE CRREL supported our time to complete the processing task.
How does this make it easier for people to access/use 3DEP data?
Howard Butler – EPT is a static octree tiling data organization for really large point cloud collections. It is an open specification that leverages the widely supported LASzip data encoding, and it allows applications to query both breadth and depth, which frees them from fixed, full-depth tiling schemes. This organization can easily support visualization, like we did with the website, but it can also give processing scenarios options to control query pace and query patterns through the data. The first step in many processing algorithms is reorganizing the data into some kind of spatial structure, and EPT does that work up front in a commonly desired one, an octree.
EPT is different than some other static web service approaches such as Cesium 3D Tiles or Esri I3S that can support point clouds. First, EPT only supports point cloud data and it is not a generic container for hierarchically organized geospatial data. Second, EPT can leverage LASzip, the most common compressed encoding for geospatial point clouds for storage, to eliminate the need to support another data encoding. Third, EPT can losslessly store points to any tree depth with all their attribute data and any supporting JSON metadata.
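The breadth-and-depth querying described above can be illustrated with a minimal sketch. It assumes the node addressing used by the EPT specification, where each node is named D-X-Y-Z (depth, then grid position) and each node splits into eight children at the next depth. The function names, the bounds tuple layout, and the sample numbers are hypothetical and chosen only for illustration. A real client would also read the EPT hierarchy files to learn which nodes actually exist; this sketch only enumerates candidate nodes.

```python
from typing import List, Tuple

Key = Tuple[int, int, int, int]  # (depth, x, y, z)
Bounds = Tuple[float, float, float, float, float, float]  # xmin, ymin, zmin, xmax, ymax, zmax


def node_bounds(root: Bounds, key: Key) -> Bounds:
    """Bounds of a node, found by splitting the root cube into 2**depth cells per axis."""
    depth, x, y, z = key
    cells = 2 ** depth
    xmin, ymin, zmin, xmax, ymax, zmax = root
    sx, sy, sz = (xmax - xmin) / cells, (ymax - ymin) / cells, (zmax - zmin) / cells
    return (xmin + x * sx, ymin + y * sy, zmin + z * sz,
            xmin + (x + 1) * sx, ymin + (y + 1) * sy, zmin + (z + 1) * sz)


def children(key: Key) -> List[Key]:
    """The eight candidate child keys one level deeper."""
    depth, x, y, z = key
    return [(depth + 1, 2 * x + a, 2 * y + b, 2 * z + c)
            for a in (0, 1) for b in (0, 1) for c in (0, 1)]


def overlaps(a: Bounds, b: Bounds) -> bool:
    """True when two axis-aligned boxes intersect (touching counts)."""
    return all(a[i] <= b[i + 3] and b[i] <= a[i + 3] for i in range(3))


def select_keys(root: Bounds, query: Bounds, max_depth: int) -> List[Key]:
    """Candidate nodes overlapping the query box, limited to max_depth."""
    selected, stack = [], [(0, 0, 0, 0)]
    while stack:
        key = stack.pop()
        if not overlaps(node_bounds(root, key), query):
            continue  # prune this whole subtree
        selected.append(key)
        if key[0] < max_depth:
            stack.extend(children(key))
    return sorted(selected)


def key_name(key: Key) -> str:
    """Format a key in the D-X-Y-Z form."""
    return "-".join(str(v) for v in key)


if __name__ == "__main__":
    root = (0.0, 0.0, 0.0, 8.0, 8.0, 8.0)   # hypothetical cubic root bounds
    query = (0.0, 0.0, 0.0, 1.0, 1.0, 1.0)  # hypothetical area of interest
    print([key_name(k) for k in select_keys(root, query, max_depth=1)])
```

Raising max_depth returns more detail for the same area, while enlarging the query box returns more area at the same detail. That is the trade-off a viewer or processing job gets to control.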
The AWS Public Datasets team has graciously made the entire USGS 3DEP EPT collection free to use, and applications are welcome to start pulling point cloud data from it. For applications that want bulk access to entire collections, the full density data in the Requester Pays bucket is going to be more convenient and faster. For applications that want to visualize, quickly scan and pan across data at adaptive or low resolutions, or simply summarize small areas of large collections at full density, EPT is a great solution.
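As one way to picture the small-area, reduced-resolution access pattern described above, here is a minimal sketch that builds a PDAL pipeline definition as JSON. It assumes PDAL's readers.ept stage and its filename, bounds, and resolution options as documented by PDAL. The URL, the coordinates, the output file name, and the ept_pipeline helper are placeholders, not real 3DEP resource names. The sketch only generates the JSON and does not run PDAL.

```python
import json
from typing import Optional, Tuple


def ept_pipeline(ept_url: str,
                 bounds: Tuple[float, float, float, float],
                 resolution: Optional[float] = None,
                 out_file: str = "subset.las") -> str:
    """Return PDAL pipeline JSON that reads a 2D window from an EPT resource."""
    xmin, ymin, xmax, ymax = bounds
    reader = {
        "type": "readers.ept",
        "filename": ept_url,
        # PDAL expects bounds as "([xmin, xmax], [ymin, ymax])"
        "bounds": f"([{xmin}, {xmax}], [{ymin}, {ymax}])",
    }
    if resolution is not None:
        # Limits how deep into the octree the reader descends
        reader["resolution"] = resolution
    writer = {"type": "writers.las", "filename": out_file}
    return json.dumps({"pipeline": [reader, writer]}, indent=2)


if __name__ == "__main__":
    # Placeholder URL and coordinates; substitute a real EPT endpoint and window
    print(ept_pipeline("https://example.com/some-collection/ept.json",
                       (0, 0, 10, 20), resolution=5.0))
```

The printed JSON could be saved to a file and passed to the pdal pipeline command. Omitting resolution requests the window at full density.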
What were some of the challenges you had to overcome?
Howard Butler – The 3DEP data is not a monolith of consistency – especially for older collections that did not benefit from the LiDAR Base Specification and activities that sought to harmonize the consistency of the data products. We estimate we were able to conveniently process about two thirds of the data, but another third of the data has challenges with consistency that prevented hands-off processing. We may yet go back and catch up some of these collects, where many were simply missing coordinate system information, but we want to make sure people are using the existing collection with some regularity before continuing the build out.
The other obvious challenge is simply the volume of data. Getting through terabytes of data is tough, even with the tools of the cloud at your disposal. Hobu, Inc. has been at it for a while, and we were excited to be able to construct services that had the potential to unlock raw LiDAR data content that most people outside our industry know little about. More specifically, encountering these challenges and developing solutions to overcome them was much of what the project was about, in support of our activities with CRREL. We support them in developing continental-scale point cloud management solutions for the GRiD project, and we are deploying these tools in that context as well.
Were there some lessons learned?
Howard Butler – Consistent collection metadata is extremely important. The 3DEP Requester Pays bucket currently holds more than 1.4 million files, and there is no way for us to inspect or manage them individually. Without correct and consistent metadata, the ability of tools to adaptively process the data is severely hampered.
We also learned a lot about heuristics and parameter tuning suited for processing such large aerial collections with Entwine. That experience with such a large corpus allowed our processing efficiency to improve as we worked through more data.
How can we spread the word and encourage people to use it?
Howard Butler – Ask your favorite application for EPT support. EPT is a completely open specification, based on an open encoding that nearly every application already supports (LASzip), and for applications that can stream or progressively access point cloud data, support for EPT should be a natural fit. Open source tools such as Potree and PDAL already support it, and commercial ones such as FME have announced upcoming support.
I’m thankful to AWS for giving our industry free access to the data to build the applications we need upon it. If you build something interesting from it, please let us know. Write a blog post or let the AWS Public Datasets team know how this capability enabled you to build something you otherwise could not have achieved.