By Jake Magner
Published on September 30, 2024
You may have heard that companies are selling data. Do you ever wonder where you can buy it?
AWS Data Exchange is a marketplace where companies list datasets for sale. You can view samples, subscribe to datasets, and then have them delivered to AWS services such as Amazon S3 and Amazon Redshift.
A wide range of datasets is available on AWS Data Exchange, including:
Financial
Automotive
Retail
Location Data
Public Sector
and more
Vendors have long built and sold datasets. Historically, most of these sales happened offline through an enterprise-style sales process. That is still common, but some of those datasets are now available on AWS Data Exchange and other platforms, where they are easier to browse, purchase instantly, and integrate into your existing data ecosystem.
First, you need an AWS account. Open AWS Data Exchange by typing “data exchange” into the AWS console search bar, then browse the available datasets.
Click on a dataset to learn more about it as well as the company providing that dataset. Some will include samples of the data or a data dictionary explaining all the columns.
You can then click Subscribe. Depending on the dataset, you may get access right away, or you may have to fill out a request and wait for the provider to approve it.
Your existing subscriptions appear in the navigation pane on the left. Click one to see the datasets it contains, then click through a dataset to find the delivery options for where you want to receive the data (for example, as files in an S3 bucket in your account).
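The console steps above can also be scripted. Below is a minimal Python sketch built on the AWS Data Exchange API operations `ListDataSets`, `ListDataSetRevisions`, `ListRevisionAssets`, `CreateJob`, and `StartJob`, as exposed by boto3's `dataexchange` client. It assumes the subscribed dataset delivers files (S3 snapshot assets), that you want its newest revision, and that you already own a destination bucket; the helper function names, bucket, and key prefix are hypothetical. The functions take the client as a parameter so they can be tried without AWS access.

```python
from typing import Any, Callable, Dict, List


def _collect(call: Callable[..., Dict[str, Any]], key: str, **kwargs: Any) -> List[Dict[str, Any]]:
    """Call a paginated AWS Data Exchange List* operation, following NextToken."""
    items: List[Dict[str, Any]] = []
    while True:
        page = call(**kwargs)
        items.extend(page.get(key, []))
        token = page.get("NextToken")
        if not token:
            return items
        kwargs["NextToken"] = token


def list_entitled_data_sets(client: Any) -> List[Dict[str, Any]]:
    """Return the data sets you are subscribed to (Origin='ENTITLED')."""
    return _collect(client.list_data_sets, "DataSets", Origin="ENTITLED")


def build_export_details(data_set_id: str, revision_id: str,
                         assets: List[Dict[str, Any]], bucket: str,
                         prefix: str = "") -> Dict[str, Any]:
    """Build the Details payload for an EXPORT_ASSETS_TO_S3 job, one S3 key per asset."""
    return {
        "ExportAssetsToS3": {
            "DataSetId": data_set_id,
            "RevisionId": revision_id,
            "AssetDestinations": [
                # Hypothetical key layout: prefix + the asset's name.
                {"AssetId": a["Id"], "Bucket": bucket, "Key": prefix + a["Name"]}
                for a in assets
            ],
        }
    }


def export_latest_revision(client: Any, data_set_id: str, bucket: str,
                           prefix: str = "") -> str:
    """Export every asset in the newest revision to S3 and return the started job's ID."""
    revisions = _collect(client.list_data_set_revisions, "Revisions",
                         DataSetId=data_set_id)
    if not revisions:
        raise ValueError(f"data set {data_set_id} has no revisions yet")
    latest = max(revisions, key=lambda r: r["CreatedAt"])
    assets = _collect(client.list_revision_assets, "Assets",
                      DataSetId=data_set_id, RevisionId=latest["Id"])
    job = client.create_job(
        Type="EXPORT_ASSETS_TO_S3",
        Details=build_export_details(data_set_id, latest["Id"], assets, bucket, prefix),
    )
    # A newly created job waits until it is explicitly started.
    client.start_job(JobId=job["Id"])
    return job["Id"]
```

In practice you would pass `boto3.client("dataexchange", region_name=...)` as `client`, using the Region where the dataset lives, and then poll `client.get_job(JobId=job_id)` until its `State` is `COMPLETED`. Revisions with many assets may need to be split across several export jobs, so check the service quotas before relying on a single job.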
You’ll find that while some datasets can be purchased and accessed almost instantly, others are only teaser samples, and the full dataset requires talking to the provider’s sales team.
AWS Data Exchange itself is free. You pay for dataset subscriptions, and prices range from free to hundreds of thousands of dollars. Many datasets are free, either because they contain public data or because the provider is giving something away so you can try it, in the hope of later upselling you on other datasets they sell.
AWS Data Exchange is a data marketplace, but the term “data marketplace” can mean different things in different contexts. Often, when people talk about a data marketplace, they mean an internal application that shows the data products available within their enterprise. These data products are created by internal teams and shared with other teams. AWS Data Exchange, by contrast, offers datasets from a range of global organizations.
Both types of marketplace showcase data products you might want to access. In the internal case, data is not sold, just shared. Depending on how your company has set it up, though, you might still have to request access and go through an approval process.
AWS Data Exchange is not the only option. Several similar platforms support the discovery and acquisition of datasets from third-party providers. For example, Snowflake has a comparable platform, Snowflake Marketplace, where users can browse data products.
By contrast, Datarade is a smaller company more focused on this specific use case; they have dataset samples and a smooth user interface for browsing. You can also find datasets from many governments on their own sites (like data.gov for US datasets).
Many datasets are cross-listed on several marketplaces, so you can pick the listing that is most compatible with your tool stack (AWS versus Snowflake, for example).
All sorts of companies may buy useful datasets for various reasons. Some common examples are insurance companies and hedge funds.
Insurance companies are trying to evaluate risk. They experiment with different datasets to see whether they predict risk and, if so, incorporate them into their models. For example, an insurance company exploring coverage for wildfire protection may analyze a dataset on roofing materials, since some roofing materials resist fire better than others.
Hedge funds seek a unique competitive edge in predicting market trends. They often use “alternative data,” so called because it is separate from the standard financial data that companies produce for investors. One example is anonymized cell phone location data. A hedge fund may use it to track trends in visits to retail stores, which could inform trades in a specific stock or help estimate macro trends, such as consumer spending, before the government publishes official numbers.
While it depends, there are some common reasons that providers may choose not to make their datasets available online and continue to sell them through other mechanisms:
Data purchase agreements often contain legal provisions about what you are allowed to do with the dataset. Those terms may vary by dataset and use case, so the deal often has to go through an offline legal review.
Buyers often haven’t realized all the ways they might get value from a dataset, or they lack the expertise to use it the way they want. In these cases, providers can’t expect them to purchase through a self-service marketplace. Instead, they prefer to start a dialogue with the potential buyer to illustrate possible uses and offer other services that help the buyer build a complete solution.
Some data sellers find that offering their datasets in a self-serve way cannibalizes their existing business. For example, they may do better bundling several datasets into one offering that doesn’t fit the marketplace model; when they sell the same datasets à la carte, their overall profit drops.
Some data providers run older tech stacks and simply haven’t prioritized moving their data onto these platforms. They may also sell to companies that aren’t prepared to access data through these systems; some data deliveries have run as weekly SFTP transfers for many years. Vendors are emerging that make it easier for providers to sell on a marketplace, so this reason will likely matter less over time.
If you already purchase datasets from a vendor through another mechanism, you might be able to purchase through AWS Data Exchange instead and make use of AWS credits you have.
Before hitting “buy” on AWS Data Exchange, it’s worth checking other marketplaces, since you might find the same dataset for free elsewhere. Many vendors take publicly available, free data, format it nicely, and sell it. If your budget is tight, research where the vendor sourced the dataset; you might be able to get the underlying data for free.
However, keep in mind that cleaning and formatting data (and resolving issues in it) is real work. It’s often better to pay someone for it, since they are probably much more efficient at it than you will be. For example, a lot of weather forecast data is based on free government data. But that free data is enormously complex: it comes as readings from specific weather stations rather than the simple table (timestamp, zip code, temperature) you might get from a vendor.
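To make the weather example concrete, here is a minimal Python sketch of one small piece of that work: rolling raw per-station readings up into an hourly (timestamp, zip code, temperature) table. The record layout (`station_id`, `timestamp`, `temp_c`), the `station_to_zip` mapping, and the choice to average readings within each hour are hypothetical assumptions for illustration. Real government weather feeds involve much more, such as quality flags, unit conversions, and stations that don’t map neatly to a single zip code.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean
from typing import Dict, List, Optional, Tuple


def to_zip_table(readings: List[Dict],
                 station_to_zip: Dict[str, str]) -> List[Tuple[datetime, str, float]]:
    """Average raw station readings into sorted (hour, zip code, temperature) rows.

    Assumes each reading is a dict with hypothetical keys 'station_id',
    'timestamp' (a datetime), and 'temp_c' (float or None).
    """
    buckets: Dict[Tuple[datetime, str], List[float]] = defaultdict(list)
    for r in readings:
        zip_code: Optional[str] = station_to_zip.get(r["station_id"])
        if zip_code is None or r["temp_c"] is None:
            continue  # skip stations we can't place and missing readings
        # Truncate the timestamp to the start of its hour.
        hour = r["timestamp"].replace(minute=0, second=0, microsecond=0)
        buckets[(hour, zip_code)].append(r["temp_c"])
    return sorted((hour, z, round(mean(temps), 1)) for (hour, z), temps in buckets.items())
```

Even this toy version has to decide how to handle unmapped stations, missing values, and multiple readings per hour; a vendor’s cleaned table has already made those choices for you.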
AWS Data Exchange offers businesses an efficient way to access and share datasets. By simplifying the process of finding, subscribing to, and integrating data, AWS Data Exchange helps organizations innovate, make more informed decisions and unlock new insights. With flexible pricing and a wide variety of data sources, it’s worth exploring for any organization looking to enhance their data capabilities.