Taking A Look At Commercial Fisheries Data From National Oceanic And Atmospheric Administration
01 Sep 2015
Someone found me on Twitter regarding our Adopta.Agency work, and introduced me to the commercial fisheries data from the National Oceanic and Atmospheric Administration (NOAA). The dataset(s) fit the profile of what I am looking to evolve with Adopta: the data is potentially extremely valuable to an industry, it is currently locked up in non-machine-readable formats, and, more importantly, I had a domain expert who was passionate about what could be done with it.
When you land on the home page for the commercial fisheries statistics, you experience classic government "open data", which in reality is anything but open. Putting data on the web for humans to read does very little to help anyone analyze it, or potentially build anything on top of it.
Half of the links take you to PDFs, which are one of the most unusable ways to get data, while the other half take you to pretty complex forms that allow you to query the data.
I will always choose an HTML form over a PDF, because I can almost always write some sort of script that submits the form and parses the data it returns. It will take some work, but eventually I should be able to programmatically pull the raw data via these forms. Most of the time, I will walk away from PDFs. They just aren't usually worth the work.
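To give a rough idea of what scripting a form looks like, here is a minimal Python sketch using only the standard library. It is not the actual scrape script for this project. The URL and field names in the usage note are hypothetical placeholders (the real ones would have to be read from the NOAA form's HTML), the sample row is made up purely to show the parsing step, and it assumes the form returns its results as a plain HTML table.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import Request, urlopen


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def parse_table(html):
    """Return the table rows found in an HTML string as lists of cell text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_csv(rows):
    """Serialize rows to CSV text, the machine-readable form we actually want."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def fetch_form_results(url, fields):
    """POST form fields and return the response HTML.

    The url and fields are assumptions: they must match the real form's
    action attribute and input names, which this sketch does not know.
    """
    body = urlencode(fields).encode("utf-8")
    request = Request(url, data=body)  # supplying data makes this a POST
    with urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")


# Made-up sample of what a results page might contain (not real NOAA data).
SAMPLE_HTML = (
    "<table>"
    "<tr><th>Year</th><th>Species</th><th>Pounds</th></tr>"
    "<tr><td>2013</td><td> Example species </td><td>100</td></tr>"
    "</table>"
)

if __name__ == "__main__":
    print(rows_to_csv(parse_table(SAMPLE_HTML)))
```

Against a live form, the call would look something like `parse_table(fetch_form_results("https://example.com/query", {"year": "2013"}))`, with the placeholder URL and field swapped for whatever the form really uses. A polite script would also pause between requests, since every query costs the agency bandwidth and compute.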
Many IT folks will argue that publishing HTML forms makes government data available. I have even seen government agencies successfully defend against FOIA requests by stating the data is available as a form. This is just irresponsible, and it encourages scraping and misuse of government resources (aka bandwidth and compute). If there were a simple download link to a zipped-up file, it would be way more efficient and responsible.
The person who contacted me about the NOAA commercial fisheries data isn't 100% up to speed on how to do Adopta projects, or how to work with GitHub. That is something we will change in a couple of weeks, and then we'll encourage her to take over the project. In the meantime, I am going to do some of the heavy lifting by writing the scrape scripts for extracting the data via the forms, and publishing the results to a GitHub project I just set up. Stay tuned!