Moving towards a data platform for our city should allow API and other departments both to simplify our operational workstreams and to vastly improve our analytical abilities moving forward. Though this prospect is exciting to us over here in data land, talking about data infrastructure can easily get lost in the weeds of acronym soup. So, in an attempt to stave off this fate, we’ve decided that rather than discussing the new platform from a strictly technical angle, it might be more prudent to take the “5,000-foot view” and attempt to answer, more or less, two questions:
What’s a data platform and why do we need one?
In what ways will a data platform deliver value to not only city departments, but city residents as well?
What is a ‘Data Platform’:
“Another flaw in the human character is that everybody wants to build, and nobody wants to do maintenance.” Kurt Vonnegut said this in jest, but the quote highlights our tendency, not only as individuals but as organizations, to gravitate toward flashy new tools that (we think) will solve all of our problems, and to think very little about the ongoing maintenance involved in ensuring that the new systems don’t fall into neglect or disrepair. As organizations develop over time, their operational and analytical needs will almost certainly change. A technology system that worked well for one department at one time may become obsolete, whether because a vendor no longer supports the product, because the needs and responsibilities of the department that relies on the system have changed, or because improved technologies have been introduced that do not interact well (or at all) with the previously used systems. Here at the City, we have experienced all of the above, and as a result some of our systems, and our processes for working with and around these systems, have found themselves in need of a little TLC.
So, this is where our concept of a new flashy data platform comes into play.
But didn’t we say earlier that we should avoid ‘new flashy tools’? To look at it another way: a data platform isn’t really a new tool so much as it is a collection of tools and policies to help an organization better manage, and better leverage, its data.
The data platform that we wish to develop will be built with the idea of maintenance in mind. As a central repository for all organizational data, the data platform will serve as a modular hub for connecting various operational workstreams together under one coherent and consistent roof. We need to remain mindful that as our operational and analytical needs change, so too must the infrastructure of our platform change to better model those needs. Well… let’s slow down. A lot of what was just said is one step short of the technical buzzwords and acronym soup we’ve been trying to avoid, so let’s try a visual:
Above, we can see what the overall change in the process should look like. Whereas currently we feed data from our source systems directly into our end-user applications and reporting metrics, the data platform will stand as an intermediary to ensure that data is accurate, up to date, and ready for reporting. The bulk of this work lies in efficiently ingesting data from source systems into our platform, and then learning what transformations, cleaning, and contextual information need to be applied to that data to ensure that the deliverables produced by our analysts and customers are accurate and context-aware. In this way, it is best to think of our data platform not as a new tool but as a new operational paradigm: a way to structure all our siloed, independent systems so that data can be integrated and made more easily accessible across and outside of our organization.
What our current progress has been:
Thus far we have made good progress in identifying datasets that would be strong operational use cases for our new data platform, as well as inventorying the underlying data that will be needed to bring those projects to fruition. We have identified two major projects as our first use cases for the data platform; they should serve as learning experiences and templates for our further expansion of the platform. These projects should also improve the City’s operations and create datasets that can be viewed by the public, allowing for greater accountability and transparency with our constituents. The two use cases we are currently pursuing are:
An analytical database for our permitting process. This should not only give us within the city a greater understanding of the permitting process and where improvements can be made, but should also allow us to collate and cultivate new and interesting datasets for our open data platform. Additionally, we hope to build end-user applications for both internal and external consumption of this data, so that customers are less in the dark about the state of their permit applications.
A revamped water billing process. This should not only simplify work and save time in the water department, but also provide a public-facing view into some of our water data - which could save people the hassle of learning too late about potential leaks, shut-offs, or underuse of water at their properties.
While going through this process we will also keep an eye on cleaning up some of our old databases and on building a comprehensive picture of all the data that the city possesses and maintains. This should help to spur not only further publicly available datasets but also projects that can go deeper to improve the city’s efficiency in everyday as well as long-term tasks. It can also give us a sense of what data might be lacking, and where we can make improvements so that we can continue to provide novel solutions to existing problems in the future.
Where we hope to go:
Much of the long road of learning is still ahead of us, and we look forward to all of the baby steps (and mistakes) we will make in implementing this new data infrastructure. With a solid foundation already beginning to take shape and a little bit (okay, lotta bit) of work, we should be able to slowly (but surely) integrate more data streams into the platform so that our users have more reliable and up-to-date information. We look forward to reporting on our steps throughout this process and keeping everyone updated on our continued progress towards a more integrated, coherent data system that allows our organization to better serve and understand the needs of our community.