A few years ago, the government of Punjab decided that it wanted to give a subsidy to the manufacturing sector with the intention of curbing child labour. By any standard, this is a good policy move that should enjoy bipartisan support. Subsidies to the manufacturing sector are good, and subsidies aimed at ending child labour are even better. Normally, it is not the spirit of such an initiative that comes under doubt, but rather its execution and the allegations of corruption that arise around it.
In the case of this subsidy, for example, the first step towards implementation was collecting data about the manufacturing industries concerned. The official figure was 23,392 such industries. The government could very well have carried on with this number, but instead chose to commission an independent third party to verify it. A survey was carried out in which the locations of all such industries were geo-tagged, and it revealed that on the ground there were in fact 66,318 manufacturing industries eligible for the subsidy.
Now, if the government had decided to give, say, Rs 1 million as subsidy per industry, roughly Rs 23 billion would have been allocated in the Annual Development Budget, instead of the roughly Rs 66 billion actually required for the initiative to succeed. Countless children would have continued to suffer as child labour, and nearly two-thirds of the people the policy was aimed at would not have received their fair share. They would have cried foul, and allegations of corruption would have started to fly. Neither side would quite be wrong here: the perception of corruption in this situation would have been created not by theft but by a lack of accurate data.
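The budgetary gap here is simple arithmetic; a quick sketch using the figures quoted above (the Rs 1 million rate is the hypothetical one from the example):

```python
# Budget shortfall caused by the undercount, using the figures above.
SUBSIDY_PER_UNIT = 1_000_000        # Rs 1 million per industry (hypothetical rate)

official_count = 23_392             # government's original figure
surveyed_count = 66_318             # geo-tagged third-party survey

budgeted = official_count * SUBSIDY_PER_UNIT
required = surveyed_count * SUBSIDY_PER_UNIT
shortfall = required - budgeted

# Share of eligible industries that the original budget would have missed.
uncovered_share = (surveyed_count - official_count) / surveyed_count

print(f"Budgeted:  Rs {budgeted / 1e9:.1f} billion")   # Rs 23.4 billion
print(f"Required:  Rs {required / 1e9:.1f} billion")   # Rs 66.3 billion
print(f"Left out:  {uncovered_share:.0%} of eligible industries")
```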
All of this points towards an absence of centralized data: different departments and offices collect their own data instead of drawing on a wider infrastructure, making both data collection and access to it incredibly difficult and inefficient. Centralized data would not only make the lives of policy makers and government departments easier; much of it could also be made publicly available for use by businesses, entrepreneurs, students, researchers, and organizations.
The fault in our data
As the world revolves more and more every day around accurate, sensitive, precise, and specific data, policy making machinery will have to catch up. One way in which governments can both control the perception of corruption and make sure there is no real corruption either is to make their data acquisition more sophisticated.
To address the corruption perception issue, it is critically important that data, not verbosity, wins at the policy table. For this, the Bureau of Statistics must be revamped. The first step is to adopt the latest scientific methods, the same way the developed world has, which means funding statisticians to attend international conferences on data collection and statistics. Alongside this, we need to make sure we are recruiting the right human resources, and then providing them with the best technologies available in the market. In the long run, this will be an important and successful investment.
Effective data management
Essentially, of course, this is as much a question of data collection methods and scientific technique as it is of management, and of making sure that the data that has been collected is shared and systematically used. The key to this is a concept called Master Data Management (MDM): the discipline of managing the data that is shared across teams and IT applications and that defines the key information of an organization, company, or public sector department, such as assets, locations, reference codes, financial hierarchies, products, customers, or suppliers.
In practice, what this means is that there is a central data repository accessible to different departments and stakeholders. MDM involves creating a single master record for all critical business data from across internal and external data sources and applications, and this master record becomes a consistent, reliable source for the organization.
It is a technology-enabled discipline in which business and Information Technology work together to ensure the uniformity, accuracy, stewardship, semantic consistency and accountability of the enterprise’s official shared master data assets. A system like MDM makes sure that there are no inconsistencies in data and that it is efficiently shared and used across an organization, or in this case, across government institutions.
The natural question is how MDM might be implemented in Punjab, or anywhere else for that matter. It is important to understand that this can be done through different techniques, or styles. One is the centrally authored style, in which there is one central source where all information is not just stored but created, and from which it is pushed down into different applications and streams. Data would be created at a central level by a data authority and then provided to the departments.
Another style, consolidation, is the opposite of this: different departments collect their own data, which is then consolidated onto a central MDM platform and made accessible to all departments. In this way, for example, the agriculture department would be responsible for collecting its own data, but that data would be centrally uploaded and available to the housing and urban development department as well, should they need it.
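As a minimal sketch of this consolidation style (all department names, fields, and identifiers here are hypothetical, not the government's actual schema), each department keeps authoring its own records, and the central platform merges them into one master view keyed on a shared identifier:

```python
# Hypothetical sketch: consolidation-style MDM. Each department authors
# its own records; the central platform merges them on a shared key.

agriculture = [
    {"district_id": "LHR", "cropped_area_acres": 120_000},
    {"district_id": "MUX", "cropped_area_acres": 95_000},
]
housing = [
    {"district_id": "LHR", "housing_units": 1_500_000},
]

def consolidate(*sources, key="district_id"):
    """Merge department records into one master record per key."""
    master = {}
    for source in sources:
        for record in source:
            master.setdefault(record[key], {}).update(record)
    return master

master = consolidate(agriculture, housing)
# The housing department can now read agriculture's figures, and vice versa.
print(master["LHR"])
```

The point is that neither department changes how it collects data; the platform only merges what they already produce.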
The other methods are crosses between these two. One is coexistence, a hybrid of the centrally authored and consolidation styles that allows data to be created in multiple systems: departments would still create and upload their own data to the MDM platform, while the platform would also create data of its own. There is also the registry style, in which, rather than consolidating all records, the MDM platform joins and aligns the unique identifiers used across the various systems into intersection tables.
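The registry style can be sketched in a few lines (the system names and identifiers below are invented for illustration): no records are copied into the platform, which holds only an intersection table mapping each entity's master identifier to its local identifier in every source system.

```python
# Hypothetical sketch: registry-style MDM. Instead of copying records,
# the platform stores only which identifier in each system refers to
# the same real-world entity (an "intersection table").

registry = [
    # (master_id, source_system, local_id)
    ("E-001", "excise",  "EX-9921"),
    ("E-001", "revenue", "RV-4410"),
    ("E-002", "excise",  "EX-1030"),
]

def lookup(master_id):
    """Return every system's local identifier for one entity."""
    return {system: local for mid, system, local in registry if mid == master_id}

print(lookup("E-001"))   # the data itself stays in the source systems
```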
Whichever of these styles is used to apply MDM, the main building block is always the same: centrally available data, collected by the latest scientific methods, that can interact with other data.
What the Punjab government is planning
The method that the government of Punjab wants to pursue is consolidation, mostly because many departments already have applications and data banks that can easily be consolidated centrally, offering the best results. For this, a few steps will have to be diligently followed.
The first is identifying the data sources we have, making sure that every dataset available with any department is accounted for. After this, the data must be scrutinized, which involves data cleansing, conflict resolution, and profiling: the data is analyzed, bad records are removed, conflicts within the data are resolved, and the data is ‘profiled’ into appropriate categories.
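A toy version of this scrutiny step might look as follows (the records, fields, and the keep-the-latest conflict rule are all illustrative assumptions, not the government's actual procedure):

```python
# Hypothetical sketch of the scrutiny step: drop bad records, resolve
# conflicting duplicates (here, by keeping the most recent), and
# profile the survivors into categories.

raw = [
    {"unit_id": "M-1", "district": "Lahore", "updated": 2021},
    {"unit_id": "M-1", "district": "Lahore", "updated": 2023},  # newer duplicate
    {"unit_id": "M-2", "district": None,     "updated": 2022},  # bad record
    {"unit_id": "M-3", "district": "Multan", "updated": 2022},
]

# Cleansing: remove records with missing fields.
clean = [r for r in raw if all(v is not None for v in r.values())]

# Conflict resolution: for duplicate unit_ids, keep the latest record.
latest = {}
for r in sorted(clean, key=lambda r: r["updated"]):
    latest[r["unit_id"]] = r

# Profiling: group the surviving records into categories (here, by district).
by_district = {}
for r in latest.values():
    by_district.setdefault(r["district"], []).append(r["unit_id"])

print(by_district)
```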
Only then does it become possible to create an internal master data repository. During profiling, the data is classified into three main categories: shareable, common, and sensitive. These categories are saved together in a repository called the Internal Master Data Repo (IMDR) for internal consumption. At this stage, the biggest priority is security: it is important to ensure that data classified as shareable is anonymized before being loaded into the External Master Data Repo (EMDR), so that no personal or sensitive information is identifiable. The EMDR is built after anonymization, when the now secure shareable data and the common data are saved in a repo for external consumption. Golden records (the headline aggregate figures compiled from different datasets) can also be stored separately in the EMDR for instant access by consumers.
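The anonymization gate between the IMDR and the EMDR could work roughly like this; the field names and the drop-or-hash policy are assumptions for illustration only:

```python
# Hypothetical sketch: anonymize shareable records before they move
# from the internal repo (IMDR) to the external one (EMDR).
import hashlib

SENSITIVE_FIELDS = {"owner_name", "cnic"}   # drop these entirely
IDENTIFYING_FIELDS = {"unit_id"}            # replace with a one-way hash

def anonymize(record):
    """Strip sensitive fields and pseudonymize identifying ones."""
    out = {}
    for field, value in record.items():
        if field in SENSITIVE_FIELDS:
            continue
        if field in IDENTIFYING_FIELDS:
            out[field] = hashlib.sha256(str(value).encode()).hexdigest()[:12]
        else:
            out[field] = value
    return out

imdr_record = {"unit_id": "M-1", "owner_name": "Example Owner",
               "district": "Lahore", "workers": 40}
emdr_record = anonymize(imdr_record)
print(emdr_record)   # no name, no raw identifier
```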
After this, the work is mostly about making access to the data more efficient. In addition to direct access to the EMDR, data consumers can also subscribe to receive periodic updates through a service bus called the Data Service Bus.
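In spirit, such a service bus is a publish-subscribe channel; a minimal sketch (the topic name and payload are invented, and a real deployment would use proper messaging infrastructure rather than in-process callbacks):

```python
# Hypothetical sketch of the Data Service Bus: consumers subscribe to
# a dataset topic and receive each periodic update that is published.

class DataServiceBus:
    def __init__(self):
        self.subscribers = {}   # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, update):
        for callback in self.subscribers.get(topic, []):
            callback(update)

bus = DataServiceBus()
received = []
bus.subscribe("wholesale-prices", received.append)
bus.publish("wholesale-prices", {"commodity": "wheat", "price": 3_900})
print(received)
```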
How will this help?
If properly implemented, this will enable the availability of the right data to the right people, at the right time, and at the right place. Just imagine making all useful datasets digitally accessible (after anonymizing personal and sensitive information) to businessmen, traders, merchants, and budding entrepreneurs.
These datasets may include, but are not limited to, the population census, the sale and purchase of movable and immovable properties, imports, exports, toll taxes collected on highways, domestic travel, transportation of goods, shipping, and commodities in wholesale markets. A master data repository of this kind would enable businesses to gauge their potential consumer base and market size, and to plan accordingly for their respective product or service at any given point in time.
Considering how impactful this could be, one cannot help but wonder what is stopping us from opening these datasets up and making them searchable through a free-text mechanism, just like Google’s search engine. And why can we not have web services for real-time data sharing? As an incentive, there is also no harm in the government monetizing its data: it could give a certain amount of generic data away for free and then charge a small fee for each further level of granularity. All of this can be done digitally over the internet, with payments for the more specific datasets made online via credit and debit cards.
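The tiered pricing idea sketched above amounts to a simple price schedule; the granularity levels and amounts below are purely illustrative, not any proposed government tariff:

```python
# Hypothetical sketch of tiered data pricing: the coarsest granularity
# is free, and each finer level carries a small charge.

PRICE_PER_LEVEL = {
    "province": 0,       # generic aggregates: free
    "district": 500,     # Rs per dataset, illustrative
    "tehsil":   2_000,
    "mauza":    5_000,
}

def quote(levels):
    """Total price for a request spanning several granularity levels."""
    return sum(PRICE_PER_LEVEL[level] for level in levels)

print(quote(["province"]))             # 0: the free tier
print(quote(["district", "tehsil"]))   # 2500
```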