Design and Implementation of the UN Comtrade Data Sharing Platform

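  • Summary: UN Comtrade (United Nations international trade statistics database) is the world's largest and most widely used international trade database, and it is highly authoritative and complete. Building on the design of the framework architecture and the data table structure, this paper constructs a UN Comtrade data sharing platform intended to provide data and tool support for geographical research. For data aggregation, the platform integrates data crawling and loading modules and nests multiple error correction methods, achieving dynamic, highly fault-tolerant aggregation of more than 500 million commodity trade records. For retrieval, the platform uses partitioned composite indexes to improve the execution efficiency and scalability of retrieval instructions. Retrieval experiments show that the platform can stably execute different types of retrieval instructions with 80 concurrent users. Invoking the ODBC/JDBC interfaces to fold the calculation into the retrieval task also makes more effective use of server-side resources and saves data transfer and read/write time, which yields higher efficiency and a simpler data processing workflow. The platform was applied to the retrieval, calculation, gridded representation, and comparative analysis of the revealed comparative advantage of Chinese and US commodities in 2017. The case study shows that the platform offers efficient and stable concurrent retrieval and high scalability, and that it can provide convenient, fast, and varied data sharing services for the calculation and analysis of trade characteristics.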

     

    Abstract: The United Nations international trade statistics database (UN Comtrade) provides important data support and application guidance for understanding the rules of the trade system and their driving factors, developing methods for measuring international trade, and depicting the global trade pattern and how it changes. It is also increasingly applied in geographical research on global ecological conservation, water-energy-food-land systems, pollution control, energy management, national security, and other topics. In this paper, we build the UN Comtrade data sharing platform on the Oracle DBMS, based on a design of the framework and the data table structure. The platform is intended to make up for the database's deficiencies in data sharing methods and retrieval interfaces and to provide data and tool support for geographical research. We develop an automated data archiving module with Python 3.6 and the Scrapy framework 1.5.2. By integrating a data crawling module and a data loading module and nesting multiple error correction methods, the module achieves dynamic, stable, and highly fault-tolerant processing of more than 500 million trade records. In addition, exploiting the structured nature of the data, the platform improves the efficiency and scalability of retrieval instruction execution through range partitioning, partitioned composite indexing, and open ODBC/JDBC interfaces. Experiments show that the platform can stably execute different types of retrieval instructions with 80 concurrent users. By invoking the ODBC/JDBC interface to integrate the calculation process into the retrieval task, the system uses server-side resources more effectively and saves time on data transmission, reading, and writing, which raises efficiency and simplifies data processing. Based on the platform, we develop a data query and sharing client and apply it to the retrieval, calculation, grid representation, and comparative analysis of the revealed comparative advantage of Chinese and American products in 2017. The application shows that the platform offers efficient and stable concurrent retrieval and high scalability. It can provide more convenient, faster, and more diverse data sharing services for calculating and analyzing trade characteristics.
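
The idea of integrating the calculation into the retrieval task can be illustrated with a minimal sketch. It is not the platform's code. It assumes the comparative advantage measure is the Balassa revealed comparative advantage index, RCA = (X_cp / X_c) / (X_p / X_w), and it uses Python's built-in sqlite3 module as a stand-in for an Oracle connection opened through ODBC/JDBC. The table name `trade`, its columns, the index name, and the toy rows are all hypothetical. SQLite has no range partitioning, so only the composite index is shown.

```python
import sqlite3

# Toy rows (reporter, HS chapter, year, flow, value). The numbers are invented
# for illustration and are NOT UN Comtrade figures.
DEMO_ROWS = [
    ("CHN", "85", 2017, "X", 80.0),
    ("CHN", "10", 2017, "X", 20.0),
    ("USA", "85", 2017, "X", 40.0),
    ("USA", "10", 2017, "X", 60.0),
    ("CHN", "85", 2017, "M", 999.0),  # import flow, excluded by the query
]

# The whole index is computed by the database engine, so the client receives
# one small result set instead of the raw trade records.
RCA_SQL = """
WITH cp AS (
    SELECT reporter, commodity, SUM(value) AS x
    FROM trade
    WHERE year = ? AND flow = 'X'
    GROUP BY reporter, commodity
),
c AS (SELECT reporter, SUM(x) AS x FROM cp GROUP BY reporter),
p AS (SELECT commodity, SUM(x) AS x FROM cp GROUP BY commodity),
w AS (SELECT SUM(x) AS x FROM cp)
SELECT cp.reporter,
       cp.commodity,
       (cp.x * 1.0 / c.x) / (p.x * 1.0 / w.x) AS rca
FROM cp
JOIN c ON c.reporter = cp.reporter
JOIN p ON p.commodity = cp.commodity
CROSS JOIN w
ORDER BY cp.reporter, cp.commodity
"""


def build_demo_db():
    """Create an in-memory database holding the hypothetical trade table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE trade (reporter TEXT, commodity TEXT, "
        "year INTEGER, flow TEXT, value REAL)"
    )
    # Composite index on the columns the query filters and groups by.
    conn.execute("CREATE INDEX idx_trade ON trade (year, reporter, commodity)")
    conn.executemany("INSERT INTO trade VALUES (?, ?, ?, ?, ?)", DEMO_ROWS)
    return conn


def rca_by_query(conn, year):
    """Return (reporter, commodity, rca) rows computed inside the database."""
    return conn.execute(RCA_SQL, (year,)).fetchall()


if __name__ == "__main__":
    for reporter, commodity, rca in rca_by_query(build_demo_db(), 2017):
        print(reporter, commodity, round(rca, 3))
```

In this toy setup the "world" consists of only the two reporters in the table. The point of the sketch is that aggregation and division happen on the server side of the connection, which is the kind of saving in data transmission and read/write time that the abstract describes.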