Data and Methodologies Overview
To study the historical pattern of tech investments in Japan and China from 2000 to 2016, I plan to start with a macroscopic view: obtain country-level data on tech investments, then divide the data into government-led and private-led investments. Because the range of technologies is broad, this research focuses on investments in the Internet, automation, and advanced analytics.
However, after accessing the relevant data from the Economist Intelligence Unit (EIU) and Bloomberg, I have run into several difficulties in data acquisition and data manipulation:
1. Specific data on state-led investment is difficult to acquire: country-level data is usually too general, and official government announcements don’t include specific investment amounts. When the EIU reports a figure at the country level, it is hard to tell whether that figure combines government funding and private investment or covers state-led investment only.
Attempted solutions:
- Compare the two countries’ digital policies and present them as two timelines.
- Scrape news headlines from 2000 to 2016 in major newspapers, including The New York Times Archive, Wall Street Journal and Financial Times, using the keywords “Japan” or “China” together with “Technology” and “Policy”, then run topic modeling to see whether distinct keywords or trends emerge for each country.
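The keyword filter in the second bullet can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the headlines are made-up samples rather than scraped data, the function names are hypothetical, and the comparison step is a plain term-frequency count used as a simpler stand-in for topic modeling, not topic modeling itself.

```python
import re
from collections import Counter

# Hypothetical sample headlines; real input would come from the newspaper archives.
HEADLINES = [
    "Japan unveils technology policy to boost robotics",
    "China technology policy targets internet firms",
    "Japan stocks rally on export data",
    "China announces new policy on technology transfer and internet security",
]

STOPWORDS = {"to", "on", "and", "the", "new", "of", "in", "a"}
COUNTRIES = ("japan", "china")
REQUIRED = ("technology", "policy")


def tokenize(text):
    """Lowercase a headline and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def matches(tokens, country):
    """True if the headline names the country and both required keywords."""
    return country in tokens and all(word in tokens for word in REQUIRED)


def term_counts_by_country(headlines):
    """Count the remaining terms in matching headlines, separately per country."""
    counts = {country: Counter() for country in COUNTRIES}
    for headline in headlines:
        tokens = tokenize(headline)
        for country in COUNTRIES:
            if matches(tokens, country):
                counts[country].update(
                    t for t in tokens
                    if t not in STOPWORDS
                    and t not in COUNTRIES
                    and t not in REQUIRED
                )
    return counts


def distinctive_terms(counts, country):
    """Terms that appear for one country but not for the other."""
    other = next(c for c in COUNTRIES if c != country)
    return sorted(set(counts[country]) - set(counts[other]))
```

Calling term_counts_by_country(HEADLINES) and then distinctive_terms(counts, "japan") lists the terms that occur only in the Japan headlines. A real topic model would replace the counting step once enough headlines are collected.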
2. For private-sector investment, the most comprehensive database I found is Bloomberg’s deal list of 5,000 M&A transactions that involved Japan and China[1] from 2000 to 2016. It includes Deal Type, Announce Date, Target Name, Acquirer Name, Seller Name, Announced Total Value (mil.), Acquirer Country, Seller Country, Seller Industry Sector, Acquirer Industry Sector, Acquirer Industry Subgroup, and Seller Industry Subgroup.
One major issue with the Bloomberg data is that many entries under Seller Country and Seller Industry Sector are listed as “N.A.”, which poses a problem when I try to link sellers to acquirers to see where companies in Japan and China invested most heavily.
Attempted solutions:
- I decided to visualize the limited available data in Tableau to obtain a preliminary assessment of how the two countries’ companies are investing.
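Before visualizing, it may help to measure how much of the data the “N.A.” entries actually affect. The following is a minimal sketch, assuming the deal list has been exported to CSV with the Bloomberg column names listed above. The sample rows are invented for illustration and the function names are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; column names follow the Bloomberg fields listed above.
SAMPLE_CSV = """Acquirer Country,Seller Country,Announced Total Value (mil.)
Japan,United States,120.5
China,N.A.,300
Japan,N.A.,N.A.
China,Germany,80
Japan,United States,30
"""

# Values treated as missing (an assumption about how blanks are exported).
MISSING = {"", "N.A."}


def load_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def missing_share(rows, column):
    """Fraction of rows whose value in `column` is blank or "N.A."."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row[column].strip() in MISSING)
    return missing / len(rows)


def value_by_pair(rows):
    """Sum deal value per (acquirer, seller) country pair, skipping incomplete rows."""
    totals = defaultdict(float)
    for row in rows:
        acquirer = row["Acquirer Country"].strip()
        seller = row["Seller Country"].strip()
        value = row["Announced Total Value (mil.)"].strip()
        if acquirer in MISSING or seller in MISSING or value in MISSING:
            continue
        totals[(acquirer, seller)] += float(value)
    return dict(totals)
```

Reporting missing_share alongside any chart makes clear how much of the deal list the acquirer-to-seller view actually covers, since value_by_pair silently drops every row with a missing country or value.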
3. Cleaning the data: I have acquired two versions of cross-border M&A deals from China and Japan in the past 16 years, one from Bloomberg and one from Zephyr, but the data in both versions are messy and require further cleanup.
Several rounds of data cleaning removed the duplicates that sit next to each other, such as “U.S., U.S.,” but I could not delete the duplicates shown in Figure 2, where repeated values are separated by a different value. After the first round of cleaning, as shown above, some cells still contain duplicated country names, which can lead to errors in counting the number of acquisitions made by each country.
Attempted solutions:
- Use OpenRefine, Excel macros, and Tableau to clean up the data.
- After preliminary data manipulation, load the data into databasic.io to look for significant patterns that can support further analysis, then import the data into Tableau to visualize the results.
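The non-adjacent duplicates described above, where the same country reappears after a different value, could also be removed with a short script. This is a minimal sketch, assuming each cell holds comma-separated country names. The function names are hypothetical, and matching is exact, so “U.S.” and “United States” would still count as different entries.

```python
from collections import Counter


def unique_entries(cell, sep=","):
    """Return the distinct entries of one delimited cell in first-seen order."""
    seen = set()
    unique = []
    for part in cell.split(sep):
        name = part.strip()
        if not name:  # skip empty pieces left by trailing separators
            continue
        if name not in seen:
            seen.add(name)
            unique.append(name)
    return unique


def dedupe_cell(cell, sep=","):
    """Rebuild the cell with each entry appearing once."""
    return (sep + " ").join(unique_entries(cell, sep))


def count_countries(cells):
    """Count each country at most once per cell across a column of cells."""
    counts = Counter()
    for cell in cells:
        counts.update(unique_entries(cell))
    return counts
```

For example, dedupe_cell("U.S., Japan, U.S.") returns "U.S., Japan", and count_countries then counts each country once per deal, which avoids the overcounting caused by repeated names within a cell.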
[1] I restricted the search to M&A deals that took place in Japan or China. This means that Japan and/or China may be the acquirers and/or targets, and many other countries can also be included in these transactions.