Title page for etd-0016118-155537 (thesis/dissertation detailed record)
Title
以文字探勘建立軟體專利適格性與專利價值之預測模型研究
The Prediction of Software Patent Claim Eligibility and Patent Value using Text-mining Techniques
Department
Year, semester
Language
Degree
Number of pages
142
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2017-12-18
Date of Submission
2018-01-16
Keywords
text-mining, prediction, patent analysis, patent eligibility, patent value
Statistics
This thesis/dissertation has been viewed 6110 times and downloaded 352 times.
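Chinese Abstract
This thesis uses text-mining techniques and machine-learning methods to build models that predict the subject matter eligibility and the value of computer software patents. The abstract ideas contained in software-type algorithms fall under the exceptions to patent-eligible subject matter in U.S. patent law. Cases such as Alice v. CLS (2014), in which the U.S. Supreme Court sought to clarify the boundary of abstract ideas, denied the eligibility of computer software, so current software patent applications may be excluded from patent protection under the accumulated case law. However, given the rapid development of business methods and information technology, patent applications in the software industry have a critical influence on all related industries. How to protect software patents appropriately, and how to draw the line between abstract ideas and patentable technology, have therefore become important issues in patent law in recent years. This study accordingly targets the claims of computer software patents and uses text-mining techniques and machine-learning methods to build a patent eligibility prediction model. It further integrates patent prosecution records with features derived from the analysis of patent documents, and uses litigated software patents to build a patent value prediction model. The results reach 80% accuracy in predicting patent eligibility, 90% accuracy in predicting software patentability, and 88% accuracy within five years in predicting software patent value, which should benefit future software patent applications and the prediction of software patent value.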
Abstract
With the widespread use of computer software in recent decades, software patents have become controversial for the patent system. Software patents may easily fall into the gray area of abstract ideas, and allowing them may hinder future innovation. However, without a precise definition of abstract ideas, determining the subject matter eligibility of a patent claim is a challenging task for examiners and applicants. In this research, we address software patent eligibility by proposing an effective model to determine patent claim eligibility, and we examine the patent examination process to predict patentability. Furthermore, using patent claim features and important prosecution events, we attempt to identify important indicators of valuable patents.
We collect patent claims, patent examination records, and patent litigation data for software patents from the USPTO website, USPTO PAIR, Google Patents, and MaxVal's Patent Litigation Databank. The experimental results show that our patent claim eligibility model reaches an accuracy of more than 80%, and that domain knowledge features play a crucial role in the prediction model. Using sequence learning on patentability, our patentability predictive model achieves around 90% accuracy based on our time-duration features. With the value indicators identified by the previous models and by prior studies, the accuracy of our patent value model reaches up to 88%.
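The abstract does not say how the time-duration features are computed. As a minimal sketch, assuming they are elapsed days between consecutive prosecution events, they could be derived as follows. The event codes, the dates, and the function name `duration_features` are hypothetical illustrations and are not taken from the thesis or from USPTO PAIR.

```python
from datetime import date

# Hypothetical prosecution history: (event date, event code) pairs.
# Codes and dates are made up for illustration only.
events = [
    (date(2015, 1, 10), "FILED"),
    (date(2016, 3, 1), "NON_FINAL_REJECTION"),
    (date(2016, 6, 1), "APPLICANT_RESPONSE"),
    (date(2016, 12, 15), "NOTICE_OF_ALLOWANCE"),
]


def duration_features(history):
    """Return days elapsed between consecutive events, keyed by transition.

    If the same transition occurs more than once, the later occurrence
    overwrites the earlier one in this simple sketch.
    """
    ordered = sorted(history, key=lambda item: item[0])
    features = {}
    for (prev_day, prev_code), (next_day, next_code) in zip(ordered, ordered[1:]):
        features[f"{prev_code}->{next_code}"] = (next_day - prev_day).days
    return features


features = duration_features(events)

# Total pendency from the first to the last recorded event, in days.
total_days = (max(d for d, _ in events) - min(d for d, _ in events)).days

print(features)
print(total_days)
```

A dictionary like this could then be flattened into one numeric feature row per application before being passed to a classifier; that step is also an assumption rather than the thesis's documented pipeline.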
Table of Contents
Chapter 1 Introduction 1
Chapter 2 Background Information on Patents 12
2.1. Subject matter eligibility (SME) 12
2.2. Patent Examination at United States Patent and Trademark Office (USPTO) 16
2.3. The USPTO Patent Assignment 19
Chapter 3 Related Work 22
3.1. Patent Analysis 22
3.2. Patent Examination 28
3.3. Patent Value 31
3.4. Machine Learning Classifiers 40
Chapter 4 Methodology 43
4.1. Research Framework for Patent Predictive model 43
4.2. Data Collecting and Data Processing 45
4.3. Patent Claim Eligibility Model 50
4.4. Patent Prosecution Model 62
4.5. Patent Value Model 67
Chapter 5 Evaluation 77
5.1. Patent Claim Eligibility Model 77
5.2. Patent Prosecution Model 88
5.3. Patent Value Model 96
Chapter 6 Conclusion 121
Chapter 7 Limitations and Future Research 125
References 126
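Fulltext
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China: do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid violating the law.
Thesis access permission: user-defined release date
Available:
Campus: available
Off-campus: available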


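Printed copies
Public-access information for printed theses is relatively complete from academic year 102 (ROC calendar) onward. To look up public-access information for printed theses from academic year 101 or earlier, please contact the printed-thesis service desk at the Office of Library and Information Services. We apologize for any inconvenience.
Available: available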