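[Summary] This post collects download links for many test datasets used in data mining and data clustering. I keep it as a note for myself because I will certainly need it. It is not original: it is reposted from a Sina blog.
Original post: http://blog.sina.com.cn/s/blog_4c98b96001000883.html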
As for source code, many open-source algorithm packages are available online. The best known are Weka and MLC++. Weka is still adding and updating algorithms. Download it from:
http://www.cs.waikato.ac.nz/ml/weka/
Machine learning datasets collected by UCI:
http://www.ics.uci.edu/~mlearn/MLRepository.htm
statlib:
http://liama.ia.ac.cn/SCILAB/scilabindexgb.htm
http://lib.stat.cmu.edu/
Sample databases:
http://kdd.ics.uci.edu/
http://www.ics.uci.edu/~mlearn/MLRepository.html
Sites on data mining for investment funds:
http://www.gotofund.com/index.asp
http://lans.ece.utexas.edu/~strehl/
Reuters dataset:
http://www.research.att.com/~lewis/reuters21578.html
Assorted datasets:
http://kdd.ics.uci.edu/summary.data.type.html
http://www.mlnet.org/cgi-bin/mlnetois.pl/?File=datasets.html
http://lib.stat.cmu.edu/datasets/
http://dctc.sjtu.edu.cn/adaptive/datasets/
http://fimi.cs.helsinki.fi/data/
http://www.almaden.ibm.com/software/quest/Resources/index.shtml
http://miles.cnuce.cnr.it/~palmeri/datam/DCI/
Text classification & Web:
http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html
http://www.w3.org/TR/WD-logfile-960221.html
http://www.w3.org/Daemon/User/Config/Logging.html#AccessLog
http://www.w3.org/1998/11/05/WC-workshop/Papers/bala2.html
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/
http://www.web-caching.com/traces-logs.html
http://www-2.cs.cmu.edu/webkb
http://www.cs.auc.dk/research/DP/tdb/TimeCenter/TimeCenterPublications/TR-75.pdf
http://www.cs.cornell.edu/projects/kddcup/index.html
Time series data:
http://www.stat.wisc.edu/~reinsel/bjr-data/
Test data for the Apriori algorithm:
http://www.almaden.ibm.com/cs/quest/syndata.html
Data generator links:
http://www.cse.cuhk.edu.hk/~kdd/data_collection.html
http://www.almaden.ibm.com/cs/quest/syndata.html
Association rules:
http://flow.dl.sourceforge.net/sourceforge/weka/regression-datasets.jar
http://www.almaden.ibm.com/software/quest/Resources/datasets/syndata.html#assocSynData
WEKA:
http://flow.dl.sourceforge.net/sourceforge/weka/regression-datasets.jar
1. A jar file containing 37 classification problems, originally obtained from the UCI repository:
http://prdownloads.sourceforge.net/weka/datasets-UCI.jar
2. A jar file containing 37 regression problems, obtained from various sources:
http://prdownloads.sourceforge.net/weka/datasets-numeric.jar
3. A jar file containing 30 regression datasets collected by Luis Torgo:
http://prdownloads.sourceforge.net/weka/regression-datasets.jar
Cancer gene data:
http://www.broad.mit.edu/cgi-bin/cancer/datasets.cgi
Financial data:
http://lisp.vse.cz/pkdd99/Challenge/chall.htm
KDnuggets dataset links:
http://www.kdnuggets.com/datasets/index.html
Links provided by someone else:
http://www.cs.toronto.edu/~roweis/data.html
http://kdd.ics.uci.edu/summary.task.type.html
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/
http://www.phys.uni.torun.pl/~duch/software.html
The Reuters dataset can be found at:
http://www.research.att.com/~lewis/reuters21578.html
Assorted datasets are available at:
http://kdd.ics.uci.edu/summary.data.type.html
For text classification, another usable dataset is the rainbow dataset:
http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html
Download the Financial Data (~17.5M zipped file, ~67M unzipped data)
Download the Medical Data (~2M zipped file, ~6M unzipped data)
http://lisp.vse.cz/pkdd99/Challenge/chall.htm
Another very good resource site, whose datasets are listed below (grouped by application domain):
http://kdd.ics.uci.edu/
Direct Marketing
KDD CUP 1998 Data
GIS
Forest CoverType
Indexing
Corel Image Features
Pseudo Periodic Synthetic Time Series
Intrusion Detection
KDD CUP 1999 Data
Process Control
Synthetic Control Chart Time Series
Recommendation Systems
Entree Chicago Recommendation Data
Robots
Pioneer-1 Mobile Robot Data
Robot Execution Failures
Sign Language Recognition
Australian Sign Language Data
High-quality Australian Sign Language Data
Text Categorization
20 Newsgroups Data
Reuters-21578 Text Categorization Collection
NSF Research Awards Abstracts 1990-2003
World Wide Web
Microsoft Anonymous Web Data
MSNBC Anonymous Web Data
Syskill Webert Web Data
One more site:
http://www.fs.fed.us/fire/fuelman/
Note: this post will be updated continually.
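Several of the links above provide transaction data for frequent-itemset mining. These include the Apriori test data, the IBM Quest synthetic data generator, and the FIMI repository at fimi.cs.helsinki.fi.

The sketch below is a minimal Apriori in Python and is not part of the original post. It assumes a plain-text layout with one transaction per line and items separated by whitespace. As far as I know, this is how the FIMI datasets are laid out, but check each dataset's own description first.

The names `read_transactions` and `apriori` are hypothetical, chosen for illustration. `min_support` is an absolute transaction count, not a fraction.

```python
from collections import Counter
from itertools import combinations


def read_transactions(lines):
    """Parse one transaction per line, items separated by whitespace (assumed format)."""
    return [frozenset(line.split()) for line in lines if line.strip()]


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset contained in >= min_support transactions."""
    # Level 1: count individual items.
    counts = Counter(item for t in transactions for item in t)
    frequent = {frozenset([i]): c for i, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = list(frequent)
        # Join step: union pairs of frequent (k-1)-itemsets that yield exactly k items.
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        # Count the surviving candidates in one pass over the transactions.
        counts = Counter(c for t in transactions for c in candidates if c <= t)
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result


# Small hand-made example (not taken from any of the datasets linked above).
SAMPLE = ["1 2 5", "2 4", "2 3", "1 2 4", "1 3", "2 3", "1 3", "1 2 3 5", "1 2 3"]
freq = apriori(read_transactions(SAMPLE), min_support=2)
for itemset, count in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
print(len(freq))  # → 13
```

To run it on a downloaded file, pass the open file object instead of the list, for example `read_transactions(open(path))`, where `path` is a placeholder.

The join simply unions pairs of frequent itemsets rather than using the classic sorted-prefix join. The pruning step removes the extra candidates, so the result is the same. However, it may do more work on large item sets.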
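Copyright: 马富天个人博客 (Ma Futian's personal blog)
Title: 《数据挖掘中需要用到的数据集来源》 (Sources of Datasets Needed for Data Mining)
URL: http://www.mafutian.com/170.html
If you repost this, please credit the source; I would be very grateful. Thank you! If you like this post or found it helpful, please share it with your friends ^_^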