博主辛苦了,我要打赏银两给博主,犒劳犒劳站长。
【摘要】很难找到有数据聚类算法是用PHP编写的,不小心从git中找到一篇PHP编写K-means处理一维数据的代码,所以把它转载过来了,作为收藏。
K-means算法是很典型的基于距离的聚类算法,采用距离作为相似性的评价指标,即认为两个对象的距离越近,其相似度就越大。该算法认为簇是由距离靠近的对象组成的,因此把得到紧凑且独立的簇作为最终目标。
K-means算法以欧式距离作为相似度测度。
注意:欧氏距离是一个通常采用的距离定义,指在m维空间中两个点之间的真实距离,或者向量的自然长度(即该点到原点的距离)。在二维和三维空间中的欧氏距离就是两点之间的实际距离。
/*
$data = array(
2,
3,
20,
30,
25,
35,
45,
15,
10,
5,
7,9,8,90,56,78,87,65,78,89,67,79,67,68,69,69,69,69
);
*/
$data=array(1,2,3,4,5,6,7,8,9);
echo "<pre>";
echo "Data:\n\n";
print_r($data);
print_r(kMeans($data, 3));
function initialiseCentroids(array $data, $k) {
$dimensions = count($data[0]);
$centroids = array();
for($i = 0; $i < $k; $i++) {
$centroids[$i] =initialiseCentroid($data,$centroids);
}
return $centroids;
}
function initialiseCentroid($data,$centroids='') {
if(isset($centroids))
{
while(true)
{
$centroid = $data[rand(0, count($data)-1)];
if(!in_array($centroid,$centroids))
{
break;
}
}
}
else
{
$centroid = $data[rand(0, count($data)-1)];
}
echo $centroid;
return $centroid;
}
function kMeans($data, $k) {
$centroids = initialiseCentroids($data, $k);
$mapping = array();
echo "\n\n\nrandom generated centroids:\n\n";
print_r($centroids);
while(true) {
$new_mapping = assignCentroids($data, $centroids);
echo "\n\n\nmapping:\n\n";
print_r($new_mapping);
$changed = false;
foreach($new_mapping as $documentID => $centroidID) {
if(!isset($mapping[$documentID]) || $centroidID != $mapping[$documentID]) {
$mapping = $new_mapping;
$changed = true;
break;
}
}
if(!$changed){
return formatResults($mapping, $data, $centroids);
}
$centroids = updateCentroids($mapping, $data, $k);
}
}
function formatResults($mapping, $data, $centroids) {
$result = array();
$result['centroids'] = $centroids;
foreach($mapping as $documentID => $centroidID) {
$result[$centroidID][] = $data[$documentID];
}
echo "\n\n\n ========================Final results======================================<b>\n\n";
return $result;
}
function assignCentroids($data, $centroids) {
$mapping = array();
foreach($data as $documentID => $document) {
$minDist = 100;
$minCentroid = null;
foreach($centroids as $centroidID => $centroid) {
$dist = 0;
$dist = abs($centroid - $document);
if($dist < $minDist) {
$minDist = $dist;
$minCentroid = $centroidID;
}
}
$mapping[$documentID] = $minCentroid;
}
return $mapping;
}
function updateCentroids($mapping, $data, $k) {
$centroids = array();
$counts = array_count_values($mapping);
foreach($mapping as $documentID => $centroidID) {
if(!isset($centroids[$centroidID])) {
$centroids[$centroidID] = 0;
}
$centroids[$centroidID] += $data[$documentID]/$counts[$centroidID];
}
if(count($centroids) < $k) {
$centroids = array_merge($centroids, initialiseCentroids($data, $k - count($centroids)));
}
foreach($centroids as $centID=>$val)
{
$temp=null;
foreach($mapping as $documentID => $centroidID) {
if($centroidID==$centID)
{
if($temp== null || abs($val - $temp) > abs($data[$documentID] - $val)) {
$temp= $data[$documentID];
}
}
}
$centroids[$centID]=$temp;
}
$centroids=array_map(function($x){return round($x);},$centroids);
echo "\n\n\n>>>>>>>>>>> new centroids\n\n";
print_r($centroids);
return $centroids;
}
个人觉得这个代码写的很好,虽然并没有完全看懂代码。
执行结果图:
版权归 马富天个人博客 所有
本文标题:《一维K-means算法PHP代码实现》
本文链接地址:http://www.mafutian.com/169.html
转载请务必注明出处,小生将不胜感激,谢谢! 喜欢本文或觉得本文对您有帮助,请分享给您的朋友 ^_^
顶4
踩1
第 1 楼 澳门银座平台 2016-07-30 23:02:26 江苏苏州
评论审核未开启 |
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() |
||