~~NOTOC~~

====== RapidMiner Contributions ======

On this website we provide code extensions for [[http://rapid-i.com/content/view/181/190/|RapidMiner]], a leading tool for data mining. Over the past few years, RapidMiner has become our tool of choice for data mining, so we decided to give something back to the community and have contributed a variety of algorithms. Most of the software was developed as part of the BMBF-funded project PaREn (Pattern Recognition Engineering).

The following list gives a brief overview of the extensions, the code already integrated into RapidMiner, and our experimental contributions. Many parts have been presented at [[http://www.rcomm2010.org|RCOMM]], the RapidMiner Community Meeting and Conference.

{{:rapidminer_190.jpg?nolink|}}
\\

====== RapidMiner Extensions ======

===== Anomaly Detection Extension =====

The Anomaly Detection extension is the first approach to use RapidMiner for unsupervised anomaly detection. It currently comes with a number of the best-known unsupervised anomaly detection algorithms. A data set is analyzed and an anomaly score is computed for every example in an ExampleSet. The score can be used either to detect outliers (e.g. in fraud detection or medical applications) or to remove outliers as a preprocessing step before training classifiers.

More detailed information is available on its website: [[rapidminer:anomalydetection|Anomaly Detection Extension]]
\\ \\
{{:rapidminer:ad-logo.png?120&nolink|}} {{:rapidminer:outlier-operators.png?nolink&120|}}
\\

===== PaREn Automatic System Construction Wizard =====

The PaREn Automatic System Construction Wizard supports you in constructing a classification process within RapidMiner. For a given data set, it automatically recommends and constructs a classification process based on certain characteristics of that data set.
More information is available on its page: [[rapidminer:wizard|RapidMiner Extension: PaREn Automatic System Construction Wizard]]

The extension also contains the Landmarking operator, which extracts features from data sets for use in meta-learning. More details can be found in our publication [[http://madm.dfki.de/publication&pubid=4948|Landmarking for Meta-Learning using RapidMiner]].

Contact: Matthias.Reif@dfki.de
\\ \\
{{:patternrecognition.png?120&nolink|}} {{:landmarking.png?120&nolink|}}

====== RapidMiner Contributions ======

The code in this section has been integrated into RapidMiner and is available when the latest version is used.

===== X-Means Clustering and k-means++ =====

Integrated into RapidMiner since version 5.3.x.
\\ \\
{{:rapidminer:xmeans.png?nolink&135|}}

===== AutoMLP =====

AutoMLP is a simple algorithm for adjusting both the learning rate and the size of neural networks during training. The algorithm combines ideas from genetic algorithms and stochastic optimization. It maintains a small ensemble of networks that are trained in parallel with different learning rates and different numbers of hidden units. After a small, fixed number of epochs, the error rate is determined on a validation set and the worst performers are replaced with copies of the best networks, modified to have different numbers of hidden units and different learning rates. The new hidden unit numbers and learning rates are drawn from probability distributions derived from the successful rates and sizes.
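
The training loop described above can be sketched roughly as follows. This is only an illustrative sketch in plain Python, not the code of the RapidMiner operator. All names (''make_net'', ''mutate'', ''automlp'', ''toy_data'') are hypothetical, the networks have a single hidden layer, half of the ensemble is replaced in each round, and the log-normal perturbation of the learning rate and the hidden layer size is an assumption standing in for the probability distributions mentioned above.

```python
import math
import random


def make_net(n_in, n_hidden, rng):
    """One tanh hidden layer and one sigmoid output; the last weight in each row is a bias."""
    return {
        "w1": [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)],
        "w2": [rng.uniform(-1, 1) for _ in range(n_hidden + 1)],
    }


def forward(net, x):
    xin = x + [1.0]
    h = [math.tanh(sum(w * v for w, v in zip(row, xin))) for row in net["w1"]]
    z = sum(w * v for w, v in zip(net["w2"], h + [1.0]))
    z = max(-60.0, min(60.0, z))  # keep exp() in a safe range
    return h, 1.0 / (1.0 + math.exp(-z))


def train_epoch(net, data, lr):
    """One pass of plain stochastic gradient descent on the cross-entropy loss."""
    for x, y in data:
        h, p = forward(net, x)
        delta = p - y
        xin = x + [1.0]
        for j, row in enumerate(net["w1"]):
            g = delta * net["w2"][j] * (1.0 - h[j] ** 2)
            for i in range(len(row)):
                row[i] -= lr * g * xin[i]
        for j, v in enumerate(h + [1.0]):
            net["w2"][j] -= lr * delta * v


def error_rate(net, data):
    wrong = sum((forward(net, x)[1] >= 0.5) != (y == 1) for x, y in data)
    return wrong / len(data)


def mutate(parent, rng, sigma=0.3):
    """Copy a surviving network, then perturb its learning rate and hidden layer size.

    Assumption: both values are scaled by a log-normal factor around the parent's value.
    """
    old = parent["net"]
    n_in = len(old["w1"][0]) - 1
    n_hidden = max(1, round(len(old["w1"]) * math.exp(rng.gauss(0.0, sigma))))
    child = make_net(n_in, n_hidden, rng)  # any additional units start with random weights
    for j in range(min(n_hidden, len(old["w1"]))):
        child["w1"][j] = list(old["w1"][j])
        child["w2"][j] = old["w2"][j]
    child["w2"][-1] = old["w2"][-1]  # output bias
    return {"net": child, "lr": parent["lr"] * math.exp(rng.gauss(0.0, sigma))}


def automlp(train, valid, n_in, rng, pop_size=4, rounds=10, epochs_per_round=5):
    if pop_size < 2:
        raise ValueError("pop_size must be at least 2")
    pop = [
        {"net": make_net(n_in, rng.randint(2, 8), rng), "lr": 10 ** rng.uniform(-2, 0)}
        for _ in range(pop_size)
    ]
    half = pop_size // 2
    for _round in range(rounds):
        for member in pop:
            for _epoch in range(epochs_per_round):
                train_epoch(member["net"], train, member["lr"])
        # Rank by validation error, then replace the worse half with mutated copies.
        pop.sort(key=lambda m: error_rate(m["net"], valid))
        pop[half:] = [mutate(pop[i % half], rng) for i in range(pop_size - half)]
    return pop[0]  # best network at the last evaluation


def toy_data(n, rng):
    """Hypothetical 2-D data: label 1 when the two coordinates sum to more than 1."""
    points = [[rng.random(), rng.random()] for _ in range(n)]
    return [(p, int(p[0] + p[1] > 1.0)) for p in points]


rng = random.Random(0)
train, valid = toy_data(60, rng), toy_data(30, rng)
best = automlp(train, valid, n_in=2, rng=rng)
print(len(best["net"]["w1"]), best["lr"], error_rate(best["net"], valid))
```

The last line prints the hidden layer size, the learning rate and the validation error of the surviving network. Resizing keeps the trained weights of the retained hidden units, so a mutated copy continues from the state of its parent instead of restarting from scratch.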

You can find more information and the download link here: [[http://madm.dfki.de/rapidminer/automlp|AutoMLP Website]]

More details are also given in the following publications:
[[http://madm.dfki.de/publication&pubid=4947|Pattern Recognition Engineering]] \\
[[http://madm.dfki.de/publication&pubid=4950|AutoMLP: Simple, Effective, Fully Automated Learning Rate and Size Adjustment]]

Contact: Faisal.Shafait@dfki.de
\\ \\
{{:automlp.png?120&nolink|}}

===== Fast k-Means =====

The Fast k-Means operator implements the k-means algorithm as proposed by Charles Elkan, which uses the triangle inequality to avoid unnecessary distance computations and is in many cases much faster than the standard implementation.

You can find more information and the download link here: [[http://madm.dfki.de/rapidminer/fast_kmeans|Fast k-Means Website]]

More details are also given in the following (external) publication: [[http://cseweb.ucsd.edu/~elkan/kmeansicml03.pdf|Using the Triangle Inequality to Accelerate k-Means.]]

Contact: Christian.Kofler@dfki.de
\\ \\
{{:fast_kmeans.png?120&nolink|}}

====== Experimental Section ======

===== Distributed Pattern Recognition =====

DisPaRe is a framework for processing RapidMiner operations in a distributed environment. You can find more information in this publication: [[http://madm.dfki.de/publication&pubid=4949|Distributed Pattern Recognition in RapidMiner]]

The DisPaRe framework is the result of the diploma thesis by Alexander Arimond. \\
You can find all details here: {{:downloads:thesis_dispare.pdf|diploma thesis by Alexander Arimond}}

Finally, the plain code of the system is available here: {{:downloads:dispare_code.tar.gz|DisPaRe code}}

Please note that we consider the software to be in alpha status. Since we no longer work on this project, we cannot provide any support.

Contact: Christian.Kofler@dfki.de

{{:logo_dispare.png?120&nolink|}}
\\

===== Image Mining =====

This extension makes it possible to work with images in RapidMiner.
This includes handling image collections, applying transformations to these images, and extracting features for further data mining tasks.

You can find more information and the download link here: [[http://madm.dfki.de/rapidminer/imagemining|Image Mining Website]]

Contact: Christian.Kofler@dfki.de

{{:rapidminer:imagemining.png?120&nolink|}}