Robotics and Artificial Intelligence Enthusiasts Forum


Top 10 Machine Learning Projects on Github

Posted 2015-12-15 13:02:20
Last edited by irobot on 2015-12-15 13:05

2015.12.15

The top 10 machine learning projects on Github include a number of libraries, frameworks, and educational resources. Have a look at the tools others are using, and the resources they are learning from.
By Matthew Mayo.
Open source software is an important piece of the data science puzzle.
According to the most recent KDnuggets data science software poll results, 73% of data scientists used free software in the previous 12 months. While there are many sources of such tools on the internet, Github has become a de facto clearinghouse for all types of open source software, including tools used in the data science community. The importance, and central position, of machine learning to the field of data science does not need to be pointed out.
The following is an overview of the top 10 machine learning projects on Github.*

1. Scikit-learn
Machine learning in Python.

★ 8641 stars, 5125 forks
The top project is, unsurprisingly, the go-to machine learning library for Pythonistas the world over, from industry to academia. Scikit-learn leverages the Python scientific computing stack, built on NumPy, SciPy, and matplotlib. As general purpose a toolkit as there could be, Scikit-learn contains classification, regression, and clustering algorithms, as well as data-preparation and model-evaluation tools.


2. Awesome Machine Learning
A curated list of awesome Machine Learning frameworks, libraries and software.

★ 8404 stars, 1885 forks
This is a curated list of machine learning libraries, frameworks, and software. The list is categorized by language, and further by machine learning category (general purpose, computer vision, natural language processing, etc.). It also includes data visualization tools, which opens it up as more of a generalized data science list in some sense... which is a good thing.

3. PredictionIO
PredictionIO, a machine learning server for developers and ML engineers. Built on Apache Spark, HBase and Spray.

★ 8145 stars, 1002 forks
PredictionIO is a general-purpose framework. It includes several template engines for well-known tasks such as classification and recommendation, all of which can be customized. It connects to existing applications through REST APIs or SDKs, and includes support for Spark MLlib. Since it is built on top of Spark and utilizes its ecosystem, it should come as no surprise that PredictionIO is developed mainly in Scala.


4. Dive Into Machine Learning
Dive into Machine Learning with Python Jupyter notebook and scikit-learn.

★ 4326 stars, 342 forks
This is a collection of IPython notebook tutorials for scikit-learn, as well as a number of links to related Python-specific and general machine learning topics, and more general data science information. The author isn't greedy either; they are quick to point out many other tutorials covering similar ground, in case this one doesn't tickle your fancy. The repo contains no software, but if you're new to Python machine learning, it may be worth checking out.


5. Pattern
Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.

★ 3799 stars, 598 forks
Pattern is a Python-based web mining toolkit coming out of the Computational Linguistics & Psycholinguistics (CLiPS) research center at the University of Antwerp. In this context, it has tools for the tasks of scraping, machine learning, natural language processing, network analysis, and visualization. Pattern can also easily mine data from several well-known web services. The project claims to be well-documented, and to include numerous examples and unit tests.

6. NuPIC (Numenta Platform for Intelligent Computing)
A brain-inspired machine intelligence platform, and biologically accurate neural network based on cortical learning algorithms.

★ 3647 stars, 987 forks
NuPIC implements the Hierarchical Temporal Memory (HTM) machine learning algorithms. HTM is an attempt to model the computation of the neocortex, and focuses on storing and recalling spatial and temporal patterns. NuPIC is ideally suited to pattern-related anomaly detection.

7. Vowpal Wabbit
Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.

★ 2949 stars, 827 forks
Vowpal Wabbit aims for speedy modelling of massive datasets, and supports parallel learning. The project was started at Yahoo! and is currently developed at Microsoft Research. Vowpal Wabbit harnesses out-of-core learning, and has been used to learn a tera-feature dataset in an hour across 1000 compute nodes.
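To give a feel for what "online" and "hashing" mean here, the following is a minimal pure-Python sketch of online logistic regression with hashed features. It is not Vowpal Wabbit's code or API; the names `NUM_BUCKETS`, `hashed_indices`, `learn_one`, and `toy_stream`, as well as the toy data and learning rate, are assumptions made for illustration. The point is that examples are consumed one at a time and no vocabulary is stored, which is what makes out-of-core learning possible.

```python
import math
import zlib

NUM_BUCKETS = 2 ** 18  # size of the hashed weight vector (arbitrary choice)


def hashed_indices(tokens):
    """Map each raw token to a bucket index; no vocabulary is kept in memory."""
    return [zlib.crc32(tok.encode("utf-8")) % NUM_BUCKETS for tok in tokens]


def predict(weights, indices):
    """Logistic prediction for one example with binary (present/absent) features."""
    score = sum(weights[i] for i in indices)
    return 1.0 / (1.0 + math.exp(-score))


def learn_one(weights, tokens, label, lr=0.5):
    """One online gradient step on the logistic loss; label is 0 or 1."""
    indices = hashed_indices(tokens)
    error = predict(weights, indices) - label
    for i in indices:
        weights[i] -= lr * error


def toy_stream(repeats=50):
    """Hypothetical data source that yields one example at a time."""
    examples = [
        (["cheap", "pills", "offer"], 1),
        (["meeting", "agenda", "notes"], 0),
        (["cheap", "offer", "now"], 1),
        (["project", "notes", "review"], 0),
    ]
    for _ in range(repeats):
        for tokens, label in examples:
            yield tokens, label


# The model only ever sees the current example, never the whole dataset.
weights = [0.0] * NUM_BUCKETS
for tokens, label in toy_stream():
    learn_one(weights, tokens, label)

spam_like = predict(weights, hashed_indices(["cheap", "offer"]))
ham_like = predict(weights, hashed_indices(["meeting", "notes"]))
```

Because the weight vector has a fixed size, memory use does not grow with the number of distinct features; the trade-off is that two features can occasionally collide in the same bucket.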

8. aerosolve
A machine learning package built for humans.

★ 2538 stars, 245 forks
aerosolve attempts to be different from other libraries, focusing on human-friendly debugging facilities, Scala code for training, an image content analysis engine for easy image ranking, and a feature transformation language giving users flexibility and control over features. aerosolve implements a Thrift-based feature representation, wherein features are logically grouped for the purposes of applying transformations to, or facilitating interactions between, entire feature groups at once.
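The idea of grouped features can be sketched in a few lines of Python. This is only an illustration of the concept, not aerosolve's Thrift schema or its transformation language; the nested-dict layout, the function names `scale_group` and `cross_groups`, and the sample listing are all hypothetical.

```python
def scale_group(features, group, factor):
    """Return a copy in which every feature of one group is multiplied by factor."""
    out = {g: dict(values) for g, values in features.items()}
    out[group] = {name: value * factor for name, value in features[group].items()}
    return out


def cross_groups(features, left, right, output):
    """Return a copy with a new group holding every pairwise interaction
    between the features of two existing groups."""
    out = {g: dict(values) for g, values in features.items()}
    out[output] = {
        f"{a}^{b}": va * vb
        for a, va in features[left].items()
        for b, vb in features[right].items()
    }
    return out


# Hypothetical example: features are keyed first by group, then by name.
listing = {
    "amenities": {"wifi": 1.0, "kitchen": 1.0},
    "room": {"private": 1.0},
    "price": {"nightly": 120.0, "cleaning": 30.0},
}

scaled = scale_group(listing, "price", 0.5)
crossed = cross_groups(listing, "amenities", "room", "amenities_x_room")
```

Each call names a group rather than an individual feature, so a single transformation covers every feature the group contains, including ones added later.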


9. GoLearn
Machine Learning for Go.

★ 2334 stars, 215 forks
GoLearn is an actively developed machine learning library for Go. Its goals are to provide a fully-featured, simple-to-use, customizable package for Go developers. GoLearn implements the familiar (to many) fit/predict interface of Scikit-learn, making it easy to swap out estimators, and implements "helper functions" like cross validation and train/test splitting.
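For readers who have not met the fit/predict convention, here is a rough sketch of why it makes estimators easy to swap. It is written in Python rather than Go and does not use GoLearn's or Scikit-learn's actual classes; `MajorityClassifier`, `NearestNeighborClassifier`, the simplified `train_test_split`, and the toy data are assumptions made for illustration.

```python
import random
from collections import Counter


class MajorityClassifier:
    """Baseline estimator: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class NearestNeighborClassifier:
    """1-nearest-neighbour estimator using squared Euclidean distance."""

    def fit(self, X, y):
        self.X_, self.y_ = list(X), list(y)
        return self

    def predict(self, X):
        def sq_dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        predictions = []
        for row in X:
            nearest = min(range(len(self.X_)), key=lambda i: sq_dist(row, self.X_[i]))
            predictions.append(self.y_[nearest])
        return predictions


def train_test_split(X, y, test_fraction=0.25, seed=0):
    """Shuffle row indices with a fixed seed and split them into two parts."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    cut = int(len(indices) * (1 - test_fraction))
    train, test = indices[:cut], indices[cut:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


def accuracy(model, X_train, y_train, X_test, y_test):
    """Works with any object exposing fit(X, y) and predict(X)."""
    predictions = model.fit(X_train, y_train).predict(X_test)
    return sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)


# Toy data: two well-separated clusters labelled "a" and "b".
X = [(i * 0.1, i * 0.1) for i in range(10)] + [(10 + i * 0.1, 10 + i * 0.1) for i in range(10)]
y = ["a"] * 10 + ["b"] * 10

split = train_test_split(X, y)
majority_accuracy = accuracy(MajorityClassifier(), *split)
neighbor_accuracy = accuracy(NearestNeighborClassifier(), *split)
```

The `accuracy` helper never needs to know which estimator it receives; swapping models means changing only the object passed in.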

10. Code for Machine Learning for Hackers
Code accompanying the book "Machine Learning for Hackers."

★ 2003 stars, 1446 forks
This repo contains the code from the O'Reilly book Machine Learning for Hackers. All of the code is in R and relies on numerous R packages. Topics covered include the all-too-common tasks of classification, ranking, and regression, as well as statistical procedures such as principal component analysis and multidimensional scaling.
* Determined by the top returned results to the query "machine learning" on Github search, sorted by most stars, as of December 10, 2015, 1:00PM EST.