Diffify – Python release

[This article was first published on The Jumping Rivers Blog, and kindly contributed to R-bloggers.]

It has been six months since the launch of Diffify, our website for comparing package releases. We are delighted to announce that, in addition to CRAN’s 20,000 R packages, you can now track 1,600 popular Python packages!

What’s included?

The current criteria for a Python package to be included in Diffify are:

  • The package is listed in the top 2000 PyPI packages according to download statistics.
  • The package has had at least one version release since 1st May 2020.
  • The package wheel is downloadable from pypi.org.
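
As a rough illustration, the three criteria above could be expressed as a single filter. This is only a sketch: the PackageInfo record, its field names and the is_included function are hypothetical and are not taken from the Diffify code base. Only the top-2000 threshold and the 1st May 2020 cut-off come from the list above.

```python
from dataclasses import dataclass
from datetime import date

# Thresholds taken from the criteria listed in the article.
TOP_N = 2000
CUTOFF = date(2020, 5, 1)


@dataclass
class PackageInfo:
    """Hypothetical summary of a PyPI package (not a real Diffify type)."""
    name: str
    download_rank: int      # 1 = most downloaded
    latest_release: date    # date of the most recent release
    has_wheel: bool         # True if a wheel can be downloaded from pypi.org


def is_included(pkg: PackageInfo, top_n: int = TOP_N, cutoff: date = CUTOFF) -> bool:
    """Return True if the package meets all three inclusion criteria."""
    return (
        pkg.download_rank <= top_n
        and pkg.latest_release >= cutoff
        and pkg.has_wheel
    )


# Made-up example records, for illustration only.
popular = PackageInfo("example-popular", 12, date(2022, 3, 1), True)
stale = PackageInfo("example-stale", 150, date(2018, 7, 9), True)
```

Here is_included(popular) evaluates to True, while is_included(stale) evaluates to False because its latest release predates the cut-off.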

If your favourite package is not currently accessible, don’t worry! We are actively working to expand the list to as many PyPI packages as possible, as we’ll explain below.

New content

The first change you’ll notice is to our homepage, where we now have buttons for both R and Python.

A screenshot of the new Diffify homepage: In the sidebar there are links to home, R and Python. The main body has some introduction text, and now contains a link to “Get started with Python”.

Clicking on the Python button will take you through to the package search bar. For this walkthrough, we will compare versions 3.3.0 and 3.5.0 of the Matplotlib package. Diffify provides a breakdown of the changes to the package dependencies, functions and classes.

A screenshot of the version comparison page for the Python package Matplotlib: The later version is set to 3.5.0 and the earlier version is set to 3.3.0. Collapsible windows are displayed which contain changes to Dependencies, Functions and Classes.

Dependencies

We consider three kinds of dependencies:

  • The Python version requirement.
  • Required Python packages – these must be installed.
  • Optional Python packages – installing these will enable extra package features.

A screenshot of the Dependencies window: This includes tabs for the “Python”, “Required” and “Optional” dependencies. The Python requirement has changed from 3.6 to 3.7.

In our example, we see that the Python version requirement has changed from >=3.6 to >=3.7.
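
For readers curious how required and optional dependencies can be told apart: in standard Python package metadata, each dependency is a Requires-Dist requirement string, and optional ones carry an environment marker of the form extra == "name". The sketch below splits a list of such strings on that basis. The split_requirements function and the example strings are illustrative assumptions, not Diffify's implementation and not Matplotlib's actual metadata.

```python
def split_requirements(requires_dist):
    """Split requirement strings into (required, optional) lists.

    A requirement is treated as optional when its environment marker
    (the part after ';') mentions 'extra'. Markers are dropped from the
    returned strings to keep the sketch short.
    """
    required, optional = [], []
    for req in requires_dist:
        spec, _, marker = req.partition(";")
        if "extra" in marker:
            optional.append(spec.strip())
        else:
            required.append(spec.strip())
    return required, optional


# Illustrative requirement strings (made up for this example).
example = [
    "numpy>=1.17",
    "pillow>=6.2.0",
    'pytest; extra == "test"',
]
required, optional = split_requirements(example)
```

For an installed package, importlib.metadata.requires("name") from the standard library returns requirement strings in this form, and importlib.metadata.metadata("name")["Requires-Python"] gives the Python version requirement.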

Functions

Here we provide a list of functions that have been added, removed or changed between the two versions.

A screenshot of the Functions window: A list of package functions is displayed. Each entry displays the function name prefixed by the module path on the left, and a button for accessing the function “Details” on the right. Each function is colour-coded based on whether it has been added, removed or changed.

Clicking on the “Details” dropdown will bring up the function arguments, including the argument name and default value. If type annotations are included in the package source code, Diffify will also display the argument type and the function return type.
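
To give a feel for how argument names, defaults and type annotations can be read from source code without importing it, here is a minimal sketch using Python's built-in ast module. It is an assumption about one possible approach, not a description of how Diffify works internally. The function_details helper and the scale example are hypothetical, and ast.unparse requires Python 3.9 or later.

```python
import ast

# A small stand-in module, used only for this example.
SOURCE = '''
def scale(values: list, factor: float = 2.0, verbose=False) -> list:
    return [v * factor for v in values]
'''


def function_details(source, name):
    """Return (argument rows, return type) for a top-level function.

    Only ordinary positional-or-keyword arguments are handled;
    positional-only, keyword-only, *args and **kwargs are ignored here.
    """
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, ast.FunctionDef) and node.name == name:
            args = node.args.args
            # Defaults apply to the last len(defaults) arguments, so pad the front.
            pad = [None] * (len(args) - len(node.args.defaults))
            defaults = pad + list(node.args.defaults)
            rows = []
            for arg, default in zip(args, defaults):
                rows.append({
                    "name": arg.arg,
                    "default": ast.unparse(default) if default is not None else None,
                    "type": ast.unparse(arg.annotation) if arg.annotation is not None else None,
                })
            returns = ast.unparse(node.returns) if node.returns is not None else None
            return rows, returns
    return None


rows, returns = function_details(SOURCE, "scale")
```

Each entry in rows holds the argument name, its default (or None if there is none) and its annotated type (or None if unannotated), which matches the kind of table described above.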

A screenshot of the expanded “Details” for the matplotlib.pyplot.grid function: A table is displayed showing the function arguments for each version, including the argument name, default value, and type. Changed arguments are highlighted. The return type of the function is displayed above this table.

For the pyplot.grid() function, the name of the first positional argument has changed from b to visible.
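
A change like this can be detected by comparing the two signatures position by position. The sketch below does so with the standard library's inspect.signature on two stand-in functions, grid_old and grid_new, written to mirror the rename described above. They are not the real Matplotlib functions, and inspecting live functions is a simplification chosen for this example rather than the source-code analysis that Diffify performs.

```python
import inspect
from itertools import zip_longest


# Stand-ins for the two versions of the function (hypothetical bodies).
def grid_old(b=None, which="major", axis="both", **kwargs):
    pass


def grid_new(visible=None, which="major", axis="both", **kwargs):
    pass


def describe(func):
    """Return a (name, default, annotation) tuple for each parameter."""
    empty = inspect.Parameter.empty
    rows = []
    for p in inspect.signature(func).parameters.values():
        default = None if p.default is empty else repr(p.default)
        annotation = None if p.annotation is empty else p.annotation
        rows.append((p.name, default, annotation))
    return rows


def changed_parameters(old, new):
    """List (position, old_row, new_row) wherever the two signatures differ."""
    pairs = zip_longest(describe(old), describe(new))
    return [(i, o, n) for i, (o, n) in enumerate(pairs) if o != n]


changes = changed_parameters(grid_old, grid_new)
```

With these stand-ins, changes contains a single entry at position 0, recording the rename from b to visible; the remaining parameters are identical and are not reported.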

Classes

Here we provide a list of classes that have been added, removed or changed.

A screenshot of the Classes window: A list of package classes is displayed. Each entry displays the class name prefixed by the module path on the left, and a button for accessing the class methods is displayed on the right. Each class is colour-coded based on whether it has been added, removed or changed.

Clicking on the “Methods” button for a class will bring up a pop-up that lists the methods that belong to that class. The example below shows the methods .__init__() and .from_dict(), which belong to the spines.Spines class.

A screenshot of the Methods pop-up window for the matplotlib.spines.Spines class: A list of methods belonging to the class is displayed. Each entry displays the method name on the left, and a button for accessing the method “Details” is displayed on the right. Each method is colour-coded based on whether it has been added, removed or changed.

Similar to functions, you can access the method arguments by clicking on “Details”.
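
As a minimal sketch of how the methods of each class might be collected from source code, the example below walks a module's syntax tree with the ast module. The class_methods helper and the stand-in Spines class are hypothetical. The real matplotlib.spines.Spines class has more methods than shown here, and this is not presented as Diffify's own implementation.

```python
import ast

# A cut-down stand-in for a module containing one class.
SOURCE = '''
class Spines:
    def __init__(self, **kwargs):
        self._dict = kwargs

    @classmethod
    def from_dict(cls, d):
        return cls(**d)
'''


def class_methods(source):
    """Map each top-level class name to the names of its methods."""
    tree = ast.parse(source)
    result = {}
    for node in tree.body:
        if isinstance(node, ast.ClassDef):
            result[node.name] = [
                item.name
                for item in node.body
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
            ]
    return result


methods = class_methods(SOURCE)
```

Running class_methods on the source of two releases and comparing the resulting name lists would give the added and removed methods for each class.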

Removing clutter

The functions and classes listed above are detected by analysing the package source code. We have taken various steps to filter out code that is intended for internal use by the package developers, including:

  • ignoring functions and scripts whose names start with an underscore
  • ignoring functions whose names start with test and classes whose names start with Test
  • leaving out scripts whose names start with test_ or end with _test.py

These criteria are intended to leave out internal code and unit tests.
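
The rules above amount to simple name checks. The sketch below is a literal reading of them. The function names keep_function, keep_class and keep_script are hypothetical, and the exact matching rules used by Diffify may differ (for example, in how files such as __init__.py are treated).

```python
from pathlib import PurePosixPath


def keep_function(name):
    """Keep functions unless they start with an underscore or with 'test'."""
    return not (name.startswith("_") or name.startswith("test"))


def keep_class(name):
    """Keep classes unless their name starts with 'Test'."""
    return not name.startswith("Test")


def keep_script(path):
    """Keep scripts unless the file name marks them as internal or as tests."""
    filename = PurePosixPath(path).name
    return not (
        filename.startswith("_")
        or filename.startswith("test_")
        or filename.endswith("_test.py")
    )


# Illustrative paths and names (made up for this example).
scripts = ["pkg/plotting.py", "pkg/_internal.py", "pkg/tests/test_plotting.py", "pkg/io_test.py"]
kept_scripts = [p for p in scripts if keep_script(p)]
```

In this example only pkg/plotting.py survives the script filter. A name such as test_draw would be rejected by keep_function, and TestAxes by keep_class.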

Looking ahead

Python has been around for quite a while, and consequently it has many packages – roughly 400,000 on PyPI! Perhaps unsurprisingly, analysing so many packages for Diffify has proven to be a bit of a challenge…

This is why we have initially chosen to focus on the 2000 most popular PyPI packages. We will soon extend this to the top 5000, according to Top PyPI Packages. And we won’t be stopping there! It remains to be seen whether we will manage to add all 400,000, but we will certainly try our utmost.

Despite our best efforts to filter out clutter, you may still come across some functions and classes that are clearly intended for internal use or unit testing. We will continue to look at ways to improve our filters.

We hope you enjoy the new content! As always, if you spot any bugs or have any suggestions, please open an issue on our public GitHub repository.

Stay tuned for more updates…