NumPy 2.4.0 Release Notes#

Highlights#

We’ll choose highlights for this release near the end of the release cycle.

Deprecations#

Setting the `strides` attribute is deprecated#

Setting the strides attribute is now deprecated because mutating an array in place is unsafe when the array is shared, especially across multiple threads. As an alternative, you can create a new view (no copy) via:

* np.lib.stride_tricks.sliding_window_view if applicable,
* np.lib.stride_tricks.as_strided for the general case,
* or the np.ndarray constructor (buffer is the original array) for a light-weight version.

(gh-28925)
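The alternatives to assigning to strides can be sketched as follows. This is a minimal illustration rather than a recommended recipe: the array `a`, the window length, and the "every second element" layout are assumptions chosen for the example. Note that the sliding-window helper in numpy.lib.stride_tricks is named sliding_window_view.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided, sliding_window_view

a = np.arange(6, dtype=np.int64)  # illustrative data: [0, 1, 2, 3, 4, 5]

# 1. Overlapping windows: a read-only view of shape (4, 3), no copy.
windows = sliding_window_view(a, window_shape=3)

# 2. General case: describe the new layout explicitly.
#    Here every second element is selected; writeable=False guards
#    against accidental writes through the view.
every_other = as_strided(
    a, shape=(3,), strides=(2 * a.itemsize,), writeable=False
)

# 3. Light-weight version: the ndarray constructor with the original
#    array as the buffer and the desired strides.
same_layout = np.ndarray(
    shape=(3,), dtype=a.dtype, buffer=a, strides=(2 * a.itemsize,)
)
```

All three results share memory with `a`, so none of them copies the data. Because as_strided and the ndarray constructor do not check that the requested shape and strides stay inside the buffer, the layout has to be validated by the caller.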

Compatibility notes#

NumPy’s C extension modules have begun to use multi-phase initialisation, as defined by PEP 489. As part of this, a new explicit check has been added that each such module is only imported once per Python process. As a side-effect, deleting numpy from sys.modules and re-importing it will now fail with an ImportError. Doing so has always been unsafe and could have unexpected side-effects, but it did not previously raise an error.

(gh-29030)

The macro NPY_ALIGNMENT_REQUIRED has been removed#

The macro was defined in the npy_cpu.h file, so it might be regarded as semi-public. As it turns out, with modern compilers and hardware it is almost always the case that alignment is required, so NumPy no longer uses the macro. It is unlikely that anyone uses it, but to be sure you might want to compile with the -Wundef flag or equivalent, which warns when an undefined macro is evaluated in an #if directive.

(gh-29094)

Performance improvements and changes#

Performance improvements to `np.unique` for string dtypes#

The hash-based algorithm for unique extraction provides an order-of-magnitude speedup on large string arrays. In an internal benchmark with about 1 billion string elements, the hash-based np.unique completed in roughly 33.5 seconds, compared to 498 seconds with the sort-based method – about 15× faster for unsorted unique operations on strings. This improvement greatly reduces the time to find unique values in very large string datasets.

(gh-28767)

Changes#

Multiplication between a string and integer now raises OverflowError instead of MemoryError if the result of the multiplication would create a string that is too large to be represented. This follows Python’s behavior.

(gh-29060)

`unique_values` for string dtypes may return unsorted data#

np.unique now supports hash-based duplicate removal for string dtypes. This enhancement extends the hash-table algorithm to byte strings ('S'), Unicode strings ('U'), and the experimental string dtype ('T', StringDType). As a result, calling np.unique() on an array of strings will use the faster hash-based method to obtain unique values. Note that the hash-based method does not guarantee that the returned unique values are sorted, so sort the result explicitly if your code relies on the order. The hash-based method also works for StringDType arrays containing None (missing values) when equal_nan=True is used, which treats missing values as equal.

(gh-28767)
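Code that relies on the order of the unique values can make the sort explicit. The sketch below assumes a small illustrative array `names`; the getattr fallback is only there so that the snippet also runs on NumPy versions older than 2.0, where np.unique_values does not exist.

```python
import numpy as np

names = np.array(["pear", "apple", "pear", "fig", "apple"])  # illustrative data

# np.unique_values was added in NumPy 2.0; fall back to np.unique on
# older versions so the example stays runnable there.
unique_values = getattr(np, "unique_values", np.unique)

# The order of `values` should not be relied upon.
values = unique_values(names)

# Sort explicitly when downstream code needs a deterministic order.
sorted_values = np.sort(values)
```

Comparing results as sets, or sorting them first, keeps tests independent of which algorithm produced the unique values.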

Fix bug in `matmul` for non-contiguous out kwarg parameter#

In some cases, if out was non-contiguous, np.matmul would cause memory corruption or a C-level assertion failure. This bug was introduced in v2.3.0 and fixed in v2.3.1.

(gh-29179)
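For reference, a non-contiguous out argument can arise as in the sketch below. The shapes and the strided `buffer` are assumptions chosen for illustration; this shows the general pattern and is not a reproduction of the original bug report.

```python
import numpy as np

a = np.arange(6, dtype=np.float64).reshape(2, 3)
b = np.arange(12, dtype=np.float64).reshape(3, 4)

# Take every second column of a wider buffer: a (2, 4) view that is
# neither C- nor Fortran-contiguous.
buffer = np.zeros((2, 8))
out = buffer[:, ::2]
is_contiguous = out.flags.c_contiguous or out.flags.f_contiguous

# Write the product directly into the strided view.
np.matmul(a, b, out=out)

# The view should hold the same result as a freshly allocated product,
# and the skipped columns of the buffer should remain zero.
matches = np.array_equal(out, a @ b)
untouched = not buffer[:, 1::2].any()
```

A check like this against `a @ b` is a cheap way to confirm that a given NumPy build handles strided outputs correctly.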