NumPy 2.xx.x Release Notes#

Highlights#

We’ll choose highlights for this release near the end of the release cycle.

Deprecations#

Setting the strides attribute is deprecated#

Setting the strides attribute is now deprecated, since mutating an array in place is unsafe if the array is shared, especially by multiple threads. As an alternative, you can create a new view (no copy) via:

  • np.lib.stride_tricks.sliding_window_view, if applicable,
  • np.lib.stride_tricks.as_strided for the general case,
  • or the np.ndarray constructor (buffer is the original array) for a light-weight version.

(gh-28925)
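As a rough illustration of the alternatives to assigning to the strides attribute, the sketch below builds three views over the same small buffer. The array contents, shapes, and variable names are made up for the example. Stride values are in bytes, and as_strided does no bounds checking, so the values are derived from the array's own strides.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided, sliding_window_view

a = np.arange(6, dtype=np.int64)
step = a.strides[0]  # bytes between consecutive elements

# 1. Overlapping windows of length 3, shape (4, 3), no copy.
windows = sliding_window_view(a, window_shape=3)

# 2. General case: the same windows built by hand with as_strided.
#    Wrong shape/strides can expose out-of-bounds memory, so keep it read-only.
manual = as_strided(a, shape=(4, 3), strides=(step, step), writeable=False)

# 3. Light-weight version: the ndarray constructor with the original
#    array as buffer. Here the view picks every second element.
every_other = np.ndarray(shape=(3,), dtype=a.dtype, buffer=a, strides=(2 * step,))

print(windows)
print(every_other)
```

All three results are views on the memory of a, so none of them changes the shape or strides of a itself.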

align= must be passed as boolean to np.dtype()#

When creating a new dtype, a VisibleDeprecationWarning is now given if align= is not a boolean. This is mainly to prevent accidentally passing a subarray shape in the position of the align flag, where it has no effect, such as np.dtype("f8", 3) instead of np.dtype(("f8", 3)). We strongly suggest always passing align= as a keyword argument.

(gh-29301)
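A minimal sketch of the two spellings that stay unambiguous: the subarray shape goes inside a tuple with the base dtype, and align is passed as a keyword boolean. The field names flag and value are invented for the example.

```python
import numpy as np

# Subarray dtype: the shape is part of the (base, shape) tuple.
sub = np.dtype(("f8", 3))

# Structured dtypes with align= given explicitly as a keyword boolean.
packed = np.dtype([("flag", "u1"), ("value", "f8")], align=False)
aligned = np.dtype([("flag", "u1"), ("value", "f8")], align=True)

print(sub.shape)          # shape of the subarray
print(packed.itemsize)    # fields packed back to back
print(aligned.itemsize)   # padded to the platform's alignment rules
```

The aligned item size depends on the platform's alignment requirements, so the example prints it instead of assuming a value.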

Compatibility notes#

  • NumPy’s C extension modules have begun to use multi-phase initialisation, as defined by PEP 489. As part of this, a new explicit check has been added that each such module is only imported once per Python process. As a side-effect, deleting numpy from sys.modules and re-importing it will now fail with an ImportError. Doing so has always been unsafe, with unexpected side-effects, though it did not previously raise an error.

    (gh-29030)

The macro NPY_ALIGNMENT_REQUIRED has been removed#

The macro was defined in the npy_cpu.h file, so it might be regarded as semi-public. As it turns out, with modern compilers and hardware it is almost always the case that alignment is required, so NumPy no longer uses the macro. It is unlikely that anyone uses it, but you might want to compile with the -Wundef flag or equivalent to be sure.

(gh-29094)

New Features#

  • Let np.size accept multiple axes.

    (gh-29240)
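The sketch below shows one way this could be used, assuming the new form is spelled as a tuple passed to the axis argument. Older releases accept only a single integer axis and raise TypeError for a tuple, so the example falls back to an equivalent product over the shape. The array and axes are arbitrary.

```python
import math
import numpy as np

a = np.zeros((2, 3, 4))
axes = (0, 2)

# Equivalent computation that works on any NumPy version.
expected = math.prod(a.shape[ax] for ax in axes)

try:
    # Assumed spelling of the new feature: a tuple of axes.
    n = np.size(a, axis=axes)
except TypeError:
    # Releases without this feature only accept an integer axis.
    n = expected

print(n, expected)
```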

Improvements#

Improved error message for assert_array_compare#

The error message generated by assert_array_compare, which is used by functions such as assert_allclose and assert_array_less, now also includes the indices at which the assertion fails.

(gh-29112)
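To see the message for yourself, a small failing comparison is enough. The arrays below are invented for the example, and the exact wording of the message varies between NumPy versions, so the sketch only prints it.

```python
import numpy as np
from numpy.testing import assert_allclose

actual = np.array([1.0, 2.0, 3.5, 4.0])
desired = np.array([1.0, 2.0, 3.0, 4.0])

message = ""
try:
    # Only the element at index 2 differs.
    assert_allclose(actual, desired, rtol=1e-7)
except AssertionError as exc:
    message = str(exc)

print(message)
```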

Performance improvements and changes#

Performance improvements to np.unique for string dtypes#

The hash-based algorithm for unique extraction provides an order-of-magnitude speedup on large string arrays. In an internal benchmark with about 1 billion string elements, the hash-based np.unique completed in roughly 33.5 seconds, compared to 498 seconds with the sort-based method – about 15× faster for unsorted unique operations on strings. This improvement greatly reduces the time to find unique values in very large string datasets.

(gh-28767)

Changes#

  • Multiplication between a string and integer now raises OverflowError instead of MemoryError if the result of the multiplication would create a string that is too large to be represented. This follows Python’s behavior.

    (gh-29060)

unique_values for string dtypes may return unsorted data#

np.unique now supports hash‐based duplicate removal for string dtypes. This enhancement extends the hash-table algorithm to byte strings (‘S’), Unicode strings (‘U’), and the experimental string dtype (‘T’, StringDType). As a result, calling np.unique() on an array of strings will use the faster hash-based method to obtain unique values. Note that this hash-based method does not guarantee that the returned unique values will be sorted. This also works for StringDType arrays containing None (missing values) when using equal_nan=True (treating missing values as equal).

(gh-28767)
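If downstream code depends on sorted unique values, a defensive pattern is to sort explicitly instead of relying on the order of the result. The following is a minimal sketch with an invented input array; it compares the result as a set and sorts it separately.

```python
import numpy as np

words = np.array(["pear", "apple", "pear", "fig", "apple"])

uniques = np.unique(words)

# Treat the order of `uniques` as unspecified and compare as a set.
as_set = set(uniques.tolist())

# Sort explicitly when a deterministic order is required.
ordered = np.sort(uniques)

print(as_set)
print(ordered)
```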

Fix bug in matmul for non-contiguous out kwarg parameter#

In some cases, if out was non-contiguous, np.matmul would cause memory corruption or a C-level assertion failure. This bug was introduced in v2.3.0 and fixed in v2.3.1.

(gh-29179)

__array_interface__ with NULL pointer changed#

The array interface now accepts NULL pointers (NumPy will do its own dummy allocation, though). Previously, these incorrectly triggered an undocumented scalar path. In the unlikely event that the scalar path was actually desired, you can (for now) achieve the previous behavior via the correct scalar path by not providing a data field at all.

(gh-29338)