NumPy 2.4.0 Release Notes#
Highlights#
We’ll choose highlights for this release near the end of the release cycle.
Deprecations#
Setting the strides attribute is deprecated#
Setting the strides attribute is now deprecated, since mutating an array is unsafe if the array is shared, especially by multiple threads. As an alternative, you can create a new view (no copy) via:
* np.lib.stride_tricks.sliding_window_view if applicable,
* np.lib.stride_tricks.as_strided for the general case,
* or the np.ndarray constructor (buffer is the original array) for a light-weight version.
(gh-28925)
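As a rough illustration of these alternatives, the sketch below builds strided views of a small array without ever assigning to strides. The array a, the shapes, and the stride values are illustrative assumptions and not part of the release note; the windowing helper used is np.lib.stride_tricks.sliding_window_view.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided, sliding_window_view

a = np.arange(6, dtype=np.int64)  # illustrative data

# Deprecated pattern (shown only as a comment): a.strides = (16,)

# General case: a new view with explicit shape and strides (no copy).
step = 2 * a.itemsize
every_other = as_strided(a, shape=(3,), strides=(step,))

# Overlapping windows, if that is what the custom strides were meant to express.
windows = sliding_window_view(a, window_shape=3)

# Light-weight version: the ndarray constructor with the original array as buffer.
light = np.ndarray(shape=(3,), dtype=a.dtype, buffer=a, strides=(step,))

print(every_other, windows.shape, light)
```

All three results share memory with a, so the usual caution about writing through as_strided views still applies.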
align= must be passed as boolean to np.dtype()#
When creating a new dtype, a VisibleDeprecationWarning will be given if align= is not a boolean. This is mainly to prevent accidentally passing a subarray shape as the align flag, where it has no effect, such as np.dtype("f8", 3) instead of np.dtype(("f8", 3)). We strongly suggest always passing align= as a keyword argument.
(gh-29301)
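A minimal sketch of the recommended spellings follows. The field names "a" and "b" are made up for illustration; the deprecated form appears only in a comment so the snippet does not trigger the warning.

```python
import numpy as np

# Subarray dtype: the shape belongs inside the tuple, not in the align slot.
sub = np.dtype(("f8", 3))

# Deprecated (the 3 would land in the align position): np.dtype("f8", 3)

# Structured dtypes: pass align explicitly, as a boolean keyword argument.
fields = [("a", "u1"), ("b", "f8")]  # illustrative field names
aligned = np.dtype(fields, align=True)
packed = np.dtype(fields, align=False)

print(sub.shape, aligned.itemsize, packed.itemsize)
```

The aligned variant inserts padding after the one-byte field, so its itemsize is at least as large as the packed one.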
Compatibility notes#
NumPy’s C extension modules have begun to use multi-phase initialisation, as defined by PEP 489. As part of this, a new explicit check has been added that each such module is only imported once per Python process. This comes with the side-effect that deleting numpy from sys.modules and re-importing it will now fail with an ImportError. This has always been unsafe, with unexpected side-effects, though it did not previously raise an error.
(gh-29030)
The macro NPY_ALIGNMENT_REQUIRED has been removed#
The macro was defined in the npy_cpu.h file, so might be regarded as semipublic. As it turns out, with modern compilers and hardware it is almost always the case that alignment is required, so NumPy no longer uses the macro. It is unlikely anyone uses it, but you might want to compile with the -Wundef flag or equivalent to be sure.
(gh-29094)
New Features#
Let np.size accept multiple axes.
(gh-29240)
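The sketch below shows one way this might be used. It assumes that passing a tuple of axes returns the product of the lengths along those axes. The helper size_over_axes is hypothetical: it exists only so the snippet also runs on older NumPy versions, where np.size accepts a single integer axis.

```python
import math
import numpy as np

a = np.zeros((2, 3, 4))
axes = (0, 2)


def size_over_axes(arr, axes):
    """Hypothetical helper: number of elements along several axes."""
    try:
        # NumPy versions where np.size accepts a tuple of axes.
        return np.size(arr, axis=axes)
    except TypeError:
        # Older versions only accept one integer axis; compute the
        # product of the selected axis lengths by hand instead.
        return math.prod(arr.shape[ax] for ax in axes)


n = size_over_axes(a, axes)
print(n)
```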
Improvements#
Fix flatiter indexing edge cases#
The flatiter object now shares the same index preparation logic as ndarray, ensuring consistent behavior and fixing several issues where invalid indices were previously accepted or misinterpreted.
Key fixes and improvements:
* Stricter index validation:
  * Boolean non-array indices like arr.flat[[True, True]] were incorrectly treated as arr.flat[np.array([1, 1], dtype=int)]. They now raise an index error. Note that indices that match the iterator’s shape are expected to not raise in the future and be handled as regular boolean indices. Use np.asarray(<index>) if you want to match that behavior.
  * Float non-array indices were also cast to integer and incorrectly treated as arr.flat[np.array([1.0, 1.0], dtype=int)]. This is now deprecated and will be removed in a future version.
  * 0-dimensional boolean indices like arr.flat[True] are also deprecated and will be removed in a future version.
* Consistent error types: Certain invalid flatiter indices that previously raised ValueError now correctly raise IndexError, aligning with ndarray behavior.
* Improved error messages: The error message for unsupported index operations now provides more specific details, including explicitly listing the valid index types, instead of the generic IndexError: unsupported index operation.
(gh-28590)
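As a hedged illustration of the np.asarray(<index>) advice, the snippet below indexes a flatiter with explicit integer and boolean arrays rather than plain Python lists. The array contents and the mask are illustrative assumptions. The list-based forms whose behavior changed are left out on purpose, since what they do depends on the installed NumPy version.

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)

# Integer indexing with an explicit integer array.
picked = arr.flat[np.asarray([1, 1])]

# Boolean mask as a real array, one entry per element of the flat iterator.
mask = np.asarray([True, False, True, False, False, True])
masked = arr.flat[mask]

print(picked, masked)
```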
Improved error message for assert_array_compare#
The error message generated by assert_array_compare, which is used by functions like assert_allclose and assert_array_less, now also includes information about the indices at which the assertion fails.
(gh-29112)
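To see what the message looks like on a given installation, one option is to catch the AssertionError and print it, as in this sketch. The input values are illustrative, and the exact wording of the message depends on the NumPy version, so no particular text is assumed here.

```python
import numpy as np
from numpy.testing import assert_allclose

actual = np.array([1.0, 2.0, 3.5])   # illustrative values
desired = np.array([1.0, 2.0, 3.0])

try:
    assert_allclose(actual, desired, rtol=1e-7)
    message = ""
except AssertionError as exc:
    # Keep the text so it can be inspected or logged.
    message = str(exc)

print(message)
```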
Show unit information in __repr__ for datetime64("NaT")#
When a datetime64 object is “Not a Time” (NaT), its __repr__ method now includes the time unit of the datetime64 type. This makes it consistent with the behavior of a timedelta64 object.
(gh-29396)
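A small sketch for comparing the two representations is given below. The "ns" unit is an arbitrary choice for illustration, and the printed repr strings depend on the installed NumPy version, so the snippet only reads the unit back from the dtype.

```python
import numpy as np

nat_dt = np.datetime64("NaT", "ns")
nat_td = np.timedelta64("NaT", "ns")

# The repr text varies between NumPy versions; print both for comparison.
print(repr(nat_dt))
print(repr(nat_td))

# The unit is always available from the dtype, independent of the repr.
unit, count = np.datetime_data(nat_dt.dtype)
print(unit, count)
```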
Performance improvements and changes#
Performance improvements to np.unique for string dtypes#
The hash-based algorithm for unique extraction provides an order-of-magnitude speedup on large string arrays. In an internal benchmark with about 1 billion string elements, the hash-based np.unique completed in roughly 33.5 seconds, compared to 498 seconds with the sort-based method – about 15× faster for unsorted unique operations on strings. This improvement greatly reduces the time to find unique values in very large string datasets.
(gh-28767)
Changes#
Multiplication between a string and integer now raises OverflowError instead of MemoryError if the result of the multiplication would create a string that is too large to be represented. This follows Python’s behavior.
(gh-29060)
unique_values for string dtypes may return unsorted data#
np.unique now supports hash‐based duplicate removal for string dtypes. This enhancement extends the hash-table algorithm to byte strings (‘S’), Unicode strings (‘U’), and the experimental string dtype (‘T’, StringDType). As a result, calling np.unique() on an array of strings will use the faster hash-based method to obtain unique values. Note that this hash-based method does not guarantee that the returned unique values will be sorted. This also works for StringDType arrays containing None (missing values) when using equal_nan=True (treating missing values as equal).
(gh-28767)
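Code that relied on sorted output could be made robust along the lines of the sketch below: treat the result as an unordered collection, or sort it explicitly where order matters. The sample words are illustrative, and the explicit np.sort is a defensive assumption rather than something the release note prescribes.

```python
import numpy as np

words = np.array(["pear", "apple", "pear", "fig", "apple"])  # illustrative data

uniq = np.unique(words)

# Order-independent use: compare as a set.
as_set = set(uniq.tolist())

# Order-dependent use: sort explicitly instead of relying on np.unique.
ordered = np.sort(uniq)

print(as_set, ordered)
```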
Fix bug in matmul for non-contiguous out kwarg parameter#
In some cases, if out was non-contiguous, np.matmul would cause memory corruption or a C-level assert. This was new to v2.3.0 and fixed in v2.3.1.
(gh-29179)
__array_interface__ with NULL pointer changed#
The array interface now accepts NULL pointers (NumPy will do its own dummy allocation, though). Previously, these incorrectly triggered an undocumented scalar path. In the unlikely event that the scalar path was actually desired, you can (for now) achieve the previous behavior via the correct scalar path by not providing a data field at all.
(gh-29338)