From be21651a1fada038b8ca00938d063fbb3336b989 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Zbigniew=20J=C4=99drzejewski-Szmek?= <zbyszek@in.waw.pl>
Date: Wed, 16 Aug 2023 17:23:26 +0200
Subject: [PATCH 2/7] Revert "Make more difficult sanitize of the expression
 string before eval"

This reverts commit 00b035c78ca5ac209b58b56b5dcc99596cac423c.
---
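Reviewer note (not part of the commit message): the sketch below is one way
to see what the revert changes in the pre-`eval` check. The two patterns are
copied from the `-` and `+` lines of the necompiler.py hunk and written as raw
strings; the names STRICT_RE, RELAXED_RE, strict_is_forbidden and
relaxed_is_forbidden are hypothetical and do not exist in NumExpr.

```python
import re

# Pattern removed by this revert ('-' line in necompiler.py).
STRICT_RE = re.compile(r'[\;[\:]|__|\.[abcdefghjklmnopqstuvwxyzA-Z_]')
# Pattern restored by this revert ('+' line in necompiler.py).
RELAXED_RE = re.compile(r'[\;[\:]|__')


def strict_is_forbidden(expr: str) -> bool:
    """Pre-revert check: strip whitespace, then search the stricter pattern."""
    no_whitespace = re.sub(r'\s+', '', expr)
    return STRICT_RE.search(no_whitespace) is not None


def relaxed_is_forbidden(expr: str) -> bool:
    """Post-revert check: search the shorter pattern on the raw string."""
    return RELAXED_RE.search(expr) is not None


if __name__ == '__main__':
    samples = ['2*a + 3*b', 'a*2.', 'z.real', 'os.cpu_count()',
               'a . b', 'import os; os.cpu_count()']
    for expr in samples:
        print(f'{expr!r}: strict={strict_is_forbidden(expr)} '
              f'relaxed={relaxed_is_forbidden(expr)}')
```

Under these assumptions, attribute access such as `os.cpu_count()` is only
caught by the pre-revert pattern, while `;`, `:`, `[` and `__` are rejected by
both.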
 ANNOUNCE.rst                  | 23 ++---------------------
 RELEASE_NOTES.rst             | 19 +------------------
 doc/user_guide.rst            | 27 ++++++++++-----------------
 numexpr/necompiler.py         | 27 ++++++++-------------------
 numexpr/tests/test_numexpr.py | 18 ++----------------
 5 files changed, 23 insertions(+), 91 deletions(-)

diff --git a/ANNOUNCE.rst b/ANNOUNCE.rst
index 4e9070f2cc..f038df4f44 100644
--- a/ANNOUNCE.rst
+++ b/ANNOUNCE.rst
@@ -4,10 +4,7 @@ Announcing NumExpr 2.8.5
 
 Hi everyone,
 
-In 2.8.5 we have added a new function, `validate` which checks an expression `ex`
-for validity, for usage where the program is parsing a user input. There are also
-consequences for this sort of usage, since `eval(ex)` is called, and as such we
-do some string sanitization as described below.
+**Under development.**
 
 Project documentation is available at:
 
@@ -16,23 +13,7 @@ http://numexpr.readthedocs.io/
 Changes from 2.8.4 to 2.8.5
 ---------------------------
 
-* A `validate` function has been added. This function checks the inputs, returning
-  `None` on success or raising an exception on invalid inputs. This function was
-  added as numerous projects seem to be using NumExpr for parsing user inputs.
-  `re_evaluate` may be called directly following `validate`.
-* As an addendum to the use of NumExpr for parsing user inputs, is that NumExpr
-  calls `eval` on the inputs. A regular expression is now applied to help sanitize
-  the input expression string, forbidding '__', ':', and ';'. Attribute access
-  is also banned except for '.r' for real and '.i' for imag.
-* Thanks to timbrist for a fix to behavior of NumExpr with integers to negative
-  powers. NumExpr was pre-checking integer powers for negative values, which
-  was both inefficient and causing parsing errors in some situations. Now NumExpr
-  will simply return 0 as a result for such cases. While NumExpr generally tries
-  to follow NumPy behavior, performance is also critical.
-* Thanks to peadar for some fixes to how NumExpr launches threads for embedded
-  applications.
-* Thanks to de11n for making parsing of the `site.cfg` for MKL consistent among
-  all shared platforms.
+**Under development.**
 
 
 What's Numexpr?
diff --git a/RELEASE_NOTES.rst b/RELEASE_NOTES.rst
index 9cf5d3977b..4929a42e12 100644
--- a/RELEASE_NOTES.rst
+++ b/RELEASE_NOTES.rst
@@ -5,24 +5,7 @@ Release notes for NumExpr 2.8 series
 Changes from 2.8.4 to 2.8.5
 ---------------------------
 
-* A `validate` function has been added. This function checks the inputs, returning
-  `None` on success or raising an exception on invalid inputs. This function was
-  added as numerous projects seem to be using NumExpr for parsing user inputs.
-  `re_evaluate` may be called directly following `validate`.
-* As an addendum to the use of NumExpr for parsing user inputs, is that NumExpr
-  calls `eval` on the inputs. A regular expression is now applied to help sanitize
-  the input expression string, forbidding '__', ':', and ';'. Attribute access
-  is also banned except for '.r' for real and '.i' for imag.
-* Thanks to timbrist for a fix to behavior of NumExpr with integers to negative
-  powers. NumExpr was pre-checking integer powers for negative values, which
-  was both inefficient and causing parsing errors in some situations. Now NumExpr
-  will simply return 0 as a result for such cases. While NumExpr generally tries
-  to follow NumPy behavior, performance is also critical.
-* Thanks to peadar for some fixes to how NumExpr launches threads for embedded
-  applications.
-* Thanks to de11n for making parsing of the `site.cfg` for MKL consistent among
-  all shared platforms.
-
+**Under development.**
 
 Changes from 2.8.3 to 2.8.4
 ---------------------------
diff --git a/doc/user_guide.rst b/doc/user_guide.rst
index 3a3cf63d9c..74306eb658 100644
--- a/doc/user_guide.rst
+++ b/doc/user_guide.rst
@@ -1,7 +1,7 @@
-NumExpr 2.8 User Guide
+NumExpr 2.0 User Guide
 ======================
 
-The NumExpr package supplies routines for the fast evaluation of
+The :code:`numexpr` package supplies routines for the fast evaluation of
 array expressions elementwise by using a vector-based virtual
 machine.
 
@@ -11,33 +11,23 @@ Using it is simple::
     >>> import numexpr as ne
     >>> a = np.arange(10)
     >>> b = np.arange(0, 20, 2)
-    >>> c = ne.evaluate('2*a + 3*b')
+    >>> c = ne.evaluate("2*a+3*b")
     >>> c
     array([ 0,  8, 16, 24, 32, 40, 48, 56, 64, 72])
 
 
-It is also possible to use NumExpr to validate an expression::
-
-    >>> ne.validate('2*a + 3*b')
-
-which returns `None` on success or raises an exception on invalid inputs.
-
-and it can also re_evaluate an expression::
-
-    >>> b = np.arange(0, 40, 4)
-    >>> ne.re_evaluate()
-
 Building
 --------
 
-*NumExpr* requires Python_ 3.7 or greater, and NumPy_ 1.13 or greater. It is
+*NumExpr* requires Python_ 2.6 or greater, and NumPy_ 1.7 or greater. It is
 built in the standard Python way:
 
 .. code-block:: bash
 
-    $ pip install .
+    $ python setup.py build
+    $ python setup.py install
 
-You must have a C-compiler (i.e. MSVC Build tools on Windows and GCC on Linux) installed.
+You must have a C-compiler (i.e. MSVC on Windows and GCC on Linux) installed.
 
 Then change to a directory that is not the repository directory (e.g. `/tmp`) and
 test :code:`numexpr` with:
@@ -278,6 +268,9 @@ General routines
 * :code:`detect_number_of_cores()`: Detects the number of cores on a system.
 
 
+
+
+
 Intel's VML specific support routines
 -------------------------------------
 
diff --git a/numexpr/necompiler.py b/numexpr/necompiler.py
index cbf290932b..fef886baf5 100644
--- a/numexpr/necompiler.py
+++ b/numexpr/necompiler.py
@@ -260,17 +260,15 @@ class Immediate(Register):
     def __str__(self):
         return 'Immediate(%d)' % (self.node.value,)
 
-
-_forbidden_re = re.compile('[\;[\:]|__|\.[abcdefghjklmnopqstuvwxyzA-Z_]')
+_forbidden_re = re.compile('[\;[\:]|__')
 def stringToExpression(s, types, context):
     """Given a string, convert it to a tree of ExpressionNode's.
     """
     # sanitize the string for obvious attack vectors that NumExpr cannot
     # parse into its homebrew AST. This is to protect the call to `eval` below.
-    # We forbid `;`, `:`. `[` and `__`, and attribute access via '.'.
-    # We cannot ban `.real` or `.imag` however...
-    no_whitespace = re.sub(r'\s+', '', s)
-    if _forbidden_re.search(no_whitespace) is not None:
+    # We forbid `;`, `:`. `[` and `__`
+    # We would like to forbid `.` but it is both a reference and decimal point.
+    if _forbidden_re.search(s) is not None:
         raise ValueError(f'Expression {s} has forbidden control characters.')
 
     old_ctx = expressions._context.get_current_context()
@@ -768,6 +766,7 @@ def getArguments(names, local_dict=None, global_dict=None, _frame_depth: int=2):
 _names_cache = CacheDict(256)
 _numexpr_cache = CacheDict(256)
 _numexpr_last = {}
+_numexpr_sanity = set()
 evaluate_lock = threading.Lock()
 
 # MAYBE: decorate this function to add attributes instead of having the
@@ -829,13 +828,6 @@ def validate(ex: str,
     _frame_depth: int
         The calling frame depth. Unless you are a NumExpr developer you should
         not set this value.
-
-    Note
-    ----
-    Both `validate` and by extension `evaluate` call `eval(ex)`, which is
-    potentially dangerous on unsanitized inputs. As such, NumExpr does some
-    sanitization, banning the character ':;[', the dunder '__', and attribute
-    access to all but '.r' for real and '.i' for imag access to complex numbers.
     """
     global _numexpr_last
 
@@ -865,6 +857,8 @@ def validate(ex: str,
         kwargs = {'out': out, 'order': order, 'casting': casting,
                   'ex_uses_vml': ex_uses_vml}
         _numexpr_last = dict(ex=compiled_ex, argnames=names, kwargs=kwargs)
+        # with evaluate_lock:
+        # return compiled_ex(*arguments, **kwargs)
     except Exception as e:
         return e
     return None
@@ -924,12 +918,7 @@ def evaluate(ex: str,
         The calling frame depth. Unless you are a NumExpr developer you should
         not set this value.
 
-    Note
-    ----
-    Both `validate` and by extension `evaluate` call `eval(ex)`, which is
-    potentially dangerous on unsanitized inputs. As such, NumExpr does some
-    sanitization, banning the character ':;[', the dunder '__', and attribute
-    access to all but '.r' for real and '.i' for imag access to complex numbers.
+
     """
     # We could avoid code duplication if we called validate and then re_evaluate
     # here, but they we have difficulties with the `sys.getframe(2)` call in
diff --git a/numexpr/tests/test_numexpr.py b/numexpr/tests/test_numexpr.py
index a9f917fccd..ebc41c8d54 100644
--- a/numexpr/tests/test_numexpr.py
+++ b/numexpr/tests/test_numexpr.py
@@ -536,27 +536,13 @@ class test_evaluate(TestCase):
 
         # Forbid semicolon
         try:
-            evaluate('import os;')
+            evaluate('import os; os.cpu_count()')
         except ValueError:
             pass
         else:
             self.fail()
 
-        # Attribute access
-        try:
-            evaluate('os.cpucount()')
-        except ValueError:
-            pass
-        else:
-            self.fail()
-
-        # But decimal point must pass
-        a = 3.0
-        evaluate('a*2.')
-        evaluate('2.+a')
-
-
-
+        # I struggle to come up with cases for our ban on `'` and `"`
 
 
 