DataFrameGroupBy
================

.. cpp:class:: pandas::DataFrameGroupBy

   GroupBy class for split-apply-combine operations.

Example
-------

.. code-block:: cpp

   #include
   using namespace pandas;

   // Use DataFrameGroupBy
   DataFrameGroupBy obj;
   // ... operations ...

Constructors
------------

.. list-table::
   :widths: 55 25 20
   :header-rows: 1

   * - Signature
     - Location
     - Example
   * - ``DataFrameGroupBy(const DataFrame& df, const std::vector& by, bool as_index = true, bool sort = true, bool dropna = true, bool observed = true, bool group_keys = true)``
     - pd_groupby.h:100
     -
   * - ``DataFrameGroupBy(const DataFrame& df, const std::string& by, bool as_index = true, bool sort = true, bool dropna = true, bool observed = true, bool group_keys = true)``
     - pd_groupby.h:111
     -

Indexing / Selection
--------------------

.. list-table::
   :widths: 40 20 15 25
   :header-rows: 1

   * - Signature
     - Return Type
     - Location
     - Example
   * - ``DataFrame first() const``
     - DataFrame
     - pd_groupby.h:301
     - :ref:`View `
   * - ``std::optional first_by_index_name_() const``
     - std::optional
     - pd_groupby.h:90
     -
   * - ``DataFrame get_group(const std::string& key) const``
     - DataFrame
     - pd_groupby.h:323
     - :ref:`View `
   * - ``DataFrame get_group(const std::string& key, const std::set& exclude_cols) const``
     - DataFrame
     - pd_groupby.h:331
     - :ref:`View `
   * - ``std::vector get_numeric_value_columns() const``
     - std::vector
     - pd_groupby.h:447
     - :ref:`View `
   * - ``std::vector get_value_columns(const std::string& agg_name = "") const``
     - std::vector
     - pd_groupby.h:453
     -
   * - ``DataFrame head(int n = 5) const``
     - DataFrame
     - pd_groupby.h:313
     - :ref:`View `
   * - ``DataFrame idxmax(bool numeric_only = false) const``
     - DataFrame
     - pd_groupby.h:465
     - :ref:`View `
   * - ``DataFrame idxmin(bool numeric_only = false) const``
     - DataFrame
     - pd_groupby.h:466
     - :ref:`View `
   * - ``DataFrame idxmin_with_dtype(bool numeric_only = false) const``
     - DataFrame
     - pd_groupby.h:263
     - :ref:`View `
   * - ``DataFrame last() const``
     - DataFrame
     - pd_groupby.h:304
     -
       :ref:`View `
   * - ``DataFrame tail(int n = 5) const``
     - DataFrame
     - pd_groupby.h:316
     - :ref:`View `

Data Manipulation
-----------------

.. list-table::
   :widths: 40 20 15 25
   :header-rows: 1

   * - Signature
     - Return Type
     - Location
     - Example
   * - ``bool dropna() const``
     - bool
     - pd_groupby.h:407
     - :ref:`View `

Statistics
----------

.. list-table::
   :widths: 40 20 15 25
   :header-rows: 1

   * - Signature
     - Return Type
     - Location
     - Example
   * - ``DataFrame count() const``
     - DataFrame
     - pd_groupby.h:166
     - :ref:`View `
   * - ``DataFrame describe() const``
     - DataFrame
     - pd_groupby.h:171
     - :ref:`View `
   * - ``DataFrame max(bool numeric_only = false) const``
     - DataFrame
     - pd_groupby.h:163
     - :ref:`View `
   * - ``DataFrame mean(bool numeric_only = false) const``
     - DataFrame
     - pd_groupby.h:161
     - :ref:`View `
   * - ``DataFrame median(bool numeric_only = false) const``
     - DataFrame
     - pd_groupby.h:167
     - :ref:`View `
   * - ``DataFrame min(bool numeric_only = false) const``
     - DataFrame
     - pd_groupby.h:162
     - :ref:`View `
   * - ``DataFrame nunique(bool dropna = true) const``
     - DataFrame
     - pd_groupby.h:170
     - :ref:`View `
   * - ``DataFrame prod(bool numeric_only = false) const``
     - DataFrame
     - pd_groupby.h:168
     - :ref:`View `
   * - ``DataFrame sem(int ddof = 1, bool numeric_only = false) const``
     - DataFrame
     - pd_groupby.h:169
     - :ref:`View `
   * - ``DataFrame std_(int ddof = 1, bool numeric_only = false) const``
     - DataFrame
     - pd_groupby.h:164
     - :ref:`View `
   * - ``DataFrame sum(bool numeric_only = false) const``
     - DataFrame
     - pd_groupby.h:160
     - :ref:`View `
   * - ``DataFrame var(int ddof = 1, bool numeric_only = false) const``
     - DataFrame
     - pd_groupby.h:165
     - :ref:`View `

Aggregation
-----------

.. list-table::
   :widths: 40 20 15 25
   :header-rows: 1

   * - Signature
     - Return Type
     - Location
     - Example
   * - ``DataFrame agg(const std::string& func_name) const``
     - DataFrame
     - pd_groupby.h:177
     - :ref:`View `
   * - ``DataFrame agg(const std::vector& funcs) const``
     - DataFrame
     - pd_groupby.h:183
     - :ref:`View `
   * - ``DataFrame agg(const std::vector>>& col_funcs) const``
     - DataFrame
     - pd_groupby.h:193
     - :ref:`View `
   * - ``DataFrame agg(const std::map& col_func_map) const``
     - DataFrame
     - pd_groupby.h:204
     - :ref:`View `
   * - ``DataFrame agg(std::initializer_list>> col_funcs_init) const``
     - DataFrame
     - pd_groupby.h:234
     - :ref:`View `
   * - ``PANDASCORE_API Result agg(const FuncArg& func) const``
     - PANDASCORE_API Result
     - pd_groupby.h:352
     - :ref:`View `
   * - ``DataFrame agg_callable_with_dtype(const std::function&)>& cb) const``
     - DataFrame
     - pd_groupby.h:257
     - :ref:`View `
   * - ``DataFrame agg_impl(const std::vector>>& col_funcs, bool list_form) const``
     - DataFrame
     - pd_groupby.h:500
     -
   * - ``DataFrame agg_named(const std::vector& specs) const``
     - DataFrame
     - pd_groupby.h:339
     - :ref:`View `
   * - ``DataFrame agg_with_dtype(const std::string& how) const``
     - DataFrame
     - pd_groupby.h:248
     - :ref:`View `
   * - ``DataFrame agg_with_dtype_list(const std::vector& funcs) const``
     - DataFrame
     - pd_groupby.h:252
     - :ref:`View `
   * - ``std::vector aggregate_column(size_t col_idx, const std::string& func) const``
     - std::vector
     - pd_groupby.h:621
     -
   * - ``DataFrame apply(std::function fn, bool include_groups = true) const``
     - DataFrame
     - pd_groupby.h:282
     - :ref:`View `
   * - ``Series apply_collect_scalar_results(const std::vector& keys, const std::vector& values) const``
     - Series
     - pd_groupby.h:526
     - :ref:`View `
   * - ``Series apply_collect_scalar_string_results(const std::vector& keys, const std::vector& values) const``
     - Series
     - pd_groupby.h:536
     -
   * - ``DataFrame apply_collect_series_results(const std::vector& keys, const std::vector& col_names, const std::map>& num_cols, const std::map>& str_cols,
       const std::string& columns_axis_name = "") const``
     - DataFrame
     - pd_groupby.h:549
     - :ref:`View `
   * - ``DataFrame apply_concat_dataframe_results(const std::vector& keys, const std::vector& dfs, bool use_group_keys) const``
     - DataFrame
     - pd_groupby.h:563
     - :ref:`View `
   * - ``void apply_int_dtype_if_needed(DataFrame& result, const std::string& result_col, const std::string& source_col, const std::string& func) const``
     - void
     - pd_groupby.h:636
     -
   * - ``DataFrameGroupByResampler resample(const std::string& rule, const std::string& closed = "left", const std::string& label = "left") const``
     - DataFrameGroupByResampler
     - pd_groupby.h:512
     - :ref:`View `
   * - ``DataFrame transform_apply_numeric(std::function(const std::string&, const Series&)> fn) const``
     - DataFrame
     - pd_groupby.h:473
     -
   * - ``DataFrame transform_concat_results(const std::map>& col_data, const std::vector& value_cols) const``
     - DataFrame
     - pd_groupby.h:584
     -
   * - ``DataFrame transform_named(const std::string& func_name) const``
     - DataFrame
     - pd_groupby.h:593
     - :ref:`View `

Reshaping
---------

.. list-table::
   :widths: 40 20 15 25
   :header-rows: 1

   * - Signature
     - Return Type
     - Location
     - Example
   * - ``squeeze_result(DataFrame& result) const``
     -
     - pd_groupby.h:441
     - :ref:`View `

Other Methods
-------------

.. list-table::
   :widths: 40 20 15 25
   :header-rows: 1

   * - Signature
     - Return Type
     - Location
     - Example
   * - ``bool as_index() const``
     - bool
     - pd_groupby.h:398
     -
   * - ``void build_groups()``
     - void
     - pd_groupby.h:617
     -
   * - ``std::vector by_column_dtypes() const``
     - std::vector
     - pd_groupby.h:388
     -
   * - ``const std::vector& by_columns() const``
     - const std::vector&
     - pd_groupby.h:385
     -
   * - ``std::vector>> col_funcs(col_funcs_init.begin(), col_funcs_init.end())``
     - std::vector>>
     - pd_groupby.h:235
     -
   * - ``DataFrameGroupByColumn column(const std::string& col_name) const``
     - DataFrameGroupByColumn
     - pd_groupby.h:292
     - :ref:`View `
   * - ``static double compute_agg(const std::vector& values, const std::string& func, int ddof = 1)``
     - static double
     - pd_groupby.h:624
     - :ref:`View `
   * - ``const DataFrame& dataframe() const``
     - const DataFrame&
     - pd_groupby.h:382
     - :ref:`View `
   * - ``DataFrame filter(std::function predicate) const``
     - DataFrame
     - pd_groupby.h:274
     - :ref:`View `
   * - ``DataFrame filter_by_group_mask(const std::map& group_mask, bool use_dropna = true) const``
     - DataFrame
     - pd_groupby.h:574
     - :ref:`View `
   * - ``bool group_keys() const``
     - bool
     - pd_groupby.h:404
     -
   * - ``const std::vector& group_keys_order() const``
     - const std::vector&
     - pd_groupby.h:377
     - :ref:`View `
   * - ``const std::unordered_map>& groups() const``
     - const std::unordered_map>&
     - pd_groupby.h:372
     - :ref:`View `
   * - ``DataFrame idx_extreme_impl_(int which, bool numeric_only) const``
     - DataFrame
     - pd_groupby.h:492
     -
   * - ``bool list_selected() const``
     - bool
     - pd_groupby.h:413
     - :ref:`View `
   * - ``std::string make_group_key(size_t row_idx) const``
     - std::string
     - pd_groupby.h:618
     -
   * - ``Series ngroup(bool ascending = true) const``
     - Series
     - pd_groupby.h:359
     -
   * - ``size_t ngroups() const``
     - size_t
     - pd_groupby.h:369
     - :ref:`View `
   * - ``DataFrame nth(int n) const``
     - DataFrame
     - pd_groupby.h:310
     - :ref:`View `
   * - ``DataFrame nth(const std::vector& positions, const
       std::string& dropna_mode = "") const``
     - DataFrame
     - pd_groupby.h:613
     - :ref:`View `
   * - ``DataFrame nth_by_resolved_slices(const std::vector>& per_group_slices) const``
     - DataFrame
     - pd_groupby.h:488
     -
   * - ``void rebuild_groups_with_empty_seeds(std::vector keys)``
     - void
     - pd_groupby.h:151
     -
   * - ``DataFrameGroupBy select(const std::vector& columns) const``
     - DataFrameGroupBy
     - pd_groupby.h:421
     - :ref:`View `
   * - ``DataFrameGroupBy select_as_list(const std::vector& columns) const``
     - DataFrameGroupBy
     - pd_groupby.h:429
     - :ref:`View `
   * - ``DataFrame select_rows_by_indices(const std::vector& row_indices, const std::vector& columns = {}, bool exclude_internal = false) const``
     - DataFrame
     - pd_groupby.h:602
     - :ref:`View `
   * - ``const std::vector& selected_columns() const``
     - const std::vector&
     - pd_groupby.h:410
     - :ref:`View `
   * - ``void set_extra_empty_keys(std::vector keys)``
     - void
     - pd_groupby.h:141
     -
   * - ``void set_owned_df(std::shared_ptr df)``
     - void
     - pd_groupby.h:123
     -
   * - ``void set_result_index(DataFrame& result) const``
     - void
     - pd_groupby.h:627
     -
   * - ``void set_synthetic_freq_key(bool value)``
     - void
     - pd_groupby.h:133
     -
   * - ``bool should_squeeze_to_series() const``
     - bool
     - pd_groupby.h:416
     - :ref:`View `
   * - ``Series size() const``
     - Series
     - pd_groupby.h:366
     - :ref:`View `
   * - ``bool sort_flag() const``
     - bool
     - pd_groupby.h:401
     -

Code Examples
-------------

The following examples are extracted from the test suite.

.. _example-dataframegroupby-first-0:

.. dropdown:: first (pd_test_1_all.cpp:11616)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 11606
      :emphasize-lines: 11

      void pd_test_groupby_first_last() {
          std::cout << "========= GroupBy first/last ====================";

          std::map> data = {
              {"category", {1.0, 1.0, 2.0, 2.0}},
              {"value", {10.0, 20.0, 30.0, 40.0}}
          };

          pandas::DataFrame df(data);

          auto first_result = df.groupby("category").first();
          auto last_result = df.groupby("category").last();

          // First for group 1: 10, group 2: 30
          // Last for group 1: 20, group 2: 40
          double first1 = std::stod(first_result["value"].get_value_str(0));
          double first2 = std::stod(first_result["value"].get_value_str(1));

          bool passed = ((std::abs(first1 - 10.0) < 0.001 && std::abs(first2 - 30.0) < 0.001) ||
                         (std::abs(first1 - 30.0) < 0.001 && std::abs(first2 - 10.0) < 0.001));
          if (!passed) {

.. _example-dataframegroupby-get_group-1:
.. _example-dataframegroupby-get_group-2:

.. dropdown:: get_group (pd_test_2_all.cpp:20487)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 20477
      :emphasize-lines: 11

              ++g_fail;
          }
      }

      static bool approx_eq(double a, double b, double tol = 1e-9) {
          if (std::isnan(a) && std::isnan(b)) return true;
          return std::abs(a - b) < tol;
      }

      // =====================================================================
      // Test: get_group() with exclude_cols removes groupby columns
      // =====================================================================
      void pd_test_groupby_apply_get_group_exclude() {
          std::cout << " -- pd_test_groupby_apply_get_group_exclude --" << std::endl;

          pandas::DataFrame df;
          df.add_column("key", std::vector{"a", "a", "b", "b"});
          df.add_column("val1", std::vector{1.0, 2.0, 3.0, 4.0});
          df.add_column("val2", std::vector{10.0, 20.0, 30.0, 40.0});

.. _example-dataframegroupby-get_numeric_value_columns-3:

.. dropdown:: get_numeric_value_columns (pd_test_5_all.cpp:36793)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 36783
      :emphasize-lines: 11

      }

      void case_1_groupby_numeric_columns_Int64() {
          const std::string tag = "[X1]";
          try {
              pandas::DataFrame df;
              df.add_column("g", {"a","a","b","b"});
              df.add_column_nullable("v_Int64", {1, 2, 3, 4});
              df.add_column("v_Float64", {1.0, 2.0, 3.0, 4.0});
              auto gb = df.groupby(std::vector{"g"});
              auto cols = gb.get_numeric_value_columns();
              std::cout << tag << " numeric_cols.size=" << cols.size();
              for (auto& c : cols) std::cout << " [" << c << "]";
              std::cout << "\n";
              bool has_Int64 = std::find(cols.begin(), cols.end(), std::string("v_Int64")) != cols.end();
              std::cout << tag << " has_Int64=" << has_Int64 << "\n";
          } catch (const std::exception& e) {
              std::cout << tag << " exception: " << e.what() << "\n";
          }
      }

.. _example-dataframegroupby-head-4:

.. dropdown:: head (pd_test_1_all.cpp:6301)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 6291
      :emphasize-lines: 11

      void pd_test_dataframe_indexing() {
          std::cout << "========= indexing (loc/iloc) ==============";

          std::map> data;
          data["A"] = {10.0, 20.0, 30.0, 40.0, 50.0};
          data["B"] = {1.0, 2.0, 3.0, 4.0, 5.0};

          pandas::DataFrame df(data);

          // Test head
          auto head_df = df.head(3);
          if (head_df.nrows() != 3) {
              std::cout << " [FAIL] : in pd_test_dataframe_indexing() : head(3) nrows != 3" << std::endl;
              throw std::runtime_error("pd_test_dataframe_indexing failed: head(3) nrows != 3");
          }

          // Test tail
          auto tail_df = df.tail(2);
          if (tail_df.nrows() != 2) {
              std::cout << " [FAIL] : in pd_test_dataframe_indexing() : tail(2) nrows != 2" << std::endl;
              throw std::runtime_error("pd_test_dataframe_indexing failed: tail(2) nrows != 2");

.. _example-dataframegroupby-idxmax-5:

.. dropdown:: idxmax (pd_test_1_all.cpp:23956)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 23946
      :emphasize-lines: 11

          std::cout << "====================================== [OK] pd_test_ffill_bfill test suite ========================== " << std::endl;
          return 0;
      }

      } // namespace dataframe_tests

      // ------------------- pd_test_ffill_bfill.cpp (end) -----------------------------

      // ------------------- pd_test_idxmax_idxmin.cpp (start) -----------------------------
      // dataframe_tests/pd_test_idxmax_idxmin.cpp
      // Test for DataFrame.idxmax() and idxmin() methods

      #include
      #include
      #include
      #include "../pandas/pd_dataframe.h"

      // CRITICAL: No using namespace directives

      namespace dataframe_tests {

.. _example-dataframegroupby-idxmin-6:

.. dropdown:: idxmin (pd_test_1_all.cpp:23956)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 23946
      :emphasize-lines: 11

          std::cout << "====================================== [OK] pd_test_ffill_bfill test suite ========================== " << std::endl;
          return 0;
      }

      } // namespace dataframe_tests

      // ------------------- pd_test_ffill_bfill.cpp (end) -----------------------------

      // ------------------- pd_test_idxmax_idxmin.cpp (start) -----------------------------
      // dataframe_tests/pd_test_idxmax_idxmin.cpp
      // Test for DataFrame.idxmax() and idxmin() methods

      #include
      #include
      #include
      #include "../pandas/pd_dataframe.h"

      // CRITICAL: No using namespace directives

      namespace dataframe_tests {

.. _example-dataframegroupby-idxmin_with_dtype-7:

.. dropdown:: idxmin_with_dtype (pd_test_5_all.cpp:95397)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 95387
      :emphasize-lines: 11

      void case_701_dfgb_idxmin_rangeindex(int& local_fail) {
          std::cout << "-- case_701_dfgb_idxmin_rangeindex\n";
          // Default RangeIndex (int64). Result columns must keep int64 dtype.
          pandas::DataFrame df;
          df.add_column("v", std::vector{3.0, 1.0, 2.0, 0.5});
          df.add_column("key", std::vector{0, 0, 1, 1});
          auto gb = df.groupby("key");
          pandas::DataFrame out;
          std::string err;
          try {
              out = gb.idxmin_with_dtype();
          } catch (const std::exception& e) {
              err = e.what();
          } catch (...) {
              err = "";
          }
          pandas_tests::check(err.empty(), "C_26_case_701_dfgb_idxmin_rangeindex()_no_throw", local_fail);
          if (!err.empty()) { std::cout << " err: " << err << "\n"; return; }
          std::string got = df_col_dtype(out, "v");
          bool ok = (got == "int64");
          pandas_tests::check(ok, "C_26_case_701_dfgb_idxmin_rangeindex()_dtype", local_fail);
          if (!ok) std::cout << " got=[" << got << "] expected=[int64]\n";

.. _example-dataframegroupby-last-8:

.. dropdown:: last (pd_test_1_all.cpp:11617)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 11607
      :emphasize-lines: 11

      void pd_test_groupby_first_last() {
          std::cout << "========= GroupBy first/last ====================";

          std::map> data = {
              {"category", {1.0, 1.0, 2.0, 2.0}},
              {"value", {10.0, 20.0, 30.0, 40.0}}
          };
          pandas::DataFrame df(data);

          auto first_result = df.groupby("category").first();
          auto last_result = df.groupby("category").last();

          // First for group 1: 10, group 2: 30
          // Last for group 1: 20, group 2: 40
          double first1 = std::stod(first_result["value"].get_value_str(0));
          double first2 = std::stod(first_result["value"].get_value_str(1));

          bool passed = ((std::abs(first1 - 10.0) < 0.001 && std::abs(first2 - 30.0) < 0.001) ||
                         (std::abs(first1 - 30.0) < 0.001 && std::abs(first2 - 10.0) < 0.001));
          if (!passed) {
              std::cout << " [FAIL] : in pd_test_groupby_first_last() : first values incorrect" << std::endl;

.. _example-dataframegroupby-tail-9:

.. dropdown:: tail (pd_test_1_all.cpp:6308)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 6298
      :emphasize-lines: 11

          pandas::DataFrame df(data);

          // Test head
          auto head_df = df.head(3);
          if (head_df.nrows() != 3) {
              std::cout << " [FAIL] : in pd_test_dataframe_indexing() : head(3) nrows != 3" << std::endl;
              throw std::runtime_error("pd_test_dataframe_indexing failed: head(3) nrows != 3");
          }

          // Test tail
          auto tail_df = df.tail(2);
          if (tail_df.nrows() != 2) {
              std::cout << " [FAIL] : in pd_test_dataframe_indexing() : tail(2) nrows != 2" << std::endl;
              throw std::runtime_error("pd_test_dataframe_indexing failed: tail(2) nrows != 2");
          }

          // Test iloc_rows range
          auto slice = df.iloc_rows(1, 4);
          if (slice.nrows() != 3) {
              std::cout << " [FAIL] : in pd_test_dataframe_indexing() : iloc_rows(1,4) nrows != 3" << std::endl;
              throw std::runtime_error("pd_test_dataframe_indexing failed: iloc_rows(1,4) nrows != 3");

.. _example-dataframegroupby-dropna-10:

.. dropdown:: dropna (pd_test_1_all.cpp:531)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 521
      :emphasize-lines: 11

          }

          // Test isna array
          numpy::NDArray na_mask = arr.isna();
          if (na_mask.getSize() != 4) {
              std::cout << " [FAIL] : in pd_test_categorical_array_na_handling() : isna size != 4" << std::endl;
              throw std::runtime_error("pd_test_categorical_array_na_handling failed: isna size != 4");
          }

          // Test dropna
          pandas::CategoricalArray dropped = arr.dropna();
          if (dropped.size() != 2) {
              std::cout << " [FAIL] : in pd_test_categorical_array_na_handling() : dropna size != 2" << std::endl;
              throw std::runtime_error("pd_test_categorical_array_na_handling failed: dropna size != 2");
          }

          // Test fillna (fill with existing category)
          pandas::CategoricalArray filled = arr.fillna("a"); // 'a' is in categories
          if (filled.has_na()) {
              std::cout << " [FAIL] : in pd_test_categorical_array_na_handling() : fillna should have no NA" << std::endl;
              throw std::runtime_error("pd_test_categorical_array_na_handling failed: fillna should have no NA");

.. _example-dataframegroupby-count-11:

.. dropdown:: count (pd_test_1_all.cpp:66)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 56
      :emphasize-lines: 11

          if (arr.is_na(0)) {
              std::cout << " [FAIL] : in pd_test_boolean_array_na_handling() : is_na(0) should be false" << std::endl;
              throw std::runtime_error("pd_test_boolean_array_na_handling failed: is_na(0) should be false");
          }

          if (!arr.has_na()) {
              std::cout << " [FAIL] : in pd_test_boolean_array_na_handling() : has_na() should be true" << std::endl;
              throw std::runtime_error("pd_test_boolean_array_na_handling failed: has_na() should be true");
          }

          if (arr.count() != 2) {
              std::cout << " [FAIL] : in pd_test_boolean_array_na_handling() : count() should be 2" << std::endl;
              throw std::runtime_error("pd_test_boolean_array_na_handling failed: count() should be 2");
          }

          std::cout << " -> tests passed" << std::endl;
      }

      void pd_test_boolean_array_kleene_and() {
          std::cout << "========= BooleanArray: Kleene AND ======================= ";

.. _example-dataframegroupby-describe-12:

.. dropdown:: describe (pd_test_2_all.cpp:19793)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 19783
      :emphasize-lines: 11

              ++g_fail;
          }
      }

      static bool approx_eq(double a, double b, double tol = 1e-9) {
          if (std::isnan(a) && std::isnan(b)) return true;
          return std::abs(a - b) < tol;
      }

      // =====================================================================
      // Test: describe() default mode — numeric columns only
      // =====================================================================
      void pd_test_describe_numeric_only() {
          std::cout << " -- pd_test_describe_numeric_only --" << std::endl;

          pandas::DataFrame df;
          df.add_column("A", std::vector{1.0, 2.0, 3.0, 4.0, 5.0});
          df.add_column("B", std::vector{10.0, 20.0, 30.0, 40.0, 50.0});
          df.add_column("Name", std::vector{"a", "b", "c", "d", "e"});

.. _example-dataframegroupby-max-13:

.. dropdown:: max (pd_test_1_all.cpp:771)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 761
      :emphasize-lines: 11

          pandas::CategoricalArray arr = pandas::CategoricalArray::from_codes(codes, cats, true); // ordered

          // Test min
          std::optional min_val = arr.min();
          if (!min_val.has_value() || *min_val != "low") {
              std::cout << " [FAIL] : in pd_test_categorical_array_ordered_operations() : min != 'low'" << std::endl;
              throw std::runtime_error("pd_test_categorical_array_ordered_operations failed: min != 'low'");
          }

          // Test max
          std::optional max_val = arr.max();
          if (!max_val.has_value() || *max_val != "high") {
              std::cout << " [FAIL] : in pd_test_categorical_array_ordered_operations() : max != 'high'" << std::endl;
              throw std::runtime_error("pd_test_categorical_array_ordered_operations failed: max != 'high'");
          }

          // Test unordered throws for min/max
          pandas::CategoricalArray unordered = arr.as_unordered();
          bool threw = false;
          try {
              unordered.min();

.. _example-dataframegroupby-mean-14:

.. dropdown:: mean (pd_test_1_all.cpp:282)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 272
      :emphasize-lines: 11

              std::optional(true),
              std::optional(true)
          });

          auto s = arr.sum();
          if (!s.has_value() || s.value() != 3) {
              std::cout << " [FAIL] : in pd_test_boolean_array_reductions() : sum should be 3" << std::endl;
              throw std::runtime_error("pd_test_boolean_array_reductions failed: sum");
          }

          auto m = arr.mean();
          if (!m.has_value() || std::abs(m.value() - 0.75) > 0.001) {
              std::cout << " [FAIL] : in pd_test_boolean_array_reductions() : mean should be 0.75" << std::endl;
              throw std::runtime_error("pd_test_boolean_array_reductions failed: mean");
          }

          std::cout << " -> tests passed" << std::endl;
      }

      void pd_test_boolean_array_dtype() {
          std::cout << "========= BooleanArray: dtype ======================= ";

.. _example-dataframegroupby-median-15:

.. dropdown:: median (pd_test_1_all.cpp:20910)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 20900
      :emphasize-lines: 11

              throw std::runtime_error("pd_test_expanding_var failed: expanding var values incorrect");
          }

          std::cout << " -> tests passed" << std::endl;
      }

      void pd_test_expanding_median() {
          std::cout << "========= Expanding median ======================";

          pandas::Series s({1.0, 2.0, 3.0, 4.0, 5.0});
          auto result = s.expanding().median();

          // Expanding median: 1, 1.5, 2, 2.5, 3
          bool passed = std::abs(result[0] - 1.0) < 0.001 &&
                        std::abs(result[1] - 1.5) < 0.001 &&
                        std::abs(result[2] - 2.0) < 0.001 &&
                        std::abs(result[3] - 2.5) < 0.001 &&
                        std::abs(result[4] - 3.0) < 0.001;
          if (!passed) {
              std::cout << " [FAIL] : in pd_test_expanding_median() : expanding median values incorrect" << std::endl;
              throw std::runtime_error("pd_test_expanding_median failed: expanding median values incorrect");

.. _example-dataframegroupby-min-16:

.. dropdown:: min (pd_test_1_all.cpp:764)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 754
      :emphasize-lines: 11

      }

      void pd_test_categorical_array_ordered_operations() {
          std::cout << "========= CategoricalArray: ordered operations (min/max) ======================= ";

          std::vector cats = {"low", "medium", "high"};
          std::vector codes = {0, 2, 1, 0, -1}; // low, high, medium, low, NA
          pandas::CategoricalArray arr = pandas::CategoricalArray::from_codes(codes, cats, true); // ordered

          // Test min
          std::optional min_val = arr.min();
          if (!min_val.has_value() || *min_val != "low") {
              std::cout << " [FAIL] : in pd_test_categorical_array_ordered_operations() : min != 'low'" << std::endl;
              throw std::runtime_error("pd_test_categorical_array_ordered_operations failed: min != 'low'");
          }

          // Test max
          std::optional max_val = arr.max();
          if (!max_val.has_value() || *max_val != "high") {
              std::cout << " [FAIL] : in pd_test_categorical_array_ordered_operations() : max != 'high'" << std::endl;
              throw std::runtime_error("pd_test_categorical_array_ordered_operations failed: max != 'high'");

.. _example-dataframegroupby-nunique-17:

.. dropdown:: nunique (pd_test_1_all.cpp:10604)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 10594
      :emphasize-lines: 11

          std::cout << " -> tests passed" << std::endl;
      }

      void pd_test_extension_index_nunique() {
          std::cout << "========= nunique =========================";

          pandas::CategoricalArray arr({"a", "b", "a", "c", "b", std::nullopt});

          pandas::CategoricalIndex idx(arr);

          bool passed = (idx.nunique(true) == 3 && idx.nunique(false) == 4);
          if (!passed) {
              std::cout << " [FAIL] : in pd_test_extension_index_nunique() : nunique check failed" << std::endl;
              throw std::runtime_error("pd_test_extension_index_nunique failed");
          }

          std::cout << " -> tests passed" << std::endl;
      }

      void pd_test_extension_index_factorize() {
          std::cout << "========= factorize =========================";

.. _example-dataframegroupby-prod-18:

.. dropdown:: prod (pd_test_1_all.cpp:26082)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 26072
      :emphasize-lines: 11

          std::cout << "====================================== [OK] pd_test_pivot_table test suite ========================== " << std::endl;
          return 0;
      }

      } // namespace dataframe_tests

      // ------------------- pd_test_pivot_table.cpp (end) -----------------------------

      // ------------------- pd_test_prod.cpp (start) -----------------------------
      // dataframe_tests/pd_test_prod.cpp
      // Tests for DataFrame.prod() and DataFrame.prod_cols() methods

      #include
      #include
      #include
      #include "../pandas/pd_dataframe.h"

      // CRITICAL: No using namespace directives

      namespace dataframe_tests {

.. _example-dataframegroupby-sem-19:

.. dropdown:: sem (pd_test_1_all.cpp:4525)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 4515
      :emphasize-lines: 11

      #include "../pandas/pd_dataframe.h"
      #include "../pandas/pd_series.h"

      namespace dataframe_tests {
      namespace dataframe_tests_aggregation {

      void pd_test_aggregation_series_sem() {
          std::cout << "========= Series sem ============================";

          pandas::Series s({1.0, 2.0, 3.0, 4.0, 5.0});
          auto sem_val = s.sem();

          // std(ddof=1) = sqrt(2.5), sem = sqrt(2.5)/sqrt(5) ≈ 0.707
          bool passed = sem_val.has_value() && std::abs(*sem_val - 0.707) < 0.01;
          if (!passed) {
              std::cout << " [FAIL] : in pd_test_aggregation_series_sem() : sem value incorrect" << std::endl;
              throw std::runtime_error("pd_test_aggregation_series_sem failed: sem value incorrect");
          }

          std::cout << " -> tests passed" << std::endl;
      }

.. _example-dataframegroupby-std_-20:

.. dropdown:: std_ (pd_test_1_all.cpp:20752)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 20742
      :emphasize-lines: 11

              throw std::runtime_error("pd_test_rolling_min_periods failed: with min_periods=1, idx 1 should be 3.0");
          }

          std::cout << " -> tests passed" << std::endl;
      }

      void pd_test_rolling_std() {
          std::cout << "========= Rolling std ===========================";

          pandas::Series s({1.0, 2.0, 3.0, 4.0, 5.0});
          auto result = s.rolling(3).std_();

          // std([1,2,3]) = 1.0 (ddof=1)
          // std([2,3,4]) = 1.0
          // std([3,4,5]) = 1.0
          bool passed = std::abs(result[2] - 1.0) < 0.001;
          if (!passed) {
              std::cout << " [FAIL] : in pd_test_rolling_std() : rolling std should be 1.0" << std::endl;
              throw std::runtime_error("pd_test_rolling_std failed: rolling std should be 1.0");
          }

.. _example-dataframegroupby-sum-21:

.. dropdown:: sum (pd_test_1_all.cpp:276)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 266
      :emphasize-lines: 11

          }

          // Test sum/mean
          pandas::BooleanArray arr({
              std::optional(true),
              std::optional(false),
              std::optional(true),
              std::optional(true)
          });

          auto s = arr.sum();
          if (!s.has_value() || s.value() != 3) {
              std::cout << " [FAIL] : in pd_test_boolean_array_reductions() : sum should be 3" << std::endl;
              throw std::runtime_error("pd_test_boolean_array_reductions failed: sum");
          }

          auto m = arr.mean();
          if (!m.has_value() || std::abs(m.value() - 0.75) > 0.001) {
              std::cout << " [FAIL] : in pd_test_boolean_array_reductions() : mean should be 0.75" << std::endl;
              throw std::runtime_error("pd_test_boolean_array_reductions failed: mean");
          }

.. _example-dataframegroupby-var-22:

.. dropdown:: var (pd_test_1_all.cpp:20890)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 20880
      :emphasize-lines: 11

              throw std::runtime_error("pd_test_expanding_std failed: expanding std values incorrect");
          }

          std::cout << " -> tests passed" << std::endl;
      }

      void pd_test_expanding_var() {
          std::cout << "========= Expanding var =========================";

          pandas::Series s({1.0, 2.0, 3.0, 4.0, 5.0});
          auto result = s.expanding().var();

          // Expanding var (ddof=1): NaN, 0.5, 1.0, 1.6667, 2.5
          bool passed = std::isnan(result[0]) &&
                        std::abs(result[1] - 0.5) < 0.001 &&
                        std::abs(result[2] - 1.0) < 0.001 &&
                        std::abs(result[3] - 1.6667) < 0.001 &&
                        std::abs(result[4] - 2.5) < 0.001;
          if (!passed) {
              std::cout << " [FAIL] : in pd_test_expanding_var() : expanding var values incorrect" << std::endl;
              throw std::runtime_error("pd_test_expanding_var failed: expanding var values incorrect");

.. _example-dataframegroupby-agg-23:
.. _example-dataframegroupby-agg-24:
.. _example-dataframegroupby-agg-25:
.. _example-dataframegroupby-agg-26:
.. _example-dataframegroupby-agg-27:

.. dropdown:: agg (pd_test_1_all.cpp:11100)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 11090
      :emphasize-lines: 11

      }

      void pd_test_func_apply_series_agg() {
          std::cout << "========= Series agg ==================================";

          pandas::Series s({1.0, 2.0, 3.0, 4.0, 5.0}, "values");

          bool passed = true;

          // Test string-based aggregation
          auto sum_result = s.agg("sum");
          if (!sum_result.has_value() || !approx_equal(sum_result.value(), 15.0)) {
              passed = false;
              std::cout << " [FAIL] : in pd_test_func_apply_series_agg() : sum failed" << std::endl;
              throw std::runtime_error("pd_test_func_apply_series_agg failed: sum failed");
          }

          auto mean_result = s.agg("mean");
          if (!mean_result.has_value() || !approx_equal(mean_result.value(), 3.0)) {
              passed = false;
              std::cout << " [FAIL] : in pd_test_func_apply_series_agg() : mean failed" << std::endl;

.. _example-dataframegroupby-agg-28:

.. dropdown:: agg (pd_test_1_all.cpp:11100)
   :class-title: example-dropdown

   ..
code-block:: cpp :linenos: :lineno-start: 11090 :emphasize-lines: 11 } void pd_test_func_apply_series_agg() { std::cout << "========= Series agg =================================="; pandas::Series s({1.0, 2.0, 3.0, 4.0, 5.0}, "values"); bool passed = true; // Test string-based aggregation auto sum_result = s.agg("sum"); if (!sum_result.has_value() || !approx_equal(sum_result.value(), 15.0)) { passed = false; std::cout << " [FAIL] : in pd_test_func_apply_series_agg() : sum failed" << std::endl; throw std::runtime_error("pd_test_func_apply_series_agg failed: sum failed"); } auto mean_result = s.agg("mean"); if (!mean_result.has_value() || !approx_equal(mean_result.value(), 3.0)) { passed = false; std::cout << " [FAIL] : in pd_test_func_apply_series_agg() : mean failed" << std::endl; .. _example-dataframegroupby-agg_callable_with_dtype-29: .. dropdown:: agg_callable_with_dtype (pd_test_5_all.cpp:95045) :class-title: example-dropdown .. code-block:: cpp :linenos: :lineno-start: 95035 :emphasize-lines: 11 run_sgb_case("count", "object:bool", "int64", "C_26_case_412_sgb_count_objbool()", lf); } void case_501_callable_int_returns_int64(int& local_fail) { std::cout << "-- case_501_callable_int_returns_int64\n"; pandas::DataFrame df = make_mixed_df(); auto gb = df.groupby("key"); pandas::DataFrame out; std::string err; try { out = gb.agg_callable_with_dtype(make_int_callable(42)); } catch (const std::exception& e) { err = e.what(); } catch (...) { err = ""; } pandas_tests::check(err.empty(), "C_26_case_501_callable_int_returns_int64()_no_throw", local_fail); if (!err.empty()) { std::cout << " err: " << err << "\n"; .. _example-dataframegroupby-agg_named-30: .. dropdown:: agg_named (pd_test_2_all.cpp:20534) :class-title: example-dropdown .. 
code-block:: cpp :linenos: :lineno-start: 20524 :emphasize-lines: 11 check(approx_eq(sub_b["val1"].get_value_double(0), 3.0), "get_group_b_val1_r0"); check(approx_eq(sub_b["val1"].get_value_double(1), 4.0), "get_group_b_val1_r1"); // Empty exclude_cols: same as no-exclude overload std::set empty_exclude; auto sub_empty = gb.get_group("a", empty_exclude); check(sub_empty.ncols() == 3, "get_group_empty_excl_cols_3"); } // ===================================================================== // Test: agg_named() basic execution // ===================================================================== void pd_test_groupby_apply_named_agg_basic() { std::cout << " -- pd_test_groupby_apply_named_agg_basic --" << std::endl; pandas::DataFrame df; df.add_column("key", std::vector{"a", "a", "b", "b"}); df.add_column("val", std::vector{1.0, 3.0, 5.0, 7.0}); auto gb = df.groupby("key"); .. _example-dataframegroupby-agg_with_dtype-31: .. dropdown:: agg_with_dtype (pd_test_5_all.cpp:94652) :class-title: example-dropdown .. code-block:: cpp :linenos: :lineno-start: 94642 :emphasize-lines: 11 static void run_dfgb_case(const std::string& fn, const std::string& col, const std::string& expected_dtype, const std::string& label, int& local_fail) { pandas::DataFrame df = make_mixed_df(); auto gb = df.groupby("key"); pandas::DataFrame out; std::string err; try { out = gb.agg_with_dtype(fn); } catch (const std::exception& e) { err = e.what(); } catch (...) { err = ""; } pandas_tests::check(err.empty(), label + "_no_throw", local_fail); if (!err.empty()) { std::cout << " err: " << err << "\n"; .. _example-dataframegroupby-agg_with_dtype_list-32: .. dropdown:: agg_with_dtype_list (pd_test_5_all.cpp:94682) :class-title: example-dropdown .. 
code-block:: cpp :linenos: :lineno-start: 94672 :emphasize-lines: 11 static void run_dfgb_list_case(const std::vector& fns, const std::string& src_col, const std::vector& expected, const std::string& label, int& local_fail) { pandas::DataFrame df = make_mixed_df(); auto gb = df.groupby("key"); pandas::DataFrame out; std::string err; try { out = gb.agg_with_dtype_list(fns); } catch (const std::exception& e) { err = e.what(); } catch (...) { err = ""; } pandas_tests::check(err.empty(), label + "_no_throw", local_fail); if (!err.empty()) { std::cout << " err: " << err << "\n"; .. _example-dataframegroupby-apply-33: .. dropdown:: apply (pd_test_1_all.cpp:11244) :class-title: example-dropdown .. code-block:: cpp :linenos: :lineno-start: 11234 :emphasize-lines: 11 void pd_test_func_apply_dataframe_apply_axis0() { std::cout << "========= DataFrame apply axis=0 ======================"; std::map> data = { {"A", {1.0, 2.0, 3.0}}, {"B", {4.0, 5.0, 6.0}} }; pandas::DataFrame df(data); // apply axis=0 applies function to each column auto result = df.apply([](const std::vector& col) { return std::accumulate(col.begin(), col.end(), 0.0); }, 0); bool passed = true; // Plan F·dtype: axis=0 reduce now returns a single "result" column // with the original column names ("A", "B") as the row index. // Sum of A: 1+2+3=6, Sum of B: 4+5+6=15 const auto& result_col = result["result"]; double sum_a = std::stod(result_col.get_value_str(0)); .. _example-dataframegroupby-apply_collect_scalar_results-34: .. dropdown:: apply_collect_scalar_results (pd_test_3_all.cpp:27341) :class-title: example-dropdown .. 
code-block:: cpp :linenos: :lineno-start: 27331 :emphasize-lines: 11 std::vector<double> values; for (const auto& key : keys) { auto sub = gb.get_group(key); double sum = 0; for (size_t r = 0; r < sub.nrows(); ++r) { sum += sub["B"].get_value_double(r); } values.push_back(sum); } auto result = gb.apply_collect_scalar_results(keys, values); check(result.size() == keys.size(), "scalar results size matches keys size"); bool found_bar = false, found_foo = false; for (size_t i = 0; i < result.size(); ++i) { std::string idx = result.index().get_value_str(i); if (idx == "bar") { check(result[i] == 6.0, "bar sum = 6"); found_bar = true; } if (idx == "foo") { check(result[i] == 9.0, "foo sum = 9"); found_foo = true; } } check(found_bar, "bar key found"); check(found_foo, "foo key found"); .. _example-dataframegroupby-apply_collect_series_results-35: .. dropdown:: apply_collect_series_results (pd_test_3_all.cpp:27376) :class-title: example-dropdown .. code-block:: cpp :linenos: :lineno-start: 27366 :emphasize-lines: 11 auto sub = gb.get_group(key); double b_sum = 0, c_sum = 0; for (size_t r = 0; r < sub.nrows(); ++r) { b_sum += sub["B"].get_value_double(r); c_sum += sub["C"].get_value_double(r); } num_cols["B"].push_back(b_sum / sub.nrows()); num_cols["C"].push_back(c_sum / sub.nrows()); } auto result = gb.apply_collect_series_results(keys, col_names, num_cols, str_cols); check(result.ncols() == 2, "series results has 2 columns"); check(result.nrows() == keys.size(), "series results has correct rows"); check(result.has_column("B"), "has column B"); check(result.has_column("C"), "has column C"); } void pd_test_gb_apply_dataframe_results() { std::cout << " -- pd_test_gb_apply_dataframe_results --" << std::endl; auto df = make_test_df(); .. _example-dataframegroupby-apply_concat_dataframe_results-36: .. dropdown:: apply_concat_dataframe_results (pd_test_3_all.cpp:27398) :class-title: example-dropdown .. 
code-block:: cpp :linenos: :lineno-start: 27388 :emphasize-lines: 11 std::vector keys = gb.group_keys_order(); std::vector<pandas::DataFrame> dfs; std::set<std::string> exclude; exclude.insert("A"); for (const auto& key : keys) { dfs.push_back(gb.get_group(key, exclude)); } auto result_gk = gb.apply_concat_dataframe_results(keys, dfs, true); check(result_gk.nrows() == df.nrows(), "concat with MI has all rows"); check(result_gk.has_multiindex(), "concat with group_keys=true has MultiIndex"); auto result_no_gk = gb.apply_concat_dataframe_results(keys, dfs, false); check(result_no_gk.nrows() == df.nrows(), "concat without MI has all rows"); } void pd_test_gb_filter_basic() { std::cout << " -- pd_test_gb_filter_basic --" << std::endl; .. _example-dataframegroupby-resample-37: .. dropdown:: resample (pd_test_1_all.cpp:20321) :class-title: example-dropdown .. code-block:: cpp :linenos: :lineno-start: 20311 :emphasize-lines: 11 "2020-01-01 00:00:00", "2020-01-01 12:00:00", "2020-01-02 00:00:00", "2020-01-02 12:00:00", "2020-01-03 00:00:00", "2020-01-03 12:00:00" }; df.set_index(std::make_unique>(dates)); // Resample to daily auto resampler = df.resample("D"); pandas::DataFrame result = resampler.sum(); // Check that we got aggregated results bool passed = (result.nrows() <= df.nrows()); if (!passed) { std::cout << " [FAIL] : in pd_test_timeseries_resample_basic() : resample didn't reduce rows" << std::endl; throw std::runtime_error("pd_test_timeseries_resample_basic failed"); } .. _example-dataframegroupby-transform_named-38: .. dropdown:: transform_named (pd_test_3_all.cpp:27465) :class-title: example-dropdown .. 
code-block:: cpp :linenos: :lineno-start: 27455 :emphasize-lines: 11 auto result_nodrop = gb.filter_by_group_mask(mask, false); check(result_nodrop.nrows() == 5, "dropna=false keeps all rows"); } void pd_test_gb_transform_same_shape() { std::cout << " -- pd_test_gb_transform_same_shape --" << std::endl; auto df = make_test_df(); auto gb = df.groupby("A"); auto result = gb.transform_named("sum"); check(result.nrows() == df.nrows(), "transform sum same nrows as input"); check(result["B"].get_value_double(0) == 9.0, "row 0 (foo) B sum = 9"); check(result["B"].get_value_double(1) == 6.0, "row 1 (bar) B sum = 6"); check(result["B"].get_value_double(2) == 9.0, "row 2 (foo) B sum = 9"); auto result_mean = gb.transform_named("mean"); check(result_mean.nrows() == df.nrows(), "transform mean same nrows"); check(result_mean["B"].get_value_double(0) == 3.0, "row 0 (foo) B mean = 3"); check(result_mean["B"].get_value_double(1) == 3.0, "row 1 (bar) B mean = 3"); .. _example-dataframegroupby-squeeze_result-39: .. dropdown:: squeeze_result (pd_test_2_all.cpp:20697) :class-title: example-dropdown .. code-block:: cpp :linenos: :lineno-start: 20687 :emphasize-lines: 11 std::cout << " -- test_groupby_squeeze_single_col --" << std::endl; pandas::DataFrame df; df.add_column("key", std::vector{"A", "A", "B", "B"}); df.add_column("val", std::vector{1.0, 2.0, 3.0, 4.0}); auto gb = df.groupby("key"); auto gb_sel = gb.select({"val"}); // single col, not list pandas::DataFrame result = gb_sel.sum(); auto squeezed = gb_sel.squeeze_result(result); // Should be a Series check(std::holds_alternative>(squeezed), "is_float64_series"); auto& s = std::get>(squeezed); check(s.size() == 2, "size_2"); check(s.name() == "val", "name_val"); check(approx_eq(s[0], 3.0), "A_sum_3"); check(approx_eq(s[1], 7.0), "B_sum_7"); } .. _example-dataframegroupby-column-40: .. dropdown:: column (pd_test_1_all.cpp:22039) :class-title: example-dropdown .. 
code-block:: cpp :linenos: :lineno-start: 22029 :emphasize-lines: 11 std::string a1 = result.iat(1, col_a_idx) == -1.0 ? "ok" : "fail"; std::string a2 = result.iat(2, col_a_idx) == 3.0 ? "ok" : "fail"; std::string a3 = result.iat(3, col_a_idx) == 4.0 ? "ok" : "fail"; if (a0 != "ok" || a1 != "ok" || a2 != "ok" || a3 != "ok") { passed = false; error_msg = "Column A values incorrect: A[0]=" + a0 + ", A[1]=" + a1 + ", A[2]=" + a2 + ", A[3]=" + a3; } // Check B column (all should be original) double b0 = result.iat(0, col_b_idx); if (b0 != 5.0) { passed = false; error_msg = "B[0] should be 5, got " + std::to_string(b0); } if (!passed) { std::cout << " [FAIL] : in pd_test_where_basic() : " << error_msg << std::endl; throw std::runtime_error("pd_test_where_basic failed: " + error_msg); } .. _example-dataframegroupby-compute_agg-41: .. dropdown:: compute_agg (pd_test_5_all.cpp:112204) :class-title: example-dropdown .. code-block:: cpp :linenos: :lineno-start: 112194 :emphasize-lines: 11 // Default signature is groupby(by, axis, level, as_index, sort, group_keys, observed, dropna). auto gb = df_in.groupby("k", 0, std::nullopt, /*as_index=*/true, /*sort=*/true, /*group_keys=*/true, /*observed=*/false, /*dropna=*/true); pandas::DataFrame df = gb.agg("sum"); std::string actual = df.to_string(); // Pandas oracle (verified by analysis1 H3 logic + compute_agg empty=0.0): // - "a" observed, sum=10 // - "b" observed, sum=20 // - "c" unobserved -> compute_agg(empty, "sum") -> 0 // Plan 12 (Logic-C int widening) has landed: aggregate_column now // preserves int64 for integer inputs, so the oracle is int64 with // integer literal display (no .0 suffix). std::string expected = " v\n" "k \n" "a 10\n" "b 20\n" "c 0"; check_case("groupby_agg_dispatch_7c3a91_case_41", .. _example-dataframegroupby-dataframe-42: .. dropdown:: dataframe (pd_test_2_all.cpp:11742) :class-title: example-dropdown .. 
code-block:: cpp :linenos: :lineno-start: 11732 :emphasize-lines: 11 std::cout << " [FAIL] : wrong dimensions" << std::endl; std::remove(temp_path.c_str()); throw std::runtime_error("pd_test_to_hdf_mixed_types failed"); } std::remove(temp_path.c_str()); std::cout << " -> tests passed" << std::endl; } void pd_test_to_hdf_empty_dataframe() { std::cout << "========= to_hdf empty dataframe (real HDF5) ==================="; pandas::DataFrame df; std::string temp_path = "temp/test_hdf5_empty.h5"; df.to_hdf(temp_path, "df", "w"); // Just verify file was created std::ifstream file(temp_path); if (!file.is_open()) { std::cout << " [FAIL] : file not created" << std::endl; throw std::runtime_error("pd_test_to_hdf_empty_dataframe failed"); .. _example-dataframegroupby-filter-43: .. dropdown:: filter (pd_test_3_all.cpp:2805) :class-title: example-dropdown .. code-block:: cpp :linenos: :lineno-start: 2795 :emphasize-lines: 11 threw = true; } if (!threw) { throw std::runtime_error("bool_() should throw for multi-element DataFrame"); } std::cout << " -> tests passed" << std::endl; } void pd_test_3_all_df_filter() { std::cout << "========= DataFrame.filter() ============================="; std::map> data = { {"col_a", {1.0, 2.0, 3.0}}, {"col_b", {4.0, 5.0, 6.0}}, {"other", {7.0, 8.0, 9.0}} }; pandas::DataFrame df(data); // Test filter by items pandas::DataFrame filtered_items = df.filter({"col_a", "col_b"}); .. _example-dataframegroupby-filter_by_group_mask-44: .. dropdown:: filter_by_group_mask (pd_test_3_all.cpp:27422) :class-title: example-dropdown .. 
code-block:: cpp :linenos: :lineno-start: 27412 :emphasize-lines: 11 std::map<std::string, bool> mask; for (const auto& key : gb.group_keys_order()) { auto sub = gb.get_group(key); double sum = 0; for (size_t r = 0; r < sub.nrows(); ++r) { sum += sub["B"].get_value_double(r); } mask[key] = (sum > 5); } auto result = gb.filter_by_group_mask(mask, true); check(result.nrows() == 5, "all rows pass filter (both groups sum > 5)"); std::map<std::string, bool> mask3; mask3["bar"] = false; mask3["foo"] = true; auto result3 = gb.filter_by_group_mask(mask3, true); check(result3.nrows() == 3, "only foo rows kept (3 rows)"); } void pd_test_gb_filter_preserves_order() { .. _example-dataframegroupby-group_keys_order-45: .. dropdown:: group_keys_order (pd_test_3_all.cpp:23393) :class-title: example-dropdown .. code-block:: cpp :linenos: :lineno-start: 23383 :emphasize-lines: 11 pandas::Series s({10.0, 20.0, 30.0, 40.0}); std::vector> level_values = { {"a", "a", "b", "b"}, {"x", "y", "x", "y"} }; std::vector> level_names = {"first", "second"}; auto mi = pandas::MultiIndex::from_arrays(level_values, level_names); s.set_multiindex(mi); auto gb = s.groupby_by_level(static_cast(0), true); if (gb.group_keys_order().size() != 2) throw std::runtime_error("expected 2 groups"); auto sums = gb.sum(); if (sums[0] != 30.0 || sums[1] != 70.0) throw std::runtime_error("sum mismatch"); if (!gb.get_index_name().has_value() || *gb.get_index_name() != "first") throw std::runtime_error("index name mismatch"); std::cout << " -> tests passed" << std::endl; } .. _example-dataframegroupby-groups-46: .. dropdown:: groups (pd_test_2_all.cpp:20864) :class-title: example-dropdown .. 
code-block:: cpp :linenos: :lineno-start: 20854 :emphasize-lines: 11 // ===================================================================== // Per-group expanding tests // ===================================================================== void test_series_groupby_expanding_sum() { std::cout << " -- test_series_groupby_expanding_sum --" << std::endl; // Two groups: A=[1,2,3], B=[10,20] std::vector vals = {1.0, 10.0, 2.0, 20.0, 3.0}; pandas::Series data(vals); pandas::Series groups({"A", "B", "A", "B", "A"}); auto sgb = data.groupby(groups); pandas::SeriesGroupByExpandingWindow ew(sgb, 1); auto result = ew.sum(); check(result.size() == 5, "size_5"); // A group: expanding sum = 1, 3, 6 // B group: expanding sum = 10, 30 // Original order: [A:1, B:10, A:3, B:30, A:6] check(approx_eq(result[0], 1.0), "A_exp_sum_0"); .. _example-dataframegroupby-list_selected-47: .. dropdown:: list_selected (pd_test_5_all.cpp:28524) :class-title: example-dropdown .. code-block:: cpp :linenos: :lineno-start: 28514 :emphasize-lines: 11 } void case_1_squeeze_flag_state_machine(int& local_fail) { std::cout << "-- H1 squeeze flag state machine\n"; auto df = make_df_std(); auto gb0 = df.groupby("key"); // (a) Base gb -> no selection -> squeeze false. pandas_tests::check(!gb0.should_squeeze_to_series(), "H1.a.base_no_select_squeeze_false", local_fail); pandas_tests::check(!gb0.list_selected(), "H1.a.base_list_selected_false", local_fail); check_eq("H1.a.base_selected_size_zero", 0, (long long)gb0.selected_columns().size(), local_fail); // (b) select({c}) -> squeeze true. auto gb1 = gb0.select({"v_int"}); pandas_tests::check(gb1.should_squeeze_to_series(), "H1.b.select_single_squeeze_true", local_fail); pandas_tests::check(!gb1.list_selected(), "H1.b.select_list_selected_false", local_fail); .. _example-dataframegroupby-ngroups-48: .. dropdown:: ngroups (pd_test_1_all.cpp:11497) :class-title: example-dropdown .. 
code-block:: cpp :linenos: :lineno-start: 11487 :emphasize-lines: 11 // Create DataFrame with category column std::map<std::string, std::vector<double>> data = { {"category", {1.0, 1.0, 2.0, 2.0, 2.0}}, {"value", {10.0, 20.0, 30.0, 40.0, 50.0}} }; pandas::DataFrame df(data); // Test groupby auto grouped = df.groupby("category"); bool passed = grouped.ngroups() == 2; if (!passed) { std::cout << " [FAIL] : in pd_test_groupby_basic() : ngroups should be 2" << std::endl; throw std::runtime_error("pd_test_groupby_basic failed: ngroups should be 2"); } std::cout << " -> tests passed" << std::endl; } void pd_test_groupby_multiple_columns() { std::cout << "========= GroupBy multiple columns =============="; .. _example-dataframegroupby-nth-49: .. dropdown:: nth (pd_test_3_all.cpp:27491) :class-title: example-dropdown .. code-block:: cpp :linenos: :lineno-start: 27481 :emphasize-lines: 11 check(result_cumsum["B"].get_value_double(1) == 2.0, "row 1 (bar) cumsum B = 2"); check(result_cumsum["B"].get_value_double(3) == 6.0, "row 3 (bar) cumsum B = 6"); } void pd_test_gb_nth_basic() { std::cout << " -- pd_test_gb_nth_basic --" << std::endl; auto df = make_test_df(); auto gb = df.groupby("A"); auto result = gb.nth(0); check(result.nrows() == 2, "nth(0) returns 2 rows (one per group)"); auto result_last = gb.nth(-1); check(result_last.nrows() == 2, "nth(-1) returns 2 rows"); auto result_multi = gb.nth(std::vector{0, -1}); check(result_multi.nrows() == 4, "nth([0,-1]) returns 4 rows"); } void pd_test_gb_nth_slice() { .. _example-dataframegroupby-nth-50: .. dropdown:: nth (pd_test_3_all.cpp:27491) :class-title: example-dropdown .. 
code-block:: cpp :linenos: :lineno-start: 27481 :emphasize-lines: 11 check(result_cumsum["B"].get_value_double(1) == 2.0, "row 1 (bar) cumsum B = 2"); check(result_cumsum["B"].get_value_double(3) == 6.0, "row 3 (bar) cumsum B = 6"); } void pd_test_gb_nth_basic() { std::cout << " -- pd_test_gb_nth_basic --" << std::endl; auto df = make_test_df(); auto gb = df.groupby("A"); auto result = gb.nth(0); check(result.nrows() == 2, "nth(0) returns 2 rows (one per group)"); auto result_last = gb.nth(-1); check(result_last.nrows() == 2, "nth(-1) returns 2 rows"); auto result_multi = gb.nth(std::vector{0, -1}); check(result_multi.nrows() == 4, "nth([0,-1]) returns 4 rows"); } void pd_test_gb_nth_slice() { .. _example-dataframegroupby-select-51: .. dropdown:: select (pd_test_2_all.cpp:20694) :class-title: example-dropdown .. code-block:: cpp :linenos: :lineno-start: 20684 :emphasize-lines: 11 // ===================================================================== void test_groupby_squeeze_single_col() { std::cout << " -- test_groupby_squeeze_single_col --" << std::endl; pandas::DataFrame df; df.add_column("key", std::vector{"A", "A", "B", "B"}); df.add_column("val", std::vector{1.0, 2.0, 3.0, 4.0}); auto gb = df.groupby("key"); auto gb_sel = gb.select({"val"}); // single col, not list pandas::DataFrame result = gb_sel.sum(); auto squeezed = gb_sel.squeeze_result(result); // Should be a Series check(std::holds_alternative>(squeezed), "is_float64_series"); auto& s = std::get>(squeezed); check(s.size() == 2, "size_2"); check(s.name() == "val", "name_val"); .. _example-dataframegroupby-select_as_list-52: .. dropdown:: select_as_list (pd_test_2_all.cpp:20751) :class-title: example-dropdown .. 
code-block:: cpp :linenos: :lineno-start: 20741 :emphasize-lines: 11 } void test_groupby_no_squeeze_list_key() { std::cout << " -- test_groupby_no_squeeze_list_key --" << std::endl; pandas::DataFrame df; df.add_column("key", std::vector{"A", "A", "B", "B"}); df.add_column("val", std::vector{1.0, 2.0, 3.0, 4.0}); auto gb = df.groupby("key"); auto gb_sel = gb.select_as_list({"val"}); // list selection -> no squeeze pandas::DataFrame result = gb_sel.sum(); auto squeezed = gb_sel.squeeze_result(result); check(std::holds_alternative(squeezed), "is_monostate_list_sel"); } // ===================================================================== // apply_result_index tests (MultiIndex reconstruction) // ===================================================================== .. _example-dataframegroupby-select_rows_by_indices-53: .. dropdown:: select_rows_by_indices (pd_test_3_all.cpp:27515) :class-title: example-dropdown .. code-block:: cpp :linenos: :lineno-start: 27505 :emphasize-lines: 11 auto gb = df.groupby("A"); std::vector selected; for (const auto& key : gb.group_keys_order()) { const auto& indices = gb.groups().at(key); for (size_t i = 0; i < std::min(size_t(2), indices.size()); ++i) { selected.push_back(indices[i]); } } auto result = gb.select_rows_by_indices(selected); check(result.nrows() == 4, "slice [0:2] returns 4 rows"); } void pd_test_gb_nth_dropna() { std::cout << " -- pd_test_gb_nth_dropna --" << std::endl; std::map> data; data["B"] = {std::numeric_limits::quiet_NaN(), 2.0, 3.0, 4.0, 5.0}; data["C"] = {10.0, 20.0, 30.0, 40.0, 50.0}; pandas::DataFrame df(data); .. _example-dataframegroupby-selected_columns-54: .. dropdown:: selected_columns (pd_test_5_all.cpp:28527) :class-title: example-dropdown .. code-block:: cpp :linenos: :lineno-start: 28517 :emphasize-lines: 11 std::cout << "-- H1 squeeze flag state machine\n"; auto df = make_df_std(); auto gb0 = df.groupby("key"); // (a) Base gb -> no selection -> squeeze false. 
pandas_tests::check(!gb0.should_squeeze_to_series(), "H1.a.base_no_select_squeeze_false", local_fail); pandas_tests::check(!gb0.list_selected(), "H1.a.base_list_selected_false", local_fail); check_eq("H1.a.base_selected_size_zero", 0, (long long)gb0.selected_columns().size(), local_fail); // (b) select({c}) -> squeeze true. auto gb1 = gb0.select({"v_int"}); pandas_tests::check(gb1.should_squeeze_to_series(), "H1.b.select_single_squeeze_true", local_fail); pandas_tests::check(!gb1.list_selected(), "H1.b.select_list_selected_false", local_fail); // (c) select_as_list({c}) 1-col -> squeeze false (DataFrame-style). auto gb2 = gb0.select_as_list({"v_int"}); .. _example-dataframegroupby-should_squeeze_to_series-55: .. dropdown:: should_squeeze_to_series (pd_test_5_all.cpp:28522) :class-title: example-dropdown .. code-block:: cpp :linenos: :lineno-start: 28512 :emphasize-lines: 11 std::vector{"level_0", "level_1"}); return df; } void case_1_squeeze_flag_state_machine(int& local_fail) { std::cout << "-- H1 squeeze flag state machine\n"; auto df = make_df_std(); auto gb0 = df.groupby("key"); // (a) Base gb -> no selection -> squeeze false. pandas_tests::check(!gb0.should_squeeze_to_series(), "H1.a.base_no_select_squeeze_false", local_fail); pandas_tests::check(!gb0.list_selected(), "H1.a.base_list_selected_false", local_fail); check_eq("H1.a.base_selected_size_zero", 0, (long long)gb0.selected_columns().size(), local_fail); // (b) select({c}) -> squeeze true. auto gb1 = gb0.select({"v_int"}); pandas_tests::check(gb1.should_squeeze_to_series(), "H1.b.select_single_squeeze_true", local_fail); .. _example-dataframegroupby-size-56: .. dropdown:: size (pd_test_1_all.cpp:22) :class-title: example-dropdown .. 
code-block:: cpp :linenos: :lineno-start: 12 :emphasize-lines: 11 #include "../pandas/pd_boolean_array.h" namespace dataframe_tests { namespace dataframe_tests_boolean_array { void pd_test_boolean_array_constructors() { std::cout << "========= BooleanArray: constructors ======================= "; // Default constructor pandas::BooleanArray arr1; if (arr1.size() != 0) { std::cout << " [FAIL] : in pd_test_boolean_array_constructors() : default constructor size != 0" << std::endl; throw std::runtime_error("pd_test_boolean_array_constructors failed: default constructor size != 0"); } // Initializer list constructor pandas::BooleanArray arr2({ std::optional(true), std::optional(false), std::nullopt, std::optional(true)