ExtensionIndex ============== .. cpp:class:: pandas::ExtensionIndex Index class for axis labels in pandas data structures. Example ------- .. code-block:: cpp #include using namespace pandas; // Create ExtensionIndex ExtensionIndex idx({1, 2, 3}, "my_index"); size_t len = idx.size(); Constructors ------------ .. list-table:: :widths: 55 25 20 :header-rows: 1 * - Signature - Location - Example * - ``explicit ExtensionIndex(const ArrayType& array, const std::optional& name = std::nullopt, bool copy = false)`` - pd_extension_index.h:350 - * - ``explicit ExtensionIndex(ArrayType&& array, const std::optional& name = std::nullopt)`` - pd_extension_index.h:361 - * - ``ExtensionIndex(const ExtensionIndex& other)`` - pd_extension_index.h:371 - * - ``ExtensionIndex(ExtensionIndex&& other) noexcept = default`` - pd_extension_index.h:385 - Indexing / Selection -------------------- .. list-table:: :widths: 40 20 15 25 :header-rows: 1 * - Signature - Return Type - Location - Example * - ``numpy::NDArray get_indexer( const ExtensionIndex& target, const std::string& method = "", std::optional limit = std::nullopt, std::optional tolerance = std::nullopt) const`` - numpy::NDArray - pd_extension_index.h:709 - :ref:`View ` * - ``numpy::NDArray get_indexer_for( const std::vector& values, const std::vector\* target = nullptr) const`` - numpy::NDArray - pd_extension_index.h:747 - :ref:`View ` * - ``std::variant> get_loc(const value_type& key) const`` - std::variant> - pd_extension_index.h:687 - :ref:`View ` * - ``int64_t get_loc_str(const std::string& key_str) const override`` - int64_t - pd_extension_index.h:525 - :ref:`View ` * - ``std::optional get_loc_string(const std::string& key) const override`` - std::optional - pd_extension_index.h:547 - :ref:`View ` * - ``std::string get_value_str(size_t index) const override`` - std::string - pd_extension_index.h:572 - :ref:`View ` * - ``oss << get_value_str(i)`` - oss << - pd_extension_index.h:607 - :ref:`View ` * - ``ExtensionIndex 
take(const std::vector& indices, int axis = 0, bool allow_fill = false, std::optional fill_value = std::nullopt) const`` - ExtensionIndex - pd_extension_index.h:838 - :ref:`View ` * - ``ExtensionIndex take_impl(const std::vector& indices) const`` - ExtensionIndex - pd_extension_index.h:1569 - Data Manipulation ----------------- .. list-table:: :widths: 40 20 15 25 :header-rows: 1 * - Signature - Return Type - Location - Example * - ``ExtensionIndex drop(const std::vector& labels, const std::string& errors = "raise") const`` - ExtensionIndex - pd_extension_index.h:895 - :ref:`View ` * - ``ExtensionIndex drop_duplicates(const std::string& keep = "first") const`` - ExtensionIndex - pd_extension_index.h:1020 - :ref:`View ` * - ``ExtensionIndex dropna() const`` - ExtensionIndex - pd_extension_index.h:822 - :ref:`View ` * - ``ExtensionIndex rename(const std::optional& new_name) const`` - ExtensionIndex - pd_extension_index.h:982 - :ref:`View ` Missing Data ------------ .. list-table:: :widths: 40 20 15 25 :header-rows: 1 * - Signature - Return Type - Location - Example * - ``ExtensionIndex fillna(const value_type& value, const std::string& downcast = "") const`` - ExtensionIndex - pd_extension_index.h:812 - :ref:`View ` * - ``numpy::NDArray isna() const`` - numpy::NDArray - pd_extension_index.h:774 - :ref:`View ` * - ``numpy::NDArray isnull() const`` - numpy::NDArray - pd_extension_index.h:781 - :ref:`View ` * - ``numpy::NDArray notna() const`` - numpy::NDArray - pd_extension_index.h:788 - :ref:`View ` * - ``numpy::NDArray notnull() const`` - numpy::NDArray - pd_extension_index.h:795 - :ref:`View ` Statistics ---------- .. 
list-table:: :widths: 40 20 15 25 :header-rows: 1 * - Signature - Return Type - Location - Example * - ``size_t nunique(bool dropna = true) const`` - size_t - pd_extension_index.h:1128 - :ref:`View ` * - ``std::pair, std::vector> value_counts( bool dropna = true, bool ascending = false, const int\* bins = nullptr, bool normalize = false, bool sort = true) const`` - std::pair, std::vector> - pd_extension_index.h:1142 - :ref:`View ` Comparison ---------- .. list-table:: :widths: 40 20 15 25 :header-rows: 1 * - Signature - Return Type - Location - Example * - ``bool equals(const ExtensionIndex& other) const`` - bool - pd_extension_index.h:1489 - :ref:`View ` * - ``ArrayType new_array(values)`` - ArrayType - pd_extension_index.h:1587 - Sorting ------- .. list-table:: :widths: 40 20 15 25 :header-rows: 1 * - Signature - Return Type - Location - Example * - ``numpy::NDArray argsort(bool ascending = true) const`` - numpy::NDArray - pd_extension_index.h:1385 - :ref:`View ` * - ``ExtensionIndex sort_values(bool ascending = true, const std::string& na_position = "last", bool return_indexer = false, std::nullptr_t key = nullptr) const`` - ExtensionIndex - pd_extension_index.h:1428 - :ref:`View ` Combining --------- .. list-table:: :widths: 40 20 15 25 :header-rows: 1 * - Signature - Return Type - Location - Example * - ``ExtensionIndex append(const ExtensionIndex& other) const`` - ExtensionIndex - pd_extension_index.h:1210 - :ref:`View ` Time Series ----------- .. list-table:: :widths: 40 20 15 25 :header-rows: 1 * - Signature - Return Type - Location - Example * - ``ExtensionIndex difference(const ExtensionIndex& other, bool sort = true) const`` - ExtensionIndex - pd_extension_index.h:1311 - :ref:`View ` I/O --- .. 
list-table:: :widths: 40 20 15 25 :header-rows: 1 * - Signature - Return Type - Location - Example * - ``std::vector> to_list() const`` - std::vector> - pd_extension_index.h:658 - :ref:`View ` * - ``std::string to_string() const override`` - std::string - pd_extension_index.h:600 - :ref:`View ` * - ``std::vector to_string_vector() const override`` - std::vector - pd_extension_index.h:560 - :ref:`View ` Conversion ---------- .. list-table:: :widths: 40 20 15 25 :header-rows: 1 * - Signature - Return Type - Location - Example * - ``ExtensionIndex copy() const`` - ExtensionIndex - pd_extension_index.h:975 - :ref:`View ` Set Operations -------------- .. list-table:: :widths: 40 20 15 25 :header-rows: 1 * - Signature - Return Type - Location - Example * - ``numpy::NDArray duplicated(const std::string& keep = "first") const`` - numpy::NDArray - pd_extension_index.h:1069 - :ref:`View ` * - ``ExtensionIndex intersection(const ExtensionIndex& other, bool sort = false) const`` - ExtensionIndex - pd_extension_index.h:1218 - :ref:`View ` * - ``numpy::NDArray isin(const std::vector& values, std::optional level = std::nullopt) const`` - numpy::NDArray - pd_extension_index.h:870 - :ref:`View ` * - ``ExtensionIndex symmetric_difference(const ExtensionIndex& other, bool sort = false, const std::string\* result_name = nullptr) const`` - ExtensionIndex - pd_extension_index.h:1339 - :ref:`View ` * - ``ExtensionIndex union_(const ExtensionIndex& other, bool sort = false) const`` - ExtensionIndex - pd_extension_index.h:1251 - :ref:`View ` * - ``ExtensionIndex unique(std::optional level = std::nullopt) const`` - ExtensionIndex - pd_extension_index.h:992 - :ref:`View ` Type Checking ------------- .. 
list-table:: :widths: 40 20 15 25 :header-rows: 1 * - Signature - Return Type - Location - Example * - ``bool is_monotonic_decreasing() const`` - bool - pd_extension_index.h:1371 - :ref:`View ` * - ``bool is_monotonic_increasing() const`` - bool - pd_extension_index.h:1361 - :ref:`View ` * - ``bool is_unique() const override`` - bool - pd_extension_index.h:481 - :ref:`View ` Other Methods ------------- .. list-table:: :widths: 40 20 15 25 :header-rows: 1 * - Signature - Return Type - Location - Example * - ``const ArrayType& array() const`` - const ArrayType& - pd_extension_index.h:651 - :ref:`View ` * - ``void build_hash_table() const`` - void - pd_extension_index.h:209 - * - ``size_t cache_memory_usage() const override`` - size_t - pd_extension_index.h:1548 - * - ``void clear_cache() const override`` - void - pd_extension_index.h:1526 - :ref:`View ` * - ``std::unique_ptr clone() const override`` - std::unique_ptr - pd_extension_index.h:589 - :ref:`View ` * - ``void compute_monotonicity() const`` - void - pd_extension_index.h:295 - * - ``bool contains(const value_type& key) const`` - bool - pd_extension_index.h:676 - :ref:`View ` * - ``bool contains_str(const std::string& key_str) const override`` - bool - pd_extension_index.h:513 - :ref:`View ` * - ``value_type convert_from_string(const std::string& str) const`` - value_type - pd_extension_index.h:236 - * - ``std::string convert_to_string(const value_type& val) const`` - std::string - pd_extension_index.h:274 - * - ``ExtensionIndex delete_(size_t loc) const`` - ExtensionIndex - pd_extension_index.h:937 - :ref:`View ` * - ``ExtensionIndex delete_(const std::vector& locs) const`` - ExtensionIndex - pd_extension_index.h:955 - :ref:`View ` * - ``std::string dtype_name() const override`` - std::string - pd_extension_index.h:437 - :ref:`View ` * - ``bool empty() const override`` - bool - pd_extension_index.h:423 - :ref:`View ` * - ``void ensure_hash_table() const`` - void - pd_extension_index.h:227 - * - ``std::pair, 
ExtensionIndex> factorize() const`` - std::pair, ExtensionIndex> - pd_extension_index.h:1169 - :ref:`View ` * - ``bool has_cached_values() const override`` - bool - pd_extension_index.h:1535 - :ref:`View ` * - ``bool has_duplicates() const override`` - bool - pd_extension_index.h:499 - :ref:`View ` * - ``bool hasnans() const`` - bool - pd_extension_index.h:802 - :ref:`View ` * - ``bool identical(const ExtensionIndex& other) const`` - bool - pd_extension_index.h:1512 - :ref:`View ` * - ``std::string inferred_type() const override`` - std::string - pd_extension_index.h:458 - :ref:`View ` * - ``void invalidate_caches() const`` - void - pd_extension_index.h:196 - * - ``std::optional name() const override`` - std::optional - pd_extension_index.h:444 - :ref:`View ` * - ``size_t nbytes() const override`` - size_t - pd_extension_index.h:430 - :ref:`View ` * - ``std::string repr() const override`` - std::string - pd_extension_index.h:626 - :ref:`View ` * - ``ExtensionIndex result(\*this)`` - ExtensionIndex - pd_extension_index.h:983 - :ref:`View ` * - ``void set_name(const std::optional& name) override`` - void - pd_extension_index.h:451 - :ref:`View ` * - ``size_t size() const override`` - size_t - pd_extension_index.h:416 - :ref:`View ` * - ``IndexTypeId type_id() const override`` - IndexTypeId - pd_extension_index.h:593 - :ref:`View ` * - ``ArrayType values() const`` - ArrayType - pd_extension_index.h:644 - :ref:`View ` Code Examples ------------- The following examples are extracted from the test suite. .. _example-extensionindex-get_indexer-0: .. dropdown:: get_indexer (pd_test_1_all.cpp:10332) :class-title: example-dropdown .. 
code-block:: cpp :linenos: :lineno-start: 10322 :emphasize-lines: 11 void pd_test_extension_index_get_indexer() { std::cout << "========= get_indexer ========================="; pandas::CategoricalArray arr1({"a", "b", "c", "d"}); pandas::CategoricalIndex idx1(arr1); pandas::CategoricalArray arr2({"b", "d", "x"}); pandas::CategoricalIndex idx2(arr2); auto indexer = idx1.get_indexer(idx2); bool passed = (indexer.getSize() == 3 && indexer.getElementAt({0}) == 1 && indexer.getElementAt({1}) == 3 && indexer.getElementAt({2}) == -1); if (!passed) { std::cout << " [FAIL] : in pd_test_extension_index_get_indexer() : get_indexer check failed" << std::endl; throw std::runtime_error("pd_test_extension_index_get_indexer failed"); } .. _example-extensionindex-get_indexer_for-1: .. dropdown:: get_indexer_for (pd_test_3_all.cpp:716) :class-title: example-dropdown .. code-block:: cpp :linenos: :lineno-start: 706 :emphasize-lines: 11 // ============================================================================ // Category 6: Index Indexer Methods // ============================================================================ void pd_test_3_all_index_indexers() { std::cout << "========= Index.get_indexer_for/non_unique/slice_indexer() "; std::vector vals = {"a", "b", "c", "d", "e"}; pandas::Index idx(vals); // Test get_indexer_for() std::vector target = {"b", "d", "f"}; // "f" doesn't exist numpy::NDArray indexer = idx.get_indexer_for(target); if (indexer.getSize() != 3) { std::cout << " [FAIL] : in pd_test_3_all_index_indexers() : get_indexer_for size mismatch" << std::endl; throw std::runtime_error("pd_test_3_all_index_indexers failed: get_indexer_for size"); } // "b" is at index 1 if (indexer.getElementAt({0}) != 1) { std::cout << " [FAIL] : in pd_test_3_all_index_indexers() : 'b' should be at index 1" << std::endl; throw std::runtime_error("pd_test_3_all_index_indexers failed: 'b' index"); .. _example-extensionindex-get_loc-2: .. 
dropdown:: get_loc (pd_test_1_all.cpp:10281) :class-title: example-dropdown .. code-block:: cpp :linenos: :lineno-start: 10271 :emphasize-lines: 11 bool passed = (idx.contains("apple") && idx.contains("banana") && !idx.contains("grape")); if (!passed) { std::cout << " [FAIL] : in pd_test_extension_index_contains() : contains check failed" << std::endl; throw std::runtime_error("pd_test_extension_index_contains failed"); } std::cout << " -> tests passed" << std::endl; } void pd_test_extension_index_get_loc_unique() { std::cout << "========= get_loc (unique) ========================="; pandas::CategoricalArray arr({"apple", "banana", "cherry"}); pandas::CategoricalIndex idx(arr); auto loc_apple = idx.get_loc("apple"); auto loc_banana = idx.get_loc("banana"); auto loc_cherry = idx.get_loc("cherry"); bool passed = (std::holds_alternative(loc_apple) && std::get(loc_apple) == 0 && std::get(loc_banana) == 1 && .. _example-extensionindex-get_loc_str-3: .. dropdown:: get_loc_str (pd_test_1_all.cpp:10890) :class-title: example-dropdown .. code-block:: cpp :linenos: :lineno-start: 10880 :emphasize-lines: 11 std::cout << " -> tests passed" << std::endl; } void pd_test_extension_index_contains_str_get_loc_str() { std::cout << "========= contains_str/get_loc_str ========================="; pandas::CategoricalArray arr({"apple", "banana", "cherry"}); pandas::CategoricalIndex idx(arr); bool passed = (idx.contains_str("apple") && !idx.contains_str("grape") && idx.get_loc_str("banana") == 1 && idx.get_loc_str("grape") == -1); if (!passed) { std::cout << " [FAIL] : in pd_test_extension_index_contains_str_get_loc_str() : contains_str/get_loc_str check failed" << std::endl; throw std::runtime_error("pd_test_extension_index_contains_str_get_loc_str failed"); } std::cout << " -> tests passed" << std::endl; } void pd_test_extension_index_repr() { std::cout << "========= repr ========================="; .. _example-extensionindex-get_loc_string-4: .. 
dropdown:: get_loc_string (pd_test_3_all.cpp:28108) :class-title: example-dropdown .. code-block:: cpp :linenos: :lineno-start: 28098 :emphasize-lines: 11 vals.push_back(numpy::timedelta64(ns, numpy::DateTimeUnit::Nanosecond)); } return pandas::TimedeltaArray(vals); } void pd_test_getitem_timedelta_str_lookup() { std::cout << " -- pd_test_getitem_timedelta_str_lookup --" << std::endl; int fail = 0; auto tda = ge_make_tda({1 * GE_NS_PER_DAY, 2 * GE_NS_PER_DAY, 3 * GE_NS_PER_DAY}); pandas::TimedeltaIndex tdi(tda); auto pos = tdi.get_loc_string("2 days"); if (!pos.has_value()) { std::cout << " FAIL: '2 days' not found" << std::endl; fail++; } else if (*pos != 1) { std::cout << " FAIL: expected pos=1, got " << *pos << std::endl; fail++; } if (fail == 0) std::cout << " OK" << std::endl; if (fail) throw std::runtime_error("pd_test_getitem_timedelta_str_lookup failed"); } void pd_test_getitem_timedelta_str_not_found() { std::cout << " -- pd_test_getitem_timedelta_str_not_found --" << std::endl; int fail = 0; auto tda = ge_make_tda({1 * GE_NS_PER_DAY}); .. _example-extensionindex-get_value_str-5: .. dropdown:: get_value_str (pd_test_1_all.cpp:4665) :class-title: example-dropdown .. 
code-block:: cpp :linenos: :lineno-start: 4655 :emphasize-lines: 11 auto corr_df = df.corr(); // Check dimensions bool passed = corr_df.nrows() == 2 && corr_df.ncols() == 2; if (!passed) { std::cout << " [FAIL] : in pd_test_aggregation_dataframe_corr() : corr should be 2x2" << std::endl; throw std::runtime_error("pd_test_aggregation_dataframe_corr failed: corr should be 2x2"); } // Diagonal should be 1.0 std::string aa = corr_df["A"].get_value_str(0); passed = std::abs(std::stod(aa) - 1.0) < 0.001; if (!passed) { std::cout << " [FAIL] : in pd_test_aggregation_dataframe_corr() : diagonal should be 1.0" << std::endl; throw std::runtime_error("pd_test_aggregation_dataframe_corr failed: diagonal should be 1.0"); } // A-B correlation should be 1.0 (perfect correlation) std::string ab = corr_df["B"].get_value_str(0); passed = std::abs(std::stod(ab) - 1.0) < 0.001; if (!passed) { .. _example-extensionindex-get_value_str-6: .. dropdown:: get_value_str (pd_test_1_all.cpp:4665) :class-title: example-dropdown .. code-block:: cpp :linenos: :lineno-start: 4655 :emphasize-lines: 11 auto corr_df = df.corr(); // Check dimensions bool passed = corr_df.nrows() == 2 && corr_df.ncols() == 2; if (!passed) { std::cout << " [FAIL] : in pd_test_aggregation_dataframe_corr() : corr should be 2x2" << std::endl; throw std::runtime_error("pd_test_aggregation_dataframe_corr failed: corr should be 2x2"); } // Diagonal should be 1.0 std::string aa = corr_df["A"].get_value_str(0); passed = std::abs(std::stod(aa) - 1.0) < 0.001; if (!passed) { std::cout << " [FAIL] : in pd_test_aggregation_dataframe_corr() : diagonal should be 1.0" << std::endl; throw std::runtime_error("pd_test_aggregation_dataframe_corr failed: diagonal should be 1.0"); } // A-B correlation should be 1.0 (perfect correlation) std::string ab = corr_df["B"].get_value_str(0); passed = std::abs(std::stod(ab) - 1.0) < 0.001; if (!passed) { .. _example-extensionindex-take-7: .. 
dropdown:: take (pd_test_1_all.cpp:5903) :class-title: example-dropdown .. code-block:: cpp :linenos: :lineno-start: 5893 :emphasize-lines: 11 // Inherited Operations Tests // ============================================================================ void pd_test_categorical_index_take() { std::cout << "========= inherited take =============================="; pandas::CategoricalArray arr({"a", "b", "c", "d"}); pandas::CategoricalIndex idx(arr); std::vector indices = {0, 2, 3}; pandas::ExtensionIndex taken = idx.take(indices); bool passed = (taken.size() == 3); if (!passed) { std::cout << " [FAIL] : in pd_test_categorical_index_take()" << std::endl; throw std::runtime_error("pd_test_categorical_index_take failed"); } std::cout << " -> tests passed" << std::endl; } .. _example-extensionindex-drop-8: .. dropdown:: drop (pd_test_1_all.cpp:6558) :class-title: example-dropdown .. code-block:: cpp :linenos: :lineno-start: 6548 :emphasize-lines: 11 if (df.ncols() != 2) { std::cout << " [FAIL] : in pd_test_dataframe_manipulation() : pop ncols != 2" << std::endl; throw std::runtime_error("pd_test_dataframe_manipulation failed: pop ncols != 2"); } if (!popped) { std::cout << " [FAIL] : in pd_test_dataframe_manipulation() : popped is null" << std::endl; throw std::runtime_error("pd_test_dataframe_manipulation failed: popped is null"); } // Test drop columns auto dropped = df.drop(std::vector{"B"}, 1); if (dropped.ncols() != 1) { std::cout << " [FAIL] : in pd_test_dataframe_manipulation() : drop ncols != 1" << std::endl; throw std::runtime_error("pd_test_dataframe_manipulation failed: drop ncols != 1"); } // Test rename auto renamed = df.rename_columns(std::map{{"A", "X"}}); if (!renamed.has_column("X")) { std::cout << " [FAIL] : in pd_test_dataframe_manipulation() : rename failed" << std::endl; throw std::runtime_error("pd_test_dataframe_manipulation failed: rename failed"); .. _example-extensionindex-drop_duplicates-9: .. 
dropdown:: drop_duplicates (pd_test_1_all.cpp:6639) :class-title: example-dropdown .. code-block:: cpp :linenos: :lineno-start: 6629 :emphasize-lines: 11 } } // Test drop_duplicates { std::map> dup_data; dup_data["A"] = {1, 1, 2, 2}; dup_data["B"] = {1, 1, 2, 3}; pandas::DataFrame df_dup(dup_data); auto deduped = df_dup.drop_duplicates(); // Rows 0 and 1 are duplicates (A=1, B=1), so should have 3 rows if (deduped.nrows() != 3) { std::cout << " [FAIL] : in pd_test_dataframe_manipulation() : drop_duplicates nrows != 3, got " << deduped.nrows() << std::endl; throw std::runtime_error("pd_test_dataframe_manipulation failed: drop_duplicates"); } } // Test assign { std::map> assign_data; .. _example-extensionindex-dropna-10: .. dropdown:: dropna (pd_test_1_all.cpp:531) :class-title: example-dropdown .. code-block:: cpp :linenos: :lineno-start: 521 :emphasize-lines: 11 } // Test isna array numpy::NDArray na_mask = arr.isna(); if (na_mask.getSize() != 4) { std::cout << " [FAIL] : in pd_test_categorical_array_na_handling() : isna size != 4" << std::endl; throw std::runtime_error("pd_test_categorical_array_na_handling failed: isna size != 4"); } // Test dropna pandas::CategoricalArray dropped = arr.dropna(); if (dropped.size() != 2) { std::cout << " [FAIL] : in pd_test_categorical_array_na_handling() : dropna size != 2" << std::endl; throw std::runtime_error("pd_test_categorical_array_na_handling failed: dropna size != 2"); } // Test fillna (fill with existing category) pandas::CategoricalArray filled = arr.fillna("a"); // 'a' is in categories if (filled.has_na()) { std::cout << " [FAIL] : in pd_test_categorical_array_na_handling() : fillna should have no NA" << std::endl; throw std::runtime_error("pd_test_categorical_array_na_handling failed: fillna should have no NA"); .. _example-extensionindex-rename-11: .. dropdown:: rename (pd_test_1_all.cpp:5816) :class-title: example-dropdown .. 
code-block:: cpp :linenos: :lineno-start: 5806 :emphasize-lines: 11 std::cout << " -> tests passed" << std::endl; } void pd_test_categorical_index_rename() { std::cout << "========= rename ======================================"; pandas::CategoricalArray arr({"x", "y"}); pandas::CategoricalIndex idx(arr, "old_name"); pandas::CategoricalIndex renamed = idx.rename("new_name"); bool passed = (renamed.name().has_value() && *renamed.name() == "new_name" && renamed.size() == idx.size() && renamed.categories() == idx.categories()); if (!passed) { std::cout << " [FAIL] : in pd_test_categorical_index_rename()" << std::endl; throw std::runtime_error("pd_test_categorical_index_rename failed"); } std::cout << " -> tests passed" << std::endl; } .. _example-extensionindex-fillna-12: .. dropdown:: fillna (pd_test_1_all.cpp:537) :class-title: example-dropdown .. code-block:: cpp :linenos: :lineno-start: 527 :emphasize-lines: 11 throw std::runtime_error("pd_test_categorical_array_na_handling failed: isna size != 4"); } // Test dropna pandas::CategoricalArray dropped = arr.dropna(); if (dropped.size() != 2) { std::cout << " [FAIL] : in pd_test_categorical_array_na_handling() : dropna size != 2" << std::endl; throw std::runtime_error("pd_test_categorical_array_na_handling failed: dropna size != 2"); } // Test fillna (fill with existing category) pandas::CategoricalArray filled = arr.fillna("a"); // 'a' is in categories if (filled.has_na()) { std::cout << " [FAIL] : in pd_test_categorical_array_na_handling() : fillna should have no NA" << std::endl; throw std::runtime_error("pd_test_categorical_array_na_handling failed: fillna should have no NA"); } std::cout << " -> tests passed" << std::endl; } void pd_test_categorical_array_add_categories() { .. _example-extensionindex-isna-13: .. dropdown:: isna (pd_test_1_all.cpp:524) :class-title: example-dropdown .. 
code-block:: cpp :linenos: :lineno-start: 514 :emphasize-lines: 11 throw std::runtime_error("pd_test_categorical_array_na_handling failed: has_na() should be true"); } // Test count (non-NA) if (arr.count() != 2) { std::cout << " [FAIL] : in pd_test_categorical_array_na_handling() : count() != 2" << std::endl; throw std::runtime_error("pd_test_categorical_array_na_handling failed: count() != 2"); } // Test isna array numpy::NDArray na_mask = arr.isna(); if (na_mask.getSize() != 4) { std::cout << " [FAIL] : in pd_test_categorical_array_na_handling() : isna size != 4" << std::endl; throw std::runtime_error("pd_test_categorical_array_na_handling failed: isna size != 4"); } // Test dropna pandas::CategoricalArray dropped = arr.dropna(); if (dropped.size() != 2) { std::cout << " [FAIL] : in pd_test_categorical_array_na_handling() : dropna size != 2" << std::endl; throw std::runtime_error("pd_test_categorical_array_na_handling failed: dropna size != 2"); .. _example-extensionindex-isnull-14: .. dropdown:: isnull (pd_test_3_all.cpp:671) :class-title: example-dropdown .. 
code-block:: cpp :linenos: :lineno-start: 661 :emphasize-lines: 11 // Category 5: Index Null Detection // ============================================================================ void pd_test_3_all_index_null_detection() { std::cout << "========= Index.isnull/notnull() ====================="; // Test with float index (can have NaN) std::vector vals = {1.0, std::nan(""), 3.0, std::nan("")}; pandas::Index idx(vals); numpy::NDArray isnull_result = idx.isnull(); if (isnull_result.getSize() != 4) { std::cout << " [FAIL] : in pd_test_3_all_index_null_detection() : isnull() size mismatch" << std::endl; throw std::runtime_error("pd_test_3_all_index_null_detection failed: isnull() size"); } // Index 0: 1.0 -> not null if (isnull_result.getElementAt({0})) { std::cout << " [FAIL] : in pd_test_3_all_index_null_detection() : index 0 should not be null" << std::endl; throw std::runtime_error("pd_test_3_all_index_null_detection failed: index 0"); } // Index 1: NaN -> null .. _example-extensionindex-notna-15: .. dropdown:: notna (pd_test_1_all.cpp:6595) :class-title: example-dropdown .. code-block:: cpp :linenos: :lineno-start: 6585 :emphasize-lines: 11 if (!na_mask.getElementAt({2, 1})) { std::cout << " [FAIL] : in pd_test_dataframe_manipulation() : isna at (2,1) should be true" << std::endl; throw std::runtime_error("pd_test_dataframe_manipulation failed: isna at (2,1)"); } // Row 0, col 0 should NOT be NA if (na_mask.getElementAt({0, 0})) { std::cout << " [FAIL] : in pd_test_dataframe_manipulation() : isna at (0,0) should be false" << std::endl; throw std::runtime_error("pd_test_dataframe_manipulation failed: isna at (0,0)"); } auto notna_mask = df_na.notna(); if (notna_mask.getElementAt({1, 0})) { std::cout << " [FAIL] : in pd_test_dataframe_manipulation() : notna at (1,0) should be false" << std::endl; throw std::runtime_error("pd_test_dataframe_manipulation failed: notna at (1,0)"); } } // Test fillna { std::map> float_data; float_data["X"] = {1.0, std::nan(""), 3.0}; .. 
_example-extensionindex-notnull-16: .. dropdown:: notnull (pd_test_3_all.cpp:665) :class-title: example-dropdown .. code-block:: cpp :linenos: :lineno-start: 655 :emphasize-lines: 11 } std::cout << " -> tests passed" << std::endl; } // ============================================================================ // Category 5: Index Null Detection // ============================================================================ void pd_test_3_all_index_null_detection() { std::cout << "========= Index.isnull/notnull() ====================="; // Test with float index (can have NaN) std::vector vals = {1.0, std::nan(""), 3.0, std::nan("")}; pandas::Index idx(vals); numpy::NDArray isnull_result = idx.isnull(); if (isnull_result.getSize() != 4) { std::cout << " [FAIL] : in pd_test_3_all_index_null_detection() : isnull() size mismatch" << std::endl; throw std::runtime_error("pd_test_3_all_index_null_detection failed: isnull() size"); } .. _example-extensionindex-nunique-17: .. dropdown:: nunique (pd_test_1_all.cpp:10604) :class-title: example-dropdown .. code-block:: cpp :linenos: :lineno-start: 10594 :emphasize-lines: 11 std::cout << " -> tests passed" << std::endl; } void pd_test_extension_index_nunique() { std::cout << "========= nunique ========================="; pandas::CategoricalArray arr({"a", "b", "a", "c", "b", std::nullopt}); pandas::CategoricalIndex idx(arr); bool passed = (idx.nunique(true) == 3 && idx.nunique(false) == 4); if (!passed) { std::cout << " [FAIL] : in pd_test_extension_index_nunique() : nunique check failed" << std::endl; throw std::runtime_error("pd_test_extension_index_nunique failed"); } std::cout << " -> tests passed" << std::endl; } void pd_test_extension_index_factorize() { std::cout << "========= factorize ========================="; .. _example-extensionindex-value_counts-18: .. dropdown:: value_counts (pd_test_1_all.cpp:865) :class-title: example-dropdown .. 
code-block:: cpp :linenos: :lineno-start: 855 :emphasize-lines: 11 std::vector> values = { std::optional("a"), std::optional("b"), std::optional("a"), std::optional("a"), std::optional("b"), std::nullopt // NA not counted }; pandas::CategoricalArray arr(values); auto [cats, counts] = arr.value_counts(); // Should have 2 categories if (cats.size() != 2 || counts.size() != 2) { std::cout << " [FAIL] : in pd_test_categorical_array_value_counts() : wrong size" << std::endl; throw std::runtime_error("pd_test_categorical_array_value_counts failed: wrong size"); } // Find 'a' count int64_t a_count = 0, b_count = 0; for (size_t i = 0; i < cats.size(); ++i) { .. _example-extensionindex-equals-19: .. dropdown:: equals (pd_test_1_all.cpp:5866) :class-title: example-dropdown .. code-block:: cpp :linenos: :lineno-start: 5856 :emphasize-lines: 11 std::cout << "========= equals ======================================"; pandas::CategoricalArray arr1({"a", "b", "a"}); pandas::CategoricalArray arr2({"a", "b", "a"}); pandas::CategoricalArray arr3({"a", "b", "c"}); pandas::CategoricalIndex idx1(arr1); pandas::CategoricalIndex idx2(arr2); pandas::CategoricalIndex idx3(arr3); bool passed = (idx1.equals(idx2) && !idx1.equals(idx3)); if (!passed) { std::cout << " [FAIL] : in pd_test_categorical_index_equals()" << std::endl; throw std::runtime_error("pd_test_categorical_index_equals failed"); } std::cout << " -> tests passed" << std::endl; } void pd_test_categorical_index_identical() { std::cout << "========= identical ==================================="; .. _example-extensionindex-argsort-20: .. dropdown:: argsort (pd_test_1_all.cpp:1304) :class-title: example-dropdown .. 
code-block:: cpp :linenos: :lineno-start: 1294 :emphasize-lines: 11 std::cout << "========= DatetimeArray: sorting ======================= "; pandas::DatetimeArray arr(std::vector{ "2023-06-15", "NaT", "2023-01-01", "2023-12-31" }); // argsort ascending auto indices = arr.argsort(true, "last"); // Expected order: 2023-01-01(2), 2023-06-15(0), 2023-12-31(3), NaT(1) if (indices.getElementAt({0}) != 2) { std::cout << " [FAIL] : argsort: first should be index 2 (2023-01-01)" << std::endl; throw std::runtime_error("pd_test_datetime_array_sorting failed: argsort first"); } if (indices.getElementAt({3}) != 1) { std::cout << " [FAIL] : argsort: last should be index 1 (NaT)" << std::endl; throw std::runtime_error("pd_test_datetime_array_sorting failed: NaT position"); } .. _example-extensionindex-sort_values-21: .. dropdown:: sort_values (pd_test_1_all.cpp:6408) :class-title: example-dropdown .. code-block:: cpp :linenos: :lineno-start: 6398 :emphasize-lines: 11 void pd_test_dataframe_sorting() { std::cout << "========= sorting =========================="; std::map> data; data["A"] = {3.0, 1.0, 4.0, 1.0, 5.0}; data["B"] = {9.0, 2.0, 6.0, 5.0, 3.0}; pandas::DataFrame df(data); // Test sort_values ascending auto sorted_asc = df.sort_values("A", true); // First value should be smallest (1.0) std::string first_val = sorted_asc["A"].get_value_str(0); if (std::stod(first_val) != 1.0) { std::cout << " [FAIL] : in pd_test_dataframe_sorting() : sort_values asc first != 1" << std::endl; throw std::runtime_error("pd_test_dataframe_sorting failed: sort_values asc first != 1"); } // Test sort_values descending auto sorted_desc = df.sort_values("A", false); first_val = sorted_desc["A"].get_value_str(0); .. _example-extensionindex-append-22: .. dropdown:: append (pd_test_1_all.cpp:10650) :class-title: example-dropdown .. 

   .. code-block:: cpp
      :linenos:
      :lineno-start: 10640
      :emphasize-lines: 11

          std::cout << "========= append =========================";

          // Use same categories for both arrays (required by CategoricalArray::concat)
          std::vector<std::string> cats = {"a", "b", "c", "d"};

          pandas::CategoricalArray arr1({"a", "b"}, cats);
          pandas::CategoricalIndex idx1(arr1);
          pandas::CategoricalArray arr2({"c", "d"}, cats);
          pandas::CategoricalIndex idx2(arr2);

          auto appended = idx1.append(idx2);

          bool passed = (appended.size() == 4);
          if (!passed) {
              std::cout << " [FAIL] : in pd_test_extension_index_append() : append check failed" << std::endl;
              throw std::runtime_error("pd_test_extension_index_append failed");
          }

          std::cout << " -> tests passed" << std::endl;
      }

.. _example-extensionindex-difference-23:

.. dropdown:: difference (pd_test_1_all.cpp:10718)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 10708
      :emphasize-lines: 11

      std::cout << "========= difference =========================";

      // Use same categories for both arrays
      std::vector<std::string> cats = {"a", "b", "c", "d"};

      pandas::CategoricalArray arr1({"a", "b", "c", "d"}, cats);
      pandas::CategoricalIndex idx1(arr1);
      pandas::CategoricalArray arr2({"b", "d"}, cats);
      pandas::CategoricalIndex idx2(arr2);

      auto diff = idx1.difference(idx2);

      bool passed = (diff.size() == 2 && diff.contains("a") && diff.contains("c") &&
                     !diff.contains("b") && !diff.contains("d"));
      if (!passed) {
          std::cout << " [FAIL] : in pd_test_extension_index_difference() : difference check failed" << std::endl;
          throw std::runtime_error("pd_test_extension_index_difference failed");
      }

      std::cout << " -> tests passed" << std::endl;

.. _example-extensionindex-to_list-24:

.. dropdown:: to_list (pd_test_1_all.cpp:10247)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 10237
      :emphasize-lines: 11

          std::cout << " -> tests passed" << std::endl;
      }


      void pd_test_extension_index_to_list() {
          std::cout << "========= to_list =========================";

          pandas::CategoricalArray arr({"x", "y", "z"});
          pandas::CategoricalIndex idx(arr);

          auto list = idx.to_list();

          bool passed = (list.size() == 3 &&
                         list[0].has_value() && *list[0] == "x" &&
                         list[1].has_value() && *list[1] == "y" &&
                         list[2].has_value() && *list[2] == "z");

          if (!passed) {
              std::cout << " [FAIL] : in pd_test_extension_index_to_list() : to_list check failed" << std::endl;
              throw std::runtime_error("pd_test_extension_index_to_list failed");
          }

.. _example-extensionindex-to_string-25:

.. dropdown:: to_string (pd_test_1_all.cpp:2693)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 2683
      :emphasize-lines: 11

      pandas::PeriodArray arr_m(std::vector<std::string>{
          "2020-01",
          "NaT",
          "2025-06"
      }, "M");

      // Year
      auto years = arr_m.year();
      auto y0 = years[0];
      if (!y0.has_value() || y0.value() != 2020) {
          std::cout << " [FAIL] : year[0] should be 2020, got " << (y0.has_value() ? std::to_string(y0.value()) : "NA") << std::endl;
          throw std::runtime_error("pd_test_period_array_year_month_quarter failed: year[0]");
      }
      auto y1 = years[1];
      if (y1.has_value()) {
          std::cout << " [FAIL] : year[1] should be NA (NaT)" << std::endl;
          throw std::runtime_error("pd_test_period_array_year_month_quarter failed: year[1] should be NA");
      }
      auto y2 = years[2];

.. _example-extensionindex-to_string_vector-26:

.. dropdown:: to_string_vector (pd_test_1_all.cpp:10871)
   :class-title: example-dropdown
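
   This entry's test excerpt expects a missing label to come back as the string ``"NA"``. A minimal sketch of that conversion over ``std::optional<std::string>`` labels follows; ``to_string_vector_sketch`` is a hypothetical name, and the ``"NA"`` default is taken from the test's expectation rather than from a documented setting:

   .. code-block:: cpp

      #include <cassert>
      #include <optional>
      #include <string>
      #include <vector>

      // Hypothetical helper, not this library's API: renders each label as text,
      // substituting `na_repr` for missing entries.
      std::vector<std::string> to_string_vector_sketch(
              const std::vector<std::optional<std::string>>& labels,
              const std::string& na_repr = "NA") {
          std::vector<std::string> out;
          out.reserve(labels.size());
          for (const auto& label : labels) {
              out.push_back(label.has_value() ? *label : na_repr);
          }
          return out;
      }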

   .. code-block:: cpp
      :linenos:
      :lineno-start: 10861
      :emphasize-lines: 11

          std::cout << " -> tests passed" << std::endl;
      }


      void pd_test_extension_index_to_string_vector() {
          std::cout << "========= to_string_vector =========================";

          pandas::CategoricalArray arr({"a", std::nullopt, "c"});
          pandas::CategoricalIndex idx(arr);

          auto str_vec = idx.to_string_vector();

          bool passed = (str_vec.size() == 3 &&
                         str_vec[0] == "a" && str_vec[1] == "NA" && str_vec[2] == "c");

          if (!passed) {
              std::cout << " [FAIL] : in pd_test_extension_index_to_string_vector() : to_string_vector check failed" << std::endl;
              throw std::runtime_error("pd_test_extension_index_to_string_vector failed");
          }

          std::cout << " -> tests passed" << std::endl;
      }

.. _example-extensionindex-copy-27:

.. dropdown:: copy (pd_test_1_all.cpp:5798)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 5788
      :emphasize-lines: 11

      // ============================================================================
      // Copy/Rename Tests
      // ============================================================================

      void pd_test_categorical_index_copy() {
          std::cout << "========= copy ========================================";

          pandas::CategoricalArray arr({"a", "b", "c"});
          pandas::CategoricalIndex idx(arr, "original");

          pandas::CategoricalIndex copied = idx.copy();

          bool passed = (copied.size() == idx.size() &&
                         copied.name() == idx.name() &&
                         copied.categories() == idx.categories() &&
                         copied.ordered() == idx.ordered());

          if (!passed) {
              std::cout << " [FAIL] : in pd_test_categorical_index_copy()" << std::endl;
              throw std::runtime_error("pd_test_categorical_index_copy failed");
          }

          std::cout << " -> tests passed" << std::endl;
      }

.. _example-extensionindex-duplicated-28:

.. dropdown:: duplicated (pd_test_1_all.cpp:10583)
   :class-title: example-dropdown
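
   In this entry's test excerpt, ``duplicated("first")`` marks every occurrence of a label after its first one. The sketch below reproduces that rule for plain string labels. As an assumption borrowed from pandas' ``keep`` parameter, it also supports ``"last"`` and treats any other value as "mark every repeated label". The name ``duplicated_sketch`` and these option spellings are hypothetical:

   .. code-block:: cpp

      #include <cassert>
      #include <cstddef>
      #include <string>
      #include <unordered_map>
      #include <unordered_set>
      #include <vector>

      // Hypothetical helper, not this library's API.
      //   keep == "first": mark every occurrence after the first
      //   keep == "last":  mark every occurrence before the last
      //   anything else:   mark all occurrences of a repeated label
      std::vector<bool> duplicated_sketch(const std::vector<std::string>& labels,
                                          const std::string& keep) {
          const std::size_t n = labels.size();
          std::vector<bool> mask(n, false);
          if (keep == "first" || keep == "last") {
              std::unordered_set<std::string> seen;
              for (std::size_t k = 0; k < n; ++k) {
                  // Scan forward for "first", backward for "last".
                  const std::size_t i = (keep == "first") ? k : n - 1 - k;
                  if (!seen.insert(labels[i]).second) {
                      mask[i] = true;  // label was already seen during this scan
                  }
              }
          } else {
              std::unordered_map<std::string, std::size_t> counts;
              for (const auto& label : labels) ++counts[label];
              for (std::size_t i = 0; i < n; ++i) mask[i] = counts[labels[i]] > 1;
          }
          return mask;
      }

   The same mask also answers ``is_unique`` and ``has_duplicates``: an index has duplicates exactly when ``duplicated_sketch(labels, "first")`` contains a ``true``.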

   .. code-block:: cpp
      :linenos:
      :lineno-start: 10573
      :emphasize-lines: 11

          std::cout << " -> tests passed" << std::endl;
      }


      void pd_test_extension_index_duplicated() {
          std::cout << "========= duplicated =========================";

          pandas::CategoricalArray arr({"a", "b", "a", "c", "a"});
          pandas::CategoricalIndex idx(arr);

          auto dup_mask = idx.duplicated("first");

          bool passed = (dup_mask.getElementAt({0}) == false &&
                         dup_mask.getElementAt({1}) == false &&
                         dup_mask.getElementAt({2}) == true &&
                         dup_mask.getElementAt({3}) == false &&
                         dup_mask.getElementAt({4}) == true);

          if (!passed) {
              std::cout << " [FAIL] : in pd_test_extension_index_duplicated() : duplicated check failed" << std::endl;
              throw std::runtime_error("pd_test_extension_index_duplicated failed");
          }

.. _example-extensionindex-intersection-29:

.. dropdown:: intersection (pd_test_1_all.cpp:10672)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 10662
      :emphasize-lines: 11

          std::cout << "========= intersection =========================";

          // Use same categories for both arrays
          std::vector<std::string> cats = {"a", "b", "c", "d", "e", "f"};

          pandas::CategoricalArray arr1({"a", "b", "c", "d"}, cats);
          pandas::CategoricalIndex idx1(arr1);
          pandas::CategoricalArray arr2({"b", "c", "e", "f"}, cats);
          pandas::CategoricalIndex idx2(arr2);

          auto inter = idx1.intersection(idx2);

          bool passed = (inter.size() == 2 && inter.contains("b") && inter.contains("c"));
          if (!passed) {
              std::cout << " [FAIL] : in pd_test_extension_index_intersection() : intersection check failed" << std::endl;
              throw std::runtime_error("pd_test_extension_index_intersection failed");
          }

          std::cout << " -> tests passed" << std::endl;
      }

.. _example-extensionindex-isin-30:

.. dropdown:: isin (pd_test_1_all.cpp:5938)
   :class-title: example-dropdown
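
   The ``isin`` excerpt tests each label against a short list of values and gets back a boolean mask with one entry per label. A sketch of the same operation over plain string labels follows. ``isin_sketch`` is a hypothetical name, and it returns ``std::vector<bool>``, whereas the excerpt receives a ``numpy::NDArray``:

   .. code-block:: cpp

      #include <cassert>
      #include <string>
      #include <unordered_set>
      #include <vector>

      // Hypothetical helper, not this library's API: one boolean per label, true
      // when the label occurs in `values`. Building the hash set once keeps the
      // whole pass linear in the input sizes on average.
      std::vector<bool> isin_sketch(const std::vector<std::string>& labels,
                                    const std::vector<std::string>& values) {
          const std::unordered_set<std::string> lookup(values.begin(), values.end());
          std::vector<bool> mask;
          mask.reserve(labels.size());
          for (const auto& label : labels) {
              mask.push_back(lookup.count(label) > 0);
          }
          return mask;
      }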

   .. code-block:: cpp
      :linenos:
      :lineno-start: 5928
      :emphasize-lines: 11

          std::cout << " -> tests passed" << std::endl;
      }

      void pd_test_categorical_index_isin() {
          std::cout << "========= inherited isin ==============================";

          pandas::CategoricalArray arr({"a", "b", "c", "d"});
          pandas::CategoricalIndex idx(arr);

          std::vector<std::string> values = {"a", "c"};
          numpy::NDArray mask = idx.isin(values);

          bool passed = (mask.getSize() == 4 &&
                         mask.getElementAt({0}) == true &&   // a
                         mask.getElementAt({1}) == false &&  // b
                         mask.getElementAt({2}) == true &&   // c
                         mask.getElementAt({3}) == false);   // d

          if (!passed) {
              std::cout << " [FAIL] : in pd_test_categorical_index_isin()" << std::endl;
              throw std::runtime_error("pd_test_categorical_index_isin failed");
          }

.. _example-extensionindex-symmetric_difference-31:

.. dropdown:: symmetric_difference (pd_test_1_all.cpp:10742)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 10732
      :emphasize-lines: 11

      std::cout << "========= symmetric_difference =========================";

      // Use same categories for both arrays
      std::vector<std::string> cats = {"a", "b", "c", "d"};

      pandas::CategoricalArray arr1({"a", "b", "c"}, cats);
      pandas::CategoricalIndex idx1(arr1);
      pandas::CategoricalArray arr2({"b", "c", "d"}, cats);
      pandas::CategoricalIndex idx2(arr2);

      auto sym_diff = idx1.symmetric_difference(idx2);

      bool passed = (sym_diff.size() == 2 &&
                     sym_diff.contains("a") && sym_diff.contains("d") &&
                     !sym_diff.contains("b") && !sym_diff.contains("c"));
      if (!passed) {
          std::cout << " [FAIL] : in pd_test_extension_index_symmetric_difference() : symmetric_difference check failed" << std::endl;
          throw std::runtime_error("pd_test_extension_index_symmetric_difference failed");
      }

      std::cout << " -> tests passed" << std::endl;

.. _example-extensionindex-union_-32:

.. dropdown:: union_ (pd_test_1_all.cpp:10694)
   :class-title: example-dropdown
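
   The ``difference``, ``intersection``, ``symmetric_difference``, and ``union_`` excerpts check only which labels appear in each result and how many. The following sketch implements all four over plain string labels. Results are deduplicated and kept in order of first appearance. That ordering is a choice made here, not a documented property of this library; pandas, for example, may sort set-operation results. All function names are hypothetical:

   .. code-block:: cpp

      #include <cassert>
      #include <string>
      #include <unordered_set>
      #include <vector>

      using Labels = std::vector<std::string>;

      // Hypothetical set operations on label vectors (not this library's API).
      // Keeps the unique labels of `a` whose membership in `b` equals `keep_if_in_b`.
      static Labels filter_by_membership(const Labels& a, const Labels& b, bool keep_if_in_b) {
          const std::unordered_set<std::string> in_b(b.begin(), b.end());
          std::unordered_set<std::string> seen;
          Labels out;
          for (const auto& x : a) {
              const bool member = in_b.count(x) > 0;
              if (member == keep_if_in_b && seen.insert(x).second) out.push_back(x);
          }
          return out;
      }

      Labels intersect_labels(const Labels& a, const Labels& b) { return filter_by_membership(a, b, true); }
      Labels difference_labels(const Labels& a, const Labels& b) { return filter_by_membership(a, b, false); }

      // Unique labels of `a`, then labels that appear only in `b`.
      Labels union_labels(const Labels& a, const Labels& b) {
          Labels out = difference_labels(a, Labels{});
          const Labels extra = difference_labels(b, a);
          out.insert(out.end(), extra.begin(), extra.end());
          return out;
      }

      // Labels found in exactly one operand: (a - b) followed by (b - a).
      Labels symmetric_difference_labels(const Labels& a, const Labels& b) {
          Labels out = difference_labels(a, b);
          const Labels extra = difference_labels(b, a);
          out.insert(out.end(), extra.begin(), extra.end());
          return out;
      }

   The assertions in the four excerpts (for example, ``{a, b, c} ∪ {b, c, d, e}`` having five labels) hold for these functions regardless of the ordering choice.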

   .. code-block:: cpp
      :linenos:
      :lineno-start: 10684
      :emphasize-lines: 11

      std::cout << "========= union =========================";

      // Use same categories for both arrays
      std::vector<std::string> cats = {"a", "b", "c", "d", "e"};

      pandas::CategoricalArray arr1({"a", "b", "c"}, cats);
      pandas::CategoricalIndex idx1(arr1);
      pandas::CategoricalArray arr2({"b", "c", "d", "e"}, cats);
      pandas::CategoricalIndex idx2(arr2);

      auto uni = idx1.union_(idx2);

      bool passed = (uni.size() == 5 &&
                     uni.contains("a") && uni.contains("b") && uni.contains("c") &&
                     uni.contains("d") && uni.contains("e"));
      if (!passed) {
          std::cout << " [FAIL] : in pd_test_extension_index_union() : union check failed" << std::endl;
          throw std::runtime_error("pd_test_extension_index_union failed");
      }

      std::cout << " -> tests passed" << std::endl;

.. _example-extensionindex-unique-33:

.. dropdown:: unique (pd_test_1_all.cpp:1345)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 1335
      :emphasize-lines: 11

      pandas::DatetimeArray arr(std::vector<std::string>{
          "2023-01-01",
          "2023-06-15",
          "2023-01-01",
          "NaT",
          "2023-06-15",
          "NaT"
      });

      // unique
      auto uniq = arr.unique();
      // Should have: NaT, 2023-01-01, 2023-06-15 (3 unique values)
      if (uniq.size() != 3) {
          std::cout << " [FAIL] : unique size should be 3, got " << uniq.size() << std::endl;
          throw std::runtime_error("pd_test_datetime_array_unique failed: size");
      }

      // factorize
      auto [codes, uniques] = arr.factorize();
      // Codes for NaT should be -1
      if (codes.getElementAt({3}) != -1) {

.. _example-extensionindex-is_monotonic_decreasing-34:

.. dropdown:: is_monotonic_decreasing (pd_test_1_all.cpp:10203)
   :class-title: example-dropdown
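
   The monotonicity excerpt accepts any result because, as its comment notes, the outcome depends on how categorical values are ordered internally. For ordinary comparable labels, a non-strict check (equal neighbours allowed, as in pandas' ``is_monotonic_increasing``) is a single pass over adjacent pairs. The sketch below assumes labels without missing values; the function templates are hypothetical, not this library's API:

   .. code-block:: cpp

      #include <cassert>
      #include <cstddef>
      #include <vector>

      // Hypothetical non-strict checks: equal neighbours are allowed. Empty and
      // single-element inputs count as both increasing and decreasing.
      template <typename T>
      bool is_monotonic_increasing(const std::vector<T>& v) {
          for (std::size_t i = 1; i < v.size(); ++i)
              if (v[i] < v[i - 1]) return false;
          return true;
      }

      template <typename T>
      bool is_monotonic_decreasing(const std::vector<T>& v) {
          for (std::size_t i = 1; i < v.size(); ++i)
              if (v[i - 1] < v[i]) return false;
          return true;
      }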

   .. code-block:: cpp
      :linenos:
      :lineno-start: 10193
      :emphasize-lines: 11

      }


      void pd_test_extension_index_monotonicity() {
          std::cout << "========= monotonicity =========================";

          pandas::CategoricalArray arr1({"a", "b", "c"});
          pandas::CategoricalIndex idx1(arr1);
          // Just test that the methods work (result depends on internal ordering)
          bool inc = idx1.is_monotonic_increasing();
          bool dec = idx1.is_monotonic_decreasing();
          bool passed = (inc || dec || (!inc && !dec));  // Any result is valid

          if (!passed) {
              std::cout << " [FAIL] : in pd_test_extension_index_monotonicity() : monotonicity check failed" << std::endl;
              throw std::runtime_error("pd_test_extension_index_monotonicity failed");
          }

          std::cout << " -> tests passed" << std::endl;
      }

.. _example-extensionindex-is_monotonic_increasing-35:

.. dropdown:: is_monotonic_increasing (pd_test_1_all.cpp:10202)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 10192
      :emphasize-lines: 11

          std::cout << " -> tests passed" << std::endl;
      }


      void pd_test_extension_index_monotonicity() {
          std::cout << "========= monotonicity =========================";

          pandas::CategoricalArray arr1({"a", "b", "c"});
          pandas::CategoricalIndex idx1(arr1);
          // Just test that the methods work (result depends on internal ordering)
          bool inc = idx1.is_monotonic_increasing();
          bool dec = idx1.is_monotonic_decreasing();
          bool passed = (inc || dec || (!inc && !dec));  // Any result is valid

          if (!passed) {
              std::cout << " [FAIL] : in pd_test_extension_index_monotonicity() : monotonicity check failed" << std::endl;
              throw std::runtime_error("pd_test_extension_index_monotonicity failed");
          }

          std::cout << " -> tests passed" << std::endl;
      }

.. _example-extensionindex-is_unique-36:

.. dropdown:: is_unique (pd_test_1_all.cpp:5962)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 5952
      :emphasize-lines: 11

      void pd_test_categorical_index_is_unique() {
          std::cout << "========= inherited is_unique =========================";

          pandas::CategoricalArray arr_unique({"a", "b", "c"});
          pandas::CategoricalArray arr_dups({"a", "b", "a"});

          pandas::CategoricalIndex idx_unique(arr_unique);
          pandas::CategoricalIndex idx_dups(arr_dups);

          bool passed = (idx_unique.is_unique() &&
                         !idx_dups.is_unique());
          if (!passed) {
              std::cout << " [FAIL] : in pd_test_categorical_index_is_unique()" << std::endl;
              throw std::runtime_error("pd_test_categorical_index_is_unique failed");
          }

          std::cout << " -> tests passed" << std::endl;
      }

      void pd_test_categorical_index_hasnans() {
          std::cout << "========= inherited hasnans ===========================";

.. _example-extensionindex-array-37:

.. dropdown:: array (pd_test_1_all.cpp:7343)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 7333
      :emphasize-lines: 11

          };
          pandas::DatetimeIndex idx(values, "with_nat");

          bool passed = (idx.size() == 3);
          if (!passed) {
              std::cout << " [FAIL] : in pd_test_datetime_index_optional_vector_constructor()" << std::endl;
              throw std::runtime_error("pd_test_datetime_index_optional_vector_constructor failed");
          }

          // Check that middle element is NA
          bool has_na = idx.array().is_na(1);
          passed = passed && has_na;
          if (!passed) {
              std::cout << " [FAIL] : NA not preserved" << std::endl;
              throw std::runtime_error("pd_test_datetime_index_optional_vector_constructor failed");
          }

          std::cout << " -> tests passed" << std::endl;
      }

      void pd_test_datetime_index_copy_constructor() {

.. _example-extensionindex-clear_cache-38:

.. dropdown:: clear_cache (pd_test_1_all.cpp:19413)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 19403
      :emphasize-lines: 11

          s.mean();
          s.min();
          s.max();

          passed = s.has_cached_values() == true;
          if (!passed) {
              std::cout << " [FAIL] : in pd_test_series_cache() : cache not populated" << std::endl;
              throw std::runtime_error("pd_test_series_cache failed: cache not populated");
          }

          s.clear_cache();
          passed = s.has_cached_values() == false;
          if (!passed) {
              std::cout << " [FAIL] : in pd_test_series_cache() : cache not cleared" << std::endl;
              throw std::runtime_error("pd_test_series_cache failed: cache not cleared");
          }

          std::cout << " -> tests passed" << std::endl;
      }

      void pd_test_series_string_repr() {

.. _example-extensionindex-clone-39:

.. dropdown:: clone (pd_test_1_all.cpp:5776)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 5766
      :emphasize-lines: 11

          std::cout << " -> tests passed" << std::endl;
      }


      void pd_test_categorical_index_clone() {
          std::cout << "========= clone =======================================";

          pandas::CategoricalArray arr({"p", "q", "r"});
          pandas::CategoricalIndex idx(arr, "original");

          std::unique_ptr cloned = idx.clone();

          bool passed = (cloned != nullptr &&
                         cloned->size() == idx.size() &&
                         cloned->name() == idx.name());

          if (!passed) {
              std::cout << " [FAIL] : in pd_test_categorical_index_clone()" << std::endl;
              throw std::runtime_error("pd_test_categorical_index_clone failed");
          }

          std::cout << " -> tests passed" << std::endl;
      }

.. _example-extensionindex-contains-40:

.. dropdown:: contains (pd_test_1_all.cpp:2200)
   :class-title: example-dropdown
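
   This entry's test excerpt relies on interval closedness: with right-closed intervals ``(0, 1]``, ``(1, 2]``, and ``(2, 3]``, the point ``1.0`` lies only in the first one. The sketch below shows that membership rule for a single interval. It uses a hypothetical ``Closed`` enum covering pandas' four ``closed`` options; only ``Right`` is confirmed by ``pandas::IntervalClosed::Right`` in the excerpt:

   .. code-block:: cpp

      #include <cassert>

      // Hypothetical closedness flag; only `Right` appears in the excerpt.
      enum class Closed { Left, Right, Both, Neither };

      // True if x lies between `left` and `right`, including an endpoint only
      // when that side of the interval is closed.
      bool interval_contains(double left, double right, Closed closed, double x) {
          const bool left_closed = (closed == Closed::Left || closed == Closed::Both);
          const bool right_closed = (closed == Closed::Right || closed == Closed::Both);
          const bool left_ok = left_closed ? (left <= x) : (left < x);
          const bool right_ok = right_closed ? (x <= right) : (x < right);
          return left_ok && right_ok;
      }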

   .. code-block:: cpp
      :linenos:
      :lineno-start: 2190
      :emphasize-lines: 11

      // Test: contains method
      // ============================================================================

      void test_contains() {
          std::cout << "========= IntervalArray: contains ======================= ";
          std::vector breaks = {0.0, 1.0, 2.0, 3.0};
          // Right-closed intervals: (0, 1], (1, 2], (2, 3]
          auto arr_right = pandas::IntervalArrayFloat64::from_breaks(breaks, pandas::IntervalClosed::Right);

          // Test contains(1.0) - should be in interval 0 but not 1 (since 1 is exclusive on left of interval 1)
          auto contains_1 = arr_right.contains(1.0);
          // (0, 1] contains 1: yes, (1, 2] contains 1: no (open on left), (2, 3] contains 1: no
          if (contains_1[0].value_or(false) != true ||
              contains_1[1].value_or(true) != false ||
              contains_1[2].value_or(true) != false) {
              std::cout << "[FAIL] : in test_contains() : right-closed contains 1.0" << std::endl;
              return;
          }

          // Left-closed intervals: [0, 1), [1, 2), [2, 3)

.. _example-extensionindex-contains_str-41:

.. dropdown:: contains_str (pd_test_1_all.cpp:10889)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 10879
      :emphasize-lines: 11

          std::cout << " -> tests passed" << std::endl;
      }


      void pd_test_extension_index_contains_str_get_loc_str() {
          std::cout << "========= contains_str/get_loc_str =========================";

          pandas::CategoricalArray arr({"apple", "banana", "cherry"});
          pandas::CategoricalIndex idx(arr);

          bool passed = (idx.contains_str("apple") && !idx.contains_str("grape") &&
                         idx.get_loc_str("banana") == 1 && idx.get_loc_str("grape") == -1);
          if (!passed) {
              std::cout << " [FAIL] : in pd_test_extension_index_contains_str_get_loc_str() : contains_str/get_loc_str check failed" << std::endl;
              throw std::runtime_error("pd_test_extension_index_contains_str_get_loc_str failed");
          }

          std::cout << " -> tests passed" << std::endl;
      }


      void pd_test_extension_index_repr() {

.. _example-extensionindex-delete_-42:

.. _example-extensionindex-delete_-43:

.. dropdown:: delete_ (pd_test_1_all.cpp:10501)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 10491
      :emphasize-lines: 11

          std::cout << " -> tests passed" << std::endl;
      }


      void pd_test_extension_index_delete() {
          std::cout << "========= delete_ =========================";

          pandas::CategoricalArray arr({"a", "b", "c", "d"});
          pandas::CategoricalIndex idx(arr);

          auto deleted = idx.delete_(1);
          auto v0 = deleted[0];
          auto v1 = deleted[1];
          auto v2 = deleted[2];

          bool passed = (deleted.size() == 3 &&
                         v0.has_value() && *v0 == "a" &&
                         v1.has_value() && *v1 == "c" &&
                         v2.has_value() && *v2 == "d");

          if (!passed) {
              std::cout << " [FAIL] : in pd_test_extension_index_delete() : delete_ check failed" << std::endl;

.. _example-extensionindex-dtype_name-44:

.. dropdown:: dtype_name (pd_test_1_all.cpp:10104)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 10094
      :emphasize-lines: 11

      }


      void pd_test_extension_index_array_constructor() {
          std::cout << "========= array constructor =========================";

          pandas::CategoricalArray arr({"apple", "banana", "apple", "cherry"});
          pandas::CategoricalIndex idx(arr, "fruits");

          bool passed = (idx.size() == 4 && !idx.empty() && idx.name().has_value() &&
                         *idx.name() == "fruits" && idx.dtype_name() == "category");

          if (!passed) {
              std::cout << " [FAIL] : in pd_test_extension_index_array_constructor() : array constructor check failed" << std::endl;
              throw std::runtime_error("pd_test_extension_index_array_constructor failed");
          }

          std::cout << " -> tests passed" << std::endl;
      }


      void pd_test_extension_index_copy_constructor() {
          std::cout << "========= copy constructor =========================";

.. _example-extensionindex-empty-45:

.. dropdown:: empty (pd_test_1_all.cpp:941)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 931
      :emphasize-lines: 11

      #include "../pandas/pd_config.h"

      namespace dataframe_tests {
      namespace dataframe_tests_config {

      void pd_test_config_version() {
          std::cout << "========= df_config: version info ======================= ";

          const char* version = pandas::DataFrameInfo::version();

          if (version == nullptr || std::string(version).empty()) {
              std::cout << "[FAIL] : in pd_test_config_version() : version is null or empty" << std::endl;
              throw std::runtime_error("pd_test_config_version failed: version is null or empty");
          }
          std::cout << "-> tests passed" << std::endl;
      }

      void pd_test_config_na_repr() {
          std::cout << "========= df_config: NA representation ======================= ";
          const char* na_repr = pandas::DataFrameConfig::get_na_repr();
          if (na_repr == nullptr) {

.. _example-extensionindex-factorize-46:

.. dropdown:: factorize (pd_test_1_all.cpp:1353)
   :class-title: example-dropdown
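
   This entry's test excerpt expects ``NaT`` to receive code ``-1`` and equal timestamps to share a code. The sketch below shows that encoding under two assumptions: values are ``std::optional<T>`` with ``std::nullopt`` as the missing marker, and uniques are listed in order of first appearance (pandas' default when ``sort=False``). ``factorize_sketch`` is a hypothetical name. Note that the neighbouring ``unique`` excerpt counts ``NaT`` among its three distinct values, whereas the uniques returned here exclude missing entries:

   .. code-block:: cpp

      #include <cassert>
      #include <cstdint>
      #include <map>
      #include <optional>
      #include <string>
      #include <utility>
      #include <vector>

      // Hypothetical helper, not this library's API. Returns (codes, uniques):
      // codes[i] indexes into uniques, or is -1 when values[i] is missing.
      template <typename T>
      std::pair<std::vector<std::int64_t>, std::vector<T>>
      factorize_sketch(const std::vector<std::optional<T>>& values) {
          std::map<T, std::int64_t> code_of;  // value -> assigned code
          std::vector<std::int64_t> codes;
          std::vector<T> uniques;
          codes.reserve(values.size());
          for (const auto& v : values) {
              if (!v.has_value()) {
                  codes.push_back(-1);  // missing values get the sentinel code
                  continue;
              }
              auto [it, inserted] = code_of.emplace(*v, static_cast<std::int64_t>(uniques.size()));
              if (inserted) uniques.push_back(*v);  // first occurrence defines the code
              codes.push_back(it->second);
          }
          return {codes, uniques};
      }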

   .. code-block:: cpp
      :linenos:
      :lineno-start: 1343
      :emphasize-lines: 11

      // unique
      auto uniq = arr.unique();
      // Should have: NaT, 2023-01-01, 2023-06-15 (3 unique values)
      if (uniq.size() != 3) {
          std::cout << " [FAIL] : unique size should be 3, got "
                    << uniq.size() << std::endl;
          throw std::runtime_error("pd_test_datetime_array_unique failed: size");
      }

      // factorize
      auto [codes, uniques] = arr.factorize();
      // Codes for NaT should be -1
      if (codes.getElementAt({3}) != -1) {
          std::cout << " [FAIL] : factorize: NaT code should be -1" << std::endl;
          throw std::runtime_error("pd_test_datetime_array_unique failed: NaT code");
      }

      // Same values should have same codes
      if (codes.getElementAt({0}) != codes.getElementAt({2})) {
          std::cout << " [FAIL] : factorize: 2023-01-01 values should have same code" << std::endl;
          throw std::runtime_error("pd_test_datetime_array_unique failed: same code");
      }

.. _example-extensionindex-has_cached_values-47:

.. dropdown:: has_cached_values (pd_test_1_all.cpp:19395)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 19385
      :emphasize-lines: 11

          }

          std::cout << " -> tests passed" << std::endl;
      }

      void pd_test_series_cache() {
          std::cout << "========= cache management =========================================";

          pandas::Series s({1.0, 2.0, 3.0, 4.0, 5.0});

          bool passed = s.has_cached_values() == false;
          if (!passed) {
              std::cout << " [FAIL] : in pd_test_series_cache() : initial cache not empty" << std::endl;
              throw std::runtime_error("pd_test_series_cache failed: initial cache not empty");
          }

          // Trigger cache
          s.sum();
          s.mean();
          s.min();
          s.max();

.. _example-extensionindex-has_duplicates-48:

.. dropdown:: has_duplicates (pd_test_1_all.cpp:10176)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 10166
      :emphasize-lines: 11

          std::cout << " -> tests passed" << std::endl;
      }


      void pd_test_extension_index_uniqueness() {
          std::cout << "========= uniqueness =========================";

          // Unique values
          pandas::CategoricalArray arr1({"a", "b", "c"});
          pandas::CategoricalIndex idx1(arr1);
          bool passed1 = (idx1.is_unique() && !idx1.has_duplicates());
          if (!passed1) {
              std::cout << " [FAIL] : in pd_test_extension_index_uniqueness() : unique check failed" << std::endl;
              throw std::runtime_error("pd_test_extension_index_uniqueness failed");
          }

          // With duplicates
          pandas::CategoricalArray arr2({"a", "b", "a", "c"});
          pandas::CategoricalIndex idx2(arr2);
          bool passed2 = (!idx2.is_unique() && idx2.has_duplicates());

.. _example-extensionindex-hasnans-49:

.. dropdown:: hasnans (pd_test_1_all.cpp:5363)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 5353
      :emphasize-lines: 11

      void pd_test_categorical_index_from_codes() {
          std::cout << "========= from_codes =================================";

          std::vector codes = {0, 1, 0, 2, -1};  // -1 = NA
          std::vector<std::string> categories = {"low", "medium", "high"};

          pandas::CategoricalIndex idx = pandas::CategoricalIndex::from_codes(codes, categories, true, "level");

          bool passed = (idx.size() == 5 && idx.num_categories() == 3 &&
                         idx.ordered() && idx.name().has_value() &&
                         *idx.name() == "level" && idx.hasnans());  // has NA from code -1

          if (!passed) {
              std::cout << " [FAIL] : in pd_test_categorical_index_from_codes()" << std::endl;
              throw std::runtime_error("pd_test_categorical_index_from_codes failed");
          }

          std::cout << " -> tests passed" << std::endl;
      }

      void pd_test_categorical_index_simple_new() {
          std::cout << "========= _simple_new =================================";

.. _example-extensionindex-identical-50:

.. dropdown:: identical (pd_test_1_all.cpp:5883)
   :class-title: example-dropdown
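
   The ``equals`` and ``identical`` excerpts differ in whether the index name takes part in the comparison. In the ``identical`` excerpt, ``idx1`` and ``idx3`` wrap the same array but carry different names, so ``identical`` returns false even though the values match. A sketch of that distinction using a hypothetical ``LabeledIndex`` struct follows; a real implementation may also compare dtype or categories, which this sketch ignores:

   .. code-block:: cpp

      #include <cassert>
      #include <optional>
      #include <string>
      #include <vector>

      // Hypothetical index: labels plus an optional name (not this library's class).
      struct LabeledIndex {
          std::vector<std::string> labels;
          std::optional<std::string> name;
      };

      // equals: same labels in the same order; the name is ignored.
      bool equals(const LabeledIndex& a, const LabeledIndex& b) {
          return a.labels == b.labels;
      }

      // identical: equals() plus matching names, a stricter check.
      bool identical(const LabeledIndex& a, const LabeledIndex& b) {
          return equals(a, b) && a.name == b.name;
      }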

   .. code-block:: cpp
      :linenos:
      :lineno-start: 5873
      :emphasize-lines: 11

      }

      void pd_test_categorical_index_identical() {
          std::cout << "========= identical ===================================";

          pandas::CategoricalArray arr({"a", "b"});
          pandas::CategoricalIndex idx1(arr, "same_name");
          pandas::CategoricalIndex idx2(arr, "same_name");
          pandas::CategoricalIndex idx3(arr, "diff_name");

          bool passed = (idx1.identical(idx2) && !idx1.identical(idx3));
          if (!passed) {
              std::cout << " [FAIL] : in pd_test_categorical_index_identical()" << std::endl;
              throw std::runtime_error("pd_test_categorical_index_identical failed");
          }

          std::cout << " -> tests passed" << std::endl;
      }

      // ============================================================================
      // Inherited Operations Tests

.. _example-extensionindex-inferred_type-51:

.. dropdown:: inferred_type (pd_test_1_all.cpp:5270)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 5260
      :emphasize-lines: 11

      }

      void pd_test_categorical_index_array_constructor() {
          std::cout << "========= array constructor ===========================";

          pandas::CategoricalArray arr({"apple", "banana", "apple", "cherry"});
          pandas::CategoricalIndex idx(arr, "fruits");

          bool passed = (idx.size() == 4 && !idx.empty() &&
                         idx.name().has_value() && *idx.name() == "fruits" &&
                         idx.inferred_type() == "categorical");

          if (!passed) {
              std::cout << " [FAIL] : in pd_test_categorical_index_array_constructor()" << std::endl;
              throw std::runtime_error("pd_test_categorical_index_array_constructor failed");
          }

          std::cout << " -> tests passed" << std::endl;
      }

      void pd_test_categorical_index_values_constructor() {
          std::cout << "========= values constructor ==========================";

.. _example-extensionindex-name-52:

.. dropdown:: name (pd_test_1_all.cpp:295)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 285
      :emphasize-lines: 11

              throw std::runtime_error("pd_test_boolean_array_reductions failed: mean");
          }

          std::cout << " -> tests passed" << std::endl;
      }

      void pd_test_boolean_array_dtype() {
          std::cout << "========= BooleanArray: dtype ======================= ";

          pandas::BooleanArray arr;
          if (arr.dtype().name() != "boolean") {
              std::cout << " [FAIL] : in pd_test_boolean_array_dtype() : dtype name should be 'boolean'" << std::endl;
              throw std::runtime_error("pd_test_boolean_array_dtype failed: dtype name");
          }

          if (arr.dtype().kind() != "b") {
              std::cout << " [FAIL] : in pd_test_boolean_array_dtype() : dtype kind should be 'b'" << std::endl;
              throw std::runtime_error("pd_test_boolean_array_dtype failed: dtype kind");
          }

          std::cout << " -> tests passed" << std::endl;

.. _example-extensionindex-nbytes-53:

.. dropdown:: nbytes (pd_test_1_all.cpp:6214)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 6204
      :emphasize-lines: 11

      }

      // Test empty DataFrame
      pandas::DataFrame empty_df;
      if (!empty_df.empty()) {
          std::cout << " [FAIL] : in pd_test_dataframe_properties() : should be empty" << std::endl;
          throw std::runtime_error("pd_test_dataframe_properties failed: should be empty");
      }

      // Test nbytes > 0 for non-empty
      if (df.nbytes() == 0) {
          std::cout << " [FAIL] : in pd_test_dataframe_properties() : nbytes should be > 0" << std::endl;
          throw std::runtime_error("pd_test_dataframe_properties failed: nbytes should be > 0");
      }

      // Test columns index
      if (df.columns().size() != 3) {
          std::cout << " [FAIL] : in pd_test_dataframe_properties() : columns size != 3" << std::endl;
          throw std::runtime_error("pd_test_dataframe_properties failed: columns size != 3");
      }

.. _example-extensionindex-repr-54:

.. dropdown:: repr (pd_test_1_all.cpp:10906)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 10896
      :emphasize-lines: 11

          std::cout << " -> tests passed" << std::endl;
      }


      void pd_test_extension_index_repr() {
          std::cout << "========= repr =========================";

          pandas::CategoricalArray arr({"a", "b", "c"});
          // Use ExtensionIndex directly to test base class repr
          pandas::ExtensionIndex idx(arr, "test");
          std::string repr_str = idx.repr();

          bool passed = (!repr_str.empty() && repr_str.find("ExtensionIndex") != std::string::npos);

          if (!passed) {
              std::cout << " [FAIL] : in pd_test_extension_index_repr() : repr check failed" << std::endl;
              throw std::runtime_error("pd_test_extension_index_repr failed");
          }

          std::cout << " -> tests passed" << std::endl;
      }

.. _example-extensionindex-result-55:

.. dropdown:: result (pd_test_1_all.cpp:15406)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 15396
      :emphasize-lines: 11

      data.setElementAt({0}, numpy::datetime64(100LL, numpy::DateTimeUnit::Nanosecond));
      data.setElementAt({1}, numpy::datetime64(200LL, numpy::DateTimeUnit::Nanosecond));

      numpy::NDArray mask(std::vector{2});
      mask.setElementAt({0}, numpy::bool_(false));
      mask.setElementAt({1}, numpy::bool_(false));
      pandas::DatetimeArray arr(data, mask);
      pandas::DatetimeIndexBase idx(arr, "original");

      // Create join result (int64 values)
      numpy::NDArray join_result(std::vector{3});
      join_result.setElementAt({0}, numpy::int64(500LL));
      join_result.setElementAt({1}, numpy::int64(600LL));
      join_result.setElementAt({2}, numpy::int64(700LL));

      auto new_idx = idx._from_join_target(join_result);
      bool passed = (new_idx.size() == 3 && new_idx.name().has_value() &&
                     *new_idx.name() == "original");
      if (!passed) {

.. _example-extensionindex-set_name-56:

.. dropdown:: set_name (pd_test_1_all.cpp:11798)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 11788
      :emphasize-lines: 11

              throw std::runtime_error("pd_test_index_vector_constructor failed");
          }

          std::cout << " -> tests passed" << std::endl;
      }

      void pd_test_index_copy_constructor() {
          std::cout << "========= copy constructor ============================";

          pandas::Index idx1{1, 2, 3};
          idx1.set_name("original");
          pandas::Index idx2(idx1);

          bool passed = (idx2.size() == 3);
          passed = passed && (idx2.name().value() == "original");
          passed = passed && idx2.equals(idx1);

          if (!passed) {
              std::cout << " [FAIL] : in pd_test_index_copy_constructor() : copy failed" << std::endl;
              throw std::runtime_error("pd_test_index_copy_constructor failed");

.. _example-extensionindex-size-57:

.. dropdown:: size (pd_test_1_all.cpp:22)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 12
      :emphasize-lines: 11

      #include "../pandas/pd_boolean_array.h"

      namespace dataframe_tests {
      namespace dataframe_tests_boolean_array {

      void pd_test_boolean_array_constructors() {
          std::cout << "========= BooleanArray: constructors ======================= ";

          // Default constructor
          pandas::BooleanArray arr1;
          if (arr1.size() != 0) {
              std::cout << " [FAIL] : in pd_test_boolean_array_constructors() : default constructor size != 0" << std::endl;
              throw std::runtime_error("pd_test_boolean_array_constructors failed: default constructor size != 0");
          }

          // Initializer list constructor
          pandas::BooleanArray arr2({
              std::optional(true),
              std::optional(false),
              std::nullopt,
              std::optional(true)

.. _example-extensionindex-type_id-58:

.. dropdown:: type_id (pd_test_3_all.cpp:25592)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 25582
      :emphasize-lines: 11

      // ------------------- pd_test_value_classify (end) ------------------

      // ------------------- pd_test_index_type_id (start) ------------------
      namespace dataframe_tests_index_type_id {

      void pd_test_index_type_id_dispatch() {
          std::cout << "========= IndexTypeId dispatch =======================";

          // RangeIndex
          ::pandas::RangeIndex ri(0, 5);
          if (ri.type_id() != ::pandas::IndexTypeId::RangeIndex)
              throw std::runtime_error("RangeIndex type_id failed");

          // Index
          ::pandas::Index si(std::vector<std::string>{"a", "b", "c"});
          if (si.type_id() != ::pandas::IndexTypeId::IndexString)
              throw std::runtime_error("Index type_id failed");

          // Index
          ::pandas::Index ii(std::vector<int64_t>{1, 2, 3});
          if (ii.type_id() != ::pandas::IndexTypeId::IndexInt64)

.. _example-extensionindex-values-59:

.. dropdown:: values (pd_test_1_all.cpp:364)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 354
      :emphasize-lines: 11

      pandas::CategoricalArray arr1;
      if (arr1.size() != 0) {
          std::cout << " [FAIL] : in pd_test_categorical_array_constructors() : default constructor size != 0" << std::endl;
          throw std::runtime_error("pd_test_categorical_array_constructors failed: default constructor size != 0");
      }
      if (arr1.ordered()) {
          std::cout << " [FAIL] : in pd_test_categorical_array_constructors() : default should be unordered" << std::endl;
          throw std::runtime_error("pd_test_categorical_array_constructors failed: default should be unordered");
      }
      // Constructor from values (infer categories)
      std::vector<std::optional<std::string>> values = {
          std::optional<std::string>("a"),
          std::optional<std::string>("b"),
          std::optional<std::string>("a"),
          std::optional<std::string>("c")
      };
      pandas::CategoricalArray arr2(values);
      if (arr2.size() != 4) {
          std::cout << " [FAIL] : in pd_test_categorical_array_constructors() : values constructor size != 4" << std::endl;
          throw std::runtime_error("pd_test_categorical_array_constructors failed: values constructor size != 4");