StringMethods
=============

.. cpp:class:: pandas::StringMethods

   pandas C++ class.

Example
-------

.. code-block:: cpp

   #include
   using namespace pandas;

   // Use StringMethods
   StringMethods obj;
   // ... operations ...

Indexing / Selection
--------------------

.. list-table::
   :widths: 40 20 15 25
   :header-rows: 1

   * - Signature
     - Return Type
     - Location
     - Example
   * - ``std::vector get(int64_t pos) const``
     - std::vector
     - pd_string_accessor.h:730
     - :ref:`View <example-stringmethods-get-0>`
   * - ``get_dummies(const std::string& sep = "|") const``
     -
     - pd_string_accessor.h:1303
     - :ref:`View <example-stringmethods-get_dummies-1>`
   * - ``DummiesResult get_dummies_as_multiindex(const std::string& sep = "|") const``
     - DummiesResult
     - pd_string_accessor.h:1298
     - :ref:`View <example-stringmethods-get_dummies_as_multiindex-2>`

Data Manipulation
-----------------

.. list-table::
   :widths: 40 20 15 25
   :header-rows: 1

   * - Signature
     - Return Type
     - Location
     - Example
   * - ``std::vector replace(const std::string& pat, const std::string& repl, bool regex_mode = true, bool case_sensitive = true, int flags = 0) const``
     - std::vector
     - pd_string_accessor.h:381
     - :ref:`View <example-stringmethods-replace-3>`

Missing Data
------------

.. list-table::
   :widths: 40 20 15 25
   :header-rows: 1

   * - Signature
     - Return Type
     - Location
     - Example
   * - ``std::vector pad(size_t width, const std::string& side = "left", char fillchar = ' ') const``
     - std::vector
     - pd_string_accessor.h:690
     - :ref:`View <example-stringmethods-pad-4>`

Statistics
----------

.. list-table::
   :widths: 40 20 15 25
   :header-rows: 1

   * - Signature
     - Return Type
     - Location
     - Example
   * - ``std::vector count(const std::string& pat) const``
     - std::vector
     - pd_string_accessor.h:464
     - :ref:`View <example-stringmethods-count-5>`
   * - ``CountResult count_with_nan(const std::string& pat) const``
     - CountResult
     - pd_string_accessor.h:483
     - :ref:`View <example-stringmethods-count_with_nan-6>`

Comparison
----------

.. list-table::
   :widths: 40 20 15 25
   :header-rows: 1

   * - Signature
     - Return Type
     - Location
     - Example
   * - ``std::vector len() const``
     - std::vector
     - pd_string_accessor.h:265
     - :ref:`View <example-stringmethods-len-7>`

Combining
---------

.. list-table::
   :widths: 40 20 15 25
   :header-rows: 1

   * - Signature
     - Return Type
     - Location
     - Example
   * - ``std::vector join(const std::string& sep) const``
     - std::vector
     - pd_string_accessor.h:1106
     - :ref:`View <example-stringmethods-join-8>`

Iteration
---------

.. list-table::
   :widths: 40 20 15 25
   :header-rows: 1

   * - Signature
     - Return Type
     - Location
     - Example
   * - ``std::vector endswith(const std::string& pat) const``
     - std::vector
     - pd_string_accessor.h:366
     - :ref:`View <example-stringmethods-endswith-9>`

Other Methods
-------------

.. list-table::
   :widths: 40 20 15 25
   :header-rows: 1

   * - Signature
     - Return Type
     - Location
     - Example
   * - ``static DataFrame build_string_dataframe(const std::vector>& columns_data, const std::vector& column_names)``
     - static DataFrame
     - pd_string_accessor.h:150
     -
   * - ``std::vector capitalize() const``
     - std::vector
     - pd_string_accessor.h:198
     - :ref:`View <example-stringmethods-capitalize-10>`
   * - ``std::vector casefold() const``
     - std::vector
     - pd_string_accessor.h:751
     - :ref:`View <example-stringmethods-casefold-11>`
   * - ``std::string cat(const std::string& sep = "", const std::string& na_rep = "") const``
     - std::string
     - pd_string_accessor.h:911
     - :ref:`View <example-stringmethods-cat-12>`
   * - ``std::vector center(size_t width, char fillchar = ' ') const``
     - std::vector
     - pd_string_accessor.h:717
     - :ref:`View <example-stringmethods-center-13>`
   * - ``std::vector contains(const std::string& pat, bool case_sensitive = true, bool regex_mode = true) const``
     - std::vector
     - pd_string_accessor.h:322
     - :ref:`View <example-stringmethods-contains-14>`
   * - ``std::vector decode(const std::string& /*encoding*/) const``
     - std::vector
     - pd_string_accessor.h:1194
     - :ref:`View <example-stringmethods-decode-15>`
   * - ``std::vector encode(const std::string& /*encoding*/) const``
     - std::vector
     - pd_string_accessor.h:1184
     - :ref:`View <example-stringmethods-encode-16>`
   * - ``std::vector> extract(const std::string& pat) const``
     - std::vector>
     - pd_string_accessor.h:1031
     - :ref:`View <example-stringmethods-extract-17>`
   * - ``std::vector>> extractall(const std::string& pat) const``
     - std::vector>>
     - pd_string_accessor.h:1078
     - :ref:`View <example-stringmethods-extractall-18>`
   * - ``ExtractAllResult extractall_with_index(const std::string& pat) const``
     - ExtractAllResult
     - pd_string_accessor.h:1074
     - :ref:`View <example-stringmethods-extractall_with_index-19>`
   * - ``std::vector find(const std::string& sub, int64_t start = 0, int64_t end = -1) const``
     - std::vector
     - pd_string_accessor.h:445
     - :ref:`View <example-stringmethods-find-20>`
   * - ``std::vector>> findall(const std::string& pat) const``
     - std::vector>>
     - pd_string_accessor.h:1002
     - :ref:`View <example-stringmethods-findall-21>`
   * - ``std::vector fullmatch(const std::string& pat, bool case_sensitive = true) const``
     - std::vector
     - pd_string_accessor.h:982
     - :ref:`View <example-stringmethods-fullmatch-22>`
   * - ``std::vector index(const std::string& sub, int64_t start = 0, int64_t end = -1) const``
     - std::vector
     - pd_string_accessor.h:1236
     - :ref:`View <example-stringmethods-index-23>`
   * - ``std::vector isalnum() const``
     - std::vector
     - pd_string_accessor.h:593
     - :ref:`View <example-stringmethods-isalnum-24>`
   * - ``std::vector isalpha() const``
     - std::vector
     - pd_string_accessor.h:559
     - :ref:`View <example-stringmethods-isalpha-25>`
   * - ``std::vector isdecimal() const``
     - std::vector
     - pd_string_accessor.h:827
     - :ref:`View <example-stringmethods-isdecimal-26>`
   * - ``std::vector isdigit() const``
     - std::vector
     - pd_string_accessor.h:576
     - :ref:`View `
   * - ``BoolWithNan isdigit_with_nan() const``
     - BoolWithNan
     - pd_string_accessor.h:511
     - :ref:`View `
   * - ``std::vector islower() const``
     - std::vector
     - pd_string_accessor.h:627
     -
   * - ``std::vector isnumeric() const``
     - std::vector
     - pd_string_accessor.h:822
     - :ref:`View `
   * - ``std::vector isspace() const``
     - std::vector
     - pd_string_accessor.h:610
     -
   * - ``std::vector istitle() const``
     - std::vector
     - pd_string_accessor.h:790
     - :ref:`View `
   * - ``std::vector isupper() const``
     - std::vector
     - pd_string_accessor.h:648
     -
   * - ``std::vector ljust(size_t width, char fillchar = ' ') const``
     - std::vector
     - pd_string_accessor.h:721
     - :ref:`View `
   * - ``std::vector lower() const``
     - std::vector
     - pd_string_accessor.h:166
     - :ref:`View `
   * - ``std::vector lstrip(const std::string& chars = " \t\n\r") const``
     - std::vector
     - pd_string_accessor.h:291
     - :ref:`View `
   * - ``std::vector match(const std::string& pat, bool case_sensitive = true) const``
     - std::vector
     - pd_string_accessor.h:926
     - :ref:`View `
   * - ``std::vector> match_with_na(const std::string& pat, bool case_sensitive = true, std::optional na_value = std::nullopt) const``
     - std::vector>
     - pd_string_accessor.h:948
     -
   * - ``std::vector normalize(const std::string& /*form*/) const``
     - std::vector
     - pd_string_accessor.h:1174
     - :ref:`View `
   * - ``const ParentType& parent() const``
     - const ParentType&
     - pd_string_accessor.h:158
     -
   * - ``explicit StringMethods(const ParentType& parent) : parent_(parent)``
     - explicit StringMethods(const ParentType& parent) :
     - pd_string_accessor.h:155
     -
   * - ``std::optional parent_name() const``
     - std::optional
     - pd_string_accessor.h:161
     -
   * - ``parse_named_groups(const std::string& pat)``
     -
     - pd_string_accessor.h:81
     -
   * - ``std::vector> partition(const std::string& sep) const``
     - std::vector>
     - pd_string_accessor.h:1204
     - :ref:`View `
   * - ``std::regex re(pat, case_sensitive ? std::regex::ECMAScript : std::regex::icase)``
     - std::regex
     - pd_string_accessor.h:329
     -
   * - ``std::regex re(pat, rx_flags)``
     - std::regex
     - pd_string_accessor.h:395
     -
   * - ``std::regex re(pat)``
     - std::regex
     - pd_string_accessor.h:468
     -
   * - ``std::regex re(pat, flags)``
     - std::regex
     - pd_string_accessor.h:932
     -
   * - ``std::regex re(pat, flags)``
     - std::regex
     - pd_string_accessor.h:957
     -
   * - ``std::regex re(pat, flags)``
     - std::regex
     - pd_string_accessor.h:988
     -
   * - ``std::regex re(pat)``
     - std::regex
     - pd_string_accessor.h:1007
     -
   * - ``std::regex re(pat)``
     - std::regex
     - pd_string_accessor.h:1035
     -
   * - ``std::regex re(pat)``
     - std::regex
     - pd_string_accessor.h:1082
     -
   * - ``std::vector removeprefix(const std::string& prefix) const``
     - std::vector
     - pd_string_accessor.h:832
     - :ref:`View `
   * - ``std::vector removesuffix(const std::string& suffix) const``
     - std::vector
     - pd_string_accessor.h:847
     - :ref:`View `
   * - ``std::vector repeat(int64_t repeats) const``
     - std::vector
     - pd_string_accessor.h:775
     - :ref:`View `
   * - ``std::vector rfind(const std::string& sub, int64_t start = 0, int64_t end = -1) const``
     - std::vector
     - pd_string_accessor.h:756
     - :ref:`View `
   * - ``std::vector rindex(const std::string& sub, int64_t start = 0, int64_t end = -1) const``
     - std::vector
     - pd_string_accessor.h:1254
     - :ref:`View `
   * - ``std::vector rjust(size_t width, char fillchar = ' ') const``
     - std::vector
     - pd_string_accessor.h:725
     - :ref:`View `
   * - ``std::vector> rpartition(const std::string& sep) const``
     - std::vector>
     - pd_string_accessor.h:1220
     - :ref:`View `
   * - ``std::vector> rsplit(const std::string& pat = " ", int n = -1) const``
     - std::vector>
     - pd_string_accessor.h:863
     - :ref:`View `
   * - ``std::vector rstrip(const std::string& chars = " \t\n\r") const``
     - std::vector
     - pd_string_accessor.h:306
     - :ref:`View `
   * - ``std::vector slice(int64_t start = 0, int64_t stop = -1, int64_t step = 1) const``
     - std::vector
     - pd_string_accessor.h:670
     - :ref:`View `
   * - ``std::vector slice_replace(int64_t start = 0, int64_t stop = -1, const std::string& repl = "") const``
     - std::vector
     - pd_string_accessor.h:1272
     - :ref:`View `
   * - ``std::vector> split(const std::string& pat = " ", int n = -1) const``
     - std::vector>
     - pd_string_accessor.h:413
     - :ref:`View `
   * - ``SplitExpandResult split_expand(const std::string& pat, int n = -1) const``
     - SplitExpandResult
     - pd_string_accessor.h:535
     - :ref:`View `
   * - ``std::vector startswith(const std::string& pat) const``
     - std::vector
     - pd_string_accessor.h:352
     - :ref:`View `
   * - ``std::vector strip(const std::string& chars = " \t\n\r") const``
     - std::vector
     - pd_string_accessor.h:275
     - :ref:`View `
   * - ``std::vector swapcase() const``
     - std::vector
     - pd_string_accessor.h:243
     -
   * - ``std::vector title() const``
     - std::vector
     - pd_string_accessor.h:218
     - :ref:`View `
   * - ``std::vector translate(const std::string& from_chars, const std::string& to_chars) const``
     - std::vector
     - pd_string_accessor.h:1122
     - :ref:`View `
   * - ``std::vector upper() const``
     - std::vector
     - pd_string_accessor.h:182
     - :ref:`View `
   * - ``std::vector wrap(size_t width) const``
     - std::vector
     - pd_string_accessor.h:1143
     - :ref:`View `
   * - ``std::vector zfill(size_t width) const``
     - std::vector
     - pd_string_accessor.h:713
     - :ref:`View `

Code Examples
-------------

The following examples are extracted from the test suite. Some excerpts exercise a same-named method on a different class (for example, the ``replace`` excerpt calls ``DataFrame::replace``).

.. _example-stringmethods-get-0:

.. dropdown:: get (pd_test_1_all.cpp:10290)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 10280
      :emphasize-lines: 11

      void pd_test_extension_index_get_loc_unique() {
          std::cout << "========= get_loc (unique) =========================";

          pandas::CategoricalArray arr({"apple", "banana", "cherry"});
          pandas::CategoricalIndex idx(arr);

          auto loc_apple = idx.get_loc("apple");
          auto loc_banana = idx.get_loc("banana");
          auto loc_cherry = idx.get_loc("cherry");

          bool passed = (std::holds_alternative(loc_apple) && std::get(loc_apple) == 0 &&
                         std::get(loc_banana) == 1 &&
                         std::get(loc_cherry) == 2);

          if (!passed) {
              std::cout << " [FAIL] : in pd_test_extension_index_get_loc_unique() : get_loc check failed" << std::endl;
              throw std::runtime_error("pd_test_extension_index_get_loc_unique failed");
          }

          std::cout << " -> tests passed" << std::endl;
      }

.. _example-stringmethods-get_dummies-1:

.. dropdown:: get_dummies (pd_test_3_all.cpp:13545)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 13535
      :emphasize-lines: 11

          }

          std::cout << " -> tests passed" << std::endl;
      }

      // ============================================================================
      // Get Dummies / From Dummies Tests
      // ============================================================================

      void pd_test_top_level_get_dummies() {
          std::cout << "========= get_dummies() ===============================";

          std::vector data = {"A", "B", "A", "C", "B", "A"};
          pandas::Series s(data, "category");

          pandas::DataFrame result = pandas::get_dummies(s);

          // Should have columns for A, B, C
          if (result.ncols() != 3) {
              std::cout << " [FAIL] : in pd_test_top_level_get_dummies() : expected 3 columns" << std::endl;
              throw std::runtime_error("pd_test_top_level_get_dummies failed: wrong column count");

.. _example-stringmethods-get_dummies_as_multiindex-2:

.. dropdown:: get_dummies_as_multiindex (pd_test_5_all.cpp:123697)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 123687
      :emphasize-lines: 11

          return rows;
      }

      static std::string run_oracle_row(const OracleRow& r) {
          pandas::Series s(r.input);
          if (r.op == "extractall") {
              auto res = s.str().extractall_with_index(r.arg);
              return format_extractall(res);
          }
          if (r.op == "get_dummies") {
              auto res = s.str().get_dummies_as_multiindex(r.arg);
              return format_get_dummies(res);
          }
          throw std::runtime_error("unknown op: " + r.op);
      }

      static void run_oracle_subset(int sub_case, int begin_id, int end_id,
                                    int& local_fail) {
          std::cout << "-- case_" << (13 + sub_case) << "_oracle_rows_"
                    << begin_id << "_to_" << (end_id - 1) << "\n";
          bool ok = false;

.. _example-stringmethods-replace-3:

.. dropdown:: replace (pd_test_1_all.cpp:6623)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 6613
      :emphasize-lines: 11

              }
          }

          // Test replace
          {
              std::map> float_data;
              float_data["X"] = {1.0, 2.0, 3.0};
              float_data["Y"] = {2.0, 2.0, 4.0};
              pandas::DataFrame df_repl(float_data);

              auto replaced = df_repl.replace(2.0, 99.0);
              // Check some value was replaced (crude check via string)
              std::string val_str = replaced.col("X").get_value_str(1);
              if (val_str.find("99") == std::string::npos) {
                  std::cout << " [FAIL] : in pd_test_dataframe_manipulation() : replace didn't work" << std::endl;
                  throw std::runtime_error("pd_test_dataframe_manipulation failed: replace");
              }
          }

          // Test drop_duplicates
          {

.. _example-stringmethods-pad-4:

.. dropdown:: pad (pd_test_3_all.cpp:1771)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 1761
      :emphasize-lines: 11

          if (result_single.nrows() != 3 || result_single.ncols() != 1) {
              std::cout << " [FAIL] : in pd_test_3_all_dataframe_unstack() : single col shape mismatch" << std::endl;
              throw std::runtime_error("pd_test_3_all_dataframe_unstack failed: single col shape");
          }

          std::cout << " -> tests passed" << std::endl;
      }

      void pd_test_3_all_fbbuilder_pad() {
          std::cout << "========= FBBuilder.pad() (internal) =================";
          // Note: FBBuilder.pad() is an internal method for FlatBuffer serialization
          // It's not the pandas DataFrame.pad() method (which is ffill alias)
          // This test verifies the to_feather() serialization works, which uses FBBuilder.pad()

          std::map> data = {
              {"A", {1.0, 2.0, 3.0}},
              {"B", {4.0, 5.0, 6.0}}
          };

          pandas::DataFrame df(data);

.. _example-stringmethods-count-5:

.. dropdown:: count (pd_test_1_all.cpp:66)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 56
      :emphasize-lines: 11

          if (arr.is_na(0)) {
              std::cout << " [FAIL] : in pd_test_boolean_array_na_handling() : is_na(0) should be false" << std::endl;
              throw std::runtime_error("pd_test_boolean_array_na_handling failed: is_na(0) should be false");
          }

          if (!arr.has_na()) {
              std::cout << " [FAIL] : in pd_test_boolean_array_na_handling() : has_na() should be true" << std::endl;
              throw std::runtime_error("pd_test_boolean_array_na_handling failed: has_na() should be true");
          }

          if (arr.count() != 2) {
              std::cout << " [FAIL] : in pd_test_boolean_array_na_handling() : count() should be 2" << std::endl;
              throw std::runtime_error("pd_test_boolean_array_na_handling failed: count() should be 2");
          }

          std::cout << " -> tests passed" << std::endl;
      }

      void pd_test_boolean_array_kleene_and() {
          std::cout << "========= BooleanArray: Kleene AND ======================= ";

.. _example-stringmethods-count_with_nan-6:

.. dropdown:: count_with_nan (pd_test_3_all.cpp:28394)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 28384
      :emphasize-lines: 11

      static int sao_check(bool cond, const char* msg) {
          if (!cond) { std::cout << " FAIL: " << msg << std::endl; return 1; }
          return 0;
      }

      void pd_test_str_count_with_nan() {
          std::cout << " -- pd_test_str_count_with_nan --" << std::endl;
          int fail = 0;

          pandas::Series s({"aa", "NaN", "abab", "None"}, "x");
          auto r = s.str().count_with_nan("a");
          fail += sao_check(r.values.size() == 4, "size");
          fail += sao_check(r.has_nan, "has_nan true");
          fail += sao_check(r.is_nan[1] && r.is_nan[3], "nan positions");
          fail += sao_check(!r.is_nan[0] && !r.is_nan[2], "non-nan positions");
          fail += sao_check(r.values[0] == 2, "count aa");
          fail += sao_check(r.values[2] == 2, "count abab");
          if (fail == 0) std::cout << " OK" << std::endl;
      }

      void pd_test_str_count_no_nan() {

.. _example-stringmethods-len-7:

.. dropdown:: len (pd_test_3_all.cpp:20867)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 20857
      :emphasize-lines: 11

          auto title_result = s.str().title();
          if (title_result[0] != "Hello World" || title_result[1] != "Hello World" || title_result[2] != "Hello World") {
              std::cout << " [FAIL] : title() failed" << std::endl;
              throw std::runtime_error("pd_test_str_capitalize_title: title() failed");
          }

          std::cout << " -> tests passed" << std::endl;
      }

      // ============================================================================
      // Test str().len()
      // ============================================================================
      void pd_test_str_len() {
          std::cout << "========= Series.str().len() ============================";

          pandas::Series s({"a", "bb", "ccc", ""});

          auto lens = s.str().len();
          if (lens[0] != 1 || lens[1] != 2 || lens[2] != 3 || lens[3] != 0) {
              std::cout << " [FAIL] : len() failed" << std::endl;

.. _example-stringmethods-join-8:

.. dropdown:: join (pd_test_1_all.cpp:12353)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 12343
      :emphasize-lines: 11

          std::cout << " -> tests passed" << std::endl;
      }

      void pd_test_index_join() {
          std::cout << "========= join ========================================";

          pandas::Index idx1{1, 2, 3};
          pandas::Index idx2{2, 3, 4};

          auto [inner_joined, left_idx, right_idx] =
              idx1.join(idx2, "inner");
          bool passed = (inner_joined.size() == 2); // {2, 3}

          auto [outer_joined, ol_idx, or_idx] = idx1.join(idx2, "outer");
          passed = passed && (outer_joined.size() == 4); // {1, 2, 3, 4}

          if (!passed) {
              std::cout << " [FAIL] : in pd_test_index_join() : join failed" << std::endl;
              throw std::runtime_error("pd_test_index_join failed");
          }

.. _example-stringmethods-endswith-9:

.. dropdown:: endswith (pd_test_3_all.cpp:20933)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 20923
      :emphasize-lines: 11

          auto result = s.str().contains("an", true, false); // case_sensitive=true, regex=false
          if (result[0] != false || result[1] != true || result[2] != false) {
              std::cout << " [FAIL] : contains() failed" << std::endl;
              throw std::runtime_error("pd_test_str_contains: contains() failed");
          }

          std::cout << " -> tests passed" << std::endl;
      }

      // ============================================================================
      // Test str().startswith() and str().endswith()
      // ============================================================================
      void pd_test_str_startswith_endswith() {
          std::cout << "========= Series.str().startswith/endswith() ============";

          pandas::Series s({"hello", "world", "help"});

          auto starts_result = s.str().startswith("hel");
          if (starts_result[0] != true || starts_result[1] != false || starts_result[2] != true) {
              std::cout << " [FAIL] : startswith() failed" << std::endl;

.. _example-stringmethods-capitalize-10:

.. dropdown:: capitalize (pd_test_3_all.cpp:20843)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 20833
      :emphasize-lines: 11

          auto upper_result = s.str().upper();
          if (upper_result[0] != "HELLO" || upper_result[1] != "WORLD" || upper_result[2] != "TEST") {
              std::cout << " [FAIL] : upper() failed" << std::endl;
              throw std::runtime_error("pd_test_str_lower_upper: upper() failed");
          }

          std::cout << " -> tests passed" << std::endl;
      }

      // ============================================================================
      // Test str().capitalize() and str().title()
      // ============================================================================
      void pd_test_str_capitalize_title() {
          std::cout << "========= Series.str().capitalize/title() ===============";

          pandas::Series s({"hello world", "HELLO WORLD", "hELLO wORLD"});

          auto cap_result = s.str().capitalize();
          if (cap_result[0] != "Hello world" || cap_result[1] != "Hello world" || cap_result[2] != "Hello world") {
              std::cout << " [FAIL] : capitalize() failed" << std::endl;

.. _example-stringmethods-casefold-11:

.. dropdown:: casefold (pd_test_3_all.cpp:21059)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 21049
      :emphasize-lines: 11

          auto result = s.str().cat("-");
          if (result != "a-b-c") {
              std::cout << " [FAIL] : cat() failed, got: " << result << std::endl;
              throw std::runtime_error("pd_test_str_cat: cat() failed");
          }

          std::cout << " -> tests passed" << std::endl;
      }

      // ============================================================================
      // Test str().casefold() (plan_04a)
      // ============================================================================
      void pd_test_str_casefold() {
          std::cout << "========= Series.str().casefold() =======================";

          pandas::Series s({"FOO", "Bar", "HELLO"});

          auto result = s.str().casefold();
          if (result[0] != "foo" || result[1] != "bar" || result[2] != "hello") {
              std::cout << " [FAIL] : casefold() failed" << std::endl;
              throw std::runtime_error("pd_test_str_casefold: casefold() failed");

.. _example-stringmethods-cat-12:

.. dropdown:: cat (pd_test_3_all.cpp:16259)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 16249
      :emphasize-lines: 11

          }

          std::cout << " -> tests passed" << std::endl;
      }

      void pd_test_categorical_fillna_params() {
          std::cout << "========= CategoricalArray fillna params =============";

          // Create CategoricalArray using vector constructor with optional values
          std::vector> values = {"a", "b", std::nullopt, "a"};
          pandas::CategoricalArray cat(values);

          // Test fillna with method and limit parameters (should compile and work)
          auto result = cat.fillna("b", "", std::nullopt, true);

          bool passed = (result.size() == 4);

          if (!passed) {
              std::cout << " [FAIL] : in pd_test_categorical_fillna_params() : fillna failed" << std::endl;
              throw std::runtime_error("pd_test_categorical_fillna_params failed");
          }

.. _example-stringmethods-center-13:

.. dropdown:: center (pd_test_3_all.cpp:21005)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 20995
      :emphasize-lines: 11

          auto alnum_result = s.str().isalnum();
          if (alnum_result[0] != true || alnum_result[1] != true || alnum_result[2] != true || alnum_result[3] != false) {
              std::cout << " [FAIL] : isalnum() failed" << std::endl;
              throw std::runtime_error("pd_test_str_is_methods: isalnum() failed");
          }

          std::cout << " -> tests passed" << std::endl;
      }

      // ============================================================================
      // Test str().zfill(), str().center(), str().ljust(), str().rjust()
      // ============================================================================
      void pd_test_str_padding() {
          std::cout << "========= Series.str().zfill/center/ljust/rjust() =======";

          pandas::Series s({"1", "22", "333"});

          auto zfill_result = s.str().zfill(5);
          if (zfill_result[0] != "00001" || zfill_result[1] != "00022" || zfill_result[2] != "00333") {
              std::cout << " [FAIL] : zfill() failed" << std::endl;

.. _example-stringmethods-contains-14:

.. dropdown:: contains (pd_test_1_all.cpp:2200)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 2190
      :emphasize-lines: 11

      // Test: contains method
      // ============================================================================
      void test_contains() {
          std::cout << "========= IntervalArray: contains ======================= ";

          std::vector breaks = {0.0, 1.0, 2.0, 3.0};

          // Right-closed intervals: (0, 1], (1, 2], (2, 3]
          auto arr_right = pandas::IntervalArrayFloat64::from_breaks(breaks, pandas::IntervalClosed::Right);

          // Test contains(1.0) - should be in interval 0 but not 1 (since 1 is exclusive on left of interval 1)
          auto contains_1 = arr_right.contains(1.0);
          // (0, 1] contains 1: yes, (1, 2] contains 1: no (open on left), (2, 3] contains 1: no
          if (contains_1[0].value_or(false) != true || contains_1[1].value_or(true) != false || contains_1[2].value_or(true) != false) {
              std::cout << "[FAIL] : in test_contains() : right-closed contains 1.0" << std::endl;
              return;
          }

          // Left-closed intervals: [0, 1), [1, 2), [2, 3)

.. _example-stringmethods-decode-15:

.. dropdown:: decode (pd_test_3_all.cpp:21401)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 21391
      :emphasize-lines: 11

          pandas::Series s({"hello", "world"});
          auto result = s.str().normalize("NFC");
          if (result[0] != "hello" || result[1] != "world") {
              std::cout << " [FAIL] : normalize failed" << std::endl;
              throw std::runtime_error("pd_test_str_normalize: normalize failed");
          }
          std::cout << " -> tests passed" << std::endl;
      }

      // ============================================================================
      // Test str().encode() / decode() (plan_04c)
      // ============================================================================
      void pd_test_str_encode_decode() {
          std::cout << "========= Series.str().encode/decode() ==================";

          pandas::Series s({"hello", "world"});

          auto encoded = s.str().encode("utf-8");
          if (encoded[0] != "hello" || encoded[1] != "world") {
              std::cout << " [FAIL] : encode failed" << std::endl;
              throw std::runtime_error("pd_test_str_encode_decode: encode failed");

.. _example-stringmethods-encode-16:

.. dropdown:: encode (pd_test_3_all.cpp:21401)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 21391
      :emphasize-lines: 11

          pandas::Series s({"hello", "world"});
          auto result = s.str().normalize("NFC");
          if (result[0] != "hello" || result[1] != "world") {
              std::cout << " [FAIL] : normalize failed" << std::endl;
              throw std::runtime_error("pd_test_str_normalize: normalize failed");
          }
          std::cout << " -> tests passed" << std::endl;
      }

      // ============================================================================
      // Test str().encode() / decode() (plan_04c)
      // ============================================================================
      void pd_test_str_encode_decode() {
          std::cout << "========= Series.str().encode/decode() ==================";

          pandas::Series s({"hello", "world"});

          auto encoded = s.str().encode("utf-8");
          if (encoded[0] != "hello" || encoded[1] != "world") {
              std::cout << " [FAIL] : encode failed" << std::endl;
              throw std::runtime_error("pd_test_str_encode_decode: encode failed");

.. _example-stringmethods-extract-17:

.. dropdown:: extract (pd_test_3_all.cpp:21283)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 21273
      :emphasize-lines: 11

              throw std::runtime_error("pd_test_str_findall: findall element 1 failed");
          }
          if (result[2].value().size() != 0) {
              std::cout << " [FAIL] : findall element 2 should be empty" << std::endl;
              throw std::runtime_error("pd_test_str_findall: findall element 2 failed");
          }
          std::cout << " -> tests passed" << std::endl;
      }

      // ============================================================================
      // Test str().extract() (plan_04b)
      // ============================================================================
      void pd_test_str_extract() {
          std::cout << "========= Series.str().extract() ========================";

          pandas::Series s({"a1", "b2", "c3"});

          auto result = s.str().extract("([a-z])([0-9])");
          if (result[0].size() != 2 || result[0][0] != "a" || result[0][1] != "1") {
              std::cout << " [FAIL] : extract element 0 failed" << std::endl;
              throw std::runtime_error("pd_test_str_extract: extract element 0 failed");

.. _example-stringmethods-extractall-18:

.. dropdown:: extractall (pd_test_3_all.cpp:21310)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 21300
      :emphasize-lines: 11

          pandas::Series s2({"xyz"});
          auto result2 = s2.str().extract("([0-9])");
          if (result2[0].size() != 1 || result2[0][0] != "") {
              std::cout << " [FAIL] : extract no-match failed" << std::endl;
              throw std::runtime_error("pd_test_str_extract: extract no-match failed");
          }
          std::cout << " -> tests passed" << std::endl;
      }

      // ============================================================================
      // Test str().extractall() (plan_04b)
      // ============================================================================
      void pd_test_str_extractall() {
          std::cout << "========= Series.str().extractall() =====================";

          pandas::Series s({"a1b2", "c3", "xyz"});

          auto result = s.str().extractall("([a-z])([0-9])");
          if (result[0].size() != 2 ||
              result[0][0][0] != "a" || result[0][0][1] != "1" ||
              result[0][1][0] != "b" || result[0][1][1] != "2") {

.. _example-stringmethods-extractall_with_index-19:

.. dropdown:: extractall_with_index (pd_test_5_all.cpp:123693)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 123683
      :emphasize-lines: 11

              r.expected = cells[4];
              r.note = cells[5];
              rows.push_back(std::move(r));
          }
          return rows;
      }

      static std::string run_oracle_row(const OracleRow& r) {
          pandas::Series s(r.input);
          if (r.op == "extractall") {
              auto res = s.str().extractall_with_index(r.arg);
              return format_extractall(res);
          }
          if (r.op == "get_dummies") {
              auto res = s.str().get_dummies_as_multiindex(r.arg);
              return format_get_dummies(res);
          }
          throw std::runtime_error("unknown op: " + r.op);
      }

      static void run_oracle_subset(int sub_case, int begin_id, int end_id,

.. _example-stringmethods-find-20:

.. dropdown:: find (pd_test_1_all.cpp:5400)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 5390
      :emphasize-lines: 11

      void pd_test_categorical_index_categories_property() {
          std::cout << "========= categories property =========================";

          pandas::CategoricalArray arr({"red", "green", "blue", "red"});
          pandas::CategoricalIndex idx(arr);

          const std::vector& cats = idx.categories();

          bool passed = (cats.size() == 3 &&
                         std::find(cats.begin(), cats.end(), "red") != cats.end() &&
                         std::find(cats.begin(), cats.end(), "green") != cats.end() &&
                         std::find(cats.begin(), cats.end(), "blue") != cats.end());

          if (!passed) {
              std::cout << " [FAIL] : in pd_test_categorical_index_categories_property()" << std::endl;
              throw std::runtime_error("pd_test_categorical_index_categories_property failed");
          }

          std::cout << " -> tests passed" << std::endl;
      }

.. _example-stringmethods-findall-21:

.. dropdown:: findall (pd_test_3_all.cpp:21259)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 21249
      :emphasize-lines: 11

          auto result2 = s.str().fullmatch("foo.*");
          if (result2[0] != true || result2[1] != true || result2[2] != false || result2[3] != true) {
              std::cout << " [FAIL] : fullmatch('foo.*') failed" << std::endl;
              throw std::runtime_error("pd_test_str_fullmatch: fullmatch('foo.*') failed");
          }

          std::cout << " -> tests passed" << std::endl;
      }

      // ============================================================================
      // Test str().findall() (plan_04b)
      // ============================================================================
      void pd_test_str_findall() {
          std::cout << "========= Series.str().findall() ========================";

          pandas::Series s({"a1b2c3", "x4y5", "no digits"});

          auto result = s.str().findall("[0-9]");
          if (result[0].value().size() != 3 || result[0].value()[0] != "1" || result[0].value()[1] != "2" || result[0].value()[2] != "3") {
              std::cout << " [FAIL] : findall element 0 failed" << std::endl;
              throw std::runtime_error("pd_test_str_findall: findall element 0 failed");

.. _example-stringmethods-fullmatch-22:

.. dropdown:: fullmatch (pd_test_3_all.cpp:21237)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 21227
      :emphasize-lines: 11

          auto result2 = s.str().match("FOO", false);
          if (result2[0] != true || result2[1] != false || result2[2] != true || result2[3] != false) {
              std::cout << " [FAIL] : match('FOO', case=false) failed" << std::endl;
              throw std::runtime_error("pd_test_str_match: match case insensitive failed");
          }

          std::cout << " -> tests passed" << std::endl;
      }

      // ============================================================================
      // Test str().fullmatch() (plan_04b)
      // ============================================================================
      void pd_test_str_fullmatch() {
          std::cout << "========= Series.str().fullmatch() ======================";

          pandas::Series s({"foo", "foobar", "bar", "foo1"});

          auto result = s.str().fullmatch("foo");
          if (result[0] != true || result[1] != false || result[2] != false || result[3] != false) {
              std::cout << " [FAIL] : fullmatch('foo') failed" << std::endl;
              throw std::runtime_error("pd_test_str_fullmatch: fullmatch('foo') failed");

.. _example-stringmethods-index-23:

.. dropdown:: index (pd_test_1_all.cpp:6680)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 6670
      :emphasize-lines: 11

      void pd_test_dataframe_index_ops() {
          std::cout << "========= index operations =================";

          // Test set_axis (rows)
          {
              std::map> data;
              data["A"] = {1, 2, 3};
              pandas::DataFrame df(data);

              auto renamed = df.set_axis({"x", "y", "z"}, 0);
              std::string idx0 = renamed.index().get_value_str(0);
              if (idx0 != "x") {
                  std::cout << " [FAIL] : in pd_test_dataframe_index_ops() : set_axis first label should be 'x'" << std::endl;
                  throw std::runtime_error("pd_test_dataframe_index_ops failed: set_axis");
              }
          }

          // Test set_axis (columns)
          {
              std::map> data;
              data["A"] = {1, 2};

.. _example-stringmethods-isalnum-24:

.. dropdown:: isalnum (pd_test_3_all.cpp:20975)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 20965
      :emphasize-lines: 11

          auto result = s.str().replace("hello", "hi", false); // regex=false
          if (result[0] != "hi" || result[1] != "world" || result[2] != "hi world") {
              std::cout << " [FAIL] : replace() failed" << std::endl;
              throw std::runtime_error("pd_test_str_replace: replace() failed");
          }

          std::cout << " -> tests passed" << std::endl;
      }

      // ============================================================================
      // Test str().isalpha(), str().isdigit(), str().isalnum()
      // ============================================================================
      void pd_test_str_is_methods() {
          std::cout << "========= Series.str().isalpha/isdigit/isalnum() ========";

          pandas::Series s({"abc", "123", "abc123", ""});

          auto alpha_result = s.str().isalpha();
          if (alpha_result[0] != true || alpha_result[1] != false || alpha_result[2] != false || alpha_result[3] != false) {
              std::cout << " [FAIL] : isalpha() failed" << std::endl;

.. _example-stringmethods-isalpha-25:

.. dropdown:: isalpha (pd_test_3_all.cpp:20975)
   :class-title: example-dropdown

   .. code-block:: cpp
      :linenos:
      :lineno-start: 20965
      :emphasize-lines: 11

          auto result = s.str().replace("hello", "hi", false); // regex=false
          if (result[0] != "hi" || result[1] != "world" || result[2] != "hi world") {
              std::cout << " [FAIL] : replace() failed" << std::endl;
              throw std::runtime_error("pd_test_str_replace: replace() failed");
          }

          std::cout << " -> tests passed" << std::endl;
      }

      // ============================================================================
      // Test str().isalpha(), str().isdigit(), str().isalnum()
      // ============================================================================
      void pd_test_str_is_methods() {
          std::cout << "========= Series.str().isalpha/isdigit/isalnum() ========";

          pandas::Series s({"abc", "123", "abc123", ""});

          auto alpha_result = s.str().isalpha();
          if (alpha_result[0] != true || alpha_result[1] != false || alpha_result[2] != false || alpha_result[3] != false) {
              std::cout << " [FAIL] : isalpha() failed" << std::endl;

.. _example-stringmethods-isdecimal-26:

.. dropdown:: isdecimal (pd_test_3_all.cpp:21124)
   :class-title: example-dropdown

   .. 
code-block:: cpp :linenos: :lineno-start: 21114 :emphasize-lines: 11 pandas::Series s({"Hello World", "hello world", "HELLO", "Hello"}); auto result = s.str().istitle(); if (result[0] != true || result[1] != false || result[2] != false || result[3] != true) { std::cout << " [FAIL] : istitle() failed" << std::endl; throw std::runtime_error("pd_test_str_istitle: istitle() failed"); } std::cout << " -> tests passed" << std::endl; } // ============================================================================ // Test str().isnumeric() and str().isdecimal() (plan_04a) // ============================================================================ void pd_test_str_isnumeric_isdecimal() { std::cout << "========= Series.str().isnumeric/isdecimal() ============"; pandas::Series s({"123", "abc", "12.3", ""}); auto numeric_result = s.str().isnumeric(); if (numeric_result[0] != true || numeric_result[1] != false || numeric_result[2] != false || numeric_result[3] != false) { std::cout << " [FAIL] : isnumeric() failed" << std::endl; .. _example-stringmethods-isdigit-27: .. dropdown:: isdigit (pd_test_3_all.cpp:20975) :class-title: example-dropdown .. 
code-block:: cpp :linenos: :lineno-start: 20965 :emphasize-lines: 11 auto result = s.str().replace("hello", "hi", false); // regex=false if (result[0] != "hi" || result[1] != "world" || result[2] != "hi world") { std::cout << " [FAIL] : replace() failed" << std::endl; throw std::runtime_error("pd_test_str_replace: replace() failed"); } std::cout << " -> tests passed" << std::endl; } // ============================================================================ // Test str().isalpha(), str().isdigit(), str().isalnum() // ============================================================================ void pd_test_str_is_methods() { std::cout << "========= Series.str().isalpha/isdigit/isalnum() ========"; pandas::Series s({"abc", "123", "abc123", ""}); auto alpha_result = s.str().isalpha(); if (alpha_result[0] != true || alpha_result[1] != false || alpha_result[2] != false || alpha_result[3] != false) { std::cout << " [FAIL] : isalpha() failed" << std::endl; .. _example-stringmethods-isdigit_with_nan-28: .. dropdown:: isdigit_with_nan (pd_test_3_all.cpp:28418) :class-title: example-dropdown .. 
code-block:: cpp :linenos: :lineno-start: 28408 :emphasize-lines: 11 auto r = s.str().count_with_nan("a"); fail += sao_check(!r.has_nan, "has_nan false"); fail += sao_check(r.values[0] == 1 && r.values[1] == 0 && r.values[2] == 2, "values"); if (fail == 0) std::cout << " OK" << std::endl; } void pd_test_str_isdigit_with_nan() { std::cout << " -- pd_test_str_isdigit_with_nan --" << std::endl; int fail = 0; pandas::Series s({"123", "NaN", "abc", "None", "45"}, "x"); auto r = s.str().isdigit_with_nan(); fail += sao_check(r.values.size() == 5, "size"); fail += sao_check(r.has_nan, "has_nan"); fail += sao_check(r.values[0] == true, "123 digit"); fail += sao_check(r.is_nan[1], "NaN pos 1"); fail += sao_check(r.values[2] == false, "abc not digit"); fail += sao_check(r.is_nan[3], "None pos 3"); fail += sao_check(r.values[4] == true, "45 digit"); if (fail == 0) std::cout << " OK" << std::endl; } .. _example-stringmethods-isnumeric-29: .. dropdown:: isnumeric (pd_test_3_all.cpp:21124) :class-title: example-dropdown .. 
code-block:: cpp :linenos: :lineno-start: 21114 :emphasize-lines: 11 pandas::Series s({"Hello World", "hello world", "HELLO", "Hello"}); auto result = s.str().istitle(); if (result[0] != true || result[1] != false || result[2] != false || result[3] != true) { std::cout << " [FAIL] : istitle() failed" << std::endl; throw std::runtime_error("pd_test_str_istitle: istitle() failed"); } std::cout << " -> tests passed" << std::endl; } // ============================================================================ // Test str().isnumeric() and str().isdecimal() (plan_04a) // ============================================================================ void pd_test_str_isnumeric_isdecimal() { std::cout << "========= Series.str().isnumeric/isdecimal() ============"; pandas::Series s({"123", "abc", "12.3", ""}); auto numeric_result = s.str().isnumeric(); if (numeric_result[0] != true || numeric_result[1] != false || numeric_result[2] != false || numeric_result[3] != false) { std::cout << " [FAIL] : isnumeric() failed" << std::endl; .. _example-stringmethods-istitle-30: .. dropdown:: istitle (pd_test_3_all.cpp:21108) :class-title: example-dropdown .. 
code-block:: cpp :linenos: :lineno-start: 21098 :emphasize-lines: 11 pandas::Series s({"a", "bc", "xyz"}); auto result = s.str().repeat(3); if (result[0] != "aaa" || result[1] != "bcbcbc" || result[2] != "xyzxyzxyz") { std::cout << " [FAIL] : repeat() failed" << std::endl; throw std::runtime_error("pd_test_str_repeat_method: repeat() failed"); } std::cout << " -> tests passed" << std::endl; } // ============================================================================ // Test str().istitle() (plan_04a) // ============================================================================ void pd_test_str_istitle() { std::cout << "========= Series.str().istitle() ========================"; pandas::Series s({"Hello World", "hello world", "HELLO", "Hello"}); auto result = s.str().istitle(); if (result[0] != true || result[1] != false || result[2] != false || result[3] != true) { std::cout << " [FAIL] : istitle() failed" << std::endl; throw std::runtime_error("pd_test_str_istitle: istitle() failed"); .. _example-stringmethods-ljust-31: .. dropdown:: ljust (pd_test_3_all.cpp:21005) :class-title: example-dropdown .. 
code-block:: cpp :linenos: :lineno-start: 20995 :emphasize-lines: 11 auto alnum_result = s.str().isalnum(); if (alnum_result[0] != true || alnum_result[1] != true || alnum_result[2] != true || alnum_result[3] != false) { std::cout << " [FAIL] : isalnum() failed" << std::endl; throw std::runtime_error("pd_test_str_is_methods: isalnum() failed"); } std::cout << " -> tests passed" << std::endl; } // ============================================================================ // Test str().zfill(), str().center(), str().ljust(), str().rjust() // ============================================================================ void pd_test_str_padding() { std::cout << "========= Series.str().zfill/center/ljust/rjust() ======="; pandas::Series s({"1", "22", "333"}); auto zfill_result = s.str().zfill(5); if (zfill_result[0] != "00001" || zfill_result[1] != "00022" || zfill_result[2] != "00333") { std::cout << " [FAIL] : zfill() failed" << std::endl; .. _example-stringmethods-lower-32: .. dropdown:: lower (pd_test_3_all.cpp:20819) :class-title: example-dropdown .. code-block:: cpp :linenos: :lineno-start: 20809 :emphasize-lines: 11 #include #include "../pandas/pd_series.h" // CRITICAL: No using namespace directives namespace dataframe_tests { namespace dataframe_tests_string_accessor { // ============================================================================ // Test str().lower() and str().upper() // ============================================================================ void pd_test_str_lower_upper() { std::cout << "========= Series.str().lower/upper() ==================="; pandas::Series s({"Hello", "WORLD", "TeSt"}); auto lower_result = s.str().lower(); if (lower_result[0] != "hello" || lower_result[1] != "world" || lower_result[2] != "test") { std::cout << " [FAIL] : lower() failed" << std::endl; .. _example-stringmethods-lstrip-33: .. dropdown:: lstrip (pd_test_3_all.cpp:20885) :class-title: example-dropdown .. 
code-block:: cpp :linenos: :lineno-start: 20875 :emphasize-lines: 11 auto lens = s.str().len(); if (lens[0] != 1 || lens[1] != 2 || lens[2] != 3 || lens[3] != 0) { std::cout << " [FAIL] : len() failed" << std::endl; throw std::runtime_error("pd_test_str_len: len() failed"); } std::cout << " -> tests passed" << std::endl; } // ============================================================================ // Test str().strip(), str().lstrip(), str().rstrip() // ============================================================================ void pd_test_str_strip() { std::cout << "========= Series.str().strip() =========================="; pandas::Series s({" hello ", " world", "test "}); auto strip_result = s.str().strip(); if (strip_result[0] != "hello" || strip_result[1] != "world" || strip_result[2] != "test") { std::cout << " [FAIL] : strip() failed" << std::endl; .. _example-stringmethods-match-34: .. dropdown:: match (pd_test_2_all.cpp:1467) :class-title: example-dropdown .. code-block:: cpp :linenos: :lineno-start: 1457 :emphasize-lines: 11 void pd_test_between_time_overnight() { std::cout << "========= DataFrame between_time: overnight range ======"; // Test overnight range (e.g., 23:00 to 01:00) std::map> data = { {"A", {1.0, 2.0, 3.0, 4.0, 5.0}} }; pandas::DataFrame df(data); std::vector datetime_index = { "2018-04-09 00:30:00", // Should match (before 01:00) "2018-04-09 12:00:00", // Should NOT match "2018-04-09 22:00:00", // Should NOT match "2018-04-09 23:30:00", // Should match (after 23:00) "2018-04-10 00:00:00" // Should match (at midnight, before 01:00) }; df.set_index(std::make_unique>(datetime_index)); // Overnight range: 23:00 to 01:00 auto result = df.between_time("23:00:00", "01:00:00"); .. _example-stringmethods-normalize-35: .. dropdown:: normalize (pd_test_1_all.cpp:8723) :class-title: example-dropdown .. 
code-block:: cpp :linenos: :lineno-start: 8713 :emphasize-lines: 11 void pd_test_datetime_mixin_normalize() { std::cout << "========= normalize ==================================="; // Create datetime with time component std::vector> values = { numpy::datetime64(86400000000000LL + 3600000000000LL, numpy::DateTimeUnit::Nanosecond) // 1 day + 1 hour }; pandas::DatetimeArray arr(values); pandas::DatetimeMixinIndex idx(arr); pandas::DatetimeMixinIndex normalized = idx.normalize(); bool passed = (normalized.size() == 1); if (!passed) { std::cout << " [FAIL] : in pd_test_datetime_mixin_normalize()" << std::endl; throw std::runtime_error("pd_test_datetime_mixin_normalize failed"); } std::cout << " -> tests passed" << std::endl; } .. _example-stringmethods-partition-36: .. dropdown:: partition (pd_test_3_all.cpp:21422) :class-title: example-dropdown .. code-block:: cpp :linenos: :lineno-start: 21412 :emphasize-lines: 11 } auto decoded = s.str().decode("utf-8"); if (decoded[0] != "hello" || decoded[1] != "world") { std::cout << " [FAIL] : decode failed" << std::endl; throw std::runtime_error("pd_test_str_encode_decode: decode failed"); } std::cout << " -> tests passed" << std::endl; } // ============================================================================ // Test str().partition() / rpartition() (plan_04c) // ============================================================================ void pd_test_str_partition() { std::cout << "========= Series.str().partition/rpartition() ==========="; pandas::Series s({"hello-world", "foo-bar", "xyz"}); auto result = s.str().partition("-"); if (result[0][0] != "hello" || result[0][1] != "-" || result[0][2] != "world") { std::cout << " [FAIL] : partition element 0 failed" << std::endl; throw std::runtime_error("pd_test_str_partition: partition element 0 failed"); .. _example-stringmethods-removeprefix-37: .. dropdown:: removeprefix (pd_test_3_all.cpp:21148) :class-title: example-dropdown .. 
code-block:: cpp :linenos: :lineno-start: 21138 :emphasize-lines: 11 auto decimal_result = s.str().isdecimal(); if (decimal_result[0] != true || decimal_result[1] != false || decimal_result[2] != false || decimal_result[3] != false) { std::cout << " [FAIL] : isdecimal() failed" << std::endl; throw std::runtime_error("pd_test_str_isnumeric_isdecimal: isdecimal() failed"); } std::cout << " -> tests passed" << std::endl; } // ============================================================================ // Test str().removeprefix() and str().removesuffix() (plan_04a) // ============================================================================ void pd_test_str_removeprefix_removesuffix() { std::cout << "========= Series.str().removeprefix/removesuffix() ======"; pandas::Series s({"prefix_foo", "prefix_bar", "other"}); auto prefix_result = s.str().removeprefix("prefix_"); if (prefix_result[0] != "foo" || prefix_result[1] != "bar" || prefix_result[2] != "other") { std::cout << " [FAIL] : removeprefix() failed" << std::endl; throw std::runtime_error("pd_test_str_removeprefix_removesuffix: removeprefix() failed"); .. _example-stringmethods-removesuffix-38: .. dropdown:: removesuffix (pd_test_3_all.cpp:21148) :class-title: example-dropdown .. 
code-block:: cpp :linenos: :lineno-start: 21138 :emphasize-lines: 11 auto decimal_result = s.str().isdecimal(); if (decimal_result[0] != true || decimal_result[1] != false || decimal_result[2] != false || decimal_result[3] != false) { std::cout << " [FAIL] : isdecimal() failed" << std::endl; throw std::runtime_error("pd_test_str_isnumeric_isdecimal: isdecimal() failed"); } std::cout << " -> tests passed" << std::endl; } // ============================================================================ // Test str().removeprefix() and str().removesuffix() (plan_04a) // ============================================================================ void pd_test_str_removeprefix_removesuffix() { std::cout << "========= Series.str().removeprefix/removesuffix() ======"; pandas::Series s({"prefix_foo", "prefix_bar", "other"}); auto prefix_result = s.str().removeprefix("prefix_"); if (prefix_result[0] != "foo" || prefix_result[1] != "bar" || prefix_result[2] != "other") { std::cout << " [FAIL] : removeprefix() failed" << std::endl; throw std::runtime_error("pd_test_str_removeprefix_removesuffix: removeprefix() failed"); .. _example-stringmethods-repeat-39: .. dropdown:: repeat (pd_test_3_all.cpp:2166) :class-title: example-dropdown .. code-block:: cpp :linenos: :lineno-start: 2156 :emphasize-lines: 11 auto viewed = arr.view(); if (viewed.size() != 3 || !viewed.equals(arr)) { throw std::runtime_error("view failed"); } std::cout << " -> tests passed" << std::endl; } void pd_test_3_all_categorical_repeat() { std::cout << "========= CategoricalArray.repeat() ==================="; std::vector> values = {"a", "b"}; pandas::CategoricalArray arr(values); auto result = arr.repeat(3); if (result.size() != 6 || *result[0] != "a" || *result[2] != "a" || *result[3] != "b" || *result[5] != "b") { throw std::runtime_error("repeat scalar failed"); } .. _example-stringmethods-rfind-40: .. dropdown:: rfind (pd_test_3_all.cpp:21075) :class-title: example-dropdown .. 
code-block:: cpp :linenos: :lineno-start: 21065 :emphasize-lines: 11 pandas::Series s({"FOO", "Bar", "HELLO"}); auto result = s.str().casefold(); if (result[0] != "foo" || result[1] != "bar" || result[2] != "hello") { std::cout << " [FAIL] : casefold() failed" << std::endl; throw std::runtime_error("pd_test_str_casefold: casefold() failed"); } std::cout << " -> tests passed" << std::endl; } // ============================================================================ // Test str().rfind() (plan_04a) // ============================================================================ void pd_test_str_rfind() { std::cout << "========= Series.str().rfind() =========================="; pandas::Series s({"foobarfoo", "barfoo", "hello"}); auto result = s.str().rfind("foo"); if (result[0] != 6 || result[1] != 3 || result[2] != -1) { std::cout << " [FAIL] : rfind() failed, got: " << result[0] << ", " << result[1] << ", " << result[2] << std::endl; .. _example-stringmethods-rindex-41: .. dropdown:: rindex (pd_test_3_all.cpp:21449) :class-title: example-dropdown .. 
code-block:: cpp :linenos: :lineno-start: 21439 :emphasize-lines: 11 pandas::Series s2({"hello-world-test", "foo-bar"}); auto result2 = s2.str().rpartition("-"); if (result2[0][0] != "hello-world" || result2[0][1] != "-" || result2[0][2] != "test") { std::cout << " [FAIL] : rpartition element 0 failed" << std::endl; throw std::runtime_error("pd_test_str_partition: rpartition element 0 failed"); } std::cout << " -> tests passed" << std::endl; } // ============================================================================ // Test str().index() / rindex() (plan_04c) // ============================================================================ void pd_test_str_index_rindex() { std::cout << "========= Series.str().index/rindex() ==================="; pandas::Series s({"foobar", "barfoo"}); auto result = s.str().index("oo"); if (result[0] != 1 || result[1] != 4) { std::cout << " [FAIL] : index('oo') failed" << std::endl; throw std::runtime_error("pd_test_str_index_rindex: index('oo') failed"); .. _example-stringmethods-rjust-42: .. dropdown:: rjust (pd_test_3_all.cpp:21005) :class-title: example-dropdown .. 
code-block:: cpp :linenos: :lineno-start: 20995 :emphasize-lines: 11 auto alnum_result = s.str().isalnum(); if (alnum_result[0] != true || alnum_result[1] != true || alnum_result[2] != true || alnum_result[3] != false) { std::cout << " [FAIL] : isalnum() failed" << std::endl; throw std::runtime_error("pd_test_str_is_methods: isalnum() failed"); } std::cout << " -> tests passed" << std::endl; } // ============================================================================ // Test str().zfill(), str().center(), str().ljust(), str().rjust() // ============================================================================ void pd_test_str_padding() { std::cout << "========= Series.str().zfill/center/ljust/rjust() ======="; pandas::Series s({"1", "22", "333"}); auto zfill_result = s.str().zfill(5); if (zfill_result[0] != "00001" || zfill_result[1] != "00022" || zfill_result[2] != "00333") { std::cout << " [FAIL] : zfill() failed" << std::endl; .. _example-stringmethods-rpartition-43: .. dropdown:: rpartition (pd_test_3_all.cpp:21422) :class-title: example-dropdown .. 
code-block:: cpp :linenos: :lineno-start: 21412 :emphasize-lines: 11 } auto decoded = s.str().decode("utf-8"); if (decoded[0] != "hello" || decoded[1] != "world") { std::cout << " [FAIL] : decode failed" << std::endl; throw std::runtime_error("pd_test_str_encode_decode: decode failed"); } std::cout << " -> tests passed" << std::endl; } // ============================================================================ // Test str().partition() / rpartition() (plan_04c) // ============================================================================ void pd_test_str_partition() { std::cout << "========= Series.str().partition/rpartition() ==========="; pandas::Series s({"hello-world", "foo-bar", "xyz"}); auto result = s.str().partition("-"); if (result[0][0] != "hello" || result[0][1] != "-" || result[0][2] != "world") { std::cout << " [FAIL] : partition element 0 failed" << std::endl; throw std::runtime_error("pd_test_str_partition: partition element 0 failed"); .. _example-stringmethods-rsplit-44: .. dropdown:: rsplit (pd_test_3_all.cpp:21171) :class-title: example-dropdown .. 
code-block:: cpp :linenos: :lineno-start: 21161 :emphasize-lines: 11 pandas::Series s2({"foo_suffix", "bar_suffix", "other"}); auto suffix_result = s2.str().removesuffix("_suffix"); if (suffix_result[0] != "foo" || suffix_result[1] != "bar" || suffix_result[2] != "other") { std::cout << " [FAIL] : removesuffix() failed" << std::endl; throw std::runtime_error("pd_test_str_removeprefix_removesuffix: removesuffix() failed"); } std::cout << " -> tests passed" << std::endl; } // ============================================================================ // Test str().rsplit() (plan_04a) // ============================================================================ void pd_test_str_rsplit() { std::cout << "========= Series.str().rsplit() ========================="; pandas::Series s({"a,b,c", "x,y"}); auto result = s.str().rsplit(","); if (result[0].size() != 3 || result[0][0] != "a" || result[0][1] != "b" || result[0][2] != "c") { std::cout << " [FAIL] : rsplit() unlimited failed" << std::endl; throw std::runtime_error("pd_test_str_rsplit: rsplit() unlimited failed"); .. _example-stringmethods-rstrip-45: .. dropdown:: rstrip (pd_test_3_all.cpp:20885) :class-title: example-dropdown .. 
code-block:: cpp :linenos: :lineno-start: 20875 :emphasize-lines: 11 auto lens = s.str().len(); if (lens[0] != 1 || lens[1] != 2 || lens[2] != 3 || lens[3] != 0) { std::cout << " [FAIL] : len() failed" << std::endl; throw std::runtime_error("pd_test_str_len: len() failed"); } std::cout << " -> tests passed" << std::endl; } // ============================================================================ // Test str().strip(), str().lstrip(), str().rstrip() // ============================================================================ void pd_test_str_strip() { std::cout << "========= Series.str().strip() =========================="; pandas::Series s({" hello ", " world", "test "}); auto strip_result = s.str().strip(); if (strip_result[0] != "hello" || strip_result[1] != "world" || strip_result[2] != "test") { std::cout << " [FAIL] : strip() failed" << std::endl; .. _example-stringmethods-slice-46: .. dropdown:: slice (pd_test_1_all.cpp:17546) :class-title: example-dropdown .. code-block:: cpp :linenos: :lineno-start: 17536 :emphasize-lines: 11 // ============================================================================ // Slicing / Indexing Tests // ============================================================================ void pd_test_period_index_slice() { std::cout << "========= slice method ================================"; std::vector ordinals = {0, 1, 2, 3, 4}; pandas::PeriodIndex idx(ordinals, "D"); pandas::PeriodIndex sliced = idx.slice(1, 4); bool passed = (sliced.size() == 3 && sliced[0].has_value() && *sliced[0] == 1); if (!passed) { std::cout << " [FAIL] : in pd_test_period_index_slice()" << std::endl; throw std::runtime_error("pd_test_period_index_slice failed"); } std::cout << " -> tests passed" << std::endl; } .. _example-stringmethods-slice_replace-47: .. dropdown:: slice_replace (pd_test_3_all.cpp:21485) :class-title: example-dropdown .. 
code-block:: cpp :linenos: :lineno-start: 21475 :emphasize-lines: 11 threw = true; } if (!threw) { std::cout << " [FAIL] : index should throw on not found" << std::endl; throw std::runtime_error("pd_test_str_index_rindex: index should throw on not found"); } std::cout << " -> tests passed" << std::endl; } // ============================================================================ // Test str().slice_replace() (plan_04c) // ============================================================================ void pd_test_str_slice_replace() { std::cout << "========= Series.str().slice_replace() =================="; pandas::Series s({"hello", "world", "foo"}); auto result = s.str().slice_replace(0, 2, "XX"); if (result[0] != "XXllo" || result[1] != "XXrld" || result[2] != "XXo") { std::cout << " [FAIL] : slice_replace(0, 2, 'XX') failed" << std::endl; throw std::runtime_error("pd_test_str_slice_replace: slice_replace failed"); .. _example-stringmethods-split-48: .. dropdown:: split (pd_test_4_all.cpp:3961) :class-title: example-dropdown .. code-block:: cpp :linenos: :lineno-start: 3951 :emphasize-lines: 11 // ============================================================================= // Standalone-only helpers (dropped when pasted into pd_test_repr_mismatch.cpp). // ============================================================================= // ============================================================================= // Case 1 — explode.split_comma // // Source: pandasPython_tests/test_pandas_reshaping_pivot_compare_full.py L511 // pd_df2 = pd.DataFrame([{"var1":"a,b,c","var2":1}, // {"var1":"d,e,f","var2":2}]) // .assign(var1=lambda d: d.var1.str.split(",")) // .explode("var1").reset_index(drop=True) // // Strategy C: we skip the split/explode and hand-build the 6-row result // (var1 object/string, var2 int64, default RangeIndex(0..5)). 
// ============================================================================= void explode_split_comma() { pandas::DataFrame df; df.add_column("var1", {"a", "b", "c", "d", "e", "f"}); df.add_column("var2", {1, 1, 1, 2, 2, 2}); apply_default_display(df); .. _example-stringmethods-split_expand-49: .. dropdown:: split_expand (pd_test_3_all.cpp:28443) :class-title: example-dropdown .. code-block:: cpp :linenos: :lineno-start: 28433 :emphasize-lines: 11 auto r = s.str().isdigit_with_nan(); fail += sao_check(!r.has_nan, "no nan"); fail += sao_check(r.values[0] && !r.values[1] && r.values[2], "values"); if (fail == 0) std::cout << " OK" << std::endl; } void pd_test_str_split_expand() { std::cout << " -- pd_test_str_split_expand --" << std::endl; int fail = 0; pandas::Series s({"a,b,c", "d,e,f"}, "x"); auto r = s.str().split_expand(",", -1); fail += sao_check(r.num_cols == 3, "3 cols"); fail += sao_check(r.num_rows == 2, "2 rows"); fail += sao_check(r.columns[0][0] == "a" && r.columns[1][0] == "b" && r.columns[2][0] == "c", "row0"); fail += sao_check(r.columns[0][1] == "d" && r.columns[1][1] == "e" && r.columns[2][1] == "f", "row1"); if (fail == 0) std::cout << " OK" << std::endl; } void pd_test_str_split_expand_nan() { std::cout << " -- pd_test_str_split_expand_nan --" << std::endl; int fail = 0; .. _example-stringmethods-startswith-50: .. dropdown:: startswith (pd_test_3_all.cpp:20933) :class-title: example-dropdown .. 
code-block:: cpp :linenos: :lineno-start: 20923 :emphasize-lines: 11 auto result = s.str().contains("an", true, false); // case_sensitive=true, regex=false if (result[0] != false || result[1] != true || result[2] != false) { std::cout << " [FAIL] : contains() failed" << std::endl; throw std::runtime_error("pd_test_str_contains: contains() failed"); } std::cout << " -> tests passed" << std::endl; } // ============================================================================ // Test str().startswith() and str().endswith() // ============================================================================ void pd_test_str_startswith_endswith() { std::cout << "========= Series.str().startswith/endswith() ============"; pandas::Series s({"hello", "world", "help"}); auto starts_result = s.str().startswith("hel"); if (starts_result[0] != true || starts_result[1] != false || starts_result[2] != true) { std::cout << " [FAIL] : startswith() failed" << std::endl; .. _example-stringmethods-strip-51: .. dropdown:: strip (pd_test_3_all.cpp:20885) :class-title: example-dropdown .. code-block:: cpp :linenos: :lineno-start: 20875 :emphasize-lines: 11 auto lens = s.str().len(); if (lens[0] != 1 || lens[1] != 2 || lens[2] != 3 || lens[3] != 0) { std::cout << " [FAIL] : len() failed" << std::endl; throw std::runtime_error("pd_test_str_len: len() failed"); } std::cout << " -> tests passed" << std::endl; } // ============================================================================ // Test str().strip(), str().lstrip(), str().rstrip() // ============================================================================ void pd_test_str_strip() { std::cout << "========= Series.str().strip() =========================="; pandas::Series s({" hello ", " world", "test "}); auto strip_result = s.str().strip(); if (strip_result[0] != "hello" || strip_result[1] != "world" || strip_result[2] != "test") { std::cout << " [FAIL] : strip() failed" << std::endl; .. _example-stringmethods-title-52: .. 
dropdown:: title (pd_test_3_all.cpp:20843) :class-title: example-dropdown .. code-block:: cpp :linenos: :lineno-start: 20833 :emphasize-lines: 11 auto upper_result = s.str().upper(); if (upper_result[0] != "HELLO" || upper_result[1] != "WORLD" || upper_result[2] != "TEST") { std::cout << " [FAIL] : upper() failed" << std::endl; throw std::runtime_error("pd_test_str_lower_upper: upper() failed"); } std::cout << " -> tests passed" << std::endl; } // ============================================================================ // Test str().capitalize() and str().title() // ============================================================================ void pd_test_str_capitalize_title() { std::cout << "========= Series.str().capitalize/title() ==============="; pandas::Series s({"hello world", "HELLO WORLD", "hELLO wORLD"}); auto cap_result = s.str().capitalize(); if (cap_result[0] != "Hello world" || cap_result[1] != "Hello world" || cap_result[2] != "Hello world") { std::cout << " [FAIL] : capitalize() failed" << std::endl; .. _example-stringmethods-translate-53: .. dropdown:: translate (pd_test_3_all.cpp:21352) :class-title: example-dropdown .. 
code-block:: cpp :linenos: :lineno-start: 21342 :emphasize-lines: 11 pandas::Series s({"abc", "xy", "123"}); auto result = s.str().join("-"); if (result[0] != "a-b-c" || result[1] != "x-y" || result[2] != "1-2-3") { std::cout << " [FAIL] : join('-') failed" << std::endl; throw std::runtime_error("pd_test_str_join: join('-') failed"); } std::cout << " -> tests passed" << std::endl; } // ============================================================================ // Test str().translate() (plan_04c) // ============================================================================ void pd_test_str_translate() { std::cout << "========= Series.str().translate() ======================"; pandas::Series s({"abc", "def", "xyz"}); auto result = s.str().translate("abc", "XYZ"); if (result[0] != "XYZ" || result[1] != "def" || result[2] != "xyz") { std::cout << " [FAIL] : translate failed" << std::endl; throw std::runtime_error("pd_test_str_translate: translate failed"); .. _example-stringmethods-upper-54: .. dropdown:: upper (pd_test_3_all.cpp:20819) :class-title: example-dropdown .. code-block:: cpp :linenos: :lineno-start: 20809 :emphasize-lines: 11 #include #include "../pandas/pd_series.h" // CRITICAL: No using namespace directives namespace dataframe_tests { namespace dataframe_tests_string_accessor { // ============================================================================ // Test str().lower() and str().upper() // ============================================================================ void pd_test_str_lower_upper() { std::cout << "========= Series.str().lower/upper() ==================="; pandas::Series s({"Hello", "WORLD", "TeSt"}); auto lower_result = s.str().lower(); if (lower_result[0] != "hello" || lower_result[1] != "world" || lower_result[2] != "test") { std::cout << " [FAIL] : lower() failed" << std::endl; .. _example-stringmethods-wrap-55: .. dropdown:: wrap (pd_test_3_all.cpp:21368) :class-title: example-dropdown .. 
code-block:: cpp :linenos: :lineno-start: 21358 :emphasize-lines: 11 pandas::Series s({"abc", "def", "xyz"}); auto result = s.str().translate("abc", "XYZ"); if (result[0] != "XYZ" || result[1] != "def" || result[2] != "xyz") { std::cout << " [FAIL] : translate failed" << std::endl; throw std::runtime_error("pd_test_str_translate: translate failed"); } std::cout << " -> tests passed" << std::endl; } // ============================================================================ // Test str().wrap() (plan_04c) // ============================================================================ void pd_test_str_wrap() { std::cout << "========= Series.str().wrap() ==========================="; pandas::Series s({"hello world foo"}); auto result = s.str().wrap(10); // Should wrap at word boundary if (result[0].find('\n') == std::string::npos) { std::cout << " [FAIL] : wrap should contain newline" << std::endl; .. _example-stringmethods-zfill-56: .. dropdown:: zfill (pd_test_3_all.cpp:21005) :class-title: example-dropdown .. code-block:: cpp :linenos: :lineno-start: 20995 :emphasize-lines: 11 auto alnum_result = s.str().isalnum(); if (alnum_result[0] != true || alnum_result[1] != true || alnum_result[2] != true || alnum_result[3] != false) { std::cout << " [FAIL] : isalnum() failed" << std::endl; throw std::runtime_error("pd_test_str_is_methods: isalnum() failed"); } std::cout << " -> tests passed" << std::endl; } // ============================================================================ // Test str().zfill(), str().center(), str().ljust(), str().rjust() // ============================================================================ void pd_test_str_padding() { std::cout << "========= Series.str().zfill/center/ljust/rjust() ======="; pandas::Series s({"1", "22", "333"}); auto zfill_result = s.str().zfill(5); if (zfill_result[0] != "00001" || zfill_result[1] != "00022" || zfill_result[2] != "00333") { std::cout << " [FAIL] : zfill() failed" << std::endl;